catscore computes CAT scores
(correlation-adjusted t-scores)
between the group centroids and the pooled mean.
catscore(Xtrain, L, lambda, lambda.var, lambda.freqs, diagonal=FALSE, verbose=TRUE)
Xtrain: A matrix containing the training data set. Rows correspond to observations and columns to variables.
L: A factor with the class labels of the training samples.
lambda: Shrinkage intensity for the correlation matrix. If not specified it is
estimated from the data. lambda=0 implies no shrinkage
and lambda=1 complete shrinkage.
lambda.var: Shrinkage intensity for the variances. If not specified it is
estimated from the data. lambda.var=0 implies no shrinkage
and lambda.var=1 complete shrinkage.
lambda.freqs: Shrinkage intensity for the frequencies. If not specified it is
estimated from the data. lambda.freqs=0 implies no shrinkage (i.e. empirical frequencies)
and lambda.freqs=1 complete shrinkage (i.e. uniform frequencies).
diagonal: For diagonal=FALSE (the default) CAT scores are computed;
for diagonal=TRUE, t-scores are computed instead.
verbose: Print progress information while computing.
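As a minimal sketch of how the arguments fit together, the call below fixes all three shrinkage intensities by hand instead of letting them be estimated. The simulated data and the chosen values (0.5, 0.2, 0) are arbitrary assumptions for illustration only; they are not recommended settings.

```r
library("sda")

# Simulated toy data (an assumption for illustration): 40 observations, 5 variables
set.seed(1)
X = matrix(rnorm(40 * 5), nrow=40, ncol=5)
L = factor(rep(c("a", "b"), each=20))

# All shrinkage intensities given explicitly, so none is estimated from the data
scores = catscore(X, L, lambda=0.5, lambda.var=0.2, lambda.freqs=0,
                  diagonal=FALSE, verbose=FALSE)

# One row per variable, one column per class
dim(scores)
```

Leaving any of lambda, lambda.var or lambda.freqs unspecified makes catscore estimate that intensity from the data, as described above.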
CAT scores generalize conventional t-scores to account for correlation among predictors (Zuber and Strimmer 2009). If there is no correlation then CAT scores reduce to t-scores. The squared CAT scores provide a decomposition of Hotelling's T^2 statistic.
CAT scores for two classes are described in Zuber and Strimmer (2009); for the multi-class case see Ahdesmäki and Strimmer (2010).
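The relationship between t-scores and CAT scores can be written compactly. The notation below is an assumption introduced here for illustration, following the cited papers rather than the package source: for class k, mu_k is the centroid, mu_pool the pooled mean, n_k the class size, n the total sample size, V the diagonal matrix of pooled variances, and P the pooled correlation matrix.

```latex
\tau_k = \left( \frac{1}{n_k} - \frac{1}{n} \right)^{-1/2} V^{-1/2} \left( \mu_k - \mu_{\mathrm{pool}} \right),
\qquad
\tau_k^{\mathrm{adj}} = P^{-1/2} \, \tau_k,
\qquad
\sum_i \left( \tau_{k,i}^{\mathrm{adj}} \right)^2 = \tau_k^{\top} P^{-1} \tau_k
```

With P equal to the identity matrix, the CAT score vector coincides with the t-score vector, and the sum of squared CAT scores is the quadratic form behind the T^2 decomposition mentioned above.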
The scale factors for t-scores and CAT scores are computed from the estimated frequencies
(for empirical scale factors set lambda.freqs=0).
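The role of the scale factor can be illustrated in base R. The sketch below computes plain, unshrunken t-scores of each centroid against the pooled mean, using empirical class sizes and a pooled within-class variance. The helper tscore_sketch is hypothetical and is not part of sda; it is not the package's implementation and applies no shrinkage.

```r
# Hypothetical helper (not part of sda): unshrunken t-scores, centroid vs. pooled mean
tscore_sketch = function(X, L) {
  X = as.matrix(X)
  L = factor(L)
  lev = levels(L)
  n = nrow(X); p = ncol(X); K = length(lev)
  centroids = matrix(0, p, K)   # one column per class
  ss = numeric(p)               # within-class sums of squares per variable
  nk = numeric(K)               # class sizes
  for (k in seq_len(K)) {
    Xk = X[L == lev[k], , drop=FALSE]
    nk[k] = nrow(Xk)
    centroids[, k] = colMeans(Xk)
    ss = ss + colSums(sweep(Xk, 2, centroids[, k])^2)
  }
  v.pooled = ss / (n - K)               # pooled variance per variable
  scale = 1 / sqrt(1 / nk - 1 / n)      # scale factor for each class
  tsc = sweep(centroids - colMeans(X), 2, scale, "*") / sqrt(v.pooled)
  dimnames(tsc) = list(colnames(X), paste0("t.", lev))
  tsc
}

# Toy data (an assumption for illustration): two classes of unequal size
set.seed(1)
X = matrix(rnorm(30 * 4), nrow=30)
L = factor(rep(c("cancer", "healthy"), times=c(12, 18)))
ts = tscore_sketch(X, L)

# With two classes the columns mirror each other ...
all.equal(ts[, 1], -ts[, 2])

# ... and the first column equals Student's two-sample t statistic (equal variances)
student = apply(X, 2, function(x)
  t.test(x[L == "cancer"], x[L == "healthy"], var.equal=TRUE)$statistic)
all.equal(unname(ts[, 1]), unname(student))
```

Under these assumptions the two-class scores differ only in sign, which is the same pattern visible in the two-class example output of this page.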
catscore returns a matrix containing the CAT score (or t-score) between
each group centroid and the pooled mean for each feature, with one row per feature and one column per class.
Ahdesmäki, A., and K. Strimmer. 2010. Feature selection in omics prediction problems using cat scores and false non-discovery rate control. Ann. Appl. Stat. 4: 503-519. <DOI:10.1214/09-AOAS277>
Zuber, V., and K. Strimmer. 2009. Gene ranking and biomarker discovery under correlation. Bioinformatics 25: 2700-2707. <DOI:10.1093/bioinformatics/btp460>
sda.ranking, carscore.
# load sda library
library("sda")
#################
# training data #
#################
# prostate cancer set
data(singh2002)
# training data
Xtrain = singh2002$x
Ytrain = singh2002$y
dim(Xtrain)
#> [1] 102 6033
####################################################
# shrinkage t-score (DDA setting - no correlation) #
####################################################
tstat = catscore(Xtrain, Ytrain, diagonal=TRUE)
#> Computing t-scores (centroid vs. pooled mean) for feature ranking
#>
#> Number of variables: 6033
#> Number of observations: 102
#> Number of classes: 2
#>
#> Estimating optimal shrinkage intensity lambda.freq (frequencies): 1
#> Estimating variances (pooled across classes)
#> Estimating optimal shrinkage intensity lambda.var (variance vector): 0.205
#>
dim(tstat)
#> [1] 6033 2
tstat[1:10,]
#> t.cancer t.healthy
#> [1,] 1.56498847 -1.56498847
#> [2,] 3.69436202 -3.69436202
#> [3,] -0.02852474 0.02852474
#> [4,] -1.17034799 1.17034799
#> [5,] -0.14715946 0.14715946
#> [6,] 0.99101324 -0.99101324
#> [7,] 1.09364887 -1.09364887
#> [8,] -1.33904912 1.33904912
#> [9,] -1.28549943 1.28549943
#> [10,] 1.24699390 -1.24699390
########################################################
# shrinkage CAT score (LDA setting - with correlation) #
########################################################
cat = catscore(Xtrain, Ytrain, diagonal=FALSE)
#> Computing cat scores (centroid vs. pooled mean) for feature ranking
#>
#> Number of variables: 6033
#> Number of observations: 102
#> Number of classes: 2
#>
#> Estimating optimal shrinkage intensity lambda.freq (frequencies): 1
#> Estimating variances (pooled across classes)
#> Estimating optimal shrinkage intensity lambda.var (variance vector): 0.205
#>
#> Computing the square root of the inverse pooled correlation matrix
#> Estimating optimal shrinkage intensity lambda (correlation matrix): 0.8924
dim(cat)
#> [1] 6033 2
cat[1:10,]
#> cat.cancer cat.healthy
#> [1,] 1.7582601 -1.7582601
#> [2,] 3.9256188 -3.9256188
#> [3,] -0.3554059 0.3554059
#> [4,] -0.7833080 0.7833080
#> [5,] 0.3337062 -0.3337062
#> [6,] 0.9690564 -0.9690564
#> [7,] 0.8483530 -0.8483530
#> [8,] -1.4097603 1.4097603
#> [9,] -1.0285736 1.0285736
#> [10,] 1.3309377 -1.3309377
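The scores are intended for feature ranking, so a natural next step is to order the variables by absolute score. The snippet below is a minimal sketch using base R on the two-class t-scores; the choice of the first column and of ten variables is an arbitrary assumption. For ranking in general, including the multi-class case, the package provides sda.ranking.

```r
library("sda")
data(singh2002)

# t-scores as in the example above, without progress messages
tstat = catscore(singh2002$x, singh2002$y, diagonal=TRUE, verbose=FALSE)

# Indices of the ten variables with the largest absolute t-score.
# With two classes both columns have the same absolute values, so either can be used.
top10 = order(abs(tstat[, 1]), decreasing=TRUE)[1:10]
tstat[top10, ]
```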