Estimate CAT Scores and t-Scores

catscore computes CAT scores (correlation-adjusted t-scores) between the group centroids and the pooled mean.

catscore(Xtrain, L, lambda, lambda.var, lambda.freqs, diagonal=FALSE, verbose=TRUE)

Arguments

Xtrain: A matrix containing the training data set. Note that the rows correspond to observations and the columns to variables.
L: A factor with the class labels of the training samples.
lambda: Shrinkage intensity for the correlation matrix. If not specified it is estimated from the data. lambda=0 implies no shrinkage and lambda=1 complete shrinkage.
lambda.var: Shrinkage intensity for the variances. If not specified it is estimated from the data. lambda.var=0 implies no shrinkage and lambda.var=1 complete shrinkage.
lambda.freqs: Shrinkage intensity for the frequencies. If not specified it is estimated from the data. lambda.freqs=0 implies no shrinkage (i.e. empirical frequencies) and lambda.freqs=1 complete shrinkage (i.e. uniform frequencies).
diagonal: If diagonal=FALSE (the default), CAT scores are computed; if diagonal=TRUE, t-scores are computed instead, i.e. correlation among predictors is ignored.
verbose: If TRUE (the default), print progress information while computing.
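
The meaning of the shrinkage intensities at their end points (0 = no shrinkage, 1 = complete shrinkage) can be illustrated with a minimal sketch. It assumes the usual linear shrinkage form, in which the estimate is pulled towards a fixed target (identity matrix for correlations, uniform for frequencies). The helper `shrink` and all numbers below are hypothetical and not part of sda; the package's own estimators may differ in detail.

```r
# Hypothetical helper (not part of sda): linear shrinkage towards a target.
# lambda = 0 returns the empirical estimate, lambda = 1 returns the target.
shrink <- function(estimate, target, lambda) {
  lambda * target + (1 - lambda) * estimate
}

# Correlation matrix shrunk towards the identity (assumed target)
R_emp  <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)
R_none <- shrink(R_emp, diag(2), lambda = 0)    # unchanged
R_half <- shrink(R_emp, diag(2), lambda = 0.5)  # off-diagonals halved

# Class frequencies shrunk towards the uniform distribution (assumed target)
freqs_emp  <- c(cancer = 0.7, healthy = 0.3)    # made-up frequencies
freqs_full <- shrink(freqs_emp, rep(1 / 2, 2), lambda = 1)

print(R_half)
print(freqs_full)
```

With lambda = 1 the empirical values no longer matter at all, which is why lambda.freqs = 1 corresponds to uniform frequencies.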

Details

CAT scores generalize conventional t-scores to account for correlation among predictors (Zuber and Strimmer 2009). If there is no correlation then CAT scores reduce to t-scores. The squared CAT scores provide a decomposition of Hotelling's T^2 statistic.

CAT scores for two classes are described in Zuber and Strimmer (2009); for the multi-class case see Ahdesmäki and Strimmer (2010).

The scale factors for t-scores and CAT-scores are computed from the estimated frequencies (for empirical scale factors set lambda.freqs=0).
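
The relationship between t-scores, correlation-adjusted scores and Hotelling's T^2 described above can be checked numerically. The following is a minimal base R sketch under stated assumptions: it uses simulated data, plain empirical estimates without shrinkage, and the classical two-group form (difference of the two class means) rather than the centroid versus pooled mean form computed by catscore. All variable names are illustrative.

```r
# Sketch only: empirical two-class scores on simulated data (not sda code).
set.seed(1)
n1 <- 30; n2 <- 40; p <- 4

# Mixing matrix that induces correlation among the p predictors
A <- matrix(c(1, 0.5, 0,   0,
              0, 1,   0.3, 0,
              0, 0,   1,   0.2,
              0, 0,   0,   1), p, p, byrow = TRUE)
X1 <- matrix(rnorm(n1 * p), n1, p) %*% A        # class 1
X2 <- matrix(rnorm(n2 * p), n2, p) %*% A + 0.5  # class 2, shifted mean

# Pooled covariance, its variances and its correlation matrix
S <- ((n1 - 1) * cov(X1) + (n2 - 1) * cov(X2)) / (n1 + n2 - 2)
v <- diag(S)
P <- cov2cor(S)

# Conventional t-scores (one per predictor)
d <- colMeans(X1) - colMeans(X2)
t_manual <- d / sqrt((1 / n1 + 1 / n2) * v)

# Inverse matrix square root of P via its eigendecomposition
e <- eigen(P, symmetric = TRUE)
P_inv_sqrt <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)

# Correlation-adjusted t-scores: decorrelate the t-score vector
cat_manual <- drop(P_inv_sqrt %*% t_manual)

# Hotelling's T^2 computed directly from the pooled covariance
T2 <- drop(t(d) %*% solve(S) %*% d) / (1 / n1 + 1 / n2)

# The squared adjusted scores add up to T^2
print(all.equal(sum(cat_manual^2), T2))
```

If P were the identity matrix, P_inv_sqrt would be the identity as well and cat_manual would coincide with t_manual.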

Value

catscore returns a matrix containing the CAT score (or t-score) between each group centroid and the pooled mean for each feature, with one row per feature and one column per class.
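
As a rough illustration of how the returned matrix might be used for feature ranking, the sketch below builds a small stand-in matrix from the first ten CAT scores shown in the Examples section and orders the features by the sum of their squared scores across classes. The summary criterion and the variable names are assumptions made for illustration; they are not necessarily what sda itself uses for ranking.

```r
# Hypothetical stand-in for the matrix returned by catscore():
# one row per feature, one column per class.
cat_cancer <- c(1.7582601, 3.9256188, -0.3554059, -0.7833080, 0.3337062,
                0.9690564, 0.8483530, -1.4097603, -1.0285736, 1.3309377)
scores <- cbind(cat.cancer = cat_cancer, cat.healthy = -cat_cancer)

# One simple per-feature summary: sum of squared scores over the classes
# (an illustrative choice, not necessarily the criterion used by sda)
summary_score <- rowSums(scores^2)

# Feature indices ordered from largest to smallest summary score
ranking <- order(summary_score, decreasing = TRUE)
print(head(ranking, 3))
```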

References

Ahdesmäki, M., and K. Strimmer. 2010. Feature selection in omics prediction problems using cat scores and false non-discovery rate control. Ann. Appl. Stat. 4: 503-519. <DOI:10.1214/09-AOAS277>

Zuber, V., and K. Strimmer. 2009. Gene ranking and biomarker discovery under correlation. Bioinformatics 25: 2700-2707. <DOI:10.1093/bioinformatics/btp460>

Author

Verena Zuber, Miika Ahdesmäki and Korbinian Strimmer (https://strimmerlab.github.io).

Examples

# load sda library
library("sda")

################# 
# training data #
#################

# prostate cancer set
data(singh2002)

# training data
Xtrain = singh2002$x
Ytrain = singh2002$y
dim(Xtrain)
#> [1]  102 6033


####################################################
# shrinkage t-score (DDA setting - no correlation) #
####################################################

tstat = catscore(Xtrain, Ytrain, diagonal=TRUE)
#> Computing t-scores (centroid vs. pooled mean) for feature ranking
#> 
#> Number of variables: 6033 
#> Number of observations: 102 
#> Number of classes: 2 
#> 
#> Estimating optimal shrinkage intensity lambda.freq (frequencies): 1 
#> Estimating variances (pooled across classes)
#> Estimating optimal shrinkage intensity lambda.var (variance vector): 0.205 
#> 
dim(tstat)
#> [1] 6033    2
tstat[1:10,]
#>          t.cancer   t.healthy
#>  [1,]  1.56498847 -1.56498847
#>  [2,]  3.69436202 -3.69436202
#>  [3,] -0.02852474  0.02852474
#>  [4,] -1.17034799  1.17034799
#>  [5,] -0.14715946  0.14715946
#>  [6,]  0.99101324 -0.99101324
#>  [7,]  1.09364887 -1.09364887
#>  [8,] -1.33904912  1.33904912
#>  [9,] -1.28549943  1.28549943
#> [10,]  1.24699390 -1.24699390


########################################################
# shrinkage CAT score (LDA setting - with correlation) #
########################################################

cat = catscore(Xtrain, Ytrain, diagonal=FALSE)
#> Computing cat scores (centroid vs. pooled mean) for feature ranking
#> 
#> Number of variables: 6033 
#> Number of observations: 102 
#> Number of classes: 2 
#> 
#> Estimating optimal shrinkage intensity lambda.freq (frequencies): 1 
#> Estimating variances (pooled across classes)
#> Estimating optimal shrinkage intensity lambda.var (variance vector): 0.205 
#> 
#> Computing the square root of the inverse pooled correlation matrix
#> Estimating optimal shrinkage intensity lambda (correlation matrix): 0.8924 
dim(cat)
#> [1] 6033    2
cat[1:10,]
#>       cat.cancer cat.healthy
#>  [1,]  1.7582601  -1.7582601
#>  [2,]  3.9256188  -3.9256188
#>  [3,] -0.3554059   0.3554059
#>  [4,] -0.7833080   0.7833080
#>  [5,]  0.3337062  -0.3337062
#>  [6,]  0.9690564  -0.9690564
#>  [7,]  0.8483530  -0.8483530
#>  [8,] -1.4097603   1.4097603
#>  [9,] -1.0285736   1.0285736
#> [10,]  1.3309377  -1.3309377