Various (Regularized) t Statistics
regularizedt.RdThese functions provide a simple interface to a variety of (regularized) t statistics that are commonly used in the analysis of high-dimensional case-control studies.
Arguments
- X
data matrix. Note that the columns correspond to variables (“genes”) and the rows to samples.
- L
factor containing class labels for the two groups.
- method
determines how the smoothing parameter is estimated (applies only to improved SAM statistic
samL1).- plot
output diagnostic plot (applies only to improved SAM statistic
samL1).- verbose
print out some (more or less useful) information during computation.
Details
efront.* computes the t statistic using the 90 % rule of Efron et al. (2001).
sam.* computes the SAM t statistic of Tusher et al. (2001).
Note that this requires the additional installation of the “samr” package.
samL1.* computes the improved SAM t statistic of Wu (2005).
Note that part of the code in this function is based on the R code providec
by B. Wu.
modt.* computes the moderated t statistic of Smyth (2004).
Note that this requires the additional installation of the “limma” package.
All the above statistics are compared relative to each other and relative to the shrinkage t statistic in Opgen-Rhein and Strimmer (2007).
Value
The *.stat functions directly return the respective statistic for each variable.
The corresponding *.fun functions return a function that produces the respective statistics when applied to a data matrix (this is very useful for simulations).
References
Opgen-Rhein, R., and K. Strimmer. 2007. Accurate ranking of differentially expressed genes by a distribution-free shrinkage approach. Statist. Appl. Genet. Mol. Biol. 6:9. <DOI:10.2202/1544-6115.1252>
Author
Rainer Opgen-Rhein and Korbinian Strimmer (https://strimmerlab.github.io).
Examples
# load st library
library("st")
# load Choe et al. (2005) data
data(choedata)
X <- choe2.mat
dim(X) # 6 11475
#> [1] 6 11475
L <- choe2.L
L
#> [1] 1 1 1 2 2 2
# L may also contain some real labels
L = c("group 1", "group 1", "group 1", "group 2", "group 2", "group 2")
# Efron t statistic (90 % rule)
score = efront.stat(X, L)
#> Number of variables: 11475
#> Number of observations: 6
#> Number of classes: 2
#>
#> Specified shrinkage intensity lambda.freq (frequencies): 0
#> Estimating variances (pooled across classes)
#> Specified shrinkage intensity lambda.var (variance vector): 0
#>
#> Fudge factor a0 = 0.08747989
order(score^2, decreasing=TRUE)[1:10]
#> [1] 4790 10979 11068 1022 50 724 5762 43 10936 9939
# [1] 4790 10979 11068 1022 50 724 5762 43 10936 9939
# sam statistic
# (requires "samr" package)
#score = sam.stat(X, L)
#order(score^2, decreasing=TRUE)[1:10]
#[1] 4790 10979 1022 5762 35 970 50 11068 10905 2693
# improved sam statistic
#score = samL1.stat(X, L)
#order(score^2, decreasing=TRUE)[1:10]
#[1] 1 2 3 4 5 6 7 8 9 10
# here all scores are zero!
# moderated t statistic
# (requires "limma" package)
#score = modt.stat(X, L)
#order(score^2, decreasing=TRUE)[1:10]
# [1] 4790 10979 1022 5762 35 50 11068 970 10905 43
# shrinkage t statistic
score = shrinkt.stat(X, L)
#> Number of variables: 11475
#> Number of observations: 6
#> Number of classes: 2
#>
#> Estimating optimal shrinkage intensity lambda.freq (frequencies): 1
#> Estimating variances (pooled across classes)
#> Estimating optimal shrinkage intensity lambda.var (variance vector): 0.3882
#>
order(score^2, decreasing=TRUE)[1:10]
#> [1] 10979 11068 50 1022 724 5762 43 4790 10936 9939
#[1] 10979 11068 50 1022 724 5762 43 4790 10936 9939