Various (Regularized) t Statistics

These functions provide a simple interface to a variety of (regularized) t statistics that are commonly used in the analysis of high-dimensional case-control studies.

Usage

efront.stat(X, L, verbose=TRUE)
efront.fun(L, verbose=TRUE)
sam.stat(X, L)
sam.fun(L)
samL1.stat(X, L, method=c("lowess", "cor"), plot=FALSE, verbose=TRUE)
samL1.fun(L, method=c("lowess", "cor"), plot=FALSE, verbose=TRUE)
modt.stat(X, L)
modt.fun(L)

Arguments

X: data matrix. Note that the columns correspond to variables (“genes”) and the rows to samples.
L: factor containing class labels for the two groups.
method: determines how the smoothing parameter is estimated (applies only to the improved SAM statistic samL1).
plot: output a diagnostic plot (applies only to the improved SAM statistic samL1).
verbose: print out some (more or less useful) information during computation.
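
As a minimal sketch of the input layout described above, the following base R snippet builds a toy data matrix with samples in rows and variables in columns, together with a matching two-level label factor. The names `expr`, `X_toy` and `L_toy` are hypothetical and the data are random; the snippet only illustrates the expected shapes, not any particular data set.

```r
# Toy input in the layout described above (hypothetical names, random data).
set.seed(1)

# Suppose the raw data arrive as genes x samples:
expr <- matrix(rnorm(5 * 6), nrow = 5, ncol = 6,
               dimnames = list(paste0("gene", 1:5), paste0("sample", 1:6)))

# Transpose so that rows are samples and columns are variables ("genes").
X_toy <- t(expr)

# One class label per sample (i.e. per row of X_toy), two groups in total.
L_toy <- factor(rep(c("control", "case"), each = 3))

# Basic consistency checks before passing X_toy and L_toy to a *.stat function.
stopifnot(nrow(X_toy) == length(L_toy), nlevels(L_toy) == 2)
```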

Details

efront.* computes the t statistic using the 90 % rule of Efron et al. (2001).
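
The idea of a fudge-factor t statistic can be sketched in base R as follows. This is an illustration only, under the assumption that the 90 % rule means adding the 90th percentile of the per-variable standard errors (the "fudge factor" a0) to the denominator of an equal-variance t statistic. The helper `efron_t_sketch` is hypothetical and is not the package's implementation.

```r
# Sketch of a fudge-factor t statistic (hypothetical helper, base R only).
# Assumption: a0 is the 90th percentile of the per-variable standard errors.
efron_t_sketch <- function(X, L) {
  L <- factor(L)
  stopifnot(nlevels(L) == 2, nrow(X) == length(L))
  g1 <- L == levels(L)[1]
  n1 <- sum(g1)
  n2 <- sum(!g1)

  # group means per variable (column)
  m1 <- colMeans(X[g1, , drop = FALSE])
  m2 <- colMeans(X[!g1, , drop = FALSE])

  # pooled within-group variance per variable
  ss1 <- colSums(sweep(X[g1, , drop = FALSE], 2, m1)^2)
  ss2 <- colSums(sweep(X[!g1, , drop = FALSE], 2, m2)^2)
  v <- (ss1 + ss2) / (n1 + n2 - 2)

  # standard error of the mean difference and the fudge factor a0
  se <- sqrt((1 / n1 + 1 / n2) * v)
  a0 <- unname(quantile(se, probs = 0.9))

  (m1 - m2) / (se + a0)
}

# Toy usage with random data: 6 samples (rows), 20 variables (columns).
set.seed(1)
X_toy <- matrix(rnorm(6 * 20), nrow = 6)
L_toy <- factor(c(1, 1, 1, 2, 2, 2))
t_reg <- efron_t_sketch(X_toy, L_toy)
```

Because a0 is non-negative, each regularized score is no larger in absolute value than the corresponding ordinary t statistic; variables with very small standard errors are shrunk the most.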

sam.* computes the SAM t statistic of Tusher et al. (2001). Note that this requires the additional installation of the “samr” package.

samL1.* computes the improved SAM t statistic of Wu (2005). Note that part of the code in this function is based on the R code provided by B. Wu.

modt.* computes the moderated t statistic of Smyth (2004). Note that this requires the additional installation of the “limma” package.

All of the above statistics are compared with each other, and with the shrinkage t statistic, in Opgen-Rhein and Strimmer (2007).
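
The examples on this page rank variables by their squared score. One simple way to compare two statistics on that basis is the overlap of their top-k lists, sketched below in base R. The helper `top_vars` and the two score vectors are hypothetical toy values, and this is not necessarily the comparison procedure used in the paper.

```r
# Hypothetical helper: indices of the k variables with the largest squared score.
top_vars <- function(score, k = 10) {
  k <- min(k, length(score))
  order(score^2, decreasing = TRUE)[seq_len(k)]
}

# Toy scores from two (imaginary) statistics for the same five variables.
score_a <- c(0.1, -3.2, 2.5, 0.0, -1.1)
score_b <- c(0.2, -2.9, 0.05, 0.1, -1.5)

top_a <- top_vars(score_a, k = 3)
top_b <- top_vars(score_b, k = 3)

# Variables that appear in both top-3 lists.
shared <- intersect(top_a, top_b)
length(shared) / 3  # fraction of overlap
```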

Value

The *.stat functions directly return the respective statistic for each variable.

The corresponding *.fun functions return a function that produces the respective statistics when applied to a data matrix (this is very useful for simulations).
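
The pattern behind the *.fun variants can be sketched as a function factory: the labels are fixed once, and the returned function is then applied to many data matrices. The factory `plain_t_fun` below is a hypothetical base R stand-in that computes an ordinary equal-variance t statistic; it only mimics the calling convention and is not one of the package's functions.

```r
# Hypothetical factory in the style of the *.fun functions (base R only).
# It returns a function of X that computes an ordinary equal-variance
# t statistic for every column of X.
plain_t_fun <- function(L) {
  L <- factor(L)
  stopifnot(nlevels(L) == 2)
  g1 <- L == levels(L)[1]
  function(X) {
    apply(X, 2, function(x)
      unname(t.test(x[g1], x[!g1], var.equal = TRUE)$statistic))
  }
}

# Simulation: fix the labels once, then score several simulated data sets.
set.seed(1)
L_sim <- factor(rep(1:2, each = 3))
score_fun <- plain_t_fun(L_sim)
sims <- replicate(5, score_fun(matrix(rnorm(6 * 10), nrow = 6)))
dim(sims)  # one row per variable, one column per simulated data set
```

With the package loaded, the same calling convention would apply to, for example, the function returned by `efront.fun(L, verbose=FALSE)`.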

References

Opgen-Rhein, R., and K. Strimmer. 2007. Accurate ranking of differentially expressed genes by a distribution-free shrinkage approach. Statist. Appl. Genet. Mol. Biol. 6:9. <DOI:10.2202/1544-6115.1252>

Author

Rainer Opgen-Rhein and Korbinian Strimmer (https://strimmerlab.github.io).

Examples

# load st library 
library("st")

# load Choe et al. (2005) data
data(choedata)
X <- choe2.mat
dim(X)
#> [1]     6 11475
L <- choe2.L
L
#> [1] 1 1 1 2 2 2

# L may also contain descriptive (character) labels
L <- c("group 1", "group 1", "group 1", "group 2", "group 2", "group 2")


# Efron t statistic (90 % rule)
score = efront.stat(X, L)
#> Number of variables: 11475 
#> Number of observations: 6 
#> Number of classes: 2 
#> 
#> Specified shrinkage intensity lambda.freq (frequencies): 0 
#> Estimating variances (pooled across classes)
#> Specified shrinkage intensity lambda.var (variance vector): 0 
#> 
#> Fudge factor a0 = 0.08747989 
order(score^2, decreasing=TRUE)[1:10]
#>  [1]  4790 10979 11068  1022    50   724  5762    43 10936  9939

# sam statistic
# (requires "samr" package)
#score = sam.stat(X, L)
#order(score^2, decreasing=TRUE)[1:10]
#[1]  4790 10979  1022  5762    35   970    50 11068 10905  2693

# improved sam statistic
#score = samL1.stat(X, L)
#order(score^2, decreasing=TRUE)[1:10]
#[1]  1  2  3  4  5  6  7  8  9 10
# here all scores are zero!

# moderated t statistic
# (requires "limma" package)
#score = modt.stat(X, L)
#order(score^2, decreasing=TRUE)[1:10]
# [1]  4790 10979  1022  5762    35    50 11068   970 10905    43

# shrinkage t statistic
score = shrinkt.stat(X, L)
#> Number of variables: 11475 
#> Number of observations: 6 
#> Number of classes: 2 
#> 
#> Estimating optimal shrinkage intensity lambda.freq (frequencies): 1 
#> Estimating variances (pooled across classes)
#> Estimating optimal shrinkage intensity lambda.var (variance vector): 0.3882 
#> 
order(score^2, decreasing=TRUE)[1:10]
#>  [1] 10979 11068    50  1022   724  5762    43  4790 10936  9939