Variance estimates for each row (column) in a matrix

Estimates the sample variance of each row (rowVars) or each column (colVars) of a matrix.

rowVars(x, rows = NULL, cols = NULL, na.rm = FALSE, refine = TRUE,
  center = NULL, dim. = dim(x), ..., useNames = TRUE)

colVars(x, rows = NULL, cols = NULL, na.rm = FALSE, refine = TRUE,
  center = NULL, dim. = dim(x), ..., useNames = TRUE)

Arguments

x: An NxK matrix or, if dim. is specified, an N * K vector.
rows: A vector indicating subset of rows to operate over. If NULL, no subsetting is done.
cols: A vector indicating subset of columns to operate over. If NULL, no subsetting is done.
na.rm: If TRUE, missing values are excluded.
refine: If TRUE, `center` is NULL, and x is numeric, then extra effort is made to calculate the mean with greater numerical precision; otherwise no such refinement is done.
center: (optional; a vector of length N (K)) If the row (column) means are already estimated, they can be pre-specified using this argument. This avoids estimating them again. _Warning: It is important that a non-biased sample mean estimate is passed. If not, then the variance estimate of the spread will also be biased._ If NULL (default), the row/column means are estimated internally.
dim.: An integer vector of length two specifying the dimension of x, also when not a matrix. Comment: The reason for this argument being named with a period at the end is purely technical (we get a run-time error if we try to name it dim).
...: Additional arguments passed to rowMeans() and rowSums().
useNames: If TRUE (default), names attributes of the result are set, otherwise not.
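
The arguments above can be illustrated with a small sketch. The matrix and the variable names (`v_sub`, `v_na`, `v_vec`) are made up for illustration; the sketch assumes that matrixStats is installed and that `rows`/`cols` subsetting gives the same result as subsetting the matrix first.

```r
library(matrixStats)

# A 3x4 matrix with one missing value in row 3
x <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, NA, 10, 11, 12), nrow = 3, ncol = 4)

# Operate on a subset of rows and columns without copying x first
v_sub <- rowVars(x, rows = 1:2, cols = c(1, 2, 4))
stopifnot(all.equal(v_sub, rowVars(x[1:2, c(1, 2, 4)])))

# Exclude missing values; without na.rm = TRUE, row 3 would be NA
v_na <- rowVars(x, na.rm = TRUE)
stopifnot(!anyNA(v_na))

# Pass a plain vector together with its intended dimensions via dim.
v_vec <- rowVars(as.vector(x), dim. = c(3L, 4L), na.rm = TRUE)
stopifnot(all.equal(v_vec, v_na))
```

Using `rows` and `cols` expresses the subset in the call itself, so no intermediate matrix has to be created by the caller.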

Value

Returns a numeric vector of length N (K).

Providing center estimates

The sample variance is estimated as

\(n/(n-1) * mean((x - center)^2)\),

where \(center\) is estimated as the sample mean, by default. In matrixStats (< 0.58.0),

\(n/(n-1) * (mean(x^2) - center^2)\)

was used. Both formulas give the same result _when_ `center` is the sample mean estimate.
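
As a rough illustration of the two formulas, the following base R sketch implements each of them for a single vector. The helper names `var_new` and `var_old` are hypothetical and are not part of matrixStats; they only mirror the expressions above.

```r
# Current formula: n/(n-1) * mean((x - center)^2)
var_new <- function(x, center = mean(x)) {
  n <- length(x)
  n / (n - 1) * mean((x - center)^2)
}

# Formula used in matrixStats (< 0.58.0): n/(n-1) * (mean(x^2) - center^2)
var_old <- function(x, center = mean(x)) {
  n <- length(x)
  n / (n - 1) * (mean(x^2) - center^2)
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# With the sample mean as center, both agree with stats::var()
stopifnot(all.equal(var_new(x), var(x)))
stopifnot(all.equal(var_old(x), var(x)))

# With a center that is not the sample mean, the two formulas disagree
off <- mean(x) - 1
stopifnot(!isTRUE(all.equal(var_new(x, off), var_old(x, off))))
```

The disagreement in the last step is why a `center` value other than the sample mean leads to a biased variance estimate.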

Argument `center` can be used to provide an already existing estimate. The value passed must be the sample mean estimate; otherwise, the variance estimate of the spread will be biased.
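
A minimal sketch of passing precomputed means, assuming matrixStats is installed; the variable names (`mu`, `v_default`, `v_center`) are illustrative only.

```r
library(matrixStats)

set.seed(42)
x <- matrix(rnorm(20), nrow = 5, ncol = 4)

# Row means estimated once and reused
mu <- rowMeans(x)

v_default <- rowVars(x)               # means estimated internally
v_center  <- rowVars(x, center = mu)  # means supplied by the caller

# Both routes should agree up to floating-point tolerance
stopifnot(all.equal(v_default, v_center))
```

This is mainly useful when the means are needed elsewhere anyway, so that they are not computed twice.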

For the time being, to lower the risk of such mistakes, argument `center` is occasionally validated against the sample-mean estimate. If a discrepancy is detected, an informative error is raised to prevent incorrect variance estimates from being used. For performance reasons, this check is performed only once every 50 calls. The frequency can be controlled by R option `matrixStats.vars.formula.freq`, whose default can be set by environment variable `R_MATRIXSTATS_VARS_FORMULA_FREQ`.
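
The option can be set like any other R option. The sketch below assumes that the value is a positive number N meaning "validate roughly once every N calls", which is inferred from the description above rather than confirmed; the value 10 is arbitrary.

```r
# Assumption: a smaller value makes the validation of 'center' run more often.
old <- options(matrixStats.vars.formula.freq = 10)

# Inspect the value currently in effect
getOption("matrixStats.vars.formula.freq")

# Restore the previous setting when done
options(old)
```

Since the environment variable only supplies the default, it presumably has to be set before the package is loaded, for instance in `~/.Renviron`.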

Author

Henrik Bengtsson

Examples

set.seed(1)

x <- matrix(rnorm(20), nrow = 5, ncol = 4)
print(x)
#>            [,1]       [,2]       [,3]        [,4]
#> [1,] -0.6264538 -0.8204684  1.5117812 -0.04493361
#> [2,]  0.1836433  0.4874291  0.3898432 -0.01619026
#> [3,] -0.8356286  0.7383247 -0.6212406  0.94383621
#> [4,]  1.5952808  0.5757814 -2.2146999  0.82122120
#> [5,]  0.3295078 -0.3053884  1.1249309  0.59390132

# Row averages
print(rowMeans(x))
#> [1] 0.004981341 0.261181337 0.056322931 0.194395865 0.435737906
print(rowMedians(x))
#> [1] -0.33569371  0.28674328  0.05854206  0.69850127  0.46170455

# Column averages
print(colMeans(x))
#> [1] 0.12926990 0.13513567 0.03812297 0.45956697
print(colMedians(x))
#> [1] 0.1836433 0.4874291 0.3898432 0.5939013


# Row variabilities
print(rowVars(x))
#> [1] 1.11767161 0.05022969 0.83582537 2.76819528 0.35351857
print(rowSds(x))
#> [1] 1.0571999 0.2241198 0.9142349 1.6637894 0.5945743
print(rowMads(x))
#> [1] 0.5749039 0.2251964 1.1601914 0.7557549 0.5896472
print(rowIQRs(x))
#> [1] 1.0192025 0.2805548 1.4645402 1.1365751 0.5558750

# Column variabilities
print(colVars(x))
print(colSds(x))
#> [1] 0.9610394 0.6688342 1.4988744 0.4648177
print(colMads(x))
#> [1] 1.2010500 0.3719779 1.4990329 0.5188135
print(colIQRs(x))
#> [1] 0.9559616 0.8811697 1.7461715 0.8374115

# Row ranges
print(rowRanges(x))
#>             [,1]      [,2]
#> [1,] -0.82046838 1.5117812
#> [2,] -0.01619026 0.4874291
#> [3,] -0.83562861 0.9438362
#> [4,] -2.21469989 1.5952808
#> [5,] -0.30538839 1.1249309
print(cbind(rowMins(x), rowMaxs(x)))
#>             [,1]      [,2]
#> [1,] -0.82046838 1.5117812
#> [2,] -0.01619026 0.4874291
#> [3,] -0.83562861 0.9438362
#> [4,] -2.21469989 1.5952808
#> [5,] -0.30538839 1.1249309
print(cbind(rowOrderStats(x, which = 1), rowOrderStats(x, which = ncol(x))))
#>             [,1]      [,2]
#> [1,] -0.82046838 1.5117812
#> [2,] -0.01619026 0.4874291
#> [3,] -0.83562861 0.9438362
#> [4,] -2.21469989 1.5952808
#> [5,] -0.30538839 1.1249309

# Column ranges
print(colRanges(x))
#>             [,1]      [,2]
#> [1,] -0.83562861 1.5952808
#> [2,] -0.82046838 0.7383247
#> [3,] -2.21469989 1.5117812
#> [4,] -0.04493361 0.9438362
print(cbind(colMins(x), colMaxs(x)))
#>             [,1]      [,2]
#> [1,] -0.83562861 1.5952808
#> [2,] -0.82046838 0.7383247
#> [3,] -2.21469989 1.5117812
#> [4,] -0.04493361 0.9438362
print(cbind(colOrderStats(x, which = 1), colOrderStats(x, which = nrow(x))))
#>             [,1]      [,2]
#> [1,] -0.83562861 1.5952808
#> [2,] -0.82046838 0.7383247
#> [3,] -2.21469989 1.5117812
#> [4,] -0.04493361 0.9438362


x <- matrix(rnorm(2000), nrow = 50, ncol = 40)

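# Row standard deviations, estimated two ways. For independent entries with
# a common variance, Var(x[j+1] - x[j]) = 2 * Var(x), so the standard deviation
# of the differences divided by sqrt(2) should be close to rowSds(x).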
d <- rowDiffs(x)
s1 <- rowSds(d) / sqrt(2)
s2 <- rowSds(x)
print(summary(s1 - s2))
#>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
#> -0.19874 -0.04460  0.01411  0.01097  0.07339  0.27371 

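# Column standard deviations, estimated two ways. For independent entries with
# a common variance, Var(x[i+1] - x[i]) = 2 * Var(x), so the standard deviation
# of the differences divided by sqrt(2) should be close to colSds(x).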
d <- colDiffs(x)
s1 <- colSds(d) / sqrt(2)
s2 <- colSds(x)
print(summary(s1 - s2))
#>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
#> -0.13188 -0.02160  0.02329  0.01879  0.07141  0.15346