scale_data_frame centers and/or scales the columns of a data frame (or matrix).

Usage

scale_data_frame(x, center = TRUE, scale = TRUE)

Arguments

x

a data frame or a numeric matrix (or vector). For matrices or vectors, scale() is used.

center

either a logical value or a numeric-alike vector of length equal to the number of columns of x (for a data frame, the number of numeric/logical columns; see Details), where ‘numeric-alike’ means that as.numeric(.) can be applied successfully when is.numeric(.) is not true.
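As a minimal sketch of the ‘numeric-alike’ rule, the hypothetical helper as_numeric_alike() below (an illustrative name, not part of this package) leaves numeric input alone and otherwise applies as.numeric():

```r
# Hypothetical helper illustrating 'numeric-alike' coercion:
# as.numeric() is applied only when is.numeric() is FALSE.
as_numeric_alike <- function(v) {
  if (is.numeric(v)) v else as.numeric(v)
}

as_numeric_alike(c(5.8, 3))       # already numeric: returned as is
as_numeric_alike(c("5.8", "3"))   # character: coerced to c(5.8, 3)
as_numeric_alike(c(TRUE, FALSE))  # logical: coerced to c(1, 0)
```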

scale

either a logical value or a numeric-alike vector of length equal to the number of columns of x (for a data frame, the number of numeric/logical columns; see Details).

Value

The centered and/or scaled data frame. Non-numeric columns are left unchanged. Logicals are treated as 0/1 numerics, consistent with scale(). The numeric centering and scaling values used (if any) are returned as the attributes "scaled:center" and "scaled:scale", but only for the numeric/logical columns.
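The following is a hypothetical base-R sketch of the behavior described above, assuming the data-frame case reduces to calling scale() on the numeric/logical columns. The name scale_df_sketch() is illustrative only; this is not this package's implementation.

```r
# Hypothetical sketch: not this package's implementation.
scale_df_sketch <- function(x, center = TRUE, scale = TRUE) {
  stopifnot(is.data.frame(x))
  # Only numeric and logical columns take part; data.matrix() turns logicals into 0/1
  use <- vapply(x, function(col) is.numeric(col) || is.logical(col), logical(1))
  # Delegate the arithmetic to base scale() on a numeric matrix
  m <- scale(data.matrix(x[use]), center = center, scale = scale)
  # Write the scaled values back; all other columns are untouched
  x[use] <- as.data.frame(m)
  # Copy the attributes scale() set (absent when that step was skipped)
  attr(x, "scaled:center") <- attr(m, "scaled:center")
  attr(x, "scaled:scale") <- attr(m, "scaled:scale")
  x
}
```

For example, on data.frame(a = c(1, 2, 3), b = c(TRUE, FALSE, TRUE), g = c("u", "v", "w")) the sketch returns a as c(-1, 0, 1), leaves g as is, and records centers of 2 and 2/3 for a and b.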

Details

The value of center determines how column centering is performed. If center is a numeric-alike vector with length equal to the number of numeric/logical columns of x, then each of those columns has the corresponding value from center subtracted from it. If center is TRUE, centering is done by subtracting the column means (omitting NAs) from their corresponding columns, and if center is FALSE, no centering is done.
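The three centering rules can be spelled out with base R on a numeric matrix. This sketch assumes the data-frame case behaves the same way column by column, and uses sweep() to make each subtraction explicit.

```r
m <- cbind(a = c(1, 2, NA, 5), b = c(10, 20, 30, 40))

# center = TRUE: subtract column means computed with NAs omitted
by_mean <- sweep(m, 2, colMeans(m, na.rm = TRUE), "-")

# center = numeric vector: subtract the supplied value from each column
by_value <- sweep(m, 2, c(1, 10), "-")

# center = FALSE: the values are returned uncentered
no_center <- scale(m, center = FALSE, scale = FALSE)
```

The first two results should match scale(m, center = TRUE, scale = FALSE) and scale(m, center = c(1, 10), scale = FALSE) apart from the attributes that scale() attaches.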

The value of scale determines how column scaling is performed (after centering). If scale is a numeric-alike vector with length equal to the number of numeric/logical columns of x, then each of those columns is divided by the corresponding value from scale. If scale is TRUE, scaling is done by dividing each (centered) column by its standard deviation if center is TRUE, and by its root mean square otherwise. If scale is FALSE, no scaling is done.
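To see which divisor scale = TRUE picks, this base-R sketch reads the "scaled:scale" attribute from scale() on a one-column matrix, once with centering and once without (the data are arbitrary illustrative values):

```r
# A single numeric column (mean 5)
x <- cbind(v = c(2, 4, 4, 4, 5, 5, 7, 9))

# center = TRUE, scale = TRUE: divisor is the standard deviation
sd_divisor <- attr(scale(x), "scaled:scale")

# center = FALSE, scale = TRUE: divisor is the root mean square
rms_divisor <- attr(scale(x, center = FALSE), "scaled:scale")

sd_divisor    # sqrt(32 / 7), about 2.14
rms_divisor   # sqrt(232 / 7), about 5.76
```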

The root mean square for a (possibly centered) column is defined as \(\sqrt{\sum(x^2)/(n-1)}\), where \(x\) is a vector of the non-missing values and \(n\) is the number of non-missing values. When center = TRUE, this equals the standard deviation, but in general it does not. (To scale by the standard deviations without centering, use scale(x, center = FALSE, scale = apply(x, 2, sd, na.rm = TRUE)) on a numeric matrix.)
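The formula translates directly into R. In this sketch, rms() is a hypothetical helper written from the definition above; its result can be compared with the divisor that base scale() uses when center = FALSE, and the standard-deviation recipe is applied alongside it.

```r
# Root mean square as defined above: drop NAs, divide by n - 1
rms <- function(v) {
  v <- v[!is.na(v)]
  sqrt(sum(v^2) / (length(v) - 1))
}

m <- cbind(a = c(1, 2, NA, 5), b = c(3, 0, 4, 1))
apply(m, 2, rms)   # a: sqrt(30 / 2) = sqrt(15); b: sqrt(26 / 3)

# Scaling by standard deviations without centering (the recipe above)
sd_scaled <- scale(m, center = FALSE, scale = apply(m, 2, sd, na.rm = TRUE))
```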

See also

sweep, which allows centering (and scaling) with arbitrary statistics.
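As a base-R sketch of that alternative (not part of this package), two sweep() calls reproduce scale() on the numeric iris columns. The same pattern then accepts other statistics; the median and MAD below are only an illustrative robust choice.

```r
m <- as.matrix(iris[, 1:4])

# Equivalent of scale(m): subtract column means, then divide by column SDs
standardized <- sweep(sweep(m, 2, colMeans(m), "-"), 2, apply(m, 2, sd), "/")

# Same pattern with other statistics, e.g. median and MAD (robust scaling)
robust <- sweep(sweep(m, 2, apply(m, 2, median), "-"), 2, apply(m, 2, mad), "/")
```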

Examples

require(stats)
data(iris)
summary(scale_data_frame(iris))
#>   Sepal.Length       Sepal.Width       Petal.Length      Petal.Width     
#>  Min.   :-1.86378   Min.   :-2.4258   Min.   :-1.5623   Min.   :-1.4422  
#>  1st Qu.:-0.89767   1st Qu.:-0.5904   1st Qu.:-1.2225   1st Qu.:-1.1799  
#>  Median :-0.05233   Median :-0.1315   Median : 0.3354   Median : 0.1321  
#>  Mean   : 0.00000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000  
#>  3rd Qu.: 0.67225   3rd Qu.: 0.5567   3rd Qu.: 0.7602   3rd Qu.: 0.7880  
#>  Max.   : 2.48370   Max.   : 3.0805   Max.   : 1.7799   Max.   : 1.7064  
#>        Species  
#>  setosa    :50  
#>  versicolor:50  
#>  virginica :50  
#>                 
#>                 
#>