stdize standardizes variables by centring and scaling.
stdizeFit modifies a model call or existing model to use standardized
variables.
# Default S3 method
stdize(x, center = TRUE, scale = TRUE, ...)
# S3 method for class 'logical'
stdize(x, binary = c("center", "scale", "binary", "half", "omit"),
    center = TRUE, scale = FALSE, ...)
## also for two-level factors
# S3 method for class 'data.frame'
stdize(x, binary = c("center", "scale", "binary", "half", "omit"),
    center = TRUE, scale = TRUE, omit.cols = NULL, source = NULL,
    prefix = TRUE, append = FALSE, ...)
# S3 method for class 'formula'
stdize(x, data = NULL, response = FALSE,
    binary = c("center", "scale", "binary", "half", "omit"),
    center = TRUE, scale = TRUE, omit.cols = NULL, prefix = TRUE,
    append = FALSE, ...)
stdizeFit(object, newdata, which = c("formula", "subset", "offset", "weights",
    "fixed", "random", "model"), evaluate = TRUE, quote = NA)
x: a numeric or logical vector, factor, numeric matrix,
data.frame or a formula.
center, scale: either a logical value or a logical or numeric vector
of length equal to the number of columns of x (see
‘Details’). scale can also be a function to use for
scaling.
binary: specifies how binary variables (logical or two-level factors)
are scaled. The default is "center", which subtracts the mean
assuming the levels are equal to 0 and 1; use "scale" to
both centre and scale by SD, "binary" to code as 0 /
1, "half" to code as -0.5 / 0.5, and "omit" to leave
binary variables unmodified.
This argument takes precedence over center and scale, unless
it is set to NA (in which case binary variables are treated like
numeric variables).
source: a reference data.frame, being the result of a previous call to
stdize, from which the scale and center values are
taken. Column names are matched. This can be used for scaling new data
using the statistics of another data set.
omit.cols: column names or numeric indices of columns that should be left unaltered.
prefix: either a logical value specifying whether the names of transformed columns should be prefixed, or a two-element character vector giving the prefixes. The prefixes default to “z.” for scaled and “c.” for centred variables.
append: logical, if TRUE, modified columns are appended to the
original data frame.
response: logical, stating whether the response should be standardized. By default, only variables on the right-hand side of the formula are standardized.
data: an object coercible to data.frame, containing the
variables in formula. Passed to, and used by, model.frame.
newdata: a data.frame returned by stdize, to be used
by the modified model.
...: for the formula method, additional arguments passed to
model.frame. For other methods, it is silently ignored.
object: a fitted model object or an expression being a call to
the modelling function.
which: a character string naming the arguments which should be modified.
This should be all arguments which are evaluated in the data
environment. Can also be TRUE to modify the expression as a
whole. The data argument is additionally replaced with that
passed to stdizeFit.
evaluate: if TRUE, the modified call is evaluated and the
fitted model object is returned.
quote: if TRUE, avoids evaluating object. Equivalent to
stdizeFit(quote(expr), ...). Defaults to NA, in which case
object is quoted if it is a call to a non-primitive function.
stdize returns a vector or object of the same dimensions as x,
where the values are centred and/or scaled. Transformation is carried out
column-wise in data.frames and matrices.
The returned value is compatible with that of scale in that the
numeric centring and scaling values used are stored in the attributes
"scaled:center" and "scaled:scale" (these can be NA if no
centring or scaling has been done).
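The attribute names are shared with base scale, so a short base R sketch can show how the stored centre and scale might be read back and reused. It calls scale rather than stdize, and the variable names are illustrative only.

```r
x <- c(2, 4, 6, 8)
z <- scale(x)                      # base R: centre by the mean, scale by the SD

ctr <- attr(z, "scaled:center")    # centring value that was subtracted
scl <- attr(z, "scaled:scale")     # scaling value that was divided by

# Apply the stored statistics to new values instead of recomputing them.
x_new <- c(5, 10)
z_new <- (x_new - ctr) / scl

# Back-transform the standardized values to the original scale.
x_back <- as.numeric(z) * scl + ctr
```

The same two attributes are what allow a standardized object to act as a reference when new data have to be put on the same scale.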
stdizeFit returns a modified, fitted model object that uses transformed
variables from newdata, or, if evaluate is FALSE, an
unevaluated call in which the variable names are replaced to point to the
transformed variables.
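As a rough illustration of the kind of rewriting described here, the base R sketch below renames a variable inside an unevaluated model call and points its data argument at transformed data. It is not the stdizeFit implementation; the data set (mtcars), the hand-made column z.wt and the object names are assumptions chosen for the example.

```r
# Data with a standardized copy of `wt`, created by hand; the name `z.wt`
# merely imitates the default "z." prefix.
z.mtcars <- transform(mtcars, z.wt = (wt - mean(wt)) / sd(wt))

# An unevaluated model call that uses the original variable name.
cl <- quote(lm(formula = mpg ~ wt + I(wt^2), data = mtcars))

# Replace the variable name in the formula and point `data` at the new data.
cl$formula <- do.call("substitute", list(cl$formula, list(wt = as.name("z.wt"))))
cl$data <- as.name("z.mtcars")

cl                # the rewritten call, still unevaluated
fit <- eval(cl)   # the model fitted on the standardized variable
```

Keeping the call unevaluated until the last step mirrors the choice offered by evaluate: the rewritten call can be inspected first and fitted later.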
stdize resembles scale, but uses special rules
for factors, similarly to standardize in package arm.
stdize differs from standardize in that it is used on
data rather than on the fitted model object. The scaled data should afterwards
be passed to the modelling function, instead of the original data.
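A minimal sketch of that workflow, assuming the MuMIn package (which provides stdize and stdizeFit) is installed; the block is skipped otherwise. The calls follow the signatures shown in the usage, while the choice of the CO2 data set and of lm as the modelling function is an assumption made for illustration.

```r
if (requireNamespace("MuMIn", quietly = TRUE)) {
  # Standardize the predictor `conc`, leaving the factor `Plant` unaltered.
  z.CO2 <- MuMIn::stdize(uptake ~ conc + Plant, data = CO2, omit.cols = "Plant")

  # Fit the model on the standardized data: stdizeFit modifies the call so
  # that it uses the transformed variables and `newdata`, then evaluates it.
  fit <- MuMIn::stdizeFit(lm(uptake ~ conc + I(conc^2)), newdata = z.CO2)

  # With evaluate = FALSE the modified call is returned instead of a fit.
  cl <- MuMIn::stdizeFit(lm(uptake ~ conc + I(conc^2)), newdata = z.CO2,
                         evaluate = FALSE)
}
```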
Unlike standardize, it applies special ‘binary’ scaling only to
two-level factors and logical variables, rather than to any variable with
two unique values.
Variables with only one unique value are left unchanged.
By default, stdize scales by dividing by the standard deviation, rather than
by twice the SD as standardize does. Scaling by SD is also applied
to uncentred values, unlike scale, which uses the
root-mean-square in that case.
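The difference between the scaling rules can be checked numerically in base R. The sketch below uses only scale and sd; the vector and the object names are made up for the example.

```r
x <- c(1, 2, 3, 4, 5)

# Root-mean-square as defined in the documentation of base scale().
rms <- sqrt(sum(x^2) / (length(x) - 1))

by_rms   <- as.numeric(scale(x, center = FALSE))  # base scale(): divides by RMS
by_sd    <- x / sd(x)                             # SD scaling of uncentred values
by_twosd <- x / (2 * sd(x))                       # scaling by twice the SD

isTRUE(all.equal(by_rms, x / rms))   # TRUE
```

For uncentred data the RMS and the SD generally differ, so by_rms and by_sd are not the same values.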
If center or scale are logical scalars or vectors of length equal
to the number of columns of x, centring is done by subtracting the
mean (if the center value corresponding to the column is TRUE), and scaling
is done by dividing the (centred) value by the standard deviation (if the
corresponding scale value is TRUE).
If center or scale are numeric vectors of length equal
to the number of columns of x (or numeric scalars for the vector methods),
then these values are used instead. Any NA in the numeric vector results in no
centring or scaling of the corresponding column.
Note that scale = 0 is equivalent to no scaling (i.e. scale = 1).
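These rules can be sketched for a single numeric column in base R. The helper std_col is hypothetical and is not the package's code; it only restates the logic described above.

```r
# Hypothetical helper (not MuMIn code) applying the rules to one column.
std_col <- function(v, center = TRUE, scale = TRUE) {
  # TRUE -> use the mean / SD; a number -> use it as given; otherwise NA.
  ctr <- if (isTRUE(center)) mean(v) else if (is.numeric(center)) center else NA
  scl <- if (isTRUE(scale)) sd(v) else if (is.numeric(scale)) scale else NA
  if (is.na(ctr)) ctr <- 0               # FALSE or NA: no centring
  if (is.na(scl) || scl == 0) scl <- 1   # FALSE, NA or 0: no scaling
  (v - ctr) / scl
}

v <- c(10, 20, 30)
std_col(v)                          # -1  0  1
std_col(v, center = 10, scale = 5)  #  0  2  4
std_col(v, center = NA, scale = 0)  # 10 20 30
std_col(v, center = FALSE)          #  1  2  3 (divided by the SD only)
```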
Binary variables, logical or factors with two levels, are converted to
numeric variables and transformed according to the argument binary,
unless center or scale are explicitly given.
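The options of the binary argument can be illustrated with a small base R sketch. The helper code_binary is hypothetical and reflects one reading of the option descriptions (levels coded as 0 and 1 first); it is not taken from the package.

```r
# Hypothetical helper (not MuMIn code): codes a logical or two-level factor
# as 0 / 1, then applies the transformation named by `binary`.
code_binary <- function(x, binary = c("center", "scale", "binary", "half", "omit")) {
  binary <- match.arg(binary)
  if (binary == "omit") return(x)                    # left unmodified
  v <- if (is.factor(x)) as.numeric(x) - 1 else as.numeric(x)
  switch(binary,
    center = v - mean(v),                            # subtract the mean
    scale  = (v - mean(v)) / sd(v),                  # centre, then divide by SD
    binary = v,                                      # plain 0 / 1 coding
    half   = v - 0.5)                                # -0.5 / 0.5 coding
}

f <- factor(c("a", "b", "b", "b"))
code_binary(f, "center")   # -0.75  0.25  0.25  0.25
code_binary(f, "half")     # -0.5  0.5  0.5  0.5
code_binary(f, "binary")   #  0  1  1  1
```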
Gelman, A. (2008) Scaling regression inputs by dividing by two standard deviations. Statistics in Medicine 27, 2865–2873.
Compare with scale and standardize or
rescale (the latter two in package arm).
For typical standardization, transforming the model coefficients may be easier; see std.coef.
See apply and sweep for arbitrary transformations of
columns in a data.frame.