
This is brain-dead standardization of all variables in the design matrix. It mimics the silly output of SPSS, which standardizes all regressors, even if they represent categorical variables.
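The procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not rockchalk's implementation. It uses simulated data, and `dat`, `fit` and `grp` are hypothetical names. The sketch builds the model matrix, drops the intercept column, and scales every remaining column, including the 0/1 dummy columns of a factor. It then scales the response and refits without an intercept, mirroring the `ys ~ -1 + ...` formula that `standardize()` prints.

```r
# Minimal sketch of standardizing every column of the design matrix.
# Simulated data; dat, fit, and grp are hypothetical names, and this is
# not rockchalk's own implementation.
set.seed(1234)
n <- 100
dat <- data.frame(
  x1 = rnorm(n, mean = 100, sd = 20),
  x2 = rnorm(n, mean = 200, sd = 30),
  grp = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
dat$y <- 3 + 0.2 * dat$x1 + 0.2 * dat$x2 + 2 * (dat$grp == "b") + rnorm(n, sd = 10)

fit <- lm(y ~ x1 + x2 + grp, data = dat)

# Model matrix without the intercept column. The factor grp has become the
# 0/1 dummy columns grpb and grpc, and they are rescaled like everything else.
X <- model.matrix(fit)[, -1, drop = FALSE]
Xs <- scale(X)  # subtract each column mean, divide by each column sd

# Standardize the response the same way.
ys <- as.vector(scale(model.response(model.frame(fit))))

# Every column is now centered, so the refit omits the intercept.
fit_s <- lm(ys ~ -1 + Xs)
coef(fit_s)
```

In this construction the coefficients on `grpb` and `grpc` measure the change in `ys` per standard deviation of a 0/1 indicator, which has no natural interpretation. That is the SPSS behavior the description objects to.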

Usage

standardize(model)

# S3 method for class 'lm'
standardize(model)

Arguments

model

a fitted lm object

Value

an lm object fitted with the standardized variables, i.e. a standardized regression object
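Each column of the design matrix and the response is only shifted and divided by its standard deviation. As a result, every slope in the returned fit is a rescaled version of the original slope. Write $b_j$ for the original estimate on column $x_j$ and $s$ for a sample standard deviation:

$$
b^{*}_{j} = b_{j}\,\frac{s_{x_j}}{s_{y}}
$$

For x1 in the m1 example below, $0.18380 \times 21.1921 / 12.8863 \approx 0.3023$, which matches the x1s estimate of 0.30226. The standard deviations used here are the ones meanCenter reports for the same data. The t values differ slightly, by a factor of $\sqrt{97/96}$ (compare 3.790 and 3.810 for x1). This happens because the refit omits the intercept (`ys ~ -1 + ...`) and so reports 97 rather than 96 residual degrees of freedom.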

See also

meanCenter, which centers or rescales only numeric variables
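One reason meanCenter is the better choice when a model has an interaction is that standardize() forms the product column first and then rescales it. The `x1:x2s` column in the m2 example below is therefore not the product of x1s and x2s. When x1 and x2 have means far from zero, as in that example, the rescaled product stays strongly correlated with x1 and x2 themselves. The sketch below compares the constructions on simulated data; all names are hypothetical.

```r
# Sketch with simulated data; names are hypothetical.
set.seed(42)
x1 <- rnorm(100, mean = 100, sd = 20)
x2 <- rnorm(100, mean = 200, sd = 30)

# standardize()-style: form x1 * x2, then standardize the product.
prod_then_scale <- as.vector(scale(x1 * x2))

# Standardize each input, then multiply.
scale_then_prod <- as.vector(scale(x1) * scale(x2))

# meanCenter()-style: center each input, then multiply.
# scale_then_prod is just this centered product divided by sd(x1) * sd(x2).
centered_prod <- (x1 - mean(x1)) * (x2 - mean(x2))

# The standardized raw product mostly tracks x1 itself;
# the centered product does not.
round(c(prod_then_scale = cor(prod_then_scale, x1),
        centered_prod = cor(centered_prod, x1)), 2)
```

The strong overlap between the rescaled product and its components is consistent with the much larger standard errors for x1s, x2s and `x1:x2s` in the m2s output compared with the meanCenter fit.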

Author

Paul Johnson pauljohn@ku.edu

Examples


library(rockchalk)
N <- 100
dat <- genCorrelatedData(N = N, means = c(100, 200), sds = c(20, 30), rho = 0.4, stde = 10)
dat$x3 <- rnorm(N, mean = 40, sd = 4)

m1 <- lm(y ~ x1 + x2 + x3, data = dat)
summary(m1)
#> 
#> Call:
#> lm(formula = y ~ x1 + x2 + x3, data = dat)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -20.8341  -6.6154   0.2133   6.4577  19.6368 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) -1.88309   11.19720  -0.168 0.866800    
#> x1           0.18380    0.04849   3.790 0.000263 ***
#> x2           0.22303    0.03302   6.754  1.1e-09 ***
#> x3          -0.08584    0.24056  -0.357 0.722010    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 8.9 on 96 degrees of freedom
#> Multiple R-squared:  0.5375,	Adjusted R-squared:  0.523 
#> F-statistic: 37.19 on 3 and 96 DF,  p-value: 4.913e-16
#> 

m1s <- standardize(m1)
summary(m1s)
#> All variables in the model matrix and the dependent variable
#> were centered. The centered variables have the letter "s" appended to their
#> non-centered counterparts, even constructed
#> variables like `x1:x2` and poly(x1,2). We agree, that's probably
#> ill-advised, but you asked for it by running standardize().
#> 
#> The rockchalk function meanCenter is a smarter option, probably. 
#> 
#> The summary statistics of the variables in the design matrix. 
#>     mean std.dev.
#> ys     0        1
#> x1s    0        1
#> x2s    0        1
#> x3s    0        1
#> 
#> Call:
#> lm(formula = ys ~ -1 + x1s + x2s + x3s, data = stddat)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -1.61676 -0.51337  0.01655  0.50113  1.52385 
#> 
#> Coefficients:
#>     Estimate Std. Error t value Pr(>|t|)    
#> x1s  0.30226    0.07933   3.810 0.000244 ***
#> x2s  0.53727    0.07914   6.789 9.04e-10 ***
#> x3s -0.02484    0.06925  -0.359 0.720619    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 0.6871 on 97 degrees of freedom
#> Multiple R-squared:  0.5375,	Adjusted R-squared:  0.5232 
#> F-statistic: 37.57 on 3 and 97 DF,  p-value: 3.358e-16
#> 



m2 <- lm(y ~ x1 * x2 + x3, data = dat)
summary(m2)
#> 
#> Call:
#> lm(formula = y ~ x1 * x2 + x3, data = dat)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -21.0026  -7.1682   0.4095   6.5995  19.1005 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 35.858757  26.471508   1.355    0.179
#> x1          -0.201687   0.250069  -0.807    0.422
#> x2           0.040366   0.120809   0.334    0.739
#> x3          -0.135943   0.240861  -0.564    0.574
#> x1:x2        0.001936   0.001233   1.571    0.120
#> 
#> Residual standard error: 8.832 on 95 degrees of freedom
#> Multiple R-squared:  0.5492,	Adjusted R-squared:  0.5302 
#> F-statistic: 28.93 on 4 and 95 DF,  p-value: 9.935e-16
#> 

m2s <- standardize(m2)
summary(m2s)
#> All variables in the model matrix and the dependent variable
#> were centered. The centered variables have the letter "s" appended to their
#> non-centered counterparts, even constructed
#> variables like `x1:x2` and poly(x1,2). We agree, that's probably
#> ill-advised, but you asked for it by running standardize().
#> 
#> The rockchalk function meanCenter is a smarter option, probably. 
#> 
#> The summary statistics of the variables in the design matrix. 
#>          mean std.dev.
#> ys          0        1
#> x1s         0        1
#> x2s         0        1
#> x3s         0        1
#> `x1:x2s`    0        1
#> 
#> Call:
#> lm(formula = ys ~ -1 + x1s + x2s + x3s + `x1:x2s`, data = stddat)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -1.62984 -0.55626  0.03178  0.51213  1.48224 
#> 
#> Coefficients:
#>          Estimate Std. Error t value Pr(>|t|)
#> x1s      -0.33168    0.40910  -0.811    0.420
#> x2s       0.09724    0.28951   0.336    0.738
#> x3s      -0.03934    0.06933  -0.567    0.572
#> `x1:x2s`  0.93893    0.59459   1.579    0.118
#> 
#> Residual standard error: 0.6818 on 96 degrees of freedom
#> Multiple R-squared:  0.5492,	Adjusted R-squared:  0.5304 
#> F-statistic: 29.24 on 4 and 96 DF,  p-value: 6.738e-16
#> 

m2c <- meanCenter(m2)
summary(m2c)
#> These variables were mean-centered before any transformations were made on the design matrix.
#> [1] "x1c" "x2c"
#> The centers and scale factors were 
#>            x1c      x2c
#> mean  100.2225 200.6972
#> scale   1.0000   1.0000
#> The summary statistics of the variables in the design matrix (after centering). 
#>             mean std.dev.
#> y        57.8482  12.8863
#> x1c       0.0000  21.1921
#> x2c       0.0000  31.0429
#> x3       40.1893   3.7289
#> x1c:x2c 318.0867 753.2115
#> 
#> The following results were produced from: 
#> meanCenter.default(model = m2)
#> 
#> Call:
#> lm(formula = y ~ x1c * x2c + x3, data = stddat)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -21.0026  -7.1682   0.4095   6.5995  19.1005 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 62.695755   9.676328   6.479 4.07e-09 ***
#> x1c          0.186941   0.048167   3.881 0.000192 ***
#> x2c          0.234436   0.033568   6.984 3.89e-10 ***
#> x3          -0.135943   0.240861  -0.564 0.573807    
#> x1c:x2c      0.001936   0.001233   1.571 0.119537    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 8.832 on 95 degrees of freedom
#> Multiple R-squared:  0.5492,	Adjusted R-squared:  0.5302 
#> F-statistic: 28.93 on 4 and 95 DF,  p-value: 9.935e-16
#>
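The meanCenter fit above reports the same interaction estimate (0.001936), residuals and R-squared as m2, yet different x1c and x2c slopes. Centering x1 and x2 before forming the product is only a reparameterization. The interaction slope is unchanged, and each main-effect slope becomes the slope at the mean of the other variable: $b_1 + b_3\,\bar{x}_2$ and $b_2 + b_3\,\bar{x}_1$. The sketch below checks this on simulated data. The names `d`, `raw` and `ctr` are hypothetical, and the centering is done by hand rather than with meanCenter.

```r
# Sketch with simulated data; d, raw, and ctr are hypothetical names.
set.seed(7)
n <- 100
d <- data.frame(x1 = rnorm(n, 100, 20), x2 = rnorm(n, 200, 30), x3 = rnorm(n, 40, 4))
d$y <- 35 - 0.2 * d$x1 + 0.04 * d$x2 + 0.002 * d$x1 * d$x2 + rnorm(n, sd = 9)

raw <- lm(y ~ x1 * x2 + x3, data = d)

# Center the inputs by hand, then let the formula build the product.
d$x1c <- d$x1 - mean(d$x1)
d$x2c <- d$x2 - mean(d$x2)
ctr <- lm(y ~ x1c * x2c + x3, data = d)

# Main-effect slopes implied by the raw fit at the means of the other input.
b <- coef(raw)
implied <- c(b["x1"] + b["x1:x2"] * mean(d$x2),
             b["x2"] + b["x1:x2"] * mean(d$x1))
cbind(implied = unname(implied), centered = unname(coef(ctr)[c("x1c", "x2c")]))
```

With the printed m2 numbers, $-0.201687 + 0.001936 \times 200.6972 \approx 0.1869$. This agrees with the x1c estimate of 0.186941 up to rounding of the printed coefficients.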