Estimate standardized regression coefficients for all variables
This is brain-dead standardization of all variables in the design matrix. It mimics the silly output of SPSS, which standardizes all regressors, even if they represent categorical variables.
See also
meanCenter, which will center or re-scale only numeric variables
Author
Paul Johnson pauljohn@ku.edu
Examples
library(rockchalk)
N <- 100
dat <- genCorrelatedData(N = N, means = c(100,200), sds = c(20,30), rho = 0.4, stde = 10)
dat$x3 <- rnorm(N, mean = 40, sd = 4)
m1 <- lm(y ~ x1 + x2 + x3, data = dat)
summary(m1)
#>
#> Call:
#> lm(formula = y ~ x1 + x2 + x3, data = dat)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -20.8341  -6.6154   0.2133   6.4577  19.6368
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -1.88309   11.19720  -0.168 0.866800
#> x1           0.18380    0.04849   3.790 0.000263 ***
#> x2           0.22303    0.03302   6.754  1.1e-09 ***
#> x3          -0.08584    0.24056  -0.357 0.722010
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 8.9 on 96 degrees of freedom
#> Multiple R-squared: 0.5375, Adjusted R-squared: 0.523
#> F-statistic: 37.19 on 3 and 96 DF, p-value: 4.913e-16
#>
m1s <- standardize(m1)
summary(m1s)
#> All variables in the model matrix and the dependent variable
#> were centered. The centered variables have the letter "s" appended to their
#> non-centered counterparts, even constructed
#> variables like `x1:x2` and poly(x1,2). We agree, that's probably
#> ill-advised, but you asked for it by running standardize().
#>
#> The rockchalk function meanCenter is a smarter option, probably.
#>
#> The summary statistics of the variables in the design matrix.
#>     mean std.dev.
#> ys     0        1
#> x1s    0        1
#> x2s    0        1
#> x3s    0        1
#>
#> Call:
#> lm(formula = ys ~ -1 + x1s + x2s + x3s, data = stddat)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -1.61676 -0.51337  0.01655  0.50113  1.52385
#>
#> Coefficients:
#>     Estimate Std. Error t value Pr(>|t|)
#> x1s  0.30226    0.07933   3.810 0.000244 ***
#> x2s  0.53727    0.07914   6.789 9.04e-10 ***
#> x3s -0.02484    0.06925  -0.359 0.720619
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 0.6871 on 97 degrees of freedom
#> Multiple R-squared: 0.5375, Adjusted R-squared: 0.5232
#> F-statistic: 37.57 on 3 and 97 DF, p-value: 3.358e-16
#>
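## A hand check of what the standardized slopes mean. This is only a sketch in
## base R, not rockchalk's internal code. It assumes "standardized" means each
## column is centered and divided by its sample standard deviation, which is
## what scale() does. The names chk, fit_raw, fit_std and beta_hand are
## hypothetical, introduced for this illustration only.
set.seed(1234)
chk <- data.frame(x1 = rnorm(100, mean = 100, sd = 20),
                  x2 = rnorm(100, mean = 200, sd = 30))
chk$y <- 0.2 * chk$x1 + 0.2 * chk$x2 + rnorm(100, sd = 10)
## Fit on the raw scale, then on columns rescaled to mean 0 and sd 1.
fit_raw <- lm(y ~ x1 + x2, data = chk)
fit_std <- lm(y ~ -1 + x1 + x2, data = as.data.frame(scale(chk)))
## Each raw slope times sd(x)/sd(y) should reproduce the standardized slope.
beta_hand <- coef(fit_raw)[c("x1", "x2")] * sapply(chk[c("x1", "x2")], sd) / sd(chk$y)
all.equal(beta_hand, coef(fit_std))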
m2 <- lm(y ~ x1 * x2 + x3, data = dat)
summary(m2)
#>
#> Call:
#> lm(formula = y ~ x1 * x2 + x3, data = dat)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -21.0026  -7.1682   0.4095   6.5995  19.1005
#>
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 35.858757  26.471508   1.355    0.179
#> x1          -0.201687   0.250069  -0.807    0.422
#> x2           0.040366   0.120809   0.334    0.739
#> x3          -0.135943   0.240861  -0.564    0.574
#> x1:x2        0.001936   0.001233   1.571    0.120
#>
#> Residual standard error: 8.832 on 95 degrees of freedom
#> Multiple R-squared: 0.5492, Adjusted R-squared: 0.5302
#> F-statistic: 28.93 on 4 and 95 DF, p-value: 9.935e-16
#>
m2s <- standardize(m2)
summary(m2s)
#> All variables in the model matrix and the dependent variable
#> were centered. The centered variables have the letter "s" appended to their
#> non-centered counterparts, even constructed
#> variables like `x1:x2` and poly(x1,2). We agree, that's probably
#> ill-advised, but you asked for it by running standardize().
#>
#> The rockchalk function meanCenter is a smarter option, probably.
#>
#> The summary statistics of the variables in the design matrix.
#>          mean std.dev.
#> ys          0        1
#> x1s         0        1
#> x2s         0        1
#> x3s         0        1
#> `x1:x2s`    0        1
#>
#> Call:
#> lm(formula = ys ~ -1 + x1s + x2s + x3s + `x1:x2s`, data = stddat)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -1.62984 -0.55626  0.03178  0.51213  1.48224
#>
#> Coefficients:
#>          Estimate Std. Error t value Pr(>|t|)
#> x1s      -0.33168    0.40910  -0.811    0.420
#> x2s       0.09724    0.28951   0.336    0.738
#> x3s      -0.03934    0.06933  -0.567    0.572
#> `x1:x2s`  0.93893    0.59459   1.579    0.118
#>
#> Residual standard error: 0.6818 on 96 degrees of freedom
#> Multiple R-squared: 0.5492, Adjusted R-squared: 0.5304
#> F-statistic: 29.24 on 4 and 96 DF, p-value: 6.738e-16
#>
m2c <- meanCenter(m2)
summary(m2c)
#> These variables were mean-centered before any transformations were made on the design matrix.
#> [1] "x1c" "x2c"
#> The centers and scale factors were
#>            x1c      x2c
#> mean  100.2225 200.6972
#> scale   1.0000   1.0000
#> The summary statistics of the variables in the design matrix (after centering).
#>             mean std.dev.
#> y        57.8482  12.8863
#> x1c       0.0000  21.1921
#> x2c       0.0000  31.0429
#> x3       40.1893   3.7289
#> x1c:x2c 318.0867 753.2115
#>
#> The following results were produced from:
#> meanCenter.default(model = m2)
#>
#> Call:
#> lm(formula = y ~ x1c * x2c + x3, data = stddat)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -21.0026  -7.1682   0.4095   6.5995  19.1005
#>
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 62.695755   9.676328   6.479 4.07e-09 ***
#> x1c          0.186941   0.048167   3.881 0.000192 ***
#> x2c          0.234436   0.033568   6.984 3.89e-10 ***
#> x3          -0.135943   0.240861  -0.564 0.573807
#> x1c:x2c      0.001936   0.001233   1.571 0.119537
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 8.832 on 95 degrees of freedom
#> Multiple R-squared: 0.5492, Adjusted R-squared: 0.5302
#> F-statistic: 28.93 on 4 and 95 DF, p-value: 9.935e-16
#>
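## Why the x1c and x2c slopes above differ from the x1 and x2 slopes in m2,
## while the interaction coefficient, the residuals and R-squared do not.
## This is a sketch in base R, not meanCenter's own code. It assumes centering
## means subtracting the sample mean from x1 and x2 before the product term is
## formed. The names cdat, fit_u and fit_c are hypothetical.
set.seed(4321)
cdat <- data.frame(x1 = rnorm(100, mean = 100, sd = 20),
                   x2 = rnorm(100, mean = 200, sd = 30))
cdat$y <- 0.2 * cdat$x1 + 0.2 * cdat$x2 + 0.002 * cdat$x1 * cdat$x2 +
    rnorm(100, sd = 10)
fit_u <- lm(y ~ x1 * x2, data = cdat)
## Center the two interacting predictors, then refit.
cdat$x1c <- cdat$x1 - mean(cdat$x1)
cdat$x2c <- cdat$x2 - mean(cdat$x2)
fit_c <- lm(y ~ x1c * x2c, data = cdat)
## The slope of x1c is the uncentered slope of x1 evaluated at mean(x2).
all.equal(unname(coef(fit_c)["x1c"]),
          unname(coef(fit_u)["x1"] + coef(fit_u)["x1:x2"] * mean(cdat$x2)))
## The interaction coefficient is the same in both fits.
all.equal(unname(coef(fit_c)["x1c:x2c"]), unname(coef(fit_u)["x1:x2"]))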