Skip to contents

meanCenter selectively centers or standarizes variables in a regression model.

Usage

meanCenter(
  model,
  centerOnlyInteractors = TRUE,
  centerDV = FALSE,
  standardize = FALSE,
  terms = NULL
)

# Default S3 method
meanCenter(
  model,
  centerOnlyInteractors = TRUE,
  centerDV = FALSE,
  standardize = FALSE,
  terms = NULL
)

Arguments

model

a fitted regression model (presumably from lm)

centerOnlyInteractors

Default TRUE. If FALSE, all numeric predictors in the regression data frame are centered before the regression is conducted.

centerDV

Default FALSE. Should the dependent variable be centered? Do not set this option to TRUE unless the dependent variable is a numeric variable. Otherwise, it is an error.

standardize

Default FALSE. Instead of simply mean-centering the variables, should they also be "standardized" by first mean-centering and then dividing by the estimated standard deviation.

terms

Optional. A vector of variable names to be centered. Supplying this argument will stop meanCenter from searching for interaction terms that might need to be centered.

Value

A regression model of the same type as the input model, with attributes representing the names of the centered variables.

Details

Works with "lm" class objects, objects estimated by glm(). This centers some or all of the the predictors and then re-fits the original model with the new variables. This is a convenience to researchers who are often urged to center their predictors. This is sometimes suggested as a way to ameliorate multi-collinearity in models that include interaction terms (Aiken and West, 1991; Cohen, et al 2002). Mean-centering may enhance interpretation of the regression intercept, but it actually does not help with multicollinearity. (Echambadi and Hess, 2007). This function facilitates comparison of mean-centered models with others by calculating centered variables. The defaults will cause a regression's numeric interactive variables to be mean centered. Variations on the arguments are discussed in details.

Suppose the user's formula that fits the original model is m1 <- lm(y ~ x1*x2 + x3 + x4, data = dat). The fitted model will include estimates for predictors x1, x2, x1:x2, x3 and x4. By default, meanCenter(m1) scans the output to see if there are interaction terms of the form x1:x2. If so, then x1 and x2 are replaced by centered versions (m1-mean(m1)) and (m2-mean(m2)). The model is re-estimated with those new variables. model (the main effect and the interaction). The resulting thing is "just another regression model", which can be analyzed or plotted like any R regression object.

The user can claim control over which variables are centered in several ways. Most directly, by specifying a vector of variable names, the user can claim direct control. For example, the argument terms=c("x1","x2","x3") would cause 3 predictors to be centered. If one wants all predictors to be centered, the argument centerOnlyInteractors should be set to FALSE. Please note, this WILL NOT center factor variables. But it will find all numeric predictors and center them.

The dependent variable will not be centered, unless the user explicitly requests it by setting centerDV = TRUE.

As an additional convenience to the user, the argument standardize = TRUE can be used. This will divide each centered variable by its observed standard deviation. For people who like standardized regression, I suggest this is a better approach than the standardize function (which is brain-dead in the style of SPSS). meanCenter with standardize = TRUE will only try to standardize the numeric predictors.

To be completely clear, I believe mean-centering is not helpful with the multicollinearity problem. It doesn't help, it doesn't hurt. Only a misunderstanding leads its proponents to claim otherwise. This is emphasized in the vignette "rockchalk" that is distributed with this package.

References

Aiken, L. S. and West, S.G. (1991). Multiple Regression: Testing and Interpreting Interactions. Newbury Park, Calif: Sage Publications.

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2002). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences (Third.). Routledge Academic.

Echambadi, R., and Hess, J. D. (2007). Mean-Centering Does Not Alleviate Collinearity Problems in Moderated Multiple Regression Models. Marketing Science, 26(3), 438-445.

Author

Paul E. Johnson pauljohn@ku.edu

Examples


library(rockchalk)
N <- 100
dat <- genCorrelatedData(N = N, means = c(100, 200), sds = c(20, 30),
                         rho = 0.4, stde = 10)
dat$x3 <- rnorm(100, m = 40, s = 4)

m1 <- lm(y ~ x1 * x2 + x3, data = dat)
summary(m1)
#> 
#> Call:
#> lm(formula = y ~ x1 * x2 + x3, data = dat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -23.679  -4.835   0.081   6.131  22.732 
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)  
#> (Intercept) -17.453825  40.074264  -0.436   0.6642  
#> x1            0.581439   0.356429   1.631   0.1061  
#> x2            0.364921   0.177370   2.057   0.0424 *
#> x3           -0.482429   0.265261  -1.819   0.0721 .
#> x1:x2        -0.001659   0.001718  -0.966   0.3366  
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 10.16 on 95 degrees of freedom
#> Multiple R-squared:  0.4488,	Adjusted R-squared:  0.4256 
#> F-statistic: 19.34 on 4 and 95 DF,  p-value: 1.152e-11
#> 
mcDiagnose(m1)
#> The following auxiliary models are being estimated and returned in a list:
#> x1 ~ x2 + x3 + `x1:x2`
#> x2 ~ x1 + x3 + `x1:x2`
#> x3 ~ x1 + x2 + `x1:x2`
#> `x1:x2` ~ x1 + x2 + x3
#> 
#> R_j Squares of auxiliary models
#>         x1         x2         x3      x1:x2 
#> 0.98096380 0.95892260 0.08076608 0.98996289 
#> The Corresponding VIF, 1/(1-R_j^2)
#>        x1        x2        x3     x1:x2 
#> 52.531503 24.344285  1.087862 99.630235 
#> Bivariate Pearson Correlations for design matrix 
#>         x1    x2    x3 x1:x2
#> x1    1.00  0.34  0.08  0.89
#> x2    0.34  1.00 -0.03  0.73
#> x3    0.08 -0.03  1.00  0.07
#> x1:x2 0.89  0.73  0.07  1.00

m1c <- meanCenter(m1)
summary(m1c)
#> These variables were mean-centered before any transformations were made on the design matrix.
#> [1] "x1c" "x2c"
#> The centers and scale factors were 
#>            x1c      x2c
#> mean  99.29911 199.0167
#> scale  1.00000   1.0000
#> The summary statistics of the variables in the design matrix (after centering). 
#>             mean std.dev.
#> y        60.5215  13.4035
#> x1c       0.0000  20.7613
#> x2c       0.0000  28.4011
#> x3       39.9228   4.0145
#> x1c:x2c 200.4222 636.5567
#> 
#> The following results were produced from: 
#> meanCenter.default(model = m1)
#> 
#> Call:
#> lm(formula = y ~ x1c * x2c + x3, data = stddat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -23.679  -4.835   0.081   6.131  22.732 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 80.114057  10.552238   7.592 2.16e-11 ***
#> x1c          0.251185   0.053654   4.682 9.45e-06 ***
#> x2c          0.200142   0.038429   5.208 1.10e-06 ***
#> x3          -0.482429   0.265261  -1.819   0.0721 .  
#> x1c:x2c     -0.001659   0.001718  -0.966   0.3366    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 10.16 on 95 degrees of freedom
#> Multiple R-squared:  0.4488,	Adjusted R-squared:  0.4256 
#> F-statistic: 19.34 on 4 and 95 DF,  p-value: 1.152e-11
#> 
mcDiagnose(m1c)
#> The following auxiliary models are being estimated and returned in a list:
#> x1c ~ x2c + x3 + `x1c:x2c`
#> x2c ~ x1c + x3 + `x1c:x2c`
#> x3 ~ x1c + x2c + `x1c:x2c`
#> `x1c:x2c` ~ x1c + x2c + x3
#> 
#> R_j Squares of auxiliary models
#>        x1c        x2c         x3    x1c:x2c 
#> 0.15991095 0.12494002 0.08076608 0.12871144 
#> The Corresponding VIF, 1/(1-R_j^2)
#>      x1c      x2c       x3  x1c:x2c 
#> 1.190350 1.142779 1.087862 1.147725 
#> Bivariate Pearson Correlations for design matrix 
#>          x1c   x2c    x3 x1c:x2c
#> x1c     1.00  0.34  0.08    0.24
#> x2c     0.34  1.00 -0.03    0.13
#> x3      0.08 -0.03  1.00    0.27
#> x1c:x2c 0.24  0.13  0.27    1.00

m2 <- lm(y ~ x1 * x2 + x3, data = dat)
summary(m2)
#> 
#> Call:
#> lm(formula = y ~ x1 * x2 + x3, data = dat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -23.679  -4.835   0.081   6.131  22.732 
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)  
#> (Intercept) -17.453825  40.074264  -0.436   0.6642  
#> x1            0.581439   0.356429   1.631   0.1061  
#> x2            0.364921   0.177370   2.057   0.0424 *
#> x3           -0.482429   0.265261  -1.819   0.0721 .
#> x1:x2        -0.001659   0.001718  -0.966   0.3366  
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 10.16 on 95 degrees of freedom
#> Multiple R-squared:  0.4488,	Adjusted R-squared:  0.4256 
#> F-statistic: 19.34 on 4 and 95 DF,  p-value: 1.152e-11
#> 
mcDiagnose(m2)
#> The following auxiliary models are being estimated and returned in a list:
#> x1 ~ x2 + x3 + `x1:x2`
#> x2 ~ x1 + x3 + `x1:x2`
#> x3 ~ x1 + x2 + `x1:x2`
#> `x1:x2` ~ x1 + x2 + x3
#> 
#> R_j Squares of auxiliary models
#>         x1         x2         x3      x1:x2 
#> 0.98096380 0.95892260 0.08076608 0.98996289 
#> The Corresponding VIF, 1/(1-R_j^2)
#>        x1        x2        x3     x1:x2 
#> 52.531503 24.344285  1.087862 99.630235 
#> Bivariate Pearson Correlations for design matrix 
#>         x1    x2    x3 x1:x2
#> x1    1.00  0.34  0.08  0.89
#> x2    0.34  1.00 -0.03  0.73
#> x3    0.08 -0.03  1.00  0.07
#> x1:x2 0.89  0.73  0.07  1.00

m2c <- meanCenter(m2, standardize = TRUE)
summary(m2c)
#> These variables were mean-centered before any transformations were made on the design matrix.
#> [1] "x1cs" "x2cs"
#> The centers and scale factors were 
#>           x1cs      x2cs
#> mean  99.29911 199.01673
#> scale 20.76130  28.40112
#> The summary statistics of the variables in the design matrix (after centering). 
#>               mean std.dev.
#> y         60.52153 13.40353
#> x1cs       0.00000  1.00000
#> x2cs       0.00000  1.00000
#> x3        39.92284  4.01449
#> x1cs:x2cs  0.33990  1.07956
#> 
#> The following results were produced from: 
#> meanCenter.default(model = m2, standardize = TRUE)
#> 
#> Call:
#> lm(formula = y ~ x1cs * x2cs + x3, data = stddat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -23.679  -4.835   0.081   6.131  22.732 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  80.1141    10.5522   7.592 2.16e-11 ***
#> x1cs          5.2149     1.1139   4.682 9.45e-06 ***
#> x2cs          5.6842     1.0914   5.208 1.10e-06 ***
#> x3           -0.4824     0.2653  -1.819   0.0721 .  
#> x1cs:x2cs    -0.9785     1.0132  -0.966   0.3366    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 10.16 on 95 degrees of freedom
#> Multiple R-squared:  0.4488,	Adjusted R-squared:  0.4256 
#> F-statistic: 19.34 on 4 and 95 DF,  p-value: 1.152e-11
#> 
mcDiagnose(m2c)
#> The following auxiliary models are being estimated and returned in a list:
#> x1cs ~ x2cs + x3 + `x1cs:x2cs`
#> x2cs ~ x1cs + x3 + `x1cs:x2cs`
#> x3 ~ x1cs + x2cs + `x1cs:x2cs`
#> `x1cs:x2cs` ~ x1cs + x2cs + x3
#> 
#> R_j Squares of auxiliary models
#>       x1cs       x2cs         x3  x1cs:x2cs 
#> 0.15991095 0.12494002 0.08076608 0.12871144 
#> The Corresponding VIF, 1/(1-R_j^2)
#>      x1cs      x2cs        x3 x1cs:x2cs 
#>  1.190350  1.142779  1.087862  1.147725 
#> Bivariate Pearson Correlations for design matrix 
#>           x1cs  x2cs    x3 x1cs:x2cs
#> x1cs      1.00  0.34  0.08      0.24
#> x2cs      0.34  1.00 -0.03      0.13
#> x3        0.08 -0.03  1.00      0.27
#> x1cs:x2cs 0.24  0.13  0.27      1.00

m2c2 <- meanCenter(m2, centerOnlyInteractors = FALSE)
summary(m2c2)
#> These variables were mean-centered before any transformations were made on the design matrix.
#> [1] "x1c" "x2c" "x3c"
#> The centers and scale factors were 
#>            x1c      x2c      x3c
#> mean  99.29911 199.0167 39.92284
#> scale  1.00000   1.0000  1.00000
#> The summary statistics of the variables in the design matrix (after centering). 
#>             mean std.dev.
#> y        60.5215  13.4035
#> x1c       0.0000  20.7613
#> x2c       0.0000  28.4011
#> x3c       0.0000   4.0145
#> x1c:x2c 200.4222 636.5567
#> 
#> The following results were produced from: 
#> meanCenter.default(model = m2, centerOnlyInteractors = FALSE)
#> 
#> Call:
#> lm(formula = y ~ x1c * x2c + x3c, data = stddat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -23.679  -4.835   0.081   6.131  22.732 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 60.854115   1.072650  56.732  < 2e-16 ***
#> x1c          0.251185   0.053654   4.682 9.45e-06 ***
#> x2c          0.200142   0.038429   5.208 1.10e-06 ***
#> x3c         -0.482429   0.265261  -1.819   0.0721 .  
#> x1c:x2c     -0.001659   0.001718  -0.966   0.3366    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 10.16 on 95 degrees of freedom
#> Multiple R-squared:  0.4488,	Adjusted R-squared:  0.4256 
#> F-statistic: 19.34 on 4 and 95 DF,  p-value: 1.152e-11
#> 

m2c3 <- meanCenter(m2, centerOnlyInteractors = FALSE, centerDV = TRUE)
summary(m2c3)
#> These variables were mean-centered before any transformations were made on the design matrix.
#> [1] "yc"  "x1c" "x2c" "x3c"
#> The centers and scale factors were 
#>             yc      x1c      x2c      x3c
#> mean  60.52153 99.29911 199.0167 39.92284
#> scale  1.00000  1.00000   1.0000  1.00000
#> The summary statistics of the variables in the design matrix (after centering). 
#>             mean std.dev.
#> yc        0.0000  13.4035
#> x1c       0.0000  20.7613
#> x2c       0.0000  28.4011
#> x3c       0.0000   4.0145
#> x1c:x2c 200.4222 636.5567
#> 
#> The following results were produced from: 
#> meanCenter.default(model = m2, centerOnlyInteractors = FALSE, 
#>     centerDV = TRUE)
#> 
#> Call:
#> lm(formula = yc ~ x1c * x2c + x3c, data = stddat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -23.679  -4.835   0.081   6.131  22.732 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  0.332586   1.072650   0.310   0.7572    
#> x1c          0.251185   0.053654   4.682 9.45e-06 ***
#> x2c          0.200142   0.038429   5.208 1.10e-06 ***
#> x3c         -0.482429   0.265261  -1.819   0.0721 .  
#> x1c:x2c     -0.001659   0.001718  -0.966   0.3366    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 10.16 on 95 degrees of freedom
#> Multiple R-squared:  0.4488,	Adjusted R-squared:  0.4256 
#> F-statistic: 19.34 on 4 and 95 DF,  p-value: 1.152e-11
#> 

dat <- genCorrelatedData(N = N, means = c(100, 200), sds = c(20, 30),
                         rho = 0.4, stde = 10)
dat$x3 <- rnorm(100, m = 40, s = 4)
dat$x3 <- gl(4, 25, labels = c("none", "some", "much", "total"))

m3 <- lm(y ~ x1 * x2 + x3, data = dat)
summary(m3)
#> 
#> Call:
#> lm(formula = y ~ x1 * x2 + x3, data = dat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -26.834  -8.511   1.840   8.121  26.512 
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -1.585e+01  4.824e+01  -0.329    0.743
#> x1           2.899e-01  4.668e-01   0.621    0.536
#> x2           2.763e-01  2.509e-01   1.101    0.274
#> x3some      -1.648e+00  3.378e+00  -0.488    0.627
#> x3much      -2.005e+00  3.462e+00  -0.579    0.564
#> x3total     -9.792e-01  3.466e+00  -0.282    0.778
#> x1:x2       -3.786e-04  2.366e-03  -0.160    0.873
#> 
#> Residual standard error: 11.85 on 93 degrees of freedom
#> Multiple R-squared:  0.4227,	Adjusted R-squared:  0.3854 
#> F-statistic: 11.35 on 6 and 93 DF,  p-value: 1.754e-09
#> 
## visualize, for fun
plotPlane(m3, "x1", "x2")


m3c1 <- meanCenter(m3)
summary(m3c1)
#> These variables were mean-centered before any transformations were made on the design matrix.
#> [1] "x1c" "x2c"
#> The centers and scale factors were 
#>            x1c      x2c
#> mean  102.2981 200.3176
#> scale   1.0000   1.0000
#> The summary statistics of the variables in the design matrix (after centering). 
#>            mean std.dev.
#> y        60.148  15.1175
#> x1c       0.000  18.6811
#> x2c       0.000  31.6105
#> x3some    0.250   0.4352
#> x3much    0.250   0.4352
#> x3total   0.250   0.4352
#> x1c:x2c 227.762 530.1327
#> 
#> The following results were produced from: 
#> meanCenter.default(model = m3)
#> 
#> Call:
#> lm(formula = y ~ x1c * x2c + x3, data = stddat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -26.834  -8.511   1.840   8.121  26.512 
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 61.3923191  2.5407152  24.163  < 2e-16 ***
#> x1c          0.2141069  0.0725932   2.949  0.00403 ** 
#> x2c          0.2375320  0.0415086   5.722 1.27e-07 ***
#> x3some      -1.6482791  3.3781023  -0.488  0.62675    
#> x3much      -2.0048704  3.4617949  -0.579  0.56389    
#> x3total     -0.9791906  3.4663022  -0.282  0.77820    
#> x1c:x2c     -0.0003786  0.0023661  -0.160  0.87322    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 11.85 on 93 degrees of freedom
#> Multiple R-squared:  0.4227,	Adjusted R-squared:  0.3854 
#> F-statistic: 11.35 on 6 and 93 DF,  p-value: 1.754e-09
#> 

## Not exactly the same as a "standardized" regression because the
## interactive variables are centered in the model frame,
## and the term "x1:x2" is never centered again.
m3c2 <- meanCenter(m3, centerDV = TRUE,
                   centerOnlyInteractors = FALSE, standardize = TRUE)
summary(m3c2)
#> These variables were mean-centered before any transformations were made on the design matrix.
#> [1] "ycs"  "x1cs" "x2cs"
#> The centers and scale factors were 
#>            ycs      x1cs      x2cs
#> mean  60.14800 102.29814 200.31763
#> scale 15.11747  18.68108  31.61046
#> The summary statistics of the variables in the design matrix (after centering). 
#>               mean  std.dev.
#> ycs       0.000000 1.0000000
#> x1cs      0.000000 1.0000000
#> x2cs      0.000000 1.0000000
#> x3some    0.250000 0.4351941
#> x3much    0.250000 0.4351941
#> x3total   0.250000 0.4351941
#> x1cs:x2cs 0.385699 0.8977427
#> 
#> The following results were produced from: 
#> meanCenter.default(model = m3, centerOnlyInteractors = FALSE, 
#>     centerDV = TRUE, standardize = TRUE)
#> 
#> Call:
#> lm(formula = ycs ~ x1cs * x2cs + x3, data = stddat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -1.7750 -0.5630  0.1217  0.5372  1.7537 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  0.08231    0.16806   0.490  0.62546    
#> x1cs         0.26458    0.08971   2.949  0.00403 ** 
#> x2cs         0.49668    0.08679   5.722 1.27e-07 ***
#> x3some      -0.10903    0.22346  -0.488  0.62675    
#> x3much      -0.13262    0.22899  -0.579  0.56389    
#> x3total     -0.06477    0.22929  -0.282  0.77820    
#> x1cs:x2cs   -0.01479    0.09242  -0.160  0.87322    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 0.7839 on 93 degrees of freedom
#> Multiple R-squared:  0.4227,	Adjusted R-squared:  0.3854 
#> F-statistic: 11.35 on 6 and 93 DF,  p-value: 1.754e-09
#> 

m3st <- standardize(m3)
summary(m3st)
#> All variables in the model matrix and the dependent variable
#> were centered. The centered variables have the letter "s" appended to their
#> non-centered counterparts, even constructed
#> variables like `x1:x2` and poly(x1,2). We agree, that's probably
#> ill-advised, but you asked for it by running standardize().
#> 
#> The rockchalk function meanCenter is a smarter option, probably. 
#> 
#> The summary statistics of the variables in the design matrix. 
#>          mean std.dev.
#> ys          0        1
#> x1s         0        1
#> x2s         0        1
#> x3somes     0        1
#> x3muchs     0        1
#> x3totals    0        1
#> `x1:x2s`    0        1
#> 
#> Call:
#> lm(formula = ys ~ -1 + x1s + x2s + x3somes + x3muchs + x3totals + 
#>     `x1:x2s`, data = stddat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -1.7750 -0.5630  0.1217  0.5372  1.7537 
#> 
#> Coefficients:
#>          Estimate Std. Error t value Pr(>|t|)
#> x1s       0.35830    0.57380   0.624    0.534
#> x2s       0.57766    0.52181   1.107    0.271
#> x3somes  -0.04745    0.09673  -0.491    0.625
#> x3muchs  -0.05772    0.09912  -0.582    0.562
#> x3totals -0.02819    0.09925  -0.284    0.777
#> `x1:x2s` -0.14571    0.90575  -0.161    0.873
#> 
#> Residual standard error: 0.7798 on 94 degrees of freedom
#> Multiple R-squared:  0.4227,	Adjusted R-squared:  0.3858 
#> F-statistic: 11.47 on 6 and 94 DF,  p-value: 1.36e-09
#> 

## Make a bigger dataset to see effects better
N <- 500
dat <- genCorrelatedData(N = N, means = c(200,200), sds = c(60,30),
                         rho = 0.2, stde = 10)
dat$x3 <- rnorm(100, m = 40, s = 4)
dat$x3 <- gl(4, 25, labels = c("none", "some", "much", "total"))
dat$y2 <- with(dat,
               0.4 - 0.15 * x1 + 0.04 * x1^2 -
               drop(contrasts(dat$x3)[dat$x3, ] %*% c(-1.9, 0, 5.1))  +
               1000* rnorm(nrow(dat)))
dat$y2 <- drop(dat$y2)

m4literal <- lm(y2 ~ x1 + I(x1*x1) + x2 + x3, data = dat)
summary(m4literal)
#> 
#> Call:
#> lm(formula = y2 ~ x1 + I(x1 * x1) + x2 + x3, data = dat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -2644.2  -721.1   -14.8   596.9  3287.6 
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  9.412e+02  4.318e+02   2.180  0.02976 *  
#> x1          -8.940e+00  3.302e+00  -2.707  0.00702 ** 
#> I(x1 * x1)   6.270e-02  7.955e-03   7.882 2.09e-14 ***
#> x2          -1.080e+00  1.527e+00  -0.707  0.47974    
#> x3some      -1.120e+02  1.243e+02  -0.901  0.36781    
#> x3much       5.929e+01  1.244e+02   0.477  0.63382    
#> x3total      2.130e+01  1.242e+02   0.171  0.86396    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 979.8 on 493 degrees of freedom
#> Multiple R-squared:  0.5293,	Adjusted R-squared:  0.5235 
#> F-statistic: 92.39 on 6 and 493 DF,  p-value: < 2.2e-16
#> 
plotCurves(m4literal, plotx="x1")

## Superficially, there is multicollinearity (omit the intercept)
cor(model.matrix(m4literal)[ -1 , -1 ])
#>                      x1  I(x1 * x1)          x2       x3some      x3much
#> x1          1.000000000  0.97484776 0.200596889  0.055596995 -0.03568623
#> I(x1 * x1)  0.974847755  1.00000000 0.193916204  0.044145556 -0.03810690
#> x2          0.200596889  0.19391620 1.000000000  0.004033362  0.04724582
#> x3some      0.055596995  0.04414556 0.004033362  1.000000000 -0.33422460
#> x3much     -0.035686229 -0.03810690 0.047245820 -0.334224599  1.00000000
#> x3total     0.002703986  0.01114203 0.029117095 -0.334224599 -0.33422460
#>                 x3total
#> x1          0.002703986
#> I(x1 * x1)  0.011142031
#> x2          0.029117095
#> x3some     -0.334224599
#> x3much     -0.334224599
#> x3total     1.000000000

m4literalmc <- meanCenter(m4literal, terms = "x1")
summary(m4literalmc)
#> These variables were mean-centered before any transformations were made on the design matrix.
#> [1] "x1c"
#> The centers and scale factors were 
#>            x1c
#> mean  196.3668
#> scale   1.0000
#> The summary statistics of the variables in the design matrix (after centering). 
#>                  mean std.dev.
#> y2           1603.279 1419.422
#> x1c             0.000   59.776
#> I(x1c * x1c) 3566.035 5562.310
#> x2            199.914   29.439
#> x3some          0.250    0.433
#> x3much          0.250    0.433
#> x3total         0.250    0.433
#> 
#> The following results were produced from: 
#> meanCenter.default(model = m4literal, terms = "x1")
#> 
#> Call:
#> lm(formula = y2 ~ x1c + I(x1c * x1c) + x2 + x3, data = stddat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -2644.2  -721.1   -14.8   596.9  3287.6 
#> 
#> Coefficients:
#>                Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   1.603e+03  3.134e+02   5.117 4.47e-07 ***
#> x1c           1.568e+01  7.560e-01  20.747  < 2e-16 ***
#> I(x1c * x1c)  6.270e-02  7.955e-03   7.882 2.09e-14 ***
#> x2           -1.080e+00  1.527e+00  -0.707    0.480    
#> x3some       -1.120e+02  1.243e+02  -0.901    0.368    
#> x3much        5.929e+01  1.244e+02   0.477    0.634    
#> x3total       2.130e+01  1.242e+02   0.171    0.864    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 979.8 on 493 degrees of freedom
#> Multiple R-squared:  0.5293,	Adjusted R-squared:  0.5235 
#> F-statistic: 92.39 on 6 and 493 DF,  p-value: < 2.2e-16
#> 

m4literalmcs <- meanCenter(m4literal, terms = "x1", standardize = TRUE)
summary(m4literalmcs)
#> These variables were mean-centered before any transformations were made on the design matrix.
#> [1] "x1cs"
#> The centers and scale factors were 
#>           x1cs
#> mean  196.3668
#> scale  59.7761
#> The summary statistics of the variables in the design matrix (after centering). 
#>                     mean  std.dev.
#> y2             1603.2794 1419.4217
#> x1cs              0.0000    1.0000
#> I(x1cs * x1cs)    0.9980    1.5567
#> x2              199.9144   29.4390
#> x3some            0.2500    0.4334
#> x3much            0.2500    0.4334
#> x3total           0.2500    0.4334
#> 
#> The following results were produced from: 
#> meanCenter.default(model = m4literal, standardize = TRUE, terms = "x1")
#> 
#> Call:
#> lm(formula = y2 ~ x1cs + I(x1cs * x1cs) + x2 + x3, data = stddat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -2644.2  -721.1   -14.8   596.9  3287.6 
#> 
#> Coefficients:
#>                Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)    1603.421    313.381   5.117 4.47e-07 ***
#> x1cs            937.573     45.190  20.747  < 2e-16 ***
#> I(x1cs * x1cs)  224.038     28.425   7.882 2.09e-14 ***
#> x2               -1.080      1.527  -0.707    0.480    
#> x3some         -112.038    124.292  -0.901    0.368    
#> x3much           59.289    124.387   0.477    0.634    
#> x3total          21.296    124.232   0.171    0.864    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 979.8 on 493 degrees of freedom
#> Multiple R-squared:  0.5293,	Adjusted R-squared:  0.5235 
#> F-statistic: 92.39 on 6 and 493 DF,  p-value: < 2.2e-16
#> 

m4 <- lm(y2 ~ poly(x1, 2, raw = TRUE) + x2 + x3, data = dat)
summary(m4)
#> 
#> Call:
#> lm(formula = y2 ~ poly(x1, 2, raw = TRUE) + x2 + x3, data = dat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -2644.2  -721.1   -14.8   596.9  3287.6 
#> 
#> Coefficients:
#>                            Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)               9.412e+02  4.318e+02   2.180  0.02976 *  
#> poly(x1, 2, raw = TRUE)1 -8.940e+00  3.302e+00  -2.707  0.00702 ** 
#> poly(x1, 2, raw = TRUE)2  6.270e-02  7.955e-03   7.882 2.09e-14 ***
#> x2                       -1.080e+00  1.527e+00  -0.707  0.47974    
#> x3some                   -1.120e+02  1.243e+02  -0.901  0.36781    
#> x3much                    5.929e+01  1.244e+02   0.477  0.63382    
#> x3total                   2.130e+01  1.242e+02   0.171  0.86396    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 979.8 on 493 degrees of freedom
#> Multiple R-squared:  0.5293,	Adjusted R-squared:  0.5235 
#> F-statistic: 92.39 on 6 and 493 DF,  p-value: < 2.2e-16
#> 
plotCurves(m4, plotx="x1")

m4mc1 <- meanCenter(m4, terms = "x1")
summary(m4mc1)
#> These variables were mean-centered before any transformations were made on the design matrix.
#> [1] "x1c"
#> The centers and scale factors were 
#>            x1c
#> mean  196.3668
#> scale   1.0000
#> The summary statistics of the variables in the design matrix (after centering). 
#>                               mean std.dev.
#> y2                        1603.279 1419.422
#> poly(x1c, 2, raw = TRUE)1    0.000   59.776
#> poly(x1c, 2, raw = TRUE)2 3566.035 5562.310
#> x2                         199.914   29.439
#> x3some                       0.250    0.433
#> x3much                       0.250    0.433
#> x3total                      0.250    0.433
#> 
#> The following results were produced from: 
#> meanCenter.default(model = m4, terms = "x1")
#> 
#> Call:
#> lm(formula = y2 ~ poly(x1c, 2, raw = TRUE) + x2 + x3, data = stddat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -2644.2  -721.1   -14.8   596.9  3287.6 
#> 
#> Coefficients:
#>                             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)                1.603e+03  3.134e+02   5.117 4.47e-07 ***
#> poly(x1c, 2, raw = TRUE)1  1.568e+01  7.560e-01  20.747  < 2e-16 ***
#> poly(x1c, 2, raw = TRUE)2  6.270e-02  7.955e-03   7.882 2.09e-14 ***
#> x2                        -1.080e+00  1.527e+00  -0.707    0.480    
#> x3some                    -1.120e+02  1.243e+02  -0.901    0.368    
#> x3much                     5.929e+01  1.244e+02   0.477    0.634    
#> x3total                    2.130e+01  1.242e+02   0.171    0.864    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 979.8 on 493 degrees of freedom
#> Multiple R-squared:  0.5293,	Adjusted R-squared:  0.5235 
#> F-statistic: 92.39 on 6 and 493 DF,  p-value: < 2.2e-16
#> 

m4mc2 <- meanCenter(m4, terms = "x1", standardize = TRUE)
summary(m4mc2)
#> These variables were mean-centered before any transformations were made on the design matrix.
#> [1] "x1cs"
#> The centers and scale factors were 
#>           x1cs
#> mean  196.3668
#> scale  59.7761
#> The summary statistics of the variables in the design matrix (after centering). 
#>                                 mean  std.dev.
#> y2                         1603.2794 1419.4217
#> poly(x1cs, 2, raw = TRUE)1    0.0000    1.0000
#> poly(x1cs, 2, raw = TRUE)2    0.9980    1.5567
#> x2                          199.9144   29.4390
#> x3some                        0.2500    0.4334
#> x3much                        0.2500    0.4334
#> x3total                       0.2500    0.4334
#> 
#> The following results were produced from: 
#> meanCenter.default(model = m4, standardize = TRUE, terms = "x1")
#> 
#> Call:
#> lm(formula = y2 ~ poly(x1cs, 2, raw = TRUE) + x2 + x3, data = stddat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -2644.2  -721.1   -14.8   596.9  3287.6 
#> 
#> Coefficients:
#>                            Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)                1603.421    313.381   5.117 4.47e-07 ***
#> poly(x1cs, 2, raw = TRUE)1  937.573     45.190  20.747  < 2e-16 ***
#> poly(x1cs, 2, raw = TRUE)2  224.038     28.425   7.882 2.09e-14 ***
#> x2                           -1.080      1.527  -0.707    0.480    
#> x3some                     -112.038    124.292  -0.901    0.368    
#> x3much                       59.289    124.387   0.477    0.634    
#> x3total                      21.296    124.232   0.171    0.864    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 979.8 on 493 degrees of freedom
#> Multiple R-squared:  0.5293,	Adjusted R-squared:  0.5235 
#> F-statistic: 92.39 on 6 and 493 DF,  p-value: < 2.2e-16
#> 

m4mc3 <- meanCenter(m4, terms = "x1", centerDV = TRUE, standardize = TRUE)
summary(m4mc3)
#> These variables were mean-centered before any transformations were made on the design matrix.
#> [1] "y2cs" "x1cs"
#> The centers and scale factors were 
#>           y2cs     x1cs
#> mean  1603.279 196.3668
#> scale 1419.422  59.7761
#> The summary statistics of the variables in the design matrix (after centering). 
#>                                mean std.dev.
#> y2cs                         0.0000  1.00000
#> poly(x1cs, 2, raw = TRUE)1   0.0000  1.00000
#> poly(x1cs, 2, raw = TRUE)2   0.9980  1.55668
#> x2                         199.9144 29.43904
#> x3some                       0.2500  0.43345
#> x3much                       0.2500  0.43345
#> x3total                      0.2500  0.43345
#> 
#> The following results were produced from: 
#> meanCenter.default(model = m4, centerDV = TRUE, standardize = TRUE, 
#>     terms = "x1")
#> 
#> Call:
#> lm(formula = y2cs ~ poly(x1cs, 2, raw = TRUE) + x2 + x3, data = stddat)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -1.86287 -0.50805 -0.01046  0.42049  2.31616 
#> 
#> Coefficients:
#>                              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)                 9.973e-05  2.208e-01   0.000    1.000    
#> poly(x1cs, 2, raw = TRUE)1  6.605e-01  3.184e-02  20.747  < 2e-16 ***
#> poly(x1cs, 2, raw = TRUE)2  1.578e-01  2.003e-02   7.882 2.09e-14 ***
#> x2                         -7.607e-04  1.076e-03  -0.707    0.480    
#> x3some                     -7.893e-02  8.756e-02  -0.901    0.368    
#> x3much                      4.177e-02  8.763e-02   0.477    0.634    
#> x3total                     1.500e-02  8.752e-02   0.171    0.864    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 0.6903 on 493 degrees of freedom
#> Multiple R-squared:  0.5293,	Adjusted R-squared:  0.5235 
#> F-statistic: 92.39 on 6 and 493 DF,  p-value: < 2.2e-16
#>