Skip to contents

Given a fitted lm, this function scans for coefficients estimated from "interaction terms" by checking for colon symbols. The function then calculates the "residual centered" estimate of the interaction term and replaces the interaction term with that residual centered estimate. It works for any order of interaction, unlike other implementations of the same approach. The function lmres in the now-archived package pequod was a similar function.

Calculates predicted values of residual centered interaction regressions estimated in any type of regression framework (lm, glm, etc).

Usage

residualCenter(model)

# Default S3 method
residualCenter(model)

# S3 method for class 'rcreg'
predict(object, ...)

Arguments

model

A fitted lm object

object

Fitted residual-centered regression from residualCenter

...

Other named arguments. May include newdata, a dataframe of predictors. That should include values for individual predictor, need not include interactions that are constructed by residualCenter. These parameters that will be passed to the predict method of the model.

Value

a regression model of the type as the input model, with the exception that the residualCentered predictor is used in place of the original interaction. The return model includes new variable centeringRegressions: a list including each of the intermediate regressions that was calculated in order to create the residual centered interaction terms. These latter objects may be necessary for diagnostics and to calculate predicted values for hypothetical values of the inputs. If there are no interactive terms, then NULL is returned.

References

Little, T. D., Bovaird, J. A., & Widaman, K. F. (2006). On the Merits of Orthogonalizing Powered and Product Terms: Implications for Modeling Interactions Among Latent Variables. Structural Equation Modeling, 13(4), 497-519.

Author

Paul E. Johnson pauljohn@ku.edu

Examples

set.seed(123)
x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- rnorm(100)
x4 <- rnorm(100)
y <- rnorm(100)
dat <- data.frame(y, x1,x2,x3,x4)
rm(x1,x2,x3,x4,y)
m1 <- lm(y~ x1*x2 + x4, data = dat)
 
m1RC <- residualCenter(m1)

m1RCs <- summary(m1RC)
## The stage 1 centering regressions can be viewed as well
## lapply(m1RC$rcRegressions, summary)

## Verify residualCenter() output against the manual calculation
dat$x1rcx2 <- as.numeric(resid(lm(I(x1*x2) ~ x1 + x2, data = dat)))
m1m <- lm(y ~ x1 + x2 + x4 + x1rcx2, data=dat)
summary(m1m)
#> 
#> Call:
#> lm(formula = y ~ x1 + x2 + x4 + x1rcx2, data = dat)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -2.47406 -0.58765  0.00153  0.64621  2.46504 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)  0.10952    0.09889   1.107    0.271  
#> x1          -0.21718    0.10793  -2.012    0.047 *
#> x2          -0.14304    0.10188  -1.404    0.164  
#> x4          -0.01607    0.09508  -0.169    0.866  
#> x1rcx2       0.08115    0.11863   0.684    0.496  
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 0.9782 on 95 degrees of freedom
#> Multiple R-squared:  0.06193,	Adjusted R-squared:  0.02244 
#> F-statistic: 1.568 on 4 and 95 DF,  p-value: 0.1891
#> 
cbind("residualCenter" = coef(m1RC), "manual" = coef(m1m))
#>             residualCenter      manual
#> (Intercept)     0.10951964  0.10951964
#> x1             -0.21717945 -0.21717945
#> x2             -0.14304064 -0.14304064
#> x4             -0.01606806 -0.01606806
#> x1.X.x2         0.08114636  0.08114636


m2 <- lm(y~ x1*x2*x3 + x4, data=dat)
m2RC <- residualCenter(m2)
m2RCs <- summary(m2RC)

## Verify that result manually
dat$x2rcx3 <- as.numeric(resid(lm(I(x2*x3) ~ x2 + x3, data = dat)))
dat$x1rcx3 <- as.numeric(resid(lm(I(x1*x3) ~ x1 + x3, data = dat)))
dat$x1rcx2rcx3 <- as.numeric( resid(lm(I(x1*x2*x3) ~ x1 + x2 + x3 + x1rcx2 +
                                       x1rcx3 + x2rcx3 , data=dat)))
(m2m <- lm(y ~ x1 + x2 + x3+ x4 + x1rcx2 + x1rcx3 + x2rcx3 + x1rcx2rcx3,
           data = dat))
#> 
#> Call:
#> lm(formula = y ~ x1 + x2 + x3 + x4 + x1rcx2 + x1rcx3 + x2rcx3 + 
#>     x1rcx2rcx3, data = dat)
#> 
#> Coefficients:
#> (Intercept)           x1           x2           x3           x4       x1rcx2  
#>    0.118543    -0.236471    -0.141952    -0.065243    -0.035328     0.104212  
#>      x1rcx3       x2rcx3   x1rcx2rcx3  
#>    0.006331     0.070705     0.060647  
#> 

cbind("residualCenter" = coef(m2RC), "manual" = coef(m2m))
#>              residualCenter       manual
#> (Intercept)     0.118542591  0.118542591
#> x1             -0.236470691 -0.236470691
#> x2             -0.141951863 -0.141951863
#> x3             -0.065242673 -0.065242673
#> x4             -0.035327893 -0.035327893
#> x1.X.x2         0.104211909  0.104211909
#> x1.X.x3         0.006330921  0.006330921
#> x2.X.x3         0.070704921  0.070704921
#> x1.X.x2.X.x3    0.060647383  0.060647383


### As good as pequod's lmres
### not run because pequod generates R warnings
###
### if (require(pequod)){
###  pequodm1 <- lmres(y ~ x1*x2*x3 + x4, data=dat) 
###  pequodm1s <- summary(pequodm1)
###  coef(pequodm1s)
### }

### Works with any number of interactions. See:

m3 <- lm(y~ x1*x2*x3*x4, data=dat)
m3RC <- residualCenter(m3)
summary(m3RC)
#> 
#> Call:
#> lm(formula = y ~ x1 + x2 + x3 + x4 + x1.X.x2 + x1.X.x3 + x2.X.x3 + 
#>     x1.X.x4 + x2.X.x4 + x3.X.x4 + x1.X.x2.X.x3 + x1.X.x2.X.x4 + 
#>     x1.X.x3.X.x4 + x2.X.x3.X.x4 + x1.X.x2.X.x3.X.x4, data = mfnew)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -2.41000 -0.53226  0.00876  0.59009  2.33835 
#> 
#> Coefficients:
#>                   Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)        0.12930    0.10185   1.270   0.2077  
#> x1                -0.25252    0.11423  -2.211   0.0298 *
#> x2                -0.10625    0.10914  -0.974   0.3331  
#> x3                -0.11271    0.11515  -0.979   0.3304  
#> x4                -0.04223    0.09999  -0.422   0.6739  
#> x1.X.x2            0.13714    0.14919   0.919   0.3606  
#> x1.X.x3           -0.03444    0.12053  -0.286   0.7758  
#> x2.X.x3            0.12018    0.12132   0.991   0.3247  
#> x1.X.x4           -0.15997    0.13541  -1.181   0.2408  
#> x2.X.x4           -0.00017    0.12879  -0.001   0.9990  
#> x3.X.x4           -0.06433    0.10870  -0.592   0.5556  
#> x1.X.x2.X.x3       0.02111    0.13434   0.157   0.8755  
#> x1.X.x2.X.x4       0.24153    0.18166   1.330   0.1873  
#> x1.X.x3.X.x4      -0.17789    0.13475  -1.320   0.1904  
#> x2.X.x3.X.x4      -0.03220    0.14286  -0.225   0.8222  
#> x1.X.x2.X.x3.X.x4 -0.23813    0.16727  -1.424   0.1583  
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 0.9941 on 84 degrees of freedom
#> Multiple R-squared:  0.1434,	Adjusted R-squared:  -0.009608 
#> F-statistic: 0.9372 on 15 and 84 DF,  p-value: 0.5274
#> 
##'
## Verify that one manually (Gosh, this is horrible to write out)
dat$x1rcx4 <- as.numeric(resid(lm(I(x1*x4) ~ x1 + x4, data=dat)))
dat$x2rcx4 <- as.numeric(resid(lm(I(x2*x4) ~ x2 + x4, data=dat)))
dat$x3rcx4 <- as.numeric(resid(lm(I(x3*x4) ~ x3 + x4, data=dat)))
dat$x1rcx2rcx4 <- as.numeric(resid(lm(I(x1*x2*x4) ~ x1 + x2 + x4 +
                                      x1rcx2 + x1rcx4 + x2rcx4, data=dat)))
dat$x1rcx3rcx4 <- as.numeric(resid(lm(I(x1*x3*x4) ~ x1 + x3 + x4 +
                                      x1rcx3 + x1rcx4 + x3rcx4, data=dat)))
dat$x2rcx3rcx4 <- as.numeric(resid(lm(I(x2*x3*x4) ~ x2 + x3 + x4 +
                                      x2rcx3 + x2rcx4 + x3rcx4, data=dat)))
dat$x1rcx2rcx3rcx4 <-
    as.numeric(resid(lm(I(x1*x2*x3*x4) ~ x1 + x2 + x3 + x4 +
                        x1rcx2 + x1rcx3 + x2rcx3 + x1rcx4  + x2rcx4 +
                        x3rcx4  + x1rcx2rcx3 + x1rcx2rcx4 + x1rcx3rcx4 +
                        x2rcx3rcx4, data=dat)))
(m3m <- lm(y ~ x1 + x2 + x3 + x4 + x1rcx2 + x1rcx3 + x2rcx3 + x1rcx4 +
           x2rcx4 + x3rcx4 + x1rcx2rcx3 + x1rcx2rcx4 + x1rcx3rcx4 +
           x2rcx3rcx4 + x1rcx2rcx3rcx4, data=dat))
#> 
#> Call:
#> lm(formula = y ~ x1 + x2 + x3 + x4 + x1rcx2 + x1rcx3 + x2rcx3 + 
#>     x1rcx4 + x2rcx4 + x3rcx4 + x1rcx2rcx3 + x1rcx2rcx4 + x1rcx3rcx4 + 
#>     x2rcx3rcx4 + x1rcx2rcx3rcx4, data = dat)
#> 
#> Coefficients:
#>    (Intercept)              x1              x2              x3              x4  
#>        0.12930        -0.25252        -0.10625        -0.11272        -0.04223  
#>         x1rcx2          x1rcx3          x2rcx3          x1rcx4          x2rcx4  
#>        0.13714        -0.03444         0.12018        -0.15997        -0.00017  
#>         x3rcx4      x1rcx2rcx3      x1rcx2rcx4      x1rcx3rcx4      x2rcx3rcx4  
#>       -0.06433         0.02111         0.24153        -0.17789        -0.03220  
#> x1rcx2rcx3rcx4  
#>       -0.23813  
#> 

cbind("residualCenter"=coef(m3RC), "manual"=coef(m3m))
#>                   residualCenter        manual
#> (Intercept)         0.1293021234  0.1293021234
#> x1                 -0.2525227204 -0.2525227204
#> x2                 -0.1062510160 -0.1062510160
#> x3                 -0.1127150280 -0.1127150280
#> x4                 -0.0422279649 -0.0422279649
#> x1.X.x2             0.1371405135  0.1371405135
#> x1.X.x3            -0.0344389289 -0.0344389289
#> x2.X.x3             0.1201824844  0.1201824844
#> x1.X.x4            -0.1599696981 -0.1599696981
#> x2.X.x4            -0.0001699564 -0.0001699564
#> x3.X.x4            -0.0643254497 -0.0643254497
#> x1.X.x2.X.x3        0.0211110064  0.0211110064
#> x1.X.x2.X.x4        0.2415319453  0.2415319453
#> x1.X.x3.X.x4       -0.1778931446 -0.1778931446
#> x2.X.x3.X.x4       -0.0321984183 -0.0321984183
#> x1.X.x2.X.x3.X.x4  -0.2381330745 -0.2381330745

### If you want to fit a sequence of models, as in pequod, can do.

tm <-terms(m2)
tmvec <- attr(terms(m2), "term.labels")
f1 <- tmvec[grep(":", tmvec, invert = TRUE)]
f2 <- tmvec[grep(":.*:", tmvec, invert = TRUE)]
f3 <- tmvec[grep(":.*:.*:", tmvec, invert = TRUE)]

## > f1
## [1] "x1" "x2" "x3" "x4"
## > f2
## [1] "x1"    "x2"    "x3"    "x4"    "x1:x2" "x1:x3" "x2:x3"
## > f3
## [1] "x1"       "x2"       "x3"       "x4"       "x1:x2"    "x1:x3"    "x2:x3"   
## [8] "x1:x2:x3"

f1 <- lm(as.formula(paste("y","~", paste(f1, collapse=" + "))), data=dat)
f1RC <- residualCenter(f1)
summary(f1RC)
#> 
#> Call:
#> lm(formula = y ~ x1 + x2 + x3 + x4, data = mfnew)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -2.47336 -0.58010  0.07461  0.68778  2.46552 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)  0.11614    0.10000   1.161   0.2484  
#> x1          -0.22425    0.10899  -2.058   0.0424 *
#> x2          -0.14151    0.10204  -1.387   0.1687  
#> x3          -0.05046    0.10468  -0.482   0.6309  
#> x4          -0.02333    0.09506  -0.245   0.8067  
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 0.9794 on 95 degrees of freedom
#> Multiple R-squared:  0.05961,	Adjusted R-squared:  0.02002 
#> F-statistic: 1.506 on 4 and 95 DF,  p-value: 0.2067
#> 

f2 <- lm(as.formula(paste("y","~", paste(f2, collapse=" + "))), data=dat)
f2RC <- residualCenter(f2)
summary(f2RC)
#> 
#> Call:
#> lm(formula = y ~ x1 + x2 + x3 + x4 + x1.X.x2 + x1.X.x3 + x2.X.x3, 
#>     data = mfnew)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -2.45237 -0.55481  0.06683  0.65416  2.44249 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)  0.118670   0.101151   1.173    0.244  
#> x1          -0.235837   0.111445  -2.116    0.037 *
#> x2          -0.142414   0.104425  -1.364    0.176  
#> x3          -0.064775   0.107096  -0.605    0.547  
#> x4          -0.027304   0.097784  -0.279    0.781  
#> x1.X.x2      0.104368   0.127653   0.818    0.416  
#> x1.X.x3      0.007055   0.113700   0.062    0.951  
#> x2.X.x3      0.069285   0.111245   0.623    0.535  
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 0.9901 on 92 degrees of freedom
#> Multiple R-squared:  0.06927,	Adjusted R-squared:  -0.00155 
#> F-statistic: 0.9781 on 7 and 92 DF,  p-value: 0.452
#> 

f3 <- lm(as.formula(paste("y","~", paste(f3, collapse=" + "))), data=dat)
f3RC <- residualCenter(f3)
summary(f3RC)
#> 
#> Call:
#> lm(formula = y ~ x1 + x2 + x3 + x4 + x1.X.x2 + x1.X.x3 + x2.X.x3 + 
#>     x1.X.x2.X.x3, data = mfnew)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -2.42164 -0.56426  0.02735  0.68647  2.47984 
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)   0.118543   0.101552   1.167   0.2461  
#> x1           -0.236471   0.111893  -2.113   0.0373 *
#> x2           -0.141952   0.104842  -1.354   0.1791  
#> x3           -0.065243   0.107524  -0.607   0.5455  
#> x4           -0.035328   0.099354  -0.356   0.7230  
#> x1.X.x2       0.104212   0.128159   0.813   0.4183  
#> x1.X.x3       0.006331   0.114159   0.055   0.9559  
#> x2.X.x3       0.070705   0.111718   0.633   0.5284  
#> x1.X.x2.X.x3  0.060647   0.115558   0.525   0.6010  
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 0.994 on 91 degrees of freedom
#> Multiple R-squared:  0.07208,	Adjusted R-squared:  -0.0095 
#> F-statistic: 0.8835 on 8 and 91 DF,  p-value: 0.5336
#> 

library(rockchalk)
dat <- genCorrelatedData(1000, stde=5)

m1 <- lm(y ~ x1 * x2, data=dat)

m1mc <- meanCenter(m1)
summary(m1mc)
#> These variables were mean-centered before any transformations were made on the design matrix.
#> [1] "x1c" "x2c"
#> The centers and scale factors were 
#>            x1c      x2c
#> mean  49.90851 50.00128
#> scale  1.00000  1.00000
#> The summary statistics of the variables in the design matrix (after centering). 
#>             mean  std.dev.
#> y       19.98497   5.91242
#> x1c      0.00000  10.05505
#> x2c      0.00000   9.80779
#> x1c:x2c  2.71218 100.53693
#> 
#> The following results were produced from: 
#> meanCenter.default(model = m1)
#> 
#> Call:
#> lm(formula = y ~ x1c * x2c, data = stddat)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -15.8872  -3.2103  -0.2919   3.3943  17.0119 
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 19.9873047  0.1583952 126.186   <2e-16 ***
#> x1c          0.2137959  0.0157633  13.563   <2e-16 ***
#> x2c          0.2295018  0.0161609  14.201   <2e-16 ***
#> x1c:x2c     -0.0008599  0.0015762  -0.546    0.585    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 5.007 on 996 degrees of freedom
#> Multiple R-squared:  0.285,	Adjusted R-squared:  0.2828 
#> F-statistic: 132.3 on 3 and 996 DF,  p-value: < 2.2e-16
#> 

m1rc <- residualCenter(m1)
summary(m1rc)
#> 
#> Call:
#> lm(formula = y ~ x1 + x2 + x1.X.x2, data = mfnew)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -15.8872  -3.2103  -0.2919   3.3943  17.0119 
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) -2.1601746  1.1232019  -1.923   0.0547 .  
#> x1           0.2139460  0.0157609  13.574   <2e-16 ***
#> x2           0.2293426  0.0161583  14.194   <2e-16 ***
#> x1.X.x2     -0.0008599  0.0015762  -0.546   0.5855    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 5.007 on 996 degrees of freedom
#> Multiple R-squared:  0.285,	Adjusted R-squared:  0.2828 
#> F-statistic: 132.3 on 3 and 996 DF,  p-value: < 2.2e-16
#> 


newdf <- apply(dat, 2, summary)
newdf <- as.data.frame(newdf)

predict(m1rc, newdata=newdf)
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#>  4.876977 16.887753 20.054123 19.987305 22.830213 31.830607