Assists creation of predicted value curves for regression models.

Creates a predicted value plot that includes a separate predicted value line for each value of a focal variable. The x axis variable is specified by the plotx argument. As of rockchalk 1.7.x, the moderator argument, modx, is optional. Think of this a new version of R's termplot, but it allows for interactions. And it handles some nonlinear transformations more gracefully than termplot.

Usage

plotCurves(
  model,
  plotx,
  nx = 40,
  modx,
  plotxRange = NULL,
  n,
  modxVals = NULL,
  interval = c("none", "confidence", "prediction"),
  plotPoints = TRUE,
  plotLegend = TRUE,
  legendTitle = NULL,
  legendPct = TRUE,
  col = c("black", "blue", "darkgreen", "red", "orange", "purple", "green3"),
  llwd = 2,
  opacity = 100,
  envir = environment(formula(model)),
  ...
)

Arguments

model: Required. Fitted regression object. Must have a predict method
plotx: Required. String with name of predictor for the x axis
nx: Number of values of plotx at which to calculate the predicted value. Default = 40.
modx: Optional. String for moderator variable name. May be either numeric or factor.
plotxRange: Optional. If not specified, the observed range of plotx will be used to determine the axis range.
n: Optional. Number of focal values of modx, used by algorithms specified by modxVals; will be ignored if modxVals supplies a vector of focal values.
modxVals: Optional. A vector of focal values for which predicted values are to be plotted. May also be a character string to select an algorithm ("quantile","std.dev." or "table"), or a user-supplied function to select focal values (a new method similar to getFocal). If modx is a factor, currently, the only available algorithm is "table" (see getFocal.factor.
interval: Optional. Intervals provided by the predict.lm may be supplied, either "conf" (95 interval for the estimated conditional mean) or "pred" (95 interval for observed values of y given the rest of the model).
plotPoints: Optional. TRUE or FALSE: Should the plot include the scatterplot points along with the lines.
plotLegend: Optional. TRUE or FALSE: Should the default legend be included?
legendTitle: Optional. You'll get an automatically generated title, such as "Moderator: modx", but if you don't like that, specify your own string here.
legendPct: Default = TRUE. Variable labels print with sample percentages.
col: I offer my preferred color vector as default. Replace if you like. User may supply a vector of valid color names, or rainbow(10) or gray.colors(5). Color names will be recycled if there are more focal values of modx than colors provided.
llwd: Optional. Line widths for predicted values. Can be single value or a vector, which will be recycled as necessary.
opacity: Optional, default = 100. A number between 1 and 255. 1 means "transparent" or invisible, 255 means very dark. the darkness of confidence interval regions
envir: environment to search for variables.
...: further arguments that are passed to plot or predict. The arguments that are monitored to be sent to predict are c("type", "se.fit", "dispersion", "interval", "level", "terms", "na.action").

Value

A plot is created as a side effect, a list is returned including 1) the call, 2) a newdata object that includes information on the curves that were plotted, 3) a vector modxVals, the values for which curves were drawn.

Details

This is similar to plotSlopes, but it accepts regressions in which there are transformed variables, such as "log(x1)". It creates a plot of the predicted dependent variable against one of the numeric predictors, plotx. It draws a predicted value line for each value of modx, a moderator variable. The moderator may be a numeric or categorical moderator variable.

The user may designate which particular values of the moderator are used for calculating the predicted value lines. That is, modxVals = c( 12,22,37) would draw lines for values 12, 22, and 37 of the moderator. User may instead supply a character string to choose one of the built in algorithms. The default algorithm is "quantile", which will select n values that are evenly spaced along the modx axis. The algorithm "std.dev" will select the mean of modx (m) and then it will select values that step away from the mean in standard deviation sd units. For example, if n = 3, the focal values will m, m - sd, am + sd.

Author

Paul E. Johnson pauljohn@ku.edu

Examples


library(rockchalk)

## Replicate some R classics.  The budworm.lg data from predict.glm
## will work properly after re-formatting the information as a data.frame:

## example from Venables and Ripley (2002, pp. 190-2.)
df <- data.frame(ldose = rep(0:5, 2),
                 sex = factor(rep(c("M", "F"), c(6, 6))),
                 SF.numdead = c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16))
df$SF.numalive = 20 - df$SF.numdead

budworm.lg <- glm(cbind(SF.numdead, SF.numalive) ~ sex*ldose, data = df,
                  family = binomial)


plotCurves(budworm.lg, plotx = "ldose", modx = "sex", interval = "confidence",
           ylim = c(0, 1))
#> Warning: Using formula(x) is deprecated when x is a character vector of length > 1.
#>   Consider formula(paste(x, collapse = " ")) instead.
#> rockchalk:::predCI: model's predict method does not return an interval.
#>  We will improvize with a Wald type approximation to the confidence interval 
#> Warning: no non-missing arguments to min; returning Inf
#> Warning: no non-missing arguments to max; returning -Inf


## See infert
model2 <- glm(case ~ age + parity + education + spontaneous + induced,
              data = infert, family = binomial())

## Curvature so slight we can barely see it
model2pc1 <- plotCurves(model2, plotx = "age", modx = "education",
                        interval = "confidence", ylim = c(0, 1))
#> rockchalk:::predCI: model's predict method does not return an interval.
#>  We will improvize with a Wald type approximation to the confidence interval 



model2pc2 <- plotCurves(model2, plotx = "age", modx = "education",
                        modxVals = levels(infert$education)[1],
                        interval = "confidence", ylim = c(0, 1))
#> rockchalk:::predCI: model's predict method does not return an interval.
#>  We will improvize with a Wald type approximation to the confidence interval 



model2pc2 <- plotCurves(model2, plotx = "age", modx = "education",
                        modxVals = levels(infert$education)[c(2,3)],
                        interval = "confidence", ylim = c(0, 1))
#> rockchalk:::predCI: model's predict method does not return an interval.
#>  We will improvize with a Wald type approximation to the confidence interval 


model2pc2 <- plotCurves(model2, plotx = "age", modx = "education",
                        modxVals = levels(infert$education)[c(2,3)],
                        ylim = c(0, 1), type = "response")






## Manufacture some data
set.seed(12345)
N <- 500
dat <- genCorrelatedData2(N = 500, means = c(5, 0, 0, 0), sds = rep(1, 4),
                          rho = 0.2, beta = rep(1, 5), stde = 20)
#> [1] "The equation that was calculated was"
#> y = 1 + 1*x1 + 1*x2 + 1*x3 + 1*x4 
#>  + 0*x1*x1 + 0*x2*x1 + 0*x3*x1 + 0*x4*x1 
#>  + 0*x1*x2 + 0*x2*x2 + 0*x3*x2 + 0*x4*x2 
#>  + 0*x1*x3 + 0*x2*x3 + 0*x3*x3 + 0*x4*x3 
#>  + 0*x1*x4 + 0*x2*x4 + 0*x3*x4 + 0*x4*x4 
#>  + N(0,20) random error 

dat$xcat1 <- gl(2, N/2, labels = c("Monster", "Human"))
dat$xcat2 <- cut(rnorm(N), breaks = c(-Inf, 0, 0.4, 0.9, 1, Inf),
             labels = c("R", "M", "D", "P", "G"))

###The design matrix for categorical variables, xcat numeric
dat$xcat1n <- with(dat, contrasts(xcat1)[xcat1, ])
dat$xcat2n <- with(dat, contrasts(xcat2)[xcat2, ])


stde <- 2
dat$y <- with(dat, 0.03 + 11.5 * log(x1) * contrasts(dat$xcat1)[dat$xcat1] +
              0.1 * x2 + 0.04 * x2^2 + stde*rnorm(N))

stde <- 1
dat$y2 <- with(dat, 0.03 + 0.1 * x1 + 0.1 * x2 + 0.25 * x1 * x2 + 0.4 * x3 -
               0.1 * x4 + stde * rnorm(N))
stde <- 8
dat$y3 <- with(dat, 3 + 0.5 * x1 + 1.2 * (as.numeric(xcat1)-1) +
-0.8 * (as.numeric(xcat1)-1) * x1 +  stde * rnorm(N))

stde <- 8
dat$y4 <- with(dat, 3 + 0.5 * x1 +
               contrasts(dat$xcat2)[dat$xcat2, ] %*% c(0.1, -0.2, 0.3, 0.05)  +
               stde * rnorm(N))


## Curvature with interaction
m1 <- lm(y ~ log(x1)*xcat1 + x2 + I(x2^2), data=dat)
summary(m1)
#> 
#> Call:
#> lm(formula = y ~ log(x1) * xcat1 + x2 + I(x2^2), data = dat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -6.2368 -1.2216 -0.0446  1.2399  4.9315 
#> 
#> Coefficients:
#>                    Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)         2.67164    0.91475   2.921  0.00365 ** 
#> log(x1)            -1.70802    0.56807  -3.007  0.00278 ** 
#> xcat1Human         -2.51612    1.29846  -1.938  0.05322 .  
#> x2                  0.19792    0.08673   2.282  0.02290 *  
#> I(x2^2)            -0.04577    0.06426  -0.712  0.47670    
#> log(x1):xcat1Human 13.21959    0.81000  16.320  < 2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 1.857 on 494 degrees of freedom
#> Multiple R-squared:  0.9626,	Adjusted R-squared:  0.9622 
#> F-statistic:  2542 on 5 and 494 DF,  p-value: < 2.2e-16
#> 

## First, with no moderator
plotCurves(m1, plotx = "x1")


plotCurves(m1, plotx = "x1", modx = "xcat1")


## ## Verify that plot by comparing against a manually contructed alternative
## par(mfrow=c(1,2))
## plotCurves(m1, plotx = "x1", modx = "xcat1")
## newdat <- with(dat, expand.grid(x1 = plotSeq(x1, 30), xcat1 = levels(xcat1)))
## newdat$x2 <-  with(dat, mean(x2, na.rm = TRUE))
## newdat$m1p <- predict(m1, newdata = newdat)
## plot( y ~ x1, data = dat, type = "n", ylim = magRange(dat$y, c(1, 1.2)))
## points( y ~ x1, data = dat, col = dat$xcat1, cex = 0.4, lwd = 0.5)
## by(newdat, newdat$xcat1, function(dd) {lines(dd$x1, dd$m1p)})
## legend("topleft", legend=levels(dat$xcat1), col = as.numeric(dat$xcat1), lty = 1)
## par(mfrow = c(1,1))
## ##Close enough!


plotCurves(m1, plotx = "x2", modx = "x1")


plotCurves(m1, plotx = "x2", modx = "xcat1")


plotCurves(m1, plotx = "x2", modx = "xcat1", interval = "conf")



m2 <- lm(y ~ log(x1)*xcat1 + xcat1*(x2 + I(x2^2)), data = dat)
summary(m2)
#> 
#> Call:
#> lm(formula = y ~ log(x1) * xcat1 + xcat1 * (x2 + I(x2^2)), data = dat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -6.4211 -1.1936 -0.0554  1.2288  4.9131 
#> 
#> Coefficients:
#>                    Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)         3.00314    0.92528   3.246  0.00125 ** 
#> log(x1)            -1.87872    0.57457  -3.270  0.00115 ** 
#> xcat1Human         -3.16751    1.32882  -2.384  0.01752 *  
#> x2                  0.37497    0.12205   3.072  0.00224 ** 
#> I(x2^2)            -0.09868    0.08742  -1.129  0.25951    
#> log(x1):xcat1Human 13.56145    0.82871  16.365  < 2e-16 ***
#> xcat1Human:x2      -0.34460    0.17349  -1.986  0.04756 *  
#> xcat1Human:I(x2^2)  0.09449    0.12889   0.733  0.46383    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 1.851 on 492 degrees of freedom
#> Multiple R-squared:  0.9629,	Adjusted R-squared:  0.9624 
#> F-statistic:  1827 on 7 and 492 DF,  p-value: < 2.2e-16
#> 
plotCurves(m2, plotx = "x2", modx = "xcat1")


plotCurves(m2, plotx  ="x2", modx = "x1")



m3a <- lm(y ~ poly(x2, 2) + xcat1, data = dat)

plotCurves(m3a, plotx = "x2")

plotCurves(m3a, plotx = "x2", modx = "xcat1")

#OK

m4 <- lm(log(y+10) ~ poly(x2, 2)*xcat1 + x1, data = dat)
summary(m4)
#> 
#> Call:
#> lm(formula = log(y + 10) ~ poly(x2, 2) * xcat1 + x1, data = dat)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -0.95025 -0.07313  0.00750  0.09019  0.43193 
#> 
#> Coefficients:
#>                          Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)              2.166865   0.040326  53.734   <2e-16 ***
#> poly(x2, 2)1             0.484173   0.239110   2.025   0.0434 *  
#> poly(x2, 2)2            -0.540927   0.226887  -2.384   0.0175 *  
#> xcat1Human               1.069861   0.014881  71.895   <2e-16 ***
#> x1                       0.020319   0.007702   2.638   0.0086 ** 
#> poly(x2, 2)1:xcat1Human -0.152951   0.333419  -0.459   0.6466    
#> poly(x2, 2)2:xcat1Human  0.587847   0.334495   1.757   0.0795 .  
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 0.1662 on 493 degrees of freedom
#> Multiple R-squared:  0.9131,	Adjusted R-squared:  0.9121 
#> F-statistic: 863.5 on 6 and 493 DF,  p-value: < 2.2e-16
#> 
plotCurves(m4, plotx = "x2")


plotCurves(m4, plotx  ="x2", modx = "xcat1")


plotCurves(m4, plotx = "x2", modx = "x1")


plotCurves(m4, plotx = "x2", modx = "xcat1")


plotCurves(m4, plotx = "x2", modx = "xcat1", modxVals = c("Monster"))



##ordinary interaction
m5 <- lm(y2 ~ x1*x2 + x3 +x4, data = dat)
summary(m5)
#> 
#> Call:
#> lm(formula = y2 ~ x1 * x2 + x3 + x4, data = dat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -3.3410 -0.6731 -0.0192  0.6530  2.4076 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) -0.14712    0.23815  -0.618  0.53701    
#> x1           0.14875    0.04647   3.201  0.00146 ** 
#> x2           0.19880    0.23772   0.836  0.40340    
#> x3           0.38256    0.04403   8.688  < 2e-16 ***
#> x4          -0.03070    0.04479  -0.685  0.49343    
#> x1:x2        0.21975    0.04578   4.800 2.11e-06 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 0.966 on 494 degrees of freedom
#> Multiple R-squared:  0.7027,	Adjusted R-squared:  0.6996 
#> F-statistic: 233.5 on 5 and 494 DF,  p-value: < 2.2e-16
#> 
plotCurves(m5, plotx = "x1", modx = "x2")

plotCurves(m5, plotx = "x1", modx = "x2", modxVals = c( -2, -1, 0, 1, 2))

plotCurves(m5, plotx = "x1", modx = "x2", modxVals = c(-2))

plotCurves(m5, plotx = "x1", modx = "x2", modxVals = "std.dev.")

plotCurves(m5, plotx = "x1", modx = "x2", modxVals = "quantile")

plotCurves(m5, plotx = "x3", modx = "x2")



if(require(carData)){
    mc1 <- lm(statusquo ~ income * sex, data = Chile)
    summary(mc1)
    plotCurves(mc1, plotx = "income")
    plotCurves(mc1, modx = "sex", plotx = "income")
    plotCurves(mc1, modx = "sex", plotx = "income", modxVals = "M")
    
    mc2 <- lm(statusquo ~ region * income, data =  Chile)
    summary(mc2)
    plotCurves(mc2, modx = "region", plotx = "income")
    plotCurves(mc2, modx = "region", plotx = "income",
               modxVals = levels(Chile$region)[c(1,4)])
    plotCurves(mc2, modx = "region", plotx = "income", modxVals = c("S","M","SA"))
    plotCurves(mc2, modx = "region", plotx = "income", modxVals = c("S","M","SA"),
               interval = "conf")
    
    plotCurves(mc2, modx = "region", plotx = "income", plotPoints = FALSE)
 
    mc3 <- lm(statusquo ~ region * income + sex + age, data =  Chile)
    summary(mc3)
    plotCurves(mc3, modx = "region", plotx = "income")
 
    mc4 <- lm(statusquo ~ income * (age + I(age^2)) + education + sex + age, data = Chile)
    summary(mc4)
    plotCurves(mc4, plotx = "age")
    plotCurves(mc4, plotx = "age", interval = "conf")
     
    plotCurves(mc4, plotx = "age", modx = "income")
    plotCurves(mc4, plotx = "age", modx = "income", plotPoints = FALSE)
    
    plotCurves(mc4,  plotx = "income", modx = "age")
    plotCurves(mc4,  plotx = "income", modx = "age", n = 8)
    
    plotCurves(mc4,  plotx = "income", modx = "age", modxVals = "std.dev.")
    plotCurves(mc4, modx = "income", plotx = "age", plotPoints = FALSE)
}
#> Loading required package: carData