Plot Effects of Variables Estimated by a Regression Model Fit
plot.Predict.RdUses lattice graphics to plot the effect of one or two predictors
on the linear predictor or X beta scale, or on some transformation of
that scale. The first argument specifies the result of the
Predict function. The predictor is always plotted in its
original coding. plot.Predict uses the
xYplot function unless formula is omitted and the x-axis
variable is a factor, in which case it reverses the x- and y-axes and
uses the Dotplot function.
If data is given, a rug plot is drawn showing
the location/density of data values for the \(x\)-axis variable. If
there is a groups (superposition) variable that generated separate
curves, the data density specific to each class of points is shown.
This assumes that the second variable was a factor variable. The rug plots
are drawn by scat1d. When the same predictor is used on all
\(x\)-axes, and multiple panels are drawn, you can use
subdata to specify an expression to subset according to other
criteria in addition.
To plot effects instead of estimates (e.g., treatment differences as a
function of interacting factors) see contrast.rms and
summary.rms.
pantext creates a lattice panel function for including
text such as that produced by print.anova.rms inside a panel or
in a base graphic.
Usage
# S3 method for class 'Predict'
plot(x, formula, groups=NULL,
cond=NULL, varypred=FALSE, subset,
xlim, ylim, xlab, ylab,
data=NULL, subdata, anova=NULL, pval=FALSE, cex.anova=.85,
col.fill=gray(seq(.825, .55, length=5)),
adj.subtitle, cex.adj, cex.axis, perim=NULL, digits=4, nlevels=3,
nlines=FALSE, addpanel, scat1d.opts=list(frac=0.025, lwd=0.3),
type=NULL, yscale=NULL, scaletrans=function(z) z, ...)
pantext(object, x, y, cex=.5, adj=0, fontfamily="Courier", lattice=TRUE)Arguments
- x
a data frame created by
Predict, or forpantextthe x-coordinate for text- formula
the right hand side of a
latticeformula reference variables in data framex. You may not specifyformulaif you varied multiple predictors separately when callingPredict. Otherwise, whenformulais not given,plot.Predictconstructs one from information inx.- groups
an optional name of one of the variables in
xthat is to be used as a grouping (superpositioning) variable. Note thatgroupsdoes not contain the groups data as is customary inlattice; it is only a single character string specifying the name of the grouping variable.- cond
when plotting effects of different predictors,
condis a character string that specifies a single variable name inxthat can be used to form panels. Only applies if usingrbindto combine severalPredictresults.- varypred
set to
TRUEifxis the result of passing multiplePredictresults, that represent different predictors, torbind.Predict. This will cause the.set.variable created byrbindto be copied to the.predictor.variable.- subset
a subsetting expression for restricting the rows of
xthat are used in plotting. For example, predictions may have been requested for males and females but one wants to plot only females.- xlim
This parameter is seldom used, as limits are usually controlled with
Predict. One reason to usexlimis to plot afactorvariable on the x-axis that was created with thecut2function with thelevels.meanoption, withval.lev=TRUEspecified toplot.Predict. In this case you may want the axis to have the range of the original variable values given tocut2rather than the range of the means within quantile groups.- ylim
Range for plotting on response variable axis. Computed by default.
- xlab
Label for
x-axis. Default is one given toasis, rcs, etc., which may have been the"label"attribute of the variable.- ylab
Label for
y-axis. Iffunis not given, default is"log Odds"forlrm,"log Relative Hazard"forcph, name of the response variable forols,TRUEorlog(TRUE)forpsm, or"X * Beta"otherwise.- data
a data frame containing the original raw data on which the regression model were based, or at least containing the \(x\)-axis and grouping variable. If
datais present and contains the needed variables, the original data are added to the graph in the form of a rug plot usingscat1d.- subdata
if
datais specified, an expression to be evaluated in thedataenvironment that evaluates to a logical vector specifying which observations indatato keep. This will be intersected with the criterion for thegroupsvariable. Example: if conditioning on two paneling variables using|a*byou can specifysubdata=b==levels(b)[which.packet()[2]], where the2comes from the fact thatbwas listed second after the vertical bar (this assumesbis afactorindata. Another example:subdata=sex==c('male','female')[current.row()].- anova
an object returned by
anova.rms. Ifanovais specified, the overall test of association for predictor plotted is added as text to each panel, located at the spot at which the panel is most empty unless there is significant empty space at the top or bottom of the panel; these areas are given preference.- pval
specify
pval=TRUEforanovato include not only the test statistic but also the P-value- cex.anova
character size for the test statistic printed on the panel
- col.fill
a vector of colors used to fill confidence bands for successive superposed groups. Default is inceasingly dark gray scale.
- adj.subtitle
Set to
FALSEto suppress subtitling the graph with the list of settings of non-graphed adjustment values.- cex.adj
cexparameter for size of adjustment settings in subtitles. Default is 0.75 timespar("cex").- cex.axis
cexparameter for x-axis tick labels- perim
perimspecifies a function having two arguments. The first is the vector of values of the first variable that is about to be plotted on the x-axis. The second argument is the single value of the variable representing different curves, for the current curve being plotted. The function's returned value must be a logical vector whose length is the same as that of the first argument, with valuesTRUEif the corresponding point should be plotted for the current curve,FALSEotherwise. See one of the latter examples. If a predictor is not specified toplot,NULLis passed as the second argument toperim, although it makes little sense to useperimwhen the sameperimis used for multiple predictors.- digits
Controls how numeric variables used for panel labels are formatted. The default is 4 significant digits.
- nlevels
when
groupsandformulaare not specified, if any panel variable hasnlevelsor fewer values, that variable is converted to agroups(superpositioning) variable. Setnlevels=0to prevent this behavior. For other situations, a numeric x-axis variable withnlevelsor fewer unique values will cause a dot plot to be drawn instead of an x-y plot.- nlines
If
formulais given, you can setnlinestoTRUEto convert the x-axis variable to a factor and then to an integer. Points are plotted at integer values on the x-axis but labeled with category levels. Points are connected by lines.- addpanel
an additional panel function to call along with panel functions used for
xYplotandDotplotdisplays- scat1d.opts
a list containing named elements that specifies parameters to
scat1dwhendatais given. Thecolparameter is usually derived from other plotting information and not specified by the user.- type
a value (
"l","p","b") to override default choices related to showing or connecting points. Especially useful for discrete x coordinate variables.- yscale
a
latticescalelistfor they-axis to be added to what is automatically generated for thex-axis. Example:yscale=list(at=c(.005,.01,.05),labels=format(c(.005,.01,.05))). See xyplot- scaletrans
a function that operates on the
scaleobject created byplot.Predictto produce a modifiedscaleobject that is passed to the lattice graphics function. This is useful for adding otherscalesoptions or for changing thex-axis limits for one predictor.- ...
extra arguments to pass to
xYplotorDotplot. Some useful ones arelabel.curvesandabline. Setlabel.curvestoFALSEto suppress labeling of separate curves. Default isTRUE, which causeslabcurveto be invoked to place labels at positions where the curves are most separated, labeling each curve with the full curve label. Setlabel.curvesto alistto specify options tolabcurve, e.g.,label.curves=list(method="arrow", cex=.8). These option names may be abbreviated in the usual way arguments are abbreviated. Use for examplelabel.curves=list(keys=letters[1:5])to draw single lower case letters on 5 curves where they are most separated, and automatically position a legend in the most empty part of the plot. Thecol,lty, andlwdparameters are passed automatically tolabcurvealthough they may be overridden here. It is also useful to use ... to passlatticegraphics parameters, e.g.par.settings=list(axis.text=list(cex=1.2), par.ylab.text=list(col='blue',cex=.9),par.xlab.text=list(cex=1)).- object
an object having a
printmethod- y
y-coordinate for placing text in a
latticepanel or on a base graphics plot- cex
character expansion size for
pantext- adj
text justification. Default is left justified.
- fontfamily
font family for
pantext. Default is"Courier"which will line up columns of a table.- lattice
set to
FALSEto usetextinstead ofltextin the function generated bypantext, to use base graphics
Details
When a groups (superpositioning) variable was used, you can issue
the command Key(...) after printing the result of
plot.Predict, to draw a key for the groups.
References
Fox J, Hong J (2009): Effect displays in R for multinomial and proportional-odds logit models: Extensions to the effects package. J Stat Software 32 No. 1.
Note
If plotting the effects of all predictors you can reorder the
panels using for example p <- Predict(fit); p$.predictor. <-
factor(p$.predictor., v) where v is a vector of predictor
names specified in the desired order.
See also
Predict, ggplot.Predict,
link{plotp.Predict}, rbind.Predict,
datadist, predictrms, anova.rms,
contrast.rms, summary.rms,
rms, rmsMisc,
labcurve, scat1d,
xYplot, Overview
Examples
n <- 1000 # define sample size
set.seed(17) # so can reproduce the results
age <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol <- rnorm(n, 200, 25)
sex <- factor(sample(c('female','male'), n,TRUE))
label(age) <- 'Age' # label is in Hmisc
label(cholesterol) <- 'Total Cholesterol'
label(blood.pressure) <- 'Systolic Blood Pressure'
label(sex) <- 'Sex'
units(cholesterol) <- 'mg/dl' # uses units.default in Hmisc
units(blood.pressure) <- 'mmHg'
# Specify population model for log odds that Y=1
L <- .4*(sex=='male') + .045*(age-50) +
(log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male'))
# Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)
ddist <- datadist(age, blood.pressure, cholesterol, sex)
options(datadist='ddist')
fit <- lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol,4)),
x=TRUE, y=TRUE)
#> Error in Design(data, formula = formula): dataset ddist not found for options(datadist=)
an <- anova(fit)
#> Error: object 'fit' not found
# Plot effects of all 4 predictors with test statistics from anova, and P
plot(Predict(fit), anova=an, pval=TRUE)
#> Error: object 'fit' not found
plot(Predict(fit), data=llist(blood.pressure,age))
#> Error: object 'fit' not found
# rug plot for two of the predictors
p <- Predict(fit, name=c('age','cholesterol')) # Make 2 plots
#> Error: object 'fit' not found
plot(p)
#> Error: object 'p' not found
p <- Predict(fit, age=seq(20,80,length=100), sex, conf.int=FALSE)
#> Error: object 'fit' not found
# Plot relationship between age and log
# odds, separate curve for each sex,
plot(p, subset=sex=='female' | age > 30)
#> Error: object 'p' not found
# No confidence interval, suppress estimates for males <= 30
p <- Predict(fit, age, sex)
#> Error: object 'fit' not found
plot(p, label.curves=FALSE, data=llist(age,sex))
#> Error: object 'p' not found
# use label.curves=list(keys=c('a','b'))'
# to use 1-letter abbreviations
# data= allows rug plots (1-dimensional scatterplots)
# on each sex's curve, with sex-
# specific density of age
# If data were in data frame could have used that
p <- Predict(fit, age=seq(20,80,length=100), sex='male', fun=plogis)
#> Error: object 'fit' not found
# works if datadist not used
plot(p, ylab=expression(hat(P)))
#> Error: object 'p' not found
# plot predicted probability in place of log odds
per <- function(x, y) x >= 30
plot(p, perim=per) # suppress output for age < 30 but leave scale alone
#> Error: object 'p' not found
# Take charge of the plot setup by specifying a lattice formula
p <- Predict(fit, age, blood.pressure=c(120,140,160),
cholesterol=c(180,200,215), sex)
#> Error: object 'fit' not found
plot(p, ~ age | blood.pressure*cholesterol, subset=sex=='male')
#> Error: object 'p' not found
# plot(p, ~ age | cholesterol*blood.pressure, subset=sex=='female')
# plot(p, ~ blood.pressure|cholesterol*round(age,-1), subset=sex=='male')
plot(p)
#> Error: object 'p' not found
# Plot the age effect as an odds ratio
# comparing the age shown on the x-axis to age=30 years
ddist$limits$age[2] <- 30 # make 30 the reference value for age
# Could also do: ddist$limits["Adjust to","age"] <- 30
fit <- update(fit) # make new reference value take effect
#> Error: object 'fit' not found
p <- Predict(fit, age, ref.zero=TRUE, fun=exp)
#> Error: object 'fit' not found
plot(p, ylab='Age=x:Age=30 Odds Ratio',
abline=list(list(h=1, lty=2, col=2), list(v=30, lty=2, col=2)))
#> Error: object 'p' not found
# Compute predictions for three predictors, with superpositioning or
# conditioning on sex, combined into one graph
p1 <- Predict(fit, age, sex)
#> Error: object 'fit' not found
p2 <- Predict(fit, cholesterol, sex)
#> Error: object 'fit' not found
p3 <- Predict(fit, blood.pressure, sex)
#> Error: object 'fit' not found
p <- rbind(age=p1, cholesterol=p2, blood.pressure=p3)
#> Error: object 'p1' not found
plot(p, groups='sex', varypred=TRUE, adj.subtitle=FALSE)
#> Error: object 'p' not found
plot(p, cond='sex', varypred=TRUE, adj.subtitle=FALSE)
#> Error: object 'p' not found
if (FALSE) { # \dontrun{
# For males at the median blood pressure and cholesterol, plot 3 types
# of confidence intervals for the probability on one plot, for varying age
ages <- seq(20, 80, length=100)
p1 <- Predict(fit, age=ages, sex='male', fun=plogis) # standard pointwise
p2 <- Predict(fit, age=ages, sex='male', fun=plogis,
conf.type='simultaneous') # simultaneous
p3 <- Predict(fit, age=c(60,65,70), sex='male', fun=plogis,
conf.type='simultaneous') # simultaneous 3 pts
# The previous only adjusts for a multiplicity of 3 points instead of 100
f <- update(fit, x=TRUE, y=TRUE)
g <- bootcov(f, B=500, coef.reps=TRUE)
p4 <- Predict(g, age=ages, sex='male', fun=plogis) # bootstrap percentile
p <- rbind(Pointwise=p1, 'Simultaneous 100 ages'=p2,
'Simultaneous 3 ages'=p3, 'Bootstrap nonparametric'=p4)
xYplot(Cbind(yhat, lower, upper) ~ age, groups=.set.,
data=p, type='l', method='bands', label.curve=list(keys='lines'))
} # }
# Plots for a parametric survival model
require(survival)
n <- 1000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- "Age"
sex <- factor(sample(c('Male','Female'), n,
rep=TRUE, prob=c(.6, .4)))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
t <- -log(runif(n))/h
label(t) <- 'Follow-up Time'
e <- ifelse(t<=cens,1,0)
t <- pmin(t, cens)
units(t) <- "Year"
ddist <- datadist(age, sex)
Srv <- Surv(t,e)
# Fit log-normal survival model and plot median survival time vs. age
f <- psm(Srv ~ rcs(age), dist='lognormal')
#> number of knots in rcs defaulting to 5
#> Error in Design(m, formula = formula, specials = c("strata", "cluster")): dataset ddist not found for options(datadist=)
med <- Quantile(f) # Creates function to compute quantiles
#> Error: object 'f' not found
# (median by default)
p <- Predict(f, age, fun=function(x) med(lp=x))
#> Error: object 'f' not found
plot(p, ylab="Median Survival Time")
#> Error: object 'p' not found
# Note: confidence intervals from this method are approximate since
# they don't take into account estimation of scale parameter
# Fit an ols model to log(y) and plot the relationship between x1
# and the predicted mean(y) on the original scale without assuming
# normality of residuals; use the smearing estimator
# See help file for rbind.Predict for a method of showing two
# types of confidence intervals simultaneously.
set.seed(1)
x1 <- runif(300)
x2 <- runif(300)
ddist <- datadist(x1,x2)
y <- exp(x1+x2-1+rnorm(300))
f <- ols(log(y) ~ pol(x1,2)+x2)
#> Error in Design(X, formula = formula): dataset ddist not found for options(datadist=)
r <- resid(f)
#> Error: object 'f' not found
smean <- function(yhat)smearingEst(yhat, exp, res, statistic='mean')
formals(smean) <- list(yhat=numeric(0), res=r[!is.na(r)])
#> Error: object 'r' not found
#smean$res <- r[!is.na(r)] # define default res argument to function
plot(Predict(f, x1, fun=smean), ylab='Predicted Mean on y-scale')
#> Error: object 'f' not found
# Make an 'interaction plot', forcing the x-axis variable to be
# plotted at integer values but labeled with category levels
n <- 100
set.seed(1)
gender <- c(rep('male', n), rep('female',n))
m <- sample(c('a','b'), 2*n, TRUE)
d <- datadist(gender, m); options(datadist='d')
anxiety <- runif(2*n) + .2*(gender=='female') + .4*(gender=='female' & m=='b')
tapply(anxiety, llist(gender,m), mean)
#> m
#> gender a b
#> female 0.68977 1.11430
#> male 0.39019 0.48358
f <- ols(anxiety ~ gender*m)
#> Error in Design(X, formula = formula): dataset d not found for options(datadist=)
p <- Predict(f, gender, m)
#> Error: object 'f' not found
plot(p) # horizontal dot chart; usually preferred for categorical predictors
#> Error: object 'p' not found
Key(.5, .5)
plot(p, ~gender, groups='m', nlines=TRUE)
#> Error: object 'p' not found
plot(p, ~m, groups='gender', nlines=TRUE)
#> Error: object 'p' not found
plot(p, ~gender|m, nlines=TRUE)
#> Error: object 'p' not found
options(datadist=NULL)
if (FALSE) { # \dontrun{
# Example in which separate curves are shown for 4 income values
# For each curve the estimated percentage of voters voting for
# the democratic party is plotted against the percent of voters
# who graduated from college. Data are county-level percents.
incomes <- seq(22900, 32800, length=4)
# equally spaced to outer quintiles
p <- Predict(f, college, income=incomes, conf.int=FALSE)
plot(p, xlim=c(0,35), ylim=c(30,55))
# Erase end portions of each curve where there are fewer than 10 counties having
# percent of college graduates to the left of the x-coordinate being plotted,
# for the subset of counties having median family income with 1650
# of the target income for the curve
show.pts <- function(college.pts, income.pt) {
s <- abs(income - income.pt) < 1650 #assumes income known to top frame
x <- college[s]
x <- sort(x[!is.na(x)])
n <- length(x)
low <- x[10]; high <- x[n-9]
college.pts >= low & college.pts <= high
}
plot(p, xlim=c(0,35), ylim=c(30,55), perim=show.pts)
# Rename variables for better plotting of a long list of predictors
f <- ...
p <- Predict(f)
re <- c(trt='treatment', diabet='diabetes', sbp='systolic blood pressure')
for(n in names(re)) {
names(p)[names(p)==n] <- re[n]
p$.predictor.[p$.predictor.==n] <- re[n]
}
plot(p)
} # }