Predicted Values from Model Fit
predictrms.RdThe predict function is used to obtain a variety of values or
predicted values from either the data used to fit the model (if
type="adjto" or "adjto.data.frame" or if x=TRUE or
linear.predictors=TRUE were specified to the modeling function), or from
a new dataset. Parameters such as knots and factor levels used in creating
the design matrix in the original fit are "remembered".
See the Function function for another method for computing the
linear predictors. predictrms is an internal utility function
that is for the other functions.
Usage
predictrms(fit, newdata=NULL,
type=c("lp", "x", "data.frame", "terms", "cterms", "ccterms",
"adjto", "adjto.data.frame", "model.frame"),
se.fit=FALSE, conf.int=FALSE,
conf.type=c('mean', 'individual', 'simultaneous'),
kint=NULL, na.action=na.keep, expand.na=TRUE,
center.terms=type=="terms", ref.zero=FALSE,
posterior.summary=c('mean', 'median', 'mode'),
second=FALSE, ...)
# S3 method for class 'bj'
predict(object, newdata,
type=c("lp", "x", "data.frame", "terms", "cterms", "ccterms",
"adjto", "adjto.data.frame", "model.frame"),
se.fit=FALSE, conf.int=FALSE,
conf.type=c('mean','individual','simultaneous'),
kint=1,
na.action=na.keep, expand.na=TRUE,
center.terms=type=="terms", ...) # for bj
# S3 method for class 'cph'
predict(object, newdata=NULL,
type=c("lp", "x", "data.frame", "terms", "cterms", "ccterms",
"adjto", "adjto.data.frame", "model.frame"),
se.fit=FALSE, conf.int=FALSE,
conf.type=c('mean','individual','simultaneous'),
kint=1, na.action=na.keep, expand.na=TRUE,
center.terms=type=="terms", ...) # cph
# S3 method for class 'Glm'
predict(object, newdata,
type= c("lp", "x", "data.frame", "terms", "cterms", "ccterms",
"adjto", "adjto.data.frame", "model.frame"),
se.fit=FALSE, conf.int=FALSE,
conf.type=c('mean','individual','simultaneous'),
kint=1, na.action=na.keep, expand.na=TRUE,
center.terms=type=="terms", ...) # Glm
# S3 method for class 'Gls'
predict(object, newdata,
type=c("lp", "x", "data.frame", "terms", "cterms", "ccterms",
"adjto", "adjto.data.frame", "model.frame"),
se.fit=FALSE, conf.int=FALSE,
conf.type=c('mean','individual','simultaneous'),
kint=1, na.action=na.keep, expand.na=TRUE,
center.terms=type=="terms", ...) # Gls
# S3 method for class 'ols'
predict(object, newdata,
type=c("lp", "x", "data.frame", "terms", "cterms", "ccterms",
"adjto", "adjto.data.frame", "model.frame"),
se.fit=FALSE, conf.int=FALSE,
conf.type=c('mean','individual','simultaneous'),
kint=1, na.action=na.keep, expand.na=TRUE,
center.terms=type=="terms", ...) # ols
# S3 method for class 'psm'
predict(object, newdata,
type=c("lp", "x", "data.frame", "terms", "cterms", "ccterms",
"adjto", "adjto.data.frame", "model.frame"),
se.fit=FALSE, conf.int=FALSE,
conf.type=c('mean','individual','simultaneous'),
kint=1, na.action=na.keep, expand.na=TRUE,
center.terms=type=="terms", ...) # psmArguments
- object,fit
a fit object with an
rmsfitting function- newdata
An S data frame, list or a matrix specifying new data for which predictions are desired. If
newdatais a list, it is converted to a matrix first. A matrix is converted to a data frame. For the matrix form, categorical variables (catgorstrat) must be coded as integer category numbers corresponding to the order in which value labels were stored. For list or matrix forms,matrxfactors must be given a single value. If this single value is the S missing valueNA, the adjustment values of matrx (the column medians) will later replace this value. If the single value is notNA, it is propagated throughout the columns of thematrxfactor. Forfactorvariables having numeric levels, you can specify the numeric values innewdatawithout first converting the variables to factors. These numeric values are checked to make sure they match a level, then the variable is converted internally to afactor. It is most typical to use a data frame for newdata, and the S functionexpand.gridis very handy here. For example, one may specifynewdata=expand.grid(age=c(10,20,30),race=c("black","white","other"),chol=seq(100,300,by=25)).- type
Type of output desired. The default is
"lp"to get the linear predictors - predicted \(X\beta\). For Cox models, these predictions are centered. You may specify"x"to get an expanded design matrix at the desired combinations of values,"data.frame"to get an S data frame of the combinations,"model.frame"to get a data frame of the transformed predictors,"terms"to get a matrix with each column being the linear combination of variables making up a factor (with separate terms for interactions),"cterms"("combined terms") to not create separate terms for interactions but to add all interaction terms involving each predictor to the main terms for each predictor,"ccterms"to combine all related terms (related through interactions) and their interactions into a single column,"adjto"to return a vector oflimits[2](seedatadist) in coded form, and"adjto.data.frame"to return a data frame version of these central adjustment values. Use oftype="cterms"does not make sense for astratvariable that does not interact with another variable. Ifnewdatais not given,predictwill attempt to return information stored with the fit object if the appropriate options were used with the modeling function (e.g.,x, y, linear.predictors, se.fit).- se.fit
Defaults to
FALSE. Iftype="linear.predictors", setse.fit=TRUEto return a list with componentslinear.predictorsandse.fitinstead of just a vector of fitted values. For Cox model fits, standard errors of linear predictors are computed after subtracting the original column means from the new design matrix.- conf.int
Specify
conf.intas a positive fraction to obtain upper and lower confidence intervals (e.g.,conf.int=0.95). The \(t\)-distribution is used in the calculation forolsfits. Otherwise, the normal critical value is used. For Bayesian modelsconf.intis the highest posterior density interval probability.- conf.type
specifies the type of confidence interval. Default is for the mean. For
olsfits there is the option of obtaining confidence limits for individual predicted values by specifyingconf.type="individual".- posterior.summary
when making predictions from a Bayesian model, specifies whether you want the linear predictor to be computed from the posterior mean of parameters (default) or the posterior mode or median median
- second
set to
TRUEto use the model's second formula. At present this pertains only to a partial proportional odds model fitted using theblrmfunction. Whensecond=TRUEandtype='x'the Z design matrix is returned (that goes with the tau parameters in the partial PO model). Whentype='lp'is specified Z*tau is computed. In neither case is the result is multiplied by the by thecppofunction.- kint
a single integer specifying the number of the intercept to use in multiple-intercept models. The default is 1 for
lrmand the reference median intercept forormandblrm. For a partial PO model,kintshould correspond to the response variable value that will be used when dealing withsecond=TRUE.- na.action
Function to handle missing values in
newdata. For predictions "in data", the samena.actionthat was used during model fitting is used to define annaresidfunction to possibly restore rows of the data matrix that were deleted due to NAs. For predictions "out of data", the defaultna.actionisna.keep, resulting in NA predictions when a row ofnewdatahas an NA. Whateverna.actionis in effect at the time for "out of data" predictions, the correspondingnaresidis used also.- expand.na
set to
FALSEto keep thenaresidfrom having any effect, i.e., to keep from adding back observations removed because of NAs in the returned object. Ifexpand.na=FALSE, thena.actionattribute will be added to the returned object.- center.terms
set to
FALSEto suppress subtracting adjust-to values from columns of the design matrix before computing terms withtype="terms".- ref.zero
Set to
TRUEto subtract a constant from \(X\beta\) before plotting so that the reference value of thex-variable yieldsy=0. This is done before applying functionfun. This is especially useful for Cox models to make the hazard ratio be 1.0 at reference values, and the confidence interval have width zero.- ...
ignored
Details
datadist and options(datadist=) should be run before predictrms
if using type="adjto", type="adjto.data.frame", or type="terms",
or if the fit is a Cox model fit and you are requesting se.fit=TRUE.
For these cases, the adjustment values are needed (either for the
returned result or for the correct covariance matrix computation).
Examples
n <- 1000 # define sample size
set.seed(17) # so can reproduce the results
age <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol <- rnorm(n, 200, 25)
sex <- factor(sample(c('female','male'), n,TRUE))
treat <- factor(sample(c('a','b','c'), n,TRUE))
# Specify population model for log odds that Y=1
L <- .4*(sex=='male') + .045*(age-50) +
(log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male')) +
.3*sqrt(blood.pressure-60)-2.3 + 1*(treat=='b')
# Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)
ddist <- datadist(age, blood.pressure, cholesterol, sex, treat)
options(datadist='ddist')
fit <- lrm(y ~ rcs(blood.pressure,4) +
sex * (age + rcs(cholesterol,4)) + sex*treat*age)
#> Error in Design(data, formula = formula): dataset ddist not found for options(datadist=)
# Use xYplot to display predictions in 9 panels, with error bars,
# with superposition of two treatments
dat <- expand.grid(treat=levels(treat),sex=levels(sex),
age=c(20,40,60),blood.pressure=120,
cholesterol=seq(100,300,length=10))
# Add variables linear.predictors and se.fit to dat
dat <- cbind(dat, predict(fit, dat, se.fit=TRUE))
#> Error: object 'fit' not found
# This is much easier with Predict
# xYplot in Hmisc extends xyplot to allow error bars
xYplot(Cbind(linear.predictors,linear.predictors-1.96*se.fit,
linear.predictors+1.96*se.fit) ~ cholesterol | sex*age,
groups=treat, data=dat, type='b')
#> Error in eval(parse(text = yvname), data): object 'linear.predictors' not found
# Since blood.pressure doesn't interact with anything, we can quickly and
# interactively try various transformations of blood.pressure, taking
# the fitted spline function as the gold standard. We are seeking a
# linearizing transformation even though this may lead to falsely
# narrow confidence intervals if we use this data-dredging-based transformation
bp <- 70:160
logit <- predict(fit, expand.grid(treat="a", sex='male', age=median(age),
cholesterol=median(cholesterol),
blood.pressure=bp), type="terms")[,"blood.pressure"]
#> Error: object 'fit' not found
#Note: if age interacted with anything, this would be the age
# "main effect" ignoring interaction terms
#Could also use Predict(f, age=ag)$yhat
#which allows evaluation of the shape for any level of interacting
#factors. When age does not interact with anything, the result from
#predict(f, \dots, type="terms") would equal the result from
#plot if all other terms were ignored
plot(bp^.5, logit) # try square root vs. spline transform.
#> Error: object 'logit' not found
plot(bp^1.5, logit) # try 1.5 power
#> Error: object 'logit' not found
plot(sqrt(bp-60), logit)
#> Error: object 'logit' not found
#Some approaches to making a plot showing how predicted values
#vary with a continuous predictor on the x-axis, with two other
#predictors varying
combos <- gendata(fit, age=seq(10,100,by=10), cholesterol=c(170,200,230),
blood.pressure=c(80,120,160))
#> Error: object 'fit' not found
#treat, sex not specified -> set to mode
#can also used expand.grid
require(lattice)
combos$pred <- predict(fit, combos)
#> Error: object 'fit' not found
xyplot(pred ~ age | cholesterol*blood.pressure, data=combos, type='l')
#> Error: object 'combos' not found
xYplot(pred ~ age | cholesterol, groups=blood.pressure, data=combos, type='l')
#> Error: object 'combos' not found
Key() # Key created by xYplot
xYplot(pred ~ age, groups=interaction(cholesterol,blood.pressure),
data=combos, type='l', lty=1:9)
#> Error: object 'combos' not found
Key()
# Add upper and lower 0.95 confidence limits for individuals
combos <- cbind(combos, predict(fit, combos, conf.int=.95))
#> Error: object 'combos' not found
xYplot(Cbind(linear.predictors, lower, upper) ~ age | cholesterol,
groups=blood.pressure, data=combos, type='b')
#> Error: object 'combos' not found
Key()
# Plot effects of treatments (all pairwise comparisons) vs.
# levels of interacting factors (age, sex)
d <- gendata(fit, treat=levels(treat), sex=levels(sex), age=seq(30,80,by=10))
#> Error: object 'fit' not found
x <- predict(fit, d, type="x")
#> Error: object 'fit' not found
betas <- fit$coef
#> Error: object 'fit' not found
cov <- vcov(fit, intercepts='none')
#> Error: object 'fit' not found
i <- d$treat=="a"; xa <- x[i,]; Sex <- d$sex[i]; Age <- d$age[i]
#> Error: object 'd' not found
i <- d$treat=="b"; xb <- x[i,]
#> Error: object 'd' not found
i <- d$treat=="c"; xc <- x[i,]
#> Error: object 'd' not found
doit <- function(xd, lab) {
xb <- matxv(xd, betas)
se <- apply((xd %*% cov) * xd, 1, sum)^.5
q <- qnorm(1-.01/2) # 0.99 confidence limits
lower <- xb - q * se; upper <- xb + q * se
#Get odds ratios instead of linear effects
xb <- exp(xb); lower <- exp(lower); upper <- exp(upper)
#First elements of these agree with
#summary(fit, age=30, sex='female',conf.int=.99))
for(sx in levels(Sex)) {
j <- Sex==sx
errbar(Age[j], xb[j], upper[j], lower[j], xlab="Age",
ylab=paste(lab, "Odds Ratio"), ylim=c(.1, 20), log='y')
title(paste("Sex:", sx))
abline(h=1, lty=2)
}
}
par(mfrow=c(3,2), oma=c(3,0,3,0))
doit(xb - xa, "b:a")
#> Error in doit(xb - xa, "b:a"): object 'betas' not found
doit(xc - xa, "c:a")
#> Error in doit(xc - xa, "c:a"): object 'betas' not found
doit(xb - xa, "c:b")
#> Error in doit(xb - xa, "c:b"): object 'betas' not found
# NOTE: This is much easier to do using contrast.rms
# Demonstrate type="terms", "cterms", "ccterms"
set.seed(1)
n <- 40
x <- 1:n
w <- factor(sample(c('a', 'b'), n, TRUE))
u <- factor(sample(c('A', 'B'), n, TRUE))
y <- .01*x + .2*(w=='b') + .3*(u=='B') + .2*(w=='b' & u=='B') + rnorm(n)/5
ddist <- datadist(x, w, u)
f <- ols(y ~ x*w*u, x=TRUE, y=TRUE)
#> Error in Design(X, formula = formula): dataset ddist not found for options(datadist=)
f
#> Error: object 'f' not found
anova(f)
#> Error: object 'f' not found
z <- predict(f, type='terms', center.terms=FALSE)
#> Error: object 'f' not found
z[1:5,]
#> Error: object 'z' not found
k <- coef(f)
#> Error: object 'f' not found
## Manually compute combined terms
wb <- w=='b'
uB <- u=='B'
h <- k['x * w=b * u=B']*x*wb*uB
#> Error: object 'k' not found
tx <- k['x'] *x + k['x * w=b']*x*wb + k['x * u=B'] *x*uB + h
#> Error: object 'k' not found
tw <- k['w=b']*wb + k['x * w=b']*x*wb + k['w=b * u=B']*wb*uB + h
#> Error: object 'k' not found
tu <- k['u=B']*uB + k['x * u=B']*x*uB + k['w=b * u=B']*wb*uB + h
#> Error: object 'k' not found
h <- z[,'x * w * u'] # highest order term is present in all cterms
#> Error: object 'z' not found
tx2 <- z[,'x']+z[,'x * w']+z[,'x * u']+h
#> Error: object 'z' not found
tw2 <- z[,'w']+z[,'x * w']+z[,'w * u']+h
#> Error: object 'z' not found
tu2 <- z[,'u']+z[,'x * u']+z[,'w * u']+h
#> Error: object 'z' not found
ae <- function(a, b) all.equal(a, b, check.attributes=FALSE)
ae(tx, tx2)
#> Error: object 'tx' not found
ae(tw, tw2)
#> Error: object 'tw' not found
ae(tu, tu2)
#> Error: object 'tu' not found
zc <- predict(f, type='cterms')
#> Error: object 'f' not found
zc[1:5,]
#> Error: object 'zc' not found
ae(tx, zc[,'x'])
#> Error: object 'tx' not found
ae(tw, zc[,'w'])
#> Error: object 'tw' not found
ae(tu, zc[,'u'])
#> Error: object 'tu' not found
zc <- predict(f, type='ccterms')
#> Error: object 'f' not found
# As all factors are indirectly related, ccterms gives overall linear
# predictor except for the intercept
zc[1:5,]
#> Error: object 'zc' not found
ae(as.vector(zc + coef(f)[1]), f$linear.predictors)
#> Error: object 'zc' not found
if (FALSE) { # \dontrun{
#A variable state.code has levels "1", "5","13"
#Get predictions with or without converting variable in newdata to factor
predict(fit, data.frame(state.code=c(5,13)))
predict(fit, data.frame(state.code=factor(c(5,13))))
#Use gendata function (gendata.rms) for interactive specification of
#predictor variable settings (for 10 observations)
df <- gendata(fit, nobs=10, viewvals=TRUE)
df$predicted <- predict(fit, df) # add variable to data frame
df
df <- gendata(fit, age=c(10,20,30)) # leave other variables at ref. vals.
predict(fit, df, type="fitted")
# See reShape (in Hmisc) for an example where predictions corresponding to
# values of one of the varying predictors are reformatted into multiple
# columns of a matrix
} # }
options(datadist=NULL)