Compute Predicted Values and Confidence Limits
Predict.RdPredict allows the user to easily specify which predictors are to
vary. When the vector of values over which a predictor should vary is
not specified, the
range will be all levels of a categorical predictor or equally-spaced
points between the datadist "Low:prediction" and
"High:prediction" values for the variable (datadist by
default uses the 10th smallest and 10th largest predictor values in the
dataset). Predicted values are
the linear predictor (X beta), a user-specified transformation of that
scale, or estimated probability of surviving past a fixed single time
point given the linear predictor. Predict is usually used for
plotting predicted values but there is also a print method.
When the first argument to Predict is a fit object created by
bootcov with coef.reps=TRUE, confidence limits come from
the stored matrix of bootstrap repetitions of coefficients, using
bootstrap percentile nonparametric confidence limits, basic bootstrap,
or BCa limits. Such confidence
intervals do not make distributional assumptions. You can force
Predict to instead use the bootstrap covariance matrix by setting
usebootcoef=FALSE. If coef.reps was FALSE,
usebootcoef=FALSE is the default.
There are ggplot, plotp, and plot methods for
Predict objects that makes it easy to show predicted values and
confidence bands.
The rbind method for Predict objects allows you to create
separate sets of predictions under different situations and to combine
them into one set for feeding to plot.Predict,
ggplot.Predict, or plotp.Predict. For example you
might want to plot confidence intervals for means and for individuals
using ols, and have the two types of confidence bands be
superposed onto one plot or placed into two panels. Another use for
rbind is to combine predictions from quantile regression models
that predicted three different quantiles.
If conf.type="simultaneous", simultaneous (over all requested
predictions) confidence limits are computed. See the
predictrms function for details.
If fun is given, conf.int > 0, the model is not a
Bayesian model, and the bootstrap was not used, fun may return
limits attribute when fun computed its own confidence
limits. These confidence limits will be functions of the design matrix,
not just the linear predictor.
Usage
Predict(object, ..., fun=NULL, funint=TRUE,
type = c("predictions", "model.frame", "x"),
np = 200, conf.int = 0.95,
conf.type = c("mean", "individual","simultaneous"),
usebootcoef=TRUE, boot.type=c("percentile", "bca", "basic"),
posterior.summary=c('mean', 'median', 'mode'),
adj.zero = FALSE, ref.zero = FALSE,
kint=NULL, ycut=NULL, time = NULL, loglog = FALSE, digits=4, name,
factors=NULL, offset=NULL)
# S3 method for class 'Predict'
print(x, ...)
# S3 method for class 'Predict'
rbind(..., rename)Arguments
- object
an
rmsfit object, or forprintthe result ofPredict.options(datadist="d")must have been specified (wheredwas created bydatadist), or it must have been in effect when the the model was fitted.- ...
One or more variables to vary, or single-valued adjustment values. Specify a variable name without an equal sign to use the default display range, or any range you choose (e.g.
seq(0,100,by=2),c(2,3,7,14)). The default list of values for which predictions are made is taken as the list of unique values of the variable if they number fewer than 11. For variables with \(>10\) unique values,npequally spaced values in the range are used for plotting if the range is not specified. Variables not specified are set to the default adjustment valuelimits[2], i.e. the median for continuous variables and a reference category for non-continuous ones. Later variables define adjustment settings. For categorical variables, specify the class labels in quotes when specifying variable values. If the levels of a categorical variable are numeric, you may omit the quotes. For variables not described usingdatadist, you must specify explicit ranges and adjustment settings for predictors that were in the model. If no variables are specified in ..., predictions will be made by separately varying all predictors in the model over their default range, holding the other predictors at their adjustment values. This has the same effect as specifyingnameas a vector containing all the predictors. Forrbind, ... represents a series of results fromPredict. If you name the results, these names will be taken as the values of the new.set.variable added to the concatenated data frames. See an example below.- fun
an optional transformation of the linear predictor. Specify
fun='mean'if the fit is a proportional odds model fit and you ranbootcovwithcoef.reps=TRUE. This will let the mean function be re-estimated for each bootstrap rep to properly account for all sources of uncertainty in estimating the mean response.funcan be a general function and can compute confidence limits (stored as a list in thelimitsattribute) of the transformed parameters such as means.- funint
set to
FALSEiffunis not a function such as the result ofMean,Quantile, orExProbthat contains aninterceptsargument- type
defaults to providing predictions. Set to
"model.frame"to return a data frame of predictor settings used. Set to"x"to return the corresponding design matrix constructed from the predictor settings.- np
the number of equally-spaced points computed for continuous predictors that vary, i.e., when the specified value is omitted (with the variable name appearing without an equals sign) or is
NA- conf.int
confidence level (highest posterior density interval probability for Bayesian models). Default is 0.95. Specify
FALSEto suppress.- conf.type
type of confidence interval. Default is
"mean"which applies to all models. For models containing a residual variance (e.g,ols), you can specifyconf.type="individual"instead, to obtain limits on the predicted value for an individual subject. Specifyconf.type="simultaneous"to obtain simultaneous confidence bands for mean predictions with family-wise coverage ofconf.int.- usebootcoef
set to
FALSEto force the use of the bootstrap covariance matrix estimator even when bootstrap coefficient reps are present- boot.type
set to
'bca'to compute BCa confidence limits or'basic'to use the basic bootstrap. The default is to compute percentile intervals- posterior.summary
defaults to using the posterior mean of the regression coefficients. Specify
'mode'or'median'to instead use the other summaries.- adj.zero
Set to
TRUEto adjust all non-plotted variables to 0 (or reference cell for categorical variables) and to omit intercept(s) from consideration. Default isFALSE.- ref.zero
Set to
TRUEto subtract a constant from \(X\beta\) before plotting so that the reference value of thex-variable yieldsy=0. This is done before applying functionfun. This is especially useful for Cox models to make the hazard ratio be 1.0 at reference values, and the confidence interval have width zero. To set the reference value, either (a) set the reference value by editing thedatadistobject prior to fitting the model, or (b) if the model is already fit, edit thedatadistobject and then run the update command.- kint
This is only useful in a multiple intercept model such as the ordinal logistic model. There to use to second of three intercepts, for example, specify
kint=2. The default is 1 forlrmand the middle intercept corresponding to the medianyforormorblrm. You can specifyycutinstead, and the intercept corresponding to Y >= ycut will be used forkint.- ycut
for an ordinal model specifies the Y cutoff to use in evaluating departures from proportional odds, when the constrained partial proportional odds model is used. When omitted,
ycutis implied bykint. The only time it is absolutely mandatory to specifyycutis when computed an effect (e.g., odds ratio) at a level of the response variable that did not occur in the data. This would only occur when thecppofunction given toblrmis a continuous function.- time
Specify a single time
uto cause functionsurvestto be invoked to plot the probability of surviving until timeuwhen the fit is fromcphorpsm.- loglog
Specify
loglog=TRUEto plotlog[-log(survival)]instead of survival, whentimeis given.- digits
Controls how “adjust-to” values are plotted. The default is 4 significant digits.
- name
Instead of specifying the variables to vary in the
variables(...) list, you can specify one or more variables by specifying a vector of character string variable names in thenameargument. Using this mode you cannot specify a list of variable values to use; prediction is done as if you had said e.g.agewithout the equal sign. Also, interacting factors can only be set to their reference values using this notation.- factors
an alternate way of specifying ..., mainly for use by
survplotorgendata. This must be a list with one or more values for each variable listed, withNAvalues for default ranges.- offset
a list containing one value for one variable, which is mandatory if the model included an offset term. The variable name must match the innermost variable name in the offset term. The single offset is added to all predicted values.
- x
an object created by
Predict- rename
If you are concatenating predictor sets using
rbindand one or more of the variables were renamed for one or more of the sets, but these new names represent different versions of the same predictors (e.g., using or not using imputation), you can specify a named character vector to rename predictors to a central name. For example, specifyrename=c(age.imputed='age', corrected.bp='bp')to rename from old namesage.imputed, corrected.bptoage, bp. This happens before concatenation of rows.
Details
When there are no intercepts in the fitted model, plot subtracts adjustment values from each factor while computing variances for confidence limits.
Specifying time will not work for Cox models with time-dependent
covariables. Use survest or survfit for that purpose.
Value
a data frame containing all model predictors and the computed values
yhat, lower, upper, the latter two if confidence
intervals were requested. The data frame has an additional
class "Predict". If name is specified or no
predictors are specified in ..., the resulting data frame has an
additional variable called .predictor. specifying which
predictor is currently being varied. .predictor. is handy for
use as a paneling variable in lattice or ggplot2 graphics.
See also
plot.Predict, ggplot.Predict,
plotp.Predict,
datadist, predictrms,
contrast.rms, summary.rms,
rms, rms.trans, survest,
survplot, rmsMisc,
transace, rbind, bootcov,
bootBCa, boot.ci
Examples
n <- 1000 # define sample size
set.seed(17) # so can reproduce the results
age <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol <- rnorm(n, 200, 25)
sex <- factor(sample(c('female','male'), n,TRUE))
label(age) <- 'Age' # label is in Hmisc
label(cholesterol) <- 'Total Cholesterol'
label(blood.pressure) <- 'Systolic Blood Pressure'
label(sex) <- 'Sex'
units(cholesterol) <- 'mg/dl' # uses units.default in Hmisc
units(blood.pressure) <- 'mmHg'
# Specify population model for log odds that Y=1
L <- .4*(sex=='male') + .045*(age-50) +
(log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male'))
# Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)
ddist <- datadist(age, blood.pressure, cholesterol, sex)
options(datadist='ddist')
fit <- lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol,4)))
#> Error in Design(data, formula = formula): dataset ddist not found for options(datadist=)
Predict(fit, age, cholesterol, np=4)
#> Error: object 'fit' not found
Predict(fit, age=seq(20,80,by=10), sex, conf.int=FALSE)
#> Error: object 'fit' not found
Predict(fit, age=seq(20,80,by=10), sex='male') # works if datadist not used
#> Error: object 'fit' not found
# Get simultaneous confidence limits accounting for making 7 estimates
# Predict(fit, age=seq(20,80,by=10), sex='male', conf.type='simult')
# (this needs the multcomp package)
ddist$limits$age[2] <- 30 # make 30 the reference value for age
# Could also do: ddist$limits["Adjust to","age"] <- 30
fit <- update(fit) # make new reference value take effect
#> Error: object 'fit' not found
Predict(fit, age, ref.zero=TRUE, fun=exp)
#> Error: object 'fit' not found
# Make two curves, and plot the predicted curves as two trellis panels
w <- Predict(fit, age, sex)
#> Error: object 'fit' not found
require(lattice)
#> Loading required package: lattice
xyplot(yhat ~ age | sex, data=w, type='l')
#> Error: object 'w' not found
# To add confidence bands we need to use the Hmisc xYplot function in
# place of xyplot
xYplot(Cbind(yhat,lower,upper) ~ age | sex, data=w,
method='filled bands', type='l', col.fill=gray(.95))
#> Error: object 'w' not found
# If non-displayed variables were in the model, add a subtitle to show
# their settings using title(sub=paste('Adjusted to',attr(w,'info')$adjust),adj=0)
# Easier: feed w into plot.Predict, ggplot.Predict, plotp.Predict
if (FALSE) { # \dontrun{
# Predictions form a parametric survival model
require(survival)
n <- 1000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- "Age"
sex <- factor(sample(c('Male','Female'), n,
rep=TRUE, prob=c(.6, .4)))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
t <- -log(runif(n))/h
label(t) <- 'Follow-up Time'
e <- ifelse(t<=cens,1,0)
t <- pmin(t, cens)
units(t) <- "Year"
ddist <- datadist(age, sex)
Srv <- Surv(t,e)
# Fit log-normal survival model and plot median survival time vs. age
f <- psm(Srv ~ rcs(age), dist='lognormal')
med <- Quantile(f) # Creates function to compute quantiles
# (median by default)
Predict(f, age, fun=function(x)med(lp=x))
# Note: This works because med() expects the linear predictor (X*beta)
# as an argument. Would not work if use
# ref.zero=TRUE or adj.zero=TRUE.
# Also, confidence intervals from this method are approximate since
# they don't take into account estimation of scale parameter
# Fit an ols model to log(y) and plot the relationship between x1
# and the predicted mean(y) on the original scale without assuming
# normality of residuals; use the smearing estimator. Before doing
# that, show confidence intervals for mean and individual log(y),
# and for the latter, also show bootstrap percentile nonparametric
# pointwise confidence limits
set.seed(1)
x1 <- runif(300)
x2 <- runif(300)
ddist <- datadist(x1,x2); options(datadist='ddist')
y <- exp(x1+ x2 - 1 + rnorm(300))
f <- ols(log(y) ~ pol(x1,2) + x2, x=TRUE, y=TRUE) # x y for bootcov
fb <- bootcov(f, B=100)
pb <- Predict(fb, x1, x2=c(.25,.75))
p1 <- Predict(f, x1, x2=c(.25,.75))
p <- rbind(normal=p1, boot=pb)
plot(p)
p1 <- Predict(f, x1, conf.type='mean')
p2 <- Predict(f, x1, conf.type='individual')
p <- rbind(mean=p1, individual=p2)
plot(p, label.curve=FALSE) # uses superposition
plot(p, ~x1 | .set.) # 2 panels
r <- resid(f)
smean <- function(yhat)smearingEst(yhat, exp, res, statistic='mean')
formals(smean) <- list(yhat=numeric(0), res=r[!is.na(r)])
#smean$res <- r[!is.na(r)] # define default res argument to function
Predict(f, x1, fun=smean)
## Example using offset
g <- Glm(Y ~ offset(log(N)) + x1 + x2, family=poisson)
Predict(g, offset=list(N=100))
} # }
options(datadist=NULL)