Predictive Ability using Resampling

predab.resample is a general-purpose function that is used by functions for specific models. It estimates the optimism in, and computes bias-corrected estimates of, a vector of indexes of predictive accuracy for a model with a specified design matrix, with or without fast backward step-down of predictors. If bw=TRUE, the design matrix x must have been created by ols, lrm, or cph, and predab.resample stores as the kept attribute a logical matrix encoding which factors were selected at each repetition.

Unless conf.int is FALSE or 0, the function computes approximate bootstrap confidence intervals for overfitting-corrected predictive performance measures using the method of Harrell (2025) described and tested at https://www.fharrell.com/post/bootcal/ and inspired by Noma et al (2021).

Usage

predab.resample(fit.orig, fit, measure, 
                method=c("boot","crossvalidation",".632","randomization"),
                bw=FALSE, B=50, conf.int=0.95, pr=FALSE, prmodsel=TRUE,
                rule="aic", type="residual", sls=.05, aics=0,
                tol=.Machine$double.eps, force=NULL, estimates=TRUE,
                non.slopes.in.x=TRUE, kint=1,
                cluster, subset, group=NULL,
                allow.varying.intercepts=FALSE, debug=FALSE, saveraw=FALSE, ...)

Arguments

fit.orig: object containing the original full-sample fit, with the x=TRUE and y=TRUE options specified to the model fitting function. This should be the FULL model, including any candidate variables that were ever excluded because of poor association with the response.
fit: a function to fit the model, either the original model fit, or a fit in a sample. fit has as arguments x, y, iter, penalty, penalty.matrix, xcol, and other arguments passed to predab.resample. If you don't want iter as an argument inside the definition of fit, add ... to the end of its argument list. iter is passed to fit to inform the function of the sampling repetition number (0=original sample). If bw=TRUE, fit should allow for the possibility of selecting no predictors, i.e., it should fit an intercept-only model if the model has intercept(s). fit must return objects coef and fail (fail=TRUE if the fit failed due to singularity or non-convergence; these cases are excluded from summary statistics). fit must add design attributes to the returned object if bw=TRUE. The penalty.matrix parameter is not used if penalty=0. xcol is a vector of the columns of X to be used in the current model fit. For ols and psm it includes a 1 for the intercept position. xcol is not defined if iter=0 unless the initial fit came from a backward step-down. xcol is used, for example, to select the correct rows and columns of penalty.matrix for the variables currently selected.
measure: a function to compute a vector of indexes of predictive accuracy for a given fit. For method=".632" or method="crossval", it will make the most sense for measure to compute only indexes that are independent of sample size. The measure function should take the following arguments or use ...: xbeta (X beta for current fit), y, evalfit, fit, iter, and fit.orig. iter is as in fit. evalfit is set to TRUE by predab.resample if the fit is being evaluated on the sample used to make the fit, FALSE otherwise; fit.orig is the fit object returned by the original fit on the whole sample. Using evalfit will sometimes save computations. For example, in bootstrapping the area under an ROC curve for a logistic regression model, lrm already computes the area if the fit is on the training sample. fit.orig is used to pass computed configuration parameters from the original fit such as quantiles of predicted probabilities that are used as cut points in other samples. The vector created by measure should have names() associated with it if the model performance measures are indexes such as the Brier score or calibration slope. The vector must not have names if it corresponds to a nonparametric calibration curve.
method: The default is "boot" for ordinary bootstrapping (Efron, 1983, Eq. 2.10). Use ".632" for Efron's .632 method (Efron, 1983, Section 6 and Eq. 6.10), "crossvalidation" for grouped cross-validation, or "randomization" for the randomization method. May be abbreviated down to any level, e.g. "b", ".", "cross", "rand".
bw: Set to TRUE to do fast backward step-down for each training sample. Default is FALSE.
B: Number of repetitions, default=50, which is far too small. For method="crossvalidation", this is also the number of groups the original sample is split into.
conf.int: confidence level for approximate confidence limits for overfitting-corrected indexes. Set to FALSE or 0 to not compute limits. For calibration,
pr: TRUE to print results for each sample. Default is FALSE. Also controls printing of number of divergent or singular samples.
prmodsel: set to FALSE to suppress printing of model selection output such as that from fastbw.
rule: Stopping rule for fastbw, "aic" or "p". Default is "aic" to use Akaike's information criterion.
type: Type of statistic to use in stopping rule for fastbw, "residual" (the default) or "individual".
sls: Significance level for stopping in fastbw if rule="p". Default is .05.
aics: Stopping criterion for rule="aic". Stops deleting factors when chi-square - 2 times d.f. falls below aics. Default is 0.
tol: Tolerance for singularity checking. Is passed to fit and fastbw.
force: see fastbw
estimates: see print.fastbw
non.slopes.in.x: set to FALSE if the design matrix x does not have columns for intercepts and these columns are needed
kint: For multiple intercept models such as the ordinal logistic model, you may specify which intercept to use as kint. This affects the linear predictor that is passed to measure.
cluster: Vector containing cluster identifiers. This can be specified only if method="boot". If it is present, the bootstrap is done using sampling with replacement from the clusters rather than from the original records. If this vector is not the same length as the number of rows in the data matrix used in the fit, an attempt will be made to use naresid on fit.orig to conform cluster to the data. See bootcov for more about this.
subset: specify a vector of positive or negative integers or a logical vector when you want to have the measure function compute measures of accuracy on a subset of the data. The whole dataset is still used for all model development. For example, you may want to validate or calibrate a model by assessing the predictions on females when the fit was based on males and females. When you use cr.setup to build extra observations for fitting the continuation ratio ordinal logistic model, you can use subset to specify which cohort or observations to use for deriving indexes of predictive accuracy. For example, specify subset=cohort=="all" to validate the model for the first layer of the continuation ratio model (Prob(Y=0)).
group: a grouping variable used to stratify the sample upon bootstrapping. This allows one to handle k-sample problems, i.e., each bootstrap sample will be forced to select the same number of observations from each level of group as the number appearing in the original dataset.
allow.varying.intercepts: set to TRUE to not throw an error if the number of intercepts varies from fit to fit
debug: set to TRUE to print subscripts of all training and test samples
saveraw: set to TRUE to store a list named .predab_raw. in the global environment. The list has the elements orig (original estimates of performance indexes), btrain (a matrix with up to B bootstrap repetitions of indexes computed on training samples), and btest (a similar matrix but computing indexes on the test samples). For bootstrapping, training samples are bootstrap samples and test samples are the original data.
...: The user may add other arguments here that are passed to fit and measure.
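
The cluster and group arguments change how bootstrap samples are drawn. The following base R sketch illustrates the two sampling schemes as they are described above; it is not the package's internal code, and the data layout and all variable names are hypothetical.

```r
set.seed(1)

# Hypothetical data layout: 12 records, 4 clusters, 2 strata.
cluster <- rep(c("a", "b", "c", "d"), times = c(2, 4, 3, 3))
group   <- rep(c("g1", "g2"), times = c(5, 7))
n       <- length(cluster)

# Ordinary bootstrap: sample record indexes with replacement.
ordinary <- sample(n, n, replace = TRUE)

# Cluster bootstrap: sample cluster identifiers with replacement, then
# take every record belonging to each sampled cluster.
ids       <- unique(cluster)
drawn     <- sample(ids, length(ids), replace = TRUE)
clustered <- unlist(lapply(drawn, function(id) which(cluster == id)))

# Stratified bootstrap: resample within each level of group so that the
# per-level counts match those in the original data.
stratified <- unname(unlist(lapply(split(seq_len(n), group), function(idx) {
  idx[sample.int(length(idx), length(idx), replace = TRUE)]
})))

table(group[stratified])  # same counts as table(group)
```

With cluster sampling the size of a bootstrap sample can differ from n, because clusters of unequal size are drawn whole.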

Value

a matrix of class "validate" with rows corresponding to indexes computed by measure, and the following columns:

index.orig: indexes in original overall fit
training: average indexes in training samples
test: average indexes in test samples
optimism: average of training minus test, except for method=".632", where it is .632 times (index.orig - test)
index.corrected: index.orig-optimism
n: number of successful repetitions with the given index non-missing

The matrix also contains an attribute keepinfo if measure returned such an attribute when run on the original fit.
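
The relationship among these columns for method="boot" can be sketched in base R. This is a simplified illustration using lm and R-squared on simulated data, not a call to predab.resample; the data, the rsq helper, and B=200 are assumptions made for the example.

```r
set.seed(2)

# Simulated data (hypothetical): 3 predictors, only the first matters.
n <- 60
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- d$x1 + rnorm(n)

# Performance index: R^2 of predictions against observed y.
rsq <- function(pred, y) 1 - sum((y - pred)^2) / sum((y - mean(y))^2)

full       <- lm(y ~ x1 + x2 + x3, data = d)
index.orig <- rsq(fitted(full), d$y)

B <- 200
training <- test <- numeric(B)
for (i in seq_len(B)) {
  boot <- d[sample(n, n, replace = TRUE), ]
  f    <- lm(y ~ x1 + x2 + x3, data = boot)
  training[i] <- rsq(fitted(f), boot$y)             # evaluated on the bootstrap sample
  test[i]     <- rsq(predict(f, newdata = d), d$y)  # evaluated on the original data
}

optimism        <- mean(training) - mean(test)
index.corrected <- index.orig - optimism

round(c(index.orig = index.orig, training = mean(training),
        test = mean(test), optimism = optimism,
        index.corrected = index.corrected), 3)
```

Each bootstrap fit is evaluated twice: once on its own sample (training) and once on the original data (test). The average gap between the two is the optimism that is subtracted from index.orig.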

Details

For method=".632", the program stops with an error unless every observation is omitted from at least one bootstrap sample. Efron's ".632" method was developed for measures that are formulated in terms of per-observation contributions. In general, error measures (e.g., ROC areas) cannot be written in this way, so this function uses a heuristic extension to Efron's formulation in which it is assumed that the average error measure omitting the ith observation is the same as the average error measure omitting any other observation. Then weights are derived for each bootstrap repetition and weighted averages over the B repetitions can easily be computed.
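
The omission condition and the .632 optimism formula from the Value section can be illustrated with a short base R sketch. It does not reproduce the per-repetition weights used by the function, which are not spelled out here; the sample size, B, and the two index values are made-up numbers.

```r
set.seed(3)

n <- 30
B <- 100

# A list with one element per repetition: the indexes drawn into that
# bootstrap sample.
draws <- replicate(B, sample(n, n, replace = TRUE), simplify = FALSE)

# omitted[i, b] is TRUE when observation i is left out of repetition b.
omitted <- sapply(draws, function(s) !(seq_len(n) %in% s))

# The condition described above: every observation must be omitted from
# at least one bootstrap sample.
never.omitted <- which(rowSums(omitted) == 0)
if (length(never.omitted) > 0)
  stop("observations never omitted: ", paste(never.omitted, collapse = ", "))

# Optimism and corrected index as defined in the Value section,
# using made-up numbers for index.orig and test.
index.orig      <- 0.80
test            <- 0.70
optimism        <- 0.632 * (index.orig - test)
index.corrected <- index.orig - optimism  # equals 0.368*index.orig + 0.632*test
```

With a reasonable number of repetitions the check rarely fails, since each observation is left out of roughly 37% of bootstrap samples.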

Author

Frank Harrell
Department of Biostatistics, Vanderbilt University
fh@fharrell.com

References

Efron B, Tibshirani R (1997). Improvements on cross-validation: The .632+ bootstrap method. JASA 92:548–560.

Noma H, et al (2021). Confidence intervals of prediction accuracy measures for multivariable prediction models based on the bootstrap-based optimism correction methods. Statistics in Medicine 40:5691–5701.

Harrell FE (2025). Bootstrap confidence limits for bootstrap overfitting-corrected model performance, https://www.fharrell.com/post/bootcal/

Examples

# See the code for validate.ols for an example of the use of
# predab.resample
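
As a complement, the shapes of the fit and measure functions described under Arguments can be sketched in base R. The functions fit.ls and measure.ls are hypothetical and assume bw=FALSE and a design matrix without an intercept column. They are called directly here and have not been checked against predab.resample itself, which may require more of the returned objects.

```r
# Hypothetical fit function for least squares. x is assumed here to hold
# predictor columns only, so an intercept column is added explicitly.
fit.ls <- function(x, y, iter = 0, penalty = 0, penalty.matrix = NULL,
                   xcol = NULL, ...) {
  X <- cbind(Intercept = 1, x)
  f <- lm.fit(X, y)
  # Flag rank-deficient (singular) fits so they can be excluded.
  fail <- f$rank < ncol(X)
  list(coef = f$coefficients, fail = fail)
}

# Hypothetical measure function returning a named vector of indexes.
measure.ls <- function(xbeta, y, evalfit = FALSE, fit = NULL, iter = 0,
                       fit.orig = NULL, ...) {
  slope <- unname(coef(lm(y ~ xbeta))[2])  # calibration slope
  c(Rsq = cor(xbeta, y)^2, Slope = slope, MSE = mean((y - xbeta)^2))
}

set.seed(4)
x <- cbind(x1 = rnorm(50), x2 = rnorm(50))
y <- drop(x %*% c(1, -0.5)) + rnorm(50)

f     <- fit.ls(x, y)
xbeta <- drop(cbind(1, x) %*% f$coef)
m     <- measure.ls(xbeta, y, evalfit = TRUE)
m
```

The returned vector is named because its elements are scalar indexes; on the sample used for fitting, the calibration slope from least squares is 1 up to rounding error.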