Resampling Validation of a Fitted Model's Indexes of Fit
The validate function, when used on an object created by one of the
rms series, does resampling validation of a
regression model, with or without backward step-down variable
deletion.
The print method will call the latex or html method
if options(prType=) is set to "latex" or "html".
For "latex" printing through print(), the LaTeX table
environment is turned off. When using html with Quarto or RMarkdown,
results='asis' need not be written in the chunk header.
See predab.resample for information about confidence limits.
Usage
# fit <- fitting.function(formula=response ~ terms, x=TRUE, y=TRUE)
validate(fit, method="boot", B=40,
bw=FALSE, rule="aic", type="residual", sls=0.05, aics=0,
force=NULL, estimates=TRUE, pr=FALSE, ...)
# S3 method for class 'validate'
print(x, digits=4, B=Inf, ...)
# S3 method for class 'validate'
latex(object, digits=4, B=Inf, file='', append=FALSE,
title=first.word(deparse(substitute(x))),
caption=NULL, table.env=FALSE,
size='normalsize', extracolsize=size, ...)
# S3 method for class 'validate'
html(object, digits=4, B=Inf, caption=NULL, ...)
Arguments
- fit
a fit derived by e.g. lrm, cph, psm, ols. The options x=TRUE and y=TRUE must have been specified.
- method
may be "crossvalidation", "boot" (the default), ".632", or "randomization". See predab.resample for details. Can abbreviate, e.g. "cross", "b", ".6".
- B
number of repetitions. For method="crossvalidation", is the number of groups of omitted observations. For print.validate, latex.validate, and html.validate, B is an upper limit on the number of resamples for which information is printed about which variables were selected in each model re-fit. Specify zero to suppress printing. Default is to print all re-samples.
- bw
TRUE to do fast step-down using the fastbw function, for both the overall model and for each repetition. fastbw keeps parameters together that represent the same factor.
- rule
Applies if bw=TRUE. "aic" to use Akaike's information criterion as a stopping rule (i.e., a factor is deleted if the \(\chi^2\) falls below twice its degrees of freedom), or "p" to use \(P\)-values.
- type
"residual" or "individual". The stopping rule is for the residual \(\chi^2\) for all variables deleted ("residual") or for individual factors ("individual").
- sls
significance level for a factor to be kept in a model, or for judging the residual \(\chi^2\).
- aics
cutoff on AIC when rule="aic".
- force
see fastbw
- estimates
see print.fastbw
- pr
TRUE to print results of each repetition
- ...
parameters for each specific validate function, and parameters to pass to predab.resample (note especially the group, cluster, and subset parameters). For latex, optional arguments to latex.default. Ignored for html.validate.
For psm, you can pass the maxiter parameter here (passed to survreg.control, default is 15 iterations) as well as a tol parameter for judging matrix singularity in solvet (default is 1e-12) and a rel.tolerance parameter that is passed to survreg.control (default is 1e-5).
For print.validate, ... is ignored.
- x, object
an object produced by one of the validate functions
- digits
number of decimal places to print
- file
file to write LaTeX output. Default is standard output.
- append
set to TRUE to append LaTeX output to an existing file
- title, caption, table.env, extracolsize
see latex.default. If table.env is FALSE and caption is given, the character string contained in caption will be placed before the table, centered.
- size
size of LaTeX output. Default is 'normalsize'. Must be a defined LaTeX size when prepended by double slash.
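As a rough illustration of the fit, method, and B arguments, the sketch below validates a small ols fit once by 10-fold cross-validation and once by the default bootstrap. It assumes the rms package is installed; the simulated data and the object names (fit, v_cv, v_boot) are made up for this illustration.

```r
# Sketch only: assumes the rms package is installed; data are simulated.
library(rms)

set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- x1 + 0.5 * x2 + rnorm(n)

# x=TRUE and y=TRUE store the design matrix and response so that
# validate can refit the model on each resample
fit <- ols(y ~ x1 + x2, x = TRUE, y = TRUE)

# Cross-validation: here B is the number of groups of omitted observations
v_cv <- validate(fit, method = "crossvalidation", B = 10)

# Default method ("boot") with 50 bootstrap repetitions
v_boot <- validate(fit, B = 50)

# The result is a matrix, so a column can be pulled out by name
unclass(v_boot)[, "index.corrected"]
```

Because the method name can be abbreviated, method = "cross" should select the same procedure as the first call.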
Details
The validate function provides bias-corrected indexes that are specific
to each type of model. For validate.cph and validate.psm, see validate.lrm,
which is similar.
For validate.cph and validate.psm, there is
an extra argument dxy, which if TRUE causes the dxy.cens
function to be invoked to compute the Somers' \(D_{xy}\) rank correlation
at each resample. The values corresponding to the row
\(D_{xy}\) are equal to \(2 \times (C - 0.5)\), where \(C\) is the
C-index or concordance probability.
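The relationship between \(D_{xy}\) and \(C\) can be checked by hand. The sketch below uses a binary outcome for simplicity, which is an assumption of this illustration: with censored survival times, the set of usable pairs is determined differently. The helper c_index is hypothetical and is not part of rms.

```r
# Hypothetical helper (not an rms function): concordance probability C for a
# binary outcome, i.e. the proportion of (event, non-event) pairs in which
# the event has the higher prediction, counting tied predictions as 1/2.
c_index <- function(pred, y) {
  p1 <- pred[y == 1]
  p0 <- pred[y == 0]
  d  <- outer(p1, p0, "-")   # all pairwise differences, events minus non-events
  (sum(d > 0) + 0.5 * sum(d == 0)) / length(d)
}

pred <- c(0.10, 0.40, 0.35, 0.80)  # made-up predictions
y    <- c(0, 0, 1, 1)              # made-up outcomes

C   <- c_index(pred, y)   # 3 of the 4 pairs are concordant, so C = 0.75
Dxy <- 2 * (C - 0.5)      # Dxy = 0.5
```

A model with no discrimination has \(C = 0.5\) and therefore \(D_{xy} = 0\), while perfect concordance gives \(C = 1\) and \(D_{xy} = 1\).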
For validate.cph with dxy=TRUE,
you must specify an argument u if the model is stratified, since
survival curves can then cross and \(X\beta\) is not 1-1 with
predicted survival.
There is also a validate method for
tree, which only does cross-validation and which has a different
list of arguments.
Value
a matrix with rows corresponding to the statistical indexes and columns for the original index, resample estimates, indexes applied to the whole or omitted sample using the model derived from the resample, average optimism, corrected index, and number of successful re-samples.
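To make the meaning of these columns concrete, the following base R sketch computes them by hand for a single index (\(R^2\)) of an ordinary linear model using the bootstrap. It only illustrates the general optimism-correction idea under simulated data and is not the code used by validate or predab.resample. The helper r2 and all variable names are assumptions of this illustration.

```r
set.seed(2)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- d$x1 + rnorm(n)

# Hypothetical helper: R^2 of observed y against predictions yhat
r2 <- function(y, yhat) 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)

# Apparent (original) index: model fitted and evaluated on the full sample
fit_all    <- lm(y ~ x1 + x2, data = d)
index_orig <- r2(d$y, fitted(fit_all))

B <- 40
training <- test <- numeric(B)
for (b in seq_len(B)) {
  boot  <- d[sample(n, replace = TRUE), ]           # bootstrap resample
  fit_b <- lm(y ~ x1 + x2, data = boot)             # refit on the resample
  training[b] <- r2(boot$y, fitted(fit_b))          # index in the resample
  test[b]     <- r2(d$y, predict(fit_b, newdata = d))  # index in the whole sample
}

# Average optimism and the bias-corrected index
optimism        <- mean(training) - mean(test)
index_corrected <- index_orig - optimism

round(c(index.orig = index_orig, training = mean(training),
        test = mean(test), optimism = optimism,
        index.corrected = index_corrected), 4)
```

The corrected index subtracts the average amount by which a model fitted to a resample looks better on that resample than on the whole sample.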
Examples
# See examples for validate.cph, validate.lrm, validate.ols
# Example of validating a parametric survival model:
require(rms)       # psm() and validate() live in rms
require(survival)
n <- 1000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- "Age"
sex <- factor(sample(c('Male','Female'), n, TRUE))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
dt <- -log(runif(n))/h
e <- ifelse(dt <= cens,1,0)
dt <- pmin(dt, cens)
units(dt) <- "Year"
S <- Surv(dt,e)
f <- psm(S ~ age*sex, x=TRUE, y=TRUE) # Weibull model
# Validate full model fit
validate(f, B=10) # usually B=150
#> index.orig training test optimism index.corrected Lower Upper
#> Dxy 0.3854 0.3937 0.3789 0.0148 0.3706 0.2846 0.4076
#> R2 0.0918 0.1032 0.0881 0.0152 0.0766 0.0388 0.0896
#> Intercept 0.0000 0.0000 0.3467 -0.3467 0.3467 -0.3214 0.9523
#> Slope 1.0000 1.0000 0.9015 0.0985 0.9015 0.6851 1.0844
#> D 0.0449 0.0503 0.0430 0.0073 0.0376 0.0164 0.0447
#> U -0.0012 -0.0011 -0.0017 0.0005 -0.0017 -0.0110 0.0008
#> Q 0.0461 0.0515 0.0447 0.0068 0.0393 0.0243 0.0452
#> g 0.7377 0.8071 0.7236 0.0835 0.6542 0.4520 0.7634
#> n
#> Dxy 10
#> R2 10
#> Intercept 10
#> Slope 10
#> D 10
#> U 10
#> Q 10
#> g 10
# Validate stepwise model with typical (not so good) stopping rule
# bw=TRUE does not preserve hierarchy of terms at present
validate(f, B=10, bw=TRUE, rule="p", sls=.1, type="individual")
#>
#> Backwards Step-down - Original Model
#>
#> Deleted Chi-Sq d.f. P Residual d.f. P AIC
#> age * sex 0.98 1 0.3217 0.98 1 0.3217 -1.02
#>
#> Approximate Estimates after Deleting Factors
#>
#> Coef S.E. Wald Z P
#> (Intercept) 5.40012 0.37320 14.470 0.000e+00
#> age -0.04254 0.00598 -7.114 1.127e-12
#> sex=Male 0.58686 0.14850 3.952 7.750e-05
#>
#> Factors in Final Model
#>
#> [1] age sex
#> index.orig training test optimism index.corrected Lower Upper
#> Dxy 0.3858 0.3942 0.3805 0.0137 0.3721 0.3078 0.4571
#> R2 0.0907 0.0981 0.0894 0.0087 0.0820 0.0301 0.1157
#> Intercept 0.0000 0.0000 0.0759 -0.0759 0.0759 -0.7556 0.9171
#> Slope 1.0000 1.0000 0.9799 0.0201 0.9799 0.7395 1.2519
#> D 0.0443 0.0481 0.0437 0.0044 0.0399 0.0134 0.0575
#> U -0.0012 -0.0012 -0.0010 -0.0002 -0.0010 -0.0047 0.0012
#> Q 0.0455 0.0493 0.0447 0.0046 0.0409 0.0172 0.0583
#> g 0.6888 0.7217 0.6996 0.0220 0.6668 0.5233 0.8190
#> n
#> Dxy 10
#> R2 10
#> Intercept 10
#> Slope 10
#> D 10
#> U 10
#> Q 10
#> g 10
#>
#> Factors Retained in Backwards Elimination
#>
#> age sex age * sex
#> * *
#> * * *
#> * *
#> * *
#> * *
#> * *
#> * *
#> * *
#> * * *
#> * *
#>
#> Frequencies of Numbers of Factors Retained
#>
#> 2 3
#> 8 2