Validation of a Fitted Cox or Parametric Survival Model's Indexes of Fit
This is the version of the validate function specific to models
fitted with cph or psm. Also included is a small
function dxy.cens that retrieves \(D_{xy}\) and its
standard error from the survival package's
concordancefit function. This allows for incredibly fast
computation of \(D_{xy}\) or the c-index even for hundreds of
thousands of observations. dxy.cens negates \(D_{xy}\)
if log relative hazard is being predicted. If y is a
left-censored Surv object, times are negated and a
right-censored object is created, then \(D_{xy}\) is negated.
See predab.resample for information about confidence limits.
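The sign convention described above can be checked with the survival package alone. The following is a minimal sketch, not the dxy.cens source: the simulated data and all variable names are illustrative assumptions, and the concordance probability is taken from survival's concordance function and converted with \(D_{xy} = 2(C - 0.5)\).

```r
library(survival)

set.seed(1)
n  <- 500
x  <- rnorm(n)                        # illustrative covariate
lp <- 1.5 * x                         # assumed log relative hazard
ft <- rexp(n, rate = 0.1 * exp(lp))   # event times: higher lp gives shorter times
ct <- runif(n, 0, 20)                 # censoring times
S  <- Surv(pmin(ft, ct), as.integer(ft <= ct))

# Concordance of the raw predictor with survival time
C       <- concordance(S ~ lp)$concordance
dxy_raw <- 2 * (C - 0.5)              # negative: high hazard goes with short survival

# Negating gives an index where larger means better discrimination
dxy_hazard <- -dxy_raw
```

With rms loaded, a call such as dxy.cens(lp, S, type="hazard") is documented here as applying this negation itself.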
Usage
# fit <- cph(formula=Surv(ftime,event) ~ terms, x=TRUE, y=TRUE, \dots)
# S3 method for class 'cph'
validate(fit, method="boot", B=40, bw=FALSE, rule="aic",
type="residual", sls=.05, aics=0, force=NULL, estimates=TRUE,
pr=FALSE, dxy=TRUE, u, tol=1e-9, ...)
# S3 method for class 'psm'
validate(fit, method="boot",B=40,
bw=FALSE, rule="aic", type="residual", sls=.05, aics=0,
force=NULL, estimates=TRUE, pr=FALSE,
dxy=TRUE, tol=1e-12, rel.tolerance=1e-5, maxiter=15, ...)
dxy.cens(x, y, type=c('time','hazard'))
Arguments
- fit
a fit derived from cph. The options x=TRUE and y=TRUE must have been specified. If the model contains any stratification factors and dxy=TRUE, the options surv=TRUE and time.inc=u must also have been given, where u is the same value of u given to validate.
- method
see validate
- B
number of repetitions. For method="crossvalidation", this is the number of groups of omitted observations.
- rel.tolerance,maxiter,bw
TRUE to do fast step-down using the fastbw function, for both the overall model and for each repetition. fastbw keeps parameters together that represent the same factor.
- rule
Applies if bw=TRUE. "aic" to use Akaike's information criterion as a stopping rule (i.e., a factor is deleted if the \(\chi^2\) falls below twice its degrees of freedom), or "p" to use \(P\)-values.
- type
"residual" or "individual": the stopping rule is for the residual \(\chi^2\) for all variables deleted, or for individual factors, respectively. For dxy.cens, specify type="hazard" if x is on the hazard or cumulative hazard (or their logs) scale, causing negation of the correlation index.
- sls
significance level for a factor to be kept in a model, or for judging the residual \(\chi^2\).
- aics
cutoff on AIC when rule="aic".
- force
see fastbw
- estimates
see print.fastbw
- pr
TRUE to print results of each repetition
- tol,...
see validate or predab.resample
- dxy
set to TRUE to validate Somers' \(D_{xy}\) using dxy.cens, which is fast until n > 500,000. Uses the survival package's concordancefit service function for concordance.
- u
must be specified if the model has any stratification factors and dxy=TRUE. In that case, strata are not included in \(X\beta\) and the survival curves may cross. Predictions at time t=u are correlated with observed survival times. Does not apply to validate.psm.
- x
a numeric vector
- y
a Surv object that may be uncensored or right-censored
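The "aic" stopping rule described under rule and aics can be written out as a small calculation. This is a sketch only, assuming that AIC here means \(\chi^2\) minus twice the degrees of freedom, as the description of rule states; aic_would_delete is a hypothetical helper and is not part of rms or fastbw.

```r
# Hypothetical helper (not an rms function): TRUE if a factor would be
# deleted under rule="aic", i.e. its chi-square falls below 2 * d.f.
# (shifted by the aics cutoff).
aic_would_delete <- function(chisq, df, aics = 0) {
  aic <- chisq - 2 * df
  aic < aics
}

weak   <- aic_would_delete(chisq = 3.1, df = 2)  # 3.1 - 4 = -0.9, deleted
strong <- aic_would_delete(chisq = 9.4, df = 2)  # 9.4 - 4 =  5.4, kept
```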
Details
Statistics validated include the Nagelkerke \(R^2\),
\(D_{xy}\), slope shrinkage, the discrimination index \(D\)
[(model L.R. \(\chi^2\) - 1)/L], the unreliability index
\(U\) = (difference in -2 log likelihood between uncalibrated
\(X\beta\) and
\(X\beta\) with overall slope calibrated to test sample) / L,
and the overall quality index \(Q = D - U\). \(g\) is the
\(g\)-index on the log relative hazard (linear predictor) scale.
L is -2 log likelihood with beta=0. The "corrected" slope
can be thought of as a shrinkage factor that takes into account overfitting.
See predab.resample for the list of resampling methods.
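The identity \(Q = D - U\) can be checked against printed output. The sketch below copies the index.orig and index.corrected values of \(D\), \(U\), and \(Q\) from the first validate printout in the Examples section; values are rounded to four decimals, so agreement is only expected up to rounding error.

```r
# Values copied from the first validate() printout in Examples
D <- c(orig = 0.0312,  corrected = 0.0268)
U <- c(orig = -0.0008, corrected = 0.0005)
Q <- c(orig = 0.0320,  corrected = 0.0263)

# Largest discrepancy between D - U and the printed Q
max_gap <- max(abs((D - U) - Q))
max_gap < 2e-4   # tolerance allows for 4-decimal rounding
```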
Value
matrix with rows corresponding to \(D_{xy}\), \(R^2\), Slope, \(D\),
\(U\), \(Q\), and \(g\), and columns for the original index, resample estimates,
indexes applied to whole or omitted sample using model derived from
resample, average optimism, corrected index, and number of successful
resamples.
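How the columns relate can be illustrated with arithmetic on printed output. This sketch assumes the usual bootstrap convention (optimism is the training value minus the test value, and the corrected index is the original index minus the optimism); the numbers are copied from the first validate printout in the Examples section and agree only up to four-decimal rounding.

```r
# Selected rows copied from the first validate() printout in Examples
tab <- matrix(
  c(0.3852, 0.3937, 0.3786, 0.0150, 0.3701,
    0.0811, 0.0903, 0.0778, 0.0125, 0.0686,
    1.0000, 1.0000, 0.9130, 0.0870, 0.9130,
    0.7388, 0.7924, 0.7228, 0.0696, 0.6692),
  ncol = 5, byrow = TRUE,
  dimnames = list(c("Dxy", "R2", "Slope", "g"),
                  c("orig", "training", "test", "optimism", "corrected"))
)

# Assumed relationships, recomputed from the other columns
optimism  <- tab[, "training"] - tab[, "test"]
corrected <- tab[, "orig"] - tab[, "optimism"]

# Largest discrepancies against the printed columns
gap_optimism  <- max(abs(optimism  - tab[, "optimism"]))
gap_corrected <- max(abs(corrected - tab[, "corrected"]))
```

Both gaps should be on the order of the rounding in the printout (about 1e-4 or less).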
The values corresponding to the row \(D_{xy}\) are equal to \(2 \times (C - 0.5)\) where \(C\) is the C-index or concordance probability. If the user is correlating the linear predictor (predicted log hazard) with survival time, \(D_{xy}\) is automatically negated.
Examples
require(rms)       # provides cph, validate, rcs, strat; attaches Hmisc for label() and units()
require(survival)
n <- 1000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- "Age"
sex <- factor(sample(c('Male','Female'), n, TRUE))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
dt <- -log(runif(n))/h
e <- ifelse(dt <= cens,1,0)
dt <- pmin(dt, cens)
units(dt) <- "Year"
S <- Surv(dt,e)
f <- cph(S ~ age*sex, x=TRUE, y=TRUE)
# Validate full model fit
validate(f, B=10) # normally B=150
#> index.orig training test optimism index.corrected Lower Upper n
#> Dxy 0.3852 0.3937 0.3786 0.0150 0.3701 0.2849 0.4063 10
#> R2 0.0811 0.0903 0.0778 0.0125 0.0686 0.0341 0.0801 10
#> Slope 1.0000 1.0000 0.9130 0.0870 0.9130 0.6998 1.0431 10
#> D 0.0312 0.0342 0.0298 0.0044 0.0268 0.0150 0.0329 10
#> U -0.0008 -0.0008 0.0005 -0.0013 0.0005 -0.0007 0.0041 10
#> Q 0.0320 0.0350 0.0293 0.0057 0.0263 0.0083 0.0328 10
#> g 0.7388 0.7924 0.7228 0.0696 0.6692 0.5217 0.7577 10
# Validate a model with stratification. Dxy is the only
# discrimination measure for such models, but Dxy requires
# one to choose a single time at which to predict S(t|X)
f <- cph(S ~ rcs(age)*strat(sex),
x=TRUE, y=TRUE, surv=TRUE, time.inc=2)
#> number of knots in rcs defaulting to 5
validate(f, u=2, B=10) # normally B=150
#> index.orig training test optimism index.corrected Lower Upper n
#> Dxy 0.3491 0.3932 0.3578 0.0355 0.3137 0.2502 0.3720 10
#> R2 0.0759 0.0859 0.0695 0.0164 0.0595 0.0135 0.0855 10
#> Slope 1.0000 1.0000 0.9073 0.0927 0.9073 0.6242 1.1138 10
#> D 0.0317 0.0361 0.0289 0.0071 0.0246 0.0056 0.0359 10
#> U -0.0009 -0.0009 0.0007 -0.0016 0.0007 -0.0008 0.0051 10
#> Q 0.0326 0.0370 0.0282 0.0088 0.0239 0.0015 0.0367 10
#> g 0.6587 1.8035 1.6442 0.1593 0.4994 0.0599 0.9037 10
# Note u=time.inc