Validation of an Ordinary Linear Model
The validate function, when used on an object created by
ols, does resampling validation of a multiple linear regression
model, with or without backward step-down variable deletion. It uses
resampling to estimate the optimism in various measures of predictive
accuracy which include \(R^2\), \(MSE\) (mean squared error with a
denominator of \(n\)), the \(g\)-index, and the intercept and slope
of an overall
calibration \(a + b\hat{y}\). The "corrected"
slope can be thought of as a shrinkage factor that takes into account
overfitting. validate.ols can also be used when a model for a
continuous response is going to be applied to a binary response. A
Somers' \(D_{xy}\) for this case is computed for each resample by
dichotomizing y. This can be used to obtain an ordinary receiver
operating characteristic curve area using the formula \(0.5(D_{xy} +
1)\). The Nagelkerke-Maddala \(R^2\) index for the dichotomized
y is also given. See predab.resample for information about confidence limits and for the list of resampling methods.
The LaTeX needspace package must be in effect to use the latex method.
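The conversion from Somers' \(D_{xy}\) to ROC area mentioned above can be illustrated in base R. This is a minimal sketch on simulated data, not the code used by validate.ols: the helper somers_dxy, the simulated yhat and y, and the cutoff 0.5 are all hypothetical and introduced only for illustration.

```r
# Hypothetical helper (not part of rms): Somers' Dxy between a continuous
# predictor and a binary outcome, computed by direct pair counting.
somers_dxy <- function(pred, ybin) {
  p1 <- pred[ybin == 1]
  p0 <- pred[ybin == 0]
  d  <- outer(p1, p0, "-")      # differences over all (1, 0) pairs
  (sum(d > 0) - sum(d < 0)) / length(d)
}

set.seed(2)
yhat <- rnorm(100)              # stand-in for predicted values
y    <- yhat + rnorm(100)       # stand-in for the continuous response
ybin <- as.integer(y > 0.5)     # dichotomize y at an illustrative cutoff

dxy <- somers_dxy(yhat, ybin)
auc <- 0.5 * (dxy + 1)          # the formula from the description

# Cross-check against the rank-based (Wilcoxon) ROC area
n1 <- sum(ybin == 1)
n0 <- sum(ybin == 0)
auc_rank <- (sum(rank(yhat)[ybin == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
all.equal(auc, auc_rank)
```

The two areas agree because \(D_{xy}\) is the proportion of concordant pairs minus the proportion of discordant pairs, and the ROC area counts concordant pairs plus half the ties.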
Usage
# fit <- fitting.function(formula=response ~ terms, x=TRUE, y=TRUE)
# S3 method for class 'ols'
validate(fit, method="boot", B=40,
bw=FALSE, rule="aic", type="residual", sls=0.05, aics=0,
force=NULL, estimates=TRUE, pr=FALSE, u=NULL, rel=">",
tolerance=1e-7, ...)
Arguments
- fit
a fit derived by
ols. The options x=TRUE and y=TRUE must have been specified. See validate for a description of arguments method - pr.
- method, B, bw, rule, type, sls, aics, force, estimates, pr
see
validate and predab.resample and fastbw
- u
If specified,
y is also dichotomized at the cutoff u for the purpose of getting a bias-corrected estimate of \(D_{xy}\).
- rel
relationship for dichotomizing predicted
y. Defaults to ">" to use y>u. rel can also be "<", ">=", and "<=".
- tolerance
tolerance for singularity; passed to
lm.fit.qr.
- ...
other arguments to pass to
predab.resample, such as group, cluster, and subset
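The u and rel arguments can be sketched as follows. The simulated data, the model formula, and the cutoff u=1 are assumptions chosen only for illustration. The call uses only arguments listed in Usage, and rows for \(D_{xy}\) and \(R^2\) should be appended to the usual five.

```r
library(rms)

set.seed(1)
x1 <- runif(200)
x3 <- rnorm(200)
y  <- x1 + x3 + rnorm(200)            # simulated continuous response

# x=TRUE and y=TRUE are required so that validate can resample
f <- ols(y ~ x1 + x3, x=TRUE, y=TRUE)

# Dichotomize at y >= 1 to get a bias-corrected Dxy as well
v <- validate(f, B=20, u=1, rel=">=")  # B kept small for speed
v
```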
Value
matrix with rows corresponding to R-square, MSE, g, intercept, slope, and optionally \(D_{xy}\) and \(R^2\), and columns for the original index, resample estimates, indexes applied to whole or omitted sample using model derived from resample, average optimism, corrected index, and number of successful resamples.
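The relationship between the columns can be checked by hand: the corrected index is the original index minus the average optimism. The sketch below copies the rounded values from the first table in the Examples output, so agreement is only up to rounding of the printed figures.

```r
# Values copied from the first validate() table in Examples (rounded as printed)
index.orig <- c("R-square"=0.0939, MSE=0.6047, g=0.2896,
                Intercept=0.0863, Slope=0.9016)
optimism   <- c(0.0625, -0.0348, 0.0616, -0.1708, 0.1535)
printed    <- c(0.0313, 0.6395, 0.2280, 0.2571, 0.7481)  # index.corrected

corrected <- index.orig - optimism
round(corrected, 4)

# Differences from the printed column are at the level of rounding error
max(abs(corrected - printed))
```

Note that a negative optimism, as for MSE and the intercept here, moves the corrected index upward.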
Examples
library(rms)
set.seed(1)
x1 <- runif(200)
x2 <- sample(0:3, 200, TRUE)
x3 <- rnorm(200)
distance <- (x1 + x2/3 + rnorm(200))^2
f <- ols(sqrt(distance) ~ rcs(x1,4) + scored(x2) + x3, x=TRUE, y=TRUE)
#Validate full model fit (from all observations) but for x1 < .75
validate(f, B=20, subset=x1 < .75) # normally B=300
#>           index.orig training   test optimism index.corrected   Lower Upper  n
#> R-square      0.0939    0.122 0.0592   0.0625          0.0313 -0.1181 0.154 20
#> MSE           0.6047    0.593 0.6278  -0.0348          0.6395  0.4857 0.803 20
#> g             0.2896    0.327 0.2654   0.0616          0.2280  0.0849 0.405 20
#> Intercept     0.0863    0.065 0.2358  -0.1708          0.2571 -0.3944 0.698 20
#> Slope         0.9016    0.919 0.7651   0.1535          0.7481  0.3518 1.324 20
#Validate stepwise model with typical (not so good) stopping rule
validate(f, B=20, bw=TRUE, rule="p", sls=.1, type="individual")
#>
#> Backwards Step-down - Original Model
#>
#>  Deleted Chi-Sq d.f. P      Residual d.f. P      AIC   R2
#>  x3      0.99   1    0.3204 0.99     1    0.3204 -1.01 0.128
#>
#> Approximate Estimates after Deleting Factors
#>
#>           Coef     S.E.   Wald Z  P
#> Intercept  0.94530 0.2548  3.7100 0.0002072
#> x1        -0.65558 1.0361 -0.6327 0.5269109
#> x1'        3.11974 2.9290  1.0651 0.2868256
#> x1''      -8.11867 9.5505 -0.8501 0.3952801
#> x2         0.05955 0.1540  0.3868 0.6989377
#> x2=2       0.27098 0.2766  0.9797 0.3272444
#> x2=3       0.36983 0.4068  0.9091 0.3632803
#>
#> Factors in Final Model
#>
#> [1] x1 x2
#>           index.orig training   test optimism index.corrected   Lower Upper  n
#> R-square       0.128    0.150 0.0818   0.0679          0.0602 -0.0532 0.178 20
#> MSE            0.668    0.631 0.7034  -0.0727          0.7407  0.5570 0.959 20
#> g              0.360    0.373 0.3094   0.0636          0.2961  0.1390 0.427 20
#> Intercept      0.000    0.000 0.1973  -0.1973          0.1973 -0.2860 0.647 20
#> Slope          1.000    1.000 0.8627   0.1373          0.8627  0.4063 1.290 20
#>
#> Factors Retained in Backwards Elimination
#>
#> x1 x2 x3
#> *
#> * *
#> * *
#> *
#> * *
#> * *
#> * * *
#> * * *
#> * *
#> * *
#> * * *
#> * *
#> *
#> * *
#> * *
#> * * *
#> * *
#> * * *
#> * * *
#> * * *
#>
#> Frequencies of Numbers of Factors Retained
#>
#> 1 2 3
#> 3 10 7