Dxy and Mean Squared Error by Cross-validating a Tree Sequence
Uses xval-fold cross-validation of a sequence of trees to derive
estimates of the mean squared error and Somers' Dxy rank correlation
between predicted and observed responses. In the case of a binary response
variable, the mean squared error is the Brier accuracy score. For
survival trees, Dxy is negated so that larger is better.
There are print and plot methods for
objects created by validate.rpart.
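The two accuracy measures can be illustrated with a minimal base-R sketch. The helper names `mse` and `somers_dxy` are hypothetical and are not functions in rms. The pairwise loop favours clarity over speed. It also does not handle censored survival responses, for which validate.rpart negates Dxy. For a binary response, Hmisc's `somers2` is the established way to compute C and Dxy.

```r
# Mean squared error of prediction; for a 0/1 response this is the Brier score
mse <- function(pred, y) mean((pred - y)^2)

# Somers' Dxy between predictions and observed responses. Only pairs whose
# observed responses differ are usable; pairs tied on the prediction count
# as half-concordant, so Dxy = 2 * (C - 0.5) with C the concordance index.
somers_dxy <- function(pred, y) {
  n <- length(y)
  conc <- 0; disc <- 0; tied <- 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      if (y[i] == y[j]) next                 # unusable pair: tied response
      s <- sign(pred[i] - pred[j]) * sign(y[i] - y[j])
      if (s > 0) {
        conc <- conc + 1
      } else if (s < 0) {
        disc <- disc + 1
      } else {
        tied <- tied + 1
      }
    }
  }
  (conc - disc) / (conc + disc + tied)
}

pred <- c(0.1, 0.4, 0.35, 0.8)
y    <- c(0, 0, 1, 1)
mse(pred, y)         # Brier score: 0.158125
somers_dxy(pred, y)  # 0.5 (3 concordant, 1 discordant usable pair)
```

Tied predictions add nothing to the numerator but still count in the denominator. This is the convention that makes Dxy equal 2(C - 0.5).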
Usage
# f <- rpart(formula=y ~ x1 + x2 + ..., model=TRUE)
# S3 method for class 'rpart'
validate(fit, method, B, bw, rule, type, sls, aics,
force, estimates, pr=TRUE,
k, rand, xval=10, FUN, ...)
# S3 method for class 'validate.rpart'
print(x, ...)
# S3 method for class 'validate.rpart'
plot(x, what=c("mse","dxy"), legendloc=locator, ...)
Arguments
- fit
an object created by rpart. You must have specified the model=TRUE argument to rpart.
- method, B, bw, rule, type, sls, aics, force, estimates
are there only for consistency with the generic validate function; these are ignored.
- x
the result of validate.rpart
- k
a sequence of cost/complexity values. By default these are obtained from calling FUN with no optional arguments or from the rpart cptable object in the original fit object. You may also specify a scalar or vector.
- rand
a random sample (usually omitted)
- xval
number of cross-validation splits (folds)
- FUN
the name of a function which produces a sequence of trees, such as prune.
- ...
additional arguments to FUN (ignored by print and plot).
- pr
set to FALSE to prevent intermediate results for each k from being printed
- what
a vector of things to plot. By default, two plots are produced, one for mse and one for Dxy.
- legendloc
a function that is evaluated with a single argument equal to 1 to generate a list with components x, y specifying coordinates of the upper left corner of a legend, or a 2-vector. For the latter, legendloc specifies the relative fraction of the plot at which to center the legend.
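The following is a rough sketch of xval-fold cross-validation over a tree sequence, written only with rpart. It is meant to make the roles of `k`, `xval` and `FUN` concrete. It assumes the default behaviour described above: `k` comes from the `CP` column of the original fit's `cptable`, and each tree in the sequence is produced by `prune`. The helper `cv_mse_by_cp` is hypothetical. Growing each training-fold tree with `cp = 0` is also an assumption of this sketch. It does not reproduce the rms internals.

```r
library(rpart)

# Hypothetical helper (not part of rms): pooled xval-fold cross-validated
# mean squared error for each cost/complexity value in k.
cv_mse_by_cp <- function(formula, data, k, xval = 10) {
  n <- nrow(data)
  yname <- all.vars(formula)[1]                        # response column name
  folds <- sample(rep(seq_len(xval), length.out = n))  # random fold labels
  sse <- numeric(length(k))
  for (g in seq_len(xval)) {
    train <- data[folds != g, , drop = FALSE]
    test  <- data[folds == g, , drop = FALSE]
    # Grow a large tree on the training folds; pruning yields the sequence
    big <- rpart(formula, data = train, control = rpart.control(cp = 0))
    for (i in seq_along(k)) {
      pred <- predict(prune(big, cp = k[i]), newdata = test)
      sse[i] <- sse[i] + sum((pred - test[[yname]])^2)
    }
  }
  sse / n   # each observation is predicted exactly once
}

set.seed(1)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
d$y <- d$x1 + 0.5 * d$x2 + rnorm(200)

# Default-style k: the CP column of the cptable from a fit on all the data
fit <- rpart(y ~ x1 + x2, data = d, model = TRUE)
k <- fit$cptable[, "CP"]
cv_mse <- cv_mse_by_cp(y ~ x1 + x2, d, k, xval = 10)
```

Every observation lands in exactly one test fold. Dividing the pooled squared error by n therefore gives one cross-validated MSE per value of `k`. validate.rpart also reports apparent (training) accuracy and Dxy alongside these.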
Value
a list of class "validate.rpart" with components named k, size, dxy.app,
dxy.val, mse.app, mse.val, binary, xval. size is the number of nodes,
dxy refers to Somers' D, mse refers to mean squared error of prediction,
app means apparent accuracy on training samples, val means validated
accuracy on test samples, binary is a logical variable indicating whether
or not the response variable was binary (a logical or 0/1 variable is
binary). size will not be present if the user specifies k.
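Here is a usage sketch, assuming the rms and rpart packages are installed. The simulated data, the choice `xval = 10` and the legend position are illustrative only. Selecting the `k` with the smallest `mse.val` is one possible way to use the result; it is not a rule prescribed by validate.rpart.

```r
library(rpart)
library(rms)   # supplies the validate() generic and its rpart method

set.seed(1)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
d$y <- d$x1 + 0.5 * d$x2 + rnorm(200)

# model=TRUE is required so validate.rpart can recover the model frame
f <- rpart(y ~ x1 + x2, data = d, model = TRUE)

v <- validate(f, xval = 10, pr = FALSE)  # suppress per-k intermediate output
print(v)
plot(v, legendloc = c(0.5, 0.8))  # 2-vector: relative position of the legend

# One possible use of the result: prune at the k with the smallest validated MSE
best_k <- v$k[which.min(v$mse.val)]
f_best <- prune(f, cp = best_k)
```

Passing a 2-vector for `legendloc` avoids the default `locator`, which needs an interactive graphics device.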