Stepwise model builder for GAM
Builds a GAM model in a step-wise fashion. For each "term" there is an
ordered list of alternatives, and the function traverses these in a greedy
fashion. Note: this is NOT a method for step, which used to be a
generic, so it must be invoked with the full name.
Usage
step.Gam(
object,
scope,
scale,
direction = c("both", "backward", "forward"),
trace = TRUE,
keep = NULL,
steps = 1000,
parallel = FALSE,
...
)
Arguments
- object
An object of class Gam, or of any class that inherits from it.
- scope
Defines the range of models examined in the step-wise search. It is a list of formulas, with each formula corresponding to a term in the model. Each of these formulas specifies a "regimen" of candidate forms in which the particular term may enter the model. For example, a term formula might be ~ 1 + Income + log(Income) + s(Income). This means that Income could either appear not at all, linearly, linearly in its logarithm, or as a smooth function estimated nonparametrically. A 1 in the formula allows the additional option of leaving the term out of the model entirely. Every term in the model is described by such a term formula, and the final model is built up by selecting a component from each formula.
As an alternative more convenient for big models, each list element can be, instead of a formula, a character vector of the candidates for that term. Thus we could have c("1", "x", "s(x, df = 5)") rather than ~ 1 + x + s(x, df = 5).
The supplied model object is used as the starting model, and hence there is the requirement that one term from each of the term formulas be present in formula(object). This also implies that any terms in formula(object) not contained in any of the term formulas will be forced to be present in every model considered. The function gam.scope is helpful for generating the scope argument for a large model.
- scale
An optional argument used in the definition of the AIC statistic used to evaluate models for selection. By default, the scaled Chi-squared statistic for the initial model is used, but if forward selection is to be performed, this is not necessarily a sound choice.
- direction
The mode of step-wise search, can be one of "both", "backward", or "forward", with a default of "both". If scope is missing, the default for direction is "both".
- trace
If TRUE (the default), information is printed during the running of step.Gam(). This is an encouraging choice in general, since step.Gam() can take some time to compute either for large models or when called with an extensive scope= argument. A simple one-line model summary is printed for each model selected. This argument can also be given as the binary 0 or 1. A value trace=2 gives a more verbose trace.
- keep
A filter function whose input is a fitted Gam object, and anything else passed via ..., and whose output is arbitrary. Typically keep() will select a subset of the components of the object and return them. The default is not to keep anything.
- steps
The maximum number of steps to be considered. The default is 1000 (essentially as many as required). It is typically used to stop the process early.
- parallel
If TRUE, use parallel foreach to fit each trial run. A parallel backend, such as doMC, must be registered beforehand. See the example below.
- ...
Additional arguments to be passed on to keep.
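The two ways of writing a scope described above can be compared side by side. The following is a minimal sketch, assuming the gam package and its gam.data example data are available; the object names fit0, scope_chr, scope_auto and fit_step are made up for illustration.

```r
library(gam)
data(gam.data)

# Starting model: one candidate from each regimen must already be present.
fit0 <- gam(y ~ x + z, data = gam.data)

# Scope written as character vectors instead of one-sided formulas.
scope_chr <- list(
  x = c("1", "x", "s(x, 4)", "s(x, 6)"),
  z = c("1", "z", "s(z, 4)")
)

# gam.scope() builds a scope from a data frame. The second argument is the
# column number of the response (y is column 2 of gam.data[, 1:3]).
scope_auto <- gam.scope(gam.data[, 1:3], response = 2, arg = c("df=4", "df=6"))

# Run the search quietly and inspect the selected model and its step record.
fit_step <- step.Gam(fit0, scope = scope_chr, trace = FALSE)
formula(fit_step)
fit_step$anova
```

Passing scope_auto in place of scope_chr should search a comparable set of models, although the exact candidate forms depend on the arg values given to gam.scope().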
Value
The step-wise-selected model is returned, with up to two additional
components. There is an "anova" component corresponding to the steps
taken in the search, as well as a "keep" component if the
keep= argument was supplied in the call.
We describe the most general setup, when direction = "both". At any
stage there is a current model comprising a single term from each of the
term formulas supplied in the scope= argument. A series of models is
fitted, each corresponding to a formula obtained by moving each of the
terms one step up or down in its regimen, relative to the formula of the
current model. If the current value for any term is at either of the extreme
ends of its regimen, only one rather than two steps can be considered. So if
there are p term formulas, at most 2*p - 1 models are
considered. A record is kept of all the models ever visited (hence the
-1 above), to avoid repetition. Once each of these models has been
fit, the "best" model in terms of the AIC statistic is selected and defines
the step. The entire process is repeated until either the maximum number of
steps has been used, or until the AIC criterion cannot be decreased by any
of the eligible steps.
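The search just described can be illustrated with a small self-contained sketch. This is not the code used by step.Gam(): it assumes lm() and poly() as stand-ins for gam() and s() so that it runs with base R only, it uses the plain AIC() value without the scale argument, and the names regimens, make_formula, fit_aic and greedy_step are hypothetical.

```r
# Illustrative sketch only, NOT the gam package's implementation.
set.seed(1)
n <- 200
dat <- data.frame(x = runif(n), z = runif(n))
dat$y <- sin(2 * pi * dat$x) + rnorm(n, sd = 0.3)

# One ordered regimen of candidate forms per term.
regimens <- list(
  x = c("1", "x", "poly(x, 2)", "poly(x, 3)"),
  z = c("1", "z")
)

# Build a model formula from the current position in each regimen.
make_formula <- function(pos) {
  rhs <- mapply(function(r, i) r[i], regimens, pos)
  as.formula(paste("y ~", paste(rhs, collapse = " + ")))
}

fit_aic <- function(pos) AIC(lm(make_formula(pos), data = dat))

greedy_step <- function(start, max_steps = 1000) {
  pos <- start
  best <- fit_aic(pos)
  visited <- paste(pos, collapse = "-")  # record of models already seen
  for (step in seq_len(max_steps)) {
    # Candidates: move each term one position up or down in its regimen.
    cands <- list()
    for (j in seq_along(pos)) {
      for (d in c(-1, 1)) {
        new <- pos
        new[j] <- new[j] + d
        key <- paste(new, collapse = "-")
        in_range <- new[j] >= 1 && new[j] <= length(regimens[[j]])
        if (in_range && !(key %in% visited)) cands[[key]] <- new
      }
    }
    if (length(cands) == 0) break
    visited <- c(visited, names(cands))
    aics <- vapply(cands, fit_aic, numeric(1))
    if (min(aics) >= best) break  # no eligible step lowers the AIC
    pos <- cands[[which.min(aics)]]
    best <- min(aics)
  }
  list(pos = pos, aic = best, formula = make_formula(pos))
}

# Start from the model y ~ x + z (position 2 in each regimen).
result <- greedy_step(c(x = 2, z = 2))
result$formula
```

Because each step moves only one position within a regimen, the search can stop at a model that is not the best one in the whole scope, which is the usual trade-off of a greedy strategy.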
References
Hastie, T. J. (1992) Generalized additive models. Chapter 7 of Statistical Models in S, eds. J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
Hastie, T. and Tibshirani, R. (1990) Generalized Additive Models. London: Chapman and Hall.
Author
Written by Trevor Hastie, following closely the design in the "Generalized Additive Models" chapter (Hastie, 1992) in Chambers and Hastie (1992).
Examples
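library(gam)
data(gam.data)
Gam.object <- gam(y ~ x + z, data = gam.data)
# Each scope entry lists the candidate forms for one term, in order
step.object <- step.Gam(Gam.object, scope = list("x" = ~ 1 + x + s(x, 4) + s(x, 6) + s(x, 12), "z" = ~ 1 + z + s(z, 4)))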
#> Start: y ~ x + z; AIC= 127.7316
#> Step:1 y ~ s(x, 4) + z ; AIC= 44.0543
#> Step:2 y ~ s(x, 4) ; AIC= 43.1799
#> Step:3 y ~ s(x, 6) ; AIC= 42.6681
if (FALSE) { # \dontrun{
# Parallel
library(doMC)
registerDoMC(cores = 2)
step.Gam(Gam.object, scope = list("x" = ~ 1 + x + s(x, 4) + s(x, 6) + s(x, 12), "z" = ~ 1 + z + s(z, 4)), parallel = TRUE)
} # }