Defining O'Sullivan Spline Smooths in VGAM Formulas
sm.os.RdThis function represents an O-spline smooth term
in a vgam formula
and confers automatic smoothing parameter selection.
Usage
sm.os(x, ..., niknots = 6, spar = -1, o.order = 2,
alg.niknots = c("s", ".nknots.smspl")[1], all.knots = FALSE,
ridge.adj = 1e-5, spillover = 0.01, maxspar = 1e12,
outer.ok = FALSE, fixspar = FALSE)Arguments
- x
covariate (abscissae) to be smoothed. Also called the regressor. If the
xijfacility is used then these covariates are inputted via the...argument.- ...
Used to accommodate the other \(M-1\) covariates when the
xijfacility is used. See Section 3.4.4 of Yee (2015) for something very similar. This argument, found in the second argument, means that the other argument names must be fully specified if used, e.g.,outer.okand notouter. See the example below. In the example below, the term in the main formula issm.os(gcost.air, gcost.trn, gcost.bus)and one might be tempted to use something likesm.os(gcost)to represent thatxijterm. However, this is not recommended becausesm.os(gcost)might not have the same number of columns assm.os(gcost.air, gcost.trn, gcost.bus)etc. That is, it is best to select one of the diagonal elements of the block matrix to represent that term.- niknots
numeric, the number of interior knots, called \(K\) below. The default is to use this value. If you want
alg.niknotsto operate then assignNULLto this argument.- alg.niknots
character. The algorithm used to determine the number of interior knots. Only used when
all.knots = FALSEandniknots = NULL. Note that".nknots.smspl"corresponds to the default ofsmooth.spline. The value"s"corresponds to the same algorithm ass.- all.knots
logical. If
TRUEthen all distinct points inxare used as the interior knots. IfFALSE(default) then a subset ofx[]is used, specificallyx[j]where theniknotsindices are quantiles that are evenly spaced with respect to the argumentprobs—seequantile. Ifall.knots = FALSEandniknots = NULLthen the argumentalg.niknotsis used to computeniknots.- spar, maxspar
sparis a vector of smoothing parameters. Negative values mean thatmagicwill choose initial values in order to do the optimization at each P-IRLS iteration. Positive values mean that they are used as initial values formagic. Iffixspar = TRUEthensparshould be assigned a vector of positive values (but having values less thanmaxspar); then the smoothing parameters will be fixed andmagicwill not be used.
- o.order
The order of the O'Sullivan penalzed spline. Any one value from
1:4is acceptable. The degree of the spline is2 * o.order - 1, so that cubic splines are the default. Settingo.order = 1results in a linear spline which is a piecewise linear function.- ridge.adj
small positive number to stabilize linear dependencies among B-spline bases.
- spillover
small and positive proportion of the range used on the outside of the boundary values. This defines the endpoints \(a\) and \(b\) that cover the data \(x_i\), i.e., we are interested in the interval \([a,b]\) which contains all the abscissae. The interior knots are strictly inside \((a,b)\).
- outer.ok
Fed into the argument (by the same name) of
splineDesign.- fixspar
logical. If
TRUEthensparshould be a vector with positive values and the smoothing parameters are fixed at those values. IfFALSEthensparcontains the initial values for the smoothing parameters, andmagicis called to determine (hopefully) some good values for the smoothing parameters.
Details
This function is currently used by vgam to
allow automatic smoothing parameter selection based on
O-splines to minimize an UBRE quantity.
In contrast, s operates by having a
prespecified amount of smoothing, e.g., its df argument.
When the sample size is reasonably large
this function
is recommended over s also because backfitting
is not required.
This function therefore allows 2nd-generation VGAMs to be
fitted (called G2-VGAMs, or Penalized-VGAMs).
This function should only be used with vgam.
This function uses quantile to
choose the knots, whereas sm.ps
chooses equally-spaced knots.
As Wand and Ormerod (2008) write,
in most situations the differences will be minor,
but it is possible for problems to arise
for either strategy by
constructing certain regression functions and
predictor variable distributions.
Any differences between O-splines and P-splines tend
to be at the boundaries. O-splines have
natural boundary constraints so that the solution is
linear beyond the boundary knots.
Some arguments in decreasing order of precedence are:
all.knots,
niknots,
alg.niknots.
Unlike s, which is symbolic and does not perform
any smoothing itself, this function does compute the penalized spline
when used by vgam—it creates the appropriate columns
of the model matrix. When this function is used within
vgam, automatic smoothing parameter selection is
implemented by calling magic after the necessary
link-ups are done.
By default this function centres the component function. This function is also smart; it can be used for smart prediction (Section 18.6 of Yee (2015)). Automatic smoothing parameter selection is performed using performance-oriented iteration whereby an optimization problem is solved at each IRLS iteration.
This function works better when the sample size is large, e.g., when in the hundreds, say.
Value
A matrix with attributes that are (only) used by vgam.
The number of rows of the matrix is length(x).
The number of columns is a function of the number
of interior knots \(K\) and
the order of the O-spline \(m\):
\(K+2m-1\).
In code, this is
niknots + 2 * o.order - 1,
or using sm.ps-like arguments,
ps.int + degree - 1
(where ps.int should be more generally
interpreted as the number of intervals. The formula is
the same as sm.ps.).
It transpires then that sm.os and sm.ps
are very similar.
References
Wand, M. P. and Ormerod, J. T. (2008). On semiparametric regression with O'Sullivan penalized splines. Australian and New Zealand Journal of Statistics, 50(2): 179–198.
Author
T. W. Yee, with some of the essential R code coming from the appendix of Wand and Ormerod (2008).
Note
This function is currently under development and may change in the future.
One might try using this function with vglm
so as to fit a regression spline,
however, the default value of niknots will probably
be too high for most data sets.
Warning
Being introduced into VGAM for the first time, this function (and those associated with it) should be used cautiously. Not all options are fully working or have been tested yet, and there are bound to be some bugs lurking around.
See also
vgam,
sm.ps,
s,
smartpred,
is.smart,
summarypvgam,
smooth.spline,
splineDesign,
bs,
magic.
Examples
sm.os(runif(20))
#> 2 3 4 5 6 7
#> 1 -0.121944351 -0.05741511 -0.11395827 -0.08076024 0.40902739 0.20471662
#> 2 -0.146900307 -0.06916513 -0.13729243 -0.18496548 0.08394917 0.49117616
#> 3 0.412680385 -0.04294764 -0.16985014 -0.23235111 -0.15524858 -0.31200906
#> 4 -0.132029424 -0.06216346 -0.12339417 -0.16792712 -0.10119221 0.33886799
#> 5 -0.110707464 -0.04810070 0.33208817 0.41957474 -0.09404402 -0.18908169
#> 6 -0.125361788 -0.05902414 -0.11716262 -0.10570906 0.37624808 0.24936690
#> 7 -0.117122441 -0.05514480 -0.10946216 -0.14896705 -0.09858572 0.20275723
#> 8 0.523003758 0.31262572 -0.02829819 -0.10186868 -0.06806789 -0.13679866
#> 9 -0.142148574 -0.06692787 -0.13285148 -0.18079758 -0.08561197 0.43894426
#> 10 -0.077603290 -0.03653799 -0.07252772 -0.09870297 -0.06594975 -0.04496393
#> 11 -0.110904482 -0.05221720 -0.05562328 0.40848297 0.26401931 -0.14525671
#> 12 -0.110527366 -0.04757431 0.33664351 0.41499565 -0.09391143 -0.18877409
#> 13 -0.042395342 -0.01996102 -0.03962251 -0.05392228 -0.03602891 -0.07224655
#> 14 0.076220212 0.41665892 0.29964267 -0.06304254 -0.06251761 -0.12564406
#> 15 0.003464846 -0.12122498 -0.25276393 -0.34404257 -0.22987677 -0.46199221
#> 16 -0.134421559 -0.06328975 -0.12562985 -0.14917669 0.27484768 0.35650292
#> 17 0.644349066 0.19927470 -0.06344487 -0.11188478 -0.07475735 -0.15024273
#> 18 -0.112843514 -0.05245731 0.27123228 0.47695051 -0.09374145 -0.19272993
#> 19 -0.065407665 -0.03079592 -0.06112974 -0.08319146 -0.05558552 -0.07574215
#> 20 -0.109400700 -0.04361204 0.36340475 0.38730575 -0.09297245 -0.18685032
#> 8 9 10
#> 1 -0.10637277 -0.064711005 -0.048022844
#> 2 -0.08088058 -0.077954137 -0.057850737
#> 3 -0.15935417 -0.096941804 -0.071941722
#> 4 0.29225251 -0.052862217 -0.051994442
#> 5 -0.09657077 -0.058748037 -0.043597651
#> 6 -0.10935319 -0.066524503 -0.049368663
#> 7 0.42930560 0.002631621 -0.046123930
#> 8 -0.06986796 -0.042503600 -0.031542452
#> 9 0.15653975 -0.072885252 -0.055979459
#> 10 0.39233940 0.377461278 0.003185814
#> 11 -0.09674263 -0.058852587 -0.043675239
#> 12 -0.09641367 -0.058652467 -0.043526727
#> 13 -0.02467286 0.194218028 0.754117773
#> 14 -0.06417091 -0.039037846 -0.028970472
#> 15 -0.23595591 -0.143541850 -0.106524197
#> 16 -0.11421993 -0.071332162 -0.052936487
#> 17 -0.07673433 -0.046680701 -0.034642330
#> 18 -0.09843406 -0.059881554 -0.044438848
#> 19 0.25473759 0.494853541 0.096915765
#> 20 -0.09543113 -0.058054746 -0.043083151
#> attr(,"S.arg")
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 3.4219082 -0.143155042 0.47943069 0.71269135 0.492233057 0.92183202
#> [2,] -0.1431550 0.046384168 -0.03578438 -0.02393102 -0.005378735 -0.04535723
#> [3,] 0.4794307 -0.035784376 0.11429894 0.05984950 0.088903590 0.11486871
#> [4,] 0.7126914 -0.023931015 0.05984950 0.24783269 0.049371127 0.14582131
#> [5,] 0.4922331 -0.005378735 0.08890359 0.04937113 0.262232419 -0.01881767
#> [6,] 0.9218320 -0.045357230 0.11486871 0.14582131 -0.018817673 0.59151117
#> [7,] 0.5012513 -0.008834040 0.08271499 0.13281269 0.125672138 -0.15245849
#> [8,] 0.3134553 -0.001361139 0.05828472 0.08310944 0.092097437 -0.01355696
#> [9,] 0.2090536 -0.012105468 0.02122961 0.03170380 0.028702403 0.15547684
#> [,7] [,8] [,9]
#> [1,] 0.50125133 0.313455339 0.20905362
#> [2,] -0.00883404 -0.001361139 -0.01210547
#> [3,] 0.08271499 0.058284724 0.02122961
#> [4,] 0.13281269 0.083109437 0.03170380
#> [5,] 0.12567214 0.092097437 0.02870240
#> [6,] -0.15245849 -0.013556960 0.15547684
#> [7,] 0.72887811 -0.581267654 0.28361458
#> [8,] -0.58126765 2.034672956 -1.24026753
#> [9,] 0.28361458 -1.240267533 0.88631473
#> attr(,"knots")
#> 14.28571% 28.57143% 42.85714% 57.14286%
#> 0.0294682 0.0294682 0.0294682 0.0294682 0.1175366 0.4711764 0.5051767 0.7162931
#> 71.42857% 85.71429%
#> 0.7872094 0.8517168 0.9616814 0.9616814 0.9616814 0.9616814
#> attr(,"intKnots")
#> 14.28571% 28.57143% 42.85714% 57.14286% 71.42857% 85.71429%
#> 0.1175366 0.4711764 0.5051767 0.7162931 0.7872094 0.8517168
#> attr(,"spar")
#> [1] -1
#> attr(,"o.order")
#> [1] 2
#> attr(,"ps.int")
#> [1] NA
#> attr(,"all.knots")
#> [1] FALSE
#> attr(,"alg.niknots")
#> [1] "s"
#> attr(,"ridge.adj")
#> [1] 1e-05
#> attr(,"outer.ok")
#> [1] FALSE
#> attr(,"fixspar")
#> [1] FALSE
if (FALSE) { # \dontrun{
data("TravelMode", package = "AER") # Need to install "AER" first
air.df <- subset(TravelMode, mode == "air") # Form 4 smaller data frames
bus.df <- subset(TravelMode, mode == "bus")
trn.df <- subset(TravelMode, mode == "train")
car.df <- subset(TravelMode, mode == "car")
TravelMode2 <- data.frame(income = air.df$income,
wait.air = air.df$wait - car.df$wait,
wait.trn = trn.df$wait - car.df$wait,
wait.bus = bus.df$wait - car.df$wait,
gcost.air = air.df$gcost - car.df$gcost,
gcost.trn = trn.df$gcost - car.df$gcost,
gcost.bus = bus.df$gcost - car.df$gcost,
wait = air.df$wait) # Value is unimportant
TravelMode2$mode <- subset(TravelMode, choice == "yes")$mode # The response
TravelMode2 <- transform(TravelMode2, incom.air = income, incom.trn = 0,
incom.bus = 0)
set.seed(1)
TravelMode2 <- transform(TravelMode2,
junkx2 = runif(nrow(TravelMode2)))
tfit2 <-
vgam(mode ~ sm.os(gcost.air, gcost.trn, gcost.bus) + ns(junkx2, 4) +
sm.os(incom.air, incom.trn, incom.bus) + wait ,
crit = "coef",
multinomial(parallel = FALSE ~ 1), data = TravelMode2,
xij = list(sm.os(gcost.air, gcost.trn, gcost.bus) ~
sm.os(gcost.air, gcost.trn, gcost.bus) +
sm.os(gcost.trn, gcost.bus, gcost.air) +
sm.os(gcost.bus, gcost.air, gcost.trn),
sm.os(incom.air, incom.trn, incom.bus) ~
sm.os(incom.air, incom.trn, incom.bus) +
sm.os(incom.trn, incom.bus, incom.air) +
sm.os(incom.bus, incom.air, incom.trn),
wait ~ wait.air + wait.trn + wait.bus),
form2 = ~ sm.os(gcost.air, gcost.trn, gcost.bus) +
sm.os(gcost.trn, gcost.bus, gcost.air) +
sm.os(gcost.bus, gcost.air, gcost.trn) +
wait +
sm.os(incom.air, incom.trn, incom.bus) +
sm.os(incom.trn, incom.bus, incom.air) +
sm.os(incom.bus, incom.air, incom.trn) +
junkx2 + ns(junkx2, 4) +
incom.air + incom.trn + incom.bus +
gcost.air + gcost.trn + gcost.bus +
wait.air + wait.trn + wait.bus)
par(mfrow = c(2, 2))
plot(tfit2, se = TRUE, lcol = "orange", scol = "blue", ylim = c(-4, 4))
summary(tfit2)
} # }