stepmented relationships in regression models
stepmented.RdFits regression models with stepmented (i.e. piecewise-constant) relationships between the response and one or more explanatory variables. Break-point estimates are provided.
Usage
stepmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
keep.class=FALSE, var.psi=FALSE, ...)
<!-- %\method{stepmented}{default}(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(), model = TRUE, keep.class=FALSE, ...) -->
# S3 method for class 'lm'
stepmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
keep.class=FALSE, var.psi=FALSE, ...)
# S3 method for class 'glm'
stepmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
keep.class=FALSE, var.psi=FALSE, ...)
# S3 method for class 'numeric'
stepmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
keep.class=FALSE, var.psi=FALSE, ...,
pertV=0, centerX=FALSE, adjX=NULL, weights=NULL)
# S3 method for class 'ts'
stepmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
keep.class=FALSE, var.psi=FALSE, ...,
pertV=0, centerX=FALSE, adjX=NULL)
<!-- %\method{stepmented}{Arima}(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(), -->
<!-- % model = TRUE, keep.class=FALSE, ...) -->Arguments
- obj
A standard `linear' regression model of class "lm" or "glm". Alternatively, a simple "ts" object or a simple data vector may be supplied.
- seg.Z
the stepmented variables(s), i.e. the numeric covariate(s) understood to have a piecewise-constant relationship with response. It is a formula with no response variable, such as
seg.Z=~xorseg.Z=~x1+x2. Currently, formulas involving functions, such asseg.Z=~log(x1), or selection operators, such asseg.Z=~d[,"x1"]orseg.Z=~d$x1, are not allowed. Also, variable names formed byUorVonly (with or without numbers ) are not permitted. If missing, the index variableid=1,2,..,nis used. Forstepmented.ts,seg.Zis usually unspecified as the (time) covariate is obtained by thetsobject itself.- psi
starting values for the breakpoints to be estimated. If there is a single stepmented variable specified in
seg.Z,psican be a numeric vector, and it can be missing when 1 breakpoint has to be estimated (and the median of the stepmented variable is used as a starting value). Ifseg.Zincludes several covariates,psihas to be specified as a named list of vectors whose names have to match the variables in theseg.Zargument. Each vector of such list includes starting values for the break-point(s) for the corresponding variable inseg.Z. ANAvalue means that `K' quantiles (or equally spaced values) are used as starting values;Kis fixed via theseg.controlauxiliary function.- npsi
A named vector or list meaning the number (and not locations) of breakpoints to be estimated. The starting values will be internally computed via the quantiles or equally spaced values, as specified in argument
quantinseg.control.npsican be missing andnpsi=1is assumed for all variables specified inseg.Z. Ifpsiis provided,npsiis ignored.- fixed.psi
An optional named list including the breakpoint values to be kept fixed during the estimation procedure. The names should be a subset of (or even the same) variables specified in
seg.Z. If there is a single variable inseg.Z, a simple numeric vector can be specified. Note that, in addition to the values specified here,stepmentedwill estimate additional breakpoints. To keep fixed all breakpoints (to be specified inpsi) useit.max=0inseg.control- control
a list of parameters for controlling the fitting process. See the documentation for
seg.controlfor details.
- keep.class
logical value indicating if the final fit returned by
stepmented.defaultshould keep the class 'stepmented' (along with the class of the original fitobj). Ignored by the stepmented methods.- ...
optional arguments (to be ignored safely). Notice specific arguments relevant to the original call (via
lmorglmfor instance), such asweightsoroffet, have to be included in the starting modelobj.- pertV
Only for
stepmented.tsandstepmented.numeric.- centerX
Only for
stepmented.tsandstepmented.numeric. IfTRUE, the covariate is centered before fitting.- adjX
Only for
stepmented.tsandstepmented.numeric. If the response vector leads to covariate with large values (such as years for ts objects),adjX=TRUEwill shift the covariate to have a zero origin. Default isNULLwhich meansTRUEif the minimum of covariate is 1000 or larger.- var.psi
logical. If
TRUE, the estimate covariance matrix is also computed viavcov.stepmented, thus the breakpoint standard errors are also included in thepsicomponent of the returned object. Default isFALSE, as computing the estimate covariance matrix is somewhat time-consuming when the sample size is large.- weights
possible weights to include in the estimation process (only for
stepmented.numeric).
Details
Given a linear regression model (usually of class "lm" or "glm"), stepmented tries to estimate
a new regression model having piecewise-constant (i.e. step-function like) relationships with the variables specified in seg.Z.
A stepmented relationship is defined by the mean level
parameters and the break-points where the mean level changes. The number of breakpoints
of each stepmented relationship depends on the psi argument, where initial
values for the break-points must be specified. The model
is estimated simultaneously yielding point estimates and relevant approximate
standard errors of all the model parameters, including the break-points.
stepmented implements the algorithm described in Fasola et al. (2018) along with bootstrap restarting
(Wood, 2001) to escape local optima. The procedure turns out to be particularly appealing and efficient
when there are two or more covariates exhibiting different change points to be estimated.
See also section `Note' below.
Value
The returned object is of class "stepmented" which inherits
from the class "lm" or "glm" depending on the class of obj. When only.mean=FALSE, it is a list having two 'stepmented' fits (for the mean and for the dispersion submodels).
An object of class "stepmented" is a list containing the components of the
original object obj with additionally the followings:
- psi
estimated break-points and relevant (approximate) standard errors (on the continuum)
- psi.rounded
the rounded estimated break-points (see Note, below)
- it
number of iterations employed
- epsilon
difference in the objective function when the algorithm stops
- model
the model frame
- psi.history
a list or a vector including the breakpoint estimates at each step
- seed
the integer vector containing the seed just before the bootstrap resampling. Returned only if bootstrap restart is employed
- ..
Other components are not of direct interest of the user
Note
Assuming a single changepoint \(\psi\) for the covariate \(x\), the underlying fitted stepmented relationship is
\(\beta_0+\beta_1 \ I(x>\psi)\), namely the fitted value (on the linear predictor scale) is \(\beta_0\) if \(x\le \psi\), and \(\beta_0+\beta_1\) when
\(x > \psi\). While the point estimate \(\hat\psi\) returned (in the psi component of the fit object)
is a unique real number, actually there exist infinite solutions in the range \([a, b)\) where the extremes are the two
closest observed covariate values
such hat \(a \le \hat\psi<b\). The component psi.rounded of the fit object includes the rounded changepoint values
which can be taken as the final estimates. More specifically, each column of psi.rounded
represents a changepoint and the corresponding rows are the extremes of the `optimal' interval \([a, b)\).
The first row, i.e. the lower bound of the interval (\(a\) in the above example),
is taken as point estimate. print.stepmented, print.summary.stepmented,
and confint.stepmented return the rounded (lower) value of the interval.
Also:
The algorithm will start if the
it.maxargument returned byseg.controlis greater than zero. Ifit.max=0stepmentedwill estimate a new linear model with break-point(s) fixed at the starting values reported inpsi. Alternatively, it is also possible to seth=0inseg.control(). In this case, bootstrap restarting is unncessary, then to have changepoints atmypsitypestepmented(.., psi=mypsi, control=seg.control(h=0, n.boot=0, it.max=1))In the returned fit object, `U.' is put before the name of the stepmented variable to indicate the difference in the mean levels.
slopecan be used to compute the actual mean levels corresponding to the different intervals.Currently methods specific to the class
"stepmented"areOthers are inherited from the class
"lm"or"glm"depending on the class ofobj.
References
Fasola S, Muggeo VMR, Kuchenhoff H (2018) A heuristic, iterative algorithm for change-point detection in abrupt change models, Computational Statistics 33, 997–1015
Author
Vito M. R. Muggeo, vito.muggeo@unipa.it (based on original code by Salvatore Fasola)
Examples
n=20
x<-1:n/n
mu<- 2+ 1*(x>.6)
y<- mu + rnorm(n)*.8
#fitting via regression model
os <-stepmented(lm(y~1),~x)
y<-ts(y)
os1<- stepmented(y) #the 'ts' method
os2<- stepmented(y, npsi=2)
#plot(y)
#plot(os1, add=TRUE)
#plot(os2, add=TRUE, col=3:5)
### Example with (poisson) GLM
y<- rpois(n,exp(mu))
o<-stepmented(glm(y~1,family=poisson))
plot(o, res=TRUE)
if (FALSE) { # \dontrun{
## Example using the (well-known) Nile dataset
data(Nile)
plot(Nile)
os<- stepmented(Nile)
plot(os, add=TRUE)
### Example with (binary) GLM (example from the package stepR)
set.seed(1234)
y <- rbinom(200, 1, rep(c(0.1, 0.7, 0.3, 0.9), each=50))
o<-stepmented(glm(y~1,family=binomial), npsi=3)
plot(o, res=TRUE)
### Two stepmented covariates (with 1 and 2 breakpoints); z has also an additional linear effect
n=100
x<-1:n/n
z<-runif(n,2,5)
mu<- 2+ 1*(x>.6)-2*(z>3)+3*(z>4)+z
y<- mu + rnorm(n)*.8
os <-stepmented(lm(y~z),~x+z, npsi=c(x=1,z=2))
os
summary(os)
## see ?plot.stepmented
} # }