stepmented relationships in regression models

Fits regression models with stepmented (i.e. piecewise-constant) relationships between the response and one or more explanatory variables. Break-point estimates are provided.

Usage

stepmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
   keep.class=FALSE, var.psi=FALSE, ...)
<!-- %\method{stepmented}{default}(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(), model = TRUE, keep.class=FALSE, ...) -->
# S3 method for class 'lm'
stepmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
   keep.class=FALSE, var.psi=FALSE, ...)

# S3 method for class 'glm'
stepmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
    keep.class=FALSE, var.psi=FALSE, ...)

# S3 method for class 'numeric'
stepmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
    keep.class=FALSE, var.psi=FALSE, ...,
    pertV=0, centerX=FALSE, adjX=NULL, weights=NULL)

# S3 method for class 'ts'
stepmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
    keep.class=FALSE, var.psi=FALSE, ...,
    pertV=0, centerX=FALSE, adjX=NULL)


<!-- %\method{stepmented}{Arima}(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(), -->
<!-- %    model = TRUE, keep.class=FALSE, ...) -->

Arguments

obj: A standard `linear' regression model of class "lm" or "glm". Alternatively, a simple "ts" object or a simple data vector may be supplied.
seg.Z: the stepmented variables(s), i.e. the numeric covariate(s) understood to have a piecewise-constant relationship with response. It is a formula with no response variable, such as seg.Z=~x or seg.Z=~x1+x2. Currently, formulas involving functions, such as seg.Z=~log(x1), or selection operators, such as seg.Z=~d[,"x1"] or seg.Z=~d$x1, are not allowed. Also, variable names formed by U or V only (with or without numbers ) are not permitted. If missing, the index variable id=1,2,..,n is used. For stepmented.ts, seg.Z is usually unspecified as the (time) covariate is obtained by the ts object itself.
psi: starting values for the breakpoints to be estimated. If there is a single stepmented variable specified in seg.Z, psi can be a numeric vector, and it can be missing when 1 breakpoint has to be estimated (and the median of the stepmented variable is used as a starting value). If seg.Z includes several covariates, psi has to be specified as a named list of vectors whose names have to match the variables in the seg.Z argument. Each vector of such list includes starting values for the break-point(s) for the corresponding variable in seg.Z. A NA value means that `K' quantiles (or equally spaced values) are used as starting values; K is fixed via the seg.control auxiliary function.
npsi: A named vector or list meaning the number (and not locations) of breakpoints to be estimated. The starting values will be internally computed via the quantiles or equally spaced values, as specified in argument quant in seg.control. npsi can be missing and npsi=1 is assumed for all variables specified in seg.Z. If psi is provided, npsi is ignored.
fixed.psi: An optional named list including the breakpoint values to be kept fixed during the estimation procedure. The names should be a subset of (or even the same) variables specified in seg.Z. If there is a single variable in seg.Z, a simple numeric vector can be specified. Note that, in addition to the values specified here, stepmented will estimate additional breakpoints. To keep fixed all breakpoints (to be specified in psi) use it.max=0 in seg.control
control: a list of parameters for controlling the fitting process. See the documentation for seg.control for details.

keep.class: logical value indicating if the final fit returned by stepmented.default should keep the class 'stepmented' (along with the class of the original fit obj). Ignored by the stepmented methods.
...: optional arguments (to be ignored safely). Notice specific arguments relevant to the original call (via lm or glm for instance), such as weights or offet, have to be included in the starting model obj.
pertV: Only for stepmented.ts and stepmented.numeric.
centerX: Only for stepmented.ts and stepmented.numeric. If TRUE, the covariate is centered before fitting.
adjX: Only for stepmented.ts and stepmented.numeric. If the response vector leads to covariate with large values (such as years for ts objects), adjX=TRUE will shift the covariate to have a zero origin. Default is NULL which means TRUE if the minimum of covariate is 1000 or larger.
var.psi: logical. If TRUE, the estimate covariance matrix is also computed via vcov.stepmented, thus the breakpoint standard errors are also included in the psi component of the returned object. Default is FALSE, as computing the estimate covariance matrix is somewhat time-consuming when the sample size is large.
weights: possible weights to include in the estimation process (only for stepmented.numeric).

Details

Given a linear regression model (usually of class "lm" or "glm"), stepmented tries to estimate a new regression model having piecewise-constant (i.e. step-function like) relationships with the variables specified in seg.Z. A stepmented relationship is defined by the mean level parameters and the break-points where the mean level changes. The number of breakpoints of each stepmented relationship depends on the psi argument, where initial values for the break-points must be specified. The model is estimated simultaneously yielding point estimates and relevant approximate standard errors of all the model parameters, including the break-points.

stepmented implements the algorithm described in Fasola et al. (2018) along with bootstrap restarting (Wood, 2001) to escape local optima. The procedure turns out to be particularly appealing and efficient when there are two or more covariates exhibiting different change points to be estimated.

Value

The returned object is of class "stepmented" which inherits from the class "lm" or "glm" depending on the class of obj. When only.mean=FALSE, it is a list having two 'stepmented' fits (for the mean and for the dispersion submodels).

An object of class "stepmented" is a list containing the components of the original object obj with additionally the followings:

psi: estimated break-points and relevant (approximate) standard errors (on the continuum)
psi.rounded: the rounded estimated break-points (see Note, below)
it: number of iterations employed
epsilon: difference in the objective function when the algorithm stops
model: the model frame
psi.history: a list or a vector including the breakpoint estimates at each step
seed: the integer vector containing the seed just before the bootstrap resampling. Returned only if bootstrap restart is employed
..: Other components are not of direct interest of the user

Note

Assuming a single changepoint $\psi$ for the covariate $x$, the underlying fitted stepmented relationship is $\beta_0+\beta_1 \ I(x>\psi)$, namely the fitted value (on the linear predictor scale) is $\beta_0$ if $x\le \psi$, and $\beta_0+\beta_1$ when $x > \psi$. While the point estimate $\hat\psi$ returned (in the psi component of the fit object) is a unique real number, actually there exist infinite solutions in the range $[a, b)$ where the extremes are the two closest observed covariate values such hat $a \le \hat\psi<b$. The component psi.rounded of the fit object includes the rounded changepoint values which can be taken as the final estimates. More specifically, each column of psi.rounded represents a changepoint and the corresponding rows are the extremes of the `optimal' interval $[a, b)$. The first row, i.e. the lower bound of the interval ($a$ in the above example), is taken as point estimate. print.stepmented, print.summary.stepmented, and confint.stepmented return the rounded (lower) value of the interval.

Also:

The algorithm will start if the it.max argument returned by seg.control is greater than zero. If it.max=0 stepmented will estimate a new linear model with break-point(s) fixed at the starting values reported in psi. Alternatively, it is also possible to set h=0 in seg.control(). In this case, bootstrap restarting is unncessary, then to have changepoints at mypsi type

stepmented(.., psi=mypsi, control=seg.control(h=0, n.boot=0, it.max=1))
In the returned fit object, `U.' is put before the name of the stepmented variable to indicate the difference in the mean levels. slope can be used to compute the actual mean levels corresponding to the different intervals.
Currently methods specific to the class "stepmented" are
- print.stepmented
- summary.stepmented
- print.summary.stepmented
- plot.stepmented
- confint.stepmented
- vcov.stepmented
- lines.stepmented
Others are inherited from the class "lm" or "glm" depending on the class of obj.

References

Fasola S, Muggeo VMR, Kuchenhoff H (2018) A heuristic, iterative algorithm for change-point detection in abrupt change models, Computational Statistics 33, 997–1015

Author

Vito M. R. Muggeo, vito.muggeo@unipa.it (based on original code by Salvatore Fasola)

Examples


n=20
x<-1:n/n
mu<- 2+ 1*(x>.6)
y<- mu + rnorm(n)*.8

#fitting via regression model
os <-stepmented(lm(y~1),~x)

y<-ts(y)
os1<- stepmented(y)  #the 'ts' method
os2<- stepmented(y, npsi=2)
#plot(y)
#plot(os1, add=TRUE)
#plot(os2, add=TRUE, col=3:5)


### Example with (poisson) GLM
y<- rpois(n,exp(mu))
o<-stepmented(glm(y~1,family=poisson))
plot(o, res=TRUE)


if (FALSE) { # \dontrun{

## Example using the (well-known) Nile dataset
data(Nile)
plot(Nile)
os<- stepmented(Nile)
plot(os, add=TRUE)


### Example with (binary) GLM (example from the package stepR)
set.seed(1234)
y <- rbinom(200, 1, rep(c(0.1, 0.7, 0.3, 0.9), each=50))
o<-stepmented(glm(y~1,family=binomial), npsi=3)
plot(o, res=TRUE)

### Two stepmented covariates (with 1 and 2 breakpoints); z has also an additional linear effect
n=100
x<-1:n/n
z<-runif(n,2,5)
mu<- 2+ 1*(x>.6)-2*(z>3)+3*(z>4)+z
y<- mu + rnorm(n)*.8

os <-stepmented(lm(y~z),~x+z, npsi=c(x=1,z=2))
os
summary(os)

## see ?plot.stepmented
} # }