GARMA (Generalized Autoregressive Moving-Average) Models

Fits GARMA models to time series data.

Usage

garma(link = "identitylink", p.ar.lag = 1, q.ma.lag = 0,
      coefstart = NULL, step = 1)

Arguments

link

Link function applied to the mean response. The default is suitable for continuous responses. The link loglink should be chosen if the data are counts. The link reciprocal can be chosen if the data are counts and the variance assumed for this is $\mu^2$. The links logitlink, probitlink, clogloglink, and cauchitlink are supported and suitable for binary responses.

Note that when the log or logit link is chosen: for log and logit, zero values can be replaced by bvalue. See loglink and logitlink etc. for specific information about each link function.

p.ar.lag

A positive integer, the lag for the autoregressive component. Called $p$ below.

q.ma.lag

A non-negative integer, the lag for the moving-average component. Called $q$ below.

coefstart

Starting values for the coefficients. Assigning this argument is highly recommended. For technical reasons, the argument coefstart in vglm cannot be used.

step

Numeric. Step length, e.g., 0.5 means half-stepsizing.

Details

This function draws heavily on Benjamin et al. (1998). See also Benjamin et al. (2003). GARMA models extend the ARMA time series model to generalized responses in the exponential family, e.g., Poisson counts, binary responses. Currently, this function is rudimentary and can handle only certain continuous, count and binary responses only. The user must choose an appropriate link for the link argument.

The GARMA($p, q$) model is defined by firstly having a response belonging to the exponential family $$f(y_t|D_t) = \exp \left\{ \frac{y_t \theta_t - b(\theta_t)}{\phi / A_t} + c(y_t, \phi / A_t) \right\}$$ where $\theta_t$ and $\phi$ are the canonical and scale parameters respectively, and $A_t$ are known prior weights. The mean $\mu_t=E(Y_t|D_t)=b'(\theta_t)$ is related to the linear predictor $\eta_t$ by the link function $g$. Here, $D_t=\{x_t,\ldots,x_1,y_{t-1},\ldots,y_1,\mu_{t-1},\ldots,\mu_1\}$ is the previous information set. Secondly, the GARMA($p, q$) model is defined by $$g(\mu_t) = \eta_t = x_t^T \beta + \sum_{k=1}^p \phi_k (g(y_{t-k}) - x_{t-k}^T \beta) + \sum_{k=1}^q \theta_k (g(y_{t-k}) - \eta_{t-k}).$$ Parameter vectors $\beta$, $\phi$ and $\theta$ are estimated by maximum likelihood.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm.

References

Benjamin, M. A., Rigby, R. A. and Stasinopoulos, M. D. (1998). Fitting Non-Gaussian Time Series Models. Pages 191–196 in: Proceedings in Computational Statistics COMPSTAT 1998 by Payne, R. and P. J. Green. Physica-Verlag.

Benjamin, M. A., Rigby, R. A. and Stasinopoulos, M. D. (2003). Generalized Autoregressive Moving Average Models. Journal of the American Statistical Association, 98: 214–223.

Zeger, S. L. and Qaqish, B. (1988). Markov regression models for time series: a quasi-likelihood approach. Biometrics, 44: 1019–1031.

Author

T. W. Yee

Note

This function is unpolished and is requires lots of improvements. In particular, initialization is very poor. Results appear very sensitive to quality of initial values. A limited amount of experience has shown that half-stepsizing is often needed for convergence, therefore choosing crit = "coef" is not recommended.

Overdispersion is not handled. For binomial responses it is currently best to input a vector of 1s and 0s rather than the cbind(successes, failures) because the initialize slot is rudimentary.

Warning

This VGAM family function is 'non-standard' in that the model does need some coercing to get it into the VGLM framework. Special code is required to get it running. A consequence is that some methods functions may give wrong results when applied to the fitted object.

Examples

gdata <- data.frame(interspike = c(68, 41, 82, 66, 101, 66, 57,  41,  27, 78,
59, 73,  6, 44,  72, 66, 59,  60,  39, 52,
50, 29, 30, 56,  76, 55, 73, 104, 104, 52,
25, 33, 20, 60,  47,  6, 47,  22,  35, 30,
29, 58, 24, 34,  36, 34,  6,  19,  28, 16,
36, 33, 12, 26,  36, 39, 24,  14,  28, 13,
 2, 30, 18, 17,  28,  9, 28,  20,  17, 12,
19, 18, 14, 23,  18, 22, 18,  19,  26, 27,
23, 24, 35, 22,  29, 28, 17,  30,  34, 17,
20, 49, 29, 35,  49, 25, 55,  42,  29, 16))  # See Zeger and Qaqish (1988)
gdata <- transform(gdata, spikenum = seq(interspike))
bvalue <- 0.1  # .Machine$double.xmin # Boundary value
fit <- eval(substitute(
  vglm(interspike ~ 1, trace = TRUE, data = gdata,
       garma(paste0("loglink(bvalue = ", .bvalue , ")"),
             p = 2, coefstart = c(4, 0.3, 0.4))),
  list( .bvalue = bvalue)))
#> Iteration 1: loglikelihood = -33517179
#> Taking a modified step.....
#> Iteration  1 :  loglikelihood = 9063.765
#> Iteration 2: loglikelihood = 9039.4617
#> Taking a modified step.
#> Iteration  2 :  loglikelihood = 9294.1272
#> Iteration 3: loglikelihood = 8776.2807
#> Taking a modified step..
#> Iteration  3 :  loglikelihood = 9311.6036
#> Iteration 4: loglikelihood = 9236.8891
#> Taking a modified step..
#> Iteration  4 :  loglikelihood = 9325.2546
#> Iteration 5: loglikelihood = 9303.4811
#> Taking a modified step..
#> Iteration  5 :  loglikelihood = 9327.5058
#> Iteration 6: loglikelihood = 9323.9082
#> Taking a modified step.
#> Iteration  6 :  loglikelihood = 9327.635
#> Iteration 7: loglikelihood = 9318.4243
#> Taking a modified step...
#> Iteration  7 :  loglikelihood = 9328.5283
#> Iteration 8: loglikelihood = 9329.2394
#> Iteration 9: loglikelihood = 9316.0423
#> Taking a modified step..
#> Iteration  9 :  loglikelihood = 9329.4185
#> Iteration 10: loglikelihood = 9329.8789
#> Iteration 11: loglikelihood = 9330.6516
#> Iteration 12: loglikelihood = 9330.6545
#> Iteration 13: loglikelihood = 9330.6617
#> Iteration 14: loglikelihood = 9330.6617
summary(fit)
#> 
#> Call:
#> vglm(formula = interspike ~ 1, family = garma(paste0("loglink(bvalue = ", 
#>     0.1, ")"), p = 2, coefstart = c(4, 0.3, 0.4)), data = gdata, 
#>     trace = TRUE)
#> 
#> Coefficients: 
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  3.71331    0.04117  90.191  < 2e-16 ***
#> (lag1)       0.33723    0.03076  10.962  < 2e-16 ***
#> (lag2)       0.24180    0.02988   8.094 5.79e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Name of linear predictor: loglink(mu) 
#> 
#> Log-likelihood: 9330.662 on 95 degrees of freedom
#> 
#> Number of Fisher scoring iterations: 14 
#> 
coef(fit, matrix = TRUE)
#>             loglink(mu)
#> (Intercept)   3.7133065
#> (lag1)        0.3372322
#> (lag2)        0.2418008
Coef(fit)  # A bug here
#>        mu      <NA>      <NA> 
#> 40.989114  1.401064  1.273541 
if (FALSE)  with(gdata, plot(interspike, ylim = c(0, 120), las = 1,
     xlab = "Spike Number", ylab = "Inter-Spike Time (ms)", col = "blue"))
with(gdata, lines(spikenum[-(1:fit@misc$plag)], fitted(fit), col = "orange"))
#> Error in plot.xy(xy.coords(x, y), type = type, ...): plot.new has not been called yet
abline(h = mean(with(gdata, interspike)), lty = "dashed", col = "gray")  # \dontrun{}
#> Error in int_abline(a = a, b = b, h = h, v = v, untf = untf, ...): plot.new has not been called yet