Ordinal Regression with Stopping Ratios

Fits a stopping ratio logit/probit/cloglog/cauchit/... regression model to a (preferably ordered) factor response.

Usage

sratio(link = "logitlink", parallel = FALSE, reverse = FALSE,
       zero = NULL, ynames = FALSE, Thresh = NULL, Trev = reverse,
       Tref = if (Trev) "M" else 1, Intercept = NULL, whitespace = FALSE)

Arguments

link

Link function applied to the \(M\) stopping ratio probabilities. See Links for more choices.

parallel

A logical, or formula specifying which terms have equal/unequal coefficients.

reverse

Logical. By default, the stopping ratios used are \(\eta_j = logit(P[Y=j|Y \geq j])\) for \(j=1,\dots,M\). If reverse is TRUE, then \(\eta_j = logit(P[Y=j+1|Y \leq j+1])\) will be used.

ynames

See multinomial for information.

zero

Can be an integer-valued vector specifying which linear/additive predictors are modelled as intercepts only. The values must be from the set {1,2,...,\(M\)}. The default value means none are modelled as intercept-only terms. See CommonVGAMffArguments for information.

Thresh, Trev, Tref, Intercept

See cumulative for information. These arguments apply to ordinal categorical regression models.

whitespace

See CommonVGAMffArguments for information.

Details

In this help file the response \(Y\) is assumed to be a factor with ordered values \(1,2,\dots,M+1\), so that \(M\) is the number of linear/additive predictors \(\eta_j\).

The literature contains several definitions of the continuation ratio. To avoid ambiguity, the VGAM package distinguishes continuation ratios (see cratio) from stopping ratios. Continuation ratios deal with quantities such as logitlink(P[Y>j|Y>=j]), whereas stopping ratios deal with quantities such as logitlink(P[Y=j|Y>=j]).
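Because each stopping ratio probability is conditional on \(Y \geq j\), the \(M+1\) category probabilities follow by multiplying it by \(P[Y \geq j] = \prod_{k<j} (1 - P[Y=k|Y \geq k])\). The following is a minimal base-R sketch of that conversion; the helper name sratio_to_probs is hypothetical and not part of VGAM, and the two input rows are taken from the untransformed predictions in the Examples section.

```r
# Hypothetical helper (not part of VGAM): convert stopping ratio probabilities
# p_j = P[Y = j | Y >= j], j = 1..M, into the M + 1 category probabilities.
sratio_to_probs <- function(sr) {
  sr <- as.matrix(sr)                 # rows = observations, columns = j = 1..M
  M <- ncol(sr)
  # P[Y >= j] = prod_{k < j} (1 - p_k); the leading 1 covers j = 1
  surv <- t(apply(cbind(1, 1 - sr), 1, cumprod))   # n x (M + 1)
  # P[Y = j] = P[Y >= j] * p_j for j <= M; the last category takes the rest
  probs <- cbind(surv[, 1:M, drop = FALSE] * sr, surv[, M + 1])
  colnames(probs) <- paste0("P[Y=", 1:(M + 1), "]")
  probs
}

# Two observations with M = 2 (values from rows 1 and 8 of the Examples output)
sr <- rbind(c(0.9905587, 0.9814886),
            c(0.3974671, 0.2500163))
probs <- sratio_to_probs(sr)
probs
rowSums(probs)   # each row should sum to 1
```

Pass a matrix with one row per observation; a bare vector would be read as several observations with \(M = 1\).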

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, rrvglm and vgam.

References

Agresti, A. (2013). Categorical Data Analysis, 3rd ed. Hoboken, NJ, USA: Wiley.

Boersch-Supan, P. H. (2021). Modeling insect phenology using ordinal regression and continuation ratio models. ReScience C, 7.1, 1–14.

McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. London: Chapman & Hall.

Tutz, G. (2012). Regression for Categorical Data. Cambridge: Cambridge University Press.

Yee, T. W. (2010). The VGAM package for categorical data analysis. Journal of Statistical Software, 32, 1–34. doi:10.18637/jss.v032.i10.

Author

Thomas W. Yee

Note

The response should be either a matrix of counts (with row sums that are all positive), or a factor. In both cases, the y slot returned by vglm/vgam/rrvglm is the matrix of counts.

For a nominal (unordered) factor response, the multinomial logit model (multinomial) is more appropriate.

As an example of the parallel argument, suppose there are covariates x1, x2 and x3. Then parallel = TRUE ~ x1 + x2 - 1 and parallel = FALSE ~ x3 are equivalent: both constrain the regression coefficients of x1 and of x2 to be equal across the \(M\) linear predictors, while the intercepts and the coefficients of x3 are allowed to differ.
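The two specifications can be compared on a small simulated data set. This is only a sketch: the data frame sim, its covariates and the cut points are hypothetical and generated purely for illustration, and the comparison assumes both fits converge.

```r
library(VGAM)

set.seed(1)
n <- 200
# Hypothetical covariates and a 3-level ordered response (illustration only)
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- cut(sim$x1 + sim$x2 + 0.5 * sim$x3 + rlogis(n),
             breaks = c(-Inf, -1, 1, Inf), ordered_result = TRUE)

# Parallelism for x1 and x2 only (the "- 1" excludes the intercepts)
fit_a <- vglm(y ~ x1 + x2 + x3,
              sratio(parallel = TRUE ~ x1 + x2 - 1), data = sim)
# Same model, specified by naming the non-parallel term instead
fit_b <- vglm(y ~ x1 + x2 + x3,
              sratio(parallel = FALSE ~ x3), data = sim)

constraints(fit_a)                    # inspect the constraint matrices
all.equal(coef(fit_a), coef(fit_b))   # the two fits should agree
```

In constraints(fit_a), a single column of ones for x1 and x2 indicates one shared coefficient, whereas an identity matrix for the intercept and x3 indicates a separate coefficient per linear predictor.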

Warning

No check is made to verify that the response is ordinal if the response is a matrix; see ordered.

Boersch-Supan (2021) considers a sparse data set (called budworm) and the numerical problems encountered when fitting models such as cratio, sratio and cumulative. Although links such as clogloglink have been improved, these family functions do not yet handle sparse data as well as they could.

Examples

pneumo <- transform(pneumo, let = log(exposure.time))
(fit <- vglm(cbind(normal, mild, severe) ~ let,
             sratio(parallel = TRUE), data = pneumo))
#> 
#> Call:
#> vglm(formula = cbind(normal, mild, severe) ~ let, family = sratio(parallel = TRUE), 
#>     data = pneumo)
#> 
#> 
#> Coefficients:
#> (Intercept):1 (Intercept):2           let 
#>      8.733797      8.051302     -2.321359 
#> 
#> Degrees of Freedom: 16 Total; 13 Residual
#> Residual deviance: 7.626763 
#> Log-likelihood: -26.39023 
coef(fit, matrix = TRUE)
#>             logitlink(P[Y=1|Y>=1]) logitlink(P[Y=2|Y>=2])
#> (Intercept)               8.733797               8.051302
#> let                      -2.321359              -2.321359
constraints(fit)
#> $`(Intercept)`
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    0    1
#> 
#> $let
#>      [,1]
#> [1,]    1
#> [2,]    1
#> 
predict(fit)
#>   logitlink(P[Y=1|Y>=1]) logitlink(P[Y=2|Y>=2])
#> 1              4.6531774              3.9706824
#> 2              2.4474398              1.7649448
#> 3              1.6117442              0.9292491
#> 4              1.0403809              0.3578859
#> 5              0.5822388             -0.1002563
#> 6              0.1997827             -0.4827124
#> 7             -0.1538548             -0.8363499
#> 8             -0.4160301             -1.0985252
predict(fit, untransform = TRUE)
#>   P[Y=1|Y>=1] P[Y=2|Y>=2]
#> 1   0.9905587   0.9814886
#> 2   0.9203740   0.8538279
#> 3   0.8336534   0.7169229
#> 4   0.7389235   0.5885286
#> 5   0.6415824   0.4749569
#> 6   0.5497802   0.3816118
#> 7   0.4616120   0.3023041
#> 8   0.3974671   0.2500163
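As a follow-up check on the output above, the untransformed stopping ratios can be combined into category probabilities and compared with the fitted values. This is a sketch that refits the same model so it runs on its own; it assumes fitted() returns the matrix of fitted category probabilities for this family.

```r
library(VGAM)

pneumo <- transform(pneumo, let = log(exposure.time))
fit <- vglm(cbind(normal, mild, severe) ~ let,
            sratio(parallel = TRUE), data = pneumo)

sr <- predict(fit, untransform = TRUE)   # P[Y=1|Y>=1] and P[Y=2|Y>=2]
p1 <- sr[, 1]                            # P[Y=1], since Y >= 1 always holds
p2 <- (1 - sr[, 1]) * sr[, 2]            # P[Y>=2] * P[Y=2|Y>=2]
p3 <- 1 - p1 - p2                        # remaining mass goes to Y = 3
probs <- cbind(normal = p1, mild = p2, severe = p3)

# Compare with the fitted probabilities, ignoring dimnames
all.equal(unname(probs), unname(fitted(fit)))
```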