Ordinal Regression with Cumulative Probabilities
cumulative.RdFits a cumulative link regression model to a (preferably ordered) factor response.
Usage
cumulative(link = "logitlink", parallel = FALSE,
reverse = FALSE, multiple.responses = FALSE,
ynames = FALSE, Thresh = NULL, Trev = reverse,
Tref = if (Trev) "M" else 1, Intercept = NULL,
whitespace = FALSE)Arguments
- link
Link function applied to the \(J\) cumulative probabilities. See
Linksfor more choices, e.g., for the cumulativeprobitlink/clogloglink/... models.
- parallel
A logical or formula specifying which terms have equal/unequal coefficients. See below for more information about the parallelism assumption. The default results in what some people call the generalized ordered logit model to be fitted. If
parallel = TRUEthen it does not apply to the intercept.The partial proportional odds model can be fitted by assigning this argument something like
parallel = TRUE ~ -1 + x3 + x5so that there is one regression coefficient forx3andx5. Equivalently, settingparallel = FALSE ~ 1 + x2 + x4means \(M\) regression coefficients for the intercept andx2andx4. It is important that the intercept is never parallel. SeeCommonVGAMffArgumentsfor more information.
- reverse
Logical. By default, the cumulative probabilities used are \(P(Y\leq 1)\), \(P(Y\leq 2)\), ..., \(P(Y\leq J)\). If
reverseisTRUEthen \(P(Y\geq 2)\), \(P(Y\geq 3)\), ..., \(P(Y\geq J+1)\) are used.- ynames
See
multinomialfor information.- multiple.responses
Logical. Multiple responses? If
TRUEthen the input should be a matrix with values \(1,2,\dots,L\), where \(L=J+1\) is the number of levels. Each column of the matrix is a response, i.e., multiple responses. A suitable matrix can be obtained fromCut.
- Thresh
Character. The choices concern constraint matrices applied to the intercepts. They can be constrained to be equally-spaced (equidistant) etc. See
CommonVGAMffArgumentsandconstraintsfor general information. Basically, the choice is pasted to the end of"CM."and that function is called. This means users can easily write their ownCM.-type function.If equally-spaced then the direction and the reference level are controlled by
TrevandTref, and the constraint matrix will be \(M\) by 2, with the second column corresponding to the distance between the thresholds.If
"symm1"then the fitted intercepts are symmetric about the median (\(M\) odd) intercept. If \(M\) is even then the median is the mean of the two most inner and adjacent intercepts. For this,CM.symm1is used to construct the appropriate constraint matrix.If
"symm0"then the median intercept is 0 by definition and the symmetry occurs about the origin. Thus the intercepts comprise pairs that differ by sign only. The appropriate constraint matrix is as with"symm1"but with the first column deleted. The choices"symm1"and"symm0"are effectively equivalent to"symmetric"and"symmetric2"respectively in ordinal.For
"qnorm"thenCM.qnormuses theqnorm((1:M)/(M+1))quantiles of the standard normal.- Trev, Tref
Support arguments for
Threshfor equally-spaced intercepts. The logical argumentTrevis applied first to give the direction (i.e., ascending or descending) before rowTref(ultimately numeric) of the first (intercept) constraint matrix is set to the reference level. Seeconstraintsfor information.- Intercept
Another support argument for
Thresh: does the constraint matrix include an intercept? TheNULLdefault means to use the default of the"CM."function.- whitespace
See
CommonVGAMffArgumentsfor information.
Details
In this help file the response \(Y\) is assumed
to be a factor with ordered values \(1,2,\dots,J+1\).
Hence \(M\) is the number of linear/additive
predictors \(\eta_j\);
for cumulative() one has \(M=J\).
This VGAM family function fits the class of
cumulative link models to (hopefully)
an ordinal response.
By default,
the non-parallel cumulative logit model
is fitted, i.e.,
$$\eta_j = logit(P[Y \leq j])$$
where \(j=1,2,\dots,M\) and
the \(\eta_j\) are not constrained to
be parallel.
This is also known as the
non-proportional odds model.
If the logit link is replaced by a
complementary log-log link
(clogloglink) then
this is known as the proportional-hazards model.
In almost all the literature, the constraint matrices
associated with this family of models are known.
For example, setting
parallel = TRUE will make all constraint
matrices
(except for the intercept) equal to a vector
of \(M\) 1's.
If the constraint matrices are equal, unknown and to
be estimated,
then this can be achieved by fitting the model as a
reduced-rank vector generalized
linear model (RR-VGLM; see rrvglm).
Currently,
reduced-rank vector generalized additive models
(RR-VGAMs) have not been implemented here.
Value
An object of class "vglmff"
(see vglmff-class).
The object is used by modelling functions
such as vglm,
and vgam.
References
Agresti, A. (2013). Categorical Data Analysis, 3rd ed. Hoboken, NJ, USA: Wiley.
Agresti, A. (2010). Analysis of Ordinal Categorical Data, 2nd ed. Hoboken, NJ, USA: Wiley.
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. London: Chapman & Hall.
Pujol-Rigol, S., Fernandez, D. and Casals, M. (2025). A systematic review and comparative study of R packages for ordinal response regression models. WIREs Computational Statistics, 17, e70025.
Tutz, G. (2012). Regression for Categorical Data, Cambridge: Cambridge University Press.
Tutz, G. and Berger, M. (2022). Sparser ordinal regression models based on parametric and additive location-shift approaches. International Statistical Review, 90, 306–327. doi:10.1111/insr.12484 .
Yee, T. W. (2010). The VGAM package for categorical data analysis. Journal of Statistical Software, 32, 1–34. doi:10.18637/jss.v032.i10 .
Yee, T. W. and Wild, C. J. (1996). Vector generalized additive models. Journal of the Royal Statistical Society, Series B, Methodological, 58, 481–493.
Note
The response should be either a matrix of counts
(with row sums that
are all positive), or a factor. In both cases,
the y slot
returned by
vglm/vgam/rrvglm
is the matrix
of counts.
The formula must contain an intercept term.
Other VGAM family functions for an ordinal
response include
acat,
cratio,
sratio.
For a nominal (unordered) factor response, the
multinomial logit model (multinomial)
is more appropriate.
With the logit link, setting parallel =
TRUE will fit a proportional odds model. Note
that the TRUE here does not apply to
the intercept term. In practice, the validity
of the proportional odds assumption
needs to be checked, e.g., by a likelihood
ratio test (LRT). If acceptable on the data,
then numerical problems are less likely
to occur during the fitting, and there are
less parameters. Numerical problems occur
when the linear/additive predictors cross,
which results in probabilities outside of
\((0,1)\); setting parallel = TRUE
will help avoid this problem.
Here is an example of the usage of the parallel
argument.
If there are covariates x2, x3 and
x4, then
parallel = TRUE ~ x2 + x3 -1 and
parallel = FALSE ~ x4 are equivalent.
This would constrain the regression coefficients
for x2 and x3 to be equal;
those of the intercepts and x4 would be
different.
If the data is inputted in long format
(not wide format, as in pneumo
below)
and the self-starting initial values are not
good enough then try using
mustart,
coefstart and/or
etatstart.
See the example below.
To fit the proportional odds model one can use the
VGAM family function propodds.
Note that propodds(reverse) is equivalent to
cumulative(parallel = TRUE, reverse = reverse)
(which is equivalent to
cumulative(parallel =
TRUE, reverse = reverse, link = "logitlink")).
It is for convenience only. A call to
cumulative() is preferred since it reminds
the user that a parallelism assumption is made,
as well as being a lot more flexible.
Category specific effects may be modelled using
the xij-facility; see
vglm.control and fill1.
With most Threshold choices,
the first few fitted regression coefficients
need care in their interpretation. For example,
some values could be the distance away from
the median intercept. Typing something
like constraints(fit)[[1]] gives the
constraint matrix of the intercept term.
Warning
No check is made to verify that the response is ordinal
if the response is a matrix;
see ordered.
Boersch-Supan (2021) looks at sparse data and
the numerical problems that result;
see sratio.
Examples
# Proportional odds model (p.179) of McCullagh and Nelder (1989)
pneumo <- transform(pneumo, let = log(exposure.time))
(fit <- vglm(cbind(normal, mild, severe) ~ let,
cumulative(parallel = TRUE, reverse = TRUE), pneumo))
#>
#> Call:
#> vglm(formula = cbind(normal, mild, severe) ~ let, family = cumulative(parallel = TRUE,
#> reverse = TRUE), data = pneumo)
#>
#>
#> Coefficients:
#> (Intercept):1 (Intercept):2 let
#> -9.676093 -10.581725 2.596807
#>
#> Degrees of Freedom: 16 Total; 13 Residual
#> Residual deviance: 5.026826
#> Log-likelihood: -25.09026
depvar(fit) # Sample proportions (good technique)
#> normal mild severe
#> 1 1.0000000 0.00000000 0.00000000
#> 2 0.9444444 0.03703704 0.01851852
#> 3 0.7906977 0.13953488 0.06976744
#> 4 0.7291667 0.10416667 0.16666667
#> 5 0.6274510 0.19607843 0.17647059
#> 6 0.6052632 0.18421053 0.21052632
#> 7 0.4285714 0.21428571 0.35714286
#> 8 0.3636364 0.18181818 0.45454545
fit@y # Sample proportions (bad technique)
#> normal mild severe
#> 1 1.0000000 0.00000000 0.00000000
#> 2 0.9444444 0.03703704 0.01851852
#> 3 0.7906977 0.13953488 0.06976744
#> 4 0.7291667 0.10416667 0.16666667
#> 5 0.6274510 0.19607843 0.17647059
#> 6 0.6052632 0.18421053 0.21052632
#> 7 0.4285714 0.21428571 0.35714286
#> 8 0.3636364 0.18181818 0.45454545
weights(fit, type = "prior") # Number of observations
#> [,1]
#> 1 98
#> 2 54
#> 3 43
#> 4 48
#> 5 51
#> 6 38
#> 7 28
#> 8 11
coef(fit, matrix = TRUE)
#> logitlink(P[Y>=2]) logitlink(P[Y>=3])
#> (Intercept) -9.676093 -10.581725
#> let 2.596807 2.596807
constraints(fit) # Constraint matrices
#> $`(Intercept)`
#> [,1] [,2]
#> [1,] 1 0
#> [2,] 0 1
#>
#> $let
#> [,1]
#> [1,] 1
#> [2,] 1
#>
apply(fitted(fit), 1, which.max) # Classification
#> 1 2 3 4 5 6 7 8
#> 1 1 1 1 1 1 1 3
apply(predict(fit, newdata = pneumo, type = "response"),
1, which.max) # Classification
#> 1 2 3 4 5 6 7 8
#> 1 1 1 1 1 1 1 3
R2latvar(fit)
#> [1] 0.5142885
# Check that the model is linear in let ----------------------
fit2 <- vgam(cbind(normal, mild, severe) ~ s(let, df = 2),
cumulative(reverse = TRUE), data = pneumo)
if (FALSE) { # \dontrun{
plot(fit2, se = TRUE, overlay = TRUE, lcol = 1:2, scol = 1:2) } # }
# Check the proportional odds assumption with a LRT -----
(fit3 <- vglm(cbind(normal, mild, severe) ~ let,
cumulative(parallel = FALSE, reverse = TRUE), pneumo))
#>
#> Call:
#> vglm(formula = cbind(normal, mild, severe) ~ let, family = cumulative(parallel = FALSE,
#> reverse = TRUE), data = pneumo)
#>
#>
#> Coefficients:
#> (Intercept):1 (Intercept):2 let:1 let:2
#> -9.593308 -11.104791 2.571300 2.743550
#>
#> Degrees of Freedom: 16 Total; 12 Residual
#> Residual deviance: 4.884404
#> Log-likelihood: -25.01905
pchisq(2 * (logLik(fit3) - logLik(fit)), df =
length(coef(fit3)) - length(coef(fit)), lower.tail = FALSE)
#> [1] 0.7058849
lrtest(fit3, fit) # More elegant
#> Likelihood ratio test
#>
#> Model 1: cbind(normal, mild, severe) ~ let
#> Model 2: cbind(normal, mild, severe) ~ let
#> #Df LogLik Df Chisq Pr(>Chisq)
#> 1 12 -25.019
#> 2 13 -25.090 1 0.1424 0.7059
# A factor() version of fit ----------------------------
# This is in long format (cf. wide format above)
Nobs <- round(depvar(fit) * c(weights(fit, type = "prior")))
sumNobs <- colSums(Nobs) # apply(Nobs, 2, sum)
pneumo.long <-
data.frame(symptoms = ordered(rep(rep(colnames(Nobs), nrow(Nobs)),
times = c(t(Nobs))),
levels = colnames(Nobs)),
let = rep(rep(with(pneumo, let), each = ncol(Nobs)),
times = c(t(Nobs))))
with(pneumo.long, table(let, symptoms)) # Should be same as pneumo
#> symptoms
#> let normal mild severe
#> 1.75785791755237 98 0 0
#> 2.70805020110221 51 2 1
#> 3.06805293513362 34 6 3
#> 3.31418600467253 35 5 8
#> 3.51154543883102 32 10 9
#> 3.67630067190708 23 7 8
#> 3.8286413964891 12 6 10
#> 3.94158180766969 4 2 5
(fit.long1 <- vglm(symptoms ~ let, data = pneumo.long, trace = TRUE,
cumulative(parallel = TRUE, reverse = TRUE)))
#> Iteration 1: deviance = 428.98474
#> Iteration 2: deviance = 411.73439
#> Iteration 3: deviance = 408.69065
#> Iteration 4: deviance = 408.54867
#> Iteration 5: deviance = 408.54833
#> Iteration 6: deviance = 408.54833
#>
#> Call:
#> vglm(formula = symptoms ~ let, family = cumulative(parallel = TRUE,
#> reverse = TRUE), data = pneumo.long, trace = TRUE)
#>
#>
#> Coefficients:
#> (Intercept):1 (Intercept):2 let
#> -9.676092 -10.581724 2.596806
#>
#> Degrees of Freedom: 742 Total; 739 Residual
#> Residual deviance: 408.5483
#> Log-likelihood: -204.2742
coef(fit.long1, matrix = TRUE) # cf. coef(fit, matrix = TRUE)
#> logitlink(P[Y>=2]) logitlink(P[Y>=3])
#> (Intercept) -9.676092 -10.581724
#> let 2.596806 2.596806
# Could try using mustart if fit.long1 failed to converge.
mymustart <- matrix(sumNobs / sum(sumNobs),
nrow(pneumo.long), ncol(Nobs), byrow = TRUE)
fit.long2 <- vglm(symptoms ~ let, mustart = mymustart,
cumulative(parallel = TRUE, reverse = TRUE),
data = pneumo.long, trace = TRUE)
#> Iteration 1: deviance = 428.98474
#> Iteration 2: deviance = 411.73439
#> Iteration 3: deviance = 408.69065
#> Iteration 4: deviance = 408.54867
#> Iteration 5: deviance = 408.54833
#> Iteration 6: deviance = 408.54833
coef(fit.long2, matrix = TRUE) # cf. coef(fit, matrix = TRUE)
#> logitlink(P[Y>=2]) logitlink(P[Y>=3])
#> (Intercept) -9.676092 -10.581724
#> let 2.596806 2.596806