Cumulative link models
A new improved implementation of CLMs is available in clm.
Fits cumulative link models with an additive model for the location and a multiplicative model for the scale. The function allows for structured thresholds. A popular special case of a CLM is the proportional odds model. In addition to the standard link functions, two flexible link functions, "Aranda-Ordaz" and "log-gamma", are available, where an extra link function parameter provides additional flexibility. A subset of the predictors can be allowed to have nominal rather than ordinal effects. This has been termed "partial proportional odds" when the link is the logistic.
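For instance, a proportional odds model and its location-scale extension can be fitted as in this minimal sketch, assuming the soup data shipped with the ordinal package:

```r
library(ordinal)  ## provides clm2 and the soup data

## Proportional odds model: additive location effects, logistic link
fm <- clm2(SURENESS ~ PROD, data = soup, link = "logistic")

## Location-scale model: let the scale also depend on PROD
fm.ls <- update(fm, scale = ~ PROD)

## Compare the two fits by a likelihood ratio test
anova(fm, fm.ls)
```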
Arguments
- location
a formula expression as for regression models, of the form
response ~ predictors. The response should be a factor (preferably an ordered factor), which will be interpreted as an ordinal response with levels ordered as in the factor. The model must have an intercept: attempts to remove one will lead to a warning and will be ignored. An offset may be used. See the documentation of formula for other details.
- scale
an optional formula expression as for the location part, of the form
~ predictors, i.e. with an empty left hand side. An offset may be used. See the documentation of formula for other details.
- nominal
an optional formula of the form
~ predictors, i.e. with an empty left hand side. The effects of the predictors in this formula are assumed to be nominal.
- data
an optional data frame in which to interpret the variables occurring in the formulas.
- weights
optional case weights in fitting. Defaults to 1.
- start
initial values for the parameters in the format
c(alpha, beta, log(zeta), lambda).
- subset
expression saying which subset of the rows of the data should be used in the fit. All observations are included by default.
- na.action
a function to filter missing data. Applies to terms in all three formulae.
- contrasts
a list of contrasts to be used for some or all of the factors appearing as variables in the model formula.
- Hess
logical for whether the Hessian (the inverse of the observed information matrix) should be computed. Use
Hess = TRUE if you intend to call summary or vcov on the fit and Hess = FALSE in all other instances to save computing time. The argument is ignored if method = "Newton", where the Hessian is always computed and returned. Defaults to TRUE.
- model
logical for whether the model frames should be part of the returned object.
- link
link function, i.e. the type of location-scale distribution assumed for the latent distribution. The
Aranda-Ordaz and log-gamma links add additional flexibility with a link function parameter, lambda. The Aranda-Ordaz link (Aranda-Ordaz, 1983) equals the logistic link when lambda = 1 and approaches the loglog link as lambda approaches zero. The log-gamma link (Genter and Farewell, 1985) equals the loglog link when lambda = 1, the probit link when lambda = 0 and the cloglog link when lambda = -1.
- lambda
numerical scalar: the link function parameter. Used in combination with link
Aranda-Ordaz or log-gamma and otherwise ignored. If lambda is specified, the model is estimated with lambda fixed at this value; otherwise lambda is estimated by ML. For Aranda-Ordaz, lambda has to be positive; > 1e-5 for numerical reasons.
- doFit
logical for whether the model should be fit or the model environment should be returned.
- control
a call to
clm2.control.
- threshold
specifies a potential structure for the thresholds (cut-points).
"flexible"provides the standard unstructured thresholds,"symmetric"restricts the distance between the thresholds to be symmetric around the central one or two thresholds for odd or equal numbers or thresholds respectively, and"equidistant"restricts the distance between consecutive thresholds to the same value.- ...
additional arguments are passed on to
clm2.control and possibly further on to the optimizer, which can lead to surprising error or warning messages when arguments are mistyped, etc.
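The relations among the flexible links described above can be checked numerically. The following sketch (again assuming the soup data from the ordinal package) fixes lambda at 1 for the Aranda-Ordaz link, which should reproduce the logistic fit up to numerical accuracy, and then lets lambda be estimated by ML:

```r
library(ordinal)  ## provides clm2 and the soup data

fm.log <- clm2(SURENESS ~ PROD, data = soup, link = "logistic")

## Aranda-Ordaz with lambda fixed at 1 reduces to the logistic link,
## so the log-likelihoods should agree up to numerical accuracy:
fm.ao1 <- update(fm.log, link = "Aranda-Ordaz", lambda = 1)
all.equal(as.numeric(logLik(fm.log)), as.numeric(logLik(fm.ao1)))

## Leaving lambda unspecified estimates it by ML:
fm.ao <- update(fm.log, link = "Aranda-Ordaz")
fm.ao$lambda
```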
Details
There are methods for the standard model-fitting functions, including
summary, vcov,
predict,
anova, logLik,
profile,
plot.profile,
confint,
update,
dropterm,
addterm, and an extractAIC method.
The design of the implementation is inspired by an idea proposed by Douglas Bates in the talk "Exploiting sparsity in model matrices", presented at the DSC conference in Copenhagen, July 14, 2009. Basically, an environment is set up with all the information needed to optimize the likelihood function. Extractor functions are then used to get the value of the likelihood at current or given parameter values and to extract the current values of the parameters. All computations are performed inside the environment, and relevant variables are updated during the fitting process. After optimizer termination, the relevant variables are extracted from the environment and the rest are discarded.
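This environment is directly accessible via the doFit argument, as in the following sketch (assuming the soup data from the ordinal package):

```r
library(ordinal)  ## provides clm2 and the soup data

## doFit = FALSE returns the model environment instead of a fit:
env <- clm2(SURENESS ~ PROD, data = soup, doFit = FALSE)
is.environment(env)  ## TRUE

## The environment holds the objects used during optimization;
## inspect their names with ls():
ls(env)
```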
Some aspects of clm2, for instance, how starting values are
obtained, and of the associated methods are
inspired by polr from package MASS.
Value
If doFit = FALSE the result is an environment
representing the model ready to be optimized.
If doFit = TRUE the result is an
object of class "clm2" with the following components:
- beta
the parameter estimates of the location part.
- zeta
the parameter estimates of the scale part on the log scale; the scale parameter estimates on the original scale are given by
exp(zeta).
- Alpha
vector or matrix of the threshold parameters.
- Theta
vector or matrix of the thresholds.
- xi
vector of threshold parameters, which, given a threshold function (e.g.
"equidistant"), and possible nominal effects define the class boundaries,Theta.- lambda
the value of lambda if lambda is supplied or estimated, otherwise missing.
- coefficients
the coefficients of the intercepts (
theta), the location (beta), the scale (zeta), and the link function parameter (lambda).
- df.residual
the number of residual degrees of freedom, calculated using the weights.
- fitted.values
vector of fitted values for each observation. An observation here is each of the scalar elements of the multinomial table and not a multinomial vector.
- convergence
TRUE if the gradient based convergence criterion is met and FALSE otherwise.
- gradient
vector of gradients for all the parameters at termination of the optimizer.
- optRes
list with results from the optimizer. The contents of the list depend on the choice of optimizer.
- logLik
the log likelihood of the model at optimizer termination.
- Hessian
if the model was fitted with
Hess = TRUE, this is the Hessian matrix of the parameters at the optimum.
- scale
model.frame for the scale model.
- location
model.frame for the location model.
- nominal
model.frame for the nominal model.
- edf
the (effective) number of degrees of freedom used by the model.
- start
the starting values.
- convTol
convergence tolerance for the maximum absolute gradient of the parameters at termination of the optimizer.
- method
character, the optimizer.
- y
the response variable.
- lev
the names of the levels of the response variable.
- nobs
the (effective) number of observations, calculated as the sum of the weights.
- threshold
character, the threshold function used in the model.
- estimLambda
1 if lambda is estimated in one of the flexible link functions and 0 otherwise.
- link
character, the link function used in the model.
- call
the matched call.
- contrasts
contrasts applied to terms in location and scale models.
- na.action
the function used to filter missing data.
References
Agresti, A. (2002) Categorical Data Analysis. Second edition. Wiley.
Aranda-Ordaz, F. J. (1983) An Extension of the Proportional-Hazards Model for Grouped Data. Biometrics, 39, 109-117.
Genter, F. C. and Farewell, V. T. (1985) Goodness-of-link testing in ordinal regression models. The Canadian Journal of Statistics, 13(1), 37-44.
Christensen, R. H. B., Cleaver, G. and Brockhoff, P. B. (2011) Statistical and Thurstonian models for the A-not A protocol with and without sureness. Food Quality and Preference, 22, 542-549.
Examples
options(contrasts = c("contr.treatment", "contr.poly"))
## A tabular data set:
(tab26 <- with(soup, table("Product" = PROD, "Response" = SURENESS)))
#> Response
#> Product 1 2 3 4 5 6
#> Ref 132 161 65 41 121 219
#> Test 96 99 50 57 156 650
dimnames(tab26)[[2]] <- c("Sure", "Not Sure", "Guess", "Guess", "Not Sure", "Sure")
dat26 <- expand.grid(sureness = as.factor(1:6), prod = c("Ref", "Test"))
dat26$wghts <- c(t(tab26))
m1 <- clm2(sureness ~ prod, scale = ~prod, data = dat26,
weights = wghts, link = "logistic")
## print, summary, vcov, logLik, AIC:
m1
#> Call:
#> clm2(location = sureness ~ prod, scale = ~prod, data = dat26,
#> weights = wghts, link = "logistic")
#>
#> Location coefficients:
#> prodTest
#> 1.295878
#>
#> Scale coefficients:
#> prodTest
#> 0.1479862
#>
#> Threshold coefficients:
#> 1|2 2|3 3|4 4|5 5|6
#> -1.4912570 -0.4521846 -0.1072083 0.1633653 0.8829135
#>
#> log-likelihood: -2687.745
#> AIC: 5389.489
summary(m1)
#> Call:
#> clm2(location = sureness ~ prod, scale = ~prod, data = dat26,
#> weights = wghts, link = "logistic")
#>
#> Location coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> prodTest 1.2959 0.1190 10.8868 < 2.22e-16
#>
#> Scale coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> prodTest 0.1480 0.0651 2.2731 0.023022
#>
#> Threshold coefficients:
#> Estimate Std. Error z value
#> 1|2 -1.4913 0.0922 -16.1828
#> 2|3 -0.4522 0.0718 -6.2957
#> 3|4 -0.1072 0.0700 -1.5326
#> 4|5 0.1634 0.0703 2.3253
#> 5|6 0.8829 0.0796 11.0959
#>
#> log-likelihood: -2687.745
#> AIC: 5389.489
#> Condition number of Hessian: 103.9533
vcov(m1)
#> 1|2 2|3 3|4 4|5 5|6
#> 1|2 0.0084917522 0.0046258800 0.0038492232 0.0033160193 0.002036182
#> 2|3 0.0046258800 0.0051587119 0.0044974259 0.0040993160 0.003352337
#> 3|4 0.0038492232 0.0044974259 0.0048935909 0.0045281978 0.003948063
#> 4|5 0.0033160193 0.0040993160 0.0045281978 0.0049357376 0.004489068
#> 5|6 0.0020361818 0.0033523369 0.0039480629 0.0044890684 0.006331536
#> prodTest 0.0009111872 0.0031074309 0.0039832980 0.0047389737 0.007064841
#> prodTest -0.0024312131 -0.0007825927 -0.0001820518 0.0003389766 0.001989991
#> prodTest prodTest
#> 1|2 0.0009111872 -0.0024312131
#> 2|3 0.0031074309 -0.0007825927
#> 3|4 0.0039832980 -0.0001820518
#> 4|5 0.0047389737 0.0003389766
#> 5|6 0.0070648413 0.0019899908
#> prodTest 0.0141687265 0.0045752918
#> prodTest 0.0045752918 0.0042385586
logLik(m1)
#> 'log Lik.' -2687.745 (df=7)
AIC(m1)
#> [1] 5389.489
coef(m1)
#> 1|2 2|3 3|4 4|5 5|6 prodTest prodTest
#> -1.4912570 -0.4521846 -0.1072083 0.1633653 0.8829135 1.2958776 0.1479862
coef(summary(m1))
#> Estimate Std. Error z value Pr(>|z|)
#> 1|2 -1.4912570 0.09215070 -16.182807 6.668397e-59
#> 2|3 -0.4521846 0.07182417 -6.295716 3.059834e-10
#> 3|4 -0.1072083 0.06995421 -1.532550 1.253868e-01
#> 4|5 0.1633653 0.07025480 2.325325 2.005457e-02
#> 5|6 0.8829135 0.07957095 11.095927 1.312915e-28
#> prodTest 1.2958776 0.11903246 10.886758 1.333011e-27
#> prodTest 0.1479862 0.06510421 2.273066 2.302222e-02
## link functions:
m2 <- update(m1, link = "probit")
m3 <- update(m1, link = "cloglog")
m4 <- update(m1, link = "loglog")
m5 <- update(m1, link = "cauchit", start = coef(m1))
m6 <- update(m1, link = "Aranda-Ordaz", lambda = 1)
m7 <- update(m1, link = "Aranda-Ordaz")
#> Changing to nlminb optimizer to accommodate optimization with bounds
#> Warning: clm2 may not have converged:
#> optimizer nlminb terminated with max|gradient|: 0.00131668415215813
m8 <- update(m1, link = "log-gamma", lambda = 1)
m9 <- update(m1, link = "log-gamma")
## nominal effects:
mN1 <- clm2(sureness ~ 1, nominal = ~ prod, data = dat26,
weights = wghts, link = "logistic")
anova(m1, mN1)
#> Likelihood ratio tests of cumulative link models
#>
#> Response: sureness
#> Model Resid. df -2logLik Test Df LR stat. Pr(Chi)
#> 1 prod | prod | 1840 5375.489
#> 2 1 | | prod 1837 5370.114 1 vs 2 3 5.375473 0.1462793
## optimizer / method:
update(m1, scale = ~ 1, method = "Newton")
#> Call:
#> clm2(location = sureness ~ prod, scale = ~1, data = dat26, weights = wghts,
#> link = "logistic", method = "Newton")
#>
#> Location coefficients:
#> prodTest
#> 1.144436
#>
#> No Scale coefficients
#>
#> Threshold coefficients:
#> 1|2 2|3 3|4 4|5 5|6
#> -1.4050044 -0.4247426 -0.1012663 0.1507757 0.8125538
#>
#> log-likelihood: -2690.332
#> AIC: 5392.664
update(m1, scale = ~ 1, method = "nlminb")
#> Warning: clm2 may not have converged:
#> optimizer nlminb terminated with max|gradient|: 0.00149784675352294
#> Call:
#> clm2(location = sureness ~ prod, scale = ~1, data = dat26, weights = wghts,
#> link = "logistic", method = "nlminb")
#>
#> Location coefficients:
#> prodTest
#> 1.144437
#>
#> No Scale coefficients
#>
#> Threshold coefficients:
#> 1|2 2|3 3|4 4|5 5|6
#> -1.4050069 -0.4247426 -0.1012651 0.1507768 0.8125536
#>
#> log-likelihood: -2690.332
#> AIC: 5392.664
update(m1, scale = ~ 1, method = "optim")
#> Warning: clm2 may not have converged:
#> optimizer optim terminated with max|gradient|: 0.00274642688958693
#> Call:
#> clm2(location = sureness ~ prod, scale = ~1, data = dat26, weights = wghts,
#> link = "logistic", method = "optim")
#>
#> Location coefficients:
#> prodTest
#> 1.144436
#>
#> No Scale coefficients
#>
#> Threshold coefficients:
#> 1|2 2|3 3|4 4|5 5|6
#> -1.4050057 -0.4247413 -0.1012670 0.1507745 0.8125535
#>
#> log-likelihood: -2690.332
#> AIC: 5392.664
## threshold functions
mT1 <- update(m1, threshold = "symmetric")
mT2 <- update(m1, threshold = "equidistant")
anova(m1, mT1, mT2)
#> Likelihood ratio tests of cumulative link models
#>
#> Response: sureness
#> Model Resid. df -2logLik Test Df LR stat. Pr(Chi)
#> 1 prod | prod | 1843 5577.566
#> 2 prod | prod | 1842 5397.760 1 vs 2 1 179.8062 0.000000e+00
#> 3 prod | prod | 1840 5375.489 2 vs 3 2 22.2708 1.458668e-05
## Extend example from polr in package MASS:
## Fit model from polr example:
if(require(MASS)) {
fm1 <- clm2(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
fm1
summary(fm1)
## With probit link:
summary(update(fm1, link = "probit"))
## Allow scale to depend on Cont-variable
summary(fm2 <- update(fm1, scale =~ Cont))
anova(fm1, fm2)
## which seems to improve the fit
}
#> Likelihood ratio tests of cumulative link models
#>
#> Response: Sat
#> Model Resid. df -2logLik Test Df LR stat.
#> 1 Infl + Type + Cont | | 1673 3479.149
#> 2 Infl + Type + Cont | Cont | 1672 3473.493 1 vs 2 1 5.655882
#> Pr(Chi)
#> 1
#> 2 0.01739692
#################################
## It is possible to fit multinomial models (i.e. with nominal
## effects) as the following example shows:
if(require(nnet)) {
(hous1.mu <- multinom(Sat ~ 1, weights = Freq, data = housing))
(hous1.clm <- clm2(Sat ~ 1, weights = Freq, data = housing))
## It is the same likelihood:
all.equal(logLik(hous1.mu), logLik(hous1.clm))
## and the same fitted values:
fitHous.mu <-
t(fitted(hous1.mu))[t(col(fitted(hous1.mu)) == unclass(housing$Sat))]
all.equal(fitted(hous1.clm), fitHous.mu)
## The coefficients of multinom can be retrieved from the clm2-object
## by:
Pi <- diff(c(0, plogis(hous1.clm$xi), 1))
log(Pi[2:3]/Pi[1])
## A larger model with explanatory variables:
(hous.mu <- multinom(Sat ~ Infl + Type + Cont, weights = Freq, data = housing))
(hous.clm <- clm2(Sat ~ 1, nominal = ~ Infl + Type + Cont, weights = Freq,
data = housing))
## Almost the same likelihood:
all.equal(logLik(hous.mu), logLik(hous.clm))
## And almost the same fitted values:
fitHous.mu <-
t(fitted(hous.mu))[t(col(fitted(hous.mu)) == unclass(housing$Sat))]
all.equal(fitted(hous.clm), fitHous.mu)
all.equal(round(fitted(hous.clm), 5), round(fitHous.mu, 5))
}
#> Loading required package: nnet
#> # weights: 6 (2 variable)
#> initial value 1846.767257
#> final value 1824.438811
#> converged
#> # weights: 24 (14 variable)
#> initial value 1846.767257
#> iter 10 value 1747.045232
#> final value 1735.041933
#> converged
#> [1] TRUE