Ordinal Poisson Family Function

Fits a Poisson regression where the response is ordinal (the Poisson counts are grouped between known cutpoints).

Usage

ordpoisson(cutpoints, countdata = FALSE, NOS = NULL,
           Levels = NULL, init.mu = NULL, parallel = FALSE,
           zero = NULL, link = "loglink")

Arguments

cutpoints

Numeric. The cutpoints, \(K_l\). These must be non-negative integers. Inf values may be included. See below for further details.

countdata

Logical. Is the response (LHS of formula) in count-data format? If not then the response is a matrix or vector with values 1, 2, ..., L, say, where L is the number of levels. Such input can be generated with cut with argument labels = FALSE. If countdata = TRUE then the response is expected to be in the same format as depvar(fit) where fit is a fitted model with ordpoisson as the VGAM family function. That is, the response is matrix of counts with L columns (if NOS = 1).

NOS

Integer. The number of species, or more generally, the number of response random variates. This argument must be specified when countdata = TRUE. Usually NOS = 1.

Levels

Integer vector, recycled to length NOS if necessary. The number of levels for each response random variate. This argument should agree with cutpoints. This argument must be specified when countdata = TRUE.

init.mu

Numeric. Initial values for the means of the Poisson regressions. Recycled to length NOS if necessary. Use this argument if the default initial values fail (the default is to compute an initial value internally).

parallel, zero, link

See poissonff. See CommonVGAMffArguments for information.

Details

This VGAM family function uses maximum likelihood estimation (Fisher scoring) to fit a Poisson regression to each column of a matrix response. The data, however, is ordinal, and is obtained from known integer cutpoints. Here, \(l=1,\ldots,L\) where \(L\) (\(L \geq 2\)) is the number of levels. In more detail, let \(Y^*=l\) if \(K_{l-1} < Y \leq K_{l}\) where the \(K_l\) are the cutpoints. We have \(K_0=-\infty\) and \(K_L=\infty\). The response for this family function corresponds to \(Y^*\) but we are really interested in the Poisson regression of \(Y\).

If NOS=1 then the argument cutpoints is a vector \((K_1,K_2,\ldots,K_L)\) where the last value (Inf) is optional. If NOS>1 then the vector should have NOS-1 Inf values separating the cutpoints. For example, if there are NOS=3 responses, then something like ordpoisson(cut = c(0, 5, 10, Inf, 20, 30, Inf, 0, 10, 40, Inf)) is valid.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm and vgam.

References

Yee, T. W. (2020). Ordinal ordination with normalizing link functions for count data, (in preparation).

Author

Thomas W. Yee

Note

Sometimes there are no observations between two cutpoints. If so, the arguments Levels and NOS need to be specified too. See below for an example.

Warning

The input requires care as little to no checking is done. If fit is the fitted object, have a look at fit@extra and depvar(fit) to check.

Examples

set.seed(123)  # Example 1
x2 <- runif(n <- 1000); x3 <- runif(n)
mymu <- exp(3 - 1 * x2 + 2 * x3)
y1 <- rpois(n, lambda = mymu)
cutpts <- c(-Inf, 20, 30, Inf)
fcutpts <- cutpts[is.finite(cutpts)]  # finite cutpoints
ystar <- cut(y1, breaks = cutpts, labels = FALSE)
if (FALSE) { # \dontrun{
plot(x2, x3, col = ystar, pch = as.character(ystar))
} # }
table(ystar) / sum(table(ystar))
#> ystar
#>     1     2     3 
#> 0.260 0.194 0.546 
fit <- vglm(ystar ~ x2 + x3, fam = ordpoisson(cutpoi = fcutpts))
head(depvar(fit))  # This can be input if countdata = TRUE
#>      [,1] [,2] [,3]
#> [1,]    1    0    0
#> [2,]    0    1    0
#> [3,]    1    0    0
#> [4,]    0    0    1
#> [5,]    0    0    1
#> [6,]    0    0    1
head(fitted(fit))
#>         mu
#> 1 26.37788
#> 2 29.70400
#> 3 18.82698
#> 4 44.47479
#> 5 41.56519
#> 6 49.54660
head(predict(fit))
#>      loglink(mu)
#> [1,]    3.272526
#> [2,]    3.391282
#> [3,]    2.935291
#> [4,]    3.794923
#> [5,]    3.727263
#> [6,]    3.902914
coef(fit, matrix = TRUE)
#>             loglink(mu)
#> (Intercept)   3.0324949
#> x2           -0.9879523
#> x3            1.9155716
fit@extra
#> $NOS
#> [1] 1
#> 
#> $Levels
#> [1] 3
#> 
#> $y.integer
#> [1] TRUE
#> 
#> $ncoly
#> [1] 3
#> 
#> $countdata
#> [1] FALSE
#> 
#> $cutpoints
#> [1]  20  30 Inf
#> 
#> $n
#> [1] 1000
#> 

# Example 2: multivariate and there are no obsns between some cutpoints
cutpts2 <- c(-Inf, 0, 9, 10, 20, 70, 200, 201, Inf)
fcutpts2 <- cutpts2[is.finite(cutpts2)]  # finite cutpoints
y2 <- rpois(n, lambda = mymu)   # Same model as y1
ystar2 <- cut(y2, breaks = cutpts2, labels = FALSE)
table(ystar2) / sum(table(ystar2))
#> ystar2
#>     2     3     4     5     6 
#> 0.037 0.029 0.214 0.571 0.149 
fit <- vglm(cbind(ystar,ystar2) ~ x2 + x3, fam =
            ordpoisson(cutpoi = c(fcutpts,Inf,fcutpts2,Inf),
                       Levels = c(length(fcutpts)+1,length(fcutpts2)+1),
                       parallel = TRUE), trace = TRUE)
#> Iteration 1: loglikelihood = -3421.6299
#> Iteration 2: loglikelihood = -975.56347
#> Iteration 3: loglikelihood = -763.12349
#> Iteration 4: loglikelihood = -759.96202
#> Iteration 5: loglikelihood = -759.9608
#> Iteration 6: loglikelihood = -759.9608
coef(fit, matrix = TRUE)
#>             loglink(mu1) loglink(mu2)
#> (Intercept)     2.993586     2.993586
#> x2             -1.017975    -1.017975
#> x3              2.032580     2.032580
fit@extra
#> $NOS
#> [1] 2
#> 
#> $Levels
#> [1] 3 8
#> 
#> $y.integer
#> [1] TRUE
#> 
#> $ncoly
#> [1] 11
#> 
#> $countdata
#> [1] FALSE
#> 
#> $cutpoints
#>  [1]  20  30 Inf   0   9  10  20  70 200 201 Inf
#> 
#> $n
#> [1] 1000
#> 
constraints(fit)
#> $`(Intercept)`
#>      [,1]
#> [1,]    1
#> [2,]    1
#> 
#> $x2
#>      [,1]
#> [1,]    1
#> [2,]    1
#> 
#> $x3
#>      [,1]
#> [1,]    1
#> [2,]    1
#> 
summary(depvar(fit))  # Some columns have all zeros
#>        V1             V2              V3              V4          V5       
#>  Min.   :0.00   Min.   :0.000   Min.   :0.000   Min.   :0   Min.   :0.000  
#>  1st Qu.:0.00   1st Qu.:0.000   1st Qu.:0.000   1st Qu.:0   1st Qu.:0.000  
#>  Median :0.00   Median :0.000   Median :1.000   Median :0   Median :0.000  
#>  Mean   :0.26   Mean   :0.194   Mean   :0.546   Mean   :0   Mean   :0.037  
#>  3rd Qu.:1.00   3rd Qu.:0.000   3rd Qu.:1.000   3rd Qu.:0   3rd Qu.:0.000  
#>  Max.   :1.00   Max.   :1.000   Max.   :1.000   Max.   :0   Max.   :1.000  
#>        V6              V7              V8              V9             V10   
#>  Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :0  
#>  1st Qu.:0.000   1st Qu.:0.000   1st Qu.:0.000   1st Qu.:0.000   1st Qu.:0  
#>  Median :0.000   Median :0.000   Median :1.000   Median :0.000   Median :0  
#>  Mean   :0.029   Mean   :0.214   Mean   :0.571   Mean   :0.149   Mean   :0  
#>  3rd Qu.:0.000   3rd Qu.:0.000   3rd Qu.:1.000   3rd Qu.:0.000   3rd Qu.:0  
#>  Max.   :1.000   Max.   :1.000   Max.   :1.000   Max.   :1.000   Max.   :0  
#>       V11   
#>  Min.   :0  
#>  1st Qu.:0  
#>  Median :0  
#>  Mean   :0  
#>  3rd Qu.:0  
#>  Max.   :0