Fitting a Dirichlet-Multinomial Distribution

Fits a Dirichlet-multinomial distribution to a matrix response.

Usage

dirmultinomial(lphi = "logitlink", iphi = 0.10, parallel = FALSE,
               zero = "M")

Arguments

lphi

Link function applied to the $\phi$ parameter, which lies in the open unit interval $(0,1)$. See Links for more choices.

iphi

Numeric. Initial value for $\phi$. Must be in the open unit interval $(0,1)$. If a failure to converge occurs then try assigning this argument a different value.

parallel

A logical (formula not allowed here) indicating whether the probabilities $\pi_1,\ldots,\pi_{M-1}$ are to be equal via equal coefficients. Note $\pi_M$ will generally be different from the other probabilities. Setting parallel = TRUE will only work if you also set zero = NULL because of interference between these arguments (with respect to the intercept term).

zero

An integer-valued vector specifying which linear/additive predictors are modelled as intercepts only. The values must be from the set $\{1,2,\ldots,M\}$. If the character "M" then this means the numerical value $M$, which corresponds to linear/additive predictor associated with $\phi$. Setting zero = NULL means none of the values from the set $\{1,2,\ldots,M\}$. See CommonVGAMffArguments for more information.

Details

The Dirichlet-multinomial distribution arises from a multinomial distribution where the probability parameters are not constant but are generated from a multivariate distribution called the Dirichlet distribution. The Dirichlet-multinomial distribution has probability function $$P(Y_1=y_1,\ldots,Y_M=y_M) = {N_{*} \choose {y_1,\ldots,y_M}} \frac{ \prod_{j=1}^{M} \prod_{r=1}^{y_{j}} (\pi_j (1-\phi) + (r-1)\phi)}{ \prod_{r=1}^{N_{*}} (1-\phi + (r-1)\phi)}$$ where $\phi$ is the over-dispersion parameter and $N_{*} = y_1+\cdots+y_M$. Here, $a \choose b$ means “$a$ choose $b$” and refers to combinations (see choose). The above formula applies to each row of the matrix response. In this VGAM family function the first $M-1$ linear/additive predictors correspond to the first $M-1$ probabilities via $$\eta_j = \log(P[Y=j]/ P[Y=M]) = \log(\pi_j/\pi_M)$$ where $\eta_j$ is the $j$th linear/additive predictor ($\eta_M=0$ by definition for $P[Y=M]$ but not for $\phi$) and $j=1,\ldots,M-1$. The $M$th linear/additive predictor corresponds to lphi applied to $\phi$.

Note that $E(Y_j) = N_* \pi_j$ but the probabilities (returned as the fitted values) $\pi_j$ are bundled together as a $M$-column matrix. The quantities $N_*$ are returned as the prior weights.

The beta-binomial distribution is a special case of the Dirichlet-multinomial distribution when $M=2$; see betabinomial. It is easy to show that the first shape parameter of the beta distribution is $shape1=\pi(1/\phi-1)$ and the second shape parameter is $shape2=(1-\pi)(1/\phi-1)$. Also, $\phi=1/(1+shape1+shape2)$, which is known as the intra-cluster correlation coefficient.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, rrvglm and vgam.

If the model is an intercept-only model then @misc (which is a list) has a component called shape which is a vector with the $M$ values $\pi_j(1/\phi-1)$.

References

Paul, S. R., Balasooriya, U. and Banerjee, T. (2005). Fisher information matrix of the Dirichlet-multinomial distribution. Biometrical Journal, 47, 230–236.

Tvedebrink, T. (2010). Overdispersion in allelic counts and $\theta$-correction in forensic genetics. Theoretical Population Biology, 78, 200–210.

Yu, P. and Shaw, C. A. (2014). An efficient algorithm for accurate computation of the Dirichlet-multinomial log-likelihood function. Bioinformatics, 30, 1547–54.

Author

Thomas W. Yee

Warning

This VGAM family function is prone to numerical problems, especially when there are covariates.

Note

The response can be a matrix of non-negative integers, or else a matrix of sample proportions and the total number of counts in each row specified using the weights argument. This dual input option is similar to multinomial.

To fit a `parallel' model with the $\phi$ parameter being an intercept-only you will need to use the constraints argument.

Currently, Fisher scoring is implemented. To compute the expected information matrix a for loop is used; this may be very slow when the counts are large. Additionally, convergence may be slower than usual due to round-off error when computing the expected information matrices.

Examples

nn <- 5; M <- 4; set.seed(1)
ydata <- data.frame(round(matrix(runif(nn * M, max = 100), nn, M)))
colnames(ydata) <- paste("y", 1:M, sep = "")  # Integer counts

fit <- vglm(cbind(y1, y2, y3, y4) ~ 1, dirmultinomial,
            data = ydata, trace = TRUE)
#> Iteration 1: loglikelihood = -1428.784
#> Iteration 2: loglikelihood = -1428.75
#> Iteration 3: loglikelihood = -1428.744
#> Iteration 4: loglikelihood = -1428.743
#> Iteration 5: loglikelihood = -1428.742
#> Iteration 6: loglikelihood = -1428.742
#> Iteration 7: loglikelihood = -1428.742
head(fitted(fit))
#>          y1        y2        y3        y4
#> 1 0.2180462 0.2530048 0.2077817 0.3211673
#> 2 0.2180462 0.2530048 0.2077817 0.3211673
#> 3 0.2180462 0.2530048 0.2077817 0.3211673
#> 4 0.2180462 0.2530048 0.2077817 0.3211673
#> 5 0.2180462 0.2530048 0.2077817 0.3211673
depvar(fit)  # Sample proportions
#>          y1         y2         y3        y4
#> 1 0.1436170 0.47872340 0.11170213 0.2659574
#> 2 0.1674208 0.42533937 0.08144796 0.3257919
#> 3 0.1958763 0.22680412 0.23711340 0.3402062
#> 4 0.3956522 0.27391304 0.16521739 0.1652174
#> 5 0.1104972 0.03314917 0.42541436 0.4309392
weights(fit, type = "prior", matrix = FALSE)  # Total counts per row
#> [1] 188 221 291 230 181

if (FALSE) { # \dontrun{
ydata <- transform(ydata, x2 = runif(nn))
fit <- vglm(cbind(y1, y2, y3, y4) ~ x2, dirmultinomial,
            data = ydata, trace = TRUE)
Coef(fit)
coef(fit, matrix = TRUE)
(sfit <- summary(fit))
vcov(sfit)
} # }