Fitting a Dirichlet-Multinomial Distribution
dirmultinomial.RdFits a Dirichlet-multinomial distribution to a matrix response.
Arguments
- lphi
Link function applied to the \(\phi\) parameter, which lies in the open unit interval \((0,1)\). See
Linksfor more choices.- iphi
Numeric. Initial value for \(\phi\). Must be in the open unit interval \((0,1)\). If a failure to converge occurs then try assigning this argument a different value.
- parallel
A logical (formula not allowed here) indicating whether the probabilities \(\pi_1,\ldots,\pi_{M-1}\) are to be equal via equal coefficients. Note \(\pi_M\) will generally be different from the other probabilities. Setting
parallel = TRUEwill only work if you also setzero = NULLbecause of interference between these arguments (with respect to the intercept term).- zero
An integer-valued vector specifying which linear/additive predictors are modelled as intercepts only. The values must be from the set \(\{1,2,\ldots,M\}\). If the character
"M"then this means the numerical value \(M\), which corresponds to linear/additive predictor associated with \(\phi\). Settingzero = NULLmeans none of the values from the set \(\{1,2,\ldots,M\}\). SeeCommonVGAMffArgumentsfor more information.
Details
The Dirichlet-multinomial distribution
arises from a multinomial distribution where
the probability parameters are not constant but are generated
from a
multivariate distribution called the Dirichlet distribution.
The Dirichlet-multinomial distribution has probability function
$$P(Y_1=y_1,\ldots,Y_M=y_M) =
{N_{*} \choose {y_1,\ldots,y_M}}
\frac{
\prod_{j=1}^{M}
\prod_{r=1}^{y_{j}}
(\pi_j (1-\phi) + (r-1)\phi)}{
\prod_{r=1}^{N_{*}}
(1-\phi + (r-1)\phi)}$$
where \(\phi\) is the over-dispersion parameter
and \(N_{*} = y_1+\cdots+y_M\). Here,
\(a \choose b\) means “\(a\) choose \(b\)”
and refers to combinations (see choose).
The above formula applies to each row of the matrix response.
In this VGAM family function the first \(M-1\)
linear/additive predictors correspond to the first \(M-1\)
probabilities via
$$\eta_j = \log(P[Y=j]/ P[Y=M]) = \log(\pi_j/\pi_M)$$
where \(\eta_j\) is the \(j\)th linear/additive
predictor (\(\eta_M=0\) by definition for
\(P[Y=M]\) but not for \(\phi\))
and
\(j=1,\ldots,M-1\).
The \(M\)th linear/additive predictor corresponds to
lphi applied to \(\phi\).
Note that \(E(Y_j) = N_* \pi_j\) but the probabilities (returned as the fitted values) \(\pi_j\) are bundled together as a \(M\)-column matrix. The quantities \(N_*\) are returned as the prior weights.
The beta-binomial distribution is a special case of
the Dirichlet-multinomial distribution when \(M=2\);
see betabinomial. It is easy to show that
the first shape parameter of the beta distribution is
\(shape1=\pi(1/\phi-1)\)
and the second shape parameter is
\(shape2=(1-\pi)(1/\phi-1)\). Also,
\(\phi=1/(1+shape1+shape2)\), which
is known as the intra-cluster correlation coefficient.
Value
An object of class "vglmff" (see
vglmff-class). The object is used by modelling
functions such as vglm, rrvglm
and vgam.
If the model is an intercept-only model then @misc (which is a
list) has a component called shape which is a vector with the
\(M\) values \(\pi_j(1/\phi-1)\).
References
Paul, S. R., Balasooriya, U. and Banerjee, T. (2005). Fisher information matrix of the Dirichlet-multinomial distribution. Biometrical Journal, 47, 230–236.
Tvedebrink, T. (2010). Overdispersion in allelic counts and \(\theta\)-correction in forensic genetics. Theoretical Population Biology, 78, 200–210.
Yu, P. and Shaw, C. A. (2014). An efficient algorithm for accurate computation of the Dirichlet-multinomial log-likelihood function. Bioinformatics, 30, 1547–54.
Warning
This VGAM family function is prone to numerical problems, especially when there are covariates.
Note
The response can be a matrix of non-negative integers, or
else a matrix of sample proportions and the total number of
counts in each row specified using the weights argument.
This dual input option is similar to multinomial.
To fit a `parallel' model with the \(\phi\)
parameter being an intercept-only you will need to use the
constraints argument.
Currently, Fisher scoring is implemented. To compute the
expected information matrix a for loop is used; this
may be very slow when the counts are large. Additionally,
convergence may be slower than usual due to round-off error
when computing the expected information matrices.
Examples
nn <- 5; M <- 4; set.seed(1)
ydata <- data.frame(round(matrix(runif(nn * M, max = 100), nn, M)))
colnames(ydata) <- paste("y", 1:M, sep = "") # Integer counts
fit <- vglm(cbind(y1, y2, y3, y4) ~ 1, dirmultinomial,
data = ydata, trace = TRUE)
#> Iteration 1: loglikelihood = -1428.784
#> Iteration 2: loglikelihood = -1428.75
#> Iteration 3: loglikelihood = -1428.744
#> Iteration 4: loglikelihood = -1428.743
#> Iteration 5: loglikelihood = -1428.742
#> Iteration 6: loglikelihood = -1428.742
#> Iteration 7: loglikelihood = -1428.742
head(fitted(fit))
#> y1 y2 y3 y4
#> 1 0.2180462 0.2530048 0.2077817 0.3211673
#> 2 0.2180462 0.2530048 0.2077817 0.3211673
#> 3 0.2180462 0.2530048 0.2077817 0.3211673
#> 4 0.2180462 0.2530048 0.2077817 0.3211673
#> 5 0.2180462 0.2530048 0.2077817 0.3211673
depvar(fit) # Sample proportions
#> y1 y2 y3 y4
#> 1 0.1436170 0.47872340 0.11170213 0.2659574
#> 2 0.1674208 0.42533937 0.08144796 0.3257919
#> 3 0.1958763 0.22680412 0.23711340 0.3402062
#> 4 0.3956522 0.27391304 0.16521739 0.1652174
#> 5 0.1104972 0.03314917 0.42541436 0.4309392
weights(fit, type = "prior", matrix = FALSE) # Total counts per row
#> [1] 188 221 291 230 181
if (FALSE) { # \dontrun{
ydata <- transform(ydata, x2 = runif(nn))
fit <- vglm(cbind(y1, y2, y3, y4) ~ x2, dirmultinomial,
data = ydata, trace = TRUE)
Coef(fit)
coef(fit, matrix = TRUE)
(sfit <- summary(fit))
vcov(sfit)
} # }