Fitting a Dirichlet Distribution

Fits a Dirichlet distribution to a matrix of compositions.

Usage

dirichlet(link = "loglink", parallel = FALSE, zero = NULL,
          imethod = 1)

Arguments

link

Link function applied to each of the $M$ (positive) shape parameters $\alpha_j$. See Links for more choices. The default gives $\eta_j=\log(\alpha_j)$.

parallel, zero, imethod

See CommonVGAMffArguments for more information.
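As a small illustration of the default link described above, the following base-R sketch maps shape parameters to linear predictors and back. The shape values are made up for illustration, and only log() and exp() from base R are used rather than the VGAM link functions themselves.

```r
# Illustrative shape parameters (assumed values, not from a fitted model)
alpha <- c(shape1 = 0.5, shape2 = 3, shape3 = 1)

# Default link: eta_j = log(alpha_j), mapping (0, Inf) onto the real line
eta <- log(alpha)

# Inverse link: alpha_j = exp(eta_j), so any real eta_j gives a positive shape
alpha_back <- exp(eta)

print(eta)
print(alpha_back)
```

Because the inverse link is exp(), the fitted shape parameters stay positive whatever values the linear predictors take.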

Details

In this help file the response is assumed to be an $M$-column matrix of positive values whose rows each sum to unity. Such data can be thought of as compositional data. There are $M$ linear/additive predictors $\eta_j$.

The Dirichlet distribution is commonly used to model compositional data, including applications in genetics. Suppose $(Y_1,\ldots,Y_{M})^T$ is the response. Then it has a Dirichlet distribution if $(Y_1,\ldots,Y_{M-1})^T$ has density $$\frac{\Gamma(\alpha_{+})} {\prod_{j=1}^{M} \Gamma(\alpha_{j})} \prod_{j=1}^{M} y_j^{\alpha_{j} -1}$$ where $\alpha_+=\alpha_1+\cdots+ \alpha_M$, $\alpha_j > 0$, and the density is defined on the unit simplex $$\Delta_{M} = \left\{ (y_1,\ldots,y_{M})^T : y_1 > 0, \ldots, y_{M} > 0, \sum_{j=1}^{M} y_j = 1 \right\}. $$ One has $E(Y_j) = \alpha_j / \alpha_{+}$, which are returned as the fitted values. For this distribution Fisher scoring corresponds to Newton-Raphson.
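The density and mean formulas above can be sketched in base R as follows. The helper names dirichlet_logdens and dirichlet_mean are hypothetical and are not part of VGAM; the shape values are chosen only for illustration. Working on the log scale with lgamma() avoids overflow in the gamma functions.

```r
# Hypothetical helper (not part of VGAM): log-density of a Dirichlet
# distribution at a single composition y with shape vector alpha.
dirichlet_logdens <- function(y, alpha) {
  stopifnot(length(y) == length(alpha), all(y > 0), all(alpha > 0),
            isTRUE(all.equal(sum(y), 1)))
  lgamma(sum(alpha)) - sum(lgamma(alpha)) + sum((alpha - 1) * log(y))
}

# Hypothetical helper: mean vector E(Y_j) = alpha_j / alpha_+
dirichlet_mean <- function(alpha) alpha / sum(alpha)

# With all shapes equal to 1 and M = 3 the density is constant,
# equal to Gamma(3) = 2 everywhere on the simplex
dens_flat <- exp(dirichlet_logdens(c(0.2, 0.3, 0.5), alpha = c(1, 1, 1)))
print(dens_flat)

# Means for illustrative shapes; the components sum to one
mu <- dirichlet_mean(c(0.5, 3, 1))
print(mu)
```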

The Dirichlet distribution can be motivated by considering independent random variables $G_1,\ldots,G_{M}$, where $G_j$ has a gamma distribution with shape $\alpha_j$ and unit scale, i.e., density $f(g_j)=g_j^{\alpha_j - 1} e^{-g_j} / \Gamma(\alpha_j)$ for $g_j > 0$. (The $G_j$ are identically distributed only when all the $\alpha_j$ are equal.) Then the Dirichlet distribution arises when $Y_j=G_j / (G_1 + \cdots + G_M)$.
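The gamma construction can be checked by simulation. The following is a minimal base-R sketch: it draws independent gamma variates with unit rate, normalizes each row, and compares the sample means with $\alpha_j / \alpha_{+}$. The seed, sample size and shape values are arbitrary choices made for illustration, and rgamma() is used in place of any VGAM generator.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
alpha <- exp(c(y1 = -1, y2 = 1, y3 = 0))  # illustrative shape parameters
n <- 1e5                                  # illustrative sample size

# One column of independent Gamma(shape = alpha_j, rate = 1) draws per component
G <- sapply(alpha, function(a) rgamma(n, shape = a, rate = 1))

# Divide every row by its total, giving Y_j = G_j / (G_1 + ... + G_M)
Y <- G / rowSums(G)

# Each row lies on the unit simplex
print(range(rowSums(Y)))

# Sample means should be close to alpha_j / alpha_+
print(rbind(simulated = colMeans(Y), theoretical = alpha / sum(alpha)))
```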

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, rrvglm and vgam.

When fitted, the fitted.values slot of the object contains the $M$-column matrix of means.

References

Lange, K. (2002). Mathematical and Statistical Methods for Genetic Analysis, 2nd ed. New York: Springer-Verlag.

Forbes, C., Evans, M., Hastings, N. and Peacock, B. (2011). Statistical Distributions, 4th ed. Hoboken, NJ, USA: John Wiley and Sons.

Author

Thomas W. Yee

Note

The response should be a matrix of positive values whose rows each sum to unity. Count data are similar in structure, but for them a multinomial logit model (multinomial) is probably more appropriate. Another distribution similar to the Dirichlet is the Dirichlet-multinomial (see dirmultinomial).
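The requirement on the response can be checked before fitting. The sketch below uses a small made-up matrix of positive amounts, rescales each row to proportions with prop.table(), and asserts the two conditions stated above. Whether rescaling raw amounts into proportions is sensible depends on the application, so this is an assumption of the example rather than a recommendation.

```r
# Hypothetical raw measurements: positive amounts of three components
raw <- matrix(c(2, 5, 3,
                1, 1, 8,
                4, 4, 2),
              ncol = 3, byrow = TRUE,
              dimnames = list(NULL, c("y1", "y2", "y3")))

# Convert each row to proportions of its row total
comp <- prop.table(raw, margin = 1)

# Checks matching the stated requirements on the response:
# strictly positive entries and rows that sum to one
stopifnot(all(comp > 0),
          isTRUE(all.equal(unname(rowSums(comp)), rep(1, nrow(comp)))))

print(comp)
```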

Examples

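library(VGAM)  # provides rdiric(), vglm() and the dirichlet family
# Simulate 1000 compositions with shapes exp(-1), exp(1), exp(0)
ddata <- data.frame(rdiric(1000,
                    shape = exp(c(y1 = -1, y2 = 1, y3 = 0))))
# Intercept-only fit; trace the coefficients at each iteration
fit <- vglm(cbind(y1, y2, y3) ~ 1, dirichlet,
            data = ddata, trace = TRUE, crit = "coef")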
#> Iteration 1: coefficients = 
#> -1.79144517,  0.26331848, -0.76855048
#> Iteration 2: coefficients = 
#> -1.26089705,  0.78819312, -0.26236508
#> Iteration 3: coefficients = 
#> -1.019638688,  1.031466071, -0.024880427
#> Iteration 4: coefficients = 
#> -0.978444064,  1.072997653,  0.016434721
#> Iteration 5: coefficients = 
#> -0.977407238,  1.074041410,  0.017484204
#> Iteration 6: coefficients = 
#> -0.977406600,  1.074042054,  0.017484855
#> Iteration 7: coefficients = 
#> -0.977406600,  1.074042054,  0.017484855
Coef(fit)
#>    shape1    shape2    shape3 
#> 0.3762857 2.9271875 1.0176386 
coef(fit, matrix = TRUE)
#>             loglink(shape1) loglink(shape2) loglink(shape3)
#> (Intercept)      -0.9774066        1.074042      0.01748486
head(fitted(fit))
#>           y1        y2        y3
#> 1 0.08708076 0.6774154 0.2355039
#> 2 0.08708076 0.6774154 0.2355039
#> 3 0.08708076 0.6774154 0.2355039
#> 4 0.08708076 0.6774154 0.2355039
#> 5 0.08708076 0.6774154 0.2355039
#> 6 0.08708076 0.6774154 0.2355039