Extended Beta-binomial Distribution Family Function
extbetabinomial.RdFits an extended beta-binomial distribution by maximum likelihood estimation. The two parameters here are the mean and correlation coefficient.
Usage
extbetabinomial(lmu = "logitlink", lrho = "cloglink",
zero = "rho", irho = 0, grho = c(0, 0.05, 0.1, 0.2),
vfl = FALSE, Form2 = NULL,
imethod = 1, ishrinkage = 0.95)Arguments
- lmu, lrho
Link functions applied to the two parameters. See
Linksfor more choices. The first default ensure the mean remain in \((0, 1)\), while the second allows for a slightly negative correlation parameter: you could say it lies in \((\max(-\mu/(N-\mu-1), -(1 - \mu)/(N-(1-\mu)-1)), 1)\) where \(\mu\) is the mean (probability) and \(N\) issize. See below for details. Forlrho,cloglinkis a good choice because it handles parameter values from 1 downwards. Other good choices includelogofflink(offset = 1)andrhobitlink.- irho, grho
The first is similar to
betabinomialand it is a good idea to use this argument because to conduct a grid search based ongrhois expensive. The default is effectively a binomial distribution. Setirho = NULLto perform a grid search which is more reliable but slow.- imethod
Similar to
betabinomial.- zero
Similar to
betabinomial. Also, seeCommonVGAMffArgumentsfor more information. Modellingrhowith covariates requires large samples.- ishrinkage
See
betabinomialandCommonVGAMffArgumentsfor information.- vfl, Form2
See
CommonVGAMffArguments. Ifvfl = TRUEthenForm2should be a formula specifying the terms for \(\eta_2\) and all others are used for \(\mu\). It is similar touninormal. If these arguments are used thencbind(0, log(size1 / (size1 - 1)))should be used as an offset, and setzero = NULLtoo.
Details
The extended beta-binomial
distribution (EBBD) proposed
by Prentice (1986)
allows for a slightly negative correlation
parameter whereas the ordinary BBD
betabinomial
only allows values in \((0, 1)\) so it
handles overdispersion only.
When negative, the data is underdispersed
relative to an ordinary binomial distribution.
Argument rho is used here for the
\(\delta\) used in Prentice (1986) because
it is the correlation between the
(almost) Bernoulli trials.
(They are actually simple binary variates.)
We use here
\(N\) for the number of trials
(e.g., litter size),
\(T=NY\) is the number of successes, and
\(p\) (or \(\mu\))
is the probability of a success
(e.g., a malformation).
That is, \(Y\) is the proportion
of successes. Like
binomialff, the fitted values
are the
estimated probability
of success
(i.e., \(E[Y]\) and not \(E[T]\))
and the prior weights \(N\) are attached
separately on the object in a slot.
The probability function is difficult to write but it involves three series of products. Recall \(Y = T/N\) is the real response being modelled, where \(T\) is the (total) sum of \(N\) correlated (almost) Bernoulli trials.
The default model is
\(\eta_1 = logit(\mu)\)
and \(\eta_2 = clog(\rho)\)
because the first
parameter lies between 0 and 1.
The second link is cloglink.
The mean of \(Y\) is
\(p=\mu\)
and the variance of \(Y\) is
\(\mu(1-\mu)(1+(N-1)\rho)/N\).
Here, the correlation \(\rho\)
may be slightly negative
and is the correlation between the \(N\)
individuals within a litter.
A litter effect is typically reflected
by a positive value of \(\rho\) and
corresponds to overdispersion with
respect to the binomial distribution.
Thus an exchangeable error structure
is assumed between units within a litter
for the EBBD.
This family function uses Fisher scoring.
Elements of the second-order expected
derivatives
are computed numerically, which may
fail for models very near the boundary of the
parameter space.
Usually, the computations
are expensive for large \(N\) because of
a for loop, so
it may take a long time.
Value
An object of class "vglmff"
(see vglmff-class).
The object is used by modelling functions
such as vglm.
Suppose fit is a fitted EBB
model. Then depvar(fit)
are the sample proportions \(y\),
fitted(fit) returns
estimates of \(E(Y)\),
and weights(fit, type = "prior") returns
the number of trials \(N\).
References
Prentice, R. L. (1986). Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors. Journal of the American Statistical Association, 81, 321–327.
Note
This function is recommended over
betabinomial and
betabinomialff.
It processes the input in the
same way as binomialff.
But it does not handle the case \(N \leq 2\)
very well because there are two parameters to
estimate, not one, for each row of the input.
Cases where \(N > 2\)
can be selected via the
subset argument of vglm.
Warning
Modelling rho using covariates well
requires much data
so it is usually best to leave zero
alone.
It is good to set trace = TRUE and
play around with irho if there are
problems achieving convergence.
Convergence problems will occur when the
estimated rho is close to the
lower bound,
i.e., the underdispersion
is almost too severe for the EBB to cope.
Examples
if (FALSE) { # \dontrun{
# Example 1
edata <- data.frame(N = 10, mu = logitlink(1, inverse = TRUE),
rho = cloglink(0.15, inverse = TRUE))
edata <- transform(edata,
y = rextbetabinom(100, N, mu, rho = rho))
with(edata, plot(table(y)))
fit1 <- vglm(cbind(y, N-y) ~ 1, extbetabinomial, edata, trace = TRUE)
coef(fit1, matrix = TRUE)
Coef(fit1)
head(cbind(depvar(fit1), weights(fit1, type = "prior")))
# Example 2: VFL model
N <- size1 <- 10; nn <- 2000; set.seed(1)
edata <- # Generate the data set. Expensive.
data.frame(x2 = runif(nn),
ooo = log(size1 / (size1 - 1)))
edata <- transform(edata, x1copy = 1, x2copy = x2,
y2 = rextbetabinom(nn, size1, # Expensive
logitlink(1 + x2, inverse = TRUE),
cloglink(ooo + 1 - 0.5 * x2, inv = TRUE)))
fit2 <- vglm(data = edata,
cbind(y2, N - y2) ~ x2 + x1copy + x2copy,
extbetabinomial(zero = NULL, vfl = TRUE,
Form2 = ~ x1copy + x2copy - 1),
offset = cbind(0, ooo), trace = TRUE)
coef(fit2, matrix = TRUE)
wald.stat(fit2, values0 = c(1, 1, -0.5))
} # }