Zero-Altered Binomial Distribution
Fits a zero-altered binomial distribution based on a conditional model involving a Bernoulli distribution and a positive-binomial distribution.
Usage
zabinomial(lpobs0 = "logitlink", lprob = "logitlink",
           type.fitted = c("mean", "prob", "pobs0"),
           ipobs0 = NULL, iprob = NULL, imethod = 1, zero = NULL)
zabinomialff(lprob = "logitlink", lonempobs0 = "logitlink",
             type.fitted = c("mean", "prob", "pobs0", "onempobs0"),
             iprob = NULL, ionempobs0 = NULL, imethod = 1, zero = "onempobs0")
Arguments
- lprob
Parameter link function applied to the probability parameter of the binomial distribution. See Links for more choices.
- lpobs0
Link function for the parameter \(p_0\), called pobs0 here. See Links for more choices.
- type.fitted
See CommonVGAMffArguments and fittedvlm for information.
- iprob, ipobs0
- lonempobs0, ionempobs0
Corresponding argument for the other parameterization. See details below.
- imethod, zero
Details
The response \(Y\) is zero with probability \(p_0\),
else \(Y\) has a positive-binomial distribution with
probability \(1-p_0\). Thus \(0 < p_0 < 1\),
which may be modelled as a function of the covariates.
The zero-altered binomial distribution differs from the
zero-inflated binomial distribution in that the former
has zeros coming from one source, whereas the latter
has zeros coming from the binomial distribution too. The
zero-inflated binomial distribution is implemented in
zibinomial.
Some people call the zero-altered binomial a hurdle model.
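The difference between the two distributions can be made concrete with a small base-R sketch that writes out both probability functions by hand. The helper names dzab_sketch() and dzib_sketch(), the mixing-weight name pstr0, and the parameter values are assumptions chosen for illustration; they are not taken from this page.

```r
# Zero-altered binomial pmf, written from the definition in the text:
# P(Y = 0) = pobs0; for y >= 1, a positive-binomial pmf scaled by (1 - pobs0).
dzab_sketch <- function(y, size, prob, pobs0) {
  pos <- dbinom(y, size, prob) / (1 - dbinom(0, size, prob))
  ifelse(y == 0, pobs0, (1 - pobs0) * pos)
}

# Zero-inflated binomial pmf for comparison: a mixture of a point mass at 0
# (weight pstr0, a hypothetical name here) and an ordinary binomial.
dzib_sketch <- function(y, size, prob, pstr0) {
  pstr0 * (y == 0) + (1 - pstr0) * dbinom(y, size, prob)
}

size <- 10; prob <- 0.3
y <- 0:size
za <- dzab_sketch(y, size, prob, pobs0 = 0.4)
zi <- dzib_sketch(y, size, prob, pstr0 = 0.4)

sum(za)  # both pmfs sum to 1
sum(zi)
za[1]    # equals pobs0: zeros come from one source only
zi[1]    # exceeds pstr0: the binomial part also contributes zeros
```

In the zero-altered case the probability of a zero is a free parameter, so it can be smaller or larger than the binomial would imply; in the zero-inflated case it can only be larger.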
The input is currently a vector or one-column matrix.
By default, the two linear/additive predictors for zabinomial()
are \((\mathrm{logit}(p_0), \mathrm{logit}(p))^T\).
The VGAM family function zabinomialff() has a few
changes compared to zabinomial().
These are:
(i) the order of the linear/additive predictors is switched so the
binomial probability comes first;
(ii) the parameter onempobs0 is 1 minus the probability of an observed 0,
i.e., onempobs0 is 1-pobs0, the probability that the response comes from
the positive-binomial distribution;
(iii) argument zero has a new default so that onempobs0
is intercept-only by default.
Now zabinomialff() is generally recommended over
zabinomial().
Both functions implement Fisher scoring and neither can handle
multiple responses.
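Point (ii) has a simple consequence under the default logit links: since logit(1 - p) = -logit(p), the linear predictor for onempobs0 is the negative of the one for pobs0. The base-R lines below only check that identity numerically; the probabilities are arbitrary illustrative values.

```r
pobs0 <- c(0.1, 0.3, 0.5, 0.8)       # arbitrary probabilities of an observed 0
onempobs0 <- 1 - pobs0               # the zabinomialff() parameterization

eta_pobs0     <- qlogis(pobs0)       # logit(pobs0)
eta_onempobs0 <- qlogis(onempobs0)   # logit(1 - pobs0)

all.equal(eta_onempobs0, -eta_pobs0) # the two predictors differ only in sign
```

So, with logit links on both, a coefficient for pobs0 in one parameterization would be expected to appear with the opposite sign for onempobs0 in the other, provided the same covariates are used.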
Value
An object of class "vglmff" (see vglmff-class).
The object is used by modelling functions such as vglm
and vgam.
The fitted.values slot of the fitted object,
which should be extracted by the generic function fitted, returns
the mean \(\mu\) (default) which is given by
$$\mu = (1-p_0) \mu_{b} / [1 - (1 - \mu_{b})^N]$$
where \(\mu_{b}\) is the usual binomial mean on the proportion scale (the success probability) and \(N\) is the number of trials.
If type.fitted = "pobs0" then \(p_0\) is returned.
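As a check on the formula, the fitted mean can be recomputed by hand from the linear predictors. The sketch below uses base R only and takes its inputs from rows 1 and 4 of the head(predict(zfit)) output printed in the Examples on this page, with N = 10 as in the simulated data; it assumes logit links for both parameters, as in that fit.

```r
# Rows 1 and 4 of head(predict(zfit)) as printed in the Examples.
eta_pobs0 <- c(0.8876933, -0.8399335)
eta_prob  <- c(0.9084355, -2.0929821)
N <- 10                      # 'size' in the simulated data

pobs0 <- plogis(eta_pobs0)   # inverse logit gives p_0
mu_b  <- plogis(eta_prob)    # binomial mean on the proportion scale

# mu = (1 - p_0) * mu_b / [1 - (1 - mu_b)^N]
mu <- (1 - pobs0) * mu_b / (1 - (1 - mu_b)^N)
mu
```

The two values should agree closely with rows 1 and 4 of head(fitted(zfit)), printed there as 0.2078083 and 0.1115435, which also shows that the fitted mean is reported as a proportion rather than a count.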
Note
The response should be a two-column matrix of counts, with first column giving the number of successes.
Note that this family function allows \(p_0\) to be modelled as
a function of the covariates by setting zero = NULL.
It is a conditional model, not a mixture model.
These family functions effectively combine
posbinomial and binomialff into
one family function.
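One way to see the conditional structure is that the log-likelihood splits exactly into a Bernoulli part for the event Y = 0 and a positive-binomial part for the non-zero observations. The base-R sketch below checks this on a handful of made-up counts; the helper pos_pmf() and all parameter values are hypothetical and chosen only for illustration.

```r
size <- 10; prob <- 0.3; pobs0 <- 0.4
y <- c(0, 0, 3, 1, 0, 5, 2, 0, 4, 1)   # made-up counts for illustration

# Positive-binomial pmf: a binomial truncated at zero.
pos_pmf <- function(y) dbinom(y, size, prob) / (1 - dbinom(0, size, prob))

# Joint zero-altered binomial log-likelihood.
ll_joint <- sum(ifelse(y == 0,
                       log(pobs0),
                       log(1 - pobs0) + log(pos_pmf(y))))

# Part 1: Bernoulli log-likelihood for the indicator I(y == 0).
ll_bern <- sum(dbinom(as.integer(y == 0), 1, pobs0, log = TRUE))
# Part 2: positive-binomial log-likelihood, using only the y > 0 cases.
ll_pos <- sum(log(pos_pmf(y[y > 0])))

all.equal(ll_joint, ll_bern + ll_pos)  # the two parts add up to the whole
```

Here pobs0 appears only in the Bernoulli part and prob only in the positive-binomial part, which is not the case for a zero-inflated (mixture) model.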
Examples
library(VGAM)  # provides logitlink(), rzabinom(), vglm() and zabinomial()
zdata <- data.frame(x2 = runif(nn <- 1000))
zdata <- transform(zdata, size = 10,
                   prob  = logitlink(-2 + 3*x2, inverse = TRUE),
                   pobs0 = logitlink(-1 + 2*x2, inverse = TRUE))
zdata <- transform(zdata,
                   y1 = rzabinom(nn, size = size, prob = prob, pobs0 = pobs0))
with(zdata, table(y1))
#> y1
#>   0   1   2   3   4   5   6   7   8   9  10
#> 507  95 110  81  68  49  31  29  23   6   1
zfit <- vglm(cbind(y1, size - y1) ~ x2, zabinomial(zero = NULL),
             data = zdata, trace = TRUE)
#> Iteration 1: loglikelihood = -1534.8662
#> Iteration 2: loglikelihood = -1456.2039
#> Iteration 3: loglikelihood = -1455.4103
#> Iteration 4: loglikelihood = -1455.4089
#> Iteration 5: loglikelihood = -1455.4089
coef(zfit, matrix = TRUE)
#>             logitlink(pobs0) logitlink(prob)
#> (Intercept)       -0.8510155       -2.112235
#> x2                 1.7969075        3.121779
head(fitted(zfit))
#>        [,1]
#> 1 0.2078083
#> 2 0.1448188
#> 3 0.1542440
#> 4 0.1115435
#> 5 0.1499594
#> 6 0.1602373
head(predict(zfit))
#>   logitlink(pobs0) logitlink(prob)
#> 1        0.8876933       0.9084355
#> 2       -0.3262211      -1.2005061
#> 3       -0.2227005      -1.0206591
#> 4       -0.8399335      -2.0929821
#> 5       -0.2688375      -1.1008132
#> 6       -0.1599471      -0.9116372
summary(zfit)
#>
#> Call:
#> vglm(formula = cbind(y1, size - y1) ~ x2, family = zabinomial(zero = NULL),
#>     data = zdata, trace = TRUE)
#>
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)
#> (Intercept):1 -0.85102    0.13103  -6.495 8.32e-11 ***
#> (Intercept):2 -2.11223    0.07701 -27.427  < 2e-16 ***
#> x2:1           1.79691    0.23308   7.709 1.27e-14 ***
#> x2:2           3.12178    0.13481  23.156  < 2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Names of linear predictors: logitlink(pobs0), logitlink(prob)
#>
#> Log-likelihood: -1455.409 on 1996 degrees of freedom
#>
#> Number of Fisher scoring iterations: 5
#>