Zero-Altered Poisson Distribution
Fits a zero-altered Poisson distribution based on a conditional model involving a Bernoulli distribution and a positive-Poisson distribution.
Usage
zapoisson(lpobs0 = "logitlink", llambda = "loglink", type.fitted =
c("mean", "lambda", "pobs0", "onempobs0"), imethod = 1,
ipobs0 = NULL, ilambda = NULL, ishrinkage = 0.95, probs.y = 0.35,
zero = NULL)
zapoissonff(llambda = "loglink", lonempobs0 = "logitlink", type.fitted =
c("mean", "lambda", "pobs0", "onempobs0"), imethod = 1,
ilambda = NULL, ionempobs0 = NULL, ishrinkage = 0.95,
probs.y = 0.35, zero = "onempobs0")
Arguments
- lpobs0
Link function for the parameter \(p_0\), called pobs0 here. See Links for more choices.
- llambda
Link function for the usual \(\lambda\) parameter. See Links for more choices.
- type.fitted
See CommonVGAMffArguments and fittedvlm for information.
- lonempobs0
Corresponding argument for the other parameterization. See details below.
- imethod, ipobs0, ionempobs0, ilambda, ishrinkage
See CommonVGAMffArguments for information.
- probs.y, zero
See CommonVGAMffArguments for information.
Details
The response \(Y\) is zero with probability \(p_0\), else \(Y\) has a positive-Poisson(\(\lambda\)) distribution with probability \(1-p_0\). Thus \(0 < p_0 < 1\), and \(p_0\) is modelled as a function of the covariates. The zero-altered Poisson distribution differs from the zero-inflated Poisson distribution in that the former has zeros coming from one source, whereas the latter has zeros coming from the Poisson distribution too. Some people call the zero-altered Poisson a hurdle model.
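The two-part structure above can be sketched directly in base R (no VGAM needed). The function name dzapois_sketch is illustrative, not part of the package; it implements the probability mass function implied by the Details: mass \(p_0\) at zero, and positive-Poisson mass elsewhere.

```r
# Sketch of the zero-altered Poisson pmf, assuming the conditional
# formulation above: P(Y = 0) = p0, and for y > 0,
# P(Y = y) = (1 - p0) * dpois(y, lambda) / (1 - exp(-lambda)),
# since exp(-lambda) = P(Poisson = 0) is removed by the truncation.
dzapois_sketch <- function(y, lambda, pobs0) {
  ifelse(y == 0,
         pobs0,
         (1 - pobs0) * dpois(y, lambda) / (1 - exp(-lambda)))
}

# The probabilities sum to 1 (up to truncation of the support):
probs <- dzapois_sketch(0:100, lambda = 2, pobs0 = 0.3)
sum(probs)  # approximately 1
```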
For one response/species, by default, the two linear/additive
predictors for zapoisson()
are \((\mathrm{logit}(p_0), \log(\lambda))^T\).
The VGAM family function zapoissonff() has a few
changes compared to zapoisson().
These are:
(i) the order of the linear/additive predictors is switched so the
Poisson mean comes first;
(ii) argument onempobs0 is now 1 minus the probability of an observed 0,
i.e., the probability of the positive-Poisson distribution,
so that onempobs0 equals 1-pobs0;
(iii) argument zero has a new default so that onempobs0
is intercept-only by default.
Now zapoissonff() is generally recommended over
zapoisson().
Both functions implement Fisher scoring and can handle
multiple responses.
Value
An object of class "vglmff" (see vglmff-class).
The object is used by modelling functions such as vglm
and vgam.
The fitted.values slot of the fitted object,
which should be extracted by the generic function fitted,
returns the mean \(\mu\) (default) which is given by
$$\mu = (1-p_0) \lambda / [1 - \exp(-\lambda)].$$
If type.fitted = "pobs0" then \(p_0\) is returned.
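The fitted-mean formula above can be checked numerically in base R. The values of p0 and lambda below are illustrative, not taken from the source; the direct computation sums \(y\) against the positive-Poisson part of the pmf.

```r
# Sketch verifying mu = (1 - p0) * lambda / (1 - exp(-lambda)),
# assuming illustrative parameter values:
p0 <- 0.3
lambda <- 2
mu <- (1 - p0) * lambda / (1 - exp(-lambda))

# Cross-check against the expectation computed from the pmf directly
# (the y = 0 term contributes nothing, so only y >= 1 is summed):
y <- 1:200
mu.direct <- sum(y * (1 - p0) * dpois(y, lambda) / (1 - exp(-lambda)))
all.equal(mu, mu.direct)  # TRUE
```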
References
Welsh, A. H., Cunningham, R. B., Donnelly, C. F. and Lindenmayer, D. B. (1996). Modelling the abundances of rare species: statistical models for counts with extra zeros. Ecological Modelling, 88, 297–308.
Angers, J-F. and Biswas, A. (2003). A Bayesian analysis of zero-inflated generalized Poisson model. Computational Statistics & Data Analysis, 42, 37–46.
Yee, T. W. (2014). Reduced-rank vector generalized linear models with two linear predictors. Computational Statistics and Data Analysis, 71, 889–902.
Note
There are subtle differences between this family function and
zipoisson and yip88.
In particular, zipoisson is a
mixture model whereas zapoisson() and yip88
are conditional models.
Note this family function allows \(p_0\) to be modelled as a function of the covariates.
This family function effectively combines pospoisson
and binomialff into one family function.
This family function can handle multiple responses,
e.g., more than one species.
It is recommended that Gaitdpois be used, e.g.,
rgaitdpois(nn, lambda, pobs.mlm = pobs0, a.mlm = 0)
instead of
rzapois(nn, lambda, pobs0 = pobs0).
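What such a simulator does can be sketched in base R without VGAM. The function name rzapois_sketch is hypothetical; it mirrors the conditional formulation in the Details, drawing the zero indicator from a Bernoulli and the positive part from a zero-truncated Poisson via the inverse-CDF method.

```r
# Sketch of zero-altered Poisson simulation, assuming the conditional
# (hurdle) formulation: with probability pobs0 return 0, otherwise draw
# from the positive-Poisson distribution.
rzapois_sketch <- function(n, lambda, pobs0) {
  is0 <- rbinom(n, size = 1, prob = pobs0)
  # Positive-Poisson draws: map uniforms into (P(Y=0), 1] before
  # applying the Poisson quantile function, so every draw is >= 1.
  u <- runif(n)
  ypos <- qpois(ppois(0, lambda) + u * (1 - ppois(0, lambda)), lambda)
  ifelse(is0 == 1, 0, ypos)
}

set.seed(1)
table(rzapois_sketch(1000, lambda = 2, pobs0 = 0.3))
```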
Examples
zdata <- data.frame(x2 = runif(nn <- 1000))
zdata <- transform(zdata, pobs0 = logitlink( -1 + 1*x2, inverse = TRUE),
lambda = loglink(-0.5 + 2*x2, inverse = TRUE))
zdata <- transform(zdata, y = rgaitdpois(nn, lambda, pobs.mlm = pobs0,
a.mlm = 0))
with(zdata, table(y))
#> y
#> 0 1 2 3 4 5 6 7 8 9 10
#> 368 261 177 94 51 33 9 4 1 1 1
fit <- vglm(y ~ x2, zapoisson, data = zdata, trace = TRUE)
#> Iteration 1: loglikelihood = -1484.4837
#> Iteration 2: loglikelihood = -1475.1763
#> Iteration 3: loglikelihood = -1474.9834
#> Iteration 4: loglikelihood = -1474.9833
#> Iteration 5: loglikelihood = -1474.9833
fit <- vglm(y ~ x2, zapoisson, data = zdata, trace = TRUE, crit = "coef")
#> Iteration 1: coefficients =
#> -1.21946191, -0.20902047, 1.16203558, 1.66100595
#> Iteration 2: coefficients =
#> -1.06441157, -0.48172053, 1.03446863, 1.94748763
#> Iteration 3: coefficients =
#> -1.06900269, -0.53506685, 1.04008399, 2.01003589
#> Iteration 4: coefficients =
#> -1.06900594, -0.53646505, 1.04008826, 2.01183839
#> Iteration 5: coefficients =
#> -1.06900594, -0.53647432, 1.04008826, 2.01185370
#> Iteration 6: coefficients =
#> -1.06900594, -0.53647441, 1.04008826, 2.01185384
#> Iteration 7: coefficients =
#> -1.06900594, -0.53647441, 1.04008826, 2.01185385
head(fitted(fit))
#> [,1]
#> 1 1.9759912
#> 2 0.9904521
#> 3 1.5515691
#> 4 1.3733370
#> 5 1.9054654
#> 6 1.1606096
head(predict(fit))
#> logitlink(pobs0) loglink(lambda)
#> 1 -0.1271217 1.2854223
#> 2 -1.0411610 -0.4826136
#> 3 -0.3328105 0.8875563
#> 4 -0.4567221 0.6478728
#> 5 -0.1559350 1.2296884
#> 6 -0.6773613 0.2210881
head(predict(fit, untransform = TRUE))
#> pobs0 lambda
#> 1 0.4682623 3.6161949
#> 2 0.2609260 0.6171683
#> 3 0.4175569 2.4291863
#> 4 0.3877637 1.9114704
#> 5 0.4610950 3.4201638
#> 6 0.3368505 1.2474333
coef(fit, matrix = TRUE)
#> logitlink(pobs0) loglink(lambda)
#> (Intercept) -1.069006 -0.5364744
#> x2 1.040088 2.0118538
summary(fit)
#>
#> Call:
#> vglm(formula = y ~ x2, family = zapoisson, data = zdata, trace = TRUE,
#> crit = "coef")
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept):1 -1.06901 0.13918 -7.681 1.58e-14 ***
#> (Intercept):2 -0.53647 0.09228 -5.814 6.11e-09 ***
#> x2:1 1.04009 0.23602 4.407 1.05e-05 ***
#> x2:2 2.01185 0.13091 15.368 < 2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Names of linear predictors: logitlink(pobs0), loglink(lambda)
#>
#> Log-likelihood: -1474.983 on 1996 degrees of freedom
#>
#> Number of Fisher scoring iterations: 7
#>
# Another example ------------------------------
# Data from Angers and Biswas (2003)
abdata <- data.frame(y = 0:7, w = c(182, 41, 12, 2, 2, 0, 0, 1))
abdata <- subset(abdata, w > 0)
Abdata <- data.frame(yy = with(abdata, rep(y, w)))
fit3 <- vglm(yy ~ 1, zapoisson, data = Abdata, trace = TRUE, crit = "coef")
#> Iteration 1: coefficients = 1.25650524, 0.12608894
#> Iteration 2: coefficients = 1.14002466, -0.14124406
#> Iteration 3: coefficients = 1.14356045, -0.16530315
#> Iteration 4: coefficients = 1.14356368, -0.16572625
#> Iteration 5: coefficients = 1.14356368, -0.16572636
#> Iteration 6: coefficients = 1.14356368, -0.16572636
coef(fit3, matrix = TRUE)
#> logitlink(pobs0) loglink(lambda)
#> (Intercept) 1.143564 -0.1657264
Coef(fit3) # Estimate lambda (they get 0.6997 with SE 0.1520)
#> pobs0 lambda
#> 0.7583333 0.8472781
head(fitted(fit3), 1)
#> [,1]
#> 1 0.3583333
with(Abdata, mean(yy)) # Compare this with fitted(fit3)
#> [1] 0.3583333