Zero-Inflated Geometric Distribution Family Function
zigeometric.RdFits a zero-inflated geometric distribution by maximum likelihood estimation.
Usage
zigeometric(lpstr0 = "logitlink", lprob = "logitlink",
type.fitted = c("mean", "prob", "pobs0", "pstr0", "onempstr0"),
ipstr0 = NULL, iprob = NULL,
imethod = 1, bias.red = 0.5, zero = NULL)
zigeometricff(lprob = "logitlink", lonempstr0 = "logitlink",
type.fitted = c("mean", "prob", "pobs0", "pstr0", "onempstr0"),
iprob = NULL, ionempstr0 = NULL,
imethod = 1, bias.red = 0.5, zero = "onempstr0")Arguments
- lpstr0, lprob
Link functions for the parameters \(\phi\) and \(p\) (
prob). The usual geometric probability parameter is the latter. The probability of a structural zero is the former. SeeLinksfor more choices. For the zero-deflated model see below.
- lonempstr0, ionempstr0
Corresponding arguments for the other parameterization. See details below.
- bias.red
A constant used in the initialization process of
pstr0. It should lie between 0 and 1, with 1 having no effect.- type.fitted
See
CommonVGAMffArgumentsandfittedvlmfor information.- ipstr0, iprob
See
CommonVGAMffArgumentsfor information.- zero, imethod
See
CommonVGAMffArgumentsfor information.
Details
Function zigeometric() is based on
$$P(Y=0) = \phi + (1-\phi) p,$$
for \(y=0\), and
$$P(Y=y) = (1-\phi) p (1 - p)^{y}.$$
for \(y=1,2,\ldots\).
The parameter \(\phi\) satisfies \(0 < \phi < 1\). The mean of \(Y\) is \(E(Y)=(1-\phi) p / (1-p)\) and these are returned as the fitted values
by default.
By default, the two linear/additive predictors
are \((logit(\phi), logit(p))^T\).
Multiple responses are handled.
Estimated probabilities of a structural zero and an
observed zero can be returned, as in zipoisson;
see fittedvlm for information.
The VGAM family function zigeometricff() has a few
changes compared to zigeometric().
These are:
(i) the order of the linear/additive predictors is switched so the
geometric probability comes first;
(ii) argument onempstr0 is now 1 minus
the probability of a structural zero, i.e.,
the probability of the parent (geometric) component,
i.e., onempstr0 is 1-pstr0;
(iii) argument zero has a new default so that the onempstr0
is intercept-only by default.
Now zigeometricff() is generally recommended over
zigeometric().
Both functions implement Fisher scoring and can handle
multiple responses.
Value
An object of class "vglmff" (see vglmff-class).
The object is used by modelling functions such as vglm
and vgam.
Note
The zero-deflated geometric distribution might
be fitted by setting lpstr0 = identitylink, albeit,
not entirely reliably. See zipoisson
for information that can be applied here. Else
try the zero-altered geometric distribution (see
zageometric).
Examples
gdata <- data.frame(x2 = runif(nn <- 1000) - 0.5)
gdata <- transform(gdata, x3 = runif(nn) - 0.5,
x4 = runif(nn) - 0.5)
gdata <- transform(gdata, eta1 = 1.0 - 1.0 * x2 + 2.0 * x3,
eta2 = -1.0,
eta3 = 0.5)
gdata <- transform(gdata, prob1 = logitlink(eta1, inverse = TRUE),
prob2 = logitlink(eta2, inverse = TRUE),
prob3 = logitlink(eta3, inverse = TRUE))
gdata <- transform(gdata, y1 = rzigeom(nn, prob1, pstr0 = prob3),
y2 = rzigeom(nn, prob2, pstr0 = prob3),
y3 = rzigeom(nn, prob2, pstr0 = prob3))
with(gdata, table(y1))
#> y1
#> 0 1 2 3 4 5 6 7
#> 890 78 15 9 3 3 1 1
with(gdata, table(y2))
#> y2
#> 0 1 2 3 4 5 6 7 8 9 10 11 13 14 21
#> 717 88 45 36 31 15 24 9 11 7 7 3 3 3 1
with(gdata, table(y3))
#> y3
#> 0 1 2 3 4 5 6 7 8 9 10 11 12 13 16 17 22
#> 734 80 47 38 28 19 12 10 9 6 6 5 1 2 1 1 1
head(gdata)
#> x2 x3 x4 eta1 eta2 eta3 prob1 prob2
#> 1 0.04401446 -0.2597102 -0.35692367 0.4365652 -1 0.5 0.6074403 0.2689414
#> 2 0.16675910 -0.3665792 -0.27325186 0.1000826 -1 0.5 0.5249998 0.2689414
#> 3 -0.08202979 0.1915942 -0.02811507 1.4652181 -1 0.5 0.8123295 0.2689414
#> 4 0.17102154 0.3177276 0.10254225 1.4644336 -1 0.5 0.8122098 0.2689414
#> 5 0.10573920 0.4613931 0.46927029 1.8170469 -1 0.5 0.8602114 0.2689414
#> 6 -0.45357397 0.1587968 0.12485817 1.7711676 -1 0.5 0.8546028 0.2689414
#> prob3 y1 y2 y3
#> 1 0.6224593 0 0 0
#> 2 0.6224593 0 0 0
#> 3 0.6224593 0 0 1
#> 4 0.6224593 0 0 0
#> 5 0.6224593 0 1 0
#> 6 0.6224593 1 0 0
fit1 <- vglm(y1 ~ x2 + x3 + x4, zigeometric(zero = 1), data = gdata, trace = TRUE)
#> Iteration 1: loglikelihood = -476.04541
#> Iteration 2: loglikelihood = -454.20131
#> Iteration 3: loglikelihood = -442.86763
#> Iteration 4: loglikelihood = -441.14748
#> Iteration 5: loglikelihood = -441.10756
#> Iteration 6: loglikelihood = -441.10705
#> Iteration 7: loglikelihood = -441.10704
#> Iteration 8: loglikelihood = -441.10704
coef(fit1, matrix = TRUE)
#> logitlink(pstr0) logitlink(prob)
#> (Intercept) 0.4420665 1.04765323
#> x2 0.0000000 -0.90381469
#> x3 0.0000000 2.28400122
#> x4 0.0000000 0.06262285
head(fitted(fit1, type = "pstr0"))
#> [,1]
#> [1,] 0.6087513
#> [2,] 0.6087513
#> [3,] 0.6087513
#> [4,] 0.6087513
#> [5,] 0.6087513
#> [6,] 0.6087513
fit2 <- vglm(cbind(y2, y3) ~ 1, zigeometric(zero = 1), data = gdata, trace = TRUE)
#> Iteration 1: loglikelihood = -2357.5611
#> Iteration 2: loglikelihood = -2356.4175
#> Iteration 3: loglikelihood = -2356.4146
#> Iteration 4: loglikelihood = -2356.4146
coef(fit2, matrix = TRUE)
#> logitlink(pstr01) logitlink(prob1) logitlink(pstr02)
#> (Intercept) 0.4570263 -0.9944289 0.5478511
#> logitlink(prob2)
#> (Intercept) -0.9747015
summary(fit2)
#>
#> Call:
#> vglm(formula = cbind(y2, y3) ~ 1, family = zigeometric(zero = 1),
#> data = gdata, trace = TRUE)
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept):1 0.45703 0.08774 5.209 1.90e-07 ***
#> (Intercept):2 -0.99443 0.06958 -14.293 < 2e-16 ***
#> (Intercept):3 0.54785 0.08855 6.187 6.13e-10 ***
#> (Intercept):4 -0.97470 0.07196 -13.546 < 2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Names of linear predictors: logitlink(pstr01), logitlink(prob1),
#> logitlink(pstr02), logitlink(prob2)
#>
#> Log-likelihood: -2356.415 on 3996 degrees of freedom
#>
#> Number of Fisher scoring iterations: 4
#>