Zero-Inflated Geometric Distribution Family Function

Fits a zero-inflated geometric distribution by maximum likelihood estimation.

Usage

zigeometric(lpstr0  = "logitlink", lprob = "logitlink",
            type.fitted = c("mean", "prob", "pobs0", "pstr0", "onempstr0"),
            ipstr0  = NULL, iprob = NULL,
            imethod = 1, bias.red = 0.5, zero = NULL)
zigeometricff(lprob = "logitlink", lonempstr0 = "logitlink",
              type.fitted = c("mean", "prob", "pobs0", "pstr0", "onempstr0"),
              iprob = NULL, ionempstr0 = NULL,
              imethod = 1, bias.red = 0.5, zero = "onempstr0")

Arguments

lpstr0, lprob

Link functions for the parameters $\phi$ and $p$ (prob). The usual geometric probability parameter is the latter. The probability of a structural zero is the former. See Links for more choices. For the zero-deflated model see below.

lonempstr0, ionempstr0

Corresponding arguments for the other parameterization. See details below.

bias.red

A constant used in the initialization process of pstr0. It should lie between 0 and 1, with 1 having no effect.

type.fitted

See CommonVGAMffArguments and fittedvlm for information.

ipstr0, iprob

See CommonVGAMffArguments for information.

zero, imethod

See CommonVGAMffArguments for information.

Details

Function zigeometric() is based on $$P(Y=0) = \phi + (1-\phi) p,$$ for $y=0$, and $$P(Y=y) = (1-\phi) p (1 - p)^{y}.$$ for $y=1,2,\ldots$. The parameter $\phi$ satisfies $0 < \phi < 1$. The mean of $Y$ is $E(Y)=(1-\phi) p / (1-p)$ and these are returned as the fitted values by default. By default, the two linear/additive predictors are $(logit(\phi), logit(p))^T$. Multiple responses are handled.

Estimated probabilities of a structural zero and an observed zero can be returned, as in zipoisson; see fittedvlm for information.

The VGAM family function zigeometricff() has a few changes compared to zigeometric(). These are: (i) the order of the linear/additive predictors is switched so the geometric probability comes first; (ii) argument onempstr0 is now 1 minus the probability of a structural zero, i.e., the probability of the parent (geometric) component, i.e., onempstr0 is 1-pstr0; (iii) argument zero has a new default so that the onempstr0 is intercept-only by default. Now zigeometricff() is generally recommended over zigeometric(). Both functions implement Fisher scoring and can handle multiple responses.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm and vgam.

Author

T. W. Yee

Note

The zero-deflated geometric distribution might be fitted by setting lpstr0 = identitylink, albeit, not entirely reliably. See zipoisson for information that can be applied here. Else try the zero-altered geometric distribution (see zageometric).

Examples

gdata <- data.frame(x2 = runif(nn <- 1000) - 0.5)
gdata <- transform(gdata, x3 = runif(nn) - 0.5,
                          x4 = runif(nn) - 0.5)
gdata <- transform(gdata, eta1 =  1.0 - 1.0 * x2 + 2.0 * x3,
                          eta2 = -1.0,
                          eta3 =  0.5)
gdata <- transform(gdata, prob1 = logitlink(eta1, inverse = TRUE),
                          prob2 = logitlink(eta2, inverse = TRUE),
                          prob3 = logitlink(eta3, inverse = TRUE))
gdata <- transform(gdata, y1 = rzigeom(nn, prob1, pstr0 = prob3),
                          y2 = rzigeom(nn, prob2, pstr0 = prob3),
                          y3 = rzigeom(nn, prob2, pstr0 = prob3))
with(gdata, table(y1))
#> y1
#>   0   1   2   3   4   5   6   7 
#> 890  78  15   9   3   3   1   1 
with(gdata, table(y2))
#> y2
#>   0   1   2   3   4   5   6   7   8   9  10  11  13  14  21 
#> 717  88  45  36  31  15  24   9  11   7   7   3   3   3   1 
with(gdata, table(y3))
#> y3
#>   0   1   2   3   4   5   6   7   8   9  10  11  12  13  16  17  22 
#> 734  80  47  38  28  19  12  10   9   6   6   5   1   2   1   1   1 
head(gdata)
#>            x2         x3          x4      eta1 eta2 eta3     prob1     prob2
#> 1  0.04401446 -0.2597102 -0.35692367 0.4365652   -1  0.5 0.6074403 0.2689414
#> 2  0.16675910 -0.3665792 -0.27325186 0.1000826   -1  0.5 0.5249998 0.2689414
#> 3 -0.08202979  0.1915942 -0.02811507 1.4652181   -1  0.5 0.8123295 0.2689414
#> 4  0.17102154  0.3177276  0.10254225 1.4644336   -1  0.5 0.8122098 0.2689414
#> 5  0.10573920  0.4613931  0.46927029 1.8170469   -1  0.5 0.8602114 0.2689414
#> 6 -0.45357397  0.1587968  0.12485817 1.7711676   -1  0.5 0.8546028 0.2689414
#>       prob3 y1 y2 y3
#> 1 0.6224593  0  0  0
#> 2 0.6224593  0  0  0
#> 3 0.6224593  0  0  1
#> 4 0.6224593  0  0  0
#> 5 0.6224593  0  1  0
#> 6 0.6224593  1  0  0

fit1 <- vglm(y1 ~ x2 + x3 + x4, zigeometric(zero = 1), data = gdata, trace = TRUE)
#> Iteration 1: loglikelihood = -476.04541
#> Iteration 2: loglikelihood = -454.20131
#> Iteration 3: loglikelihood = -442.86763
#> Iteration 4: loglikelihood = -441.14748
#> Iteration 5: loglikelihood = -441.10756
#> Iteration 6: loglikelihood = -441.10705
#> Iteration 7: loglikelihood = -441.10704
#> Iteration 8: loglikelihood = -441.10704
coef(fit1, matrix = TRUE)
#>             logitlink(pstr0) logitlink(prob)
#> (Intercept)        0.4420665      1.04765323
#> x2                 0.0000000     -0.90381469
#> x3                 0.0000000      2.28400122
#> x4                 0.0000000      0.06262285
head(fitted(fit1, type = "pstr0"))
#>           [,1]
#> [1,] 0.6087513
#> [2,] 0.6087513
#> [3,] 0.6087513
#> [4,] 0.6087513
#> [5,] 0.6087513
#> [6,] 0.6087513

fit2 <- vglm(cbind(y2, y3) ~ 1, zigeometric(zero = 1), data = gdata, trace = TRUE)
#> Iteration 1: loglikelihood = -2357.5611
#> Iteration 2: loglikelihood = -2356.4175
#> Iteration 3: loglikelihood = -2356.4146
#> Iteration 4: loglikelihood = -2356.4146
coef(fit2, matrix = TRUE)
#>             logitlink(pstr01) logitlink(prob1) logitlink(pstr02)
#> (Intercept)         0.4570263       -0.9944289         0.5478511
#>             logitlink(prob2)
#> (Intercept)       -0.9747015
summary(fit2)
#> 
#> Call:
#> vglm(formula = cbind(y2, y3) ~ 1, family = zigeometric(zero = 1), 
#>     data = gdata, trace = TRUE)
#> 
#> Coefficients: 
#>               Estimate Std. Error z value Pr(>|z|)    
#> (Intercept):1  0.45703    0.08774   5.209 1.90e-07 ***
#> (Intercept):2 -0.99443    0.06958 -14.293  < 2e-16 ***
#> (Intercept):3  0.54785    0.08855   6.187 6.13e-10 ***
#> (Intercept):4 -0.97470    0.07196 -13.546  < 2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Names of linear predictors: logitlink(pstr01), logitlink(prob1), 
#> logitlink(pstr02), logitlink(prob2)
#> 
#> Log-likelihood: -2356.415 on 3996 degrees of freedom
#> 
#> Number of Fisher scoring iterations: 4 
#>