Exchangeable Bivariate cloglog Odds-ratio Model From a Zero-inflated Poisson Distribution

Fits an exchangeable bivariate odds-ratio model to two binary responses with a complementary log-log link. The data are assumed to come from a zero-inflated Poisson distribution that has been converted to presence/absence.

Usage

zipebcom(lmu12 = "clogloglink", lphi12 = "logitlink", loratio = "loglink",
         imu12 = NULL, iphi12 = NULL, ioratio = NULL,
         zero = c("phi12", "oratio"), tol = 0.001, addRidge = 0.001)

Arguments

lmu12, imu12

Link function, extra argument and optional initial values for the first (and second) marginal probabilities. Argument lmu12 should be left alone. Argument imu12 may be of length 2 (one element for each response).

lphi12

Link function applied to the $\phi$ parameter of the zero-inflated Poisson distribution (see zipoisson). See Links for more choices.

loratio

Link function applied to the odds ratio. See Links for more choices.

iphi12, ioratio

Optional initial values for $\phi$ and the odds ratio. See CommonVGAMffArguments for more details. In general, good initial values (especially for iphi12) are often required, therefore use these arguments if convergence failure occurs. If inputted, the value of iphi12 cannot be more than the sample proportions of zeros in either response.

zero

Which linear/additive predictor is modelled as an intercept only? A NULL means none. The default has both $\phi$ and the odds ratio as not being modelled as a function of the explanatory variables (apart from an intercept). See CommonVGAMffArguments for information.

tol

Tolerance for testing independence. Should be some small positive numerical value.

addRidge

Some small positive numerical value. The first two diagonal elements of the working weight matrices are multiplied by 1+addRidge to make it diagonally dominant, therefore positive-definite.

Details

This VGAM family function fits an exchangeable bivariate odds ratio model (binom2.or) with a clogloglink link. The data are assumed to come from a zero-inflated Poisson (ZIP) distribution that has been converted to presence/absence. Explicitly, the default model is $$cloglog[P(Y_j=1)/(1-\phi)] = \eta_1,\ \ \ j=1,2$$ for the (exchangeable) marginals, and $$logit[\phi] = \eta_2,$$ for the mixing parameter, and $$\log[P(Y_{00}=1) P(Y_{11}=1) / (P(Y_{01}=1) P(Y_{10}=1))] = \eta_3,$$ specifies the dependency between the two responses. Here, the responses equal 1 for a success and a 0 for a failure, and the odds ratio is often written $\psi=p_{00}p_{11}/(p_{10}p_{01})$. We have $p_{10} = p_{01}$ because of the exchangeability.

The second linear/additive predictor models the $\phi$ parameter (see zipoisson). The third linear/additive predictor is the same as binom2.or, viz., the log odds ratio.

Suppose a dataset1 comes from a Poisson distribution that has been converted to presence/absence, and that both marginal probabilities are the same (exchangeable). Then binom2.or("clogloglink", exch=TRUE) is appropriate. Now suppose a dataset2 comes from a zero-inflated Poisson distribution. The first linear/additive predictor of zipebcom() applied to dataset2 is the same as that of binom2.or("clogloglink", exch=TRUE) applied to dataset1. That is, the $\phi$ has been taken care of by zipebcom() so that it is just like the simpler binom2.or.

Note that, for $\eta_1$, mu12 = prob12 / (1-phi12) where prob12 is the probability of a 1 under the ZIP model. Here, mu12 correspond to mu1 and mu2 in the binom2.or-Poisson model.

If $\phi=0$ then zipebcom() should be equivalent to binom2.or("clogloglink", exch=TRUE). Full details are given in Yee and Dirnbock (2009).

The leading $2 \times 2$ submatrix of the expected information matrix (EIM) is of rank-1, not 2! This is due to the fact that the parameters corresponding to the first two linear/additive predictors are unidentifiable. The quick fix around this problem is to use the addRidge adjustment. The model is fitted by maximum likelihood estimation since the full likelihood is specified. Fisher scoring is implemented.

The default models $\eta_2$ and $\eta_3$ as single parameters only, but this can be circumvented by setting zero=NULL in order to model the $\phi$ and odds ratio as a function of all the explanatory variables.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm and vgam.

When fitted, the fitted.values slot of the object contains the four joint probabilities, labelled as $(Y_1,Y_2)$ = (0,0), (0,1), (1,0), (1,1), respectively. These estimated probabilities should be extracted with the fitted generic function.

Warning

The fact that the EIM is not of full rank may mean the model is naturally ill-conditioned. Not sure whether there are any negative consequences wrt theory. For now it is certainly safer to fit binom2.or to bivariate binary responses.

References

Yee, T. W. and Dirnbock, T. (2009). Models for analysing species' presence/absence data at two time points. Journal of Theoretical Biology, 259(4), 684–694.

Note

The "12" in the argument names reinforce the user about the exchangeability assumption. The name of this VGAM family function stands for zero-inflated Poisson exchangeable bivariate complementary log-log odds-ratio model or ZIP-EBCOM.

See binom2.or for details that are pertinent to this VGAM family function too. Even better initial values are usually needed here.

The xij (see vglm.control) argument enables environmental variables with different values at the two time points to be entered into an exchangeable binom2.or model. See the author's webpage for sample code.

Examples

if (FALSE) { # \dontrun{
zdata <- data.frame(x2 = seq(0, 1, len = (nsites <- 2000)))
zdata <- transform(zdata, eta1 =  -3 + 5 * x2,
                         phi1 = logitlink(-1, inverse = TRUE),
                         oratio = exp(2))
zdata <- transform(zdata, mu12 = clogloglink(eta1, inverse = TRUE) * (1-phi1))
tmat <-  with(zdata, rbinom2.or(nsites, mu1 = mu12, oratio = oratio, exch = TRUE))
zdata <- transform(zdata, ybin1 = tmat[, 1], ybin2 = tmat[, 2])

with(zdata, table(ybin1, ybin2)) / nsites  # For interest only
# Various plots of the data, for interest only
par(mfrow = c(2, 2))
plot(jitter(ybin1) ~ x2, data = zdata, col = "blue")

plot(jitter(ybin2) ~ jitter(ybin1), data = zdata, col = "blue")

plot(mu12 ~ x2, data = zdata, col = "blue", type = "l", ylim = 0:1,
     ylab = "Probability", main = "Marginal probability and phi")
with(zdata, abline(h = phi1[1], col = "red", lty = "dashed"))

tmat2 <- with(zdata, dbinom2.or(mu1 = mu12, oratio = oratio, exch = TRUE))
with(zdata, matplot(x2, tmat2, col = 1:4, type = "l", ylim = 0:1,
     ylab = "Probability", main = "Joint probabilities"))

# Now fit the model to the data.
fit <- vglm(cbind(ybin1, ybin2) ~ x2, zipebcom, data = zdata, trace = TRUE)
coef(fit, matrix = TRUE)
summary(fit)
vcov(fit)   } # }