Linear Model and Binomial Mixed Data Type Family Function
N1binomial.RdEstimate the four parameters of the (bivariate) \(N_1\)–binomial copula mixed data type model by maximum likelihood estimation.
Usage
N1binomial(lmean = "identitylink", lsd = "loglink",
lvar = "loglink", lprob = "logitlink", lapar = "rhobitlink",
zero = c(if (var.arg) "var" else "sd", "apar"),
nnodes = 20, copula = "gaussian", var.arg = FALSE,
imethod = 1, isd = NULL, iprob = NULL, iapar = NULL)Arguments
- lmean, lsd, lvar, lprob, lapar
Details at
CommonVGAMffArguments. SeeLinksfor more link function choices.- imethod, isd, iprob, iapar
Initial values. Details at
CommonVGAMffArguments.- zero
Details at
CommonVGAMffArguments.- nnodes
Number of nodes and weights for the Gauss–Hermite (GH) quadrature. While a higher value should be more accurate, setting an excessive value runs the risk of evaluating some special functions near the boundary of the parameter space and producing numerical problems.
- copula
Type of copula used. Currently only the bivariate normal is used but more might be implemented in the future.
- var.arg
See
uninormal.
Details
The bivariate response comprises \(Y_1\)
from the linear model having parameters
mean and sd for
\(\mu_1\) and \(\sigma_1\),
and the binary
\(Y_2\) having parameter
prob for its mean \(\mu_2\).
The
joint probability density/mass function is
\(P(y_1, Y_2 = 0) = \phi_1(y_1; \mu_1, \sigma_1)
(1 - \Delta)\)
and
\(P(y_1, Y_2 = 1) = \phi_1(y_1; \mu_1, \sigma_1)
\Delta\)
where \(\Delta\) adjusts \(\mu_2\)
according to the association parameter
\(\alpha\).
The quantity \(\Delta\) is
\(\Phi((\Phi^{-1}(\mu_2) - \alpha Z_1)/
\sqrt{1 - \alpha^2})\).
The quantity \(Z_1\) is \((Y_1-\mu_1) / \sigma_1\).
Thus there is an underlying bivariate normal
distribution, and a copula is used to bring the
two marginal distributions together.
Here,
\(-1 < \alpha < 1\), and
\(\Phi\) is the
cumulative distribution function
pnorm
of a standard univariate normal.
The first marginal
distribution is a normal distribution
for the linear model.
The second column of the response must
have values 0 or 1,
e.g.,
Bernoulli random variables.
When \(\alpha = 0\)
then \(\Delta=\mu_2\).
Together, this family function combines
uninormal and
binomialff.
If the response are correlated then
a more efficient joint analysis
should result.
This VGAM family function cannot handle multiple responses. Only a two-column matrix is allowed. The two-column fitted value matrix has columns \(\mu_1\) and \(\mu_2\).
Value
An object of class "vglmff"
(see vglmff-class).
The object is used by modelling functions
such as vglm
and vgam.
References
Song, P. X.-K. (2007). Correlated Data Analysis: Modeling, Analytics, and Applications. Springer.
Note
This VGAM family function is fragile. Because the EIMs are approximated by GH quadrature it is found that convergence may be a little slower than for other models whose EIM is tractable. Also, the log-likelihood may be flat at the MLE with respect to \(\alpha\) especially because the correlation between the two mixed data types may be weak.
It pays to set trace = TRUE to
monitor convergence, especially when
abs(apar) is high.
Examples
nn <- 1000; mymu <- 1; sdev <- exp(1)
apar <- rhobitlink(0.5, inverse = TRUE)
prob <- logitlink(0.5, inverse = TRUE)
mat <- rN1binom(nn, mymu, sdev, prob, apar)
nbdata <- data.frame(y1 = mat[, 1], y2 = mat[, 2])
fit1 <- vglm(cbind(y1, y2) ~ 1, N1binomial,
nbdata, trace = TRUE)
#> Iteration 1: loglikelihood = -6125.0207
#> Iteration 2: loglikelihood = -6124.1736
#> Iteration 3: loglikelihood = -6124.128
#> Iteration 4: loglikelihood = -6124.1258
#> Iteration 5: loglikelihood = -6124.1257
#> Iteration 6: loglikelihood = -6124.1257
coef(fit1, matrix = TRUE)
#> mean loglink(sd) logitlink(prob) rhobitlink(apar)
#> (Intercept) 0.9993132 1.004576 0.4807467 0.5943222
Coef(fit1)
#> mean sd prob apar
#> 0.9993132 2.7307496 0.6179242 0.2887125
head(fitted(fit1))
#> y1 y2
#> 1 0.9993132 0.6179242
#> 2 0.9993132 0.6179242
#> 3 0.9993132 0.6179242
#> 4 0.9993132 0.6179242
#> 5 0.9993132 0.6179242
#> 6 0.9993132 0.6179242
summary(fit1)
#>
#> Call:
#> vglm(formula = cbind(y1, y2) ~ 1, family = N1binomial, data = nbdata,
#> trace = TRUE)
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept):1 0.99931 0.08635 11.572 < 2e-16 ***
#> (Intercept):2 1.00458 0.02236 44.926 < 2e-16 ***
#> (Intercept):3 0.48075 0.07137 6.736 1.63e-11 ***
#> (Intercept):4 0.59432 0.09024 6.586 4.51e-11 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Names of linear predictors: mean, loglink(sd), logitlink(prob),
#> rhobitlink(apar)
#>
#> Log-likelihood: -6124.126 on 3996 degrees of freedom
#>
#> Number of Fisher scoring iterations: 6
#>
confint(fit1)
#> 2.5 % 97.5 %
#> (Intercept):1 0.8300627 1.1685637
#> (Intercept):2 0.9607500 1.0484023
#> (Intercept):3 0.3408653 0.6206282
#> (Intercept):4 0.4174590 0.7711853