
Cross-section data originating from the 1977–1978 Australian Health Survey.

Usage

data("DoctorVisits")

Format

A data frame containing 5,190 observations on 12 variables.

visits

Number of doctor visits in past 2 weeks.

gender

Factor indicating gender.

age

Age in years divided by 100.

income

Annual income in tens of thousands of dollars.

illness

Number of illnesses in past 2 weeks.

reduced

Number of days of reduced activity in past 2 weeks due to illness or injury.

health

General health questionnaire score using Goldberg's method.

private

Factor. Does the individual have private health insurance?

freepoor

Factor. Does the individual have free government health insurance due to low income?

freerepat

Factor. Does the individual have free government health insurance due to old age, disability or veteran status?

nchronic

Factor. Is there a chronic condition not limiting activity?

lchronic

Factor. Is there a chronic condition limiting activity?
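
Because age and income are stored in rescaled units, it can help to convert them back before reading summaries. The following is a minimal sketch, assuming the AER package is installed; the names dv, age_years, and income_dollars are illustrative and not part of the data set.

```r
## Load the data and inspect the variable types described above
data("DoctorVisits", package = "AER")
str(DoctorVisits)

## age is stored as years / 100 and income in units of 10,000 dollars;
## add hypothetical helper columns on the original scales
dv <- transform(DoctorVisits,
  age_years      = age * 100,
  income_dollars = income * 10000
)

## Distribution of the count response
table(DoctorVisits$visits)
```

The original columns are left untouched, so the models in the examples can still be fitted to DoctorVisits as documented.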

Source

Journal of Applied Econometrics Data Archive.

http://qed.econ.queensu.ca/jae/1997-v12.3/mullahy/

References

Cameron, A.C. and Trivedi, P.K. (1986). Econometric Models Based on Count Data: Comparisons and Applications of Some Estimators and Tests. Journal of Applied Econometrics, 1, 29–53.

Cameron, A.C. and Trivedi, P.K. (1998). Regression Analysis of Count Data. Cambridge: Cambridge University Press.

Mullahy, J. (1997). Heterogeneity, Excess Zeros, and the Structure of Count Data Models. Journal of Applied Econometrics, 12, 337–350.

Examples

library("AER")   ## also attaches lmtest and sandwich: coeftest(), vcovOPG(), sandwich()
library("MASS")  ## glm.nb()
data("DoctorVisits", package = "AER")

## Cameron and Trivedi (1986), Table III, col. (1)
dv_lm <- lm(visits ~ . + I(age^2), data = DoctorVisits)
summary(dv_lm)
#> 
#> Call:
#> lm(formula = visits ~ . + I(age^2), data = DoctorVisits)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -2.1352 -0.2588 -0.1435 -0.0433  7.0327 
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   0.027632   0.072220   0.383  0.70202    
#> genderfemale  0.033811   0.021604   1.565  0.11764    
#> age           0.203201   0.410016   0.496  0.62020    
#> income       -0.057323   0.033089  -1.732  0.08326 .  
#> illness       0.059946   0.008357   7.173 8.39e-13 ***
#> reduced       0.103192   0.003657  28.216  < 2e-16 ***
#> health        0.016976   0.005190   3.271  0.00108 ** 
#> privateyes    0.035179   0.024882   1.414  0.15748    
#> freepooryes  -0.103314   0.052471  -1.969  0.04901 *  
#> freerepatyes  0.033241   0.038157   0.871  0.38371    
#> nchronicyes   0.004384   0.023740   0.185  0.85349    
#> lchronicyes   0.041617   0.035863   1.160  0.24592    
#> I(age^2)     -0.062103   0.458716  -0.135  0.89231    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 0.7139 on 5177 degrees of freedom
#> Multiple R-squared:  0.2018,	Adjusted R-squared:    0.2 
#> F-statistic: 109.1 on 12 and 5177 DF,  p-value: < 2.2e-16
#> 

## Cameron and Trivedi (1998), Table 3.3 
dv_pois <- glm(visits ~ . + I(age^2), data = DoctorVisits, family = poisson)
summary(dv_pois)                  ## MLH (Hessian-based ML) standard errors
#> 
#> Call:
#> glm(formula = visits ~ . + I(age^2), family = poisson, data = DoctorVisits)
#> 
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  -2.223848   0.189816 -11.716   <2e-16 ***
#> genderfemale  0.156882   0.056137   2.795   0.0052 ** 
#> age           1.056299   1.000780   1.055   0.2912    
#> income       -0.205321   0.088379  -2.323   0.0202 *  
#> illness       0.186948   0.018281  10.227   <2e-16 ***
#> reduced       0.126846   0.005034  25.198   <2e-16 ***
#> health        0.030081   0.010099   2.979   0.0029 ** 
#> privateyes    0.123185   0.071640   1.720   0.0855 .  
#> freepooryes  -0.440061   0.179811  -2.447   0.0144 *  
#> freerepatyes  0.079798   0.092060   0.867   0.3860    
#> nchronicyes   0.114085   0.066640   1.712   0.0869 .  
#> lchronicyes   0.141158   0.083145   1.698   0.0896 .  
#> I(age^2)     -0.848704   1.077784  -0.787   0.4310    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 5634.8  on 5189  degrees of freedom
#> Residual deviance: 4379.5  on 5177  degrees of freedom
#> AIC: 6737.1
#> 
#> Number of Fisher Scoring iterations: 6
#> 
coeftest(dv_pois, vcov = vcovOPG) ## MLOP (outer product of gradients) standard errors
#> 
#> z test of coefficients:
#> 
#>                Estimate Std. Error  z value  Pr(>|z|)    
#> (Intercept)  -2.2238482  0.1443306 -15.4080 < 2.2e-16 ***
#> genderfemale  0.1568820  0.0406153   3.8626 0.0001122 ***
#> age           1.0562990  0.7498654   1.4087 0.1589382    
#> income       -0.2053206  0.0619209  -3.3159 0.0009136 ***
#> illness       0.1869484  0.0141893  13.1753 < 2.2e-16 ***
#> reduced       0.1268465  0.0035073  36.1661 < 2.2e-16 ***
#> health        0.0300810  0.0073544   4.0902  4.31e-05 ***
#> privateyes    0.1231854  0.0560472   2.1979 0.0279571 *  
#> freepooryes  -0.4400609  0.1163511  -3.7822 0.0001555 ***
#> freerepatyes  0.0797984  0.0700594   1.1390 0.2546984    
#> nchronicyes   0.1140853  0.0514849   2.2159 0.0266986 *  
#> lchronicyes   0.1411583  0.0586310   2.4076 0.0160591 *  
#> I(age^2)     -0.8487036  0.8092146  -1.0488 0.2942705    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
logLik(dv_pois)
#> 'log Lik.' -3355.541 (df=13)
## standard errors denoted RS ("unspecified omega robust sandwich estimate")
coeftest(dv_pois, vcov = sandwich)
#> 
#> z test of coefficients:
#> 
#>                Estimate Std. Error z value  Pr(>|z|)    
#> (Intercept)  -2.2238482  0.2544322 -8.7404 < 2.2e-16 ***
#> genderfemale  0.1568820  0.0792133  1.9805   0.04765 *  
#> age           1.0562990  1.3643427  0.7742   0.43880    
#> income       -0.2053206  0.1292447 -1.5886   0.11215    
#> illness       0.1869484  0.0239364  7.8102 5.709e-15 ***
#> reduced       0.1268465  0.0077691 16.3271 < 2.2e-16 ***
#> health        0.0300810  0.0142345  2.1132   0.03458 *  
#> privateyes    0.1231854  0.0951560  1.2946   0.19547    
#> freepooryes  -0.4400609  0.2899945 -1.5175   0.12915    
#> freerepatyes  0.0797984  0.1257832  0.6344   0.52581    
#> nchronicyes   0.1140853  0.0908453  1.2558   0.20918    
#> lchronicyes   0.1411583  0.1227108  1.1503   0.25001    
#> I(age^2)     -0.8487036  1.4595426 -0.5815   0.56091    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 

## Cameron and Trivedi (1986), Table III, col. (4)
dv_nb <- glm.nb(visits ~ . + I(age^2), data = DoctorVisits)
summary(dv_nb)
#> 
#> Call:
#> glm.nb(formula = visits ~ . + I(age^2), data = DoctorVisits, 
#>     init.theta = 0.9284725333, link = log)
#> 
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  -2.190007   0.233592  -9.375  < 2e-16 ***
#> genderfemale  0.216644   0.069697   3.108  0.00188 ** 
#> age          -0.216159   1.266701  -0.171  0.86450    
#> income       -0.142202   0.108417  -1.312  0.18965    
#> illness       0.214341   0.023579   9.090  < 2e-16 ***
#> reduced       0.143754   0.007311  19.662  < 2e-16 ***
#> health        0.038060   0.013654   2.788  0.00531 ** 
#> privateyes    0.118064   0.085806   1.376  0.16884    
#> freepooryes  -0.496611   0.210803  -2.356  0.01848 *  
#> freerepatyes  0.144982   0.115970   1.250  0.21124    
#> nchronicyes   0.099355   0.079303   1.253  0.21026    
#> lchronicyes   0.190327   0.104357   1.824  0.06818 .  
#> I(age^2)      0.609158   1.383245   0.440  0.65966    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for Negative Binomial(0.9285) family taken to be 1)
#> 
#>     Null deviance: 3928.7  on 5189  degrees of freedom
#> Residual deviance: 3028.3  on 5177  degrees of freedom
#> AIC: 6425.5
#> 
#> Number of Fisher Scoring iterations: 1
#> 
#> 
#>               Theta:  0.9285 
#>           Std. Err.:  0.0864 
#> 
#>  2 x log-likelihood:  -6397.4880 
logLik(dv_nb)
#> 'log Lik.' -3198.744 (df=14)
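
The two log-likelihoods reported above can be combined into a likelihood ratio comparison of the Poisson and negative binomial fits. The following is a sketch under stated assumptions rather than part of the original examples: the names lr_stat and p_value are illustrative, and halving the chi-squared p-value is one common convention for a one-sided test, because the Poisson model lies on the boundary of the negative binomial parameter space.

```r
library("AER")   ## provides dispersiontest() and the data
library("MASS")  ## provides glm.nb()
data("DoctorVisits", package = "AER")

## Refit the two count models from the examples
dv_pois <- glm(visits ~ . + I(age^2), data = DoctorVisits, family = poisson)
dv_nb   <- glm.nb(visits ~ . + I(age^2), data = DoctorVisits)

## Likelihood ratio statistic: twice the gain in log-likelihood from
## adding the dispersion parameter theta
lr_stat <- as.numeric(2 * (logLik(dv_nb) - logLik(dv_pois)))

## One-sided p-value (assumption: halved chi-squared(1) tail probability)
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE) / 2

## Information criteria for both fits
AIC(dv_pois, dv_nb)

## Regression-based test for overdispersion in the Poisson fit
dispersiontest(dv_pois)
```

A large lr_stat, together with the lower AIC already shown for dv_nb, would point toward overdispersion relative to the Poisson model.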