
Cross-section data originating from the 1976 Panel Study of Income Dynamics (PSID), based on data for the previous year, 1975.

Usage

data("PSID1976")

Format

A data frame containing 753 observations on 21 variables.

participation

Factor. Did the individual participate in the labor force in 1975? (This is essentially wage > 0 or hours > 0.)

hours

Wife's hours of work in 1975.

youngkids

Number of children less than 6 years old in household.

oldkids

Number of children between ages 6 and 18 in household.

age

Wife's age in years.

education

Wife's education in years.

wage

Wife's average hourly wage, in 1975 dollars.

repwage

Wife's wage reported at the time of the 1976 interview (not the same as the 1975 estimated wage). To use the subsample with this wage, select the 1975 workers with participation == "yes", then keep only those women with a non-zero repwage. Only 325 women worked in 1975 and have a non-zero wage in 1976.

hhours

Husband's hours worked in 1975.

hage

Husband's age in years.

heducation

Husband's education in years.

hwage

Husband's wage, in 1975 dollars.

fincome

Family income, in 1975 dollars. (This variable is used to construct the property income variable.)

tax

Marginal tax rate facing the wife, taken from published federal tax tables (state and local income taxes are excluded). The taxable income on which this tax rate is calculated includes Social Security, if applicable to the wife.

meducation

Wife's mother's educational attainment, in years.

feducation

Wife's father's educational attainment, in years.

unemp

Unemployment rate in county of residence, in percentage points. (This is taken from bracketed ranges.)

city

Factor. Does the individual live in a large city?

experience

Actual years of wife's previous labor market experience.

college

Factor. Did the individual attend college?

hcollege

Factor. Did the individual's husband attend college?
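
A few of the variable descriptions above can be checked directly against the data. The following is a minimal sketch, assuming the AER package (which ships PSID1976) is installed. The object names positive and nwincome are chosen here for illustration, and treating family income net of the wife's earnings as the income measure follows the transformation used in the Examples section rather than a documented definition of property income.

```r
## Load the data from AER without attaching the package
data("PSID1976", package = "AER")

## participation should essentially coincide with wage > 0 or hours > 0
positive <- with(PSID1976, wage > 0 | hours > 0)
table(participation = PSID1976$participation, positive = positive)

## Family income net of the wife's own earnings, in 1000s of 1975 dollars
## (assumption: same construction as nwincome in the Examples section)
nwincome <- with(PSID1976, (fincome - hours * wage) / 1000)
summary(nwincome)
```

Any off-diagonal counts in the cross-tabulation would indicate observations where the factor and the rule of thumb disagree.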

Details

This data set is also known as the Mroz (1987) data.

Warning: Typical applications using these data employ the variable wage (aka earnings in previous versions of the data) as the dependent variable. The variable repwage is the reported wage in a 1976 interview, named RPWG by Greene (2003).
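
To keep the two wage variables apart in practice, the subsample described for repwage can be built in two explicit steps. This is a minimal sketch, assuming the AER package is installed; the object names workers and rep_sub are hypothetical.

```r
## Load the data from AER without attaching the package
data("PSID1976", package = "AER")

## Step 1: women who participated in the labor force in 1975
workers <- subset(PSID1976, participation == "yes")

## Step 2: of those, women with a non-zero wage reported in the 1976 interview
rep_sub <- subset(workers, repwage != 0)

## Sizes of the two subsamples
c(workers = nrow(workers), rep_sub = nrow(rep_sub))

## wage (1975 average hourly wage) and repwage (1976 reported wage) side by side
summary(rep_sub[, c("wage", "repwage")])
```

Models that use wage as the dependent variable would typically be fit on workers, while anything based on repwage would use rep_sub.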

Source

Online complements to Greene (2003). Table F4.1.

https://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm

References

Greene, W.H. (2003). Econometric Analysis, 5th edition. Upper Saddle River, NJ: Prentice Hall.

McCullough, B.D. (2004). Some Details of Nonlinear Estimation. In: Altman, M., Gill, J., and McDonald, M.P.: Numerical Issues in Statistical Computing for the Social Scientist. Hoboken, NJ: John Wiley, Ch. 8, 199–218.

Mroz, T.A. (1987). The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions. Econometrica, 55, 765–799.

Winkelmann, R., and Boes, S. (2009). Analysis of Microdata, 2nd ed. Berlin and Heidelberg: Springer-Verlag.

Wooldridge, J.M. (2002). Econometric Analysis of Cross-Section and Panel Data. Cambridge, MA: MIT Press.

Examples

## data and transformations
library("AER")  # provides PSID1976 and tobit(); attaches lmtest and sandwich
data("PSID1976")
PSID1976$kids <- with(PSID1976, factor((youngkids + oldkids) > 0,
  levels = c(FALSE, TRUE), labels = c("no", "yes")))
PSID1976$nwincome <- with(PSID1976, (fincome - hours * wage)/1000)
PSID1976$partnum <- as.numeric(PSID1976$participation) - 1

###################
## Greene (2003) ##
###################

## Example 4.1, Table 4.2
## (reproduced in Example 7.1, Table 7.1)
gr_lm <- lm(log(hours * wage) ~ age + I(age^2) + education + kids,
  data = PSID1976, subset = participation == "yes")
summary(gr_lm)
#> 
#> Call:
#> lm(formula = log(hours * wage) ~ age + I(age^2) + education + 
#>     kids, data = PSID1976, subset = participation == "yes")
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -4.5305 -0.5266  0.3003  0.8474  1.7568 
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)   
#> (Intercept)  3.2400965  1.7674296   1.833  0.06747 . 
#> age          0.2005573  0.0838603   2.392  0.01721 * 
#> I(age^2)    -0.0023147  0.0009869  -2.345  0.01947 * 
#> education    0.0674727  0.0252486   2.672  0.00782 **
#> kidsyes     -0.3511952  0.1475326  -2.380  0.01773 * 
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 1.19 on 423 degrees of freedom
#> Multiple R-squared:  0.041,	Adjusted R-squared:  0.03193 
#> F-statistic: 4.521 on 4 and 423 DF,  p-value: 0.001382
#> 
vcov(gr_lm)
#>             (Intercept)           age      I(age^2)     education       kidsyes
#> (Intercept)  3.12380756 -1.440901e-01  1.661740e-03 -9.260920e-03  2.674867e-02
#> age         -0.14409007  7.032544e-03 -8.232369e-05  5.085495e-05 -2.641203e-03
#> I(age^2)     0.00166174 -8.232369e-05  9.739279e-07 -4.976114e-07  3.841018e-05
#> education   -0.00926092  5.085495e-05 -4.976114e-07  6.374903e-04 -5.461931e-05
#> kidsyes      0.02674867 -2.641203e-03  3.841018e-05 -5.461931e-05  2.176587e-02

## Example 4.5
summary(gr_lm)
#> 
#> Call:
#> lm(formula = log(hours * wage) ~ age + I(age^2) + education + 
#>     kids, data = PSID1976, subset = participation == "yes")
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -4.5305 -0.5266  0.3003  0.8474  1.7568 
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)   
#> (Intercept)  3.2400965  1.7674296   1.833  0.06747 . 
#> age          0.2005573  0.0838603   2.392  0.01721 * 
#> I(age^2)    -0.0023147  0.0009869  -2.345  0.01947 * 
#> education    0.0674727  0.0252486   2.672  0.00782 **
#> kidsyes     -0.3511952  0.1475326  -2.380  0.01773 * 
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 1.19 on 423 degrees of freedom
#> Multiple R-squared:  0.041,	Adjusted R-squared:  0.03193 
#> F-statistic: 4.521 on 4 and 423 DF,  p-value: 0.001382
#> 
## or equivalently
gr_lm1 <- lm(log(hours * wage) ~ 1, data = PSID1976, subset = participation == "yes")
anova(gr_lm1, gr_lm)
#> Analysis of Variance Table
#> 
#> Model 1: log(hours * wage) ~ 1
#> Model 2: log(hours * wage) ~ age + I(age^2) + education + kids
#>   Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
#> 1    427 625.08                                
#> 2    423 599.46  4    25.625 4.5206 0.001382 **
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

## Example 21.4, p. 681, and Tab. 21.3, p. 682
gr_probit1 <- glm(participation ~ age + I(age^2) + I(fincome/10000) + education + kids,
  data = PSID1976, family = binomial(link = "probit"))
gr_probit2 <- glm(participation ~ age + I(age^2) + I(fincome/10000) + education,
  data = PSID1976, family = binomial(link = "probit"))
gr_probit3 <- glm(participation ~ kids/(age + I(age^2) + I(fincome/10000) + education),
  data = PSID1976, family = binomial(link = "probit"))
## LR test of all coefficients
lrtest(gr_probit1)
#> Likelihood ratio test
#> 
#> Model 1: participation ~ age + I(age^2) + I(fincome/10000) + education + 
#>     kids
#> Model 2: participation ~ 1
#>   #Df  LogLik Df  Chisq Pr(>Chisq)    
#> 1   6 -490.85                         
#> 2   1 -514.87 -5 48.051  3.468e-09 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## Chow-type test
lrtest(gr_probit2, gr_probit3)
#> Likelihood ratio test
#> 
#> Model 1: participation ~ age + I(age^2) + I(fincome/10000) + education
#> Model 2: participation ~ kids/(age + I(age^2) + I(fincome/10000) + education)
#>   #Df  LogLik Df  Chisq Pr(>Chisq)  
#> 1   5 -496.87                       
#> 2  10 -489.48  5 14.774    0.01137 *
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## equivalently:
anova(gr_probit2, gr_probit3, test = "Chisq")
#> Analysis of Deviance Table
#> 
#> Model 1: participation ~ age + I(age^2) + I(fincome/10000) + education
#> Model 2: participation ~ kids/(age + I(age^2) + I(fincome/10000) + education)
#>   Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
#> 1       748     993.73                       
#> 2       743     978.96  5   14.774  0.01137 *
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## Table 21.3
summary(gr_probit1)
#> 
#> Call:
#> glm(formula = participation ~ age + I(age^2) + I(fincome/10000) + 
#>     education + kids, family = binomial(link = "probit"), data = PSID1976)
#> 
#> Coefficients:
#>                    Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)      -4.1568189  1.4040095  -2.961 0.003070 ** 
#> age               0.1853957  0.0662076   2.800 0.005107 ** 
#> I(age^2)         -0.0024259  0.0007762  -3.125 0.001775 ** 
#> I(fincome/10000)  0.0458029  0.0430557   1.064 0.287417    
#> education         0.0981824  0.0228932   4.289  1.8e-05 ***
#> kidsyes          -0.4489872  0.1300252  -3.453 0.000554 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 1029.7  on 752  degrees of freedom
#> Residual deviance:  981.7  on 747  degrees of freedom
#> AIC: 993.7
#> 
#> Number of Fisher Scoring iterations: 4
#> 

## Example 22.8, Table 22.7, p. 786
library("sampleSelection")
gr_2step <- selection(participation ~ age + I(age^2) + fincome + education + kids, 
  wage ~ experience + I(experience^2) + education + city,
  data = PSID1976, method = "2step")
gr_ml <- selection(participation ~ age + I(age^2) + fincome + education + kids, 
  wage ~ experience + I(experience^2) + education + city,
  data = PSID1976, method = "ml")
gr_ols <- lm(wage ~ experience + I(experience^2) + education + city,
  data = PSID1976, subset = participation == "yes")
## NOTE: ML estimates agree with Greene, 5e errata. 
## Standard errors are based on the Hessian (here), while Greene has BHHH/OPG. 


#######################
## Wooldridge (2002) ##
#######################

## Table 15.1, p. 468
wl_lpm <- lm(partnum ~ nwincome + education + experience + I(experience^2) +
  age + youngkids + oldkids, data = PSID1976)
wl_logit <- glm(participation ~ nwincome + education + experience + I(experience^2) +
  age + youngkids + oldkids, family = binomial, data = PSID1976)
wl_probit <- glm(participation ~ nwincome + education + experience + I(experience^2) +
  age + youngkids + oldkids, family = binomial(link = "probit"), data = PSID1976)
## (same as Altman et al.)

## convenience functions
pseudoR2 <- function(obj) 1 - as.vector(logLik(obj)/logLik(update(obj, . ~ 1)))
misclass <- function(obj) 1 - sum(diag(prop.table(table(
  model.response(model.frame(obj)), round(fitted(obj))))))

coeftest(wl_logit)
#> 
#> z test of coefficients:
#> 
#>                   Estimate Std. Error z value  Pr(>|z|)    
#> (Intercept)      0.4254524  0.8603645  0.4945  0.620951    
#> nwincome        -0.0213452  0.0084214 -2.5346  0.011256 *  
#> education        0.2211704  0.0434393  5.0915 3.553e-07 ***
#> experience       0.2058695  0.0320567  6.4220 1.345e-10 ***
#> I(experience^2) -0.0031541  0.0010161 -3.1041  0.001909 ** 
#> age             -0.0880244  0.0145729 -6.0403 1.538e-09 ***
#> youngkids       -1.4433541  0.2035828 -7.0898 1.343e-12 ***
#> oldkids          0.0601122  0.0747893  0.8038  0.421539    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
logLik(wl_logit)
#> 'log Lik.' -401.7652 (df=8)
misclass(wl_logit)
#> [1] 0.2642762
pseudoR2(wl_logit)
#> [1] 0.2196814

coeftest(wl_probit)
#> 
#> z test of coefficients:
#> 
#>                    Estimate  Std. Error z value  Pr(>|z|)    
#> (Intercept)      0.27007357  0.50807817  0.5316  0.595031    
#> nwincome        -0.01202364  0.00493917 -2.4343  0.014919 *  
#> education        0.13090397  0.02539873  5.1540 2.550e-07 ***
#> experience       0.12334717  0.01875869  6.5755 4.850e-11 ***
#> I(experience^2) -0.00188707  0.00059993 -3.1455  0.001658 ** 
#> age             -0.05285244  0.00846236 -6.2456 4.222e-10 ***
#> youngkids       -0.86832468  0.11837727 -7.3352 2.213e-13 ***
#> oldkids          0.03600561  0.04403026  0.8177  0.413502    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
logLik(wl_probit)
#> 'log Lik.' -401.3022 (df=8)
misclass(wl_probit)
#> [1] 0.2656042
pseudoR2(wl_probit)
#> [1] 0.2205805

## Table 16.2, p. 528
form <- hours ~ nwincome + education + experience + I(experience^2) + age + youngkids + oldkids 
wl_ols <- lm(form, data = PSID1976)
wl_tobit <- tobit(form, data = PSID1976)
summary(wl_ols)
#> 
#> Call:
#> lm(formula = form, data = PSID1976)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -1511.3  -537.8  -146.9   538.1  3555.6 
#> 
#> Coefficients:
#>                  Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)     1330.4824   270.7846   4.913 1.10e-06 ***
#> nwincome          -3.4466     2.5440  -1.355   0.1759    
#> education         28.7611    12.9546   2.220   0.0267 *  
#> experience        65.6725     9.9630   6.592 8.23e-11 ***
#> I(experience^2)   -0.7005     0.3246  -2.158   0.0312 *  
#> age              -30.5116     4.3639  -6.992 6.04e-12 ***
#> youngkids       -442.0899    58.8466  -7.513 1.66e-13 ***
#> oldkids          -32.7792    23.1762  -1.414   0.1577    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 750.2 on 745 degrees of freedom
#> Multiple R-squared:  0.2656,	Adjusted R-squared:  0.2587 
#> F-statistic:  38.5 on 7 and 745 DF,  p-value: < 2.2e-16
#> 
summary(wl_tobit)
#> 
#> Call:
#> tobit(formula = hours ~ nwincome + education + experience + I(experience^2) + 
#>     age + youngkids + oldkids, data = PSID1976)
#> 
#> Observations:
#>          Total  Left-censored     Uncensored Right-censored 
#>            753            325            428              0 
#> 
#> Coefficients:
#>                   Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)      965.30528  446.43614   2.162 0.030599 *  
#> nwincome          -8.81424    4.45910  -1.977 0.048077 *  
#> education         80.64561   21.58324   3.736 0.000187 ***
#> experience       131.56430   17.27939   7.614 2.66e-14 ***
#> I(experience^2)   -1.86416    0.53766  -3.467 0.000526 ***
#> age              -54.40501    7.41850  -7.334 2.24e-13 ***
#> youngkids       -894.02174  111.87804  -7.991 1.34e-15 ***
#> oldkids          -16.21800   38.64139  -0.420 0.674701    
#> Log(scale)         7.02289    0.03706 189.514  < 2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Scale: 1122 
#> 
#> Gaussian distribution
#> Number of Newton-Raphson Iterations: 4 
#> Log-likelihood: -3819 on 9 Df
#> Wald-statistic: 253.9 on 7 Df, p-value: < 2.22e-16 
#> 


#######################
## McCullough (2004) ##
#######################

## p. 203
mc_probit <- glm(participation ~ nwincome + education + experience + I(experience^2) +
  age + youngkids + oldkids, family = binomial(link = "probit"), data = PSID1976)
mc_tobit <- tobit(hours ~ nwincome + education + experience + I(experience^2) + age +
  youngkids + oldkids, data = PSID1976)
coeftest(mc_probit)
#> 
#> z test of coefficients:
#> 
#>                    Estimate  Std. Error z value  Pr(>|z|)    
#> (Intercept)      0.27007357  0.50807817  0.5316  0.595031    
#> nwincome        -0.01202364  0.00493917 -2.4343  0.014919 *  
#> education        0.13090397  0.02539873  5.1540 2.550e-07 ***
#> experience       0.12334717  0.01875869  6.5755 4.850e-11 ***
#> I(experience^2) -0.00188707  0.00059993 -3.1455  0.001658 ** 
#> age             -0.05285244  0.00846236 -6.2456 4.222e-10 ***
#> youngkids       -0.86832468  0.11837727 -7.3352 2.213e-13 ***
#> oldkids          0.03600561  0.04403026  0.8177  0.413502    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
coeftest(mc_tobit)
#> 
#> z test of coefficients:
#> 
#>                    Estimate  Std. Error  z value  Pr(>|z|)    
#> (Intercept)      965.305283  446.436144   2.1622 0.0305991 *  
#> nwincome          -8.814243    4.459100  -1.9767 0.0480771 *  
#> education         80.645606   21.583237   3.7365 0.0001866 ***
#> experience       131.564299   17.279392   7.6139 2.659e-14 ***
#> I(experience^2)   -1.864158    0.537662  -3.4672 0.0005260 ***
#> age              -54.405011    7.418502  -7.3337 2.239e-13 ***
#> youngkids       -894.021739  111.878035  -7.9910 1.338e-15 ***
#> oldkids          -16.217996   38.641391  -0.4197 0.6747008    
#> Log(scale)         7.022887    0.037057 189.5142 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
coeftest(mc_tobit, vcov = vcovOPG)
#> 
#> z test of coefficients:
#> 
#>                    Estimate  Std. Error  z value  Pr(>|z|)    
#> (Intercept)      965.305283  449.286601   2.1485 0.0316718 *  
#> nwincome          -8.814243    4.416137  -1.9959 0.0459429 *  
#> education         80.645606   21.683531   3.7192 0.0001998 ***
#> experience       131.564299   16.283950   8.0794 6.509e-16 ***
#> I(experience^2)   -1.864158    0.506061  -3.6837 0.0002299 ***
#> age              -54.405011    7.809651  -6.9664 3.252e-12 ***
#> youngkids       -894.021739  112.257814  -7.9640 1.666e-15 ***
#> oldkids          -16.217996   38.742552  -0.4186 0.6755016    
#> Log(scale)         7.022887    0.037274 188.4131 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
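
The last two tables differ only in the covariance estimator, so it can help to put the two sets of standard errors side by side. This sketch assumes the AER package is installed (attaching it also attaches sandwich, which provides vcovOPG) and refits the tobit model so that it runs on its own. The names se_default and se_opg are illustrative.

```r
library("AER")  # provides tobit(); attaches sandwich (vcovOPG)
data("PSID1976", package = "AER")
PSID1976$nwincome <- with(PSID1976, (fincome - hours * wage) / 1000)

## Same specification as mc_tobit above
mc_tobit <- tobit(hours ~ nwincome + education + experience + I(experience^2) +
  age + youngkids + oldkids, data = PSID1976)

## Standard errors from the default covariance and from the OPG estimator
se_default <- sqrt(diag(vcov(mc_tobit)))
se_opg <- sqrt(diag(vcovOPG(mc_tobit)))

## Side-by-side comparison, with the ratio OPG / default
round(cbind(default = se_default, opg = se_opg, ratio = se_opg / se_default), 4)
```

Ratios close to 1 suggest that the choice of covariance estimator matters little for the corresponding coefficient.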