Income distribution (percentages) in the Northeast US in 1960 and 1970, taken from McCullagh (1980).

Usage

income

Format

year

survey year, a factor with levels 1960 and 1970.

pct

percentage of the population in each income class, by year.

income

income group, an ordered factor with seven levels from 0-3 to 15+. The unit is thousands of constant (1973) US dollars.

Source

Data are taken from McCullagh (1980).

References

McCullagh, P. (1980) Regression Models for Ordinal Data. Journal of the Royal Statistical Society. Series B (Methodological), Vol. 42, No. 2, pp. 109-142.

Examples


print(income)
#>    year  pct income
#> 1  1960  6.5    0-3
#> 2  1960  8.2    3-5
#> 3  1960 11.3    5-7
#> 4  1960 23.5   7-10
#> 5  1960 15.6  10-12
#> 6  1960 12.7  12-15
#> 7  1960 22.2    15+
#> 8  1970  4.3    0-3
#> 9  1970  6.0    3-5
#> 10 1970  7.7    5-7
#> 11 1970 13.2   7-10
#> 12 1970 10.5  10-12
#> 13 1970 16.3  12-15
#> 14 1970 42.1    15+

## Convenient table:
(tab <- xtabs(pct ~ year + income, income))
#>       income
#> year    0-3  3-5  5-7 7-10 10-12 12-15  15+
#>   1960  6.5  8.2 11.3 23.5  15.6  12.7 22.2
#>   1970  4.3  6.0  7.7 13.2  10.5  16.3 42.1

## The 1970 percentages sum to 100.1 because of rounding:
rowSums(tab)
#>  1960  1970 
#> 100.0 100.1 

## Compare link functions via the log-likelihood
## (clm() is provided by the ordinal package):
library(ordinal)
links <- c("logit", "probit", "cloglog", "loglog", "cauchit")
sapply(links, function(link) {
  clm(income ~ year, data = income, weights = pct, link = link)$logLik
})
#>     logit    probit   cloglog    loglog   cauchit 
#> -353.3589 -353.8036 -352.8980 -355.6028 -352.8434 
## A heavy-tailed (Cauchy) or left-skewed (cloglog) latent distribution
## fits best.

## The data are defined as:
income.levels <- c(0, 3, 5, 7, 10, 12, 15)
## class labels "0-3", "3-5", ..., "12-15", "15+"
income.labels <- paste0(income.levels, c(rep("-", 6), "+"),
                        c(income.levels[-1], ""))
income <-
  data.frame(year = factor(rep(c("1960", "1970"), each = 7)),
             pct = c(6.5, 8.2, 11.3, 23.5, 15.6, 12.7, 22.2,
                     4.3, 6, 7.7, 13.2, 10.5, 16.3, 42.1),
             income = factor(rep(income.labels, 2), ordered = TRUE,
                             levels = income.labels))
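
The cumulative link models compared above describe cumulative probabilities of the form P(income <= class j). As a rough companion to that comparison, the following base-R sketch computes the empirical cumulative proportions and cumulative logits per year. It is not part of the original examples: the object names (`classes`, `prop`, `cum_prop`, `cum_logit`) are made up for illustration, and rescaling each row to sum to one is an assumption made here to absorb the 1970 rounding error.

```r
## Percentages by year and income class, copied from the table above.
classes <- c("0-3", "3-5", "5-7", "7-10", "10-12", "12-15", "15+")
tab <- rbind("1960" = c(6.5, 8.2, 11.3, 23.5, 15.6, 12.7, 22.2),
             "1970" = c(4.3, 6.0, 7.7, 13.2, 10.5, 16.3, 42.1))
colnames(tab) <- classes

## Assumption: rescale each row to sum to exactly 1
## (the 1970 row sums to 100.1 because of rounding).
prop <- prop.table(tab, margin = 1)

## Cumulative proportions P(income <= class j) within each year.
cum_prop <- t(apply(prop, 1, cumsum))

## Empirical cumulative logits. The last class is dropped because its
## cumulative proportion is 1, so the logit is not finite.
p <- cum_prop[, -ncol(cum_prop)]
cum_logit <- log(p / (1 - p))

## Difference between the years at each cut-point. With the logit link,
## a model of the form income ~ year implies one common shift across
## all cut-points, so large variation here hints at lack of fit.
cum_logit["1970", ] - cum_logit["1960", ]
```

All six differences are negative, reflecting the shift towards higher income classes in 1970. How far they vary across cut-points gives an informal impression of how well a single shift on the logit scale describes the data; the same check can be repeated on other scales by replacing the logit transform.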