Income distribution (percentages) in the Northeast US
Income distribution (percentages) in the Northeast US in 1960 and 1970, taken from McCullagh (1980).
Format
year: year (1960 or 1970), stored as a factor.
pct: percentage of the population in each income class, per year.
income: income group, stored as an ordered factor. The unit is thousands of constant (1973) US dollars.
References
McCullagh, P. (1980) Regression Models for Ordinal Data. Journal of the Royal Statistical Society, Series B (Methodological), Vol. 42, No. 2, pp. 109-142.
Examples
## clm() and the income data are assumed to come from the ordinal package:
library(ordinal)
print(income)
#> year pct income
#> 1 1960 6.5 0-3
#> 2 1960 8.2 3-5
#> 3 1960 11.3 5-7
#> 4 1960 23.5 7-10
#> 5 1960 15.6 10-12
#> 6 1960 12.7 12-15
#> 7 1960 22.2 15+
#> 8 1970 4.3 0-3
#> 9 1970 6.0 3-5
#> 10 1970 7.7 5-7
#> 11 1970 13.2 7-10
#> 12 1970 10.5 10-12
#> 13 1970 16.3 12-15
#> 14 1970 42.1 15+
## Convenient table:
(tab <- xtabs(pct ~ year + income, income))
#>       income
#> year    0-3  3-5  5-7 7-10 10-12 12-15  15+
#>   1960  6.5  8.2 11.3 23.5  15.6  12.7 22.2
#>   1970  4.3  6.0  7.7 13.2  10.5  16.3 42.1
## small rounding error in 1970:
rowSums(tab)
#> 1960 1970
#> 100.0 100.1
## compare link functions via the log-likelihood:
links <- c("logit", "probit", "cloglog", "loglog", "cauchit")
sapply(links, function(link) {
  clm(income ~ year, data = income, weights = pct, link = link)$logLik
})
#>     logit    probit   cloglog    loglog   cauchit
#> -353.3589 -353.8036 -352.8980 -355.6028 -352.8434
## a heavy-tailed (Cauchy) or left-skewed (cloglog) latent distribution
## fits best.
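The ranking behind that remark can be made explicit. The sketch below is only an illustration: it hard-codes the log-likelihoods printed above rather than refitting the models, and the names `ll` and `gap` are introduced here for convenience. Each fit appears to have the same number of parameters (six thresholds plus one year effect), so under that assumption ranking by log-likelihood and ranking by AIC agree.

```r
## Log-likelihoods copied from the output above (not refitted here).
ll <- c(logit = -353.3589, probit = -353.8036, cloglog = -352.8980,
        loglog = -355.6028, cauchit = -352.8434)

## Distance of each link from the best-fitting one, smallest first.
gap <- sort(max(ll) - ll)
print(round(gap, 4))

## Name of the link with the highest log-likelihood.
best <- names(which.max(ll))
print(best)
```

Because the weights are percentages rather than counts, the size of these gaps probably should not be read as evidence for a formal test; only the ordering is used here.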
## The data are defined as:
income.levels <- c(0, 3, 5, 7, 10, 12, 15)
income <- paste(income.levels, c(rep("-", 6), "+"),
                c(income.levels[-1], ""), sep = "")
income <-
  data.frame(year = factor(rep(c("1960", "1970"), each = 7)),
             pct = c(6.5, 8.2, 11.3, 23.5, 15.6, 12.7, 22.2,
                     4.3, 6, 7.7, 13.2, 10.5, 16.3, 42.1),
             income = factor(rep(income, 2), ordered = TRUE,
                             levels = income))
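
Cumulative link models work on cumulative category probabilities, so it can help to look at the cumulative percentages per year. The following is a minimal base-R sketch, not part of the original examples: it rebuilds the data as defined above, and the names `groups` and `cum` are hypothetical helpers introduced for illustration.

```r
## Rebuild the data in base R, following the definition above.
income.levels <- c(0, 3, 5, 7, 10, 12, 15)
groups <- paste(income.levels, c(rep("-", 6), "+"),
                c(income.levels[-1], ""), sep = "")
income <- data.frame(
  year = factor(rep(c("1960", "1970"), each = 7)),
  pct = c(6.5, 8.2, 11.3, 23.5, 15.6, 12.7, 22.2,
          4.3, 6, 7.7, 13.2, 10.5, 16.3, 42.1),
  income = factor(rep(groups, 2), ordered = TRUE, levels = groups))

## Year-by-income table of percentages.
tab <- xtabs(pct ~ year + income, data = income)

## Cumulative percentage up to and including each income group, per year.
cum <- t(apply(tab, 1, cumsum))
print(round(cum, 1))
```

In every group below the top one, the 1970 cumulative percentage is lower than the 1960 one, which reflects the shift towards higher incomes that the `year` effect in the model is meant to capture.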