Direct standardization within domains
In health surveys it is often of interest to standardize domains to have the same distribution of, e.g., age as in a target population. The operation is similar to post-stratification, except that the totals for the domains are fixed at their current estimates rather than at known population values. This function matches the estimates produced by the (US) National Center for Health Statistics.
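The arithmetic behind direct standardization can be shown without the survey package. A standardized prevalence is the average of a domain's age-specific prevalences, weighted by the age distribution of the standard population rather than by the domain's own age mix. In this sketch the standard population is the vector of totals used in the Examples below. The age-specific prevalences and the domains' own age mixes are made up for illustration, and the domain names are hypothetical.

```r
# Direct standardization, sketched in base R.
# Standard population: the totals used in the Examples (four age groups).
std_pop   <- c(55901, 77670, 72816, 45364)
std_share <- std_pop / sum(std_pop)          # totals -> proportions

# Hypothetical age-specific prevalences in two hypothetical domains
rates <- rbind(domainA = c(0.02, 0.08, 0.15, 0.20),
               domainB = c(0.03, 0.10, 0.12, 0.18))

# Hypothetical age mix of each domain (domainB is older)
own_share <- rbind(domainA = c(0.40, 0.30, 0.20, 0.10),
                   domainB = c(0.10, 0.20, 0.30, 0.40))

# Crude prevalence: each domain weighted by its own age mix
crude <- rowSums(rates * own_share)

# Standardized prevalence: every domain weighted by the same standard age mix
standardized <- as.vector(rates %*% std_share)
names(standardized) <- rownames(rates)

print(rbind(crude = crude, standardized = standardized))
```

With these made-up numbers, domainB has the higher crude prevalence only because its age mix is older; once both domains are weighted by the same standard age distribution, domainA comes out slightly higher.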
Arguments
- design
survey design object
- by
A one-sided formula specifying the variables whose distribution will be standardised
- over
A one-sided formula specifying the domains within which the standardisation will occur, or ~1 to use the whole population.
- population
Desired population totals or proportions for the levels of combinations of variables in by.
- excluding.missing
Optionally, a one-sided formula specifying variables whose missing values should be dropped before calculating the domain totals.
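A minimal sketch, in base R, of what standardizing within domains does to the weights, following the description at the top of this page: each domain keeps its current estimated total, and within it the share of each by cell is set to the population share. The helper standardize_weights() and the toy data are hypothetical and not part of the survey package, and this is not claimed to be how svystandardize is implemented. Its arguments loosely mirror by, over, population and excluding.missing. It only rescales a weight vector (no design object, no standard errors) and assumes every by cell is non-empty in every domain after missing values are dropped.

```r
# Hypothetical helper, NOT part of the survey package: a simplified sketch
# of direct standardization as a reweighting step.
standardize_weights <- function(w, cell, domain, population, outcome = NULL) {
  # Like excluding.missing: drop rows with a missing outcome before totalling
  keep <- if (is.null(outcome)) rep(TRUE, length(w)) else !is.na(outcome)
  w <- w[keep]
  cell <- factor(cell)[keep]        # subsetting a factor keeps all its levels
  domain <- factor(domain)[keep]
  stopifnot(length(population) == nlevels(cell))

  # Totals or proportions: only the relative sizes matter
  share <- population / sum(population)
  names(share) <- levels(cell)

  domain_total <- tapply(w, domain, sum)             # kept at current estimate
  cell_total <- tapply(w, list(domain, cell), sum)   # domain x cell totals

  d <- as.character(domain)
  a <- as.character(cell)
  target <- as.vector(domain_total[d]) * share[a]    # desired cell total
  current <- cell_total[cbind(d, a)]                 # current cell total
  list(weights = as.vector(w * target / current), keep = keep)
}

# Made-up data: 4 age groups x 2 domains x 3 replicates, some missing outcomes
dat <- expand.grid(agecat = c("a1", "a2", "a3", "a4"),
                   dom = c("d1", "d2"), rep = 1:3)
dat$w <- seq_len(nrow(dat)) / 4
dat$y <- rep(c(0, 1, NA, 1, 0, 0), 4)

res <- standardize_weights(dat$w, dat$agecat, dat$dom,
                           population = c(55901, 77670, 72816, 45364),
                           outcome = dat$y)
kept <- dat[res$keep, ]

# Domain totals are unchanged; within each domain the age shares now
# equal the standard population shares
print(tapply(res$weights, kept$dom, sum))
print(prop.table(tapply(res$weights, list(kept$dom, kept$agecat), sum), 1))
```

Because each domain's total is preserved and only the split across age groups changes, a weighted mean within a domain becomes the population-share-weighted average of its age-specific means, which is the direct standardization described above.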
References
National Center for Health Statistics https://www.cdc.gov/nchs/tutorials/NHANES/NHANESAnalyses/agestandardization/age_standardization_intro.htm
Examples
## matches http://www.cdc.gov/nchs/data/databriefs/db92_fig1.png
library(survey)
data(nhanes)
popage <- c( 55901 , 77670 , 72816 , 45364 )
design<-svydesign(id=~SDMVPSU, strata=~SDMVSTRA, weights=~WTMEC2YR, data=nhanes, nest=TRUE)
stdes<-svystandardize(design, by=~agecat, over=~race+RIAGENDR,
population=popage, excluding.missing=~HI_CHOL)
svyby(~HI_CHOL, ~race+RIAGENDR, svymean, design=subset(stdes,
agecat!="(0,19]"))
#> race RIAGENDR HI_CHOL se
#> 1.1 1 1 0.1543786 0.008318204
#> 2.1 2 1 0.1142946 0.010182838
#> 3.1 3 1 0.1020776 0.013547678
#> 4.1 4 1 0.1358312 0.042274271
#> 1.2 1 2 0.1316436 0.013418637
#> 2.2 2 2 0.1543247 0.008932134
#> 3.2 3 2 0.1025411 0.018953586
#> 4.2 4 2 0.1197434 0.040091106
data(nhanes)
nhanes_design <- svydesign(ids = ~ SDMVPSU, strata = ~ SDMVSTRA,
weights = ~ WTMEC2YR, nest = TRUE, data = nhanes)
## These two specifications give the same estimates:
## a constant domain variable, or over = ~ 1
nhanes_adj <- svystandardize(update(nhanes_design, all_adults = "1"),
by = ~ agecat, over = ~ all_adults,
population = c(55901, 77670, 72816, 45364),
excluding.missing = ~ HI_CHOL)
svymean(~I(HI_CHOL == 1), nhanes_adj, na.rm = TRUE)
#> mean SE
#> I(HI_CHOL == 1)FALSE 0.89413 0.0053
#> I(HI_CHOL == 1)TRUE 0.10587 0.0053
nhanes_adj <- svystandardize(nhanes_design,
by = ~ agecat, over = ~ 1,
population = c(55901, 77670, 72816, 45364),
excluding.missing = ~ HI_CHOL)
svymean(~I(HI_CHOL == 1), nhanes_adj, na.rm = TRUE)
#> mean SE
#> I(HI_CHOL == 1)FALSE 0.89413 0.0053
#> I(HI_CHOL == 1)TRUE 0.10587 0.0053