Multiphase sampling designs
These objects represent designs with arbitrarily many nested phases of sampling, allowing estimation and calibration/raking at each phase.
Arguments
- ids
List of as many model formulas as phases describing ids for each phase. Each formula may indicate multistage sampling
- subset
List of model formulas, one for each phase except the first, each specifying a logical vector indicating which observations from the previous phase are included
- strata
List of as many model formulas as phases describing strata for each phase. Each formula may indicate multistage sampling, or NULL for no strata
- probs
List of as many model formulas or pps_spec objects as phases describing sampling probabilities for each phase. Each formula may indicate multistage sampling. Typically this will either be NULL except for phase 1 if strata are specified, or a matrix of class pps_spec specifying pairwise probabilities or covariances. Use ~1 at phase 1 to specify iid sampling from a generating model.
- data
data frame of data
- fpc
Finite population correction for the first phase, if needed
- check.variable.phase
Work out which phase each variable is observed in by looking at missing value patterns. You may want FALSE for simulations where the values aren't actually missing
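The idea behind check.variable.phase can be illustrated with a small base R sketch. This is not the package's implementation: the helper infer_variable_phase and the toy data are hypothetical, and the rule used here (assign each variable to the earliest phase in which it is observed for every sampled row) is an assumption made for illustration.

```r
# Hypothetical helper, not part of the survey package: guess the phase at
# which each variable is observed from its missing-value pattern.
infer_variable_phase <- function(data, in_phase) {
  # in_phase: list of logical vectors, one per phase, TRUE where the row
  # was sampled at that phase (phase 1 is normally all TRUE).
  vapply(data, function(x) {
    observed <- !is.na(x)
    for (k in seq_along(in_phase)) {
      # Earliest phase whose sampled rows are all non-missing
      if (all(observed[in_phase[[k]]])) return(k)
    }
    NA_integer_  # missing even in the final phase sample
  }, integer(1))
}

# Toy two-phase data: biomarker is measured only on the phase-2 subsample
toy <- data.frame(
  id        = 1:6,
  age       = c(50, 61, 47, 58, 66, 39),
  biomarker = c(1.2, NA, 0.7, NA, NA, 2.1)
)
in_phase <- list(
  rep(TRUE, 6),
  c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

phases <- infer_variable_phase(toy, in_phase)
print(phases)
```

Here id and age are assigned to phase 1 and biomarker to phase 2. In a simulation where every variable is fully observed, a rule like this would place all variables at phase 1, which is why you may prefer FALSE in that setting.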
Details
Variance calculation uses a decomposition with sampling contributions
at each phase, which are returned as the phases attribute of
a variance-covariance matrix. The computations broadly follow the
description for two-phase sampling in chapter 9 of Sarndal et al
(1991); there is more detail in the vignette.
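As a reminder of what such a decomposition looks like, the standard conditional-variance identity for an estimator under nested phases is sketched here. The notation is an assumption introduced for illustration: \hat{T} is the estimator, s_k is the phase-k sample, and subscripts on E and Var indicate the phase whose sampling they refer to. How the package estimates each term is described in the vignette and is not reproduced here.

```latex
% Two phases: phase-1 contribution plus phase-2 contribution
\[
\operatorname{Var}(\hat{T})
  = \operatorname{Var}_1\!\left( E_2[\hat{T} \mid s_1] \right)
  + E_1\!\left( \operatorname{Var}_2[\hat{T} \mid s_1] \right)
\]

% K nested phases: one term per phase, by iterating the same identity
\[
\operatorname{Var}(\hat{T})
  = \sum_{k=1}^{K}
    E\!\left[ \operatorname{Var}\!\left(
      E[\hat{T} \mid s_1,\dots,s_k] \,\middle|\, s_1,\dots,s_{k-1}
    \right) \right]
\]
```

For K = 2 the second expression reduces to the first, since conditioning on both samples leaves \hat{T} itself.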
See also
twophase for older implementations of two-phase sampling
vignette("multiphase") for computational details
Examples
data(nwtco)
dcchs<-twophase(id = list(~seqno, ~seqno), strata = list(NULL, ~rel),
subset = ~I(in.subcohort | rel), data = nwtco)
mcchs<-multiphase(id = list(~seqno, ~seqno), strata = list(NULL, ~rel),
subset = list(~I(in.subcohort | rel)), probs = list(~1, NULL),
data = nwtco)
#> Warning: all variables measured at all phases
dcchs
#> Two-phase sparse-matrix design:
#> twophase(id = list(~seqno, ~seqno), strata = list(NULL, ~rel),
#> subset = ~I(in.subcohort | rel), data = nwtco)
#> Phase 1:
#> Independent Sampling design (with replacement)
#> svydesign(ids = ~seqno)
#> Phase 2:
#> Stratified Independent Sampling design
#> svydesign(ids = ~seqno, strata = ~rel, fpc = `*phase1*`)
mcchs
#> Multiphase (2-phase) sampling design
#> Sampled: 4028 1154
#> Call: multiphase(id = list(~seqno, ~seqno), strata = list(NULL, ~rel),
#> subset = list(~I(in.subcohort | rel)), probs = list(~1, NULL),
#> data = nwtco)
svymean(~edrel, dcchs)
#> mean SE
#> edrel 2360.8 57.867
svymean(~edrel, mcchs)
#> mean SE
#> edrel 2360.8 57.867
summary(svyglm(edrel~rel+histol+stage, design=dcchs))
#>
#> Call:
#> svyglm(formula = edrel ~ rel + histol + stage, design = dcchs)
#>
#> Survey design:
#> twophase(id = list(~seqno, ~seqno), strata = list(NULL, ~rel),
#> subset = ~I(in.subcohort | rel), data = nwtco)
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 3041.53 196.67 15.465 <2e-16 ***
#> rel -2211.51 78.18 -28.288 <2e-16 ***
#> histol -206.35 146.98 -1.404 0.161
#> stage -65.30 56.03 -1.165 0.244
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for gaussian family taken to be 2172020)
#>
#> Number of Fisher Scoring iterations: 2
#>
summary(svyglm(edrel~rel+histol+stage, design=mcchs))
#>
#> Call:
#> svyglm(formula = edrel ~ rel + histol + stage, design = mcchs)
#>
#> Survey design:
#> multiphase(id = list(~seqno, ~seqno), strata = list(NULL, ~rel),
#> subset = list(~I(in.subcohort | rel)), probs = list(~1, NULL),
#> data = nwtco)
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 3041.53 196.67 15.465 <2e-16 ***
#> rel -2211.51 78.18 -28.288 <2e-16 ***
#> histol -206.35 146.98 -1.404 0.161
#> stage -65.30 56.03 -1.165 0.244
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for gaussian family taken to be 2172020)
#>
#> Number of Fisher Scoring iterations: 2
#>
m<-calibrate(mcchs,~factor(stage)+rel, phase=2, calfun="raking")
vcov(svytotal(~factor(stage), m))
#> factor(stage)1 factor(stage)2 factor(stage)3 factor(stage)4
#> factor(stage)1 1.569770e+03 6.855740e-29 1.678685e-28 -8.642565e-29
#> factor(stage)2 6.855740e-29 1.025803e+03 6.747591e-30 -1.007366e-29
#> factor(stage)3 1.678685e-28 6.747591e-30 9.932950e+02 -1.786171e-29
#> factor(stage)4 -8.642565e-29 -1.007366e-29 -1.786171e-29 4.391321e+02
#> attr(,"phases")
#> attr(,"phases")[[1]]
#> factor(stage)1 factor(stage)2 factor(stage)3 factor(stage)4
#> factor(stage)1 1569.77 0.000 0.000 0.0000
#> factor(stage)2 0.00 1025.803 0.000 0.0000
#> factor(stage)3 0.00 0.000 993.295 0.0000
#> factor(stage)4 0.00 0.000 0.000 439.1321
#>
#> attr(,"phases")[[2]]
#> factor(stage)1 factor(stage)2 factor(stage)3 factor(stage)4
#> factor(stage)1 1.512119e-27 6.855740e-29 1.678685e-28 -8.642565e-29
#> factor(stage)2 6.855740e-29 1.688577e-29 6.747591e-30 -1.007366e-29
#> factor(stage)3 1.678685e-28 6.747591e-30 1.975176e-29 -1.786171e-29
#> factor(stage)4 -8.642565e-29 -1.007366e-29 -1.786171e-29 1.009380e-28
#>
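In the output above, the overall matrix appears to equal the sum of the per-phase matrices stored in the phases attribute, and the phase-2 contribution is numerically zero, as one would expect for totals of a variable used in phase-2 raking. The base R sketch below shows one way to inspect such an object. It uses a made-up matrix as a stand-in for the result of vcov(), so the numbers are illustrative only, and the additive relationship is an assumption read off the printed output rather than a documented guarantee.

```r
# Toy stand-in for vcov(svytotal(...)) on a two-phase design:
# a covariance matrix carrying per-phase contributions as an attribute.
phase1 <- diag(c(4, 9))
phase2 <- matrix(c(1, 0.5, 0.5, 2), nrow = 2)
v <- phase1 + phase2
attr(v, "phases") <- list(phase1, phase2)

# Extract the per-phase contributions and add them back up
contributions <- attr(v, "phases")
total <- Reduce(`+`, contributions)

# Check that the contributions reproduce the overall matrix
sums_match <- isTRUE(all.equal(total, v, check.attributes = FALSE))
print(sums_match)

# Share of each variance (diagonal entry) that is due to phase-2 sampling
share <- diag(contributions[[2]]) / diag(total)
print(share)
```

With a real design object, replacing the toy v by the result of vcov(svytotal(~factor(stage), m)) would give the phase-2 share for each stage total.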