
These objects represent survey designs with arbitrarily many nested phases of sampling, allowing estimation and calibration (including raking) at each phase.
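To make the nested structure concrete, the base R sketch below rebuilds by hand the two phases of the case-cohort design used in the Examples section. Phase 1 is all children in `nwtco`, and phase 2 is the subcohort plus every relapse, stratified on `rel`. Each phase-2 unit gets weight N_h / n_h for its stratum. This is an illustrative recomputation, not the package's internal code. If the reasoning is right, the weighted mean of `edrel` should agree with the `svymean(~edrel, ...)` estimate printed in the Examples.

```r
# By-hand sketch of the phase-2 weights in the Examples' case-cohort design.
# Uses the nwtco data shipped with the 'survival' package.
data("nwtco", package = "survival")

# Phase-2 indicator: the same expression as the example's subset formula
in2 <- with(nwtco, in.subcohort | rel)

# Stratum counts at phase 1 (N_h) and phase 2 (n_h), strata defined by rel
N_h <- table(nwtco$rel)
n_h <- table(nwtco$rel[in2])

# Each sampled unit is weighted by N_h / n_h for its stratum
ph2 <- nwtco[in2, ]
w <- as.numeric(N_h[as.character(ph2$rel)] / n_h[as.character(ph2$rel)])

# Weighted (ratio) estimate of the mean of edrel
est <- sum(w * ph2$edrel) / sum(w)
c(phase1 = nrow(nwtco), phase2 = sum(in2), mean_edrel = est)
```

Every relapse is sampled, so the `rel == 1` stratum has weight 1 and only the non-relapse stratum is weighted up. The weights sum to the phase-1 size.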

Usage

multiphase(ids, subset, strata, probs, data, fpc = NULL,
check.variable.phase=TRUE)

Arguments

ids

List of as many model formulas as phases, describing the sampling units (ids) for each phase. Each formula may indicate multistage sampling.

subset

List of model formulas, one for each phase except the first. Each formula evaluates to a logical vector indicating which observations from the previous phase are included in that phase.

strata

List of as many model formulas as phases, describing the strata for each phase. Each formula may indicate multistage sampling; use NULL for a phase with no strata.

probs

List of as many model formulas or pps_spec objects as phases, describing the sampling probabilities for each phase. Each formula may indicate multistage sampling. Typically, each element other than the one for phase 1 will be either NULL, when strata are specified, or a matrix of class pps_spec specifying pairwise sampling probabilities or covariances. Use ~1 at phase 1 to specify iid sampling from a generating model.

data

Data frame containing the data for all phases.

fpc

Finite population correction for the first phase, if needed.

check.variable.phase

If TRUE, work out the phase at which each variable is observed by looking at its pattern of missing values. You may want FALSE for simulations in which the values for unsampled units are not actually set to missing.
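The `check.variable.phase` option infers, from missing values, the phase at which each variable is measured. The base R sketch below shows one plausible rule, assuming each unit is tagged with the last phase it reached: a variable belongs to the earliest phase k at which it is observed for every unit that reached phase k. `guess_phase()`, `reached`, and the toy data are hypothetical and are not the package's internal code. Under this rule a variable with no missing values is assigned to phase 1. That is presumably the situation behind the "all variables measured at all phases" warning in the Examples, where `nwtco` has no missing values.

```r
# Hypothetical helper: guess the phase of each variable from missingness.
# phase_of_unit[i] is the last phase in which unit i was sampled (1, 2, ...).
guess_phase <- function(data, phase_of_unit) {
  sapply(data, function(x) {
    observed <- !is.na(x)
    # Earliest phase k whose units all have this variable observed
    for (k in sort(unique(phase_of_unit))) {
      if (all(observed[phase_of_unit >= k])) return(k)
    }
    NA_real_
  })
}

# Toy data: x measured on everyone, y from phase 2 on, z only at phase 3
d <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(NA, NA, 3.1, 4.2, 5.3, 6.4),
  z = c(NA, NA, NA, NA, 7, 8)
)
reached <- c(1, 1, 2, 2, 3, 3)

guess_phase(d, reached)   # named vector: x = 1, y = 2, z = 3
```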

Details

Variance calculation uses a decomposition with a sampling contribution from each phase; these contributions are returned as the phases attribute of the variance-covariance matrix. The computations broadly follow the description of two-phase sampling in Chapter 9 of Särndal et al. (1991); there is more detail in the vignette.
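For intuition, take the simplest case of estimating a mean. Phase 1 is an iid sample of n1 from a model with variance sigma2. Phase 2 is a simple random sample without replacement of n2 units drawn from phase 1. The phase-1 contribution is sigma2/n1. The expected phase-2 contribution is (1 - n2/n1) sigma2/n2. The two add up to sigma2/n2, the variance of a mean of n2 iid draws. The arithmetic below checks this. The value of `sigma2` is hypothetical, the sample sizes are borrowed from the Examples, and the `phases` attribute here only mimics the printed output format. It is not a call into the package.

```r
# Two-phase variance decomposition for a sample mean (illustrative arithmetic)
n1 <- 4028    # phase-1 sample size
n2 <- 1154    # phase-2 sample size
sigma2 <- 1   # hypothetical model variance

v_phase1 <- sigma2 / n1                   # sampling phase 1 from the model
v_phase2 <- (1 - n2 / n1) * sigma2 / n2   # expected cost of subsampling phase 2
v_total  <- structure(v_phase1 + v_phase2,
                      phases = list(v_phase1, v_phase2))

# The per-phase contributions add up to sigma2 / n2
all.equal(as.numeric(v_total), sigma2 / n2)
```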

Value

An object of class multiphase.

References

Särndal, C.-E., Swensson, B., and Wretman, J. (1991). Model Assisted Survey Sampling. Springer. Chapter 9.

Note

Methods are currently available for svytotal, svymean, svyglm, and svyvar.

See also

twophase for older implementations of two-phase sampling

vignette("multiphase") for computational details

Examples

data(nwtco)
dcchs<-twophase(id = list(~seqno, ~seqno), strata = list(NULL, ~rel), 
    subset = ~I(in.subcohort | rel), data = nwtco)
mcchs<-multiphase(id = list(~seqno, ~seqno), strata = list(NULL, ~rel), 
    subset = list(~I(in.subcohort | rel)), probs = list(~1, NULL), 
    data = nwtco)
#> Warning: all variables measured at all phases
dcchs
#> Two-phase sparse-matrix design:
#>  twophase(id = list(~seqno, ~seqno), strata = list(NULL, ~rel), 
#>     subset = ~I(in.subcohort | rel), data = nwtco)
#> Phase 1:
#> Independent Sampling design (with replacement)
#> svydesign(ids = ~seqno)
#> Phase 2:
#> Stratified Independent Sampling design
#> svydesign(ids = ~seqno, strata = ~rel, fpc = `*phase1*`)
mcchs
#> Multiphase (2-phase) sampling design
#> Sampled: 4028 1154
#> Call: multiphase(id = list(~seqno, ~seqno), strata = list(NULL, ~rel), 
#>     subset = list(~I(in.subcohort | rel)), probs = list(~1, NULL), 
#>     data = nwtco)
svymean(~edrel, dcchs)
#>         mean     SE
#> edrel 2360.8 57.867
svymean(~edrel, mcchs)
#>         mean     SE
#> edrel 2360.8 57.867

summary(svyglm(edrel~rel+histol+stage, design=dcchs))
#> 
#> Call:
#> svyglm(formula = edrel ~ rel + histol + stage, design = dcchs)
#> 
#> Survey design:
#> twophase(id = list(~seqno, ~seqno), strata = list(NULL, ~rel), 
#>     subset = ~I(in.subcohort | rel), data = nwtco)
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  3041.53     196.67  15.465   <2e-16 ***
#> rel         -2211.51      78.18 -28.288   <2e-16 ***
#> histol       -206.35     146.98  -1.404    0.161    
#> stage         -65.30      56.03  -1.165    0.244    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for gaussian family taken to be 2172020)
#> 
#> Number of Fisher Scoring iterations: 2
#> 
summary(svyglm(edrel~rel+histol+stage, design=mcchs))
#> 
#> Call:
#> svyglm(formula = edrel ~ rel + histol + stage, design = mcchs)
#> 
#> Survey design:
#> multiphase(id = list(~seqno, ~seqno), strata = list(NULL, ~rel), 
#>     subset = list(~I(in.subcohort | rel)), probs = list(~1, NULL), 
#>     data = nwtco)
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  3041.53     196.67  15.465   <2e-16 ***
#> rel         -2211.51      78.18 -28.288   <2e-16 ***
#> histol       -206.35     146.98  -1.404    0.161    
#> stage         -65.30      56.03  -1.165    0.244    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for gaussian family taken to be 2172020)
#> 
#> Number of Fisher Scoring iterations: 2
#> 

m<-calibrate(mcchs,~factor(stage)+rel, phase=2, calfun="raking")
vcov(svytotal(~factor(stage), m))
#>                factor(stage)1 factor(stage)2 factor(stage)3 factor(stage)4
#> factor(stage)1   1.569770e+03   6.855740e-29   1.678685e-28  -8.642565e-29
#> factor(stage)2   6.855740e-29   1.025803e+03   6.747591e-30  -1.007366e-29
#> factor(stage)3   1.678685e-28   6.747591e-30   9.932950e+02  -1.786171e-29
#> factor(stage)4  -8.642565e-29  -1.007366e-29  -1.786171e-29   4.391321e+02
#> attr(,"phases")
#> attr(,"phases")[[1]]
#>                factor(stage)1 factor(stage)2 factor(stage)3 factor(stage)4
#> factor(stage)1        1569.77          0.000          0.000         0.0000
#> factor(stage)2           0.00       1025.803          0.000         0.0000
#> factor(stage)3           0.00          0.000        993.295         0.0000
#> factor(stage)4           0.00          0.000          0.000       439.1321
#> 
#> attr(,"phases")[[2]]
#>                factor(stage)1 factor(stage)2 factor(stage)3 factor(stage)4
#> factor(stage)1   1.512119e-27   6.855740e-29   1.678685e-28  -8.642565e-29
#> factor(stage)2   6.855740e-29   1.688577e-29   6.747591e-30  -1.007366e-29
#> factor(stage)3   1.678685e-28   6.747591e-30   1.975176e-29  -1.786171e-29
#> factor(stage)4  -8.642565e-29  -1.007366e-29  -1.786171e-29   1.009380e-28
#>
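In the calibrated example above, the phase-2 block of `attr(,"phases")` is numerically zero (around 1e-28). Raking adjusts the phase-2 weights so that the phase-2 estimated totals of the calibration variables reproduce their phase-1 totals exactly. Those totals then carry no phase-2 sampling error, which is consistent with that output. The base R sketch below illustrates raking by iterative proportional fitting. `rake_weights()`, the simulated data, and the target totals are all hypothetical, and this is not the code behind `calibrate()`.

```r
# Minimal raking (iterative proportional fitting) sketch in base R.
# rake_weights() is hypothetical; it is not survey::calibrate().
rake_weights <- function(w, factors, targets, iter = 100, tol = 1e-10) {
  for (it in seq_len(iter)) {
    w_old <- w
    for (j in seq_along(factors)) {
      f <- factors[[j]]
      current <- tapply(w, f, sum)                 # weighted count per level
      ratio <- targets[[j]][names(current)] / as.numeric(current)
      w <- w * ratio[as.character(f)]              # rescale each unit's weight
    }
    if (max(abs(w - w_old)) < tol) break           # margins have stabilised
  }
  unname(w)
}

# Hypothetical phase-2 sample with equal starting weights
set.seed(1)
n <- 200
stage <- factor(sample(1:4, n, replace = TRUE))
rel <- factor(sample(0:1, n, replace = TRUE, prob = c(0.7, 0.3)))
w0 <- rep(4028 / n, n)

# Hypothetical phase-1 totals; both margins sum to 4028
targets <- list(
  c("1" = 1500, "2" = 1200, "3" = 900, "4" = 428),
  c("0" = 3400, "1" = 628)
)

w <- rake_weights(w0, list(stage, rel), targets)
tapply(w, stage, sum)   # reproduces the stage targets
tapply(w, rel, sum)     # reproduces the rel targets
```

Raking on margins rather than on the full cross-classification keeps the adjustment stable when some stage-by-rel cells are small.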