Survey statistics on subsets
svyby.RdCompute survey statistics on subsets of a survey defined by factors.
Usage
svyby(formula, by ,design,...)
# Default S3 method
svyby(formula, by, design, FUN, ..., deff=FALSE,keep.var = TRUE,
keep.names = TRUE,verbose=FALSE, vartype=c("se","ci","ci","cv","cvpct","var"),
drop.empty.groups=TRUE, covmat=FALSE, return.replicates=FALSE,
na.rm.by=FALSE, na.rm.all=FALSE, stringsAsFactors=TRUE,
multicore=getOption("survey.multicore"))
# S3 method for class 'survey.design2'
svyby(formula, by, design, FUN, ..., deff=FALSE,keep.var = TRUE,
keep.names = TRUE,verbose=FALSE, vartype=c("se","ci","ci","cv","cvpct","var"),
drop.empty.groups=TRUE, covmat=FALSE, influence=covmat,
na.rm.by=FALSE, na.rm.all=FALSE, stringsAsFactors=TRUE,
multicore=getOption("survey.multicore"))
# S3 method for class 'svyby'
SE(object,...)
# S3 method for class 'svyby'
deff(object,...)
# S3 method for class 'svyby'
coef(object,...)
# S3 method for class 'svyby'
confint(object, parm, level = 0.95,df =Inf,...)
unwtd.count(x, design, ...)
svybys(formula, bys, design, FUN, ...)Arguments
- formula,x
A formula specifying the variables to pass to
FUN(or a matrix, data frame, or vector)- by
A formula specifying factors that define subsets, or a list of factors.
- design
A
svydesignorsvrepdesignobject- FUN
A function taking a formula and survey design object as its first two arguments and returning an object with suitable
coefandSEorvcovorconfintmethods- ...
Other arguments to
FUN. NOTE: if any of the names of these are partial matches toformula,by, ordesign, you must specify theformula,by, ordesignargument by name, not just by position.- deff
Request a design effect from
FUN- keep.var
If
FUNreturns asvystatobject, extract standard errors from it- keep.names
Define row names based on the subsets
- verbose
If
TRUE, print a label for each subset as it is processed.- vartype
Report variability as one or more of standard error, confidence interval, coefficient of variation, percent coefficient of variation, or variance
- drop.empty.groups
If
FALSE, reportNAfor empty groups, ifTRUEdrop them from the output- na.rm.by
If true, omit groups defined by
NAvalues of thebyvariables
.
- na.rm.all
If true, check for groups with no non-missing observations for variables defined by
formulaand treat these groups as empty. Doesn't make much sense withoutna.rm=TRUE- covmat
If
TRUE, compute covariances between estimates for different subsets. Allowssvycontrastto be used on output. Requires thatFUNsupports eitherreturn.replicates=TRUEorinfluence=TRUE- return.replicates
Only for replicate-weight designs. If
TRUE, return all the replicates as the "replicates" attribute of the result- influence
Return the influence functions of the result
- multicore
Use
multicorepackage to distribute subsets over multiple processors?- stringsAsFactors
Convert any string variables in
formulato factors before callingFUN, so that the factor levels will be the same in all groups (See Note below). Potentially slow.- parm
a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered.
- level
the confidence level required.
- df
degrees of freedom for t-distribution in confidence interval, use
degf(design)for number of PSUs minus number of strata- object
An object of class
"svyby"- bys
one-sided formula with each term specifying a grouping (rather than being combined to give a grouping
Value
An object of class "svyby": a data frame showing the factors and the results of FUN.
For unwtd.count, the unweighted number of non-missing observations in the data matrix specified by x for the design.
Details
The variance type "ci" asks for confidence intervals, which are produced
by confint. In some cases additional options to FUN will
be needed to produce confidence intervals, for example,
svyquantile needs ci=TRUE or keep.var=FALSE.
The results are extracted by calling coef, SE, vcov, and
confint on the returned objects, so these need to be
defined. The intent is for FUN to return a svystat or
svrepstat object, but that isn't required.
unwtd.count is designed to be passed to svyby to report
the number of non-missing observations in each subset. Observations
with exactly zero weight will also be counted as missing, since that's
how subsets are implemented for some designs.
Parallel processing with multicore=TRUE is useful only for
fairly large problems and on computers with sufficient
memory. Multicore processing is incompatible with some GUIs.
The variant svybys creates a separate table for each term in
bys rather than creating a joint table.
Note
The function works by making a lot of calls of the form
FUN(formula, subset(design, by==i)), where formula is
re-evaluated in each subset, so it is unwise to use data-dependent
terms in formula. In particular, svyby(~factor(a), ~b,
design=d, svymean), will create factor variables whose levels are
only those values of a present in each subset. If a
is a character variable then svyby(~a, ~b,
design=d, svymean) creates factor variables implicitly and so has
the same problem. Either use
update.survey.design to add variables to the design
object instead or specify the levels explicitly in the call to
factor. The stringsAsFactors=TRUE option converts
all character variables to factors, which can be slow, set it to
FALSE if you have predefined factors where necessary.
Note
Asking for a design effect (deff=TRUE) from a function
that does not produce one will cause an error or incorrect formatting
of the output. The same will occur with keep.var=TRUE if the
function does not compute a standard error.
See also
svytable and ftable.svystat for
contingency tables, ftable.svyby for pretty-printing of svyby
Examples
data(api)
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
svyby(~api99, ~stype, dclus1, svymean)
#> stype api99 se
#> E E 607.7917 22.81660
#> H H 595.7143 41.76400
#> M M 608.6000 32.56064
svyby(~api99, ~stype, dclus1, svyquantile, quantiles=0.5,ci=TRUE,vartype="ci")
#> stype api99 ci_l ci_u
#> E E 615 520 684
#> H H 593 414 715
#> M M 615 575 700
## without ci=TRUE svyquantile does not compute standard errors
svyby(~api99, ~stype, dclus1, svyquantile, quantiles=0.5, keep.var=FALSE)
#> stype 1 2 3 4
#> E E 615 520 684 38.23224
#> H H 593 414 715 63.64648
#> M M 615 575 700 28.39638
svyby(~api99, list(school.type=apiclus1$stype), dclus1, svymean)
#> school.type api99 se
#> E E 607.7917 22.81660
#> H H 595.7143 41.76400
#> M M 608.6000 32.56064
svyby(~api99+api00, ~stype, dclus1, svymean, deff=TRUE,vartype="ci")
#> stype api99 api00 ci_l.api99 ci_l.api00 ci_u.api99 ci_u.api00
#> E E 607.7917 648.8681 563.0720 605.0385 652.5114 692.6976
#> H H 595.7143 618.5714 513.8583 544.0531 677.5702 693.0897
#> M M 608.6000 631.4400 544.7823 569.4866 672.4177 693.3934
#> DEff.api99 DEff.api00
#> E 5.895734 6.583674
#> H 2.211866 2.228259
#> M 2.226990 2.163900
svyby(~api99+api00, ~stype+sch.wide, dclus1, svymean, keep.var=FALSE)
#> stype sch.wide statistic.api99 statistic.api00
#> E.No E No 601.6667 596.3333
#> H.No H No 662.0000 659.3333
#> M.No M No 611.3750 606.3750
#> E.Yes E Yes 608.3485 653.6439
#> H.Yes H Yes 577.6364 607.4545
#> M.Yes M Yes 607.2941 643.2353
## report raw number of observations
svyby(~api99+api00, ~stype+sch.wide, dclus1, unwtd.count, keep.var=FALSE)
#> stype sch.wide statistic
#> E.No E No 12
#> H.No H No 3
#> M.No M No 8
#> E.Yes E Yes 132
#> H.Yes H Yes 11
#> M.Yes M Yes 17
rclus1<-as.svrepdesign(dclus1)
svyby(~api99, ~stype, rclus1, svymean)
#> stype api99 se
#> E E 607.7917 25.83542
#> H H 595.7143 50.75106
#> M M 608.6000 34.82521
svyby(~api99, ~stype, rclus1, svyquantile, quantiles=0.5)
#> stype api99 se.api99
#> E E 615 41.96221
#> H H 593 NaN
#> M M 615 45.88854
svyby(~api99, list(school.type=apiclus1$stype), rclus1, svymean, vartype="cv")
#> school.type api99 cv.api99
#> E E 607.7917 0.04250704
#> H H 595.7143 0.08519362
#> M M 608.6000 0.05722184
svyby(~enroll,~stype, rclus1,svytotal, deff=TRUE)
#> stype enroll se DEff.enroll
#> E E 2109717.1 631349.4 125.039075
#> H H 535594.9 226716.6 4.645816
#> M M 759628.1 213635.5 13.014932
svyby(~api99+api00, ~stype+sch.wide, rclus1, svymean, keep.var=FALSE)
#> stype sch.wide statistic.api99 statistic.api00
#> E.No E No 601.6667 596.3333
#> H.No H No 662.0000 659.3333
#> M.No M No 611.3750 606.3750
#> E.Yes E Yes 608.3485 653.6439
#> H.Yes H Yes 577.6364 607.4545
#> M.Yes M Yes 607.2941 643.2353
##report raw number of observations
svyby(~api99+api00, ~stype+sch.wide, rclus1, unwtd.count, keep.var=FALSE)
#> stype sch.wide statistic
#> E.No E No 12
#> H.No H No 3
#> M.No M No 8
#> E.Yes E Yes 132
#> H.Yes H Yes 11
#> M.Yes M Yes 17
## comparing subgroups using covmat=TRUE
mns<-svyby(~api99, ~stype, rclus1, svymean,covmat=TRUE)
vcov(mns)
#> E H M
#> E 667.4691 752.7184 823.3275
#> H 752.7184 2575.6700 920.7676
#> M 823.3275 920.7676 1212.7954
svycontrast(mns, c(E = 1, M = -1))
#> contrast SE
#> contrast -0.80833 15.284
str(svyby(~api99, ~stype, rclus1, svymean,return.replicates=TRUE))
#> Classes ‘svyby’ and 'data.frame': 3 obs. of 3 variables:
#> $ stype: Factor w/ 3 levels "E","H","M": 1 2 3
#> $ api99: num 608 596 609
#> $ se : num 25.8 50.8 34.8
#> - attr(*, "svyby")=List of 7
#> ..$ margins : int 1
#> ..$ nstats : num 1
#> ..$ vars : int 1
#> ..$ deffs : logi FALSE
#> ..$ statistic: chr "svymean"
#> ..$ variables: chr "api99"
#> ..$ vartype : chr "se"
#> - attr(*, "replicates")= num [1:15, 1:3] 607 611 609 606 609 ...
#> ..- attr(*, "scale")= num 0.915
#> ..- attr(*, "rscales")= num [1:15] 1 1 1 1 1 1 1 1 1 1 ...
#> ..- attr(*, "mse")= logi FALSE
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:3] "E" "H" "M"
#> - attr(*, "call")= language svyby.default(~api99, ~stype, rclus1, svymean, return.replicates = TRUE)
tots<-svyby(~enroll, ~stype, dclus1, svytotal,covmat=TRUE)
vcov(tots)
#> E H M
#> E 398602047550 77170909363 74761561157
#> H 77170909363 51400414315 34777311300
#> M 74761561157 34777311300 45640120138
svycontrast(tots, quote(E/H))
#> nlcon SE
#> contrast 3.939 1.4319
## comparing subgroups uses the delta method unless replicates are present
meanlogs<-svyby(~log(enroll),~stype,svymean, design=rclus1,covmat=TRUE)
svycontrast(meanlogs, quote(exp(E-H)))
#> nlcon SE
#> contrast 0.52377 0.262
meanlogs<-svyby(~log(enroll),~stype,svymean, design=rclus1,covmat=TRUE,return.replicates=TRUE)
svycontrast(meanlogs, quote(exp(E-H)))
#> nlcon SE
#> contrast 0.52377 0.2423
## extractor functions
(a<-svyby(~enroll, ~stype, rclus1, svytotal, deff=TRUE, verbose=TRUE,
vartype=c("se","cv","cvpct","var")))
#> [1] "E"
#> [1] "H"
#> [1] "M"
#> stype enroll se cv.enroll cv%.enroll var DEff.enroll
#> E E 2109717.1 631349.4 0.2992578 29.92578 398602047550 125.039075
#> H H 535594.9 226716.6 0.4232987 42.32987 51400414315 4.645816
#> M M 759628.1 213635.5 0.2812369 28.12369 45640120138 13.014932
deff(a)
#> [1] 125.039075 4.645816 13.014932
SE(a)
#> [1] 631349.4 226716.6 213635.5
cv(a)
#> E H M
#> 0.2992578 0.4232987 0.2812369
coef(a)
#> E H M
#> 2109717.1 535594.9 759628.1
confint(a, df=degf(rclus1))
#> 2.5 % 97.5 %
#> E 755607.37 3463827
#> H 49336.14 1021854
#> M 301425.60 1217831
## ratio estimates
svyby(~api.stu, by=~stype, denominator=~enroll, design=dclus1, svyratio)
#> stype api.stu/enroll se.api.stu/enroll
#> E E 0.8532672 0.01253361
#> H H 0.8300683 0.01472607
#> M M 0.8536738 0.01114203
ratios<-svyby(~api.stu, by=~stype, denominator=~enroll, design=dclus1, svyratio,covmat=TRUE)
vcov(ratios)
#> E H M
#> E 0.0001570913 -1.178219e-04 1.186882e-04
#> H -0.0001178219 2.168572e-04 -9.293321e-05
#> M 0.0001186882 -9.293321e-05 1.241448e-04
## empty groups
svyby(~api00,~comp.imp+sch.wide,design=dclus1,svymean)
#> comp.imp sch.wide api00 se
#> No.No No No 608.0435 28.98769
#> No.Yes No Yes 654.0741 32.66871
#> Yes.Yes Yes Yes 648.4060 22.47502
svyby(~api00,~comp.imp+sch.wide,design=dclus1,svymean,drop.empty.groups=FALSE)
#> comp.imp sch.wide api00 se
#> No.No No No 608.0435 28.98769
#> Yes.No Yes No NA NA
#> No.Yes No Yes 654.0741 32.66871
#> Yes.Yes Yes Yes 648.4060 22.47502
## Multiple tables
svybys(~api00,~comp.imp+sch.wide,design=dclus1,svymean)
#> [[1]]
#> comp.imp api00 se
#> No No 632.900 29.41719
#> Yes Yes 648.406 22.47502
#>
#> [[2]]
#> sch.wide api00 se
#> No No 608.0435 28.98769
#> Yes Yes 649.3625 23.42657
#>