Statistics by Categories
For any number of cross-classification variables, bystats
returns a matrix with the sample size, number missing y, and
fun(non-missing y), with the cross-classifications designated
by rows. Uses Harrell's modification of the interaction
function to produce cross-classifications. The default fun is
mean, and if y is binary, the mean is labeled as
Fraction. There is a print method as well as a
latex method for objects created by bystats.
bystats2 handles the special case in which there are 2
classification variables, and places the first one in rows and the
second in columns. The print method for bystats2 uses
the print.char.matrix function to organize statistics
for cells into boxes.
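As a rough illustration of the row layout described above, the following base R sketch computes the sample size, number of missing y, and mean of the non-missing y for each cross-classification, then appends an overall row. It does not call bystats and does not reproduce its labels or Harrell's interaction variant. The data and the helper `cell_stats` are hypothetical, and treating N as the count of non-missing y is an assumption.

```r
# Hypothetical data: binary outcome with one missing value, two classifiers
death <- c(1, 0, NA, 1, 0, 0, 1, 1)
race  <- c("a", "a", "a", "b", "b", "b", "b", "a")
sex   <- c("f", "m", "f", "f", "m", "m", "f", "m")

# One factor level per observed combination of the classification variables
cell <- interaction(race, sex, drop = TRUE, sep = " ")

# Hypothetical helper: count of non-missing y, count of missing y,
# and the mean of the non-missing y (a fraction, since y is binary)
cell_stats <- function(y) {
  c(N = sum(!is.na(y)),
    Missing = sum(is.na(y)),
    Fraction = mean(y, na.rm = TRUE))
}

# One row per cross-classification, plus a final row for all observations
by_cell <- t(sapply(split(death, cell), cell_stats))
result  <- rbind(by_cell, ALL = cell_stats(death))
print(result)
```

With a real analysis the equivalent call would be along the lines of bystats(death, race, sex), which also handles labeling and printing.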
Usage
bystats(y, ..., fun, nmiss, subset)
# S3 method for class 'bystats'
print(x, ...)
# S3 method for class 'bystats'
latex(object, title, caption, rowlabel, ...)
bystats2(y, v, h, fun, nmiss, subset)
# S3 method for class 'bystats2'
print(x, abbreviate.dimnames=FALSE,
prefix.width=max(nchar(dimnames(x)[[1]])), ...)
# S3 method for class 'bystats2'
latex(object, title, caption, rowlabel, ...)
Arguments
- y
a binary, logical, or continuous variable, or a matrix or data frame of such variables. If y is a data frame it is converted to a matrix. If y is a data frame or matrix, computations are done on subsets of the rows of y, and you should specify fun so that it can operate on the matrix. For matrix y, any column with a missing value causes the entire row to be considered missing, and the row is not passed to fun.
- ...
For bystats, one or more classification variables separated by commas. For print.bystats, options passed to print.default such as digits. For latex.bystats and latex.bystats2, options passed to latex.default such as digits. If you pass cdec to latex.default, keep in mind that the first one or two positions (depending on nmiss) should have zeros since these correspond with frequency counts.
- v
vertical variable for bystats2. Will be converted to factor.
- h
horizontal variable for bystats2. Will be converted to factor.
- fun
a function to compute on the non-missing y for a given subset. You must specify fun= in front of the function name or definition. fun may return a single number or a vector or matrix of any length. Matrix results are rolled out into a vector, with names preserved. When y is a matrix, a common fun is function(y) apply(y, 2, ff) where ff is the name of a function which operates on one column of y.
- nmiss
A column containing a count of missing values is included if nmiss=TRUE or if there is at least one missing value.
- subset
a vector of subscripts or logical values indicating the subset of data to analyze
- abbreviate.dimnames
set to TRUE to abbreviate dimnames in output
- prefix.width
see print.char.matrix
- title
title to pass to latex.default. Default is the first word of the character string version of the first calling argument.
- caption
caption to pass to latex.default. Default is the heading attribute from the object produced by bystats.
- rowlabel
rowlabel to pass to latex.default. Default is the byvarnames attribute from the object produced by bystats. For bystats2 the default is "".
- x
an object created by bystats or bystats2
- object
an object created by bystats or bystats2
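The handling of a matrix y described in the argument list can be sketched in base R as follows. This is only an approximation of the documented behavior, not the package's code: rows with a missing value in any column are dropped, and fun receives the remaining rows of each group as a matrix. The data, the grouping variable `group`, and the choice of median for ff are hypothetical.

```r
# Hypothetical two-column y with missing values in different rows
y <- cbind(pressure    = c(120, NA, 135, 110, 142, 128),
           cholesterol = c(200, 180, NA, 190, 240, 210))
group <- c("young", "young", "old", "young", "old", "old")

# A row with a missing value in any column is treated as missing
complete <- stats::complete.cases(y)

# The fun pattern from the argument description: apply ff to each column
ff  <- median
fun <- function(y) apply(y, 2, ff)

# fun sees only the complete rows of each group, passed as a matrix
row_index <- split(seq_len(nrow(y))[complete], group[complete])
per_group <- lapply(row_index, function(i) fun(y[i, , drop = FALSE]))
print(per_group)
```

Using drop = FALSE keeps y a matrix even when a group has a single complete row, so that apply(y, 2, ff) still works.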
Value
For bystats, a matrix with row names equal to the classification labels and column
names N, Missing, funlab, where funlab is determined from fun.
A row is added to the end with the summary statistics computed
on all observations combined. The class of this matrix is bystats.
For bystats2, returns a 3-dimensional array with the last dimension
corresponding to statistics being computed. The class of the array is
bystats2.
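To make the shape of the 3-dimensional array for the two-variable case more concrete, here is a base R sketch that builds an array with the vertical variable in rows, the horizontal variable in columns, and the statistics along the last dimension. It is an assumption-laden illustration rather than the object bystats2 returns: the data and the statistic names are hypothetical, and any marginal totals or attributes the real object may carry are omitted.

```r
# Hypothetical data: binary outcome, vertical (v) and horizontal (h) classifiers
death <- c(1, 0, NA, 1, 0, 0, 1, 1)
v <- factor(c("a", "a", "a", "b", "b", "b", "b", "a"))  # rows
h <- factor(c("f", "m", "f", "f", "m", "m", "f", "m"))  # columns

# Last dimension indexes the statistics computed for each cell
stat_names <- c("N", "Missing", "Fraction")
cells <- array(NA_real_,
               dim = c(nlevels(v), nlevels(h), length(stat_names)),
               dimnames = list(levels(v), levels(h), stat_names))

for (i in levels(v)) {
  for (j in levels(h)) {
    yy <- death[v == i & h == j]
    cells[i, j, ] <- c(sum(!is.na(yy)), sum(is.na(yy)), mean(yy, na.rm = TRUE))
  }
}

# Slice out one statistic as a v-by-h table
print(cells[, , "Fraction"])
```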
Author
Frank Harrell
Department of Biostatistics
Vanderbilt University
fh@fharrell.com
Examples
if (FALSE) { # \dontrun{
bystats(sex==2, county, city)
bystats(death, race)
bystats(death, cut2(age,g=5), race)
bystats(cholesterol, cut2(age,g=4), sex, fun=median)
bystats(cholesterol, sex, fun=quantile)
bystats(cholesterol, sex, fun=function(x)c(Mean=mean(x),Median=median(x)))
latex(bystats(death,race,nmiss=FALSE,subset=sex=="female"), digits=2)
f <- function(y) c(Hazard=sum(y[,2])/sum(y[,1]))
# f() gets the hazard estimate for right-censored data from exponential dist.
bystats(cbind(d.time, death), race, sex, fun=f)
bystats(cbind(pressure, cholesterol), age.decile,
        fun=function(y) c(Median.pressure   =median(y[,1]),
                          Median.cholesterol=median(y[,2])))
y <- cbind(pressure, cholesterol)
bystats(y, age.decile,
        fun=function(y) apply(y, 2, median))   # same result as last one
bystats(y, age.decile, fun=function(y) apply(y, 2, quantile, c(.25,.75)))
# The last one computes separately the 0.25 and 0.75 quantiles of 2 vars.
latex(bystats2(death, race, sex, fun=table))
} # }
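The examples above are not run because the variables they use are not defined. A self-contained variant, assuming the Hmisc package is installed, might look like the sketch below. The simulated data are hypothetical and only reuse the variable names from the examples. The calls are skipped when Hmisc is unavailable, and no output is shown because it depends on the installed version.

```r
# Simulated, hypothetical data reusing variable names from the examples
set.seed(1)
n <- 200
race  <- factor(sample(c("a", "b", "c"), n, replace = TRUE),
                levels = c("a", "b", "c"))
sex   <- factor(sample(c("female", "male"), n, replace = TRUE),
                levels = c("female", "male"))
death <- rbinom(n, 1, 0.3)
cholesterol <- rnorm(n, mean = 200, sd = 30)
cholesterol[sample(n, 10)] <- NA   # introduce some missing values

# Run the bystats calls only if Hmisc is available (an assumption here)
if (requireNamespace("Hmisc", quietly = TRUE)) {
  # Binary y: default fun is the mean, labeled as Fraction
  print(Hmisc::bystats(death, race, sex))
  # Continuous y with missing values and a user-specified fun
  print(Hmisc::bystats(cholesterol, sex, fun = median))
  # Two classification variables laid out as rows and columns
  print(Hmisc::bystats2(death, race, sex))
}
```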