An adaptation of base R's by function, designed to
optimize the display of the results.
Arguments
- data
an R object, normally a data frame, possibly a matrix.
- INDICES
a grouping variable or a list of grouping variables, each of length
nrow(data).
- FUN
a function to be applied to (usually data-frame) subsets of data.
- ...
Further arguments to FUN.
- useNA
Make NA a valid grouping value in INDICES variable(s). Set to
FALSE explicitly to eliminate the message.
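
Because stby is described as an adaptation of base R's by, the roles of INDICES and ... can be illustrated with base R alone. The following is a minimal sketch on a made-up data frame (df, site, sex and score are hypothetical names); it assumes stby forwards extra arguments to FUN in the same way by does.

```r
# Hypothetical toy data; all names are for illustration only
df <- data.frame(
  site  = c("x", "x", "y", "y"),
  sex   = c("F", "M", "F", "M"),
  score = c(10, NA, 30, 40)
)

# INDICES is a list of two grouping variables, each of length nrow(df).
# The extra argument na.rm = TRUE travels through `...` to FUN.
group_means <- by(
  data    = df,
  INDICES = list(site = df$site, sex = df$sex),
  FUN     = function(d, na.rm) mean(d$score, na.rm = na.rm),
  na.rm   = TRUE
)

# One cell per site-by-sex combination
dim(group_means)
```

Each subset passed to FUN is itself a data frame, which is why the function extracts d$score before computing the mean.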
Details
When the grouping variable(s) contain NA values, the
base::by function (as well as summarytools
versions prior to 1.1.0) ignores the corresponding groups. Version 1.1.0
allows setting useNA = TRUE to make new groups using
NA values in the grouping variable(s), just as
dplyr::group_by does.
When NA values are detected and useNA = FALSE, a message is
displayed; to disable this message, set check.nas = FALSE.
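
The base R behaviour described above can be reproduced without summarytools. The sketch below uses a made-up data frame (df, grp and val are hypothetical names) and relies on addNA() to turn NA into an explicit factor level. It only illustrates the idea of keeping an NA group; it is not meant to show how stby implements useNA = TRUE internally.

```r
# Hypothetical toy data: the last observation has a missing group label
df <- data.frame(
  grp = c("a", "a", "b", NA),
  val = c(1, 2, 3, 4)
)

# base::by() drops the row whose grouping value is NA
res_default <- by(df, df$grp, function(d) sum(d$val))
length(res_default)   # number of groups when NA is ignored

# addNA() makes NA a regular factor level, so its rows form their own group
res_with_na <- by(df, addNA(df$grp), function(d) sum(d$val))
length(res_with_na)   # number of groups when NA is kept
```

Comparing the two lengths shows that the NA observations are excluded from the first result and form an additional group in the second.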
Examples
data("tobacco")
with(tobacco, stby(data = BMI, INDICES = gender, FUN = descr,
check.nas = FALSE))
#> NA detected in grouping variable(s); consider using useNA = TRUE
#> Descriptive Statistics
#> BMI by gender
#> Data Frame: tobacco
#> N: 978
#>
#>                          F        M
#> ----------------- -------- --------
#>              Mean    26.10    25.31
#>           Std.Dev     4.95     3.98
#>               Min     9.01     8.83
#>                Q1    22.98    22.52
#>            Median    25.87    25.14
#>                Q3    29.48    27.96
#>               Max    39.44    36.76
#>               MAD     4.75     4.02
#>               IQR     6.49     5.44
#>                CV     0.19     0.16
#>          Skewness    -0.02    -0.04
#>       SE.Skewness     0.11     0.11
#>          Kurtosis     0.09     0.17
#>           N.Valid   475.00   477.00
#>                 N   489.00   489.00
#>         Pct.Valid    97.14    97.55
with(tobacco, stby(data = smoker, INDICES = gender, freq, useNA = TRUE))
#> Frequencies
#> tobacco$smoker
#> Type: Factor
#> Group: gender = F
#>
#>               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
#> ----------- ------ --------- -------------- --------- --------------
#>         Yes    147     30.06          30.06     30.06          30.06
#>          No    342     69.94         100.00     69.94         100.00
#>        <NA>      0                               0.00         100.00
#>       Total    489    100.00         100.00    100.00         100.00
#>
#> Group: gender = M
#>
#>               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
#> ----------- ------ --------- -------------- --------- --------------
#>         Yes    143     29.24          29.24     29.24          29.24
#>          No    346     70.76         100.00     70.76         100.00
#>        <NA>      0                               0.00         100.00
#>       Total    489    100.00         100.00    100.00         100.00
#>
#> Group: gender = NA
#>
#>               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
#> ----------- ------ --------- -------------- --------- --------------
#>         Yes      8     36.36          36.36     36.36          36.36
#>          No     14     63.64         100.00     63.64         100.00
#>        <NA>      0                               0.00         100.00
#>       Total     22    100.00         100.00    100.00         100.00
with(tobacco, stby(data = list(x = smoker, y = diseased),
INDICES = gender, FUN = ctable, useNA = TRUE))
#> Cross-Tabulation, Row Proportions
#> smoker * diseased
#> Data Frame: tobacco
#> Group: gender = F
#>
#> -------- ---------- ------------- ------------- --------------
#>            diseased           Yes            No          Total
#>   smoker
#>      Yes               62 (42.2%)    85 (57.8%)   147 (100.0%)
#>       No               49 (14.3%)   293 (85.7%)   342 (100.0%)
#>    Total              111 (22.7%)   378 (77.3%)   489 (100.0%)
#> -------- ---------- ------------- ------------- --------------
#>
#> Group: gender = M
#>
#> -------- ---------- ------------- ------------- --------------
#>            diseased           Yes            No          Total
#>   smoker
#>      Yes               63 (44.1%)    80 (55.9%)   143 (100.0%)
#>       No               47 (13.6%)   299 (86.4%)   346 (100.0%)
#>    Total              110 (22.5%)   379 (77.5%)   489 (100.0%)
#> -------- ---------- ------------- ------------- --------------
#>
#> Group: gender = NA
#>
#> -------- ---------- ----------- ------------- -------------
#>            diseased         Yes            No         Total
#>   smoker
#>      Yes              0 ( 0.0%)    8 (100.0%)    8 (100.0%)
#>       No              3 (21.4%)   11 ( 78.6%)   14 (100.0%)
#>    Total              3 (13.6%)   19 ( 86.4%)   22 (100.0%)
#> -------- ---------- ----------- ------------- -------------