Calculates mean, sd, min, Q1\*, median, Q3\*, max, MAD, IQR\*, CV, skewness\*, SE.skewness\*, and kurtosis\* on numerical vectors. (\*) Not available when using sampling weights.
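As a rough illustration of what most of these statistics represent, the sketch below recomputes several of them for a single vector using base R only. It assumes base R defaults (type 7 quantiles, stats::mad() with its default scaling constant) and takes CV to be sd / mean; the vector v and the name by_hand are made up for this example, and descr() itself may differ in details. Skewness, SE.skewness, and kurtosis are left out because their estimators are not specified here.

```r
# Hypothetical input: six valid values and one missing value
v <- c(2, 4, 4, 5, 7, 8, NA)

# Quartiles with base R's default (type 7) interpolation
q <- quantile(v, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)

by_hand <- c(
  mean      = mean(v, na.rm = TRUE),
  sd        = sd(v, na.rm = TRUE),
  min       = min(v, na.rm = TRUE),
  q1        = q[1],
  med       = median(v, na.rm = TRUE),
  q3        = q[2],
  max       = max(v, na.rm = TRUE),
  mad       = mad(v, na.rm = TRUE),   # median absolute deviation, scaled by 1.4826
  iqr       = IQR(v, na.rm = TRUE),
  cv        = sd(v, na.rm = TRUE) / mean(v, na.rm = TRUE),  # assumed definition
  n.valid   = sum(!is.na(v)),
  n         = length(v),
  pct.valid = 100 * mean(!is.na(v))
)

print(round(by_hand, 2))
```

Comparing such hand-computed values with the output of descr() on the same vector is a quick way to check which definitions the package actually applies.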
descr(
  x,
  var = NULL,
  stats = st_options("descr.stats"),
  na.rm = TRUE,
  round.digits = st_options("round.digits"),
  transpose = st_options("descr.transpose"),
  order = "sort",
  style = st_options("style"),
  plain.ascii = st_options("plain.ascii"),
  justify = "r",
  headings = st_options("headings"),
  display.labels = st_options("display.labels"),
  split.tables = 100,
  weights = NULL,
  rescale.weights = FALSE,
  ...
)
x: A numerical vector or a data frame.
var: Unquoted expression referring to a specific column in x. Provides support for piped function calls (e.g. my_df |> descr(my_var)).
stats: Character. Which stats to produce. Either “all” (default), “fivenum”, “common” (see Details), or a selection of: “mean”, “sd”, “min”, “q1”, “med”, “q3”, “max”, “mad”, “iqr”, “cv”, “skewness”, “se.skewness”, “kurtosis”, “n.valid”, “n”, and “pct.valid”. Can be set globally via st_options, option “descr.stats”.
na.rm: Logical. Passed on to the statistical functions; when TRUE, missing values are removed before computing. Defaults to TRUE.
round.digits: Numeric. Number of digits to display after the decimal point. Defaults to 2. Can be set globally with st_options.
transpose: Logical. When TRUE, variables are shown as rows and statistics as columns; when FALSE, statistics are shown as rows and variables as columns. Defaults to FALSE. Can be set globally with st_options, option “descr.transpose”.
order: Character. When analyzing more than one variable, determines how variables are ordered. Valid values are “sort” (or simply “s”), “preserve” (or “p”), or a vector containing all variable names in the desired order. Defaults to “sort”.
style: Character. Style to be used by pander. One of “simple” (default), “grid”, “rmarkdown”, or “jira”. Can be set globally with st_options.
plain.ascii: Logical. pander argument; when TRUE (default), no markup characters are used (useful when printing to console). If style = 'rmarkdown' is specified, the value is set to FALSE automatically. Can be set globally using st_options.
justify: Character. Alignment of numbers in cells; “l” for left, “c” for center, or “r” for right (default). Has no effect on HTML tables.
headings: Logical. Set to FALSE to omit the heading section. TRUE by default. Can be set globally via st_options.
display.labels: Logical. Show variable / data frame labels in the heading section. Defaults to TRUE. Can be set globally with st_options.
split.tables: Numeric. pander argument that specifies how many characters wide a table can be. 100 by default.
weights: Numeric. Vector of weights having the same length as x. NULL (default) indicates that no weights are used.
rescale.weights: Logical. When set to TRUE, a global constant is applied to the weights so that the total count equals nrow(x). FALSE by default.
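The rescaling described for rescale.weights can be pictured with a few lines of base R. This is a minimal sketch under the assumption that the constant is simply the number of observations divided by the sum of the weights; the vectors x and w are invented for the example, and the package's internal code may handle details (such as missing values) differently.

```r
# Hypothetical data: four observations with sampling weights summing to 8
x <- c(10, 20, 30, 40)
w <- c(0.5, 1.5, 2, 4)

# One global constant brings the weighted count back to the number of observations
k <- length(x) / sum(w)
w_rescaled <- w * k

sum(w)           # weighted count before rescaling
sum(w_rescaled)  # weighted count after rescaling, equal to length(x)

# Proportional rescaling leaves weighted means unchanged
weighted.mean(x, w)
weighted.mean(x, w_rescaled)
```

Because every weight is multiplied by the same constant, ratios between weights are preserved; only count-type quantities change.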
An object having classes “matrix” and “summarytools” containing the statistics, with extra attributes useful to other functions/methods.
Since version 1.1, the stats argument can be set in a more flexible
way; keywords (all, common, fivenum) can be combined
with single statistics, or their “negation”. For instance, using
stats = c("all", "-q1", "-q3") would show
all except q1 and q3.
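To make the combination rules concrete, here is a small base R sketch of how such a specification could be resolved into a final list of statistics. The helper resolve_stats() and the preset contents are hypothetical, written for illustration only; they are not the package's internal code, and the actual contents of the “common” preset may differ.

```r
# Statistic names as listed for the stats argument
all_stats <- c("mean", "sd", "min", "q1", "med", "q3", "max", "mad", "iqr",
               "cv", "skewness", "se.skewness", "kurtosis",
               "n.valid", "n", "pct.valid")

# Assumed preset contents (illustrative only)
presets <- list(
  all     = all_stats,
  fivenum = c("min", "q1", "med", "q3", "max"),
  common  = c("mean", "sd", "min", "med", "max", "n.valid", "pct.valid")
)

# Hypothetical helper: expand keywords, then drop "-name" entries
resolve_stats <- function(spec) {
  negated <- startsWith(spec, "-")
  exclude <- substring(spec[negated], 2)
  include <- unlist(lapply(spec[!negated], function(s) {
    if (s %in% names(presets)) presets[[s]] else s
  }), use.names = FALSE)
  keep <- setdiff(include, exclude)
  all_stats[all_stats %in% keep]   # report in the canonical order
}

resolve_stats(c("all", "-q1", "-q3"))
resolve_stats(c("common", "n"))
```

In this sketch the result is always returned in the order of all_stats, regardless of the order in which keywords and single statistics were supplied.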
For further customization, you could redefine any preset in the
following manner: .st_env$descr.stats$common <- c("mean", "sd", "n").
Use caution when modifying .st_env, and reload the package
if errors ensue. Changes are temporary and will not persist across
R sessions.
data("exams")
# All stats (default behavior) for all numerical variables
descr(exams)
#> Non-numerical variable(s) ignored: student, gender
#> Descriptive Statistics
#> exams
#> N: 30
#>
#> economics english french geography history math
#> ----------------- ----------- --------- -------- ----------- --------- -------
#> Mean 73.91 75.96 73.94 70.04 72.77 73.54
#> Std.Dev 8.62 7.92 10.79 10.65 10.20 9.19
#> Min 60.50 58.30 44.80 47.20 53.90 55.60
#> Q1 68.80 70.90 68.20 65.90 68.20 66.95
#> Median 71.60 74.10 73.60 68.50 72.75 73.75
#> Q3 77.00 80.60 76.70 77.80 76.50 80.35
#> Max 94.20 93.10 94.70 96.30 93.50 93.20
#> MAD 5.49 6.52 7.56 12.31 6.45 9.93
#> IQR 8.20 9.70 8.50 11.90 8.15 13.35
#> CV 0.12 0.10 0.15 0.15 0.14 0.12
#> Skewness 0.75 0.28 0.03 0.10 0.01 0.12
#> SE.Skewness 0.43 0.43 0.43 0.43 0.43 0.44
#> Kurtosis -0.42 -0.25 0.45 -0.03 -0.60 -0.58
#> N.Valid 29.00 29.00 29.00 29.00 30.00 28.00
#> N 30.00 30.00 30.00 30.00 30.00 30.00
#> Pct.Valid 96.67 96.67 96.67 96.67 100.00 93.33
# Show only "common" statistics, plus "n"
descr(exams, stats = c("common", "n"))
#> Non-numerical variable(s) ignored: student, gender
#> Descriptive Statistics
#> exams
#> N: 30
#>
#> economics english french geography history math
#> --------------- ----------- --------- -------- ----------- --------- -------
#> Mean 73.91 75.96 73.94 70.04 72.77 73.54
#> Std.Dev 8.62 7.92 10.79 10.65 10.20 9.19
#> Min 60.50 58.30 44.80 47.20 53.90 55.60
#> Median 71.60 74.10 73.60 68.50 72.75 73.75
#> Max 94.20 93.10 94.70 96.30 93.50 93.20
#> N.Valid 29.00 29.00 29.00 29.00 30.00 28.00
#> N 30.00 30.00 30.00 30.00 30.00 30.00
#> Pct.Valid 96.67 96.67 96.67 96.67 100.00 93.33
# Selection of statistics, transposing the results
descr(exams, stats = c("mean", "sd", "min", "max"), transpose = TRUE)
#> Non-numerical variable(s) ignored: student, gender
#> Descriptive Statistics
#> exams
#> N: 30
#>
#> Mean Std.Dev Min Max
#> --------------- ------- --------- ------- -------
#> economics 73.91 8.62 60.50 94.20
#> english 75.96 7.92 58.30 93.10
#> french 73.94 10.79 44.80 94.70
#> geography 70.04 10.65 47.20 96.30
#> history 72.77 10.20 53.90 93.50
#> math 73.54 9.19 55.60 93.20
# Rmarkdown-ready
descr(exams, plain.ascii = FALSE, style = "rmarkdown")
#> Non-numerical variable(s) ignored: student, gender
#> ### Descriptive Statistics
#> #### exams
#> **N:** 30
#>
#> | | economics | english | french | geography | history | math |
#> |----------------:|----------:|--------:|-------:|----------:|--------:|------:|
#> | **Mean** | 73.91 | 75.96 | 73.94 | 70.04 | 72.77 | 73.54 |
#> | **Std.Dev** | 8.62 | 7.92 | 10.79 | 10.65 | 10.20 | 9.19 |
#> | **Min** | 60.50 | 58.30 | 44.80 | 47.20 | 53.90 | 55.60 |
#> | **Q1** | 68.80 | 70.90 | 68.20 | 65.90 | 68.20 | 66.95 |
#> | **Median** | 71.60 | 74.10 | 73.60 | 68.50 | 72.75 | 73.75 |
#> | **Q3** | 77.00 | 80.60 | 76.70 | 77.80 | 76.50 | 80.35 |
#> | **Max** | 94.20 | 93.10 | 94.70 | 96.30 | 93.50 | 93.20 |
#> | **MAD** | 5.49 | 6.52 | 7.56 | 12.31 | 6.45 | 9.93 |
#> | **IQR** | 8.20 | 9.70 | 8.50 | 11.90 | 8.15 | 13.35 |
#> | **CV** | 0.12 | 0.10 | 0.15 | 0.15 | 0.14 | 0.12 |
#> | **Skewness** | 0.75 | 0.28 | 0.03 | 0.10 | 0.01 | 0.12 |
#> | **SE.Skewness** | 0.43 | 0.43 | 0.43 | 0.43 | 0.43 | 0.44 |
#> | **Kurtosis** | -0.42 | -0.25 | 0.45 | -0.03 | -0.60 | -0.58 |
#> | **N.Valid** | 29.00 | 29.00 | 29.00 | 29.00 | 30.00 | 28.00 |
#> | **N** | 30.00 | 30.00 | 30.00 | 30.00 | 30.00 | 30.00 |
#> | **Pct.Valid** | 96.67 | 96.67 | 96.67 | 96.67 | 100.00 | 93.33 |
# Grouped statistics
data("tobacco")
with(tobacco, stby(BMI, gender, descr, check.nas = FALSE))
#> NA detected in grouping variable(s); consider using useNA = TRUE
#> Descriptive Statistics
#> BMI by gender
#> Data Frame: tobacco
#> N: 978
#>
#> F M
#> ----------------- -------- --------
#> Mean 26.10 25.31
#> Std.Dev 4.95 3.98
#> Min 9.01 8.83
#> Q1 22.98 22.52
#> Median 25.87 25.14
#> Q3 29.48 27.96
#> Max 39.44 36.76
#> MAD 4.75 4.02
#> IQR 6.49 5.44
#> CV 0.19 0.16
#> Skewness -0.02 -0.04
#> SE.Skewness 0.11 0.11
#> Kurtosis 0.09 0.17
#> N.Valid 475.00 477.00
#> N 489.00 489.00
#> Pct.Valid 97.14 97.55
# Grouped statistics in tidy table:
tb(with(tobacco, stby(BMI, age.gr, descr, stats = "common")))
#> NA detected in grouping variable(s); consider using useNA = TRUE
#> # A tibble: 4 × 10
#> age.gr variable mean sd min med max n.valid n pct.valid
#> <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 18-34 BMI 23.8 4.23 8.83 24.0 34.8 252 258 97.7
#> 2 35-50 BMI 25.1 4.34 10.3 25.1 39.4 232 241 96.3
#> 3 51-70 BMI 26.9 4.26 9.01 26.8 39.2 312 317 98.4
#> 4 71 + BMI 27.4 4.37 16.4 27.5 38.4 155 159 97.5
if (FALSE) { # \dontrun{
# Show in Viewer (or browser if not in RStudio)
view(descr(exams))
# Save to html file with title
print(descr(exams),
      file = "descr_exams.html",
      report.title = "Exam Scores",
      footnote = "<b>Schoolyear:</b> 2018-2019<br/><b>Semester:</b> Fall")
} # }