Sorts numeric from discrete variables and returns separate summaries for those types of variables.
The work is done by the functions summarizeNumerics and
summarizeFactors. Please see the help pages for those
functions for complete details.
Arguments
- dat
A data frame
- alphaSort
If TRUE, the columns are re-organized in alphabetical order. If FALSE, they are presented in the original order.
- stats
A vector of desired summary statistics. Set stats = NULL to omit all stat summaries. Legal elements are c("min", "med", "max", "mean", "sd", "var", "skewness", "kurtosis", "entropy", "normedEntropy", "nobs", "nmiss"). The statistics c("entropy", "normedEntropy") are available only for factor variables, while mean, variance, and so forth will be calculated only for numeric variables. "nobs" is the number of observations with non-missing, finite scores (not NA, NaN, -Inf, or Inf). "nmiss" is the number of cases with values of NA. The default setting for probs will cause c("min", "med", "max") to be included; they need not be requested explicitly. To disable them, revise probs.
- probs
For numeric variables, this is passed to the quantile function. The default is probs = c(0, .50, 1.0), which are labeled in output as c("min", "med", "max"). Set probs = NULL to prevent these in the output.
- digits
Number of decimal places to display; defaults to 2.
- ...
Optional arguments that are passed to summarizeNumerics and summarizeFactors. For numeric variables, one can specify na.rm and unbiased. For discrete variables, the key argument is maxLevels, which determines the number of levels that will be reported in tables for discrete variables.
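The way probs, the quantile labels, and the nobs and nmiss counts fit together can be sketched in base R. This is an illustration under stated assumptions, not the package's internal code: the vector x is made up, the relabeling step simply mimics the row names seen in the example output (min, pctile_20%, med), and base R's is.na() also flags NaN, which may or may not match how the package counts nmiss.

```r
# Hypothetical numeric variable with missing and non-finite entries
x <- c(2.5, NA, 7.1, Inf, 4.0, NaN, 1.3)

# nobs as described above: non-missing, finite scores only
nobs <- sum(is.finite(x))

# nmiss via is.na(); note that is.na(NaN) is TRUE in base R
nmiss <- sum(is.na(x))

# Quantiles requested through probs, computed on the finite values
probs <- c(0, 0.20, 0.50)
q <- quantile(x[is.finite(x)], probs = probs)

# Assumed labeling step: generic "pctile_" prefix, with the special
# names min, med, and max for probabilities 0, 0.5, and 1
labels <- paste0("pctile_", names(q))
labels[probs == 0] <- "min"
labels[probs == 0.5] <- "med"
labels[probs == 1] <- "max"
names(q) <- labels

q
c(nobs = nobs, nmiss = nmiss)
```

Setting probs to NULL in this sketch would mean skipping the quantile() call entirely, which corresponds to dropping the min, med, and max rows from the output.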
Value
Returns a list with two objects: 1) the output from
summarizeNumerics, a data frame with variable names on rows
and summary stats on columns; 2) the output from summarizeFactors,
a list with summary information about each discrete
variable. The on-screen display is governed by the method
print.summarize.
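To make the shape of one discrete-variable summary concrete, here is a minimal base R sketch that builds a table-plus-stats list for a single variable. The entropy formula is an assumption: base-2 Shannon entropy, normalized by log2 of the number of levels, chosen because it reproduces the values printed for x2 in the Examples section. The function names entropyBits and normedEntropyBits are hypothetical and are not part of the package.

```r
# Counts copied from the x2 table printed in the Examples section
counts <- c(L = 8, M = 15, N = 16, O = 11, P = 15, Q = 12, R = 7, S = 16)

# Shannon entropy in bits; levels with zero count contribute nothing
entropyBits <- function(counts) {
    p <- counts[counts > 0] / sum(counts)
    -sum(p * log2(p))
}

# Entropy divided by its maximum, log2(number of levels).
# Assumes at least two levels, otherwise the denominator is zero.
normedEntropyBits <- function(counts) {
    entropyBits(counts) / log2(length(counts))
}

# A list shaped like one element of the factor summary: table plus stats
x2.summary <- list(
    table = counts,
    stats = c(nobs = sum(counts),
              nmiss = 0,
              entropy = entropyBits(counts),
              normedEntropy = normedEntropyBits(counts))
)

round(x2.summary$stats, 4)
```

A normed entropy near 1 indicates that observations are spread almost evenly across the levels, which is the case for both discrete variables in the examples.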
Details
The major purpose here is to generate a summary data structure that is more useful in subsequent data analysis. The numeric portion of the summaries is a data frame that can be used in plots or other diagnostics.
The term "factors" was used, but "discrete variables" would have been more accurate. The factor summaries will collect all logical, factor, ordered, and character variables.
Other variable types, such as Dates, will be ignored, with a warning.
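The sorting of columns by type described above can be sketched in base R as follows. This is a rough illustration, not the package's implementation: splitByType and the toy data frame are hypothetical, and the wording of the warning is made up.

```r
# Hypothetical helper that mimics the type sorting described above
splitByType <- function(dat) {
    isNumeric <- vapply(dat, is.numeric, logical(1))
    # Logical, factor, ordered, and character columns count as discrete
    isDiscrete <- vapply(dat, function(v) {
        is.logical(v) || is.factor(v) || is.character(v)
    }, logical(1))
    # Anything else (for example Dates) is dropped with a warning
    ignored <- names(dat)[!(isNumeric | isDiscrete)]
    if (length(ignored) > 0) {
        warning("Ignoring variables of unsupported type: ",
                paste(ignored, collapse = ", "))
    }
    list(numerics = dat[, isNumeric, drop = FALSE],
         factors = dat[, isDiscrete, drop = FALSE],
         ignored = ignored)
}

# Toy data frame with one column of each kind
toy <- data.frame(
    score = c(1.5, 2.5, 3.5),
    group = factor(c("a", "b", "a")),
    rank = factor(c("lo", "hi", "lo"), levels = c("lo", "hi"), ordered = TRUE),
    label = c("x", "y", "z"),
    flag = c(TRUE, FALSE, TRUE),
    when = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")),
    stringsAsFactors = FALSE
)

parts <- splitByType(toy)
names(parts$numerics)
names(parts$factors)
parts$ignored
```

Ordered factors need no separate check in this sketch because is.factor() is TRUE for them, and is.numeric() is FALSE for both factors and Dates in base R.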
Author
Paul E. Johnson pauljohn@ku.edu
Examples
library(rockchalk)
set.seed(23452345)
N <- 100
x1 <- gl(12, 2, labels = LETTERS[1:12])
x2 <- gl(8, 3, labels = LETTERS[12:19])
x1 <- sample(x = x1, size=N, replace = TRUE)
x2 <- sample(x = x2, size=N, replace = TRUE)
z1 <- rnorm(N)
a1 <- rnorm(N, mean = 1.2, sd = 11.7)
a2 <- rpois(N, lambda = 10 + abs(a1))
a3 <- rgamma(N, 0.5, 4)
b1 <- rnorm(N, mean = 211.3, sd = 0.4)
dat <- data.frame(z1, a1, x2, a2, x1, a3, b1)
summary(dat)
#> z1 a1 x2 a2 x1
#> Min. :-2.0855 Min. :-21.5918 N :16 Min. : 4.00 I :13
#> 1st Qu.:-0.8893 1st Qu.: -9.4553 S :16 1st Qu.:14.00 H :12
#> Median :-0.1856 Median : -1.7660 M :15 Median :20.50 L :11
#> Mean :-0.1872 Mean : -0.3338 P :15 Mean :21.19 G :10
#> 3rd Qu.: 0.4332 3rd Qu.: 8.6615 Q :12 3rd Qu.:26.25 J :10
#> Max. : 2.0231 Max. : 38.0818 O :11 Max. :53.00 E : 9
#> (Other):15 (Other):35
#> a3 b1
#> Min. :3.330e-06 Min. :210.1
#> 1st Qu.:1.037e-02 1st Qu.:211.0
#> Median :6.276e-02 Median :211.3
#> Mean :1.333e-01 Mean :211.3
#> 3rd Qu.:1.713e-01 3rd Qu.:211.6
#> Max. :1.129e+00 Max. :212.3
#>
summarize(dat)
#> Numeric variables
#> z1 a1 a2 a3 b1
#> min -2.086 -21.592 4 0 210.101
#> med -0.186 -1.766 20.500 0.063 211.322
#> max 2.023 38.082 53 1.129 212.291
#> mean -0.187 -0.334 21.190 0.133 211.316
#> sd 0.956 12.810 9.054 0.184 0.422
#> skewness 0.161 0.463 0.837 2.440 -0.166
#> kurtosis -0.606 -0.246 0.770 7.918 -0.240
#> nobs 100 100 100 100 100
#> nmissing 0 0 0 0 0
#>
#> Nonnumeric variables
#> x2 x1
#> N : 16 I : 13
#> S : 16 H : 12
#> M : 15 L : 11
#> P : 15 G : 10
#> (All Others): 38 (All Others): 54
#> nobs : 100.000 nobs : 100.000
#> nmiss : 0.000 nmiss : 0.000
#> entropy : 2.945 entropy : 3.503
#> normedEntropy: 0.982 normedEntropy: 0.977
summarize(dat, digits = 4)
#> Numeric variables
#> z1 a1 a2 a3 b1
#> min -2.0855 -21.5918 4 0 210.1006
#> med -0.1856 -1.7660 20.5000 0.0628 211.3223
#> max 2.0231 38.0818 53 1.1294 212.2905
#> mean -0.1872 -0.3338 21.1900 0.1333 211.3164
#> sd 0.9558 12.8097 9.0539 0.1842 0.4216
#> skewness 0.1606 0.4632 0.8366 2.4396 -0.1657
#> kurtosis -0.6063 -0.2458 0.7702 7.9180 -0.2397
#> nobs 100 100 100 100 100
#> nmissing 0 0 0 0 0
#>
#> Nonnumeric variables
#> x2 x1
#> N : 16 I : 13
#> S : 16 H : 12
#> M : 15 L : 11
#> P : 15 G : 10
#> (All Others): 38 (All Others): 54
#> nobs : 100.0000 nobs : 100.0000
#> nmiss : 0.0000 nmiss : 0.0000
#> entropy : 2.9445 entropy : 3.5031
#> normedEntropy: 0.9815 normedEntropy: 0.9772
summarize(dat, stats = c("min", "max", "mean", "sd"),
probs = c(0.25, 0.75))
#> Numeric variables
#> z1 a1 a2 a3 b1
#> pctile_25% -0.889 -9.455 14 0.010 211.009
#> pctile_75% 0.433 8.661 26.250 0.171 211.601
#> mean -0.187 -0.334 21.190 0.133 211.316
#> sd 0.956 12.810 9.054 0.184 0.422
#>
#> Nonnumeric variables
#> x2 x1
#> N : 16 I : 13
#> S : 16 H : 12
#> M : 15 L : 11
#> P : 15 G : 10
#> (All Others): 38 (All Others): 54
summarize(dat, probs = c(0, 0.20, 0.80),
stats = c("nobs", "mean", "med", "entropy"))
#> Numeric variables
#> z1 a1 a2 a3 b1
#> min -2.086 -21.592 4 0 210.101
#> pctile_20% -1.020 -11.849 14 0.008 210.967
#> pctile_80% 0.702 9.784 29 0.210 211.711
#> mean -0.187 -0.334 21.190 0.133 211.316
#> nobs 100 100 100 100 100
#>
#> Nonnumeric variables
#> x2 x1
#> N : 16 I : 13
#> S : 16 H : 12
#> M : 15 L : 11
#> P : 15 G : 10
#> (All Others): 38 (All Others): 54
#> nobs : 100.00 nobs : 100.0
#> entropy: 2.94 entropy: 3.5
summarize(dat, probs = c(0, 0.20, 0.50),
stats = c("nobs", "nmiss", "mean", "entropy"), maxLevels=10)
#> Numeric variables
#> z1 a1 a2 a3 b1
#> min -2.086 -21.592 4 0 210.101
#> pctile_20% -1.020 -11.849 14 0.008 210.967
#> med -0.186 -1.766 20.500 0.063 211.322
#> mean -0.187 -0.334 21.190 0.133 211.316
#> nobs 100 100 100 100 100
#> nmissing 0 0 0 0 0
#>
#> Nonnumeric variables
#> x2 x1
#> L: 8 I : 13
#> M: 15 H : 12
#> N: 16 L : 11
#> O: 11 G : 10
#> P: 15 J : 10
#> Q: 12 E : 9
#> R: 7 A : 7
#> S: 16 B : 7
#> F : 6
#> (All Others): 15
#> nobs : 100.00 nobs : 100.0
#> nmiss : 0.00 nmiss : 0.0
#> entropy: 2.94 entropy: 3.5
dat.sum <- summarize(dat, probs = c(0, 0.20, 0.50),
stats = c("nobs", "nmiss", "mean", "entropy"), maxLevels=10)
dat.sum
#> Numeric variables
#> z1 a1 a2 a3 b1
#> min -2.086 -21.592 4 0 210.101
#> pctile_20% -1.020 -11.849 14 0.008 210.967
#> med -0.186 -1.766 20.500 0.063 211.322
#> mean -0.187 -0.334 21.190 0.133 211.316
#> nobs 100 100 100 100 100
#> nmissing 0 0 0 0 0
#>
#> Nonnumeric variables
#> x2 x1
#> L: 8 I : 13
#> M: 15 H : 12
#> N: 16 L : 11
#> O: 11 G : 10
#> P: 15 J : 10
#> Q: 12 E : 9
#> R: 7 A : 7
#> S: 16 B : 7
#> F : 6
#> (All Others): 15
#> nobs : 100.00 nobs : 100.0
#> nmiss : 0.00 nmiss : 0.0
#> entropy: 2.94 entropy: 3.5
## Inspect unformatted structure of objects within return
dat.sum[["numerics"]]
#> min pctile_20% med mean nobs nmissing
#> z1 -2.085521e+00 -1.019580912 -0.1856309 -0.1872083 100 0
#> a1 -2.159177e+01 -11.849475853 -1.7659806 -0.3337909 100 0
#> a2 4.000000e+00 14.000000000 20.5000000 21.1900000 100 0
#> a3 3.333462e-06 0.008299486 0.0627584 0.1332853 100 0
#> b1 2.101006e+02 210.967102024 211.3223015 211.3163960 100 0
dat.sum[["factors"]]
#> $x2
#> $x2$table
#> L M N O P Q R S
#> 8 15 16 11 15 12 7 16
#>
#> $x2$stats
#> nobs nmiss entropy normedEntropy
#> 100.0000000 0.0000000 2.9445412 0.9815137
#>
#>
#> $x1
#> $x1$table
#> A B C D E F G H I J K L
#> 7 7 5 4 9 6 10 12 13 10 6 11
#>
#> $x1$stats
#> nobs nmiss entropy normedEntropy
#> 100.0000000 0.0000000 3.5030656 0.9771554
#>
#>
#> attr(,"class")
#> [1] "summarizedFactors"
#> attr(,"maxLevels")
#> [1] 10
#> attr(,"stats")
#> [1] "nobs" "nmiss" "mean" "entropy"
#> attr(,"digits")
#> [1] 3
## Only quantile values, no summary stats for numeric variables
## Discrete variables get entropy
summarize(dat,
probs = c(0, 0.25, 0.50, 0.75, 1.0),
stats = "entropy", digits = 2)
#> Numeric variables
#> z1 a1 a2 a3 b1
#> min -2.09 -21.59 4 0 210.10
#> pctile_25% -0.89 -9.46 14 0.01 211.01
#> med -0.19 -1.77 20.50 0.06 211.32
#> pctile_75% 0.43 8.66 26.25 0.17 211.60
#> max 2.02 38.08 53 1.13 212.29
#>
#> Nonnumeric variables
#> x2 x1
#> N : 16 I : 13
#> S : 16 H : 12
#> M : 15 L : 11
#> P : 15 G : 10
#> (All Others): 38 (All Others): 54
#> entropy: 2.9 entropy: 3.5
## Quantiles and the mean for numeric variables.
## No diversity stats for discrete variables (entropy omitted)
summarize(dat,
probs = c(0, 0.25, 0.50, 0.75, 1.0),
stats = "mean")
#> Numeric variables
#> z1 a1 a2 a3 b1
#> min -2.086 -21.592 4 0 210.101
#> pctile_25% -0.889 -9.455 14 0.010 211.009
#> med -0.186 -1.766 20.500 0.063 211.322
#> pctile_75% 0.433 8.661 26.250 0.171 211.601
#> max 2.023 38.082 53 1.129 212.291
#> mean -0.187 -0.334 21.190 0.133 211.316
#>
#> Nonnumeric variables
#> x2 x1
#> N : 16 I : 13
#> S : 16 H : 12
#> M : 15 L : 11
#> P : 15 G : 10
#> (All Others): 38 (All Others): 54
summarize(dat,
probs = NULL,
stats = "mean")
#> Numeric variables
#> z1 a1 a2 a3 b1
#> mean -0.187 -0.334 21.190 0.133 211.316
#>
#> Nonnumeric variables
#> x2 x1
#> N : 16 I : 13
#> S : 16 H : 12
#> M : 15 L : 11
#> P : 15 G : 10
#> (All Others): 38 (All Others): 54
## Note: output is not beautified by a print method
dat.sn <- summarizeNumerics(dat)
dat.sn
#> min med max mean sd skewness
#> z1 -2.085521e+00 -0.1856309 2.023081 -0.1872083 0.9558129 0.1605997
#> a1 -2.159177e+01 -1.7659806 38.081822 -0.3337909 12.8097455 0.4631669
#> a2 4.000000e+00 20.5000000 53.000000 21.1900000 9.0539293 0.8366467
#> a3 3.333462e-06 0.0627584 1.129405 0.1332853 0.1842008 2.4395965
#> b1 2.101006e+02 211.3223015 212.290534 211.3163960 0.4216084 -0.1656890
#> kurtosis nobs nmissing
#> z1 -0.6063152 100 0
#> a1 -0.2458057 100 0
#> a2 0.7701532 100 0
#> a3 7.9180262 100 0
#> b1 -0.2396525 100 0
formatSummarizedNumerics(dat.sn)
#> z1 a1 a2 a3 b1
#> min -2.09 -21.59 4 0 210.10
#> med -0.19 -1.77 20.50 0.06 211.32
#> max 2.02 38.08 53 1.13 212.29
#> mean -0.19 -0.33 21.19 0.13 211.32
#> sd 0.96 12.81 9.05 0.18 0.42
#> skewness 0.16 0.46 0.84 2.44 -0.17
#> kurtosis -0.61 -0.25 0.77 7.92 -0.24
#> nobs 100 100 100 100 100
#> nmissing 0 0 0 0 0
formatSummarizedNumerics(dat.sn, digits = 5)
#> z1 a1 a2 a3 b1
#> min -2.08552 -21.59177 4 0 210.10061
#> med -0.18563 -1.76598 20.50000 0.06276 211.32230
#> max 2.02308 38.08182 53 1.12941 212.29053
#> mean -0.18721 -0.33379 21.19000 0.13329 211.31640
#> sd 0.95581 12.80975 9.05393 0.18420 0.42161
#> skewness 0.16060 0.46317 0.83665 2.43960 -0.16569
#> kurtosis -0.60632 -0.24581 0.77015 7.91803 -0.23965
#> nobs 100 100 100 100 100
#> nmissing 0 0 0 0 0
dat.summ <- summarize(dat)
dat.sf <- summarizeFactors(dat, maxLevels = 20)
dat.sf
#> $x1
#> $x1$table
#> A B C D E F G H I J K L
#> 7 7 5 4 9 6 10 12 13 10 6 11
#>
#> $x1$stats
#> nobs nmiss entropy normedEntropy
#> 100.0000000 0.0000000 3.5030656 0.9771554
#>
#>
#> $x2
#> $x2$table
#> L M N O P Q R S
#> 8 15 16 11 15 12 7 16
#>
#> $x2$stats
#> nobs nmiss entropy normedEntropy
#> 100.0000000 0.0000000 2.9445412 0.9815137
#>
#>
#> attr(,"class")
#> [1] "summarizedFactors"
#> attr(,"maxLevels")
#> [1] 20
#> attr(,"stats")
#> [1] "entropy" "normedEntropy" "nobs" "nmiss"
#> attr(,"digits")
#> [1] 2
formatSummarizedFactors(dat.sf)
#> x1 x2
#> A: 7 L: 8
#> B: 7 M: 15
#> C: 5 N: 16
#> D: 4 O: 11
#> E: 9 P: 15
#> F: 6 Q: 12
#> G: 10 R: 7
#> H: 12 S: 16
#> I: 13
#> J: 10
#> K: 6
#> L: 11
#> nobs : 100.00 nobs : 100.00
#> nmiss : 0.00 nmiss : 0.00
#> entropy : 3.50 entropy : 2.94
#> normedEntropy: 0.98 normedEntropy: 0.98
## See actual values of factor summaries, without
## beautified printing
summarizeFactors(dat, maxLevels = 5)
#> $x1
#> $x1$table
#> A B C D E F G H I J K L
#> 7 7 5 4 9 6 10 12 13 10 6 11
#>
#> $x1$stats
#> nobs nmiss entropy normedEntropy
#> 100.0000000 0.0000000 3.5030656 0.9771554
#>
#>
#> $x2
#> $x2$table
#> L M N O P Q R S
#> 8 15 16 11 15 12 7 16
#>
#> $x2$stats
#> nobs nmiss entropy normedEntropy
#> 100.0000000 0.0000000 2.9445412 0.9815137
#>
#>
#> attr(,"class")
#> [1] "summarizedFactors"
#> attr(,"maxLevels")
#> [1] 5
#> attr(,"stats")
#> [1] "entropy" "normedEntropy" "nobs" "nmiss"
#> attr(,"digits")
#> [1] 2
formatSummarizedFactors(summarizeFactors(dat, maxLevels = 5))
#> x1 x2
#> I : 13 N : 16
#> H : 12 S : 16
#> L : 11 M : 15
#> G : 10 P : 15
#> (All Others): 54 (All Others): 38
#> nobs : 100.00 nobs : 100.00
#> nmiss : 0.00 nmiss : 0.00
#> entropy : 3.50 entropy : 2.94
#> normedEntropy: 0.98 normedEntropy: 0.98
summarize(dat, alphaSort = TRUE)
#> Numeric variables
#> a1 a2 a3 b1 z1
#> min -21.592 4 0 210.101 -2.086
#> med -1.766 20.500 0.063 211.322 -0.186
#> max 38.082 53 1.129 212.291 2.023
#> mean -0.334 21.190 0.133 211.316 -0.187
#> sd 12.810 9.054 0.184 0.422 0.956
#> skewness 0.463 0.837 2.440 -0.166 0.161
#> kurtosis -0.246 0.770 7.918 -0.240 -0.606
#> nobs 100 100 100 100 100
#> nmissing 0 0 0 0 0
#>
#> Nonnumeric variables
#> x1 x2
#> I : 13 N : 16
#> H : 12 S : 16
#> L : 11 M : 15
#> G : 10 P : 15
#> (All Others): 54 (All Others): 38
#> nobs : 100.000 nobs : 100.000
#> nmiss : 0.000 nmiss : 0.000
#> entropy : 3.503 entropy : 2.945
#> normedEntropy: 0.977 normedEntropy: 0.982
summarize(dat, digits = 6, alphaSort = FALSE)
#> Numeric variables
#> z1 a1 a2 a3 b1
#> min -2.085521 -21.591766 4 0.000003 210.100610
#> med -0.185631 -1.765981 20.500000 0.062758 211.322302
#> max 2.023081 38.081822 53 1.129405 212.290534
#> mean -0.187208 -0.333791 21.190000 0.133285 211.316396
#> sd 0.955813 12.809746 9.053929 0.184201 0.421608
#> skewness 0.160600 0.463167 0.836647 2.439596 -0.165689
#> kurtosis -0.606315 -0.245806 0.770153 7.918026 -0.239653
#> nobs 100 100 100 100 100
#> nmissing 0 0 0 0 0
#>
#> Nonnumeric variables
#> x2 x1
#> N : 16 I : 13
#> S : 16 H : 12
#> M : 15 L : 11
#> P : 15 G : 10
#> (All Others): 38 (All Others): 54
#> nobs : 100.000000 nobs : 100.000000
#> nmiss : 0.000000 nmiss : 0.000000
#> entropy : 2.944541 entropy : 3.503066
#> normedEntropy: 0.981514 normedEntropy: 0.977155
summarize(dat, maxLevels = 2)
#> Numeric variables
#> z1 a1 a2 a3 b1
#> min -2.086 -21.592 4 0 210.101
#> med -0.186 -1.766 20.500 0.063 211.322
#> max 2.023 38.082 53 1.129 212.291
#> mean -0.187 -0.334 21.190 0.133 211.316
#> sd 0.956 12.810 9.054 0.184 0.422
#> skewness 0.161 0.463 0.837 2.440 -0.166
#> kurtosis -0.606 -0.246 0.770 7.918 -0.240
#> nobs 100 100 100 100 100
#> nmissing 0 0 0 0 0
#>
#> Nonnumeric variables
#> x2 x1
#> N : 16 I : 13
#> (All Others): 84 (All Others): 87
#> nobs : 100.000 nobs : 100.000
#> nmiss : 0.000 nmiss : 0.000
#> entropy : 2.945 entropy : 3.503
#> normedEntropy: 0.982 normedEntropy: 0.977
datsumm <- summarize(dat, stats = c("mean", "sd", "var", "entropy", "nobs"))
## Unbeautified numeric data frame, variables on the rows
datsumm[["numerics"]]
#> min med max mean sd var
#> z1 -2.085521e+00 -0.1856309 2.023081 -0.1872083 0.9558129 0.91357829
#> a1 -2.159177e+01 -1.7659806 38.081822 -0.3337909 12.8097455 164.08958047
#> a2 4.000000e+00 20.5000000 53.000000 21.1900000 9.0539293 81.97363636
#> a3 3.333462e-06 0.0627584 1.129405 0.1332853 0.1842008 0.03392994
#> b1 2.101006e+02 211.3223015 212.290534 211.3163960 0.4216084 0.17775367
#> nobs
#> z1 100
#> a1 100
#> a2 100
#> a3 100
#> b1 100
## Beautified versions. 1. Show the saved formatted version:
attr(datsumm, "numeric.formatted")
#> z1 a1 a2 a3 b1
#> min -2.086 -21.592 4 0 210.101
#> med -0.186 -1.766 20.500 0.063 211.322
#> max 2.023 38.082 53 1.129 212.291
#> mean -0.187 -0.334 21.190 0.133 211.316
#> sd 0.956 12.810 9.054 0.184 0.422
#> var 0.914 164.090 81.974 0.034 0.178
#> nobs 100 100 100 100 100
## 2. Run formatSummarizedNumerics to re-specify digits:
formatSummarizedNumerics(datsumm[["numerics"]], digits = 10)
#> z1 a1 a2 a3 b1
#> min -2.0855207951 -21.5917659362 4 0.0000033335 210.1006096596
#> med -0.1856309007 -1.7659806343 20.5000000000 0.0627584000 211.3223015364
#> max 2.0230808757 38.0818218927 53 1.1294052546 212.2905342958
#> mean -0.1872083340 -0.3337909315 21.1900000000 0.1332852775 211.3163960106
#> sd 0.9558128927 12.8097455271 9.0539293328 0.1842008171 0.4216084338
#> var 0.9135782858 164.0895804693 81.9736363636 0.0339299410 0.1777536715
#> nobs 100 100 100 100 100
datsumm[["factors"]]
#> $x2
#> $x2$table
#> L M N O P Q R S
#> 8 15 16 11 15 12 7 16
#>
#> $x2$stats
#> nobs nmiss entropy normedEntropy
#> 100.0000000 0.0000000 2.9445412 0.9815137
#>
#>
#> $x1
#> $x1$table
#> A B C D E F G H I J K L
#> 7 7 5 4 9 6 10 12 13 10 6 11
#>
#> $x1$stats
#> nobs nmiss entropy normedEntropy
#> 100.0000000 0.0000000 3.5030656 0.9771554
#>
#>
#> attr(,"class")
#> [1] "summarizedFactors"
#> attr(,"maxLevels")
#> [1] 5
#> attr(,"stats")
#> [1] "mean" "sd" "var" "entropy" "nobs"
#> attr(,"digits")
#> [1] 3
formatSummarizedFactors(datsumm[["factors"]])
#> x2 x1
#> N : 16 I : 13
#> S : 16 H : 12
#> M : 15 L : 11
#> P : 15 G : 10
#> (All Others): 38 (All Others): 54
#> nobs : 100.00 nobs : 100.0
#> entropy: 2.94 entropy: 3.5
formatSummarizedFactors(datsumm[["factors"]], digits = 6, maxLevels = 10)
#> x2 x1
#> L: 8 I : 13
#> M: 15 H : 12
#> N: 16 L : 11
#> O: 11 G : 10
#> P: 15 J : 10
#> Q: 12 E : 9
#> R: 7 A : 7
#> S: 16 B : 7
#> F : 6
#> (All Others): 15
#> nobs : 100.00000 nobs : 100.00000
#> entropy: 2.94454 entropy: 3.50307