This function is somewhat similar to tapply, but is designed for
use in conjunction with id. It is simpler in that it only
accepts a single grouping vector (use id if you have more)
and calls vapply internally, with the .default value
as the template for each result.
vaggregate(.value, .group, .fun, ..., .default = NULL, .n = nlevels(.group))
vaggregate should be faster than tapply in most situations
because it avoids making a copy of the data.
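The description above can be illustrated with a minimal base-R sketch. It is not the package's implementation: vaggregate_sketch is a hypothetical helper, and it assumes that when no default is supplied the template comes from calling the function on an empty slice of the data, which is consistent with the Inf/-Inf columns in the range example below.

```r
# Hypothetical helper for illustration only; not plyr's actual code.
vaggregate_sketch <- function(value, group, fun, ..., default = NULL) {
  group <- as.factor(group)
  if (is.null(default)) {
    # Assumption: derive the template by applying fun to an empty slice
    default <- fun(value[0], ...)
  }
  # Positions of each group's elements; unused levels get empty index vectors
  idx <- split(seq_along(value), group)
  out <- vapply(idx, function(i) {
    # Empty groups return the default instead of calling fun
    if (length(i) == 0) default else fun(value[i], ...)
  }, FUN.VALUE = default)
  unname(out)
}

n <- 17
fac <- factor(rep(1:3, length.out = n), levels = 1:5)
res_sum <- vaggregate_sketch(1:n, fac, sum)
res_na <- vaggregate_sketch(1:n, fac, sum, default = NA_integer_)
print(res_sum)
print(res_na)
```

Because vapply checks every result against the template, a function that returns a different type or length than the default raises an error instead of silently producing a list.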
# Some examples of use borrowed from ?tapply
n <- 17; fac <- factor(rep(1:3, length.out = n), levels = 1:5)
table(fac)
#> fac
#> 1 2 3 4 5
#> 6 6 5 0 0
vaggregate(1:n, fac, sum)
#> [1] 51 57 45 0 0
vaggregate(1:n, fac, sum, .default = NA_integer_)
#> [1] 51 57 45 NA NA
vaggregate(1:n, fac, range)
#> Warning: no non-missing arguments to min; returning Inf
#> Warning: no non-missing arguments to max; returning -Inf
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    2    3  Inf  Inf
#> [2,]   16   17   15 -Inf -Inf
vaggregate(1:n, fac, range, .default = c(NA, NA) + 0)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    2    3   NA   NA
#> [2,]   16   17   15   NA   NA
vaggregate(1:n, fac, quantile)
#>       [,1]  [,2] [,3] [,4] [,5]
#> 0%    1.00  2.00    3   NA   NA
#> 25%   4.75  5.75    6   NA   NA
#> 50%   8.50  9.50    9   NA   NA
#> 75%  12.25 13.25   12   NA   NA
#> 100% 16.00 17.00   15   NA   NA
# Unlike tapply, vaggregate does not support multi-dimensional output:
tapply(warpbreaks$breaks, warpbreaks[,-1], sum)
#>     tension
#> wool   L   M   H
#>    A 401 216 221
#>    B 254 259 169
vaggregate(warpbreaks$breaks, id(warpbreaks[,-1]), sum)
#> [1] 401 216 221 254 259 169
# But it is faster (about 3x in the timings shown here)
x <- rnorm(1e6)
y1 <- sample.int(10, 1e6, replace = TRUE)
system.time(tapply(x, y1, mean))
#>    user  system elapsed
#>   0.027   0.005   0.032
system.time(vaggregate(x, y1, mean))
#>    user  system elapsed
#>   0.010   0.000   0.011