Vector aggregate. — vaggregate • plyr

This function is similar to tapply, but is designed for use in conjunction with id. It is simpler than tapply in two ways: it accepts only a single grouping vector (combine several with id first), and it calls vapply internally, using the .default value as the output template.

vaggregate(.value, .group, .fun, ..., .default = NULL, .n = nlevels(.group))

Arguments

.value: vector of values to aggregate
.group: grouping vector
.fun: aggregation function
...: other arguments passed on to .fun
.default: value returned for groups that contain no observations. It is also passed to vapply as the template for the function output, so its type and length should match what .fun returns.
.n: total number of groups
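
Because .default doubles as the vapply template, its type matters as well as its value. The following is a small base R illustration of vapply's template check, not plyr code; the `groups` list is a made-up stand-in for per-group values. It suggests why the examples use NA_integer_ and c(NA, NA) + 0 rather than a bare NA.

```r
# Hypothetical per-group values; each sum() returns an integer.
groups <- list(1:3, 4:6)

# An integer template accepts integer results.
int_ok <- vapply(groups, sum, NA_integer_)

# A double template also accepts integer results and promotes them to double.
dbl_ok <- vapply(groups, sum, NA_real_)

# A bare NA is logical, so vapply rejects the integer results with an error.
lgl_fails <- tryCatch(
  vapply(groups, sum, NA),
  error = function(e) "error"
)
```

vapply promotes results upward (logical to integer to double) but never demotes them, so a template of a "higher" type than the function output is the safe choice.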

Details

vaggregate should be faster than tapply in most situations because it avoids making a copy of the data.
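
The approach described above (split by a single grouping vector, then vapply with .default as the template) can be sketched in base R. This is an illustration under stated assumptions, not plyr's implementation: `vaggregate_sketch` is a hypothetical name, base `split()` stands in for whatever index splitting plyr uses internally, and deriving a missing default by calling the function on a zero-length input is an inference from the example output on this page.

```r
# Hypothetical base R sketch; not the plyr source.
vaggregate_sketch <- function(value, group, fun, ..., default = NULL) {
  group <- as.factor(group)

  # Assumption: with no default, call fun on an empty input to get a template.
  if (is.null(default)) {
    default <- fun(value[0], ...)
  }

  # Split positions rather than values; empty factor levels are kept.
  indices <- unname(split(seq_along(value), group))

  vapply(indices, function(i) {
    if (length(i) == 0) return(default)  # group with no observations
    fun(value[i], ...)
  }, default)
}

n <- 17
fac <- factor(rep(1:3, length.out = n), levels = 1:5)
res_sum <- vaggregate_sketch(1:n, fac, sum)
res_na <- vaggregate_sketch(1:n, fac, sum, default = NA_integer_)
```

Splitting positions instead of the values themselves means each group's data is only extracted at the moment the function is applied to it.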

Examples

# Some examples of use borrowed from ?tapply
n <- 17; fac <- factor(rep(1:3, length.out = n), levels = 1:5)
table(fac)
#> fac
#> 1 2 3 4 5 
#> 6 6 5 0 0 
vaggregate(1:n, fac, sum)
#> [1] 51 57 45  0  0
vaggregate(1:n, fac, sum, .default = NA_integer_)
#> [1] 51 57 45 NA NA
vaggregate(1:n, fac, range)
#> Warning: no non-missing arguments to min; returning Inf
#> Warning: no non-missing arguments to max; returning -Inf
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    2    3  Inf  Inf
#> [2,]   16   17   15 -Inf -Inf
vaggregate(1:n, fac, range, .default = c(NA, NA) + 0)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    2    3   NA   NA
#> [2,]   16   17   15   NA   NA
vaggregate(1:n, fac, quantile)
#>       [,1]  [,2] [,3] [,4] [,5]
#> 0%    1.00  2.00    3   NA   NA
#> 25%   4.75  5.75    6   NA   NA
#> 50%   8.50  9.50    9   NA   NA
#> 75%  12.25 13.25   12   NA   NA
#> 100% 16.00 17.00   15   NA   NA
# Unlike tapply, vaggregate does not support multi-dimensional output:
tapply(warpbreaks$breaks, warpbreaks[,-1], sum)
#>     tension
#> wool   L   M   H
#>    A 401 216 221
#>    B 254 259 169
vaggregate(warpbreaks$breaks, id(warpbreaks[,-1]), sum)
#> [1] 401 216 221 254 259 169

# But it is faster (about 3x in the timings below; results vary by machine)
x <- rnorm(1e6)
y1 <- sample.int(10, 1e6, replace = TRUE)
system.time(tapply(x, y1, mean))
#>    user  system elapsed 
#>   0.027   0.005   0.032 
system.time(vaggregate(x, y1, mean))
#>    user  system elapsed 
#>   0.010   0.000   0.011