Collapsing functions for batch processing

These functions aggregate the chunks that result from batch processing. They are usually called via do.call.

ccbind(...)
crbind(...)
cfun(..., FUN, FUNARGS = list())
cquantile(..., probs = seq(0, 1, 0.25), na.rm = FALSE, names = TRUE, type = 7)
csummary(..., na.rm = "ignored")
cmedian(..., na.rm = FALSE)
clength(..., na.rm = FALSE)
csum(..., na.rm = FALSE)
cmean(..., na.rm = FALSE)

Arguments

...: the chunks to collapse (or a single list of chunks, see Note)
FUN: an aggregating function
FUNARGS: further arguments to the aggregating function
na.rm: TRUE to remove NAs
probs: see quantile
names: see quantile
type: see quantile

Details

CFUN	FUN	comment
`ccbind`	`cbind`	like `cbind` but respecting names
`crbind`	`rbind`	like `rbind` but respecting names
`cfun`		`crbind` the input chunks and then apply 'FUN' to each column
`cquantile`	`quantile`	`crbind` the input chunks and then apply 'quantile' to each column
`csummary`	`summary`	`crbind` the input chunks and then apply 'summary' to each column
`cmedian`	`median`	`crbind` the input chunks and then apply 'median' to each column
`clength`	`length`	`crbind` the input chunks and then determine the number of values in each column
`csum`	`sum`	`crbind` the input chunks and then determine the sum of the values in each column
`cmean`	`mean`	`crbind` the input chunks and then determine the (unweighted) mean in each column
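
The "`crbind` the chunks, then aggregate each column" pattern in the table can be sketched in base R. This is only an illustration, not ff's implementation: `collapse_sketch` is a hypothetical helper, and it uses plain `rbind`, which stacks chunks by position rather than respecting names as `crbind` is described to do.

```r
# Base-R sketch of "rbind the chunks, then aggregate each column".
# collapse_sketch is a hypothetical helper, NOT part of ff.
collapse_sketch <- function(chunks, FUN, FUNARGS = list()) {
  # Stack the chunks: one row per chunk, one column per named element
  m <- do.call(rbind, chunks)
  # Apply FUN to each column, passing FUNARGS as further arguments
  apply(m, 2, function(col) do.call(FUN, c(list(col), FUNARGS)))
}

chunks <- list(c(a = 1, b = 10), c(a = 3, b = NA))
collapse_sketch(chunks, sum, list(na.rm = TRUE))
#>  a  b
#>  4 10
```

Here `FUNARGS = list(na.rm = TRUE)` plays the same role as in the `cfun` signature above: it is forwarded to the aggregating function for every column.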

To use CFUNs on the result of lapply or ffapply, call them via do.call.

Note

Currently - for command line convenience - we map the elements of a single list argument to ..., but this may change in the future.
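
One way such a convenience mapping could work is sketched below in base R. `collect_chunks` is a hypothetical helper written for illustration; it is an assumption about the general idea, not the code ff actually uses.

```r
# Hypothetical helper (NOT part of ff): collect the arguments in ..., but if
# the only argument is itself a list, treat its elements as the chunks.
collect_chunks <- function(...) {
  args <- list(...)
  if (length(args) == 1L && is.list(args[[1L]])) args <- args[[1L]]
  args
}

X <- list(c(a = 1, b = 2), c(a = 3, b = 4))

# Passing the list directly and spreading it with do.call give the same chunks
identical(collect_chunks(X), do.call(collect_chunks, X))
#> [1] TRUE
```

Because the Note says this mapping may change, calling via do.call remains the safer style in scripts.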

ff options

TODO: extend this to weighted means, weighted medians, etc.
(search for "Re: [R] Weighted median")

Value

Depends on the CFUN used.

Author

Jens Oehlschlägel

Examples

   X <- lapply(split(rnorm(1000), 1:10), summary)
   do.call("crbind", X)
#>         Min.    1st Qu.      Median         Mean   3rd Qu.     Max.
#> 1  -2.179957 -0.6759597 -0.07476356  0.022443662 0.6674930 2.692372
#> 2  -2.453647 -0.7413104 -0.12028090 -0.048683670 0.6322505 2.654898
#> 3  -2.495365 -0.7692907  0.24293310  0.083299302 0.7930365 2.222845
#> 4  -2.169239 -0.6375154 -0.12533974 -0.043612481 0.5690050 1.976758
#> 5  -2.938978 -0.7469811  0.05030706 -0.007614433 0.5577891 2.126445
#> 6  -2.102152 -0.6594382  0.03001467  0.109053808 0.7841179 2.755418
#> 7  -2.612334 -0.9433000 -0.21847810 -0.208140309 0.4811833 2.322557
#> 8  -2.808011 -0.6679331 -0.03427611 -0.037660258 0.5303771 2.682557
#> 9  -2.362209 -0.3913615  0.21154279  0.156740987 0.7709486 2.411659
#> 10 -2.390200 -0.6567872  0.09066702  0.017569544 0.8097709 2.648932
   do.call("csummary", X)
#>              Min.    1st Qu.       Median         Mean   3rd Qu.     Max.
#> Min.    -2.938978 -0.9433000 -0.218478096 -0.208140309 0.4811833 1.976758
#> 1st Qu. -2.583092 -0.7455634 -0.108901563 -0.042124425 0.5605931 2.247773
#> Median  -2.421924 -0.6719464 -0.002130720  0.004977555 0.6498717 2.530296
#> Mean    -2.451209 -0.6889877  0.005232623  0.004339615 0.6595972 2.449444
#> 3rd Qu. -2.225520 -0.6574499  0.080577033  0.068085392 0.7808255 2.675642
#> Max.    -2.102152 -0.3913615  0.242933096  0.156740987 0.8097709 2.755418
   do.call("cmean", X)
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#> -2.451209 -0.688988  0.005233  0.004340  0.659597  2.449444 
   do.call("cfun", c(X, list(FUN=mean, FUNARGS=list(na.rm=TRUE))))
#>         Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
#> -2.451209171 -0.688987727  0.005232623  0.004339615  0.659597178  2.449444171 
   rm(X)