Fast mean calculations in non-overlapping bins

Computes the sample means in non-overlapping bins.

binMeans(y, x, idxs = NULL, bx, na.rm = TRUE, count = TRUE,
  right = FALSE, ...)

Arguments

y: A numeric or logical vector of K values to calculate means on.
x: A numeric vector of K positions to be binned.
idxs: A vector indicating subset of elements to operate over. If NULL, no subsetting is done.
bx: A numeric vector of B + 1 ordered positions specifying the B > 0 bins [bx[1], bx[2]), [bx[2], bx[3]), ..., [bx[B], bx[B + 1]).
na.rm: If TRUE, missing values in y are dropped before calculating the mean, otherwise not.
count: If TRUE, the number of data points in each bin is returned as attribute count, which is an integer vector of length B.
right: If TRUE, the bins are right-closed (left open), otherwise left-closed (right open).
...: Not used.

Value

Returns a numeric vector of length B.
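
The shape of the return value can be illustrated with a minimal base R sketch. The helper bin_means_ref below is hypothetical and is not the package's implementation; it only mirrors the documented semantics (a numeric vector of length B with an integer count attribute). That empty bins yield NaN is an assumption of this sketch.

```r
# Hypothetical base R reference helper; not part of the package.
bin_means_ref <- function(y, x, bx, right = FALSE) {
  B <- length(bx) - 1L
  # Bin index for each position; NA for positions outside all bins
  bin <- cut(x, breaks = bx, labels = FALSE, right = right)
  # Keep only points that fall in a bin and have a non-missing y
  ok <- !is.na(bin) & !is.na(y)
  counts <- tabulate(bin[ok], nbins = B)
  sums <- vapply(seq_len(B), function(b) sum(y[ok & bin == b]), numeric(1))
  means <- sums / counts  # assumption: empty bins give NaN in this sketch
  attr(means, "count") <- counts
  means
}

x <- c(1, 2, 3, 4)
y <- c(10, 20, 30, 40)
bx <- c(1, 3, 5)  # two bins: [1, 3) and [3, 5)
res <- bin_means_ref(y, x, bx)
print(res)
```

Here the first bin holds the points at x = 1 and x = 2, and the second bin holds those at x = 3 and x = 4.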

Details

binMeans(y, x, bx = bx, right = TRUE) gives results equivalent to rev(binMeans(y, -x, bx = sort(-bx), right = FALSE)), but is faster.
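
The equivalence can be checked on a small example. The sketch below uses a hypothetical base R stand-in, bin_means_ref, built on cut(); it is not the package's code and is only meant to show why negating the positions and reversing the result turns left-closed bins into right-closed ones.

```r
# Hypothetical base R stand-in for the binning semantics; not the package's code.
bin_means_ref <- function(y, x, bx, right = FALSE) {
  # Bin index for each position; NA for positions outside all bins
  bin <- cut(x, breaks = bx, labels = FALSE, right = right)
  vapply(seq_len(length(bx) - 1L),
         function(b) mean(y[!is.na(bin) & bin == b]), numeric(1))
}

x <- c(1, 2, 3, 4, 5)
y <- c(10, 20, 30, 40, 50)
bx <- c(1, 3, 5)

# Right-closed bins (1, 3] and (3, 5] computed directly
direct <- bin_means_ref(y, x, bx, right = TRUE)

# Same bins obtained by negating positions and boundaries, then reversing
flipped <- rev(bin_means_ref(y, -x, bx = sort(-bx), right = FALSE))

print(all.equal(direct, flipped))
```

In both calls the point at x = 1 sits on an open boundary and is therefore excluded.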

Missing and non-finite values

Data points where either y or x is missing are dropped (and are therefore not counted). Non-finite values in y are not allowed and give an error. Missing values in bx are not allowed and give an error.
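
The dropping rule for missing values can be sketched in base R as follows. This is an illustration of the documented behavior using findInterval() and tabulate(), not the package's implementation; the variable names are chosen for this example only.

```r
x <- c(1, 2, NA, 4, 5)
y <- c(10, NA, 30, 40, 50)
bx <- c(0, 3, 6)  # two bins: [0, 3) and [3, 6)

# Missing bin boundaries are not allowed
stopifnot(!anyNA(bx))

# Drop every data point where either y or x is missing
keep <- !is.na(x) & !is.na(y)
xk <- x[keep]
yk <- y[keep]

# Left-closed bin index for each remaining position
bin <- findInterval(xk, bx)

# Dropped points contribute to neither the counts nor the means
counts <- tabulate(bin, nbins = length(bx) - 1L)
means <- vapply(seq_along(counts), function(b) mean(yk[bin == b]), numeric(1))

print(counts)
print(means)
```

Only three of the five data points survive: the pair with missing y (x = 2) and the pair with missing x (y = 30) are excluded before binning.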

References

[1] R-devel thread "Fastest non-overlapping binning mean function out there?", Oct 3, 2012.

Author

Henrik Bengtsson with initial code contributions by Martin Morgan [1].

Examples

# Simulate a piecewise-constant signal at positions 1..200
x <- 1:200
mu <- double(length(x))
mu[1:50] <- 5
mu[101:150] <- -5
# Add standard Gaussian noise
y <- mu + rnorm(length(x))

# Binning: four bins of 50 positions each, with boundaries between integers
bx <- c(0, 50, 100, 150, 200) + 0.5
y_s <- binMeans(y, x = x, bx = bx)

# Plot the data and overlay each bin mean as a horizontal segment
plot(x, y)
for (kk in seq_along(y_s)) {
  lines(bx[c(kk, kk + 1)], y_s[c(kk, kk)], col = "blue", lwd = 2)
}