Last Observation Carried Forward

Generic function for replacing each NA with the most recent non-NA prior to it.

Usage

na.locf(object, na.rm = TRUE, ...)
# Default S3 method
na.locf(object, na.rm = TRUE, fromLast, rev,
        maxgap = Inf, rule = 2, ...)

na.locf0(object, fromLast = FALSE, maxgap = Inf, coredata = NULL)

Arguments

object: an object.
na.rm: logical. Should leading NAs be removed?
fromLast: logical. Causes observations to be carried backward rather than forward. Default is FALSE. With a value of TRUE this corresponds to NOCB (next observation carried backward). It is not supported if x or xout is specified.
rev: Use fromLast instead. This argument will be eliminated in the future in favor of fromLast.
maxgap: Runs of more than maxgap NAs are retained; other NAs are removed, and the last occurrence in the resulting series prior to each time point in xout is used as that time point's output value. (If xout is not specified, this reduces to retaining runs of more than maxgap NAs while filling other NAs with the last occurrence of a non-NA.)
rule: See approx.
...: further arguments passed to methods.
coredata: logical. Should LOCF be applied to the core data of a (time series) object and then assigned to the original object again? By default, this strategy is applied to time series classes (e.g., ts, zoo, xts, etc.) where it preserves the time index.
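As a rough illustration of how fromLast and maxgap behave, the following sketch applies the default method to a plain numeric vector. It assumes the zoo package is attached; the vector x and the result names are made up for this example, and na.rm = FALSE is set throughout so that unfilled NAs stay visible.

```r
library(zoo)

x <- c(NA, 1, NA, NA, 4, NA)

# Forward fill: the leading NA has no earlier value and is kept
fwd <- na.locf(x, na.rm = FALSE)
fwd
#> [1] NA  1  1  1  4  4

# Backward fill (NOCB): now the trailing NA has no later value and is kept
bwd <- na.locf(x, na.rm = FALSE, fromLast = TRUE)
bwd
#> [1]  1  1  4  4  4 NA

# maxgap = 1: the run of two NAs is retained, the single trailing NA is filled
gap1 <- na.locf(x, na.rm = FALSE, maxgap = 1)
gap1
#> [1] NA  1 NA NA  4  4
```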

Value

An object in which each NA in the input object is replaced by the most recent non-NA prior to it. If there are no earlier non-NAs then the NA is omitted (if na.rm = TRUE) or it is not replaced (if na.rm = FALSE).

The arguments x and xout can be used in which case they have the same meaning as in approx.

Note that if a multi-column zoo object has a column entirely composed of NA then with na.rm = TRUE, the default, the above implies that the resulting object will have zero rows. Use na.rm = FALSE to preserve the NA values instead.
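A small sketch of the na.rm = FALSE case described above, assuming zoo is attached. The two-column object m2 and its column names are invented for illustration.

```r
library(zoo)

# Column "b" consists entirely of NA
m2 <- zoo(cbind(a = c(1, NA, 3), b = NA_real_))

# With na.rm = FALSE all rows are kept; only column "a" can be filled
kept <- na.locf(m2, na.rm = FALSE)
nrow(kept)
#> [1] 3
```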

The function na.locf0 is the workhorse function underlying the default na.locf method. It has more limited capabilities but is faster for the special cases it covers. Implicitly, it uses na.rm=FALSE.
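The difference between the two functions can be seen on a vector with a leading NA. This is only a sketch, assuming zoo is attached; x, filled0 and filled are illustrative names.

```r
library(zoo)

x <- c(NA, 9, NA, 3)

# na.locf0 keeps the leading NA, so the result has the same length as x
filled0 <- na.locf0(x)
filled0
#> [1] NA  9  9  3

# the default na.locf method (na.rm = TRUE) drops the leading NA,
# leaving the three values 9, 9, 3
filled <- na.locf(x)
length(filled)
```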

Examples

library(zoo)

az <- zoo(1:6)

bz <- zoo(c(2,NA,1,4,5,2))
na.locf(bz)
#> 1 2 3 4 5 6 
#> 2 2 1 4 5 2 
na.locf(bz, fromLast = TRUE)
#> 1 2 3 4 5 6 
#> 2 1 1 4 5 2 

cz <- zoo(c(NA,9,3,2,3,2))
na.locf(cz)
#> 2 3 4 5 6 
#> 9 3 2 3 2 

# generate and fill in missing dates
z <- zoo(c(0.007306621, 0.007659046, 0.007681013,
  0.007817548, 0.007847579, 0.007867313),
  as.Date(c("1993-01-01", "1993-01-09", "1993-01-16",
  "1993-01-23", "1993-01-30", "1993-02-06")))
g <- seq(start(z), end(z), "day")
na.locf(z, xout = g)
#>  1993-01-01  1993-01-02  1993-01-03  1993-01-04  1993-01-05  1993-01-06 
#> 0.007306621 0.007306621 0.007306621 0.007306621 0.007306621 0.007306621 
#>  1993-01-07  1993-01-08  1993-01-09  1993-01-10  1993-01-11  1993-01-12 
#> 0.007306621 0.007306621 0.007659046 0.007659046 0.007659046 0.007659046 
#>  1993-01-13  1993-01-14  1993-01-15  1993-01-16  1993-01-17  1993-01-18 
#> 0.007659046 0.007659046 0.007659046 0.007681013 0.007681013 0.007681013 
#>  1993-01-19  1993-01-20  1993-01-21  1993-01-22  1993-01-23  1993-01-24 
#> 0.007681013 0.007681013 0.007681013 0.007681013 0.007817548 0.007817548 
#>  1993-01-25  1993-01-26  1993-01-27  1993-01-28  1993-01-29  1993-01-30 
#> 0.007817548 0.007817548 0.007817548 0.007817548 0.007817548 0.007847579 
#>  1993-01-31  1993-02-01  1993-02-02  1993-02-03  1993-02-04  1993-02-05 
#> 0.007847579 0.007847579 0.007847579 0.007847579 0.007847579 0.007847579 
#>  1993-02-06 
#> 0.007867313 

# similar but use a 2 second grid

z <- zoo(1:9, as.POSIXct(c("2010-01-04 09:30:02", "2010-01-04 09:30:06",
 "2010-01-04 09:30:07", "2010-01-04 09:30:08", "2010-01-04 09:30:09", 
 "2010-01-04 09:30:10", "2010-01-04 09:30:11", "2010-01-04 09:30:13",
 "2010-01-04 09:30:14")))

g <- seq(start(z), end(z), by = "2 sec")
na.locf(z, xout = g)
#> 2010-01-04 09:30:02 2010-01-04 09:30:04 2010-01-04 09:30:06 2010-01-04 09:30:08 
#>                   1                   1                   2                   4 
#> 2010-01-04 09:30:10 2010-01-04 09:30:12 2010-01-04 09:30:14 
#>                   6                   7                   9 

## get 5th of every month or most recent date prior to 5th if 5th missing.
## The first result is indexed by the 5th itself; the variant further below
## is indexed by the date actually used.

z <- zoo(c(1311.56, 1309.04, 1295.5, 1296.6, 1286.57, 1288.12, 
1289.12, 1289.12, 1285.33, 1307.65, 1309.93, 1311.46, 1311.28, 
1308.11, 1301.74, 1305.41, 1309.72, 1310.61, 1305.19, 1313.21, 
1307.85, 1312.25, 1325.76), as.Date(c(13242, 13244, 
13245, 13248, 13249, 13250, 13251, 13252, 13255, 13256, 13257, 
13258, 13259, 13262, 13263, 13264, 13265, 13266, 13269, 13270, 
13271, 13272, 13274)))

# z.na is same as z but with missing days added (with NAs)
# It is formed by merging z with a zero-width series having all the dates.

rng <- range(time(z))
z.na <- merge(z, zoo(, seq(rng[1], rng[2], by = "day")))

# use na.locf to bring values forward picking off 5th of month
na.locf(z.na)[as.POSIXlt(time(z.na))$mday == 5]
#> 2006-04-05 2006-05-05 
#>    1311.56    1312.25 

## this is the same as the last one except instead of always using the
## 5th of month in the result we show the date actually used

# idx has NAs wherever z.na does but has 1, 2, 3, ... instead of
# z.na's data values (so idx can be used for indexing)

idx <- coredata(na.locf(seq_along(z.na) + (0 * z.na)))

# pick off those elements of z.na that correspond to 5th

z.na[idx[as.POSIXlt(time(z.na))$mday == 5]]
#> 2006-04-04 2006-05-04 
#>    1311.56    1312.25 

## only fill single-day gaps

merge(z.na, filled1 = na.locf(z.na, maxgap = 1))
#>               z.na filled1
#> 2006-04-04 1311.56 1311.56
#> 2006-04-05      NA 1311.56
#> 2006-04-06 1309.04 1309.04
#> 2006-04-07 1295.50 1295.50
#> 2006-04-08      NA      NA
#> 2006-04-09      NA      NA
#> 2006-04-10 1296.60 1296.60
#> 2006-04-11 1286.57 1286.57
#> 2006-04-12 1288.12 1288.12
#> 2006-04-13 1289.12 1289.12
#> 2006-04-14 1289.12 1289.12
#> 2006-04-15      NA      NA
#> 2006-04-16      NA      NA
#> 2006-04-17 1285.33 1285.33
#> 2006-04-18 1307.65 1307.65
#> 2006-04-19 1309.93 1309.93
#> 2006-04-20 1311.46 1311.46
#> 2006-04-21 1311.28 1311.28
#> 2006-04-22      NA      NA
#> 2006-04-23      NA      NA
#> 2006-04-24 1308.11 1308.11
#> 2006-04-25 1301.74 1301.74
#> 2006-04-26 1305.41 1305.41
#> 2006-04-27 1309.72 1309.72
#> 2006-04-28 1310.61 1310.61
#> 2006-04-29      NA      NA
#> 2006-04-30      NA      NA
#> 2006-05-01 1305.19 1305.19
#> 2006-05-02 1313.21 1313.21
#> 2006-05-03 1307.85 1307.85
#> 2006-05-04 1312.25 1312.25
#> 2006-05-05      NA 1312.25
#> 2006-05-06 1325.76 1325.76

## fill NAs in first column by inflating the most recent non-NA
## by the growth in second column.  Note that elements of x - x
## are NA if the corresponding element of x is NA and zero otherwise

m <- zoo(cbind(c(1, 2, NA, NA, 5, NA, NA), seq(7)^2), as.Date(1:7))

r <- na.locf(m[,1]) * m[,2] / na.locf(m[,2] + (m[,1]-m[,1]))
cbind(V1 = r, V2 = m[,2])
#>             V1 V2
#> 1970-01-02 1.0  1
#> 1970-01-03 2.0  4
#> 1970-01-04 4.5  9
#> 1970-01-05 8.0 16
#> 1970-01-06 5.0 25
#> 1970-01-07 7.2 36
#> 1970-01-08 9.8 49

## repeat a quarterly value every month
## preserving NAs
zq <- zoo(c(1, NA, 3, 4), as.yearqtr(2000) + 0:3/4)
tt <- as.yearmon(start(zq)) + seq(0, len = 3 * length(zq))/12
na.locf(zq, xout = tt, maxgap = 0)
#> Jan 2000 Feb 2000 Mar 2000 Apr 2000 May 2000 Jun 2000 Jul 2000 Aug 2000 
#>        1        1        1       NA       NA       NA        3        3 
#> Sep 2000 Oct 2000 Nov 2000 Dec 2000 
#>        3        4        4        4 

## na.locf() can also be mimicked with ave()
x <- c(NA, 10, NA, NA, 20, NA)
f <- function(x) x[1]
ave(x, cumsum(!is.na(x)), FUN = f)
#> [1] NA 10 10 10 20 20

## by replacing f() with other functions various generalizations can be
## obtained, e.g.,
f <- function(x) if (length(x) > 3) x else x[1]  # like maxgap
f <- function(x) replace(x, 1:min(length(x), 3), x[1]) # replace up to 2 NAs
f <- function(x) if (!is.na(x[1]) && x[1] > 0) x[1] else x  # only positive numbers
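To make the effect of such generalizations concrete, the sketch below runs three variants through ave() on one test vector. It uses base R only; the names f_maxgap, f_upto2 and f_positive are hypothetical and chosen for this illustration, and each function receives one group consisting of a non-NA value followed by its run of NAs (or a group of leading NAs).

```r
x <- c(NA, 10, NA, NA, 20, NA, NA, NA, NA, -5, NA)
grp <- cumsum(!is.na(x))  # one group per non-NA value and its trailing NAs

# like maxgap = 2: leave groups with more than 2 trailing NAs untouched
f_maxgap <- function(v) if (length(v) > 3) v else v[1]

# fill at most the first 2 NAs after each observation
f_upto2 <- function(v) replace(v, seq_len(min(length(v), 3)), v[1])

# carry forward only positive observations
f_positive <- function(v) if (!is.na(v[1]) && v[1] > 0) v[1] else v

ave(x, grp, FUN = f_maxgap)
#>  [1] NA 10 10 10 20 NA NA NA NA -5 -5
ave(x, grp, FUN = f_upto2)
#>  [1] NA 10 10 10 20 20 20 NA NA -5 -5
ave(x, grp, FUN = f_positive)
#>  [1] NA 10 10 10 20 20 20 20 20 -5 NA
```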
