ffsort.rd
Sorting: sort an ff vector – optionally in-place
ffsort(x
, aux = NULL
, has.na = TRUE
, na.last = TRUE
, decreasing = FALSE
, inplace = FALSE
, decorate = FALSE
, BATCHBYTES = getOption("ffmaxbytes")
, VERBOSE = FALSE
)
x: an ff vector
aux: NULL or an ff vector of the same type, used for temporary storage
has.na: boolean scalar telling ffsort whether the vector might contain NAs.
Note that you risk a crash if there are unexpected NAs with has.na=FALSE
na.last: boolean scalar telling ffsort whether to sort NAs last (TRUE) or first (FALSE).
Note that 'boolean' means that there is no third option NA as in sort
decreasing: boolean scalar telling ffsort whether to sort in decreasing (TRUE) or increasing (FALSE, the default) order
inplace: boolean scalar telling ffsort whether to sort the original ff vector (TRUE)
or to create a sorted copy (FALSE, the default)
decorate: boolean scalar telling ffsort whether to decorate the returned ff vector with is.sorted
and na.count attributes.
BATCHBYTES: maximum number of RAM bytes ffsort should try not to exceed
VERBOSE: boolean scalar; if TRUE, ffsort prints (via cat) some information about the sorting
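The following minimal sketch shows how na.last, decreasing and inplace interact. The four-element input is made up for illustration only, and the snippet assumes the ff package is installed.

```r
library(ff)

# Small illustrative vector with one NA (values chosen for this sketch only)
x <- ff(c(3, NA, 1, 2), vmode = "double")

# Sorted copy: NAs first, remaining values in decreasing order; x is left untouched
y <- ffsort(x, na.last = FALSE, decreasing = TRUE)
y[]

# In-place sort with the defaults (increasing, NAs last): x itself is reordered
x <- ffsort(x, inplace = TRUE)
x[]
```

With inplace=FALSE the original vector keeps its order, at the cost of a second ff file for the sorted copy.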
ffsort tries to sort the vector in-RAM respecting the BATCHBYTES limit.
If a fast sort is not possible, it uses a slower in-place sort (shellsort).
If in-RAM sorting is not possible, it uses a (still simple) out-of-memory algorithm.
Like ramsort, the in-RAM sorting method is chosen depending on context information.
If a key-index sort can be used, ffsort completely avoids merging disk based subsorts.
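The general idea behind sorting in batches and then merging the sorted runs can be sketched in base R. This is only an illustration of the technique, not ffsort's actual implementation: merge_sorted and batch_sort_sketch are hypothetical helpers, everything stays in RAM here, and the input is assumed to be non-empty and free of NAs.

```r
# Merge two already-sorted numeric vectors into one sorted vector
merge_sorted <- function(a, b) {
  out <- numeric(length(a) + length(b))
  i <- 1L; j <- 1L; k <- 1L
  while (i <= length(a) && j <= length(b)) {
    if (a[i] <= b[j]) {
      out[k] <- a[i]; i <- i + 1L
    } else {
      out[k] <- b[j]; j <- j + 1L
    }
    k <- k + 1L
  }
  # Copy whatever remains of the input that was not exhausted
  if (i <= length(a)) out[k:length(out)] <- a[i:length(a)]
  if (j <= length(b)) out[k:length(out)] <- b[j:length(b)]
  out
}

# Hypothetical helper: never sorts more than batch_size values at once
batch_sort_sketch <- function(v, batch_size) {
  starts <- seq(1L, length(v), by = batch_size)
  # Sort each batch on its own (the "subsorts")
  runs <- lapply(starts, function(s) sort(v[s:min(s + batch_size - 1L, length(v))]))
  # Merge the sorted runs one after another
  Reduce(merge_sorted, runs)
}

set.seed(1)
v <- runif(1000)
identical(batch_sort_sketch(v, batch_size = 128), sort(v))
```

A smaller batch_size lowers the peak amount of data sorted at once but increases the number of runs that have to be merged, which is the trade-off BATCHBYTES controls in spirit.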
If argument decorate=TRUE is used, then na.count(x) will return the number of NAs
and is.sorted(x) will return TRUE if the sort was done with na.last=TRUE and decreasing=FALSE.
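A short sketch of the decoration behaviour described above, assuming the ff package is installed; the input values are made up. Only the ascending, NAs-last case is stated to yield is.sorted(x) == TRUE, so the result for the decreasing sort is printed for inspection rather than assumed.

```r
library(ff)

x <- ff(c(2, NA, 3, 1), vmode = "double")

# Increasing sort with NAs last: the case described as being flagged as sorted
asc <- ffsort(x, decorate = TRUE)
is.sorted(asc)
na.count(asc)

# Decreasing sort: inspect the attributes instead of relying on an assumption
desc <- ffsort(x, decorate = TRUE, decreasing = TRUE)
is.sorted(desc)
na.count(desc)
```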
The ff vector must not have a names attribute.
n <- 1e6
x <- ff(c(NA, 999999:1), vmode="double", length=n)
x <- ffsort(x)
x
#> ff (open) double length=1000000 (1000000)
#>       [1]       [2]       [3]       [4]       [5]       [6]       [7]       [8]
#>         1         2         3         4         5         6         7         8
#>            [999993]  [999994]  [999995]  [999996]  [999997]  [999998]  [999999]
#>         :    999993    999994    999995    999996    999997    999998    999999
#> [1000000]
#>        NA
is.sorted(x)
#> [1] FALSE
na.count(x)
#> [1] NA
x <- ffsort(x, decorate=TRUE)
is.sorted(x)
#> [1] TRUE
na.count(x)
#> [1] 1
x <- ffsort(x, BATCHBYTES=n, VERBOSE=TRUE)
#> method=shell BATCHSIZE=125000