Function ffindexorder calculates, chunk by chunk, the order positions that sort all positions within a chunk ascending.
Function ffindexordersize calculates the chunksize for ffindexorder.
ffindexordersize(length, vmode, BATCHBYTES = getOption("ffmaxbytes"))
ffindexorder(index, BATCHSIZE, FF_RETURN = NULL, VERBOSE = FALSE)
index: An ff integer vector with integer subscripts.
BATCHSIZE: Limit for the chunksize (see details).
BATCHBYTES: Limit for the number of bytes per batch.
FF_RETURN: Optionally an ff integer vector in which the chunkwise order positions are stored.
VERBOSE: Logical scalar that activates verbose output.
length: Number of elements in the index.
vmode: The vmode of the ff vector to which the index shall be applied with ffindexget or ffindexset.
Accessing integer positions in an ff vector is a non-trivial task, because it can easily lead to random access to a disk file.
We avoid random access by loading batches of the subscript values into RAM, ordering them ascending, and only then accessing the ff values on disk.
Such an ordering can be done on-the-fly by ffindexget, or it can be created upfront with ffindexorder, stored and re-used, similar to storing and using hybrid index information with as.hi.
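The batching idea described above can be illustrated in base R without ff. This is a minimal sketch and not the ff implementation: the function name chunkwise_get and its arguments are hypothetical, and an ordinary in-memory vector stands in for the on-disk ff vector.

```r
# Sketch only: base R stand-in for the chunkwise access pattern.
# `values` plays the role of the ff vector on disk, `index` the integer subscripts.
chunkwise_get <- function(values, index, batchsize) {
  result <- vector(mode = typeof(values), length = length(index))
  if (length(index) == 0L) return(result)
  starts <- seq(1L, length(index), by = batchsize)
  for (from in starts) {
    to <- min(from + batchsize - 1L, length(index))
    batch <- index[from:to]            # load one batch of subscripts into RAM
    ord <- order(batch)                # order positions that sort the batch ascending
    sorted_read <- values[batch[ord]]  # read values in ascending position order
    result[(from:to)[ord]] <- sorted_read  # put values back in the requested order
  }
  result
}

set.seed(1)
values <- sample(40)
index <- order(values)
out <- chunkwise_get(values, index, batchsize = 16L)
stopifnot(identical(out, values[index]))
```

Within each batch the reads touch ascending positions, which is what keeps disk access close to sequential. The per-batch `ord` vectors are the kind of information that could be computed once and re-used when the same index is applied repeatedly.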
Function ffindexorder returns an ff integer vector with an attribute BATCHSIZE (the chunksize finally used, not the one given with argument BATCHSIZE).
Function ffindexordersize returns a balanced batchsize as returned from bbatch.
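To make "balanced batchsize" concrete, here is one plausible way to balance batches under a size limit. It is an assumption for illustration only: the helper balanced_batch is hypothetical, and bbatch may compute its result differently. The component name b mirrors the s$b used in the examples; nb is an illustrative name for the number of batches.

```r
# Sketch only: one way to balance batch sizes under an upper limit.
# Not claimed to match bbatch; names and formula are illustrative assumptions.
balanced_batch <- function(n, max_batch) {
  stopifnot(n >= 1, max_batch >= 1)
  nb <- ceiling(n / max_batch)  # fewest batches that respect the limit
  b <- ceiling(n / nb)          # spread the elements evenly over those batches
  list(b = as.integer(b), nb = as.integer(nb))
}

res <- balanced_batch(40, 25)
stopifnot(res$b == 20L, res$nb == 2L)
```

With 40 elements and a limit of 25, naive batching would give one batch of 25 and one of 15, while the balanced version uses two batches of 20.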
x <- ff(sample(40))
message("fforder requires sorting")
#> fforder requires sorting
i <- fforder(x)
message("applying this order i is done by ffindexget")
#> applying this order i is done by ffindexget
x[i]
#> ff (open) integer length=40 (40)
#> [1] [2] [3] [4] [5] [6] [7] [8] [33] [34] [35] [36] [37] [38] [39]
#> 1 2 3 4 5 6 7 8 : 33 34 35 36 37 38 39
#> [40]
#> 40
message("applying this order i requires random access,
therefore ffindexget does chunkwise sorting")
#> applying this order i requires random access,
#> therefore ffindexget does chunkwise sorting
ffindexget(x, i)
#> ff (open) integer length=40 (40)
#> [1] [2] [3] [4] [5] [6] [7] [8] [33] [34] [35] [36] [37] [38] [39]
#> 1 2 3 4 5 6 7 8 : 33 34 35 36 37 38 39
#> [40]
#> 40
message("if we want to apply the order i multiple times,
we can do the chunkwise sorting once and store it")
#> if we want to apply the order i multiple times,
#> we can do the chunkwise sorting once and store it
s <- ffindexordersize(length(i), vmode(i), BATCHBYTES = 100)
o <- ffindexorder(i, s$b)
message("this is how the stored chunkwise sorting is used")
#> this is how the stored chunkwise sorting is used
ffindexget(x, i, o)
#> ff (open) integer length=40 (40)
#> [1] [2] [3] [4] [5] [6] [7] [8] [33] [34] [35] [36] [37] [38] [39]
#> 1 2 3 4 5 6 7 8 : 33 34 35 36 37 38 39
#> [40]
#> 40
message("")
#>
rm(x,i,s,o)
gc()
#> used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells 1156551 61.8 1994352 106.6 1994352 106.6
#> Vcells 2152641 16.5 8388608 64.0 8318871 63.5