Searching and other uses of sorting for 64bit integers

This is roughly an implementation of hash functionality but based on sorting instead on a hashmap. Since sorting is more informative than hashing we can do some more interesting things.

sortnut(sorted, ...)

# S3 method for class 'integer64'
sortnut(sorted, ...)

ordernut(table, order, ...)

# S3 method for class 'integer64'
ordernut(table, order, ...)

sortfin(sorted, x, ...)

# S3 method for class 'integer64'
sortfin(sorted, x, method = NULL, ...)

orderfin(table, order, x, ...)

# S3 method for class 'integer64'
orderfin(table, order, x, method = NULL, ...)

orderpos(table, order, x, ...)

# S3 method for class 'integer64'
orderpos(table, order, x, nomatch = NA, method = NULL, ...)

sortorderpos(sorted, order, x, ...)

# S3 method for class 'integer64'
sortorderpos(sorted, order, x, nomatch = NA, method = NULL, ...)

orderdup(table, order, ...)

# S3 method for class 'integer64'
orderdup(table, order, method = NULL, ...)

sortorderdup(sorted, order, ...)

# S3 method for class 'integer64'
sortorderdup(sorted, order, method = NULL, ...)

sortuni(sorted, nunique, ...)

# S3 method for class 'integer64'
sortuni(sorted, nunique, ...)

orderuni(table, order, nunique, ...)

# S3 method for class 'integer64'
orderuni(table, order, nunique, keep.order = FALSE, ...)

sortorderuni(table, sorted, order, nunique, ...)

# S3 method for class 'integer64'
sortorderuni(table, sorted, order, nunique, ...)

orderupo(table, order, nunique, ...)

# S3 method for class 'integer64'
orderupo(table, order, nunique, keep.order = FALSE, ...)

sortorderupo(sorted, order, nunique, keep.order = FALSE, ...)

# S3 method for class 'integer64'
sortorderupo(sorted, order, nunique, keep.order = FALSE, ...)

ordertie(table, order, nties, ...)

# S3 method for class 'integer64'
ordertie(table, order, nties, ...)

sortordertie(sorted, order, nties, ...)

# S3 method for class 'integer64'
sortordertie(sorted, order, nties, ...)

sorttab(sorted, nunique, ...)

# S3 method for class 'integer64'
sorttab(sorted, nunique, ...)

ordertab(table, order, nunique, ...)

# S3 method for class 'integer64'
ordertab(table, order, nunique, denormalize = FALSE, keep.order = FALSE, ...)

sortordertab(sorted, order, ...)

# S3 method for class 'integer64'
sortordertab(sorted, order, denormalize = FALSE, ...)

orderkey(table, order, na.skip.num = 0L, ...)

# S3 method for class 'integer64'
orderkey(table, order, na.skip.num = 0L, ...)

sortorderkey(sorted, order, na.skip.num = 0L, ...)

# S3 method for class 'integer64'
sortorderkey(sorted, order, na.skip.num = 0L, ...)

orderrnk(table, order, na.count, ...)

# S3 method for class 'integer64'
orderrnk(table, order, na.count, ...)

sortorderrnk(sorted, order, na.count, ...)

# S3 method for class 'integer64'
sortorderrnk(sorted, order, na.count, ...)

sortqtl(sorted, na.count, probs, ...)

# S3 method for class 'integer64'
sortqtl(sorted, na.count, probs, ...)

orderqtl(table, order, na.count, probs, ...)

# S3 method for class 'integer64'
orderqtl(table, order, na.count, probs, ...)

Arguments

sorted: a sorted integer64 vector
...: further arguments, passed from generics, ignored in methods
table: the original data with original order under the sorted vector
order: an integer order vector that turns 'table' into 'sorted'
x: an integer64 vector
method: see Details
nomatch: the value to be returned if an element is not found in the hashmap
nunique: number of unique elements, usually we get this from cache or call sortnut or ordernut
keep.order: determines order of results and speed: FALSE (the default) is faster and returns in sorted order, TRUE returns in the order of first appearance in the original data, but this requires extra work
nties: number of tied values, usually we get this from cache or call sortnut or ordernut
denormalize: FALSE returns counts of unique values, TRUE returns each value with its counts
na.skip.num: 0 or the number of NAs. With 0, NAs are coded with 1L, with the number of NAs, these are coded with NA
na.count: the number of NAs, needed for this low-level function algorithm
probs: vector of probabilities in [0..1] for which we seek quantiles

Value

see details

Details

sortfun	orderfun	sortorderfun	see also	description
`sortnut`	`ordernut`			return number of tied and of unique values
`sortfin`	`orderfin`		`%in%.integer64`	return logical whether `x` is in `table`
	`orderpos`	`sortorderpos`	`match()`	return positions of `x` in `table`
	`orderdup`	`sortorderdup`	`duplicated()`	return logical whether values are duplicated
`sortuni`	`orderuni`	`sortorderuni`	`unique()`	return unique values (=dimensiontable)
	`orderupo`	`sortorderupo`	`unique()`	return positions of unique values
	`ordertie`	`sortordertie`		return positions of tied values
	`orderkey`	`sortorderkey`		positions of values in vector of unique values (match in dimensiontable)
`sorttab`	`ordertab`	`sortordertab`	`table()`	tabulate frequency of values
	`orderrnk`	`sortorderrnk`		rank averaging ties
`sortqtl`	`orderqtl`			return quantiles given probabilities

The functions sortfin, orderfin, orderpos and sortorderpos each offer three algorithms for finding x in table.

With method=1L each value of x is searched independently using binary search, this is fastest for small tables.

With method=2L the values of x are first sorted and then searched using doubly exponential search, this is the best allround method.

With method=3L the values of x are first sorted and then searched using simple merging, this is the fastest method if table is huge and x has similar size and distribution of values.

With method=NULL the functions use a heuristic to determine the fastest algorithm.

The functions orderdup and sortorderdup each offer two algorithms for setting the truth values in the return vector.

With method=1L the return values are set directly which causes random write access on a possibly large return vector.

With method=2L the return values are first set in a smaller bit-vector – random access limited to a smaller memory region – and finally written sequentially to the logical output vector.