fast combination of sort() and unique() for integers

bit_sort_unique(
  x,
  decreasing = FALSE,
  na.last = NA,
  has.dup = TRUE,
  range_na = NULL
)

Arguments

x

an integer vector

decreasing

FALSE (ascending) or TRUE (descending)

na.last

NA (the default) removes NAs; FALSE puts NAs at the beginning; TRUE puts NAs at the end

has.dup

TRUE (the default) assumes that x might contain duplicates; set to FALSE only if duplicates are impossible

range_na

NULL (the default) calls range_na() internally; alternatively, the result of an earlier range_na() call can be passed here to avoid computing it again
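
A short sketch of how has.dup and range_na might be used together. It assumes the bit package is installed and that range_na() is called as range_na(x) on the same vector; the variable names are illustrative only.

```r
library(bit)  # assumes the bit package is installed

x <- c(7L, 3L, NA, 3L, 9L, 7L)

# compute the range and NA count once (assumed call form: range_na(x)) ...
rn <- range_na(x)

# ... and reuse it so bit_sort_unique() need not determine the range again
a <- bit_sort_unique(x, range_na = rn)
b <- bit_sort_unique(x, decreasing = TRUE, range_na = rn)

# has.dup = FALSE is only appropriate when duplicates cannot occur,
# for example when x is a permutation
p <- sample(100L)
s <- bit_sort_unique(p, has.dup = FALSE)
```

Reusing the range_na result is mainly of interest when the same vector is processed several times.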

Value

a sorted unique integer vector

Details

Determines the range of the integers and checks whether their density within that range justifies using a bit vector. If so, the result is built with a bit vector; otherwise the function falls back to sort(unique()).
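
The strategy described above can be sketched in base R. This is an illustration only, not the package's implementation: sketch_sort_unique and the min_density threshold of 0.05 are hypothetical, and a logical vector stands in for a true bit vector (it uses far more memory per flag than one bit).

```r
# Illustrative sketch of the range/density/bit-vector idea.
# sketch_sort_unique and min_density are hypothetical, not part of bit.
sketch_sort_unique <- function(x, min_density = 0.05) {
  x <- x[!is.na(x)]                       # mirrors na.last = NA: drop NAs
  if (length(x) == 0L) return(integer(0))
  lo <- min(x)
  hi <- max(x)
  # number of distinct values the range could hold (double avoids overflow)
  span <- as.numeric(hi) - as.numeric(lo) + 1
  if (length(x) / span < min_density) {
    return(sort(unique(x)))               # sparse data: fall back
  }
  seen <- logical(span)                   # one flag per value in [lo, hi]
  seen[as.numeric(x) - lo + 1] <- TRUE    # duplicates set the same flag
  # flagged positions are already in ascending order
  as.integer(which(seen) + (as.numeric(lo) - 1))
}

r_dense  <- sketch_sort_unique(c(5L, -3L, 5L, 0L, -3L))
r_sparse <- sketch_sort_unique(c(1L, 1000000L, 1L))
r_na     <- sketch_sort_unique(c(2L, 1L, NA, NA, 1L, 2L))
```

When the values are dense, marking flags and reading them back takes one pass over x and one over the range, so no comparison sort is needed. When they are sparse, the flag vector would be mostly empty, which is why a fallback is used.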

Examples

bit_sort_unique(c(2L, 1L, NA, NA, 1L, 2L))
#> [1] 1 2
bit_sort_unique(c(2L, 1L, NA, NA, 1L, 2L), na.last=FALSE)
#> [1] NA  1  2
bit_sort_unique(c(2L, 1L, NA, NA, 1L, 2L), na.last=TRUE)
#> [1]  1  2 NA
bit_sort_unique(c(2L, 1L, NA, NA, 1L, 2L), decreasing = TRUE)
#> [1] 2 1
bit_sort_unique(c(2L, 1L, NA, NA, 1L, 2L), decreasing = TRUE, na.last=FALSE)
#> [1] NA  2  1
bit_sort_unique(c(2L, 1L, NA, NA, 1L, 2L), decreasing = TRUE, na.last=TRUE)
#> [1]  2  1 NA

if (FALSE) { # \dontrun{
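# with duplicates: compare against sort(unique())
x <- sample(1e7, replace=TRUE)
system.time(bit_sort_unique(x))
system.time(sort(unique(x)))
# without duplicates (a permutation): compare against plain sort()
x <- sample(1e7)
system.time(bit_sort_unique(x))
system.time(sort(x))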
} # }