stri_duplicated() determines which strings in a character vector
are duplicates of other elements.
stri_duplicated_any() determines if there are any duplicated
strings in a character vector.
Usage
stri_duplicated(
str,
from_last = FALSE,
fromLast = from_last,
...,
opts_collator = NULL
)
stri_duplicated_any(
str,
from_last = FALSE,
fromLast = from_last,
...,
opts_collator = NULL
)
Arguments
- str
a character vector
- from_last
a single logical value; indicates whether search should be performed from the last to the first string
- fromLast
[DEPRECATED] alias of from_last
- ...
additional settings for opts_collator
- opts_collator
a named list with ICU Collator's options, see stri_opts_collator, NULL for default collation options
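As a brief illustration of the last two arguments, the sketch below assumes that a collator setting passed through ... (here strength = 1, the same setting used in the Examples) is forwarded to stri_opts_collator, so both calls should flag the same elements. The variable names are illustrative only.

```r
library(stringi)

x <- c('gro\u00df', 'GROSS', 'Gro\u00df', 'Gross')

# collator setting passed via ...
via_dots <- stri_duplicated(x, strength = 1)

# the same setting passed as an explicit options list
via_opts <- stri_duplicated(x, opts_collator = stri_opts_collator(strength = 1))

identical(via_dots, via_opts)
```

Building the list with stri_opts_collator may be more convenient when the same settings are reused across several calls.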
Value
stri_duplicated() returns a logical vector of the same length
as str. Each of its elements indicates whether a canonically
equivalent string was already found in str.
stri_duplicated_any() returns a single non-negative integer.
Value of 0 indicates that all the elements in str are unique.
Otherwise, it gives the index of the first non-unique element.
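The relationship between the two return values can be sketched as follows, assuming the default from_last = FALSE: the integer returned by stri_duplicated_any() should coincide with the position of the first TRUE in the vector returned by stri_duplicated(), or be 0 when there is none. The helper variables are illustrative only.

```r
library(stringi)

x <- c('a', 'b', 'a', NA, 'a', NA)

dup_flags <- stri_duplicated(x)      # logical vector, same length as x
first_dup <- stri_duplicated_any(x)  # single integer

# position of the first TRUE, or 0 if no element is flagged
from_flags <- if (any(dup_flags)) which(dup_flags)[1] else 0L
first_dup == from_flags

# a vector with no repeated strings
no_dup <- stri_duplicated_any(c('a', 'b', 'c'))
no_dup
```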
Details
Missing values are regarded as equal.
Unlike duplicated and anyDuplicated,
these functions test for canonical equivalence of strings
(and not whether the strings are just bytewise equal).
Such operations are locale-dependent.
Hence, stri_duplicated and stri_duplicated_any
are significantly slower (but much better suited for natural language
processing) than their base R counterparts.
See also stri_unique for extracting unique elements.
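A minimal sketch of two points from this section: missing values compare as equal to one another, and the logical vector can be used to drop repeated elements by hand. The variable names are illustrative only.

```r
library(stringi)

# two missing values: the second one is flagged as a duplicate
na_flags <- stri_duplicated(c(NA_character_, NA_character_))
na_flags

# keep only the first occurrence of each string (including NA)
x <- c('a', 'b', 'a', NA, 'a', NA)
kept <- x[!stri_duplicated(x)]
kept
```

The last result is expected to contain the same elements as stri_unique(x), which is the more direct way of obtaining them.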
References
Collation - ICU User Guide, https://unicode-org.github.io/icu/userguide/collation/
See also
The official online manual of stringi at https://stringi.gagolewski.com/
Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02
Other locale_sensitive:
%s<%(),
about_locale,
about_search_boundaries,
about_search_coll,
stri_compare(),
stri_count_boundaries(),
stri_enc_detect2(),
stri_extract_all_boundaries(),
stri_locate_all_boundaries(),
stri_opts_collator(),
stri_order(),
stri_rank(),
stri_sort(),
stri_sort_key(),
stri_split_boundaries(),
stri_trans_tolower(),
stri_unique(),
stri_wrap()
Author
Marek Gagolewski and other contributors
Examples
# In the following examples, we have 3 duplicated values,
# 'a' - 2 times, NA - 1 time
stri_duplicated(c('a', 'b', 'a', NA, 'a', NA))
#> [1] FALSE FALSE TRUE FALSE TRUE TRUE
stri_duplicated(c('a', 'b', 'a', NA, 'a', NA), from_last=TRUE)
#> [1] TRUE FALSE TRUE TRUE FALSE FALSE
stri_duplicated_any(c('a', 'b', 'a', NA, 'a', NA))
#> [1] 3
# compare the results:
stri_duplicated(c('\u0105', stri_trans_nfkd('\u0105')))
#> [1] FALSE TRUE
duplicated(c('\u0105', stri_trans_nfkd('\u0105')))
#> [1] FALSE FALSE
stri_duplicated(c('gro\u00df', 'GROSS', 'Gro\u00df', 'Gross'), strength=1)
#> [1] FALSE TRUE TRUE TRUE
duplicated(c('gro\u00df', 'GROSS', 'Gro\u00df', 'Gross'))
#> [1] FALSE FALSE FALSE FALSE