These functions find the indexes (positions) where
there is a match to some pattern.
The functions stri_locate_all_* locate all the matches.
stri_locate_first_* and stri_locate_last_*
give the first and the last matches, respectively.
Usage
stri_locate_all(str, ..., regex, fixed, coll, charclass)
stri_locate_first(str, ..., regex, fixed, coll, charclass)
stri_locate_last(str, ..., regex, fixed, coll, charclass)
stri_locate(
str,
...,
regex,
fixed,
coll,
charclass,
mode = c("first", "all", "last")
)
stri_locate_all_charclass(
str,
pattern,
merge = TRUE,
omit_no_match = FALSE,
get_length = FALSE
)
stri_locate_first_charclass(str, pattern, get_length = FALSE)
stri_locate_last_charclass(str, pattern, get_length = FALSE)
stri_locate_all_coll(
str,
pattern,
omit_no_match = FALSE,
get_length = FALSE,
...,
opts_collator = NULL
)
stri_locate_first_coll(
str,
pattern,
get_length = FALSE,
...,
opts_collator = NULL
)
stri_locate_last_coll(
str,
pattern,
get_length = FALSE,
...,
opts_collator = NULL
)
stri_locate_all_regex(
str,
pattern,
omit_no_match = FALSE,
capture_groups = FALSE,
get_length = FALSE,
...,
opts_regex = NULL
)
stri_locate_first_regex(
str,
pattern,
capture_groups = FALSE,
get_length = FALSE,
...,
opts_regex = NULL
)
stri_locate_last_regex(
str,
pattern,
capture_groups = FALSE,
get_length = FALSE,
...,
opts_regex = NULL
)
stri_locate_all_fixed(
str,
pattern,
omit_no_match = FALSE,
get_length = FALSE,
...,
opts_fixed = NULL
)
stri_locate_first_fixed(
str,
pattern,
get_length = FALSE,
...,
opts_fixed = NULL
)
stri_locate_last_fixed(
str,
pattern,
get_length = FALSE,
...,
opts_fixed = NULL
)
Arguments
- str
character vector; strings to search in
- ...
supplementary arguments passed to the underlying functions, including additional settings for
opts_collator, opts_regex, opts_fixed, and so on
- mode
single string; one of: 'first' (the default), 'all', 'last'
- pattern, regex, fixed, coll, charclass
character vector; search patterns; for more details refer to stringi-search
- merge
single logical value; indicates whether consecutive sequences of indexes in the resulting matrix should be merged;
stri_locate_all_charclass only
- omit_no_match
single logical value; if TRUE, a no-match will be indicated by a matrix with 0 rows;
stri_locate_all_* only
- get_length
single logical value; if FALSE (default), generate from-to matrices; otherwise, output from-length ones
- opts_collator, opts_fixed, opts_regex
named list used to tune up the selected search engine's settings; see
stri_opts_collator, stri_opts_fixed, and stri_opts_regex, respectively; NULL for the defaults
- capture_groups
single logical value; whether positions of matches to parenthesized subexpressions should be returned too (as the
capture_groups attribute); stri_locate_*_regex only
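The effect of merge can be illustrated with a small sketch. The input string "ab12cd" and the variable names merged and unmerged are chosen here purely for illustration; the code assumes that the stringi package is installed.

```r
library(stringi)

# Letters occupy positions 1-2 and 5-6 of "ab12cd".
merged <- stri_locate_all_charclass("ab12cd", "\\p{L}")  # merge = TRUE is the default
unmerged <- stri_locate_all_charclass("ab12cd", "\\p{L}", merge = FALSE)

merged[[1]]    # two rows: one per run of consecutive letters
unmerged[[1]]  # four rows: one per matching code point
```

With merge=TRUE, each row spans a whole run of consecutive matching code points; with merge=FALSE, every matching code point gets its own row.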
Value
For stri_locate_all_*,
a list of integer matrices is returned. Each list element
represents the results of a separate search scenario.
The first column gives the start positions
of the matches, and the second column gives the end positions.
Moreover, two NAs in a row denote NA arguments
or a no-match (the latter only if omit_no_match is FALSE).
stri_locate_first_* and stri_locate_last_*
return an integer matrix with
two columns, giving the start and end positions of the first
or the last matches, respectively. A row of two NAs denotes
a no-match or an NA argument.
For stri_locate_*_regex, if the match is of zero length,
end will be one character less than start.
Note that stri_locate_last_regex searches from start to end,
but skips overlapping matches, see the example below.
Setting get_length=TRUE results in the 2nd column representing
the length of the match instead of the end position. In this case,
negative length denotes a no-match.
If capture_groups=TRUE, then the outputs are equipped with the
capture_groups attribute, which is a list of matrices
giving the start-end positions of matches to parenthesized subexpressions.
Similarly to stri_match_regex, capture group names are extracted
unless looking for first/last occurrences of many different patterns.
Details
Vectorized over str and pattern (with recycling
of the elements in the shorter vector if necessary). This makes it
possible, for instance, to search for one pattern in each string,
to search for each pattern in one string,
and to search for the i-th pattern within the i-th string.
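The three scenarios listed above might look as follows. This is a minimal sketch on made-up inputs; the variable names are illustrative only.

```r
library(stringi)

# One pattern searched for in each string:
one_pattern <- stri_locate_first_fixed(c("abc", "cab", "xyz"), "a")

# Each pattern searched for in one string:
one_string <- stri_locate_first_fixed("abc", c("a", "b", "c"))

# The i-th pattern searched for within the i-th string:
pairwise <- stri_locate_first_fixed(c("abc", "cab"), c("c", "a"))

one_pattern  # starts: 1, 2, NA (no "a" in "xyz")
one_string   # starts: 1, 2, 3
pairwise     # starts: 3, 2
```

In each case, the result has one row per element of the longer of the two arguments.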
The matches may be extracted by calling
stri_sub or stri_sub_all.
Alternatively, you may call stri_extract directly.
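As a rough illustration of the two routes, the sketch below feeds location matrices to stri_sub and stri_sub_all and compares the result with the corresponding stri_extract_*_regex calls. The input vector x and the pattern are made up for this example.

```r
library(stringi)

x <- c("spam, eggs", "bacon")

# First match per string: a two-column from-to matrix can be passed as `from`.
loc_first <- stri_locate_first_regex(x, "[a-z]+")
first_words <- stri_sub(x, loc_first)

# All matches per string: a list of matrices goes to stri_sub_all.
loc_all <- stri_locate_all_regex(x, "[a-z]+")
all_words <- stri_sub_all(x, loc_all)

first_words     # "spam" "bacon"
all_words[[1]]  # "spam" "eggs"

# The direct route, without the intermediate positions:
stri_extract_first_regex(x, "[a-z]+")
```

Locating first and extracting later is mainly useful when the positions themselves are needed, for instance to replace or annotate the matched substrings.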
stri_locate, stri_locate_all, stri_locate_first,
and stri_locate_last are convenience functions.
They just call stri_locate_*_*, depending on the arguments used.
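For instance, the following sketch (with an illustrative input string) compares a call to the generic stri_locate with the specific function it is expected to dispatch to, given the fixed argument and mode="last".

```r
library(stringi)

# Generic wrapper: search engine chosen by argument name, variant by `mode`.
via_wrapper <- stri_locate("stringi", fixed = "i", mode = "last")

# The specific function called directly.
direct <- stri_locate_last_fixed("stringi", "i")

via_wrapper            # start 7, end 7
all(via_wrapper == direct)
```

Calling the specific function directly avoids the dispatch step and makes the chosen search engine explicit in the code.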
See also
The official online manual of stringi at https://stringi.gagolewski.com/
Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02
Other search_locate:
about_search,
stri_locate_all_boundaries()
Other indexing:
stri_locate_all_boundaries(),
stri_sub(),
stri_sub_all()
Author
Marek Gagolewski and other contributors
Examples
stri_locate_all('stringi', fixed='i')
#> [[1]]
#> start end
#> [1,] 4 4
#> [2,] 7 7
#>
stri_locate_first_coll('hladn\u00FD', 'HLADNY', strength=1, locale='sk_SK')
#> start end
#> [1,] 1 6
stri_locate_all_regex(
c('breakfast=eggs;lunch=pizza', 'breakfast=spam', 'no food here'),
'(?<when>\\w+)=(?<what>\\w+)',
capture_groups=TRUE
) # named capture groups
#> [[1]]
#> start end
#> [1,] 1 14
#> [2,] 16 26
#> attr(,"capture_groups")
#> attr(,"capture_groups")$when
#> start end
#> [1,] 1 9
#> [2,] 16 20
#>
#> attr(,"capture_groups")$what
#> start end
#> [1,] 11 14
#> [2,] 22 26
#>
#>
#> [[2]]
#> start end
#> [1,] 1 14
#> attr(,"capture_groups")
#> attr(,"capture_groups")$when
#> start end
#> [1,] 1 9
#>
#> attr(,"capture_groups")$what
#> start end
#> [1,] 11 14
#>
#>
#> [[3]]
#> start end
#> [1,] NA NA
#> attr(,"capture_groups")
#> attr(,"capture_groups")$when
#> start end
#> [1,] NA NA
#>
#> attr(,"capture_groups")$what
#> start end
#> [1,] NA NA
#>
#>
stri_locate_all_fixed("abababa", "ABA", case_insensitive=TRUE, overlap=TRUE)
#> [[1]]
#> start end
#> [1,] 1 3
#> [2,] 3 5
#> [3,] 5 7
#>
stri_locate_first_fixed("ababa", "aba")
#> start end
#> [1,] 1 3
stri_locate_last_fixed("ababa", "aba") # starts from end
#> start end
#> [1,] 3 5
stri_locate_last_regex("ababa", "aba") # no overlaps, from left to right
#> start end
#> [1,] 1 3
x <- c("yes yes", "no", NA)
stri_locate_all_fixed(x, "yes")
#> [[1]]
#> start end
#> [1,] 1 3
#> [2,] 5 7
#>
#> [[2]]
#> start end
#> [1,] NA NA
#>
#> [[3]]
#> start end
#> [1,] NA NA
#>
stri_locate_all_fixed(x, "yes", omit_no_match=TRUE)
#> [[1]]
#> start end
#> [1,] 1 3
#> [2,] 5 7
#>
#> [[2]]
#> start end
#>
#> [[3]]
#> start end
#> [1,] NA NA
#>
stri_locate_all_fixed(x, "yes", get_length=TRUE)
#> [[1]]
#> start length
#> [1,] 1 3
#> [2,] 5 3
#>
#> [[2]]
#> start length
#> [1,] -1 -1
#>
#> [[3]]
#> start length
#> [1,] NA NA
#>
stri_locate_all_fixed(x, "yes", get_length=TRUE, omit_no_match=TRUE)
#> [[1]]
#> start length
#> [1,] 1 3
#> [2,] 5 3
#>
#> [[2]]
#> start length
#>
#> [[3]]
#> start length
#> [1,] NA NA
#>
stri_locate_first_fixed(x, "yes")
#> start end
#> [1,] 1 3
#> [2,] NA NA
#> [3,] NA NA
stri_locate_first_fixed(x, "yes", get_length=TRUE)
#> start length
#> [1,] 1 3
#> [2,] -1 -1
#> [3,] NA NA
# Use regex positive-lookahead to locate overlapping pattern matches:
stri_locate_all_regex('ACAGAGACTTTAGATAGAGAAGA', '(?=AGA)')
#> [[1]]
#> start end
#> [1,] 3 2
#> [2,] 5 4
#> [3,] 12 11
#> [4,] 16 15
#> [5,] 18 17
#> [6,] 21 20
#>
# note that start > end here (match of length zero)