A function that performs pattern substitution, replacing every match of the pattern in each element of the subject
Usage
sf_gsub(subject, pattern, replacement, encode_mode = "auto", fixed = FALSE,
nthreads = getOption("stringfish.nthreads", 1L))
Arguments
- subject
The subject character vector to search
- pattern
The pattern to search for
- replacement
The replacement string
- encode_mode
"auto", "UTF-8" or "byte". Determines whether multi-byte (UTF-8) or single-byte characters are used.
- fixed
Determines whether the pattern parameter is interpreted literally (TRUE) or as a regular expression (FALSE, the default)
- nthreads
Number of threads to use (defaults to the "stringfish.nthreads" option, or 1 if that option is unset)
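A minimal sketch of how the fixed and nthreads arguments might be used, assuming the stringfish package is installed and using the same R version guard as the Examples below. The expected outputs follow from the documented semantics: with fixed = TRUE the "." is matched literally, while as a regular expression it matches any single character. Changing nthreads should only affect speed, not the result.

```r
library(stringfish)

if (getRversion() >= "3.5.0") {
  x <- c("a.b.c", "1.2")

  # fixed = TRUE: "." is a literal dot, so only the dots are replaced
  lit <- sf_gsub(x, ".", "-", fixed = TRUE)
  print(lit)
  # expected: "a-b-c" "1-2"

  # fixed = FALSE (default): "." is a regex matching any character
  rx <- sf_gsub(x, ".", "-")
  print(rx)
  # expected: "-----" "---"

  # nthreads changes how the work is split, not the output
  par <- sf_gsub(x, ".", "-", fixed = TRUE, nthreads = 2L)
  stopifnot(all(par == lit))
}
```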
Details
The function uses the PCRE2 library, which is also used internally by R. However, the syntax may differ slightly; for example, capture groups in the replacement are referenced as "\1" in R but as "$1" in PCRE2 (as in Perl). The encoding of the output is determined by the pattern (or forced using the encode_mode parameter), and the encodings should be compatible: mixing ASCII and UTF-8 is fine, but mixing UTF-8 and latin1 is not. Note: the order of parameters is switched compared to the base R `gsub` function, with subject first. See also https://www.pcre.org/current/doc/html/pcre2api.html for more documentation on match syntax.
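To illustrate the two differences from base R described above (capture-group syntax and argument order), the sketch below performs the same substitution with base `gsub` and with `sf_gsub`, assuming stringfish is installed. Both calls are expected to yield "goodbye world".

```r
library(stringfish)

if (getRversion() >= "3.5.0") {
  x <- "hello world"
  pattern <- "^hello (.+)"

  # Base R: pattern first, subject last; backreference written as \\1
  base_out <- gsub(pattern, "goodbye \\1", x)

  # stringfish: subject first; PCRE2/Perl-style backreference $1
  sf_out <- sf_gsub(x, pattern, "goodbye $1")

  print(base_out)
  print(sf_out)
  stopifnot(base_out == sf_out)
}
```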
Examples
if(getRversion() >= "3.5.0") {
x <- "hello world"
pattern <- "^hello (.+)"
replacement <- "goodbye $1"
sf_gsub(x, pattern, replacement)
}
#> [1] "goodbye world"