re_match wraps regexpr and returns the
match results in a convenient data frame. The data frame has one
column for each capture group if perl = TRUE, followed by two final
columns: .text for the input text and .match for the matching
(sub)string. The capture group columns are named if the groups
themselves are named.
re_match(text, pattern, perl = TRUE, ...)
A data frame of character vectors: one column per capture
group, named if the group was named, and additional columns for
the input text and the first matching (sub)string. Each row
corresponds to an element in the text vector.
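To make the shape of this return value concrete, here is a minimal base R sketch of wrapping regexpr with perl = TRUE. It is not rematch2's implementation: the helper name sketch_re_match is hypothetical, it returns a plain data.frame rather than a tibble, and it assumes the pattern contains at least one named capture group.

```r
# Hypothetical helper, for illustration only; not part of rematch2.
sketch_re_match <- function(text, pattern) {
  # regexpr() finds the first match in each element of `text`.
  m <- regexpr(pattern, text, perl = TRUE)
  matched <- as.integer(m) != -1L

  # Whole match, with NA where the pattern did not match.
  whole <- rep(NA_character_, length(text))
  whole[matched] <- regmatches(text, m)

  # With perl = TRUE, regexpr() attaches one column of start positions
  # and lengths per capture group.
  starts <- attr(m, "capture.start")
  lens <- attr(m, "capture.length")
  groups <- lapply(seq_len(ncol(starts)), function(j) {
    out <- substring(text, starts[, j], starts[, j] + lens[, j] - 1L)
    out[!matched] <- NA_character_
    out
  })
  names(groups) <- attr(m, "capture.names")

  # One row per input element: capture groups, then .text and .match.
  data.frame(groups, .text = text, .match = whole, stringsAsFactors = FALSE)
}

res <- sketch_re_match(
  c("2016-04-20", "not a date"),
  "(?<year>[0-9]{4})-(?<month>[0-1][0-9])-(?<day>[0-3][0-9])"
)
print(res)
```

Each input element yields exactly one row, and elements that do not match get NA in every column except .text.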
re_match uses PCRE-compatible regular expressions by default
(i.e. perl = TRUE in regexpr). You can switch
this off, but capture groups will then no longer be reported, because
regexpr only reports them when perl = TRUE.
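The underlying behaviour can be checked directly against base R's regexpr, without rematch2. This is a small sketch with illustrative variable names; it shows only what regexpr itself reports, not how re_match handles the result.

```r
text <- "2016-04-20"
pattern <- "([0-9]{4})-([0-1][0-9])"

with_pcre <- regexpr(pattern, text, perl = TRUE)
without_pcre <- regexpr(pattern, text, perl = FALSE)

# Capture group positions are attached only when perl = TRUE.
has_groups_pcre <- !is.null(attr(with_pcre, "capture.start"))
has_groups_default <- !is.null(attr(without_pcre, "capture.start"))

# The overall match is still found either way.
whole_default <- regmatches(text, without_pcre)
```

Here has_groups_pcre is TRUE and has_groups_default is FALSE, while whole_default still holds the matched substring.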
Other tidy regular expression matching:
re_exec_all(),
re_exec(),
re_match_all()
library(rematch2)
dates <- c("2016-04-20", "1977-08-08", "not a date", "2016",
  "76-03-02", "2012-06-30", "2015-01-21 19:58")
isodate <- "([0-9]{4})-([0-1][0-9])-([0-3][0-9])"
re_match(text = dates, pattern = isodate)
#> # A tibble: 7 × 5
#>   ``    ``    ``    .text            .match
#>   <chr> <chr> <chr> <chr>            <chr>
#> 1 2016  04    20    2016-04-20       2016-04-20
#> 2 1977  08    08    1977-08-08       1977-08-08
#> 3 NA    NA    NA    not a date       NA
#> 4 NA    NA    NA    2016             NA
#> 5 NA    NA    NA    76-03-02         NA
#> 6 2012  06    30    2012-06-30       2012-06-30
#> 7 2015  01    21    2015-01-21 19:58 2015-01-21
# The same with named groups
isodaten <- "(?<year>[0-9]{4})-(?<month>[0-1][0-9])-(?<day>[0-3][0-9])"
re_match(text = dates, pattern = isodaten)
#> # A tibble: 7 × 5
#>   year  month day   .text            .match
#>   <chr> <chr> <chr> <chr>            <chr>
#> 1 2016  04    20    2016-04-20       2016-04-20
#> 2 1977  08    08    1977-08-08       1977-08-08
#> 3 NA    NA    NA    not a date       NA
#> 4 NA    NA    NA    2016             NA
#> 5 NA    NA    NA    76-03-02         NA
#> 6 2012  06    30    2012-06-30       2012-06-30
#> 7 2015  01    21    2015-01-21 19:58 2015-01-21