Extract Regular Expression Matches Into a Data Frame

re_match wraps regexpr and returns the match results in a convenient data frame. The data frame has one column for each capture group if perl = TRUE, followed by two final columns: .text for the input text and .match for the matching (sub)string. The capture-group columns are named if the groups themselves are named.

re_match(text, pattern, perl = TRUE, ...)
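To make the mechanism concrete, here is a minimal base-R sketch of how a regexpr()-based matcher can assemble a data frame of this shape. The function name re_match_sketch is hypothetical: it is not rematch2's actual implementation, it returns a plain data.frame rather than a tibble, and it assumes the pattern contains at least one capture group. It relies only on the fact that regexpr(perl = TRUE) records group positions in the capture.start, capture.length and capture.names attributes of its result.

```r
# re_match_sketch() is a hypothetical helper, not part of rematch2 or base R.
# It mimics the documented output shape: one column per capture group,
# then .text and .match, with one row per element of `text`.
re_match_sketch <- function(text, pattern, ...) {
  m <- regexpr(pattern, text, perl = TRUE, ...)
  pos <- as.vector(m)
  matched <- !is.na(pos) & pos != -1L  # regexpr() returns -1 where nothing matched

  # Whole matched substring; NA where nothing matched
  whole <- rep(NA_character_, length(text))
  whole[matched] <- regmatches(text, m)

  # With perl = TRUE, group positions are stored as attributes (n x k matrices)
  starts <- attr(m, "capture.start")
  if (is.null(starts)) stop("this sketch expects at least one capture group")
  stops <- starts + attr(m, "capture.length") - 1L
  groups <- matrix(substring(text, starts, stops), nrow = length(text))
  groups[!matched, ] <- NA

  out <- as.data.frame(groups, stringsAsFactors = FALSE)
  names(out) <- attr(m, "capture.names")  # "" for unnamed groups
  out$.text <- text
  out$.match <- whole
  out
}

# Usage: named groups become column names
dates <- c("2016-04-20", "not a date", "2015-01-21 19:58")
re_match_sketch(dates, "(?<year>[0-9]{4})-(?<month>[0-1][0-9])-(?<day>[0-3][0-9])")
```

Rows without a match get NA in every group column and in .match, while .text always echoes the input, which mirrors the pattern visible in the Examples output.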

Arguments

text: Character vector.
pattern: A regular expression. See regex for more about regular expressions.
perl: Logical; should Perl-compatible regular expressions be used? Defaults to TRUE. Setting it to FALSE disables capture groups.
...: Additional arguments to pass to regexpr.
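Because `...` is handed to regexpr, any regexpr argument can be supplied through re_match, for example re_match(text, pattern, ignore.case = TRUE). The base-R call below shows what regexpr itself does with ignore.case; the input strings are made up for illustration.

```r
# ignore.case is a standard argument of base R's regexpr(); per the
# documentation above, re_match would forward it there through `...`.
x <- c("Date: 2016-04-20", "DATE: 1977-08-08", "no date here")
m <- regexpr("date: ([0-9]{4})", x, perl = TRUE, ignore.case = TRUE)

regmatches(x, m)  # matched substrings only; the third element has no match
#> [1] "Date: 2016" "DATE: 1977"
```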

Value

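A data frame (printed as a tibble in the examples) of character vectors: one column per capture group, named if the group is named, followed by a .text column holding the input text and a .match column holding the first matching (sub)string. Each row corresponds to an element of the text vector; elements with no match get NA in the capture-group columns and in .match.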

Note

re_match uses PCRE-compatible regular expressions by default (i.e. perl = TRUE in regexpr). You can switch this off, but if you do, capture groups will no longer be reported, because regexpr only records capture-group positions when perl = TRUE.
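The difference is visible directly in base R: the same pattern finds the same whole match with or without PCRE, but only the perl = TRUE result carries the capture.start attribute that group extraction depends on. The pattern and input below are illustrative.

```r
# Same pattern and input, matched with and without PCRE
pattern <- "([0-9]{4})-([0-9]{2})"
with_pcre <- regexpr(pattern, "2016-04", perl = TRUE)
without_pcre <- regexpr(pattern, "2016-04", perl = FALSE)

# Both engines find the same whole match...
regmatches("2016-04", with_pcre)
#> [1] "2016-04"
regmatches("2016-04", without_pcre)
#> [1] "2016-04"

# ...but only the PCRE result records where each group starts
attr(with_pcre, "capture.start")     # 1 x 2 matrix: groups start at 1 and 6
attr(without_pcre, "capture.start")
#> NULL
```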

Examples

dates <- c("2016-04-20", "1977-08-08", "not a date", "2016",
  "76-03-02", "2012-06-30", "2015-01-21 19:58")
isodate <- "([0-9]{4})-([0-1][0-9])-([0-3][0-9])"
re_match(text = dates, pattern = isodate)
#> # A tibble: 7 × 5
#>   ``    ``    ``    .text            .match    
#>   <chr> <chr> <chr> <chr>            <chr>     
#> 1 2016  04    20    2016-04-20       2016-04-20
#> 2 1977  08    08    1977-08-08       1977-08-08
#> 3 NA    NA    NA    not a date       NA        
#> 4 NA    NA    NA    2016             NA        
#> 5 NA    NA    NA    76-03-02         NA        
#> 6 2012  06    30    2012-06-30       2012-06-30
#> 7 2015  01    21    2015-01-21 19:58 2015-01-21

# The same with named groups
isodaten <- "(?<year>[0-9]{4})-(?<month>[0-1][0-9])-(?<day>[0-3][0-9])"
re_match(text = dates, pattern = isodaten)
#> # A tibble: 7 × 5
#>   year  month day   .text            .match    
#>   <chr> <chr> <chr> <chr>            <chr>     
#> 1 2016  04    20    2016-04-20       2016-04-20
#> 2 1977  08    08    1977-08-08       1977-08-08
#> 3 NA    NA    NA    not a date       NA        
#> 4 NA    NA    NA    2016             NA        
#> 5 NA    NA    NA    76-03-02         NA        
#> 6 2012  06    30    2012-06-30       2012-06-30
#> 7 2015  01    21    2015-01-21 19:58 2015-01-21
