match_df works in the same way as join, but instead of returning the combined dataset, it returns only the matching rows from the first dataset. This is particularly useful when you've summarised the data in some way and want to subset the original data by a characteristic of the subset.
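As a rough illustration of that difference, the sketch below compares join and match_df on a small made-up data set. The orders and frequent data frames and their column names are hypothetical and serve only as an example.

```r
library(plyr)

# Hypothetical toy data: one row per order.
orders <- data.frame(
  customer = c("a", "a", "b", "c", "c", "c"),
  amount   = c(10, 20, 5, 7, 8, 9),
  stringsAsFactors = FALSE
)

# Summarise: count orders per customer, keep customers with more than one order.
frequent <- subset(count(orders, "customer"), freq > 1)

# join() combines the columns of both data frames.
joined <- join(orders, frequent, by = "customer", type = "inner")

# match_df() keeps only the matching rows of `orders`, with its original columns.
matched <- match_df(orders, frequent, on = "customer")

names(joined)
names(matched)
```

Here the summary is used purely as a filter: matched has the same columns as orders, whereas joined also carries the freq column from the summary.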

match_df(x, y, on = NULL)

Arguments

x

data frame to subset.

y

data frame defining matching rows.

on

variables to match on - by default will use all variables common to both data frames.
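The effect of the on argument can be sketched with two small data frames that share more than one column. The data below is invented for illustration; the row counts follow from matching on both shared columns versus on id alone.

```r
library(plyr)

# Hypothetical data: x and y share the columns "id" and "year".
x <- data.frame(
  id    = c(1, 1, 2, 3),
  year  = c(2000, 2001, 2000, 2001),
  value = c(5, 6, 7, 8)
)
y <- data.frame(id = c(1, 3), year = c(2001, 2001))

# on = NULL (the default): match on every column common to x and y.
both <- match_df(x, y)

# Match on "id" only, ignoring "year".
by_id <- match_df(x, y, on = "id")

nrow(both)
nrow(by_id)
```

Setting on explicitly is a reasonable habit when the two data frames happen to share columns that should not take part in the match.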

Value

a data frame

Details

match_df shares the same semantics as join, not match:

  • the match criterion is ==, not identical

  • it doesn't work for columns that are not atomic vectors

  • if a row of x has no match in y, it is omitted
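A minimal sketch of the omission behaviour, using made-up data: rows of x without a match in y are dropped, and the remaining rows keep their original order. For a single atomic key column, the result should agree with a plain %in% subset in base R; that equivalence is stated here as an assumption for this simple case, not as a description of how match_df is implemented.

```r
library(plyr)

# Hypothetical data: y contains a repeated key ("b") and a key absent from x ("z").
x <- data.frame(key = c("a", "b", "c", "d"), val = 1:4, stringsAsFactors = FALSE)
y <- data.frame(key = c("d", "b", "b", "z"), stringsAsFactors = FALSE)

# Rows "a" and "c" have no match in y and are omitted.
res <- match_df(x, y, on = "key")

# Base R subset for comparison (single key column only).
base_equiv <- x[x$key %in% y$key, , drop = FALSE]

res
```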

See also

join to combine the columns from both x and y, and match for the base function that selects matching items.

Examples

library(plyr)
# Count the occurrences of each id in the baseball data frame, then keep the ids with freq > 25
longterm <- subset(count(baseball, "id"), freq > 25)
# longterm
#             id freq
# 30   ansonca01   27
# 48   baineha01   27
# ...
# Select only the rows for these long-term players from the baseball data frame
# (match_df would default to matching on all shared column names; here "on" is explicitly set to "id")
bb_longterm <- match_df(baseball, longterm, on="id")
bb_longterm[1:5,]
#>            id year stint team lg  g  ab  r   h X2b X3b hr rbi sb cs bb so ibb
#> 4   ansonca01 1871     1  RC1    25 120 29  39  11   3  0  16  6  2  2  1  NA
#> 121 ansonca01 1872     1  PH1    46 217 60  90  10   7  0  50  6  6 16  3  NA
#> 276 ansonca01 1873     1  PH1    52 254 53 101   9   2  0  36  0  2  5  1  NA
#> 398 ansonca01 1874     1  PH1    55 259 51  87   8   3  0  37  6  0  4  1  NA
#> 525 ansonca01 1875     1  PH1    69 326 84 106  15   3  0  58 11  6  4  2  NA
#>     hbp sh sf gidp
#> 4    NA NA NA   NA
#> 121  NA NA NA   NA
#> 276  NA NA NA   NA
#> 398  NA NA NA   NA
#> 525  NA NA NA   NA