These functions are used to subset a data frame, applying the expressions in
... to determine which rows should be kept (for filter()) or dropped (
for filter_out()).
Multiple conditions can be supplied separated by a comma. These will be
combined with the & operator. To combine comma separated conditions using
| instead, wrap them in when_any().
Both filter() and filter_out() treat NA like FALSE. This subtle
behavior can impact how you write your conditions when missing values are
involved. See the section on Missing values for important details and
examples.
Usage
filter(.data, ..., .by = NULL, .preserve = FALSE)
filter_out(.data, ..., .by = NULL, .preserve = FALSE)Arguments
- .data
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.
- ...
<
data-masking> Expressions that return a logical vector, defined in terms of the variables in.data. If multiple expressions are included, they are combined with the&operator. To combine expressions using|instead, wrap them inwhen_any(). Only rows for which all expressions evaluate toTRUEare kept (forfilter()) or dropped (forfilter_out()).- .by
<
tidy-select> Optionally, a selection of columns to group by for just this operation, functioning as an alternative togroup_by(). For details and examples, see ?dplyr_by.- .preserve
Relevant when the
.datainput is grouped. If.preserve = FALSE(the default), the grouping structure is recalculated based on the resulting data, otherwise the grouping is kept as is.
Value
An object of the same type as .data. The output has the following
properties:
Rows are a subset of the input, but appear in the same order.
Columns are not modified.
The number of groups may be reduced (if
.preserveis notTRUE).Data frame attributes are preserved.
Missing values
Both filter() and filter_out() treat NA like FALSE. This results in
the following behavior:
filter()drops bothNAandFALSE.filter_out()keeps bothNAandFALSE.
This means that filter(data, <conditions>) + filter_out(data, <conditions>)
captures every row within data exactly once.
The NA handling of these functions has been designed to match your
intent. When your intent is to keep rows, use filter(). When your intent
is to drop rows, use filter_out().
For example, if your goal with this cars data is to "drop rows where the
class is suv", then you might write this in one of two ways:
cars <- tibble(class = c("suv", NA, "coupe"))
cars
#> # A tibble: 3 x 1
#> class
#> <chr>
#> 1 suv
#> 2 <NA>
#> 3 coupecars |> filter(class != "suv")
#> # A tibble: 1 x 1
#> class
#> <chr>
#> 1 coupecars |> filter_out(class == "suv")
#> # A tibble: 2 x 1
#> class
#> <chr>
#> 1 <NA>
#> 2 coupeNote how filter() drops the NA rows even though our goal was only to drop
"suv" rows, but filter_out() matches our intuition.
To generate the correct result with filter(), you'd need to use:
cars |> filter(class != "suv" | is.na(class))
#> # A tibble: 2 x 1
#> class
#> <chr>
#> 1 <NA>
#> 2 coupeThis quickly gets unwieldy when multiple conditions are involved.
In general, if you find yourself:
Using "negative" operators like
!=or!Adding in
NAhandling like| is.na(col)or& !is.na(col)
then you should consider if swapping to the other filtering variant would make your conditions simpler.
Comparison to base subsetting
Base subsetting with [ doesn't treat NA like TRUE or FALSE. Instead,
it generates a fully missing row, which is different from how both filter()
and filter_out() work.
cars <- tibble(class = c("suv", NA, "coupe"), mpg = c(10, 12, 14))
cars
#> # A tibble: 3 x 2
#> class mpg
#> <chr> <dbl>
#> 1 suv 10
#> 2 <NA> 12
#> 3 coupe 14cars[cars$class == "suv",]
#> # A tibble: 2 x 2
#> class mpg
#> <chr> <dbl>
#> 1 suv 10
#> 2 <NA> NA
cars |> filter(class == "suv")
#> # A tibble: 1 x 2
#> class mpg
#> <chr> <dbl>
#> 1 suv 10Useful filter functions
There are many functions and operators that are useful when constructing the expressions used to filter the data:
==,>,>=etc
Grouped tibbles
Because filtering expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped filtering:
With the grouped equivalent:
In the ungrouped version, filter() compares the value of mass in each row
to the global average (taken over the whole data set), keeping only the rows
with mass greater than this global average. In contrast, the grouped
version calculates the average mass separately for each gender group, and
keeps rows with mass greater than the relevant within-gender average.
Methods
This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
The following methods are currently available in loaded packages:
dbplyr (tbl_lazy), dplyr (data.frame, ts)
.
Examples
# Filtering for one criterion
filter(starwars, species == "Human")
#> # A tibble: 35 × 14
#> name height mass hair_color skin_color eye_color birth_year sex gender
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 Luke Sk… 172 77 blond fair blue 19 male mascu…
#> 2 Darth V… 202 136 none white yellow 41.9 male mascu…
#> 3 Leia Or… 150 49 brown light brown 19 fema… femin…
#> 4 Owen La… 178 120 brown, gr… light blue 52 male mascu…
#> 5 Beru Wh… 165 75 brown light blue 47 fema… femin…
#> 6 Biggs D… 183 84 black light brown 24 male mascu…
#> 7 Obi-Wan… 182 77 auburn, w… fair blue-gray 57 male mascu…
#> 8 Anakin … 188 84 blond fair blue 41.9 male mascu…
#> 9 Wilhuff… 180 NA auburn, g… fair blue 64 male mascu…
#> 10 Han Solo 180 80 brown fair brown 29 male mascu…
#> # ℹ 25 more rows
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> # vehicles <list>, starships <list>
# Filtering for multiple criteria within a single logical expression
filter(starwars, hair_color == "none" & eye_color == "black")
#> # A tibble: 9 × 14
#> name height mass hair_color skin_color eye_color birth_year sex gender
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 Nien Nunb 160 68 none grey black NA male mascu…
#> 2 Gasgano 122 NA none white, bl… black NA male mascu…
#> 3 Kit Fisto 196 87 none green black NA male mascu…
#> 4 Plo Koon 188 80 none orange black 22 male mascu…
#> 5 Lama Su 229 88 none grey black NA male mascu…
#> 6 Taun We 213 NA none grey black NA fema… femin…
#> 7 Shaak Ti 178 57 none red, blue… black NA fema… femin…
#> 8 Tion Med… 206 80 none grey black NA male mascu…
#> 9 BB8 NA NA none none black NA none mascu…
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> # vehicles <list>, starships <list>
filter(starwars, hair_color == "none" | eye_color == "black")
#> # A tibble: 39 × 14
#> name height mass hair_color skin_color eye_color birth_year sex gender
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 Darth V… 202 136 none white yellow 41.9 male mascu…
#> 2 Greedo 173 74 NA green black 44 male mascu…
#> 3 IG-88 200 140 none metal red 15 none mascu…
#> 4 Bossk 190 113 none green red 53 male mascu…
#> 5 Lobot 175 79 none light blue 37 male mascu…
#> 6 Ackbar 180 83 none brown mot… orange 41 male mascu…
#> 7 Nien Nu… 160 68 none grey black NA male mascu…
#> 8 Nute Gu… 191 90 none mottled g… red NA male mascu…
#> 9 Jar Jar… 196 66 none orange orange 52 male mascu…
#> 10 Roos Ta… 224 82 none grey orange NA male mascu…
#> # ℹ 29 more rows
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> # vehicles <list>, starships <list>
# Multiple comma separated expressions are combined using `&`
starwars |> filter(hair_color == "none", eye_color == "black")
#> # A tibble: 9 × 14
#> name height mass hair_color skin_color eye_color birth_year sex gender
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 Nien Nunb 160 68 none grey black NA male mascu…
#> 2 Gasgano 122 NA none white, bl… black NA male mascu…
#> 3 Kit Fisto 196 87 none green black NA male mascu…
#> 4 Plo Koon 188 80 none orange black 22 male mascu…
#> 5 Lama Su 229 88 none grey black NA male mascu…
#> 6 Taun We 213 NA none grey black NA fema… femin…
#> 7 Shaak Ti 178 57 none red, blue… black NA fema… femin…
#> 8 Tion Med… 206 80 none grey black NA male mascu…
#> 9 BB8 NA NA none none black NA none mascu…
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> # vehicles <list>, starships <list>
# To combine comma separated expressions using `|` instead, use `when_any()`
starwars |> filter(when_any(hair_color == "none", eye_color == "black"))
#> # A tibble: 39 × 14
#> name height mass hair_color skin_color eye_color birth_year sex gender
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 Darth V… 202 136 none white yellow 41.9 male mascu…
#> 2 Greedo 173 74 NA green black 44 male mascu…
#> 3 IG-88 200 140 none metal red 15 none mascu…
#> 4 Bossk 190 113 none green red 53 male mascu…
#> 5 Lobot 175 79 none light blue 37 male mascu…
#> 6 Ackbar 180 83 none brown mot… orange 41 male mascu…
#> 7 Nien Nu… 160 68 none grey black NA male mascu…
#> 8 Nute Gu… 191 90 none mottled g… red NA male mascu…
#> 9 Jar Jar… 196 66 none orange orange 52 male mascu…
#> 10 Roos Ta… 224 82 none grey orange NA male mascu…
#> # ℹ 29 more rows
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> # vehicles <list>, starships <list>
# Filtering out to drop rows
filter_out(starwars, hair_color == "none")
#> # A tibble: 49 × 14
#> name height mass hair_color skin_color eye_color birth_year sex gender
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 Luke Sk… 172 77 blond fair blue 19 male mascu…
#> 2 C-3PO 167 75 NA gold yellow 112 none mascu…
#> 3 R2-D2 96 32 NA white, bl… red 33 none mascu…
#> 4 Leia Or… 150 49 brown light brown 19 fema… femin…
#> 5 Owen La… 178 120 brown, gr… light blue 52 male mascu…
#> 6 Beru Wh… 165 75 brown light blue 47 fema… femin…
#> 7 R5-D4 97 32 NA white, red red NA none mascu…
#> 8 Biggs D… 183 84 black light brown 24 male mascu…
#> 9 Obi-Wan… 182 77 auburn, w… fair blue-gray 57 male mascu…
#> 10 Anakin … 188 84 blond fair blue 41.9 male mascu…
#> # ℹ 39 more rows
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> # vehicles <list>, starships <list>
# When filtering out, it can be useful to first interactively filter for the
# rows you want to drop, just to double check that you've written the
# conditions correctly. Then, just change `filter()` to `filter_out()`.
filter(starwars, mass > 1000, eye_color == "orange")
#> # A tibble: 1 × 14
#> name height mass hair_color skin_color eye_color birth_year sex gender
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 Jabba De… 175 1358 NA green-tan… orange 600 herm… mascu…
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> # vehicles <list>, starships <list>
filter_out(starwars, mass > 1000, eye_color == "orange")
#> # A tibble: 86 × 14
#> name height mass hair_color skin_color eye_color birth_year sex gender
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 Luke Sk… 172 77 blond fair blue 19 male mascu…
#> 2 C-3PO 167 75 NA gold yellow 112 none mascu…
#> 3 R2-D2 96 32 NA white, bl… red 33 none mascu…
#> 4 Darth V… 202 136 none white yellow 41.9 male mascu…
#> 5 Leia Or… 150 49 brown light brown 19 fema… femin…
#> 6 Owen La… 178 120 brown, gr… light blue 52 male mascu…
#> 7 Beru Wh… 165 75 brown light blue 47 fema… femin…
#> 8 R5-D4 97 32 NA white, red red NA none mascu…
#> 9 Biggs D… 183 84 black light brown 24 male mascu…
#> 10 Obi-Wan… 182 77 auburn, w… fair blue-gray 57 male mascu…
#> # ℹ 76 more rows
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> # vehicles <list>, starships <list>
# The filtering operation may yield different results on grouped
# tibbles because the expressions are computed within groups.
#
# The following keeps rows where `mass` is greater than the
# global average:
starwars |> filter(mass > mean(mass, na.rm = TRUE))
#> # A tibble: 10 × 14
#> name height mass hair_color skin_color eye_color birth_year sex gender
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 Darth V… 202 136 none white yellow 41.9 male mascu…
#> 2 Owen La… 178 120 brown, gr… light blue 52 male mascu…
#> 3 Chewbac… 228 112 brown unknown blue 200 male mascu…
#> 4 Jabba D… 175 1358 NA green-tan… orange 600 herm… mascu…
#> 5 Jek Ton… 180 110 brown fair blue NA NA NA
#> 6 IG-88 200 140 none metal red 15 none mascu…
#> 7 Bossk 190 113 none green red 53 male mascu…
#> 8 Dexter … 198 102 none brown yellow NA male mascu…
#> 9 Grievous 216 159 none brown, wh… green, y… NA male mascu…
#> 10 Tarfful 234 136 brown brown blue NA male mascu…
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> # vehicles <list>, starships <list>
# Whereas this keeps rows with `mass` greater than the per `gender`
# average:
starwars |> filter(mass > mean(mass, na.rm = TRUE), .by = gender)
#> # A tibble: 15 × 14
#> name height mass hair_color skin_color eye_color birth_year sex gender
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 Darth … 202 136 none white yellow 41.9 male mascu…
#> 2 Owen L… 178 120 brown, gr… light blue 52 male mascu…
#> 3 Beru W… 165 75 brown light blue 47 fema… femin…
#> 4 Chewba… 228 112 brown unknown blue 200 male mascu…
#> 5 Jabba … 175 1358 NA green-tan… orange 600 herm… mascu…
#> 6 Jek To… 180 110 brown fair blue NA NA NA
#> 7 IG-88 200 140 none metal red 15 none mascu…
#> 8 Bossk 190 113 none green red 53 male mascu…
#> 9 Ayla S… 178 55 none blue hazel 48 fema… femin…
#> 10 Gregar… 185 85 black dark brown NA NA NA
#> 11 Lumina… 170 56.2 black yellow blue 58 fema… femin…
#> 12 Zam We… 168 55 blonde fair, gre… yellow NA fema… femin…
#> 13 Shaak … 178 57 none red, blue… black NA fema… femin…
#> 14 Grievo… 216 159 none brown, wh… green, y… NA male mascu…
#> 15 Tarfful 234 136 brown brown blue NA male mascu…
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> # vehicles <list>, starships <list>
# If you find yourself trying to use a `filter()` to drop rows, then
# you should consider if switching to `filter_out()` can simplify your
# conditions. For example, to drop blond individuals, you might try:
starwars |> filter(hair_color != "blond")
#> # A tibble: 79 × 14
#> name height mass hair_color skin_color eye_color birth_year sex gender
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 Darth V… 202 136 none white yellow 41.9 male mascu…
#> 2 Leia Or… 150 49 brown light brown 19 fema… femin…
#> 3 Owen La… 178 120 brown, gr… light blue 52 male mascu…
#> 4 Beru Wh… 165 75 brown light blue 47 fema… femin…
#> 5 Biggs D… 183 84 black light brown 24 male mascu…
#> 6 Obi-Wan… 182 77 auburn, w… fair blue-gray 57 male mascu…
#> 7 Wilhuff… 180 NA auburn, g… fair blue 64 male mascu…
#> 8 Chewbac… 228 112 brown unknown blue 200 male mascu…
#> 9 Han Solo 180 80 brown fair brown 29 male mascu…
#> 10 Wedge A… 170 77 brown fair hazel 21 male mascu…
#> # ℹ 69 more rows
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> # vehicles <list>, starships <list>
# But this also drops rows with an `NA` hair color! To retain those:
starwars |> filter(hair_color != "blond" | is.na(hair_color))
#> # A tibble: 84 × 14
#> name height mass hair_color skin_color eye_color birth_year sex gender
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 C-3PO 167 75 NA gold yellow 112 none mascu…
#> 2 R2-D2 96 32 NA white, bl… red 33 none mascu…
#> 3 Darth V… 202 136 none white yellow 41.9 male mascu…
#> 4 Leia Or… 150 49 brown light brown 19 fema… femin…
#> 5 Owen La… 178 120 brown, gr… light blue 52 male mascu…
#> 6 Beru Wh… 165 75 brown light blue 47 fema… femin…
#> 7 R5-D4 97 32 NA white, red red NA none mascu…
#> 8 Biggs D… 183 84 black light brown 24 male mascu…
#> 9 Obi-Wan… 182 77 auburn, w… fair blue-gray 57 male mascu…
#> 10 Wilhuff… 180 NA auburn, g… fair blue 64 male mascu…
#> # ℹ 74 more rows
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> # vehicles <list>, starships <list>
# But explicit `NA` handling like this can quickly get unwieldy, especially
# with multiple conditions. Since your intent was to specify rows to drop
# rather than rows to keep, use `filter_out()`. This also removes the need
# for any explicit `NA` handling.
starwars |> filter_out(hair_color == "blond")
#> # A tibble: 84 × 14
#> name height mass hair_color skin_color eye_color birth_year sex gender
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 C-3PO 167 75 NA gold yellow 112 none mascu…
#> 2 R2-D2 96 32 NA white, bl… red 33 none mascu…
#> 3 Darth V… 202 136 none white yellow 41.9 male mascu…
#> 4 Leia Or… 150 49 brown light brown 19 fema… femin…
#> 5 Owen La… 178 120 brown, gr… light blue 52 male mascu…
#> 6 Beru Wh… 165 75 brown light blue 47 fema… femin…
#> 7 R5-D4 97 32 NA white, red red NA none mascu…
#> 8 Biggs D… 183 84 black light brown 24 male mascu…
#> 9 Obi-Wan… 182 77 auburn, w… fair blue-gray 57 male mascu…
#> 10 Wilhuff… 180 NA auburn, g… fair blue 64 male mascu…
#> # ℹ 74 more rows
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> # vehicles <list>, starships <list>
# To refer to column names that are stored as strings, use the `.data`
# pronoun:
vars <- c("mass", "height")
cond <- c(80, 150)
starwars |>
filter(
.data[[vars[[1]]]] > cond[[1]],
.data[[vars[[2]]]] > cond[[2]]
)
#> # A tibble: 21 × 14
#> name height mass hair_color skin_color eye_color birth_year sex gender
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 Darth V… 202 136 none white yellow 41.9 male mascu…
#> 2 Owen La… 178 120 brown, gr… light blue 52 male mascu…
#> 3 Biggs D… 183 84 black light brown 24 male mascu…
#> 4 Anakin … 188 84 blond fair blue 41.9 male mascu…
#> 5 Chewbac… 228 112 brown unknown blue 200 male mascu…
#> 6 Jabba D… 175 1358 NA green-tan… orange 600 herm… mascu…
#> 7 Jek Ton… 180 110 brown fair blue NA NA NA
#> 8 IG-88 200 140 none metal red 15 none mascu…
#> 9 Bossk 190 113 none green red 53 male mascu…
#> 10 Ackbar 180 83 none brown mot… orange 41 male mascu…
#> # ℹ 11 more rows
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> # vehicles <list>, starships <list>
# Learn more in ?rlang::args_data_masking
