R/add-cols.R
add_miss_cluster.Rd
A way to extract the cluster of missingness that each row belongs to.
For example, if you use vis_miss(airquality, cluster = TRUE), you can
see some clustering in the data, but you have no way to identify which
cluster a given row belongs to. Future work will incorporate the
seriation package to give users finer control over the clustering.
add_miss_cluster(data, cluster_method = "mcquitty", n_clusters = 2)

data: a dataframe.
cluster_method: character string naming the agglomeration method to use;
the default is "mcquitty". Options are taken from the stats::hclust
help file and include "ward.D", "ward.D2", "single", "complete",
"average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) and
"centroid" (= UPGMC).
n_clusters: numeric, the number of clusters you expect. Defaults to 2.
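How cluster_method and n_clusters might interact can be sketched in base R. This is a minimal sketch under stated assumptions, not naniar's implementation: it assumes rows are clustered on their missing/observed pattern with stats::dist and stats::hclust, and that stats::cutree then splits the tree into n_clusters groups. The helper name miss_cluster_sketch is hypothetical.

```r
# Hypothetical helper: a sketch of how a "missingness cluster" column could be
# built. Assumption: rows are clustered on their NA/observed pattern only.
miss_cluster_sketch <- function(data,
                                cluster_method = "mcquitty",
                                n_clusters = 2) {
  stopifnot(is.data.frame(data))
  # 1 where a value is missing, 0 where it is observed (one row per observation)
  miss_matrix <- is.na(data) * 1
  # Euclidean distance between the rows' missingness patterns
  miss_dist <- stats::dist(miss_matrix)
  # Agglomerative clustering with the chosen linkage (any stats::hclust method)
  tree <- stats::hclust(miss_dist, method = cluster_method)
  # Cut the tree into the requested number of groups (integer labels 1..k)
  data$miss_cluster <- stats::cutree(tree, k = n_clusters)
  data
}

res <- miss_cluster_sketch(airquality, cluster_method = "ward.D", n_clusters = 3)
head(res)
```

Because rows with identical missingness patterns are at distance zero, they merge before anything else and therefore always land in the same cluster; n_clusters only decides how the distinct patterns are grouped. The labels produced this way are not guaranteed to match add_miss_cluster() exactly.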
add_miss_cluster(airquality)
#> # A tibble: 153 × 7
#>    Ozone Solar.R  Wind  Temp Month   Day miss_cluster
#>    <int>   <int> <dbl> <int> <int> <int>        <int>
#>  1    41     190   7.4    67     5     1            1
#>  2    36     118   8      72     5     2            1
#>  3    12     149  12.6    74     5     3            1
#>  4    18     313  11.5    62     5     4            1
#>  5    NA      NA  14.3    56     5     5            2
#>  6    28      NA  14.9    66     5     6            1
#>  7    23     299   8.6    65     5     7            1
#>  8    19      99  13.8    59     5     8            1
#>  9     8      19  20.1    61     5     9            1
#> 10    NA     194   8.6    69     5    10            2
#> # ℹ 143 more rows
add_miss_cluster(airquality, n_clusters = 3)
#> # A tibble: 153 × 7
#>    Ozone Solar.R  Wind  Temp Month   Day miss_cluster
#>    <int>   <int> <dbl> <int> <int> <int>        <int>
#>  1    41     190   7.4    67     5     1            1
#>  2    36     118   8      72     5     2            1
#>  3    12     149  12.6    74     5     3            1
#>  4    18     313  11.5    62     5     4            1
#>  5    NA      NA  14.3    56     5     5            2
#>  6    28      NA  14.9    66     5     6            1
#>  7    23     299   8.6    65     5     7            1
#>  8    19      99  13.8    59     5     8            1
#>  9     8      19  20.1    61     5     9            1
#> 10    NA     194   8.6    69     5    10            3
#> # ℹ 143 more rows
add_miss_cluster(airquality, cluster_method = "ward.D", n_clusters = 3)
#> # A tibble: 153 × 7
#>    Ozone Solar.R  Wind  Temp Month   Day miss_cluster
#>    <int>   <int> <dbl> <int> <int> <int>        <int>
#>  1    41     190   7.4    67     5     1            1
#>  2    36     118   8      72     5     2            1
#>  3    12     149  12.6    74     5     3            1
#>  4    18     313  11.5    62     5     4            1
#>  5    NA      NA  14.3    56     5     5            2
#>  6    28      NA  14.9    66     5     6            2
#>  7    23     299   8.6    65     5     7            1
#>  8    19      99  13.8    59     5     8            1
#>  9     8      19  20.1    61     5     9            1
#> 10    NA     194   8.6    69     5    10            3
#> # ℹ 143 more rows
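Once each row has a cluster label, a natural follow-up is to ask what distinguishes the clusters, for example the share of missing values in each variable within each cluster. The base-R sketch below does this without calling naniar: it assumes the labels come from stats::hclust on the rows' missingness indicators followed by stats::cutree (here with "ward.D" and 3 clusters, mirroring the last example), so its labels may not line up exactly with add_miss_cluster()'s output.

```r
# Assumption: clusters come from hierarchical clustering of NA indicators,
# mirroring (not reproducing) add_miss_cluster(cluster_method = "ward.D", n_clusters = 3).
miss_matrix <- is.na(airquality) * 1
clusters <- stats::cutree(
  stats::hclust(stats::dist(miss_matrix), method = "ward.D"),
  k = 3
)

# Number of rows assigned to each cluster
print(table(clusters))

# Proportion of missing values in each variable, by cluster
miss_profile <- aggregate(miss_matrix, by = list(miss_cluster = clusters), FUN = mean)
print(miss_profile)
```

A profile like this makes the clusters interpretable: in airquality only Ozone and Solar.R contain missing values, so every cluster shows 0 for the other variables and differs only in how often those two are missing.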