R/assign_colnames.r
Many tables in Word documents are in twisted formats where there may be
labels or other oddities mixed in that make it difficult to work with the
underlying data. This function makes it easy to identify a particular row
in a scraped data.frame as the one containing column names and
have it become the column names, by default removing it and all of the
rows before it (since that's usually what needs to be done).
assign_colnames(dat, row, remove = TRUE, remove_previous = remove)

dat: can be any data.frame but is intended for use with
ones returned by this package
row: numeric value indicating the row number that is to become the column names
remove: remove the row specified by row after making it
the column names? (Default: TRUE)
remove_previous: remove any rows preceding row? (Default:
TRUE, but will be assigned whatever is given for
remove)

Value: data.frame
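
The behaviour described by the arguments above can be approximated in base R. The following is a minimal sketch, not the package's actual source: promote_row_to_names() and the raw data.frame are hypothetical names made up for illustration, and the only thing taken from the documentation is the argument list and its defaults (including remove_previous falling back to whatever remove is).

```r
# Hypothetical stand-in for illustration only; NOT the docxtractr implementation.
promote_row_to_names <- function(dat, row, remove = TRUE, remove_previous = remove) {
  # Use the values found in the chosen row as the new column names.
  names(dat) <- as.character(unlist(dat[row, ], use.names = FALSE))

  # Collect the indices of the rows that should be dropped.
  drop <- integer(0)
  if (remove) drop <- c(drop, row)
  if (remove_previous) drop <- c(drop, seq_len(row - 1))

  if (length(drop) > 0) dat <- dat[-drop, , drop = FALSE]

  # Renumber the remaining rows from 1.
  rownames(dat) <- NULL
  dat
}

# Made-up data: a stray label in row 1, the real header in row 2.
raw <- data.frame(
  V1 = c("Table 1: rates", "Country", "USA", "China"),
  V2 = c("", "Birthrate", "2.06", "1.62"),
  stringsAsFactors = FALSE
)

fixed <- promote_row_to_names(raw, 2)
print(fixed)
```

With the defaults, fixed keeps only the two data rows and its columns are named Country and Birthrate. Passing remove = FALSE renames the columns but keeps all four rows, because remove_previous then defaults to FALSE as well.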
library(docxtractr)

# a "real" Word doc
real_world <- read_docx(system.file("examples/realworld.docx", package="docxtractr"))
docx_tbl_count(real_world)
#> [1] 8
# get all the tables
tbls <- docx_extract_all_tbls(real_world)
# make table 1 better
assign_colnames(tbls[[1]], 2)
#> # A tibble: 7 × 9
#> Country Birthrate `Death Rate` `Population Growth 2005` Population Growth 20…¹
#> <chr> <chr> <chr> <chr> <chr>
#> 1 USA 2.06 0.51% 0.92% -0.06%
#> 2 China 1.62 0.3% 0.6% -0.58%
#> 3 Egypt 2.83 0.41% 2.0% 1.32%
#> 4 India 2.35 0.34% 1.56% 0.76%
#> 5 Italy 1.28 0.72% 0.35% -1.33%
#> 6 Mexico 2.43 0.25% 1.41% 0.96%
#> 7 Nigeria 4.78 0.26% 2.46% 3.58%
#> # ℹ abbreviated name: ¹`Population Growth 2050`
#> # ℹ 4 more variables: `Relative place in Transition` <chr>,
#> # `Social Factors 1` <chr>, `Social Factors 2` <chr>,
#> # `Social Factors 3` <chr>
# make table 5 better
assign_colnames(tbls[[5]], 2)
#> # A tibble: 3 × 6
#> Nigeria Default Prediction `+ 5 years` `+15 years` `-5 years`
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Birth rate 4.78 Goes Down 4.76 4.72 4.79
#> 2 Death rate 0.36% Stay the Same 0.42% 0.52% 0.3%
#> 3 Population growth 3.58% Goes Down 3.02% 2.32% 4.38%