Given a document read with read_docx and the number of the table to extract,
extract the contents of that table into a data.frame. Optionally indicate
whether the table has a header row and whether cell whitespace should be trimmed.
docx_extract_tbl(
docx,
tbl_number = 1,
header = TRUE,
preserve = FALSE,
trim = TRUE
)
docx: docx object read with read_docx
tbl_number: which table to extract (default: 1)
header: assume first row of table is a header row? (default: TRUE)
preserve: preserve line breaks within a cell? (default: FALSE) NOTE: This overrides trim.
trim: trim leading/trailing whitespace (if any) in cells? (default: TRUE)
Returns a data.frame.
See also: docx_extract_all, docx_extract_tbl, assign_colnames
doc3 <- read_docx(system.file("examples/data3.docx", package="docxtractr"))
docx_extract_tbl(doc3, 3)
#> # A tibble: 6 × 2
#>   Foo   Bar
#>   <chr> <chr>
#> 1 Aa    Bb
#> 2 Dd    Ee
#> 3 Gg    Hh
#> 4 1     2
#> 5 Zz    Jj
#> 6 Tt    ii
intracell_whitespace <- read_docx(system.file("examples/preserve.docx", package="docxtractr"))
docx_extract_tbl(intracell_whitespace, 2, preserve=FALSE)
#> # A tibble: 2 × 4
#>   X     Kite  Lemur      Madagascar
#>   <chr> <chr> <chr>      <chr>
#> 1 Nanny Open  Port       Quarter
#> 2 Rain  Sand  Television Unicorn
docx_extract_tbl(intracell_whitespace, 2, preserve=TRUE)
#> # A tibble: 2 × 4
#>   X     Kite  Lemur      Madagascar
#>   <chr> <chr> <chr>      <chr>
#> 1 Nanny Open  Port       Quarter
#> 2 Rain  Sand  Television Unicorn
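When the header row is not the first row, or you would rather choose it yourself, one option is to extract with header = FALSE and promote a row to column names afterwards with assign_colnames. The following is a minimal sketch, not part of the original examples. It assumes the docxtractr package is installed and that table 3 of the bundled data3.docx is the Foo/Bar table shown above; the variable names raw and named are illustrative only.

```r
library(docxtractr)

doc3 <- read_docx(system.file("examples/data3.docx", package = "docxtractr"))

# Guard against asking for a table number the document does not have
stopifnot(docx_tbl_count(doc3) >= 3)

# Extract without treating the first row as a header; the header text
# stays in the data as an ordinary first row
raw <- docx_extract_tbl(doc3, 3, header = FALSE)

# Promote row 1 to column names and drop that row from the data
named <- assign_colnames(raw, 1)

names(named)
```

Extracting the raw rows first keeps the choice of header row explicit, which can help with tables that have title or spacer rows above the real header.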