Given a document read with read_docx and the number of the table to extract, return the contents of that table as a data.frame. You can optionally indicate whether the table has a header row and whether cell whitespace should be trimmed.

docx_extract_tbl(
  docx,
  tbl_number = 1,
  header = TRUE,
  preserve = FALSE,
  trim = TRUE
)

Arguments

docx

docx object read with read_docx

tbl_number

which table to extract (defaults to 1)

header

assume first row of table is a header row? (default: TRUE)

preserve

preserve line breaks within a cell? Default: FALSE. NOTE: This overrides trim.

trim

trim leading/trailing whitespace (if any) in cells? (default: TRUE)

Value

A data.frame holding the contents of the table (the examples below print it as a tibble).

Examples

doc3 <- read_docx(system.file("examples/data3.docx", package="docxtractr"))
docx_extract_tbl(doc3, 3)
#> # A tibble: 6 × 2
#>   Foo   Bar  
#>   <chr> <chr>
#> 1 Aa    Bb   
#> 2 Dd    Ee   
#> 3 Gg    Hh   
#> 4 1     2    
#> 5 Zz    Jj   
#> 6 Tt    ii   
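
The output above shows every column as <chr>, including the row that holds 1 and 2. If typed columns are needed, one option is base R's type.convert. The sketch below is only an illustration: `cells` is a hypothetical stand-in for an extracted table, not data from the package, and the conversion is not something docx_extract_tbl is documented to do.

```r
# Hypothetical stand-in for an extracted table: all columns are character
cells <- data.frame(
  id    = c("1", "2", "3"),
  label = c("Aa", "Dd", "Gg"),
  stringsAsFactors = FALSE
)

# Let base R guess a type per column; as.is = TRUE keeps strings as
# character instead of turning them into factors
typed <- utils::type.convert(cells, as.is = TRUE)

str(typed)
```

A column that mixes text and numbers, like Foo and Bar above, stays character after conversion.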

intracell_whitespace <- read_docx(system.file("examples/preserve.docx", package="docxtractr"))
docx_extract_tbl(intracell_whitespace, 2, preserve=FALSE)
#> # A tibble: 2 × 4
#>   X     Kite  Lemur      Madagascar
#>   <chr> <chr> <chr>      <chr>     
#> 1 Nanny Open  Port       Quarter   
#> 2 Rain  Sand  Television Unicorn   
docx_extract_tbl(intracell_whitespace, 2, preserve=TRUE)
#> # A tibble: 2 × 4
#>   X     Kite  Lemur      Madagascar
#>   <chr> <chr> <chr>      <chr>     
#> 1 Nanny Open  Port       Quarter   
#> 2 Rain  Sand  Television Unicorn
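
Because tbl_number must refer to a table that exists in the document, it can help to check the table count before extracting. A minimal sketch, assuming docxtractr is installed: `extract_tbl_checked` is a hypothetical helper written for this example (it is not part of the package), and it relies on docx_tbl_count from the same package to report how many tables the document holds.

```r
library(docxtractr)

# Sample file shipped with docxtractr (the same file used earlier)
doc3 <- read_docx(system.file("examples/data3.docx", package = "docxtractr"))

# Hypothetical helper, not part of docxtractr: validate the table index,
# then pass any remaining arguments through to docx_extract_tbl()
extract_tbl_checked <- function(docx, tbl_number = 1, ...) {
  n <- docx_tbl_count(docx)
  if (tbl_number < 1 || tbl_number > n) {
    stop(sprintf("tbl_number must be between 1 and %s, got %s", n, tbl_number))
  }
  docx_extract_tbl(docx, tbl_number, ...)
}

# First row treated as the header (the default)
tbl3 <- extract_tbl_checked(doc3, 3)

# First row kept as data instead of being used for column names
raw3 <- extract_tbl_checked(doc3, 3, header = FALSE)
```

Passing the extra arguments through `...` keeps header, preserve, and trim available without repeating their defaults in the wrapper.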