Efficient rbind of data frames, even if the column names don't match

smartbind(..., list, fill = NA, sep = ":", verbose = FALSE)

Arguments

...

Data frames to combine

list

List containing data frames to combine

fill

Value to use when 'filling' missing columns. Defaults to NA.

sep

Character string used to separate column names when pasting them together.

verbose

Logical flag indicating whether to display processing messages. Defaults to FALSE.

Value

The returned data frame will contain:

columns

all columns present in any provided data frame

rows

a set of rows from each provided data frame, with values in columns not present in the given data frame filled with the fill value (NA by default).
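
The filling behaviour described above can be illustrated with a small base-R sketch. This is not the function's actual implementation: `pad_columns` is a hypothetical helper introduced here only for illustration, and the sketch ignores row naming and class-mismatch handling.

```r
# Hypothetical helper (not part of any package): add every column that is
# missing from `df`, filled with `fill`, and return the columns in a fixed order.
pad_columns <- function(df, all_cols, fill = NA) {
  for (col in setdiff(all_cols, names(df))) {
    df[[col]] <- fill  # a single value is recycled to every row
  }
  df[, all_cols, drop = FALSE]
}

df1 <- data.frame(A = 1:2, B = c("x", "y"))
df2 <- data.frame(A = 3:4, D = c(0.5, 1.5))

# The result holds every column present in any input.
all_cols <- union(names(df1), names(df2))

# Once both inputs share the same columns, a plain rbind() succeeds.
combined <- rbind(pad_columns(df1, all_cols), pad_columns(df2, all_cols))
print(combined)
```

Padding each input to the union of column names first is what lets an ordinary row bind go through without a name-mismatch error.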

The data type of a column is preserved as long as all data frames containing that column name agree on its type. If the data frames disagree, the column is converted to character strings and a warning is issued. The user will then need to coerce such character columns to an appropriate type.
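
As a rough illustration of that final coercion step, the sketch below converts a character column back to numeric using base R only. The data frame `bound` and its column names are made up for the example; they stand in for a result whose column `x` was converted to character.

```r
# Assumed scenario: column 'x' ended up as character because the
# inputs disagreed on its class.
bound <- data.frame(id = 1:3, x = c("1.5", "2", "oops"),
                    stringsAsFactors = FALSE)

# as.numeric() returns NA (with a warning) for strings that are not numbers,
# so the warning is suppressed here and the failures are flagged explicitly.
x_num <- suppressWarnings(as.numeric(bound$x))
bad <- is.na(x_num) & !is.na(bound$x)

bound$x <- x_num
print(bound[bad, , drop = FALSE])  # rows whose value could not be converted
```

Checking `bad` before overwriting the column is one way to avoid silently turning unparseable entries into NA.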

Author

Gregory R. Warnes greg@warnes.net

Examples



df1 <- data.frame(A = 1:10, B = LETTERS[1:10], C = rnorm(10))
df2 <- data.frame(A = 11:20, D = rnorm(10), E = letters[1:10])

# rbind would fail
if (FALSE) { # not run
  rbind(df1, df2)
  # Error in match.names(clabs, names(xi)) : names do not match previous
  # names:
  #   D, E
}
# but smartbind combines them, appropriately creating NA entries
smartbind(df1, df2)
#>       A    B          C           D    E
#> 1:1   1    A -1.6176047          NA <NA>
#> 1:2   2    B -0.7237319          NA <NA>
#> 1:3   3    C  0.3067410          NA <NA>
#> 1:4   4    D  0.2255962          NA <NA>
#> 1:5   5    E  0.9357160          NA <NA>
#> 1:6   6    F  0.4424049          NA <NA>
#> 1:7   7    G  0.4545190          NA <NA>
#> 1:8   8    H -0.9620740          NA <NA>
#> 1:9   9    I -1.1324652          NA <NA>
#> 1:10 10    J -0.6003270          NA <NA>
#> 2:1  11 <NA>         NA -1.77506105    a
#> 2:2  12 <NA>         NA -0.09171419    b
#> 2:3  13 <NA>         NA -0.23262573    c
#> 2:4  14 <NA>         NA -0.51310927    d
#> 2:5  15 <NA>         NA  0.18558859    e
#> 2:6  16 <NA>         NA -1.43162311    f
#> 2:7  17 <NA>         NA -1.89864479    g
#> 2:8  18 <NA>         NA  0.57123723    h
#> 2:9  19 <NA>         NA -0.97562605    i
#> 2:10 20 <NA>         NA -0.87647620    j

# specify fill=0 to put 0 into the missing row entries
smartbind(df1, df2, fill = 0)
#>       A B          C           D E
#> 1:1   1 A -1.6176047  0.00000000 0
#> 1:2   2 B -0.7237319  0.00000000 0
#> 1:3   3 C  0.3067410  0.00000000 0
#> 1:4   4 D  0.2255962  0.00000000 0
#> 1:5   5 E  0.9357160  0.00000000 0
#> 1:6   6 F  0.4424049  0.00000000 0
#> 1:7   7 G  0.4545190  0.00000000 0
#> 1:8   8 H -0.9620740  0.00000000 0
#> 1:9   9 I -1.1324652  0.00000000 0
#> 1:10 10 J -0.6003270  0.00000000 0
#> 2:1  11 0  0.0000000 -1.77506105 a
#> 2:2  12 0  0.0000000 -0.09171419 b
#> 2:3  13 0  0.0000000 -0.23262573 c
#> 2:4  14 0  0.0000000 -0.51310927 d
#> 2:5  15 0  0.0000000  0.18558859 e
#> 2:6  16 0  0.0000000 -1.43162311 f
#> 2:7  17 0  0.0000000 -1.89864479 g
#> 2:8  18 0  0.0000000  0.57123723 h
#> 2:9  19 0  0.0000000 -0.97562605 i
#> 2:10 20 0  0.0000000 -0.87647620 j
#> Warning: Column class mismatch for 'C'. Converting column to class 'character'.
#> Warning: Column class mismatch for 'B'. Converting column to class 'character'.
#> Warning: Column class mismatch for 'E'. Converting column to class 'character'.
#> Warning: Column class mismatch for 'E'. Converting column to class 'character'.
#> Warning: Column class mismatch for 'B'. Converting column to class 'character'.
#> Warning: Column class mismatch for 'C'. Converting column to class 'character'.
#> Warning: Column class mismatch for 'B'. Converting column to class 'character'.
#> Warning: Column class mismatch for 'E'. Converting column to class 'character'.
#> Warning: Column class mismatch for 'E'. Converting column to class 'character'.
#> Warning: Column class mismatch for 'B'. Converting column to class 'character'.
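
The `list` argument from the usage line is not exercised in the examples above. The following is a minimal sketch of how it might be used. It assumes that `smartbind` is the function exported by the gtools package and that the package is installed; the `requireNamespace()` guard and the variable names are additions for this example.

```r
# Assumption: smartbind() is provided by the 'gtools' package.
if (requireNamespace("gtools", quietly = TRUE)) {
  # Data frames collected in a list, e.g. produced by lapply()
  parts <- list(
    data.frame(A = 1:2, B = c("x", "y")),
    data.frame(A = 3:4, D = c(0.5, 1.5))
  )

  # Pass the whole list through the 'list' argument instead of '...'
  combined <- gtools::smartbind(list = parts)
  print(combined)
}
```

Passing a list is convenient when the number of data frames is not known in advance.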