Importing csv files into ff data.frames

Function read.table.ffdf reads separated flat files into ffdf objects, very much like (and using) read.table. It can also work with convenience wrappers such as read.csv, and provides matching wrappers of its own (e.g. read.csv.ffdf) for R's usual wrappers.

read.table.ffdf(
  x = NULL
, file, fileEncoding = ""
, nrows = -1, first.rows = NULL, next.rows = NULL
, levels = NULL, appendLevels = TRUE
, FUN = "read.table", ...
, transFUN = NULL
, asffdf_args = list()
, BATCHBYTES = getOption("ffbatchbytes")
, VERBOSE = FALSE
)
read.csv.ffdf(...)
read.csv2.ffdf(...)
read.delim.ffdf(...)
read.delim2.ffdf(...)

Arguments

x

NULL or an optional ffdf object to which the read records are appended. If this is provided, it defines crucial features that are otherwise determined during the 'first' chunk of reading: vmodes, colnames, colClasses, sequence of predefined levels.

file

the name of the file which the data are to be read from. Each row of the table appears as one line of the file. If it does not contain an absolute path, the file name is relative to the current working directory, getwd(). Tilde-expansion is performed where supported.

Alternatively, file can be a readable text-mode connection (which will be opened for reading if necessary, and if so closed (and hence destroyed) at the end of the function call).

fileEncoding

character string: if non-empty declares the encoding used on a file (not a connection) so the character data can be re-encoded. See file.

nrows

integer: the maximum number of rows to read in (includes first.rows in case a 'first' chunk is read). Negative and other invalid values are ignored.

first.rows

integer: number of rows to be read in the first chunk, see details. Default is the value given at next.rows or 1e3 otherwise. Ignored if x is given.

next.rows

integer: number of rows to be read in further chunks, see details. By default calculated as BATCHBYTES %/% sum(.rambytes[vmode(x)])

levels

NULL or an optional list; each element, named with the col.names of a factor column, specifies the levels of that column. Ignored if x is given.

appendLevels

logical. A vector of permissions to expand levels for factor columns. Recycled as necessary, or if the logical vector is named, unspecified values are taken to be TRUE. Ignored during processing of the 'first' chunk.

FUN

character: name of a function that is called for reading each chunk, see read.table, read.csv, etc.

...

further arguments, passed to FUN in read.table.ffdf, or passed to read.table.ffdf in the convenience wrappers

transFUN

NULL or a function that is called on each data.frame chunk after reading with FUN and before further processing (for filtering, transformations etc.)

asffdf_args

further arguments passed to as.ffdf when converting the data.frame of the first chunk to ffdf. Ignored if x is given.

BATCHBYTES

integer: bytes allowed for the size of the data.frame storing the result of reading one chunk. Default getOption("ffbatchbytes").

VERBOSE

logical: TRUE to report timings for each processed chunk (default FALSE)
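
As a rough illustration of the transFUN argument described above, the sketch below defines a filter that could be applied to each chunk. The function name keep_even_int and the inline sample data are hypothetical. The function is applied here to a plain data.frame only to show its effect; passing it as transFUN to read.csv.ffdf is the intended use and appears as a comment because it needs package ff and a csv file.

```r
# Hypothetical per-chunk filter: keep only rows with an even 'int'.
# A transFUN receives one data.frame chunk and returns a data.frame.
keep_even_int <- function(chunk) {
  chunk[chunk$int %% 2L == 0L, , drop = FALSE]
}

# Stand-in for one chunk as FUN would deliver it
chunk <- read.csv(text = "int,dbl\n1,1.1\n2,2.1\n3,3.1\n4,4.1")
res <- keep_even_int(chunk)
res

# Intended use (not run here; assumes package ff and an existing csvfile):
# ffy <- read.csv.ffdf(file = csvfile, header = TRUE, transFUN = keep_even_int)
```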

Details

read.table.ffdf has been designed to read very large (many rows) separated flat files in row-chunks and store the result in an ffdf object on disk that is nevertheless quickly accessible via ff techniques.
The first chunk is read with a default of 1000 rows; for subsequent chunks the number of rows is calculated so that no more RAM than getOption("ffbatchbytes") is required. The following could be reasons to change the parameter first.rows:

set first.rows=-1 to read the complete file in one go (requires enough RAM)
set first.rows to a smaller number if the pre-allocation of RAM for the first chunk (parameter nrows in read.table) is too large, e.g. with many columns on a machine with little RAM.
set first.rows to a larger number if you expect better factor level ordering (factor levels are sorted in the first chunk but not in subsequent chunks; however, factor level ordering can be fixed later, see below).
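
The sizing rule for subsequent chunks can be illustrated with a small calculation. This is a sketch only: the per-element RAM sizes below are the sizes base R uses for logical, integer and double vectors and merely stand in for ff's .rambytes table, the column vmodes are those reported in the Examples section when colClasses are given, and the batch size of 16 MB is a hypothetical value for getOption("ffbatchbytes").

```r
# Assumed RAM bytes per element (stand-in for ff's .rambytes)
ram_bytes <- c(logical = 4, integer = 4, double = 8)

# vmodes of the seven example columns when read with colClasses
col_vmodes <- c(log = "logical", int = "integer", dbl = "double",
                fac = "integer", ord = "integer",
                dct = "double", dat = "double")

batch_bytes <- 16 * 1024^2   # hypothetical value of getOption("ffbatchbytes")

# RAM needed per row, and the number of rows that fit into one batch
bytes_per_row <- sum(ram_bytes[col_vmodes])
next_rows <- batch_bytes %/% bytes_per_row
bytes_per_row
next_rows
```

Under these assumptions a wider table (more or larger columns) lowers next_rows proportionally, which is why the default chunk size adapts to the column types.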

By default the ffdf object is created on the fly at the end of reading the 'first' chunk, see argument first.rows. The creation of the ffdf object is done via as.ffdf and can be fine-tuned by passing argument asffdf_args. Even more control is possible by passing in an ffdf object as argument x to which the read records are appended.
read.table.ffdf has been designed to behave as much like read.table as possible. However, note the following differences:

Arguments 'colClasses' and 'col.names' are now enforced also during 'next.rows' chunks. For example, giving colClasses=NA ensures that no colClasses are derived from the first.rows chunk or from the ffdf object in parameter x.
colClass 'ordered' is allowed and will create an ordered factor.
character vectors are not supported; character data must be read as one of the following colClasses: 'Date', 'POSIXct', 'factor', 'ordered'. By default character columns are read as factors. Accordingly, arguments 'as.is' and 'stringsAsFactors' are not allowed.
the sequence of levels.ff from chunked reading can depend on chunk size: by default, new levels found in a chunk are appended to the levels found in previous chunks, and no attempt is made to sort and recode the levels during chunked processing. Levels can be sorted and recoded most efficiently after all records have been read, using sortLevels.
the default for argument 'comment.char' is "" even for those FUN that have a different default. However, explicit specification of 'comment.char' will have priority.
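
The dependence of level order on chunk size can be mimicked in base R. The sketch below does not call ff; it assumes that levels are sorted within each chunk and that previously unseen levels are appended, as described above. The helper name chunked_levels is hypothetical.

```r
# Hypothetical helper: collect factor levels chunk by chunk.
# Levels are sorted within a chunk; unseen levels are appended at the end.
chunked_levels <- function(values, first.rows, next.rows) {
  n <- length(values)
  lev <- character(0)
  start <- 1L
  size <- first.rows
  while (start <= n) {
    end <- min(start + size - 1L, n)
    chunk_lev <- sort(unique(values[start:end]))
    lev <- c(lev, setdiff(chunk_lev, lev))
    start <- end + 1L
    size <- next.rows
  }
  lev
}

# Same 'fac' values as in the Examples section: m..a repeated twice
fac <- rep(letters[13:1], 2)

lev_chunked <- chunked_levels(fac, first.rows = 6, next.rows = 10)
lev_one_go  <- chunked_levels(fac, first.rows = 26, next.rows = 26)
lev_chunked
lev_one_go
sort(lev_chunked)   # sorting afterwards restores alphabetical order
```

With first.rows = 6 the first chunk only sees "h" to "m", so these come first and "a" to "g" follow, which is the order the chunked read in the Examples section reports before sortLevels is applied.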

Note

Note that using the 'skip' argument still requires reading the file from the beginning in order to count the lines to be skipped. If you first read part of the file in order to understand its structure and then want to continue, a more efficient solution than using 'skip' is to open a file connection and pass it to argument 'file'. read.table.ffdf does the same in order to skip efficiently over previously read chunks.
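
The connection-based approach can be sketched with base R alone. The file below is a throwaway temporary file and the chunk sizes are arbitrary; the point is that an open connection remembers its position, so the second read continues where the first one stopped without re-scanning the file.

```r
# Write a small csv file to read back in two steps
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(int = 1:10, chr = letters[1:10]), tmp, row.names = FALSE)

con <- file(tmp, open = "r")   # opened explicitly, so it stays open between reads

# First read: header plus the first 4 records
head_part <- read.csv(con, header = TRUE, nrows = 4)

# Second read continues at record 5; no 'skip' needed
rest_part <- read.csv(con, header = FALSE, col.names = names(head_part))

close(con)
unlink(tmp)

nrow(head_part)
nrow(rest_part)
```

The same open connection could presumably be passed as the file argument of read.table.ffdf, since the argument description allows a readable text-mode connection; that step is left out here because it requires package ff.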

Value

An ffdf object. If created during the 'first' chunk pass, it will have one physical component per virtual column.

Author

Jens Oehlschlägel, Christophe Dutang

Examples

 message("create some csv data on disk")
#> create some csv data on disk
 x <- data.frame(
   log=rep(c(FALSE, TRUE), length.out=26)
 , int=1:26
 , dbl=1:26 + 0.1
 , fac=factor(letters)
 , ord=ordered(LETTERS)
 , dct=Sys.time()+1:26
 , dat=seq(as.Date("1910/1/1"), length.out=26, by=1)
 , stringsAsFactors = TRUE
 )
 x <- x[c(13:1, 13:1),]
 csvfile <- tempPathFile(path=getOption("fftempdir"), extension="csv")
 write.csv(x, file=csvfile, row.names=FALSE)
 
 cat("Simply read csv with header\n")
#> Simply read csv with header
 y <- read.csv(file=csvfile, header=TRUE)
 y
#>      log int  dbl fac ord                        dct        dat
#> 1  FALSE  13 13.1   m   M 2025-03-17 22:36:08.015253 1910-01-13
#> 2   TRUE  12 12.1   l   L 2025-03-17 22:36:07.015253 1910-01-12
#> 3  FALSE  11 11.1   k   K 2025-03-17 22:36:06.015253 1910-01-11
#> 4   TRUE  10 10.1   j   J 2025-03-17 22:36:05.015253 1910-01-10
#> 5  FALSE   9  9.1   i   I 2025-03-17 22:36:04.015253 1910-01-09
#> 6   TRUE   8  8.1   h   H 2025-03-17 22:36:03.015253 1910-01-08
#> 7  FALSE   7  7.1   g   G 2025-03-17 22:36:02.015253 1910-01-07
#> 8   TRUE   6  6.1   f   F 2025-03-17 22:36:01.015253 1910-01-06
#> 9  FALSE   5  5.1   e   E 2025-03-17 22:36:00.015253 1910-01-05
#> 10  TRUE   4  4.1   d   D 2025-03-17 22:35:59.015253 1910-01-04
#> 11 FALSE   3  3.1   c   C 2025-03-17 22:35:58.015253 1910-01-03
#> 12  TRUE   2  2.1   b   B 2025-03-17 22:35:57.015253 1910-01-02
#> 13 FALSE   1  1.1   a   A 2025-03-17 22:35:56.015253 1910-01-01
#> 14 FALSE  13 13.1   m   M 2025-03-17 22:36:08.015253 1910-01-13
#> 15  TRUE  12 12.1   l   L 2025-03-17 22:36:07.015253 1910-01-12
#> 16 FALSE  11 11.1   k   K 2025-03-17 22:36:06.015253 1910-01-11
#> 17  TRUE  10 10.1   j   J 2025-03-17 22:36:05.015253 1910-01-10
#> 18 FALSE   9  9.1   i   I 2025-03-17 22:36:04.015253 1910-01-09
#> 19  TRUE   8  8.1   h   H 2025-03-17 22:36:03.015253 1910-01-08
#> 20 FALSE   7  7.1   g   G 2025-03-17 22:36:02.015253 1910-01-07
#> 21  TRUE   6  6.1   f   F 2025-03-17 22:36:01.015253 1910-01-06
#> 22 FALSE   5  5.1   e   E 2025-03-17 22:36:00.015253 1910-01-05
#> 23  TRUE   4  4.1   d   D 2025-03-17 22:35:59.015253 1910-01-04
#> 24 FALSE   3  3.1   c   C 2025-03-17 22:35:58.015253 1910-01-03
#> 25  TRUE   2  2.1   b   B 2025-03-17 22:35:57.015253 1910-01-02
#> 26 FALSE   1  1.1   a   A 2025-03-17 22:35:56.015253 1910-01-01
 cat("Read csv with header\n")
#> Read csv with header
 ffy <- read.csv.ffdf(file=csvfile, header=TRUE)
 ffy
#> ffdf (all open) dim=c(26,7), dimorder=c(1,2) row.names=NULL
#> ffdf virtual mapping
#>     PhysicalName VirtualVmode PhysicalVmode  AsIs VirtualIsMatrix
#> log          log      logical       logical FALSE           FALSE
#> int          int      integer       integer FALSE           FALSE
#> dbl          dbl       double        double FALSE           FALSE
#> fac          fac      integer       integer FALSE           FALSE
#> ord          ord      integer       integer FALSE           FALSE
#> dct          dct      integer       integer FALSE           FALSE
#> dat          dat      integer       integer FALSE           FALSE
#>     PhysicalIsMatrix PhysicalElementNo PhysicalFirstCol PhysicalLastCol
#> log            FALSE                 1                1               1
#> int            FALSE                 2                1               1
#> dbl            FALSE                 3                1               1
#> fac            FALSE                 4                1               1
#> ord            FALSE                 5                1               1
#> dct            FALSE                 6                1               1
#> dat            FALSE                 7                1               1
#>     PhysicalIsOpen
#> log           TRUE
#> int           TRUE
#> dbl           TRUE
#> fac           TRUE
#> ord           TRUE
#> dct           TRUE
#> dat           TRUE
#> ffdf data
#>                           log                        int
#> 1  FALSE                      13                        
#> 2  TRUE                       12                        
#> 3  FALSE                      11                        
#> 4  TRUE                       10                        
#> 5  FALSE                       9                        
#> 6  TRUE                        8                        
#> 7  FALSE                       7                        
#> 8  TRUE                        6                        
#> :                           :                          :
#> 19 TRUE                        8                        
#> 20 FALSE                       7                        
#> 21 TRUE                        6                        
#> 22 FALSE                       5                        
#> 23 TRUE                        4                        
#> 24 FALSE                       3                        
#> 25 TRUE                        2                        
#> 26 FALSE                       1                        
#>                           dbl                        fac
#> 1  13.1                       m                         
#> 2  12.1                       l                         
#> 3  11.1                       k                         
#> 4  10.1                       j                         
#> 5   9.1                       i                         
#> 6   8.1                       h                         
#> 7   7.1                       g                         
#> 8   6.1                       f                         
#> :                           :                          :
#> 19  8.1                       h                         
#> 20  7.1                       g                         
#> 21  6.1                       f                         
#> 22  5.1                       e                         
#> 23  4.1                       d                         
#> 24  3.1                       c                         
#> 25  2.1                       b                         
#> 26  1.1                       a                         
#>                           ord                        dct
#> 1  M                          2025-03-17 22:36:08.015253
#> 2  L                          2025-03-17 22:36:07.015253
#> 3  K                          2025-03-17 22:36:06.015253
#> 4  J                          2025-03-17 22:36:05.015253
#> 5  I                          2025-03-17 22:36:04.015253
#> 6  H                          2025-03-17 22:36:03.015253
#> 7  G                          2025-03-17 22:36:02.015253
#> 8  F                          2025-03-17 22:36:01.015253
#> :                           :                          :
#> 19 H                          2025-03-17 22:36:03.015253
#> 20 G                          2025-03-17 22:36:02.015253
#> 21 F                          2025-03-17 22:36:01.015253
#> 22 E                          2025-03-17 22:36:00.015253
#> 23 D                          2025-03-17 22:35:59.015253
#> 24 C                          2025-03-17 22:35:58.015253
#> 25 B                          2025-03-17 22:35:57.015253
#> 26 A                          2025-03-17 22:35:56.015253
#>                           dat
#> 1  1910-01-13                
#> 2  1910-01-12                
#> 3  1910-01-11                
#> 4  1910-01-10                
#> 5  1910-01-09                
#> 6  1910-01-08                
#> 7  1910-01-07                
#> 8  1910-01-06                
#> :                           :
#> 19 1910-01-08                
#> 20 1910-01-07                
#> 21 1910-01-06                
#> 22 1910-01-05                
#> 23 1910-01-04                
#> 24 1910-01-03                
#> 25 1910-01-02                
#> 26 1910-01-01                
 sapply(ffy[,], class)
#>       log       int       dbl       fac       ord       dct       dat 
#> "logical" "integer" "numeric"  "factor"  "factor"  "factor"  "factor" 
 
 message("reading with colClasses (an ordered factor won't work in read.csv)")
#> reading with colClasses (an ordered factor won't work in read.csv)
 try(read.csv(file=csvfile, header=TRUE, colClasses=c(ord="ordered")
 , stringsAsFactors = TRUE))
#> Error in methods::as(data[[i]], colClasses[i]) : 
#>   no method or default for coercing “character” to “ordered”
 # TODO could fix this with the following two commands (Gabor Grothendieck),
 # but it is unclear what bad side-effects this could have
 #setOldClass("ordered")
 #setAs("character", "ordered", function(from) ordered(from))
 y <- read.csv(file=csvfile, header=TRUE, colClasses=c(dct="POSIXct", dat="Date")
 , stringsAsFactors = TRUE)
 ffy <- read.csv.ffdf(
   file=csvfile
 , header=TRUE
 , colClasses=c(ord="ordered", dct="POSIXct", dat="Date")
 )
 rbind(
   ram_class = sapply(y, function(x)paste(class(x), collapse = ","))
 , ff_class = sapply(ffy[,], function(x)paste(class(x), collapse = ","))
 , ff_vmode = vmode(ffy)
 )
#>           log       int       dbl       fac       ord             
#> ram_class "logical" "integer" "numeric" "factor"  "factor"        
#> ff_class  "logical" "integer" "numeric" "factor"  "ordered,factor"
#> ff_vmode  "logical" "integer" "double"  "integer" "integer"       
#>           dct              dat     
#> ram_class "POSIXct,POSIXt" "Date"  
#> ff_class  "POSIXct,POSIXt" "Date"  
#> ff_vmode  "double"         "double"
 
 message("NOTE that reading in chunks can change the sequence of levels and thus the coding")
#> NOTE that reading in chunks can change the sequence of levels and thus the coding
 message("(Sorting levels during chunked reading can be too expensive)")
#> (Sorting levels during chunked reading can be too expensive)
 levels(ffy$fac[])
#>  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m"
 ffy <- read.csv.ffdf(
   file=csvfile
 , header=TRUE
 , colClasses=c(ord="ordered", dct="POSIXct", dat="Date")
 , first.rows=6
 , next.rows=10
 , VERBOSE=TRUE
 )
#> read.table.ffdf 1..6 (6)  csv-read=0.001sec ffdf-write=0.006sec
#> read.table.ffdf 7..16 (10)  csv-read=0sec ffdf-write=0.003sec
#> read.table.ffdf 17..26 (10)  csv-read=0sec ffdf-write=0.003sec
#> read.table.ffdf 27..26 (0)  csv-read=0sec
#>  csv-read=0.001sec  ffdf-write=0.012sec  TOTAL=0.013sec
 levels(ffy$fac[])
#>  [1] "h" "i" "j" "k" "l" "m" "a" "b" "c" "d" "e" "f" "g"
 
 
 message("If we don't know the levels we can sort them after reading")
#> If we don't know the levels we can sort them after reading
 message("(Will rewrite all factor codes)")
#> (Will rewrite all factor codes)
 message("NOTE that you MUST assign the return value of sortLevels()")
#> NOTE that you MUST assign the return value of sortLevels()
 ffy <- sortLevels(ffy)
 levels(ffy$fac[])
#>  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m"
 
 message("If we KNOW the levels we can fix levels upfront")
#> If we KNOW the levels we can fix levels upfront
 ffy <- read.csv.ffdf(
   file=csvfile
 , header=TRUE
 , colClasses=c(ord="ordered", dct="POSIXct", dat="Date")
 , first.rows=6
 , next.rows=10
 , levels=list(fac=letters, ord=LETTERS)
 )
 levels(ffy$fac[])
#>  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
#> [20] "t" "u" "v" "w" "x" "y" "z"
 
 message("Or we inspect a sufficiently large chunk of data and use those")
#> Or we inspect a sufficiently large chunk of data and use those
 table(ffy$fac[], exclude=NULL)
#> 
#> a b c d e f g h i j k l m n o p q r s t u v w x y z 
#> 2 2 2 2 2 2 2 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 
 ffy <- read.csv.ffdf(
   file=csvfile
 , header=TRUE
 , colClasses=c(ord="ordered", dct="POSIXct", dat="Date")
 , nrows=13
 , VERBOSE=TRUE
 )
#> read.table.ffdf 1..13 (13)  csv-read=0sec ffdf-write=0.006sec
#>  csv-read=0sec  ffdf-write=0.006sec  TOTAL=0.006sec
 message("append the rest to ffy")
#> append the rest to ffy
 ffy <- read.csv.ffdf(
   x=ffy
 , file=csvfile
 , header=FALSE
 , skip=1 + nrow(ffy)
 , VERBOSE=TRUE
 )
#> read.table.ffdf 1..13 (13)  csv-read=0.002sec ffdf-write=0.002sec
#>  csv-read=0.002sec  ffdf-write=0.002sec  TOTAL=0.004sec
 table(ffy$fac[], exclude=NULL)
#> 
#> a b c d e f g h i j k l m 
#> 2 2 2 2 2 2 2 2 2 2 2 2 2 
 
 message("We can turn unexpected factor levels to NA, say we only allowed a:l")
#> We can turn unexpected factor levels to NA, say we only allowed a:l
 ffy <- read.csv.ffdf(
   file=csvfile
 , header=TRUE
 , colClasses=c(ord="ordered", dct="POSIXct", dat="Date")
 , levels=list(fac=letters[1:12], ord=LETTERS[1:12])
 , appendLevels=FALSE
 )
 sapply(colnames(ffy), function(i)sum(is.na(ffy[[i]][])))
#> log int dbl fac ord dct dat 
#>   0   0   0   2   2   0   0 

 message("let's store some columns more efficiently")
#> let's store some columns more efficiently
 sum(.ffbytes[vmode(ffy)])
#> [1] 36.25
 ffy$log <- clone(ffy$log, vmode="boolean")
 ffy$fac <- clone(ffy$fac, vmode="byte")
 ffy$ord <- clone(ffy$ord, vmode="byte")
 sum(.ffbytes[vmode(ffy)])
#> [1] 30.125
 
 message("let's make a template with zero rows")
#> let's make a template with zero rows
 ffx <- clone(ffy)  
 nrow(ffx) <- 0
   
 message("reading with template and colClasses")
#> reading with template and colClasses
 ffy <- read.csv.ffdf(
   x=ffx
 , file=csvfile
 , header=TRUE
 , colClasses=c(ord="ordered", dct="POSIXct", dat="Date")
 , next.rows = 12
 , VERBOSE = TRUE
 )
#> read.table.ffdf 1..12 (12)  csv-read=0sec ffdf-write=0.003sec
#> read.table.ffdf 13..24 (12)  csv-read=0sec ffdf-write=0.003sec
#> read.table.ffdf 25..26 (2)  csv-read=0sec ffdf-write=0.003sec
#>  csv-read=0sec  ffdf-write=0.009sec  TOTAL=0.009sec
 rbind(
   ff_class = sapply(ffy[,], function(x)paste(class(x), collapse = ","))
 , ff_vmode = vmode(ffy)
 )
#>          log       int       dbl       fac      ord             
#> ff_class "logical" "integer" "numeric" "factor" "ordered,factor"
#> ff_vmode "boolean" "integer" "double"  "byte"   "byte"          
#>          dct              dat     
#> ff_class "POSIXct,POSIXt" "Date"  
#> ff_vmode "double"         "double"
 levels(ffx$fac[])
#>  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l"
 levels(ffy$fac[])
#>  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m"
 
 message("reading with template without colClasses")
#> reading with template without colClasses
 ffy <- read.csv.ffdf(
   x=ffx
   , file=csvfile
   , header=TRUE
   , next.rows = 12
   , VERBOSE = TRUE
 )
#> read.table.ffdf 1..12 (12)  csv-read=0.001sec ffdf-write=0.003sec
#> read.table.ffdf 13..24 (12)  csv-read=0sec ffdf-write=0.003sec
#> read.table.ffdf 25..26 (2)  csv-read=0sec ffdf-write=0.003sec
#>  csv-read=0.001sec  ffdf-write=0.009sec  TOTAL=0.01sec
 rbind(
   ff_class = sapply(ffy[,], function(x)paste(class(x), collapse = ","))
 , ff_vmode = vmode(ffy)
 )
#>          log       int       dbl       fac      ord             
#> ff_class "logical" "integer" "numeric" "factor" "ordered,factor"
#> ff_vmode "boolean" "integer" "double"  "byte"   "byte"          
#>          dct              dat     
#> ff_class "POSIXct,POSIXt" "Date"  
#> ff_vmode "double"         "double"
 levels(ffx$fac[])
#>  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l"
 levels(ffy$fac[])
#>  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m"
 

 message("We can fine-tune the creation of the ffdf")
#> We can fine-tune the creation of the ffdf
 message("- let's create the ff files outside of fftempdir")
#> - let's create the ff files outside of fftempdir
 message("- let's reduce required disk space and thus file.system cache RAM")
#> - let's reduce required disk space and thus file.system cache RAM
 message("By default we had record size 36.25")
#> By default we had record size 36.25
 ffy <- read.csv.ffdf(
   file=csvfile
   , header=TRUE
   , colClasses=c(ord="ordered", dct="POSIXct", dat="Date")
   , asffdf_args=list(
     vmode = c(
         log="boolean"
       , int="byte"
       , dbl="single"
       , fac="nibble"  # no NAs
       , ord="nibble"  # no NAs
       , dct="single"
       , dat="single"
     )
     , col_args=list(pattern = "./csv")  # create in getwd() with prefix csv
   )
 )
 vmode(ffy)
#>       log       int       dbl       fac       ord       dct       dat 
#> "boolean"    "byte"  "single"  "nibble"  "nibble"  "single"  "single" 
 message("This recordsize is more than 50% reduced")
#> This recordsize is more than 50% reduced
 sum(.ffbytes[vmode(ffy)]) / 36.25
#> [1] 0.3896552
 
 message("Don't forget to wrap-up files that are not in fftempdir")
#> Don't forget to wrap-up files that are not in fftempdir
 delete(ffy); rm(ffy)
#> [1] TRUE
 message("It's a good habit to also wrap-up temporary stuff (or at least know how this is done)")
#> It's a good habit to also wrap-up temporary stuff (or at least know how this is done)
 rm(ffx); gc()
#>           used (Mb) gc trigger  (Mb) max used  (Mb)
#> Ncells 1173157 62.7    1994352 106.6  1994352 106.6
#> Vcells 2190177 16.8    8790397  67.1  8790397  67.1

   
 fwffile <- tempfile()
 
 cat(file=fwffile, "123456", "987654", sep="\n")
 x <- read.fwf(fwffile, widths=c(1,2,3), stringsAsFactors = TRUE)    #> 1 23 456 \ 9 87 654
 y <- read.table.ffdf(file=fwffile, FUN="read.fwf", widths=c(1,2,3))
 stopifnot(identical(x, y[,]))
 x <- read.fwf(fwffile, widths=c(1,-2,3), stringsAsFactors = TRUE)   #> 1 456 \ 9 654
 y <- read.table.ffdf(file=fwffile, FUN="read.fwf", widths=c(1,-2,3))
 stopifnot(identical(x, y[,]))
 unlink(fwffile)
 
 cat(file=fwffile, "123", "987654", sep="\n")
 x <- read.fwf(fwffile, widths=c(1,0, 2,3), stringsAsFactors = TRUE)    #> 1 NA 23 NA \ 9 NA 87 654
 y <- read.table.ffdf(file=fwffile, FUN="read.fwf", widths=c(1,0, 2,3))
 stopifnot(identical(x, y[,]))
 unlink(fwffile)
 
 cat(file=fwffile, "123456", "987654", sep="\n")
 x <- read.fwf(fwffile, widths=list(c(1,0, 2,3), c(2,2,2))
 , stringsAsFactors = TRUE) #> 1 NA 23 456 98 76 54
 y <- read.table.ffdf(file=fwffile, FUN="read.fwf", widths=list(c(1,0, 2,3), c(2,2,2)))
 stopifnot(identical(x, y[,]))
 
 unlink(fwffile)

#> Warning: unknown factor values mapped to NA

    unlink(csvfile)