Format a Data Frame or Matrix for LaTeX or HTML

format.df does appropriate rounding and decimal alignment and returns a character matrix containing the formatted data. If x is a data.frame, each component is formatted separately. If x is a matrix but not a data.frame, it is converted to a data.frame with one component per column. If a component x$x is a matrix, all of its columns are formatted the same way.

Usage

format.df(x, digits, dec=NULL, rdec=NULL, cdec=NULL,
          numeric.dollar=!dcolumn, na.blank=FALSE, na.dot=FALSE,
          blank.dot=FALSE, col.just=NULL, cdot=FALSE,
          dcolumn=FALSE, matrix.sep=' ', scientific=c(-4,4),
          math.row.names=FALSE, already.math.row.names=FALSE,
          math.col.names=FALSE, already.math.col.names=FALSE,
          double.slash=FALSE, format.Date="%m/%d/%Y",
          format.POSIXt="%m/%d/%Y %H:%M:%OS", ...)

Arguments

x: a matrix (usually numeric) or data frame
digits: the number of significant digits to which all values in the table are formatted. dec is usually preferred.
dec: If dec is a scalar, all elements of the matrix will be rounded to dec decimal places to the right of the decimal. dec can also be a matrix whose elements correspond to x, for customized rounding of each element. A matrix dec must have number of columns equal to number of columns of input x. A scalar dec is expanded to a vector cdec with number of items equal to number of columns of input x.
rdec: a vector specifying the number of decimal places to the right of the decimal for each row (cdec is more commonly used than rdec). A vector rdec must have number of items equal to number of rows of input x. rdec is expanded to matrix dec.
cdec: a vector specifying the number of decimal places for each column. The vector must have number of items equal to number of columns or components of input x.
cdot: Set to TRUE to use centered dots rather than ordinary periods in numbers. The output uses a syntax appropriate for latex.
na.blank: Set to TRUE to use blanks rather than NA for missing values. This usually looks better in latex.
dcolumn: Set to TRUE to use David Carlisle's dcolumn style for decimal alignment in latex. Default is FALSE. You will probably want to use dcolumn if you use rdec, as a column may then contain a varying number of places to the right of the decimal. dcolumn can line up all such numbers on the decimal point, with integer values right justified at the decimal point location of numbers that actually contain decimal places. When you use dcolumn = TRUE, numeric.dollar defaults to FALSE and should be left at FALSE. Also, when dcolumn = TRUE, the object attribute "style" is set to dcolumn, because the latex usepackage must reference [dcolumn]. The three files dcolumn.sty, newarray.sty, and array.sty will need to be in a directory in your TEXINPUTS path.
numeric.dollar: logical, default !dcolumn. Set to TRUE to place dollar signs around numeric values when dcolumn = FALSE. This assures that latex will use minus signs rather than hyphens to indicate negative numbers. Set to FALSE when dcolumn = TRUE, as dcolumn.sty automatically uses minus signs.
math.row.names: logical. Set to TRUE to place dollar signs around the row names.
already.math.row.names: set to TRUE to prevent any math mode changes to row names
math.col.names: logical. Set to TRUE to place dollar signs around the column names.
already.math.col.names: set to TRUE to prevent any math mode changes to column names
na.dot: Set to TRUE to use periods rather than NA for missing numeric values. This works with the SAS convention that periods indicate missing values.
blank.dot: Set to TRUE to use periods rather than blanks for missing character values. This works with the SAS convention that periods indicate missing values.
col.just: a vector of column justifications whose length must equal the number of columns of the output matrix. When NULL, the default, the col.just attribute of the result is set to l for character columns and to r for numeric columns. The user can override the default by supplying a vector whose length is equal to the number of columns of the result matrix. When format.df is called by latex.default, col.just is used as the cols argument to the tabular environment, and the letters l, r, and c are valid values. When format.df is called by SAS, col.just is used to determine whether a \$ is needed on the input line of the sysin file, and the letters l and r are valid values. You can pass specifications other than l, r, and c in col.just, e.g., "p{3in}" to get paragraph-formatted columns from latex().
matrix.sep: When x is a data frame containing a matrix, so that new column names are constructed from the name of the matrix object and the names of the individual columns of the matrix, matrix.sep specifies the character to use to separate object names from individual column names.
scientific: a range of exponents (or a logical vector) identifying values that should not be converted to scientific notation. See format.default for details.
double.slash: logical. Should escaping backslashes themselves be escaped?
format.Date: String used to format objects of the Date class.
format.POSIXt: String used to format objects of the POSIXt class.
...: other arguments are accepted and passed to format.default. For latexVerbatim these arguments are passed to the print function.
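
The rounding arguments described above can be tried on a small matrix. The following is a minimal sketch, assuming the Hmisc package is installed. The toy matrix and the object names (x, by_col, fixed, blank_na) are hypothetical and not part of the original documentation. Only one of digits, dec, rdec, and cdec is supplied per call.

```r
library(Hmisc)  # assumption: Hmisc is installed and provides format.df

# Hypothetical toy data: a 2 x 2 numeric matrix with one missing value
x <- matrix(c(1.2345, 22.5, NA, 0.05), nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b")))

# cdec: one number of decimal places per column of x
by_col <- format.df(x, cdec = c(1, 3), numeric.dollar = FALSE)

# dec as a scalar: the same number of decimal places everywhere
fixed <- format.df(x, dec = 2, numeric.dollar = FALSE)

# na.blank: show missing values as blanks rather than NA
blank_na <- format.df(x, dec = 2, na.blank = TRUE, numeric.dollar = FALSE)

print(by_col)
print(fixed)
print(blank_na)
```

Setting numeric.dollar = FALSE here just keeps the printed strings free of math-mode dollar signs; for actual latex output the default is usually what you want.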

Value

a character matrix with character images of properly rounded x. Matrix components of input x become sets of columns of the character matrix. The object attribute "col.just" repeats the value of the argument col.just when provided; otherwise, it contains the recommended justification for the columns of the output. See the discussion of the argument col.just. The default justification is l for characters and factors, and r for numerics. When dcolumn = TRUE, numerics will have . as the justification character.
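
To see the col.just attribute described above, one can inspect the result directly. This is a hedged sketch assuming Hmisc is installed; the data frame d and the names out and out_centered are hypothetical.

```r
library(Hmisc)  # assumption: Hmisc is installed

# Hypothetical data frame with one character and one numeric column
d <- data.frame(id = c("a", "b"), value = c(1.5, -2.25),
                stringsAsFactors = FALSE)

# Default: the attribute holds the recommended justification per column
out <- format.df(d, dec = 2)
print(attr(out, "col.just"))

# User-supplied justification, one entry per output column
out_centered <- format.df(d, dec = 2, col.just = c("l", "c"))
print(attr(out_centered, "col.just"))
```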

Author

Frank E. Harrell, Jr.,
Department of Biostatistics,
Vanderbilt University,
fh@fharrell.com

Richard M. Heiberger,
Department of Statistics,
Temple University, Philadelphia, PA.
rmh@temple.edu

Examples
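
A small sketch of typical use, assuming the Hmisc package is installed. It formats a data frame that contains a matrix component, so the matrix columns appear as separate columns of the result. The big.mark argument is not an argument of format.df itself; it is passed through ... to format.default.

```r
library(Hmisc)  # assumption: Hmisc is installed

# Data frame with two ordinary columns and one 2-column matrix component
x <- data.frame(a = 1:2, b = 3:4)
x$m <- 10000 * matrix(5:8, nrow = 2)

names(x)  # three components: a, b, m
dim(x)

# Format everything, inserting commas as thousands separators
res <- format.df(x, big.mark = ",")
print(res)

# The matrix component contributes one output column per matrix column
dim(res)
```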