fastLm.RdfastLm estimates the linear model using one of several methods
implemented using the Eigen linear algebra library.
fastLmPure(X, y, method = 0L)
fastLm(X, ...)
# Default S3 method
fastLm(X, y, method = 0L, ...)
# S3 method for class 'formula'
fastLm(formula, data = list(), method = 0L, ...)the response vector
a model matrix
an object of class "formula" (or one that
can be coerced to that class): a symbolic description of the
model to be fitted. The details of model specification are given
in the ‘Details’ section of the documentation for lm.
an optional data frame, list or environment (or object
coercible by as.data.frame to a data frame) containing
the variables in the model. If not found in data, the
variables are taken from environment(formula),
typically the environment from which lm is called.
an integer scalar with value 0 for the column-pivoted QR decomposition, 1 for the unpivoted QR decomposition, 2 for the LLT Cholesky, 3 for the LDLT Cholesky, 4 for the Jacobi singular value decomposition (SVD) and 5 for a method based on the eigenvalue-eigenvector decomposition of \(\mathbf{X}^\prime\mathbf{X}\). Default is zero.
not used
Linear models should be estimated using the lm function. In
some cases, lm.fit may be appropriate.
The fastLmPure function provides a reference use case of the Eigen
C++ template library via the wrapper functions in the RcppEigen package.
The fastLm function provides a more standard implementation of
a linear model fit, offering both a default and a formula interface as
well as print, summary and predict methods.
Internally the fastLm function, by default, uses a QR
decomposition with column pivots, which is a rank-revealing
decomposition, so that it can handle rank-deficient cases
effectively. Other methods for determining least squares solutions
are available according to the value of the method argument.
An example of the type of situation requiring extra care in checking for rank deficiency is a two-way layout with missing cells (see the examples section). These cases require a special pivoting scheme of “pivot only on (apparent) rank deficiency” which is not part of conventional linear algebra software.
fastLmPure returns a list with several components:
a vector of coefficients
a vector of the standard errors of the coefficient estimates
a scalar denoting the computed rank of the model matrix
a scalar denoting the degrees of freedom in the model
the vector of residuals
a numeric scalar - the root mean square for residuals
the vector of fitted value
fastLm returns a richer object which also includes the
call argument similar to the lm or
rlm functions..
Douglas Bates and Dirk Eddelbuettel (2013). Fast and Elegant Numerical Linear Algebra Using the RcppEigen Package. Journal of Statistical Software, 52(5), 1-24. URL http://www.jstatsoft.org/v52/i05/.
data(trees, package="datasets")
mm <- cbind(1, log(trees$Girth)) # model matrix
y <- log(trees$Volume) # response
## bare-bones direct interface
flm <- fastLmPure(mm, y)
print(flm)
#> $coefficients
#> [1] -2.353325 2.199970
#>
#> $se
#> [1] 0.23066284 0.08983455
#>
#> $rank
#> [1] 2
#>
#> $df.residual
#> [1] 29
#>
#> $residuals
#> [1] 0.029770334 -0.048343313 -0.108675736 -0.022348590 0.072716847
#> [6] 0.099013629 -0.174701646 -0.020550966 0.176067980 0.029107001
#> [11] 0.205184486 0.043971170 0.062839654 0.001010516 -0.163706638
#> [16] -0.172405965 0.247962548 -0.029135115 -0.158376200 -0.205999241
#> [21] 0.088437490 -0.027410923 0.062096059 -0.100836986 -0.035300119
#> [26] 0.096435946 0.076549210 0.072452205 -0.063824212 -0.073580387
#> [31] 0.041580960
#>
#> $s
#> [1] 0.1149578
#>
#> $fitted.values
#> [1] 2.302374 2.380487 2.431063 2.819630 2.861140 2.881605 2.921973 2.921973
#> [9] 2.941882 2.961613 2.981168 3.000551 3.000551 3.057697 3.113395 3.272498
#> [17] 3.272498 3.339678 3.404867 3.420867 3.452522 3.483728 3.529722 3.746287
#> [25] 3.787154 3.918144 3.943431 3.993150 4.005406 4.005406 4.302224
#>
## standard R interface for formula or data returning object of class fastLm
flmmod <- fastLm( log(Volume) ~ log(Girth), data=trees)
summary(flmmod)
#>
#> Call:
#> fastLm.formula(formula = log(Volume) ~ log(Girth), data = trees)
#>
#> Residuals:
#> Min. 1st Qu. Median 3rd Qu. Max.
#> -0.206000 -0.068700 0.001011 0.072580 0.248000
#>
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -2.353325 0.230663 -10.202 4.18e-11 ***
#> log(Girth) 2.199970 0.089835 24.489 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 0.115 on 29 degrees of freedom
#> Multiple R-squared: 0.9539, Adjusted R-squared: 0.9523
## case where non-rank-revealing methods break down
dd <- data.frame(f1 = gl(4, 6, labels = LETTERS[1:4]),
f2 = gl(3, 2, labels = letters[1:3]))[-(7:8), ]
xtabs(~ f2 + f1, dd) # one missing cell
#> f1
#> f2 A B C D
#> a 2 0 2 2
#> b 2 2 2 2
#> c 2 2 2 2
mm <- model.matrix(~ f1 * f2, dd)
kappa(mm) # large, indicating rank deficiency
#> [1] 9.23756e+16
set.seed(1)
dd$y <- mm %*% seq_len(ncol(mm)) + rnorm(nrow(mm), sd = 0.1)
summary(lm(y ~ f1 * f2, dd)) # detects rank deficiency
#>
#> Call:
#> lm(formula = y ~ f1 * f2, data = dd)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.12155 -0.04702 0.00000 0.04702 0.12155
#>
#> Coefficients: (1 not defined because of singularities)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.97786 0.05816 16.81 3.41e-09 ***
#> f1B 12.03807 0.08226 146.35 < 2e-16 ***
#> f1C 3.11722 0.08226 37.90 5.22e-13 ***
#> f1D 4.06852 0.08226 49.46 2.83e-14 ***
#> f2b 5.06012 0.08226 61.52 2.59e-15 ***
#> f2c 5.99759 0.08226 72.91 4.01e-16 ***
#> f1B:f2b -3.01476 0.11633 -25.92 3.27e-11 ***
#> f1C:f2b 7.70300 0.11633 66.22 1.16e-15 ***
#> f1D:f2b 8.96425 0.11633 77.06 < 2e-16 ***
#> f1B:f2c NA NA NA NA
#> f1C:f2c 10.96133 0.11633 94.23 < 2e-16 ***
#> f1D:f2c 12.04108 0.11633 103.51 < 2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 0.08226 on 11 degrees of freedom
#> Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999
#> F-statistic: 1.855e+04 on 10 and 11 DF, p-value: < 2.2e-16
#>
try(summary(fastLm(y ~ f1 * f2, dd))) # also detects rank deficiency
#>
#> Call:
#> fastLm.formula(formula = y ~ f1 * f2, data = dd)
#>
#> Residuals:
#> Min. 1st Qu. Median 3rd Qu. Max.
#> -0.12150 -0.04702 0.00000 0.04702 0.12150
#>
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.977859 0.058165 16.812 3.413e-09 ***
#> f1B 12.038068 0.082258 146.346 < 2.2e-16 ***
#> f1C 3.117222 0.082258 37.896 5.221e-13 ***
#> f1D 4.068523 0.082258 49.461 2.833e-14 ***
#> f2b 5.060123 0.082258 61.516 2.593e-15 ***
#> f2c 5.997592 0.082258 72.912 4.015e-16 ***
#> f1B:f2b -3.014763 0.116330 -25.916 3.266e-11 ***
#> f1C:f2b 7.702999 0.116330 66.217 1.156e-15 ***
#> f1D:f2b 8.964251 0.116330 77.059 < 2.2e-16 ***
#> f1B:f2c NA NA NA NA
#> f1C:f2c 10.961326 0.116330 94.226 < 2.2e-16 ***
#> f1D:f2c 12.041081 0.116330 103.508 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 0.08226 on 11 degrees of freedom
#> Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999