A basic computing engine for sparse linear least squares regression.

Note that the exact interface (arguments, return value) currently is experimental, and is bound to change. Use at your own risk!

lm.fit.sparse(x, y, w = NULL, offset = NULL,
              method = c("qr", "cholesky"),
              tol = 1e-7, singular.ok = TRUE, order = NULL,
              transpose = FALSE)

Arguments

x

sparse design matrix of dimension n * p, i.e., an R object of a class extending dsparseMatrix; typically the result of sparse.model.matrix.

y

vector of observations of length n, or a matrix with n rows.

w

vector of weights (length n) to be used in the fitting process. Weighted least squares is used with weights w, i.e., sum(w * e^2) is minimized.

Not yet implemented !

offset

numeric of length n). This can be used to specify an a priori known component to be included in the linear predictor during fitting.

method

a character string specifying the (factorization) method. Currently, "qr" or "cholesky".

tol

[for back-compatibility only; unused:] tolerance for the qr decomposition. Default is 1e-7.

singular.ok

[for back-compatibility only; unused:] logical. If FALSE, a singular model is an error.

order

integer or NULL, for method == "qr", will determine how the fill-reducing ordering (aka permutation) for the “symbolic” part is determined (in cs_amd()), with the options

0:

natural,

1:

Chol,

2:

LU, and

3:

QR,

where 3 is the default.

transpose

logical; if true, use the transposed matrix t(x) instead of x.

Value

Either a single numeric vector or a list of four numeric vectors.

See also

glm4 is an alternative (much) more general fitting function.

sparse.model.matrix from the Matrix package; the non-sparse function in standard R's package stats: lm.fit().

Examples

dd <- expand.grid(a = as.factor(1:3),
                  b = as.factor(1:4),
                  c = as.factor(1:2),
                  d= as.factor(1:8))
n <- nrow(dd <- dd[rep(seq_len(nrow(dd)), each = 10), ])
set.seed(17)
dM <- cbind(dd, x = round(rnorm(n), 1))
## randomly drop some
n <- nrow(dM <- dM[- sample(n, 50),])
dM <- within(dM, { A <- c(2,5,10)[a]
                   B <- c(-10,-1, 3:4)[b]
                   C <- c(-8,8)[c]
                   D <- c(10*(-5:-2), 20*c(0, 3:5))[d]
   Y <- A + B + A*B + C + D + A*D + C*x + rnorm(n)/10
   wts <- sample(1:10, n, replace=TRUE)
   rm(A,B,C,D)
})
str(dM) # 1870 x 7
#> 'data.frame':	1870 obs. of  7 variables:
#>  $ a  : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ b  : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ c  : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ d  : Factor w/ 8 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
#>  $ x  : num  -1 -0.1 -0.2 -0.8 0.8 -0.2 1 1.7 0.3 0.4 ...
#>  $ wts: int  6 1 7 7 5 9 5 4 5 8 ...
#>  $ Y  : num  -178 -185 -184 -180 -192 ...

X <- Matrix::sparse.model.matrix( ~ (a+b+c+d)^2 + c*x, data = dM)
dim(X) # 1870 x 69
#> [1] 1870   69
X[1:10, 1:20]
#> 10 x 20 sparse Matrix of class "dgCMatrix"
#>   [[ suppressing 20 column names ‘(Intercept)’, ‘a2’, ‘a3’ ... ]]
#>                                               
#> 1   1 . . . . . . . . . . . . . -1.0 . . . . .
#> 1.1 1 . . . . . . . . . . . . . -0.1 . . . . .
#> 1.2 1 . . . . . . . . . . . . . -0.2 . . . . .
#> 1.3 1 . . . . . . . . . . . . . -0.8 . . . . .
#> 1.4 1 . . . . . . . . . . . . .  0.8 . . . . .
#> 1.5 1 . . . . . . . . . . . . . -0.2 . . . . .
#> 1.6 1 . . . . . . . . . . . . .  1.0 . . . . .
#> 1.7 1 . . . . . . . . . . . . .  1.7 . . . . .
#> 1.8 1 . . . . . . . . . . . . .  0.3 . . . . .
#> 1.9 1 . . . . . . . . . . . . .  0.4 . . . . .

## For now, use  'MatrixModels:::'  --- TODO : export once interface is clear!

Xd <- as(X,"matrix")
system.time(fmDense <- lm.fit(Xd, y = dM[,"Y"])) # {base} functionality
#>    user  system elapsed 
#>   0.003   0.000   0.003 
system.time( r1 <- MatrixModels:::lm.fit.sparse(X, y = dM[,"Y"]) ) # *is* faster
#>    user  system elapsed 
#>   0.003   0.000   0.003 
stopifnot(all.equal(r1, unname(fmDense$coeff), tolerance = 1e-12))
system.time(
     r2 <- MatrixModels:::lm.fit.sparse(X, y = dM[,"Y"], method = "chol") )
#>    user  system elapsed 
#>   0.002   0.000   0.002 
stopifnot(all.equal(r1, r2$coef, tolerance = 1e-12),
          all.equal(fmDense$residuals, r2$residuals, tolerance=1e-9)
         )
## with weights:
system.time(fmD.w <- with(dM, lm.wfit(Xd, Y, w = wts)))
#>    user  system elapsed 
#>   0.004   0.000   0.004 
system.time(fm.w1 <- with(dM, MatrixModels:::lm.fit.sparse(X, Y, w = wts)))
#>    user  system elapsed 
#>   0.003   0.000   0.003 
system.time(fm.w2 <- with(dM, MatrixModels:::lm.fit.sparse(X, Y, w = wts,
                                                     method = "chol") ))
#>    user  system elapsed 
#>   0.002   0.000   0.002 
stopifnot(all.equal(fm.w1, unname(fmD.w$coeff), tolerance = 1e-12),
          all.equal(fm.w2$coef, fm.w1, tolerance = 1e-12),
          all.equal(fmD.w$residuals, fm.w2$residuals, tolerance=1e-9)
          )