library(lmeresampler)

Overview

The lmeresampler package provides functionality to perform various bootstrap procedures for nested linear mixed-effects (LME) models, including the parametric, residual, cases, CGR, and REB bootstraps. This is particularly useful for models fit to relatively small data sets, where bootstrapping may yield more robust parameter estimates. This vignette seeks to inform users of:

  • how to bootstrap hierarchical linear models with lmeresampler,

  • what kinds of bootstraps are available and how they operate,

  • how to use the bootstrap output, and

  • extensions for both users and future developers of the package.

The Data

Examples of lmeresampler functions will use hierarchical linear models fit using jsp728, a data set containing information about 728 students (level-1) from 50 primary (elementary) schools in inner London (level-2) that were part of the Junior School Project (JSP). The data was collected by the University of Bristol Centre for Multilevel Modeling. For more information about the variables, see ?jsp728.

We will work with vcmodA for lme4 and vcmodB for nlme. Each of these two-level models predicts mathAge11 (the student’s math score at age 11) and contains the same three fixed effects: mathAge8 (the student’s math score at age 8), gender (the student’s gender), and class (the student’s father’s social class, a factor with two levels: manual and nonmanual). Both models also include a random intercept for school. The models are as follows:

library(lme4)
vcmodA <- lmer(mathAge11 ~ mathAge8 + gender + class + (1 | school), data = jsp728)

library(nlme)
vcmodB <- lme(mathAge11 ~ mathAge8 + gender + class, random = ~1|school, data = jsp728)

The Call

In order to perform the bootstrap, the user must call the bootstrap() function. The function contains several important parameters for the user to specify when making the call:

  • model: the LME model to be bootstrapped.

  • .f: a function that will compute the statistics of interest (for example, to have the bootstrap return the fixed effects estimates, specify .f = fixef).

  • type: a character string specifying what type of bootstrap should be executed. Possible values are "parametric", "residual", "case", "cgr", or "reb". More information about each bootstrap type may be found in the following sections.

  • B: the number of bootstrap resamples to be performed.

  • resample: a logical vector specifying whether each level of the data set should be resampled in the cases bootstrap. The levels should be specified from the highest level (largest cluster) of the hierarchy to the lowest (observation-level). For example, for students within a school, specify the school level first, then the student level.

  • reb_type: an integer value specifying which variation of the REB (random effects block) bootstrap should be performed. The differences between the REB bootstrap variations are described in section 5 of this vignette.
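Putting these arguments together, a representative call might look like the following sketch (using vcmodA from above; the object name boot_example is illustrative):

```r
# a representative bootstrap() call combining the arguments described above
# (the cases bootstrap is the only type that uses `resample`)
boot_example <- bootstrap(
  model = vcmodA,            # the fitted LME model
  .f = fixef,                # return the fixed-effect estimates
  type = "case",             # cases bootstrap
  B = 100,                   # number of bootstrap resamples
  resample = c(TRUE, FALSE)  # resample schools, but not students
)
```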

1. The Parametric Bootstrap

The parametric bootstrap simulates bootstrap samples from the estimated distribution functions. That is, error terms and random effects are simulated from their estimated normal distributions and are combined into bootstrap samples via the fitted model equation.

Examples

# let's set .f = fixef to specify that we want only the fixed effects bootstrapped

# lme4
lmer_par_boot <- bootstrap(vcmodA, .f = fixef, type = "parametric", B = 100)

# nlme
lme_par_boot  <- bootstrap(vcmodB, .f = fixef, type = "parametric", B = 100)

Let’s also take a look at the structure of the lmeresamp objects returned by a bootstrap call.

bootstrap() output

Regardless of the type of bootstrap performed, the output of bootstrap() will be an lmeresamp object that is formatted as a list of 10 objects. Notice that some of the objects come from the call itself, while others are calculated for the user using the bootstrapped values. The 10 list objects are as follows:

  • observed: the estimated values for the model parameters,

  • model: the fit model, formatted as the default list object,

  • .f: the function call,

  • replicates: a \(B \times p\) data frame of bootstrap values for each of the \(p\) model parameters,

  • stats: a data frame containing the observed, rep.mean (bootstrap mean), se (bootstrap standard error), and bias values for each model parameter,

  • R: the number of bootstrap resamples performed,

  • data: the data with which the model was fit,

  • seed: a vector of randomly generated seeds that are used by the bootstrap,

  • type: the type of bootstrap executed, and

  • call: the call to bootstrap() that the user made.

Let’s take a look at an example using the lmer_par_boot object we created in the above section.

# everything the lmeresamp object (and each of its sub-objects) contains
str(lmer_par_boot)
#> List of 10
#>  $ observed  : Named num [1:4] 14.158 0.639 -0.357 0.72
#>   ..- attr(*, "names")= chr [1:4] "(Intercept)" "mathAge8" "genderM" "classnonmanual"
#>  $ model     :Formal class 'lmerMod' [package "lme4"] with 13 slots
#>   .. ..@ resp   :Reference class 'lmerResp' [package "lme4"] with 9 fields
#>   .. .. ..$ Ptr    :<externalptr> 
#>   .. .. ..$ mu     : num [1:728] 35 23.8 31.5 27.1 37.3 ...
#>   .. .. ..$ offset : num [1:728] 0 0 0 0 0 0 0 0 0 0 ...
#>   .. .. ..$ sqrtXwt: num [1:728] 1 1 1 1 1 1 1 1 1 1 ...
#>   .. .. ..$ sqrtrwt: num [1:728] 1 1 1 1 1 1 1 1 1 1 ...
#>   .. .. ..$ weights: num [1:728] 1 1 1 1 1 1 1 1 1 1 ...
#>   .. .. ..$ wtres  : num [1:728] 3.972 -12.804 0.529 -0.08 -1.302 ...
#>   .. .. ..$ y      : num [1:728] 39 11 32 27 36 33 30 17 33 20 ...
#>   .. .. ..$ REML   : int 4
#>   .. .. ..and 28 methods, of which 14 are  possibly relevant:
#>   .. .. ..  allInfo, copy#envRefClass, initialize, initialize#lmResp,
#>   .. .. ..  initializePtr, initializePtr#lmResp, objective, ptr, ptr#lmResp,
#>   .. .. ..  setOffset, setResp, setWeights, updateMu, wrss
#>   .. ..@ Gp     : int [1:2] 0 48
#>   .. ..@ call   : language lmer(formula = mathAge11 ~ mathAge8 + gender + class + (1 | school), data = jsp728)
#>   .. ..@ frame  :'data.frame':   728 obs. of  5 variables:
#>   .. .. ..$ mathAge11: num [1:728] 39 11 32 27 36 33 30 17 33 20 ...
#>   .. .. ..$ mathAge8 : num [1:728] 36 19 31 23 39 25 27 14 30 19 ...
#>   .. .. ..$ gender   : Factor w/ 2 levels "F","M": 2 1 1 1 1 2 2 2 2 2 ...
#>   .. .. ..$ class    : Factor w/ 2 levels "manual","nonmanual": 2 1 1 2 2 1 1 1 1 1 ...
#>   .. .. ..$ school   : Factor w/ 48 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
#>   .. .. ..- attr(*, "terms")=Classes 'terms', 'formula'  language mathAge11 ~ mathAge8 + gender + class + (1 + school)
#>   .. .. .. .. ..- attr(*, "variables")= language list(mathAge11, mathAge8, gender, class, school)
#>   .. .. .. .. ..- attr(*, "factors")= int [1:5, 1:4] 0 1 0 0 0 0 0 1 0 0 ...
#>   .. .. .. .. .. ..- attr(*, "dimnames")=List of 2
#>   .. .. .. .. .. .. ..$ : chr [1:5] "mathAge11" "mathAge8" "gender" "class" ...
#>   .. .. .. .. .. .. ..$ : chr [1:4] "mathAge8" "gender" "class" "school"
#>   .. .. .. .. ..- attr(*, "term.labels")= chr [1:4] "mathAge8" "gender" "class" "school"
#>   .. .. .. .. ..- attr(*, "order")= int [1:4] 1 1 1 1
#>   .. .. .. .. ..- attr(*, "intercept")= int 1
#>   .. .. .. .. ..- attr(*, "response")= int 1
#>   .. .. .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
#>   .. .. .. .. ..- attr(*, "predvars")= language list(mathAge11, mathAge8, gender, class, school)
#>   .. .. .. .. ..- attr(*, "dataClasses")= Named chr [1:5] "numeric" "numeric" "factor" "factor" ...
#>   .. .. .. .. .. ..- attr(*, "names")= chr [1:5] "mathAge11" "mathAge8" "gender" "class" ...
#>   .. .. .. .. ..- attr(*, "predvars.fixed")= language list(mathAge11, mathAge8, gender, class)
#>   .. .. .. .. ..- attr(*, "varnames.fixed")= chr [1:4] "mathAge11" "mathAge8" "gender" "class"
#>   .. .. .. .. ..- attr(*, "predvars.random")= language list(mathAge11, school)
#>   .. .. ..- attr(*, "formula")=Class 'formula'  language mathAge11 ~ mathAge8 + gender + class + (1 | school)
#>   .. .. .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
#>   .. ..@ flist  :List of 1
#>   .. .. ..$ school: Factor w/ 48 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
#>   .. .. ..- attr(*, "assign")= int 1
#>   .. ..@ cnms   :List of 1
#>   .. .. ..$ school: chr "(Intercept)"
#>   .. ..@ lower  : num 0
#>   .. ..@ theta  : num 0.41
#>   .. ..@ beta   : num [1:4] 14.158 0.639 -0.357 0.72
#>   .. ..@ u      : num [1:48] -6.083 -0.343 0.717 -1.071 -1.409 ...
#>   .. ..@ devcomp:List of 2
#>   .. .. ..$ cmp : Named num [1:10] 56.9 25.7 13664.4 618.7 14283.1 ...
#>   .. .. .. ..- attr(*, "names")= chr [1:10] "ldL2" "ldRX2" "wrss" "ussq" ...
#>   .. .. ..$ dims: Named int [1:12] 728 728 4 724 48 1 1 1 0 4 ...
#>   .. .. .. ..- attr(*, "names")= chr [1:12] "N" "n" "p" "nmp" ...
#>   .. ..@ pp     :Reference class 'merPredD' [package "lme4"] with 18 fields
#>   .. .. ..$ Lambdat:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
#>   .. .. .. .. ..@ i       : int [1:48] 0 1 2 3 4 5 6 7 8 9 ...
#>   .. .. .. .. ..@ p       : int [1:49] 0 1 2 3 4 5 6 7 8 9 ...
#>   .. .. .. .. ..@ Dim     : int [1:2] 48 48
#>   .. .. .. .. ..@ Dimnames:List of 2
#>   .. .. .. .. .. ..$ : NULL
#>   .. .. .. .. .. ..$ : NULL
#>   .. .. .. .. ..@ x       : num [1:48] 0.41 0.41 0.41 0.41 0.41 ...
#>   .. .. .. .. ..@ factors : list()
#>   .. .. ..$ LamtUt :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
#>   .. .. .. .. ..@ i       : int [1:728] 0 0 0 0 0 0 0 0 0 0 ...
#>   .. .. .. .. ..@ p       : int [1:729] 0 1 2 3 4 5 6 7 8 9 ...
#>   .. .. .. .. ..@ Dim     : int [1:2] 48 728
#>   .. .. .. .. ..@ Dimnames:List of 2
#>   .. .. .. .. .. ..$ : NULL
#>   .. .. .. .. .. ..$ : chr [1:728] "1" "2" "3" "4" ...
#>   .. .. .. .. ..@ x       : num [1:728] 0.41 0.41 0.41 0.41 0.41 ...
#>   .. .. .. .. ..@ factors : list()
#>   .. .. ..$ Lind   : int [1:48] 1 1 1 1 1 1 1 1 1 1 ...
#>   .. .. ..$ Ptr    :<externalptr> 
#>   .. .. ..$ RZX    : num [1:48, 1:4] 3.8 1.51 2.83 3.13 4.27 ...
#>   .. .. ..$ Ut     :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
#>   .. .. .. .. ..@ i       : int [1:728] 0 0 0 0 0 0 0 0 0 0 ...
#>   .. .. .. .. ..@ p       : int [1:729] 0 1 2 3 4 5 6 7 8 9 ...
#>   .. .. .. .. ..@ Dim     : int [1:2] 48 728
#>   .. .. .. .. ..@ Dimnames:List of 2
#>   .. .. .. .. .. ..$ : chr [1:48] "1" "2" "3" "4" ...
#>   .. .. .. .. .. ..$ : chr [1:728] "1" "2" "3" "4" ...
#>   .. .. .. .. ..@ x       : num [1:728] 1 1 1 1 1 1 1 1 1 1 ...
#>   .. .. .. .. ..@ factors : list()
#>   .. .. ..$ Utr    : num [1:48] 196.3 60.6 162.7 158.2 304 ...
#>   .. .. ..$ V      : num [1:728, 1:4] 1 1 1 1 1 1 1 1 1 1 ...
#>   .. .. ..$ VtV    : num [1:4, 1:4] 728 0 0 0 18911 ...
#>   .. .. ..$ Vtr    : num [1:4] 22364 603862 10413 7542
#>   .. .. ..$ X      : num [1:728, 1:4] 1 1 1 1 1 1 1 1 1 1 ...
#>   .. .. .. ..- attr(*, "dimnames")=List of 2
#>   .. .. .. .. ..$ : chr [1:728] "1" "2" "3" "4" ...
#>   .. .. .. .. ..$ : chr [1:4] "(Intercept)" "mathAge8" "genderM" "classnonmanual"
#>   .. .. .. ..- attr(*, "assign")= int [1:4] 0 1 2 3
#>   .. .. .. ..- attr(*, "contrasts")=List of 2
#>   .. .. .. .. ..$ gender: chr "contr.treatment"
#>   .. .. .. .. ..$ class : chr "contr.treatment"
#>   .. .. .. ..- attr(*, "msgScaleX")= chr(0) 
#>   .. .. ..$ Xwts   : num [1:728] 1 1 1 1 1 1 1 1 1 1 ...
#>   .. .. ..$ Zt     :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
#>   .. .. .. .. ..@ i       : int [1:728] 0 0 0 0 0 0 0 0 0 0 ...
#>   .. .. .. .. ..@ p       : int [1:729] 0 1 2 3 4 5 6 7 8 9 ...
#>   .. .. .. .. ..@ Dim     : int [1:2] 48 728
#>   .. .. .. .. ..@ Dimnames:List of 2
#>   .. .. .. .. .. ..$ : chr [1:48] "1" "2" "3" "4" ...
#>   .. .. .. .. .. ..$ : chr [1:728] "1" "2" "3" "4" ...
#>   .. .. .. .. ..@ x       : num [1:728] 1 1 1 1 1 1 1 1 1 1 ...
#>   .. .. .. .. ..@ factors : list()
#>   .. .. ..$ beta0  : num [1:4] 0 0 0 0
#>   .. .. ..$ delb   : num [1:4] 14.158 0.639 -0.357 0.72
#>   .. .. ..$ delu   : num [1:48] -6.083 -0.343 0.717 -1.071 -1.409 ...
#>   .. .. ..$ theta  : num 0.41
#>   .. .. ..$ u0     : num [1:48] 0 0 0 0 0 0 0 0 0 0 ...
#>   .. .. ..and 45 methods, of which 31 are  possibly relevant:
#>   .. .. ..  b, beta, CcNumer, copy#envRefClass, initialize, initializePtr,
#>   .. .. ..  installPars, L, ldL2, ldRX2, linPred, P, ptr, RX, RXdiag, RXi,
#>   .. .. ..  setBeta0, setDelb, setDelu, setTheta, setZt, solve, solveU, sqrL,
#>   .. .. ..  u, unsc, updateDecomp, updateL, updateLamtUt, updateRes, updateXwts
#>   .. ..@ optinfo:List of 7
#>   .. .. ..$ optimizer: chr "nloptwrap"
#>   .. .. ..$ control  :List of 1
#>   .. .. .. ..$ print_level: num 0
#>   .. .. ..$ derivs   :List of 2
#>   .. .. .. ..$ gradient: num -4.8e-06
#>   .. .. .. ..$ Hessian : num [1, 1] 513
#>   .. .. ..$ conv     :List of 2
#>   .. .. .. ..$ opt : num 0
#>   .. .. .. ..$ lme4: list()
#>   .. .. ..$ feval    : int 15
#>   .. .. ..$ warnings : list()
#>   .. .. ..$ val      : num 0.41
#>  $ .f        :function (object, ...)  
#>  $ replicates:'data.frame':  100 obs. of  4 variables:
#>   ..$ (Intercept)   : num [1:100] 13.8 13.3 12.7 12.9 13.5 ...
#>   ..$ mathAge8      : num [1:100] 0.639 0.675 0.683 0.681 0.666 ...
#>   ..$ genderM       : num [1:100] -0.3944 -0.5 0.0164 -0.3863 -0.0449 ...
#>   ..$ classnonmanual: num [1:100] 1.279 0.777 0.256 0.713 0.99 ...
#>  $ stats     :'data.frame':  4 obs. of  4 variables:
#>   ..$ observed: num [1:4] 14.158 0.639 -0.357 0.72
#>   ..$ rep.mean: num [1:4] 14.113 0.641 -0.319 0.671
#>   ..$ se      : num [1:4] 0.7824 0.0256 0.39 0.3848
#>   ..$ bias    : num [1:4] -0.0446 0.00254 0.03796 -0.0489
#>  $ R         : num 100
#>  $ data      :'data.frame':  728 obs. of  5 variables:
#>   ..$ mathAge11: num [1:728] 39 11 32 27 36 33 30 17 33 20 ...
#>   ..$ mathAge8 : num [1:728] 36 19 31 23 39 25 27 14 30 19 ...
#>   ..$ gender   : Factor w/ 2 levels "F","M": 2 1 1 1 1 2 2 2 2 2 ...
#>   ..$ class    : Factor w/ 2 levels "manual","nonmanual": 2 1 1 2 2 1 1 1 1 1 ...
#>   ..$ school   : Factor w/ 48 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
#>   ..- attr(*, "terms")=Classes 'terms', 'formula'  language mathAge11 ~ mathAge8 + gender + class + (1 + school)
#>   .. .. ..- attr(*, "variables")= language list(mathAge11, mathAge8, gender, class, school)
#>   .. .. ..- attr(*, "factors")= int [1:5, 1:4] 0 1 0 0 0 0 0 1 0 0 ...
#>   .. .. .. ..- attr(*, "dimnames")=List of 2
#>   .. .. .. .. ..$ : chr [1:5] "mathAge11" "mathAge8" "gender" "class" ...
#>   .. .. .. .. ..$ : chr [1:4] "mathAge8" "gender" "class" "school"
#>   .. .. ..- attr(*, "term.labels")= chr [1:4] "mathAge8" "gender" "class" "school"
#>   .. .. ..- attr(*, "order")= int [1:4] 1 1 1 1
#>   .. .. ..- attr(*, "intercept")= int 1
#>   .. .. ..- attr(*, "response")= int 1
#>   .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
#>   .. .. ..- attr(*, "predvars")= language list(mathAge11, mathAge8, gender, class, school)
#>   .. .. ..- attr(*, "dataClasses")= Named chr [1:5] "numeric" "numeric" "factor" "factor" ...
#>   .. .. .. ..- attr(*, "names")= chr [1:5] "mathAge11" "mathAge8" "gender" "class" ...
#>   .. .. ..- attr(*, "predvars.fixed")= language list(mathAge11, mathAge8, gender, class)
#>   .. .. ..- attr(*, "varnames.fixed")= chr [1:4] "mathAge11" "mathAge8" "gender" "class"
#>   .. .. ..- attr(*, "predvars.random")= language list(mathAge11, school)
#>   ..- attr(*, "formula")=Class 'formula'  language mathAge11 ~ mathAge8 + gender + class + (1 | school)
#>   .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
#>  $ seed      : int [1:626] 10403 449 -1923197504 1596323498 -1092688843 1156305150 -61380703 -503890644 1216753221 -609109477 ...
#>  $ type      : chr "parametric"
#>  $ call      : language .bootstrap.completion(model = model, tstar = tstar, B = B, .f = .f, type = "parametric")
#>  - attr(*, "class")= chr "lmeresamp"
#>  - attr(*, "bootFail")= int 0

# an overview of the first six of those 10 list items
head(lmer_par_boot)
#> $observed
#>    (Intercept)       mathAge8        genderM classnonmanual 
#>     14.1577509      0.6388895     -0.3571922      0.7200815 
#> 
#> $model
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: mathAge11 ~ mathAge8 + gender + class + (1 | school)
#>    Data: jsp728
#> REML criterion at convergence: 4296.135
#> Random effects:
#>  Groups   Name        Std.Dev.
#>  school   (Intercept) 1.820   
#>  Residual             4.442   
#> Number of obs: 728, groups:  school, 48
#> Fixed Effects:
#>    (Intercept)        mathAge8         genderM  classnonmanual  
#>        14.1578          0.6389         -0.3572          0.7201  
#> 
#> $.f
#> function (object, ...) 
#> UseMethod("fixef")
#> <bytecode: 0x7f9fb1f98568>
#> <environment: namespace:nlme>
#> 
#> $replicates
#>         (Intercept)  mathAge8       genderM classnonmanual
#> sim_1      13.76157 0.6387455 -0.3944286471      1.2789570
#> sim_2      13.34242 0.6748518 -0.5000416469      0.7767042
#> sim_3      12.70355 0.6834877  0.0164263454      0.2560703
#> sim_4      12.93456 0.6805159 -0.3863334950      0.7134389
#> sim_5      13.50698 0.6662828 -0.0448655416      0.9897759
#> sim_6      13.77010 0.6444399 -0.4025861377      0.6117421
#> sim_7      16.03412 0.6018908 -0.8928118988      0.5450426
#> sim_8      13.37014 0.6605798  0.2247945990      0.8514779
#> sim_9      14.15123 0.6300378 -0.4299901877      0.5920539
#> sim_10     14.44888 0.6148458 -0.3202545240      0.4705813
#> sim_11     15.29683 0.6433477 -1.3245043191     -0.1568052
#> sim_12     13.87867 0.6650450 -0.7061408740      0.8435955
#> sim_13     12.98049 0.6661074 -0.2520164867      0.1119747
#> sim_14     14.72815 0.6196007 -0.7127857305      0.5349420
#> sim_15     13.89288 0.6347395  0.1825056739      0.4387681
#> sim_16     14.20204 0.6388583  0.0921141276      0.1994774
#> sim_17     15.44717 0.5923083 -0.5527372398      0.3155766
#> sim_18     12.51770 0.7018793 -0.2744105800      0.4242059
#> sim_19     13.31035 0.6567489  0.1939469397      1.2561435
#> sim_20     13.33466 0.6647357 -0.3403812930      0.1043400
#> sim_21     13.24227 0.6603756  0.4828380643      0.3562036
#> sim_22     14.95551 0.6071914 -0.3776676234      0.7251923
#> sim_23     13.86692 0.6412240 -0.4582982515      1.0265082
#> sim_24     13.22877 0.6838896 -0.1150826124      0.4696920
#> sim_25     12.63188 0.6932576 -0.6036200929      0.5421396
#> sim_26     14.94581 0.6270136 -0.5378887334      0.5738781
#> sim_27     14.00255 0.6390788 -0.2840510000      0.6546131
#> sim_28     13.39372 0.6620475 -0.4384368832      0.2165526
#> sim_29     12.99400 0.6798563 -0.0762940315      0.6267686
#> sim_30     14.44387 0.6154566 -0.8438245120      1.1846777
#> sim_31     14.77000 0.6225777 -0.5076851746      0.3933997
#> sim_32     14.43698 0.6028937 -0.1385367974      1.2305023
#> sim_33     13.47098 0.6745828 -1.1725566073      0.6462707
#> sim_34     13.92943 0.6521601 -0.4327024064      0.1382355
#> sim_35     15.62564 0.5827177 -0.5467297367      0.7780729
#> sim_36     13.88289 0.6590721 -0.1163782236      0.1514344
#> sim_37     13.36823 0.6560279 -0.2234989236      0.3165113
#> sim_38     15.33316 0.6232568 -0.0403435019      0.5821787
#> sim_39     14.14970 0.6316424 -0.2478323732      1.1810055
#> sim_40     14.98462 0.5797491 -0.2390610008      1.8179198
#> sim_41     13.36252 0.6843399 -0.5552445745      1.0752751
#> sim_42     14.80976 0.6210266  0.0006871174      0.8680135
#> sim_43     14.14482 0.6465556 -0.2020442075      0.3594429
#> sim_44     14.60121 0.6189037  0.2716179328      0.7745453
#> sim_45     14.09710 0.6366292 -0.3518461777      0.6764171
#> sim_46     14.62768 0.6374843 -0.5542581909      0.5683921
#> sim_47     14.94079 0.6114965  0.5379038316      0.9522278
#> sim_48     14.27197 0.6369082 -0.4743738738      1.4164174
#> sim_49     13.49520 0.6656233 -0.2723929076      0.6369040
#> sim_50     12.64154 0.6737153  0.3021745559      0.4798928
#> sim_51     14.22409 0.6495284 -0.6116310102      0.7469973
#> sim_52     14.15708 0.6613507 -0.4539542416      0.9730830
#> sim_53     13.82596 0.6414617  0.0063657499      0.8738022
#> sim_54     14.26179 0.6485256 -0.7467311700      0.6295772
#> sim_55     14.43495 0.6547441  0.0944804516      0.7438410
#> sim_56     15.23001 0.5978098 -0.6473757074      1.3543772
#> sim_57     13.87296 0.6620468 -0.0302094712      0.5726899
#> sim_58     14.41792 0.6332597 -0.2654598716      0.1894347
#> sim_59     13.14848 0.6571935  0.3943597745      0.7286730
#> sim_60     13.82311 0.6690117 -0.3402738070      1.3882388
#> sim_61     15.14663 0.6198846 -0.8238580271      1.0597385
#> sim_62     15.33642 0.6111452 -0.7371465578      0.7897321
#> sim_63     16.67625 0.5914486 -1.3246598759      0.4526114
#> sim_64     14.08115 0.6343674  0.1099306990      0.8741189
#> sim_65     14.88034 0.6088945 -0.3010574902      0.3608213
#> sim_66     14.71747 0.6166515 -0.4504079662      0.8672247
#> sim_67     14.36585 0.6504350 -1.0121437740     -0.3243375
#> sim_68     14.39786 0.6112626 -0.2821486980      0.6655326
#> sim_69     14.02461 0.6396261  0.0157616867      0.6156224
#> sim_70     13.82712 0.6443069  0.3524883611      0.8773345
#> sim_71     13.66777 0.6488758 -0.4843472561      0.3609821
#> sim_72     14.46425 0.6204018 -0.3249085462      0.6079857
#> sim_73     14.08523 0.6138216 -0.0581964690      1.0700862
#> sim_74     14.01119 0.6604938 -0.3341229120     -0.1929079
#> sim_75     13.61072 0.6466036  0.1340679988      0.2690237
#> sim_76     13.54578 0.6495776 -0.1045815845      0.2278865
#> sim_77     14.63793 0.6290588 -0.1670955504      0.2984844
#> sim_78     13.31386 0.6603398 -0.1157988288      0.6088717
#> sim_79     14.69510 0.6303725 -0.7354101384      0.5507381
#> sim_80     14.72917 0.6135144  0.3981342731      0.5975397
#> sim_81     13.32810 0.6399431 -0.0149866564      0.9675338
#> sim_82     14.16615 0.6596951 -0.9593686773      0.1079214
#> sim_83     15.17007 0.6073054 -0.5701563291      1.1850358
#> sim_84     15.34540 0.6062870 -0.4035635241      1.3339170
#> sim_85     13.40747 0.6627142  0.1606923393      1.1810932
#> sim_86     13.39090 0.6610049 -0.3731550536      1.1014774
#> sim_87     13.81865 0.6415923  0.4206667637      0.3456894
#> sim_88     14.82431 0.6008029 -0.0944280410      0.6055474
#> sim_89     14.56419 0.6339625 -0.1764018257      0.6023720
#> sim_90     13.69587 0.6561939 -1.1779793045      0.9932023
#> sim_91     14.87553 0.6266078 -0.3263097272      0.8380055
#> sim_92     14.06384 0.6328441 -0.5466347853      0.4087850
#> sim_93     14.86106 0.6381001 -0.7552371297      0.5791429
#> sim_94     14.22317 0.6393512 -0.3168100222      1.2076459
#> sim_95     12.99617 0.6636817 -0.7702256245      0.6734834
#> sim_96     14.04844 0.6530962 -0.8813572124      0.8278928
#> sim_97     12.88374 0.6876093 -0.3152918012      1.3042307
#> sim_98     13.67416 0.6582187 -0.3841312542      0.3406590
#> sim_99     14.16203 0.6374881  0.0579829755      0.9054520
#> sim_100    13.94279 0.6546005 -0.6104905135      0.1935560
#> 
#> $stats
#>                  observed   rep.mean         se         bias
#> (Intercept)    14.1577509 14.1131510 0.78236221 -0.044599917
#> mathAge8        0.6388895  0.6414292 0.02555724  0.002539666
#> genderM        -0.3571922 -0.3192304 0.38995842  0.037961787
#> classnonmanual  0.7200815  0.6711779 0.38481090 -0.048903549
#> 
#> $R
#> [1] 100

2. The Residual Bootstrap

The residual bootstrap resamples the residual quantities from the fitted linear mixed-effects model in order to generate bootstrap resamples. That is, a random sample, drawn with replacement, is taken from the estimated error terms and the EBLUPs (at each level). These random samples are then combined into bootstrap samples via the fitted model equation.

Examples

# nlme
lme_res_boot <- bootstrap(vcmodB, .f = fixef, type = "residual", B = 100)
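Only the nlme interface is shown above; the lme4 interface is analogous. Assuming the vcmodA model defined earlier, the corresponding call would be:

```r
# lme4
lmer_res_boot <- bootstrap(vcmodA, .f = fixef, type = "residual", B = 100)
```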

3. The Cases Bootstrap

The cases bootstrap is a fully nonparametric bootstrap that resamples the data with respect to the clusters in order to generate bootstrap samples. Depending on the nature of the data, the resampling can be done only for the higher-level cluster(s), only at the observation-level within a cluster, or at all levels. See Van der Leeden et al. for a nice discussion of this decision. The level(s) of the data that should be resampled may be specified using the resample parameter, which is a logical vector. When setting resample values, the user should make sure to first specify the highest level (largest cluster) of the hierarchy, then the lowest (observation-level). For example, for students within a school, the school level should be specified first, then the student level.

Examples

# lme4
# resample the schools, but not the students
lmer_case_boot <- bootstrap(vcmodA, .f = fixef, type = "case", B = 100, resample = c(TRUE, FALSE))

# nlme
# do not resample the schools, but resample the students
lme_cases_boot1 <- bootstrap(vcmodB, .f = fixef, type = "case", B = 100, resample = c(FALSE, TRUE))

# nlme
# resample both the schools and the students
lme_cases_boot2 <- bootstrap(vcmodB, .f = fixef, type = "case", B = 100, resample = c(TRUE, TRUE))

4. The CGR (Semi-Parametric) Bootstrap

The semi-parametric bootstrap algorithm implemented in lmeresampler was described by Carpenter, Goldstein and Rasbash (2003). The algorithm is outlined below:

  1. Obtain the parameter estimates from the fitted model and calculate the estimated error terms and EBLUPs.

  2. Rescale the error terms and EBLUPs so that the empirical variance of these quantities is equal to the estimated variance components from the model.

  3. Sample independently with replacement from the rescaled estimated error terms and rescaled EBLUPs.

  4. Obtain bootstrap samples by combining the samples via the fitted model equation.

  5. Refit the model and extract the statistic(s) of interest.

  6. Repeat steps 3-5 B times.
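To make the rescaling in step 2 concrete, here is a minimal illustrative sketch for a single random intercept. The helper rescale_to_var is hypothetical, not part of the package (lmeresampler performs this rescaling internally):

```r
# illustrative only: rescale residual quantities so that their empirical
# variance equals a target (model-estimated) variance component
rescale_to_var <- function(u, target_var) {
  u0 <- u - mean(u)                   # zero-center
  u0 * sqrt(target_var / mean(u0^2))  # scale to the target variance
}

# e.g., rescale the predicted random intercepts from vcmodA so their
# empirical variance equals the estimated school-level variance component
b_hat  <- ranef(vcmodA)$school[["(Intercept)"]]
sig2_b <- as.data.frame(VarCorr(vcmodA))$vcov[1]
b_star <- rescale_to_var(b_hat, sig2_b)
```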

Examples

# lme4
lmer_cgr_boot <- bootstrap(vcmodA, .f = fixef, type = "cgr", B = 100)

# nlme
lme_cgr_boot  <- bootstrap(vcmodB, .f = fixef, type = "cgr", B = 100)

5. The Random Effects Block (REB) Bootstrap

The random effects block (REB) bootstrap was outlined by Chambers and Chandra (2013) and has been developed for two-level nested linear mixed-effects (LME) models.

Consider a two-level LME of the form \(y = X \beta + Z b + \epsilon\).

The REB bootstrap algorithm (type = 0) is as follows:

  1. Calculate the nonparametric residual quantities for the fitted model

    • marginal residuals \(r = y - X\widehat{\beta}\)
    • predicted random effects \(\tilde{b} = (Z^\prime Z)^{-1} Z^\prime r\)
    • error terms \(\tilde{e} = r - Z \tilde{b}\)
  2. Take a simple random sample with replacement of the groups and extract the corresponding elements of \(\tilde{b}\) and \(\tilde{e}\).

  3. Generate bootstrap samples via the fitted model equation \(y = X \widehat{\beta} + Z \tilde{b} + \tilde{e}\).

  4. Refit the model and extract the statistic(s) of interest.

  5. Repeat steps 2-4 B times.
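Step 1 can be sketched directly with matrix operations (illustrative only; lmeresampler computes these quantities internally):

```r
# illustrative computation of the REB residual quantities for vcmodA
X <- model.matrix(vcmodA)                        # fixed-effects design matrix
Z <- as.matrix(getME(vcmodA, "Z"))               # random-effects design matrix
y <- getME(vcmodA, "y")                          # response vector

r       <- y - X %*% fixef(vcmodA)               # marginal residuals
b_tilde <- solve(crossprod(Z), crossprod(Z, r))  # (Z'Z)^{-1} Z'r
e_tilde <- r - Z %*% b_tilde                     # error terms
```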

Variation 1 (type = 1): The first variation of the REB bootstrap zero centers and rescales the residual quantities prior to resampling.

Variation 2 (type = 2): The second variation of the REB bootstrap scales the estimates and centers the bootstrap distributions (i.e., adjusts for bias) after REB bootstrapping.

Examples

# lme4
lmer_reb_boot0 <- bootstrap(vcmodA, .f = fixef, type = "reb", B = 100, reb_type = 0)
lmer_reb_boot1 <- bootstrap(vcmodA, .f = fixef, type = "reb", B = 100, reb_type = 1)

# nlme
lme_reb_boot2  <- bootstrap(vcmodB, .f = fixef, type = "reb", B = 100, reb_type = 2)

6. Output Options

The newest version of lmeresampler comes equipped with its own print() and confint() functions, both of which follow the syntax of the generic print() and confint() functions. In order for these functions to operate properly, the output of the bootstrap() function has been updated to be an object of class lmeresamp. See the next sections for some examples.

The print() Function

The syntax of the lmeresampler print() function is similar to the default print() function. The object to be printed must be the object returned by the bootstrap() call, an lmeresamp object. In addition, if the user would like to print confidence intervals for the object, setting ci = TRUE will call the confint() function and set method = "all" and level = 0.95. If not all of the intervals are of interest or a different confidence level is required, it is recommended that the user directly call confint() on the bootstrap object.

Examples

print(lmer_reb_boot0)
#> Bootstrap type: reb0 
#> 
#> Number of resamples: 100 
#> 
#>                  observed   rep.mean         se          bias
#> (Intercept)    14.1577509 14.0769408 0.81106998 -0.0808101564
#> mathAge8        0.6388895  0.6395461 0.02724457  0.0006565948
#> genderM        -0.3571922 -0.4409426 0.38588213 -0.0837504079
#> classnonmanual  0.7200815  0.7021238 0.43128316 -0.0179576564

print(lme_reb_boot2, ci = TRUE)
#> Bootstrap type: reb2 
#> 
#> Number of resamples: 100 
#> 
#>                       observed   rep.mean         se bias
#> beta.(Intercept)    14.1577509 14.1577509 0.74474925    0
#> beta.mathAge8        0.6388895  0.6388895 0.02556231    0
#> beta.genderM        -0.3571922 -0.3571922 0.36349159    0
#> beta.classnonmanual  0.7200815  0.7200815 0.39700106    0
#> sigma.(Intercept)    3.3122380  3.3122380 0.62752568    0
#> sigma2              19.7280582 19.7280582 1.51845206    0
#> 
#> 95 basic interval: 
#>                     basic.lower basic.upper
#> beta.(Intercept)     12.8440930  15.7497846
#> beta.mathAge8         0.5868947   0.6811430
#> beta.genderM         -0.9118761   0.3587842
#> beta.classnonmanual  -0.1117500   1.4513645
#> sigma.(Intercept)     2.1188934   4.5182002
#> sigma2               16.7484212  22.5134118
#> 
#> 95 percent normal-t interval: 
#>                 norm.t.lower norm.t.upper
#> (Intercept)      12.71765712   15.5978447
#> mathAge8          0.58892933    0.6888497
#> genderM          -1.02495574    0.3105713
#> classnonmanual   -0.03972045    1.4798834
#> sd((Intercept))   1.36162176    2.4325683
#> sigma             4.21190797    4.6838768
#> 
#> 95 percent bootstrap-t interval: 
#>                boot.t.lower boot.t.upper
#> (Intercept)      14.1577509   14.1577509
#> mathAge8          0.6388895    0.6388895
#> genderM          -0.3571922   -0.3571922
#> classnonmanual    0.7200815    0.7200815
#> 
#> 95 percent percentile-t interval: 
#>                     perc.t.lower perc.t.upper
#> beta.(Intercept)     12.56571727   15.4714088
#> beta.mathAge8         0.59663599    0.6908843
#> beta.genderM         -1.07316860    0.1974917
#> beta.classnonmanual  -0.01120156    1.5519130
#> sigma.(Intercept)     2.10627572    4.5055825
#> sigma2               16.94270452   22.7076952

The confint() Function

Similarly, the syntax of the lmeresampler confint() function resembles the default confint() function. The object for which confidence intervals are calculated must be the lmeresamp object returned by the bootstrap() call. The type of confidence interval may be specified with method, for which possible values include: "basic" (basic interval), "norm" (normal-t interval), "boot-t" (bootstrap-t interval using output from the bootstrap() call), "perc" (percentile interval), and "all" (all of the aforementioned methods, and the default if method is unspecified). The level at which the confidence interval is to be computed may be specified with level, which defaults to 0.95 if unspecified.

Notice that the output of the interval types differs slightly. The normal-t interval returns intervals for all of the fixed effects, all of the random effects, and sigma; the bootstrap-t interval returns intervals for only the fixed effects; and both the basic and percentile-t intervals return intervals for the parameters specified by .f. This has to do with the nature of the interval calculations and the functions available to pull the necessary information out of the models. In particular, calculating bootstrap-t intervals requires the standard errors of the random component estimates; while there is a way to calculate these standard errors, they are often unreliable. Thus we only provide functionality for bootstrap-t intervals for fixed effects. For the curious, the math behind the variance component standard error calculations may be found in two posts by Ben Bolker (one of the authors of, and the current maintainer of, lme4) in the References section below.

Examples

confint(lme_reb_boot2) # all ci types, 0.95 confidence level
#> 
#> 95 basic interval: 
#>                     basic.lower basic.upper
#> beta.(Intercept)     12.8440930  15.7497846
#> beta.mathAge8         0.5868947   0.6811430
#> beta.genderM         -0.9118761   0.3587842
#> beta.classnonmanual  -0.1117500   1.4513645
#> sigma.(Intercept)     2.1188934   4.5182002
#> sigma2               16.7484212  22.5134118
#> 
#> 95 percent normal-t interval: 
#>                 norm.t.lower norm.t.upper
#> (Intercept)      12.71765712   15.5978447
#> mathAge8          0.58892933    0.6888497
#> genderM          -1.02495574    0.3105713
#> classnonmanual   -0.03972045    1.4798834
#> sd((Intercept))   1.36162176    2.4325683
#> sigma             4.21190797    4.6838768
#> 
#> 95 percent bootstrap-t interval: 
#>                boot.t.lower boot.t.upper
#> (Intercept)      14.1577509   14.1577509
#> mathAge8          0.6388895    0.6388895
#> genderM          -0.3571922   -0.3571922
#> classnonmanual    0.7200815    0.7200815
#> 
#> 95 percent percentile-t interval: 
#>                     perc.t.lower perc.t.upper
#> beta.(Intercept)     12.56571727   15.4714088
#> beta.mathAge8         0.59663599    0.6908843
#> beta.genderM         -1.07316860    0.1974917
#> beta.classnonmanual  -0.01120156    1.5519130
#> sigma.(Intercept)     2.10627572    4.5055825
#> sigma2               16.94270452   22.7076952

confint(lmer_reb_boot0, method = "boot-t", level = 0.90)
#> 90 percent bootstrap-t interval: 
#>                boot.t.lower boot.t.upper
#> (Intercept)      14.1473072   14.3040182
#> mathAge8          0.6385272    0.6439638
#> genderM          -0.3620349   -0.2893689
#> classnonmanual    0.7145713    0.7972530

7. Parallelization (User Extension)

The Idea

Parallelization (also known as parallel computing, parallel processing, or executing “in parallel”) is the idea of splitting a process into sub-processes that execute at the same time. This is of interest in the context of bootstrapping because bootstrapping is often both computation-heavy and time-consuming.

Using parallelization, the processes behind bootstrap() can be executed on multiple computer cores, depending on the makeup of the machine you are using. To check how many cores your machine has, run parallel::detectCores(). This outputs the number of "logical" cores your CPU has, which on machines with hyper-threading is often twice the number of physical cores. The number of logical cores is the number of workers you can actually use when working in parallel (the more cores available to you, the better!). We recommend using multiple cores when possible, but not all or most of your CPU's cores, since the CPU needs them to run everything else on your machine, not just RStudio. If your machine has only one core, parallelization is neither useful nor possible.
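For example, you can compare logical and physical core counts before deciding how many workers to use. This is a sketch; the exact counts (and whether a physical count is available) depend on your hardware and operating system:

```r
library(parallel)

# Number of "logical" cores (counts hyper-threads, if any)
detectCores(logical = TRUE)

# Number of physical cores only (may return NA on some platforms)
detectCores(logical = FALSE)

# A conservative choice: leave some cores free for the rest of the system
numCores <- max(1, detectCores(logical = TRUE) - 2)
```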

Performance

It is important to note that while parallelization uses some multiple of the single core used for non-parallel processing, runtime will not shrink by that same multiple. That is, running a process on two cores does not make it twice as fast as running the same process on one core. This is because parallelization incurs overhead to split up the work and combine the results, so while runtime will substantially improve, the speedup will not correspond exactly to the number of cores being used.

There are two types of parallelization techniques: forking (also known as multicore) and clustering. While forking tends to be slightly faster, it cannot be executed on Windows operating systems; thus, we will spend our time documenting clustering, which is system-agnostic. For more information on how to use forking with parallelization, check out the parallel package.
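For completeness, here is a minimal sketch of how forking could look on macOS or Linux. Registering doParallel with a core count (rather than a cluster object) selects the multicore backend on Unix-alikes; this chunk is not run in this vignette:

```r
library(foreach)
library(doParallel)
library(lmeresampler)
library(lme4)

# On Unix-alikes, registering with a core count uses the multicore
# (forking) backend; on Windows this silently falls back to a cluster.
registerDoParallel(cores = 2)

b_fork <- foreach(B = rep(250, 2),
                  .combine = combine_lmeresamp) %dopar% {
  bootstrap(vcmodA, .f = fixef, type = "parametric", B = B)
}
```

Note that with forking the workers inherit the parent session's loaded packages, so no .packages argument is needed.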

Implementation

We implement our parallel processing using Steve Weston and Rich Calaway’s guide on jointly using doParallel and foreach, as the code is both concise and simple enough for those without much experience with parallelization. While doParallel and foreach default to multicore (forking) functionality when using parallel computing on UNIX operating systems and snow (clustering) functionality on Windows systems, in order to be as explicit as possible, we will outline our examples using clustering. For more information on forking, please see Weston and Calaway’s guide, “Getting Started with doParallel and foreach” (the link can be found in the References section).

The basic idea with clustering is that tasks are executed by a "cluster" of workers, each of which must be sent the information it needs separately; for this reason, clustering has more overhead than forking. Both doParallel and foreach use what Weston and Calaway call "snow-like functionality," meaning that while the clustering is done through these two packages, it uses the syntax of snow (a now-deprecated package). A cluster must also be explicitly "made" before the foreach() loop runs and "stopped" afterward, to tell the CPU when the parallelization begins and ends.

Examples

With all of this in mind, let's explore an example using lmeresampler. Notice that the user implements the parallelization around the bootstrap() call, rather than lmeresampler doing it internally. A modified, parallelized call to bootstrap() is as follows:

library(purrr)
library(foreach)
library(doParallel)
set.seed(1234)
numCores <- 2

cl <- snow::makeSOCKcluster(numCores) # make a socket cluster
doParallel::registerDoParallel(cl)    # how the CPU knows to run in parallel

b_parallel2 <- foreach(B = rep(250, 2),               # two sets of 250 resamples
                       .combine = combine_lmeresamp,  # merge into one lmeresamp object
                       .packages = c("lmeresampler", "lme4")) %dopar% {
  bootstrap(vcmodA, .f = fixef, type = "parametric", B = B)
}

stopCluster(cl)

Let’s compare the runtime of the above b_parallel2 with the same bootstrap run without parallelization. Note that differences in runtime will be small for a modest number of resamples (say, 100) and will become more pronounced, in favor of the parallelized bootstrap, as the number of resamples increases (to 1000, for example).

system.time(b_nopar  <- bootstrap(vcmodA, .f = fixef, type = "parametric", B = 1000))
#>    user  system elapsed 
#>   9.777   0.028   9.826

numCores <- 2
cl <- snow::makeSOCKcluster(numCores)
doParallel::registerDoParallel(cl)

system.time({
  b_parallel2 <- foreach(B = rep(500, 2),
                         .combine = combine_lmeresamp,
                         .packages = c("lmeresampler", "lme4", "purrr", "dplyr")) %dopar% {
                           bootstrap(vcmodA, .f = fixef, type = "parametric", B = B)
                         }
})
#>    user  system elapsed 
#>   0.062   0.007   7.527

stopCluster(cl)

Pretty useful stuff! The above can be applied to all bootstrap types and models fit using both lme4 and nlme.
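As one illustration of that last point, the same clustering pattern carries over unchanged to the nlme fit vcmodB from above; this sketch simply swaps the model and the exported package:

```r
library(foreach)
library(doParallel)
library(lmeresampler)
library(nlme)

cl <- snow::makeSOCKcluster(2)        # make a socket cluster
doParallel::registerDoParallel(cl)

b_parallel_nlme <- foreach(B = rep(250, 2),
                           .combine = combine_lmeresamp,
                           .packages = c("lmeresampler", "nlme")) %dopar% {
  bootstrap(vcmodB, .f = fixef, type = "parametric", B = B)
}

stopCluster(cl)
```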

8. Future Directions (Developer Extensions)

When fitting linear mixed-effects models, users commonly need data with crossed effects, that is, data in which lower-level observations belong to multiple factors of a higher level. For example, if patients (level-1) each see exactly one doctor (level-2), but the doctors work at multiple hospitals (level-3), levels two and three are crossed. It is currently possible to use lmeresampler to bootstrap a model with crossed effects via the parametric bootstrap; however, future updates should explore adding such functionality for the cases, residual, CGR, and REB bootstraps. Crossed relationships must be evaluated differently than standard nested relationships, so the code will need significant modifications to its evaluation process.

Furthermore, future extensions of lmeresampler may focus on implementing the existing functions for generalized linear mixed-effects models (GLMMs) for all of the bootstrap types. This has been requested by past users.

References

Carpenter, J.R., Goldstein, H. and Rasbash, J. (2003), A novel bootstrap procedure for assessing the relationship between class size and achievement. Journal of the Royal Statistical Society: Series C (Applied Statistics), 52: 431-443. DOI: 10.1111/1467-9876.00415

JSP728 Data

van der Leeden, R., Meijer, E., Busing, F.M.T.A. (2008) Resampling Multilevel Models. In: de Leeuw, J., Meijer, E. (eds) Handbook of Multilevel Analysis. Springer, New York, NY. DOI: 10.1007/978-0-387-73186-5_11

Raymond Chambers & Hukum Chandra (2013) A Random Effect Block Bootstrap for Clustered Data, Journal of Computational and Graphical Statistics, 22:2, 452-470, DOI: 10.1080/10618600.2012.681216

Ben Bolker’s lme4 Variance Component SE Tips

Ben Bolker’s nlme Variance Component SE Ideas

Weston and Calaway’s doParallel and foreach Guide