Parametric Bootstrap Reference Distribution

pb_refdist() simulates a fixed number of bootstrap replicates.
pb_refdist_sequential() runs simulations in batches until a target number of extreme hits is reached or the maximum number of simulations is exhausted, which is useful for efficient estimation of small p-values.

Both functions return objects of class "PBrefdist" with methods for summary(), plot(), and as.data.frame().

pb_refdist(
  fit1,
  fit0,
  nsim = 1000,
  engine = c("serial", "parallel", "future"),
  nworkers = 2,
  verbose = FALSE
)

pb_refdist_sequential(
  fit1,
  fit0,
  h = 20,
  nsim = 1000,
  batch_size = 50,
  engine = c("serial", "parallel", "future"),
  nworkers = 2,
  verbose = FALSE
)

Arguments

fit1: The larger (alternative) model, e.g. a fit from lm, gls, lme, or lmer.
fit0: The smaller (null) model, nested in fit1.
nsim: Number of simulations (for pb_refdist) or maximum number of simulations (for pb_refdist_sequential).
engine: Computation engine: "serial", "parallel", or "future".
nworkers: Number of workers for parallel/future backend.
verbose: Logical; if TRUE, print progress messages. Default is FALSE.
h: Target number of extreme hits, i.e. simulated statistics exceeding the observed test statistic (for pb_refdist_sequential).
batch_size: Number of simulations per batch (for pb_refdist_sequential).

Value

An object of class "PBrefdist" containing the observed statistic, the bootstrap replicates, degrees of freedom, asymptotic p-value, and optionally the number of hits and standard error.

Details

Both functions compute a parametric bootstrap reference distribution for the likelihood ratio statistic that compares a larger (alternative) model with a smaller (null) model nested in it. The distribution can be used to estimate a bootstrap p-value.
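To make the idea concrete, here is a minimal base-R sketch of a parametric bootstrap for a likelihood ratio statistic, assuming two nested lm fits. It illustrates the general technique only and is not the package's internal implementation. The names dat, lr_stat, ref, and p_boot are hypothetical, and the (hits + 1) / (nsim + 1) p-value is one common convention, assumed here.

```r
set.seed(1)

# Toy data and two nested models, both fitted with data=
dat <- data.frame(x = rnorm(40))
dat$y <- 1 + 0.3 * dat$x + rnorm(40)
fit1 <- lm(y ~ x, data = dat)  # larger (alternative) model
fit0 <- lm(y ~ 1, data = dat)  # smaller (null) model

# Likelihood ratio statistic: twice the log-likelihood difference
lr_stat <- function(large, small) {
  as.numeric(2 * (logLik(large) - logLik(small)))
}
obs <- lr_stat(fit1, fit0)

# Simulate responses under the null model, refit both models to each
# simulated response, and collect the resulting statistics
nsim <- 200
ysim <- simulate(fit0, nsim = nsim)
ref <- vapply(ysim, function(ynew) {
  d <- dat
  d$y <- ynew
  lr_stat(lm(y ~ x, data = d), lm(y ~ 1, data = d))
}, numeric(1))

# Bootstrap p-value (assumed convention) and asymptotic chi-square p-value
p_boot <- (sum(ref >= obs) + 1) / (nsim + 1)
df_diff <- attr(logLik(fit1), "df") - attr(logLik(fit0), "df")
p_asym <- pchisq(obs, df = df_diff, lower.tail = FALSE)
```

The vector ref plays the role of the reference distribution; comparing p_boot with p_asym shows how far the bootstrap and chi-square approximations differ for a given data set.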

The sequential version is useful when one wants to control Monte Carlo error by targeting a fixed number of extreme values exceeding the observed test statistic.
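The stopping rule can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's code: sequential_hits and draw_batch are hypothetical names, and chi-square draws stand in for the simulate-and-refit step so that the example stays self-contained.

```r
# Draw statistics in batches until h of them reach the observed value
# or nsim draws have been made, whichever comes first.
sequential_hits <- function(obs, draw_batch, h = 20, nsim = 1000,
                            batch_size = 50) {
  ref <- numeric(0)
  while (length(ref) < nsim && sum(ref >= obs) < h) {
    n_next <- min(batch_size, nsim - length(ref))  # never exceed nsim
    ref <- c(ref, draw_batch(n_next))
  }
  list(ref = ref, hits = sum(ref >= obs), n = length(ref))
}

set.seed(2)
# Stand-in generator (assumption): chi-square(1) draws instead of refits
out <- sequential_hits(obs = 3.84,
                       draw_batch = function(n) rchisq(n, df = 1))
out$hits  # number of extreme values found
out$n     # number of simulations actually used
```

Because the loop checks the hit count only between batches, the final number of hits can exceed h slightly; a smaller batch_size tightens that at the cost of more frequent checks.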

Note

Best Practice: Always fit your models with the data= argument. This ensures all covariates used in the model formula are stored with the model object, enabling reliable simulation and refitting for bootstrap analysis, including on parallel workers. Without data=, refitting may fail in parallel contexts and reproducibility is compromised.
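The difference between the two fitting styles can be checked on the stored call. The sketch below uses only base R; has_data_arg is a hypothetical helper introduced for illustration and is not part of the package.

```r
dat <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(1.1, 1.9, 3.2, 3.9, 5.1))

# Recommended: the call records where the variables come from
fit_good <- lm(y ~ x, data = dat)

# Discouraged: variables are found through the calling environment,
# so the stored call carries no data argument
fit_bad <- with(dat, lm(y ~ x))

# Hypothetical helper: does the stored call include a data argument?
has_data_arg <- function(fit) !is.null(getCall(fit)$data)

has_data_arg(fit_good)  # TRUE
has_data_arg(fit_bad)   # FALSE
```

Both fits give the same coefficients here; only the first carries enough information in its call to be refitted reliably elsewhere.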

The returned object can be passed to summary(), plot(), and as.data.frame().

The function automatically ensures that the models have their required data embedded. This guarantees that parametric bootstrap simulations can be run in parallel workers without errors about missing variables, even if the original dataset was modified or removed from the global environment after fitting.