(EXPERIMENTAL) Bootstrap, Conformal, and Simulation-Based Inference
Source: R/inferences.R
Warning: This function is experimental. It may be renamed, the user interface may change, or the functionality may migrate to arguments in other marginaleffects functions.
Apply this function to a marginaleffects object to change the inferential method used to compute uncertainty estimates.
Usage
inferences(
x,
method,
R = 1000,
conf_type = "perc",
data_train = NULL,
data_test = NULL,
data_calib = NULL,
conformal_score = "residual_abs",
estimator = NULL,
...
)
Arguments
- x
Object produced by one of the core marginaleffects functions, or a data frame suitable for the function supplied to the estimator argument.
- method
String
"delta": delta method standard errors
"boot": bootstrap using the boot package
"fwb": fractional weighted bootstrap
"rsample": bootstrap using the rsample package
"simulation": simulation from a multivariate normal distribution (Krinsky & Robb, 1986)
"conformal_split": prediction intervals using split conformal prediction (see Angelopoulos & Bates, 2022)
"conformal_cv+": prediction intervals using cross-validation+ conformal prediction (see Barber et al., 2020)
"conformal_full": prediction intervals using full conformal prediction (see Lei et al., 2018). Warning: This method is computationally expensive and typically much slower than split or CV+ methods.
"conformal_quantile": prediction intervals using conformalized quantile regression (see Romano et al., 2020).
- R
Number of resamples, simulations, or cross-validation folds.
- conf_type
String: type of bootstrap interval to construct.
boot: "perc", "norm", "basic", or "bca"
fwb: "perc", "norm", "wald", "basic", "bc", or "bca"
rsample: "perc" or "bca"
simulation: "perc" or "wald"
- data_train
Data frame used to train/fit the model. If NULL, marginaleffects tries to extract the data from the original model object. Test data are inferred directly from the newdata supplied to the originating marginaleffects call (e.g., predictions()).
- data_test
Data frame used to make out-of-sample predictions. Only used for conformal inference. If NULL, the data frame supplied to newdata in the original marginaleffects call is used.
- data_calib
Data frame used for calibration in split conformal prediction.
- conformal_score
String. Warning: The type argument in predictions() must generate predictions which are on the same scale as the outcome variable. Typically, this means that type must be "response" or "probs".
"residual_abs" or "residual_sq" for regression tasks (numeric outcome)
"softmax" for classification tasks (when predictions() returns a group column, as with multinomial or ordinal logit models)
- estimator
Function that accepts a data frame, fits a model, applies a marginaleffects function, and returns the object. Only supported with method = "rsample" or method = "boot". When method = "rsample", the output must include a "term" column. This is not always the case for predictions(), in which case users may have to create the column manually.
- ...
If method = "boot", additional arguments are passed to boot::boot().
If method = "fwb", additional arguments are passed to fwb::fwb().
If method = "rsample", additional arguments are passed to rsample::bootstraps(), unless the user supplies a group argument, in which case all arguments are passed to rsample::group_bootstraps().
If method = "conformal_full", additional arguments control the optimization process:
var_multiplier: multiplier for initial search bounds (default: 10)
max_iter: maximum iterations for root finding (default: 100)
tolerance: tolerance for root finding convergence (default: .Machine$double.eps^0.25)
If method = "conformal_quantile", additional arguments are passed to quantregForest::quantregForest() for fitting the quantile regression forest (e.g., ntree, mtry, nodesize, nthreads).
Additional arguments are ignored for other conformal methods (conformal_split, conformal_cv+).
Details
When method = "simulation", we conduct simulation-based inference following the method discussed in Krinsky & Robb (1986):
1. Draw R sets of simulated coefficients from a multivariate normal distribution with mean equal to the original model's estimated coefficients and variance equal to the model's variance-covariance matrix (classical, "HC3", or other).
2. Use the R sets of coefficients to compute R sets of estimands: predictions, comparisons, slopes, or hypotheses.
3. Take quantiles of the resulting distribution of estimands to obtain a confidence interval (when conf_type = "perc") and the standard deviation of simulated estimates to estimate the standard error (which is used for a Z-test and Wald confidence intervals when conf_type = "wald").
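The three steps above can be sketched by hand. This is only an illustration of the Krinsky & Robb idea, not the code that inferences() runs internally: the model, the mtcars data, the estimand (an average prediction), and the helper name estimand() are all assumptions chosen for the example.

```r
set.seed(1024)
mod <- lm(mpg ~ hp + wt, data = mtcars)

# Hypothetical estimand: the average prediction, written as a function
# of the coefficient vector so it can be recomputed for each draw.
X <- model.matrix(mod)
estimand <- function(beta) mean(X %*% beta)

# Step 1: draw R coefficient vectors from N(beta_hat, V_hat).
R <- 1000
draws <- MASS::mvrnorm(R, mu = coef(mod), Sigma = vcov(mod))

# Step 2: compute the estimand once per simulated coefficient vector.
sims <- apply(draws, 1, estimand)

# Step 3: summarize the simulated distribution.
est <- estimand(coef(mod))
se <- sd(sims)                                       # simulation-based standard error
ci_perc <- quantile(sims, probs = c(0.025, 0.975))   # analogous to conf_type = "perc"
ci_wald <- est + c(-1, 1) * qnorm(0.975) * se        # analogous to conf_type = "wald"
```

Replacing vcov(mod) with a robust variance-covariance matrix would change only the Sigma argument in step 1.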
When method = "fwb", drawn weights are supplied to the model fitting function's weights argument; if the model doesn't accept non-integer weights, this method should not be used. If weights were included in the original model fit, they are extracted by weights() and multiplied by the drawn weights. These weights are supplied to the wts argument of the estimation function (e.g., comparisons()).
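To make the role of the drawn weights concrete, here is a minimal base-R sketch of a fractional weighted bootstrap. It does not use the fwb package and is not how inferences() is implemented. The exponential weights rescaled to sum to the sample size are one common choice and are an assumption here, as are the model, the data, and the helper name fwb_once().

```r
set.seed(1024)
fit <- lm(mpg ~ hp + wt, data = mtcars)

# One bootstrap replicate: draw non-integer weights, refit the model with
# them, and return the quantity of interest (here, the coefficient on hp).
fwb_once <- function(data) {
  w <- rexp(nrow(data))
  data$w <- nrow(data) * w / sum(w)   # fractional weights that sum to n
  refit <- lm(mpg ~ hp + wt, data = data, weights = w)
  coef(refit)[["hp"]]
}

R <- 500
boot_est <- replicate(R, fwb_once(mtcars))

# Percentile interval and bootstrap standard error
ci <- quantile(boot_est, probs = c(0.025, 0.975))
se <- sd(boot_est)
```

Every observation keeps a strictly positive weight in every replicate, which is why the model fitting function must accept non-integer weights.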
Warning: custom model classes are not supported by inferences() because they are not guaranteed to come with an appropriate update() method.
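For the conformal methods, the roles of data_train, data_calib, and data_test can be illustrated with a hand-rolled version of split conformal prediction using the absolute-residual score. This is a sketch under stated assumptions (an lm() model, a random three-way split of mtcars, and a miscoverage level alpha of 0.1), not the package's internal implementation.

```r
set.seed(1024)

# Random three-way split: training, calibration, and test sets
idx <- sample(rep(c("train", "calib", "test"), length.out = nrow(mtcars)))
train <- mtcars[idx == "train", ]
calib <- mtcars[idx == "calib", ]
test <- mtcars[idx == "test", ]

# Fit the model on the training data only
fit <- lm(mpg ~ hp + wt, data = train)

# Conformity scores on the calibration set: absolute residuals
scores <- abs(calib$mpg - predict(fit, newdata = calib))

# Finite-sample-adjusted quantile of the scores for 1 - alpha coverage
alpha <- 0.1
n_cal <- length(scores)
k <- ceiling((n_cal + 1) * (1 - alpha))
q_hat <- if (k > n_cal) Inf else sort(scores)[k]

# Prediction intervals for the test set: prediction plus or minus q_hat
pred <- predict(fit, newdata = test)
intervals <- data.frame(
  estimate = pred,
  lower = pred - q_hat,
  upper = pred + q_hat
)
```

The intervals have constant width because the absolute-residual score does not adapt to the covariates.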
References
Krinsky, I., and A. L. Robb. 1986. "On Approximating the Statistical Properties of Elasticities." Review of Economics and Statistics 68 (4): 715–9.
King, Gary, Michael Tomz, and Jason Wittenberg. 2000. "Making the Most of Statistical Analyses: Improving Interpretation and Presentation." American Journal of Political Science 44 (2): 347–61.
Dowd, Bryan E., William H. Greene, and Edward C. Norton. 2014. "Computation of Standard Errors." Health Services Research 49 (2): 731–50.
Angelopoulos, Anastasios N., and Stephen Bates. 2022. "A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification." arXiv. https://doi.org/10.48550/arXiv.2107.07511.
Barber, Rina Foygel, Emmanuel J. Candes, Aaditya Ramdas, and Ryan J. Tibshirani. 2020. "Predictive Inference with the Jackknife+." arXiv. http://arxiv.org/abs/1905.02928.
Lei, Jing, Max G'Sell, Alessandro Rinaldo, Ryan J. Tibshirani, and Larry Wasserman. 2018. "Distribution-Free Predictive Inference for Regression." Journal of the American Statistical Association 113 (523): 1094–1111.
Romano, Yaniv, Evan Patterson, and Emmanuel Candes. 2020. "Conformalized Quantile Regression." Advances in Neural Information Processing Systems 32.
Parallel computation
The slopes() and comparisons() functions can use parallelism to
speed up computation. Operations are parallelized for the computation of
standard errors, at the model coefficient level. There is always
considerable overhead when using parallel computation, mainly involved
in passing the whole dataset to the different processes. Thus, parallel
computation is most likely to be useful when the model includes many parameters
and the dataset is relatively small.
Warning: In many cases, parallel processing will not be useful at all.
To activate parallel computation, users must load the future.apply package,
call the plan() function, and set a global option.
options(marginaleffects_parallel = TRUE): parallelize delta method computation of standard errors.
options(marginaleffects_parallel_inferences = TRUE): parallelize "rsample" or "fwb" bootstrap computation in inferences().
options(marginaleffects_parallel_packages = c("survival", "splines")): character vector with the names of the modeling packages used to fit the model.
For example:
library(future.apply)
plan("multisession", workers = 4)
options(marginaleffects_parallel = FALSE)
options(marginaleffects_parallel_inferences = TRUE)
options(marginaleffects_parallel_packages = c("survival", "splines"))
slopes(model)
To disable parallelism in marginaleffects altogether, you can set a global option:
options(marginaleffects_parallel = FALSE)
Examples
if (FALSE) { # \dontrun{
library(magrittr)
set.seed(1024)
mod <- lm(Sepal.Length ~ Sepal.Width * Species, data = iris)
# Bootstrap
avg_predictions(mod, by = "Species") %>%
  inferences(method = "boot")
avg_predictions(mod, by = "Species") %>%
  inferences(method = "rsample")
# Fractional weighted (Bayesian) bootstrap
avg_slopes(mod, by = "Species") %>%
  inferences(method = "fwb") %>%
  get_draws("rvar") %>%
  data.frame()
# Simulation-based inference
slopes(mod) %>%
  inferences(method = "simulation") %>%
  head()
# Two-step estimation procedure: Propensity score + G-Computation
lalonde <- get_dataset("lalonde")
estimator <- function(data) {
  # Step 1: Estimate propensity scores
  fit1 <- glm(treat ~ age + educ + race, family = binomial, data = data)
  ps <- predict(fit1, type = "response")
  # Step 2: Fit weighted outcome model
  m <- lm(re78 ~ treat * (re75 + age + educ + race),
    data = data, weights = ps
  )
  # Step 3: Compute average treatment effect by G-computation
  avg_comparisons(m, variables = "treat", wts = ps, vcov = FALSE)
}
inferences(lalonde, method = "rsample", estimator = estimator)
} # }