Test multiple parameters and compare nested models

Performs multi-parameter hypothesis tests for a vector of statistical parameters and compares nested statistical models obtained from multiply imputed data sets.

testModels(model, null.model, method = c("D1", "D2", "D3", "D4"), 
  use = c("wald", "likelihood"), ariv = c("default", "positive", "robust"),
  df.com = NULL, data = NULL)

Arguments

model: A list of fitted statistical models (“full” model) as produced by with.mitml.list or similar.
null.model: A list of fitted statistical models (“restricted” model) as produced by with.mitml.list or similar.
method: A character string denoting the method by which the test is performed. Can be "D1", "D2", "D3", or "D4" (see 'Details'). Default is "D1".
use: A character string denoting Wald- or likelihood-based based tests. Can be either "wald" or "likelihood". Only used if method = "D2".
ariv: A character string denoting how the ARIV is calculated. Can be "default", "positive", or "robust" (see 'Details').
df.com: (optional) A number denoting the complete-data degrees of freedom for the hypothesis test. Only used if method = "D1".
data: (optional) A list of imputed data sets (see 'Details'). Only used if method = "D4"

Details

This function compares two nested statistical models fitted to multiply imputed data sets by pooling Wald-like or likelihood-ratio tests.

Pooling methods for Wald-like tests of multiple parameters were introduced by Rubin (1987) and further developed by Li, Raghunathan and Rubin (1991). The pooled Wald test is referred to as \(D_1\) and can be used by setting method = "D1". \(D_1\) is the multi-parameter equivalent of testEstimates, that is, it tests multiple parameters simultaneously. For \(D_1\), the complete-data degrees of freedom are assumed to be infinite, but they can be adjusted for smaller samples by supplying df.com (Reiter, 2007).

An alternative method for Wald-like hypothesis tests was suggested by Li, Meng, Raghunathan and Rubin (1991). The procedure is called \(D_2\) and can be used by setting method = "D2". \(D_2\) calculates the Wald-test directly for each data set and then pools the resulting \(\chi^2\) values. The source of these values is specified by the use argument. If use = "wald" (the default), then a Wald test similar to \(D_1\) is performed. If use = "likelihood", then the two models are compared with a likelihood-ratio test instead.

Pooling methods for likelihood-ration tests were suggested by Meng and Rubin (1992). This procedure is referred to as \(D_3\) and can be used by setting method = "D3". \(D_3\) compares the two models by pooling the likelihood-ratio test across multiply imputed data sets.

Finally, an improved method for pooling likelihood-ratio tests was recommended by Chan & Meng (2019). This method is referred to as \(D_4\) and can be used by setting method = "D4". \(D_4\) also compares models by pooling the likelihood-ratio test but does so in a more general and efficient manner.

The function supports different classes of statistical models depending on which method is chosen. \(D_1\) supports models that define coef and vcov methods (or similar) for extracting the parameter estimates and their estimated covariance matrix. \(D_2\) can be used for the same models (if use = "wald" and models that define a logLik method (if use = "likelihood"). \(D_3\) supports linear models, linear mixed-effects models (see Laird, Lange, & Stram, 1987) with an arbitrary cluster structed if estimated with lme4 or a single cluster if estimated by nlme, and structural equation models estimated with lavaan (requires ML estimator, see 'Note'). Finally, \(D_4\) supports models that define a logLik method but can fail if the data to which the model was fitted cannot be found. In such a case, users can provide the list of imputed data sets directly by specifying the data argument or refit with the include.data argument in with.mitml.list. Support for other statistical models may be added in future releases.

The \(D_4\), \(D_3\), and \(D_2\) methods support different estimators of the relative increase in variance (ARIV), which can be specified with the ariv argument. If ariv = "default", the default estimators are used. If ariv = "positive", the default estimators are used but constrained to take on strictly positive values. This is useful if the estimated ARIV is negative. If ariv = "robust", which is available only for \(D_4\), the "robust" estimator proposed by Chan & Meng (2019) is used. This method should be used with caution, because it requires much stronger assumptions and may result in liberal inferences if these assumptions are violated.

Value

A list containing the results of the model comparison. A print method is used for more readable output.

Note

The methods \(D_4\), \(D_3\), and the likelihood-based \(D_2\) assume that models were fit using maximum likelihood (ML). Models fit using REML are automatically refit using ML. Models fit in 'lavaan' using the MLR estimator or similar techniques that require scaled \(chi^2\) difference tests are currently not supported.

References

Chan, K. W., & Meng, X.-L. (2019). Multiple improvements of multiple imputation likelihood ratio tests. ArXiv:1711.08822 [Math, Stat]. https://arxiv.org/abs/1711.08822

Laird, N., Lange, N., & Stram, D. (1987). Maximum likelihood computations with repeated measures: Application of the em algorithm. Journal of the American Statistical Association, 82, 97-105.

Li, K.-H., Meng, X.-L., Raghunathan, T. E., & Rubin, D. B. (1991). Significance levels from repeated p-values with multiply-imputed data. Statistica Sinica, 1, 65-92.

Li, K. H., Raghunathan, T. E., & Rubin, D. B. (1991). Large-sample significance levels from multiply imputed data using moment-based statistics and an F reference distribution. Journal of the American Statistical Association, 86, 1065-1073.

Meng, X.-L., & Rubin, D. B. (1992). Performing likelihood ratio tests with multiply-imputed data sets. Biometrika, 79, 103-111.

Reiter, J. P. (2007). Small-sample degrees of freedom for multi-component significance tests with multiple imputation for missing data. Biometrika, 94, 502-508.

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. Hoboken, NJ: Wiley.

Author

Simon Grund

Examples

data(studentratings)

fml <- ReadDis + SES ~ ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5)
#> Running burn-in phase ...
#> Creating imputed data set ( 1 / 5 ) ...
#> Creating imputed data set ( 2 / 5 ) ...
#> Creating imputed data set ( 3 / 5 ) ...
#> Creating imputed data set ( 4 / 5 ) ...
#> Creating imputed data set ( 5 / 5 ) ...
#> Done!

implist <- mitmlComplete(imp)

# * Example 1: multiparameter hypothesis test for 'ReadDis' and 'SES'
# This tests the hypothesis that both effects are zero.

require(lme4)
fit0 <- with(implist, lmer(ReadAchiev ~ (1|ID), REML = FALSE))
fit1 <- with(implist, lmer(ReadAchiev ~ ReadDis + (1|ID), REML = FALSE))

# apply Rubin's rules
testEstimates(fit1)
#> 
#> Call:
#> 
#> testEstimates(model = fit1)
#> 
#> Final parameter estimates and inferences obtained from 5 imputed data sets.
#> 
#>              Estimate Std.Error   t.value        df   P(>|t|)       RIV       FMI 
#> (Intercept)   582.370    14.887    39.118    95.466     0.000     0.257     0.221 
#> ReadDis       -35.762     5.368    -6.662    79.640     0.000     0.289     0.243 
#> 
#> Unadjusted hypothesis test as appropriate in larger samples.
#> 

# multiparameter hypothesis test using D1 (default)
testModels(fit1, fit0)
#> 
#> Call:
#> 
#> testModels(model = fit1, null.model = fit0)
#> 
#> Model comparison calculated from 5 imputed data sets.
#> Combination method: D1
#> 
#>     F.value     df1     df2   P(>F)     RIV 
#>      44.384       1  79.640   0.000   0.289 
#> 
#> Unadjusted hypothesis test as appropriate in larger samples.
#> 

# ... adjusting for finite samples
testModels(fit1, fit0, df.com = 47)
#> 
#> Call:
#> 
#> testModels(model = fit1, null.model = fit0, df.com = 47)
#> 
#> Model comparison calculated from 5 imputed data sets.
#> Combination method: D1
#> 
#>     F.value     df1     df2   P(>F)     RIV 
#>      44.384       1   4.000   0.003   0.289 
#> 
#> Hypothesis test adjusted for small samples with df=[47]
#> complete-data degrees of freedom.
#> 

# ... using D2 ("wald", using estimates and covariance-matrix)
testModels(fit1, fit0, method = "D2")
#> 
#> Call:
#> 
#> testModels(model = fit1, null.model = fit0, method = "D2")
#> 
#> Model comparison calculated from 5 imputed data sets.
#> Combination method: D2 (wald)
#> 
#>     F.value     df1     df2   P(>F)     RIV 
#>      44.816       1  87.901   0.000   0.271 
#> 

# ... using D2 ("likelihood", using likelihood-ratio test)
testModels(fit1, fit0, method = "D2", use = "likelihood")
#> 
#> Call:
#> 
#> testModels(model = fit1, null.model = fit0, method = "D2", use = "likelihood")
#> 
#> Model comparison calculated from 5 imputed data sets.
#> Combination method: D2 (likelihood)
#> 
#>     F.value     df1     df2   P(>F)     RIV 
#>      43.718       1 103.172   0.000   0.245 
#> 

# ... using D3 (likelihood-ratio test, requires ML fit)
testModels(fit1, fit0, method = "D3")
#> 
#> Call:
#> 
#> testModels(model = fit1, null.model = fit0, method = "D3")
#> 
#> Model comparison calculated from 5 imputed data sets.
#> Combination method: D3
#> 
#>     F.value     df1     df2   P(>F)     RIV 
#>      40.854       1  63.250   0.000   0.336 
#> 

# ... using D4 (likelihood-ratio test, requires ML fit)
testModels(fit1, fit0, method = "D4")
#> 
#> Call:
#> 
#> testModels(model = fit1, null.model = fit0, method = "D4")
#> 
#> Model comparison calculated from 5 imputed data sets.
#> Combination method: D4
#> 
#>     F.value     df1     df2   P(>F)     RIV 
#>      40.858       1  63.280   0.000   0.336 
#> 
#> Data for stacking were automatically extracted from the fitted models.
#>