Variance Inflation Factors

Calculates variance-inflation and generalized variance-inflation factors (VIFs and GVIFs) for linear, generalized linear, and other regression models.

Usage

vif(mod, ...)

# Default S3 method
vif(mod, ...)

# S3 method for class 'lm'
vif(mod, type=c("terms", "predictor"), ...)

# S3 method for class 'merMod'
vif(mod, ...)

# S3 method for class 'polr'
vif(mod, ...)

# S3 method for class 'svyolr'
vif(mod, ...)

Arguments

mod: for the default method, an object that responds to coef, vcov, and model.matrix, such as a glm object.
type: for unweighted lm objects only, how to handle models that contain interactions: see Details below.
...: not used.

Details

If all terms in an unweighted linear model have 1 df, then the usual variance-inflation factors are calculated.
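As an illustration of the 1-df case, the usual VIF for a regressor can be written as 1/(1 - R_j^2), where R_j^2 is the squared multiple correlation from regressing that regressor on the others. The following is a minimal base-R sketch under that textbook definition, not the code used by vif(); the mtcars data and the names vif_by_hand and vif_from_cor are chosen here purely for illustration.

```r
# Fit a linear model in which every term has 1 df (mtcars ships with base R).
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Model matrix without the intercept column.
X <- model.matrix(fit)[, -1, drop = FALSE]

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the others.
vif_by_hand <- sapply(colnames(X), function(j) {
  others <- X[, setdiff(colnames(X), j), drop = FALSE]
  r2 <- summary(lm(X[, j] ~ others))$r.squared
  1 / (1 - r2)
})

# Equivalent shortcut: diagonal of the inverse correlation matrix of the regressors.
vif_from_cor <- diag(solve(cor(X)))

print(vif_by_hand)
print(vif_from_cor)
```

Both vectors should agree up to rounding error, and each value is at least 1.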

If any terms in an unweighted linear model have more than 1 df, then generalized variance-inflation factors (Fox and Monette, 1992) are calculated. These are interpretable as the inflation in size of the confidence ellipse or ellipsoid for the coefficients of the term in comparison with what would be obtained for orthogonal data.

The generalized VIFs are invariant with respect to the coding of the terms in the model (as long as the subspace of the columns of the model matrix pertaining to each term is invariant). To adjust for the dimension of the confidence ellipsoid, the function also prints \(GVIF^{1/(2\times df)}\) where \(df\) is the degrees of freedom associated with the term.
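The GVIF for a term can be sketched as det(R11) * det(R22) / det(R), where R is the correlation matrix of the regressors (excluding the constant), R11 is the block for the term's columns, and R22 is the block for the remaining columns. The base-R sketch below follows that definition from Fox and Monette (1992) and checks the invariance to coding described above by refitting with a different contrast coding. The helper gvif_terms and the use of mtcars are assumptions made for illustration; this is not the implementation in vif().

```r
# Hypothetical helper: term-wise GVIFs from the correlations among model-matrix columns.
gvif_terms <- function(fit) {
  X <- model.matrix(fit)
  assign <- attr(X, "assign")          # maps each column to its term (0 = intercept)
  keep <- assign != 0
  R <- cor(X[, keep, drop = FALSE])    # correlations, disregarding the constant
  assign <- assign[keep]
  labels <- attr(terms(fit), "term.labels")
  out <- t(sapply(seq_along(labels), function(k) {
    idx <- which(assign == k)
    g <- det(R[idx, idx, drop = FALSE]) *
      det(R[-idx, -idx, drop = FALSE]) / det(R)
    d <- length(idx)                   # df of the term
    c(GVIF = g, Df = d, "GVIF^(1/(2*Df))" = g^(1 / (2 * d)))
  }))
  rownames(out) <- labels
  out
}

dat <- transform(mtcars, cyl = factor(cyl))   # a 3-level factor, so 2 df

# Same model under two codings of the factor.
fit_treatment <- lm(mpg ~ wt + hp + cyl, data = dat)
fit_sum <- lm(mpg ~ wt + hp + cyl, data = dat,
              contrasts = list(cyl = "contr.sum"))

gvif_treatment <- gvif_terms(fit_treatment)
gvif_sum <- gvif_terms(fit_sum)

print(gvif_treatment)
print(all.equal(gvif_treatment, gvif_sum))
```

The two codings span the same subspace once the constant is removed, so the two tables should match up to rounding error.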

Through a further generalization, the implementation here also applies to other sorts of models, in particular weighted linear models, generalized linear models, and mixed-effects models. Rather than computing the GVIFs from the correlation matrix of the columns of the model matrix (disregarding the regression constant), vif() uses the correlation matrix of the coefficients. The two correlation matrices produce the same result for a linear model fit by least squares (as pointed out by Henric Nilsson; see Fox, 2016, Exercise 13.5), but only the correlation matrix of the coefficients works with other regression models, such as GLMs.
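The equivalence for least-squares fits can be checked numerically. The sketch below, in base R, computes 1-df VIFs once from the correlations among the model-matrix columns and once from the correlations among the coefficients (via cov2cor(vcov(fit))), then applies the coefficient-based version to a GLM. The helper gvif_one, the mtcars data, and the Poisson model are assumptions chosen for illustration, not part of the package.

```r
# Hypothetical helper: GVIF for the coefficients at positions `idx` of a correlation matrix R.
gvif_one <- function(R, idx) {
  det(R[idx, idx, drop = FALSE]) * det(R[-idx, -idx, drop = FALSE]) / det(R)
}

fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
X <- model.matrix(fit)[, -1, drop = FALSE]          # drop the constant

R_columns <- cor(X)                                  # correlations among regressors
R_coefs <- cov2cor(vcov(fit)[-1, -1, drop = FALSE])  # correlations among coefficients

from_columns <- sapply(seq_len(ncol(X)), function(j) gvif_one(R_columns, j))
from_coefs <- sapply(seq_len(ncol(X)), function(j) gvif_one(R_coefs, j))
names(from_columns) <- names(from_coefs) <- colnames(X)

print(from_columns)
print(from_coefs)

# The coefficient-based calculation carries over to a GLM unchanged.
glm_fit <- glm(carb ~ wt + hp, family = poisson, data = mtcars)
R_glm <- cov2cor(vcov(glm_fit)[-1, -1, drop = FALSE])
from_glm <- sapply(1:2, function(j) gvif_one(R_glm, j))
names(from_glm) <- c("wt", "hp")
print(from_glm)
```

For the linear model the two vectors should agree up to rounding error. For the GLM, the coefficient covariance matrix reflects the fitted working weights, which is why the column correlations alone would not be appropriate.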

Two methods of computing GVIFs are provided for unweighted linear models:

Setting type="terms" (the default) behaves like the default method, and computes the GVIF for each term in the model, ignoring relations of marginality among the terms in models with interactions. GVIFs computed in this manner are generally not sensible for models with interactions.
Setting type="predictor" focuses in turn on each predictor in the model, combining the main effect for that predictor with the main effects of the predictors with which the focal predictor interacts and with the interactions themselves. For example, in the model with formula y ~ a*b + b*c, the GVIF for the predictor a also includes the b main effect and the a:b interaction regressors; the GVIF for the predictor c includes the b main effect and the b:c interaction; and the GVIF for the predictor b includes the a and c main effects and the a:b and b:c interactions (i.e., the whole model), and is thus necessarily 1. These predictor GVIFs should be regarded as experimental.
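The grouping of terms by predictor can be illustrated with the "factors" attribute of a terms object. The following base-R sketch only lists which terms would be combined for each predictor in y ~ a*b + b*c, following the description above. It computes no GVIFs, assumes every interacting predictor also has a main-effect term, and is not the code used by vif(); the names fac and groups are hypothetical.

```r
f <- y ~ a*b + b*c
tt <- terms(f)

predictors <- all.vars(f)[-1]    # "a", "b", "c" (the response is dropped)

# Rows are predictors, columns are terms; a nonzero entry means the predictor appears in the term.
fac <- attr(tt, "factors")[predictors, , drop = FALSE]

groups <- lapply(predictors, function(p) {
  # Terms that involve the focal predictor: its main effect and its interactions.
  terms_with_p <- colnames(fac)[fac[p, ] > 0]
  # Predictors that appear alongside the focal predictor in those terms.
  in_terms <- rowSums(fac[, terms_with_p, drop = FALSE] > 0) > 0
  partners <- setdiff(rownames(fac)[in_terms], p)
  # Keep the model's own term order in the result.
  colnames(fac)[colnames(fac) %in% union(terms_with_p, partners)]
})
names(groups) <- predictors

print(groups)
```

For this formula the group for b contains every term in the model, which is why its predictor GVIF is necessarily 1.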

Specific methods are provided for ordinal regression model objects produced by polr in the MASS package and svyolr in the survey package, which are "intercept-less"; VIFs or GVIFs for linear and similar regression models without intercepts are generally not sensible.

Value

A vector of VIFs, or a matrix containing one row for each term, and columns for the GVIF, df, and \(GVIF^{1/(2\times df)}\), the last of which is intended to be comparable across terms of different dimension.

References

Fox, J. and Monette, G. (1992) Generalized collinearity diagnostics. JASA, 87, 178–183.

Fox, J. (2016) Applied Regression Analysis and Generalized Linear Models, Third Edition. Sage.

Fox, J. and Weisberg, S. (2018) An R Companion to Applied Regression, Third Edition. Sage.

Author

John Fox <jfox@mcmaster.ca> and Henric Nilsson

Examples

library(car) # provides vif() and, via carData, the Duncan data set
vif(lm(prestige ~ income + education, data=Duncan))
#>    income education 
#>    2.1049    2.1049 
vif(lm(prestige ~ income + education + type, data=Duncan))
#>               GVIF Df GVIF^(1/(2*Df))
#> income    2.209178  1        1.486330
#> education 5.297584  1        2.301648
#> type      5.098592  2        1.502666
vif(lm(prestige ~ (income + education)*type, data=Duncan),
    type="terms") # not recommended
#> there are higher-order terms (interactions) in this model
#> consider setting type = 'predictor'; see ?vif
#>                       GVIF Df GVIF^(1/(2*Df))
#> income            4.824438  1        2.196460
#> education        32.778424  1        5.725244
#> type            480.931237  2        4.682963
#> income:type     176.244403  2        3.643584
#> education:type 1233.746316  2        5.926612
vif(lm(prestige ~ (income + education)*type, data=Duncan),
    type="predictor")
#> GVIFs computed for predictors
#>               GVIF Df GVIF^(1/(2*Df))    Interacts With Other Predictors
#> income    375.2745  5        1.808985              type        education
#> education 119.2534  5        1.613047              type           income
#> type        1.0000  8        1.000000 income, education             --