Print matrix condition numbers column-by-column
This function prints the condition number of a matrix while adding
columns one-by-one. This is useful for testing multicollinearity and
other numerical problems. It is a generic function with a default
method, and a method for maxLik objects.
Usage
condiNumber(x, ...)
# Default S3 method
condiNumber(x, exact = FALSE, norm = FALSE,
printLevel=print.level, print.level=1, digits = getOption( "digits" ), ... )
# S3 method for class 'maxLik'
condiNumber(x, ...)
Arguments
- x
numeric matrix, condition numbers of which are to be printed
- exact
logical, should condition numbers be exact or approximations (see
kappa)
- norm
logical, whether the columns should be normalised to have unit norm
- printLevel
numeric, a positive value prints the condition numbers during the calculations. Useful for interactive work.
- print.level
same as ‘printLevel’, for backward compatibility
- digits
minimal number of significant digits to print (only relevant if argument
print.level is larger than zero).
- ...
Further arguments to condiNumber.default are currently ignored; further arguments to condiNumber.maxLik are passed to condiNumber.default.
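The effect of normalising columns can be illustrated with base R alone. The sketch below is not the package's code: unit_norm_cols() is a hypothetical helper, and it assumes that "unit norm" means the Euclidean norm of each column. It shows that a large condition number can come from column scaling alone and largely disappears once the columns are rescaled.

```r
# Sketch only: unit_norm_cols() is a hypothetical helper, not part of maxLik.
# Assumption: "unit norm" means the Euclidean (L2) norm of each column.
unit_norm_cols <- function(x) {
  x <- as.matrix(x)
  # divide every column by the square root of its sum of squares
  sweep(x, 2, sqrt(colSums(x^2)), "/")
}

set.seed(1)
# two unrelated columns on very different scales
X <- cbind(a = runif(100), b = 1e6 * runif(100))

kappa_raw  <- kappa(X, exact = TRUE)                  # inflated by scaling
kappa_norm <- kappa(unit_norm_cols(X), exact = TRUE)  # scaling removed

print(c(raw = kappa_raw, normalised = kappa_norm))
```

If the condition number stays high after rescaling, the problem is more likely near-collinearity than badly scaled variables.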
Details
Statistical models often fail because of a high correlation between the explanatory variables in the linear index (multicollinearity) or because the objective function of a non-linear model is virtually flat around the estimated maximum. In both cases, inspecting the (near) singularity of the related matrices may help to understand the problem.
condiNumber inspects the matrices column-by-column and
indicates which variables lead to a jump in the condition
number (cause singularity).
If the matrix column name does not immediately indicate the
problem, one may run an OLS model by estimating this column
using all the previous columns as explanatory variables. Those
columns that explain almost all the variation in the current one will
have very high \(t\)-values.
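The column-by-column inspection described above can be sketched with base R's kappa(). This is an illustration under assumptions, not the package's implementation: cond_by_column() is a hypothetical name, and it assumes that the condition number for column i is computed on the sub-matrix made of columns 1 to i.

```r
# Sketch only: cond_by_column() is a hypothetical helper, not part of maxLik.
# Assumption: the i-th value is the condition number of columns 1..i.
cond_by_column <- function(x, exact = TRUE) {
  x <- as.matrix(x)
  cn <- vapply(
    seq_len(ncol(x)),
    function(i) kappa(x[, seq_len(i), drop = FALSE], exact = exact),
    numeric(1)
  )
  names(cn) <- colnames(x)
  cn
}

set.seed(1)
x1 <- runif(50)
x2 <- runif(50)
x3 <- x1 + x2 + 1e-6 * runif(50)  # virtually equal to x1 + x2
X <- cbind(x1 = x1, x2 = x2, x3 = x3)

cn <- cond_by_column(X)
print(cn)

# the column where the condition number grows the most
# relative to the previous column
jump <- names(cn)[which.max(diff(log(cn))) + 1]
print(jump)
```

Taking differences of the logged values picks out the largest relative increase, so the flagged column is the one that makes the matrix nearly singular when it is added.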
Value
Invisible vector of condition numbers by column. If the start values
for maxLik are named, the condition numbers are named
accordingly.
Examples
set.seed(0)
## generate a simple nearly multicollinear dataset
x1 <- runif(100)
x2 <- runif(100)
x3 <- x1 + x2 + 0.000001*runif(100) # this is virtually equal to x1 + x2
x4 <- runif(100)
y <- x1 + x2 + x3 + x4 + rnorm(100)
m <- lm(y ~ -1 + x1 + x2 + x3 + x4)
print(summary(m)) # note the outlandish estimates and standard errors
#>
#> Call:
#> lm(formula = y ~ -1 + x1 + x2 + x3 + x4)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -3.01496 -0.70762 -0.02821 0.60782 2.39831
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> x1 -1.374e+05 3.762e+05 -0.365 0.716
#> x2 -1.374e+05 3.762e+05 -0.365 0.716
#> x3 1.374e+05 3.762e+05 0.365 0.716
#> x4 4.862e-01 3.204e-01 1.518 0.132
#>
#> Residual standard error: 1.044 on 96 degrees of freedom
#> Multiple R-squared: 0.8808, Adjusted R-squared: 0.8759
#> F-statistic: 177.4 on 4 and 96 DF, p-value: < 2.2e-16
#>
# while R^2 is 0.88. This suggests multicollinearity
condiNumber(model.matrix(m)) # note the value 'explodes' at x3
#> x1 1
#> x2 3.413135
#> x3 14095268
#> x4 11680350
## we may test the results further:
print(summary(lm(x3 ~ -1 + x1 + x2)))
#>
#> Call:
#> lm(formula = x3 ~ -1 + x1 + x2)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -5.579e-07 -1.886e-07 -7.440e-09 2.539e-07 6.849e-07
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> x1 1.000e+00 8.418e-08 11879172 <2e-16 ***
#> x2 1.000e+00 8.480e-08 11792743 <2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 3.014e-07 on 98 degrees of freedom
#> Multiple R-squared: 1, Adjusted R-squared: 1
#> F-statistic: 6.722e+14 on 2 and 98 DF, p-value: < 2.2e-16
#>
# Note the extremely high t-values and R^2: x3 is (almost) completely
# explained by x1 and x2