Logistic Model Fitter
Usage
lrm.fit(
x,
y,
offset = 0,
initial,
opt_method = c("NR", "nlminb", "LM", "glm.fit", "nlm", "BFGS", "L-BFGS-B", "CG",
"Nelder-Mead"),
maxit = 50,
reltol = 1e-10,
abstol = if (opt_method %in% c("NR", "LM")) 1e+10 else 0,
gradtol = if (opt_method %in% c("NR", "LM")) 0.001 else 1e-05,
factr = 1e+07,
eps = 5e-04,
minstepsize = 0.01,
trace = 0,
tol = .Machine$double.eps,
penalty.matrix = NULL,
weights = NULL,
normwt = FALSE,
transx = FALSE,
compstats = TRUE,
inclpen = TRUE,
initglm = FALSE,
y.precision = 7
)
Arguments
- x
design matrix with no column for an intercept. If a vector, it is transformed to a one-column matrix.
- y
response vector: numeric, categorical, or character. For ordinal regression, the order of categories comes from factor levels, and if y is not a factor, from the numerical or alphabetic order of y values.
- offset
optional numeric vector containing an offset on the logit scale
- initial
vector of initial parameter estimates, beginning with the intercepts
- opt_method
optimization method, with possible values:
  - 'NR': the default, standard Newton-Raphson iteration using the gradient and Hessian, with step-halving. All three convergence criteria eps, gradtol, abstol must be satisfied. Relax some of these if you do not want them to be considered at all in judging convergence. The defaults for the various tolerances for NR result in convergence being mainly judged by eps in most uses. Tighten the non-eps parameters to give more weight to the other criteria.
  - 'LM': the Levenberg-Marquardt method, with the same convergence criteria as 'NR'
  - 'nlminb': a quasi-Newton method using stats::nlminb() which uses gradients and the Hessian. This is a fast and robust algorithm.
  - 'glm.fit': for binary y without penalization only
  - 'nlm': see stats::nlm(); not highly recommended
  - 'BFGS', 'L-BFGS-B', 'CG', 'Nelder-Mead': see stats::optim() for these 4 methods
- maxit
maximum number of iterations allowed, which means different things for different opt_method. For NR it is the number of updates to parameters not counting step-halving steps. When maxit=1, initial is assumed to contain the maximum likelihood estimates already, and those are returned as coefficients, along with u, info.matrix (negative Hessian) and deviance. stats are only computed if compstats is explicitly set to TRUE by the user.
- reltol
used by
BFGS, nlminb, and glm.fit to specify the convergence criterion in relative terms with regard to -2 LL, i.e., convergence is assumed when one minus the fold-change falls below reltol
- abstol
used by
NR (maximum absolute change in parameter estimates from one iteration to the next before convergence can be declared; by default has no effect) and nlminb (by default has no effect; see the abs.tol argument; set to e.g. 0.001 for nlminb when there is complete separation)
- gradtol
used by
NR and LM (maximum absolute gradient before convergence can be declared) and nlm (similar but for a scaled gradient). For NR and LM, gradtol is multiplied by the sample size / 1000, because the gradient is proportional to sample size.
- factr
see
the stats::optim() documentation for L-BFGS-B
- eps
difference in -2 log likelihood for declaring convergence with
opt_method='NR'. At present, the old lrm.fit approach of still declaring convergence even if the -2 LL gets worse by eps/10 while the maximum absolute gradient is below 1e-9 is not implemented. This handles the case where the initial estimates are actually MLEs, and prevents endless step-halving.
- minstepsize
used with
opt_method='NR' to specify when to abandon step-halving
- trace
set to a positive integer to trace the iterative process. Some optimization methods distinguish
trace=1 from trace higher than 1.
- tol
QR singularity criterion for
opt_method='NR' updates; ignored when inverting the final information matrix because chol is used for that.
- penalty.matrix
a self-contained ready-to-use penalty matrix - see
lrm(). It is \(p \times p\) where \(p\) is the number of columns of x.
- weights
a vector (same length as
y) of possibly fractional case weights
- normwt
set to
TRUE to scale weights so they sum to \(n\), the length of y; useful for sample surveys as opposed to the default of frequency weighting
- transx
set to
TRUE to center x and QR-factor it to orthogonalize. See this for details.
- compstats
set to
FALSE to prevent the calculation of the vector of model statistics
- inclpen
set to
FALSE to not include the penalty matrix in the Hessian when the Hessian is being computed on transformed x, vs. adding the penalty after back-transforming. This should not matter.
- initglm
set to
TRUE to compute starting values for an ordinal model by using glm.fit to fit a binary logistic model for predicting the probability that y exceeds or equals the median of y. After fitting the binary model, the usual starting estimates for intercepts (log odds of cumulative raw proportions) are all adjusted so that the intercept corresponding to the median is the one from glm.fit.
- y.precision
when
y is numeric, values may need to be rounded to avoid unpredictable behavior with unique() with floating-point numbers. Default is to round floating-point y to 7 decimal places.
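As an illustration of several of these arguments used together, here is a minimal sketch (assuming the rms package, which provides lrm.fit, is installed; the data are simulated and the variable names are made up):

```r
require(rms)
set.seed(1)
n <- 200
x <- cbind(age = rnorm(n, 50, 10), male = rbinom(n, 1, 0.5))
y <- rbinom(n, 1, plogis(-1 + 0.03 * (x[, 'age'] - 50)))

# Default Newton-Raphson fit with step-halving
f <- lrm.fit(x, y)

# Fractional case weights rescaled to sum to n, quasi-Newton optimizer
w <- runif(n, 0.5, 1.5)
g <- lrm.fit(x, y, weights = w, normwt = TRUE, opt_method = 'nlminb')
```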
Value
a list with the following elements:
- call: the R call to lrm.fit
- freq: vector of y frequencies
- ymedian: median of original y values if y is numeric, otherwise the median of the integer-recoded version of y
- yunique: vector of distinct original y values, subject to rounding
- sumty: vector of weighted y frequencies
- stats: vector with a large number of indexes and model parameters (NULL if compstats=FALSE):
  - Obs: number of observations
  - Max Deriv: maximum absolute gradient
  - Model L.R.: overall model LR chi-square statistic
  - d.f.: degrees of freedom (number of non-intercepts)
  - P: p-value for the overall Model L.R. and d.f.
  - C: concordance probability between predicted probability and y
  - Dxy: Somers' Dxy rank correlation between predicted probability and y, = 2(C - 0.5)
  - Gamma:
  - Tau-a:
  - R2: documented here; the first element, with the plain 'R2' name, is Nagelkerke's \(R^2\)
  - Brier: Brier score. For ordinal models this is computed with respect to the median intercept.
  - g: g-index (Gini's mean difference of linear predictors)
  - gr: g-index on the odds ratio scale
  - gp: g-index on the probability scale
- fail: TRUE if any matrix inversion or failure to converge occurred, FALSE otherwise
- coefficients:
- info.matrix: normally a list of 3 elements a, b, ab, with a being a \(k \times 2\) matrix for \(k\) intercepts, b being \(p \times p\) for \(p\) predictors, and ab being \(k \times p\). See infoMxop() for easy ways of operating on these 3 elements. When info.matrix is not a 3-element list, as when transx=TRUE, it will have an intercepts attribute defining the number of intercepts in the model.
- u: gradient vector
- iter: number of iterations required. For some optimization methods this is a vector.
- deviance: vector of deviances: intercepts-only, intercepts + offset (if offset is present), final model (if x is used)
- non.slopes: number of intercepts in the model
- linear.predictors: vector of linear predictors at the median intercept
- penalty.matrix: penalty matrix or NULL
- weights: weights or NULL
- xbar: vector of column means of x, or NULL if transx=FALSE
- xtrans: input value of transx
- R: R matrix from QR, to be used to rotate parameters back to the original scale in the future
- Ri: inverse of R
- opt_method: input value
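A sketch of how the returned components might be used (assuming rms is installed; the invert argument to infoMxop() is an assumption based on that function's documented purpose of operating on the 3-element information matrix):

```r
require(rms)
set.seed(1)
x <- cbind(x1 = rnorm(100), x2 = rnorm(100))
y <- rbinom(100, 1, 0.5)
f <- lrm.fit(x, y)

f$coefficients           # intercept(s) followed by slopes
f$deviance               # intercepts-only and final-model -2 LL
f$stats['Model L.R.']    # overall likelihood ratio chi-square
# Operate on the (possibly 3-element sparse) information matrix,
# e.g. to obtain the variance-covariance matrix
v <- infoMxop(f$info.matrix, invert = TRUE)
```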
Details
Fits a binary or proportional odds ordinal logistic model for a given design matrix and response vector with no missing values in either. Ordinary or quadratic penalized maximum likelihood estimation is used.
lrm.fit implements a large number of optimization algorithms, with the default being Newton-Raphson with step-halving. For binary logistic regression without penalization, the iteratively reweighted least squares method in stats::glm.fit() is an option. The -2 log likelihood, gradient, and Hessian (negative information) matrix are computed in Fortran for speed. Optionally, the x matrix is mean-centered and QR-factored to help in optimization when there are strong collinearities. Parameter estimates and the covariance matrix are adjusted to the original x scale after fitting. More detail and comparisons of the various optimization methods may be found here. For ordinal regression with a large number of intercepts (distinct y values less one) you may want to use opt_method='BFGS', which does away with the need to compute the Hessian. This will be helpful if statistical tests and confidence intervals are not being computed, or when only likelihood ratio tests are done.
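For example, a continuous y yields many intercepts, and one might fit it as follows (a sketch assuming rms is installed):

```r
require(rms)
set.seed(2)
n <- 500
x <- cbind(x1 = rnorm(n))
y <- round(x[, 1] + rnorm(n), 1)  # many distinct y values -> many intercepts
# BFGS avoids computing the large Hessian; also skip model statistics
f <- lrm.fit(x, y, opt_method = 'BFGS', compstats = FALSE)
```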
When using Newton-Raphson or Levenberg-Marquardt optimization, sparse Hessian/information/variance-covariance matrices are used throughout. For nlminb the Hessian has to be expanded into full non-sparse form, so nlminb will not be very efficient for a large number of intercepts.
When there is complete separation (Hauck-Donner condition), i.e., the MLE of a coefficient is \(\pm\infty\), and y is binary and there is no penalty, glm.fit may not converge because it does not have a convergence parameter for the deviance. Setting trace=1 will reveal that the -2 LL is approaching zero but doesn't get there, relatively speaking. In such cases the default of NR with eps=5e-4, or nlminb with abstol=0.001, works well.
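The following sketch constructs complete separation in a toy dataset to show the recommended settings (assuming rms is installed; expect warnings or very large coefficients, since the MLE is infinite):

```r
require(rms)
x <- cbind(z = c(0, 0, 0, 0, 1, 1, 1, 1))
y <-           c(0, 0, 0, 0, 1, 1, 1, 1)   # z perfectly separates y
# Default NR declares convergence via the eps criterion on -2 LL
f <- lrm.fit(x, y, opt_method = 'NR', eps = 5e-4)
# nlminb with an absolute tolerance on -2 LL
g <- lrm.fit(x, y, opt_method = 'nlminb', abstol = 0.001)
```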
Author
Frank Harrell fh@fharrell.com
Examples
if (FALSE) { # \dontrun{
# Fit an additive logistic model containing numeric predictors age,
# blood.pressure, and sex, assumed to be already properly coded and
# transformed
fit <- lrm.fit(cbind(age, blood.pressure, sex == 'male'), death)
} # }