Logistic Model Fitter
Usage
lrm.fit(
x,
y,
offset = 0,
initial,
opt_method = c("NR", "nlminb", "LM", "glm.fit", "nlm", "BFGS", "L-BFGS-B", "CG",
"Nelder-Mead"),
maxit = 50,
reltol = 1e-10,
abstol = if (opt_method %in% c("NR", "LM")) 1e+10 else 0,
gradtol = if (opt_method %in% c("NR", "LM")) 0.001 else 1e-05,
factr = 1e+07,
eps = 5e-04,
minstepsize = 0.01,
trace = 0,
tol = .Machine$double.eps,
penalty.matrix = NULL,
weights = NULL,
normwt = FALSE,
transx = FALSE,
compstats = TRUE,
inclpen = TRUE,
initglm = FALSE,
y.precision = 7
)
Arguments
- x
design matrix with no column for an intercept. If a vector, it is transformed to a one-column matrix.
- y
response vector: numeric, categorical, or character. For ordinal regression, the order of categories comes from factor levels, and if y is not a factor, from the numerical or alphabetic order of y values.
- offset
optional numeric vector containing an offset on the logit scale
- initial
vector of initial parameter estimates, beginning with the intercepts
- opt_method
optimization method, with possible values:
  - 'NR': the default, standard Newton-Raphson iteration using the gradient and Hessian, with step-halving. All three convergence criteria eps, gradtol, abstol must be satisfied. Relax some of these if you do not want them to be considered at all in judging convergence. The defaults for the various tolerances for NR result in convergence being mainly judged by eps in most uses. Tighten the non-eps parameters to give more weight to the other criteria.
  - 'LM': the Levenberg-Marquardt method, with the same convergence criteria as 'NR'
  - 'nlminb': a quasi-Newton method using stats::nlminb() which uses gradients and the Hessian. This is a fast and robust algorithm.
  - 'glm.fit': for binary y without penalization only
  - 'nlm': see stats::nlm(); not highly recommended
  - 'BFGS', 'L-BFGS-B', 'CG', 'Nelder-Mead': see stats::optim() for these 4 methods
- maxit
maximum number of iterations allowed, which means different things for different opt_method. For NR it is the number of updates to parameters not counting step-halving steps. When maxit=1, initial is assumed to contain the maximum likelihood estimates already, and those are returned as coefficients, along with u, info.matrix (negative Hessian) and deviance. stats are only computed if compstats is explicitly set to TRUE by the user.
- reltol
used by
BFGS, nlminb, and glm.fit to specify the convergence criterion in relative terms with regard to -2 LL, i.e., convergence is assumed when one minus the fold-change falls below reltol
- abstol
used by
NR (maximum absolute change in parameter estimates from one iteration to the next before convergence can be declared; by default has no effect) and nlminb (by default has no effect; see the abs.tol argument; set to e.g. 0.001 for nlminb when there is complete separation)
- gradtol
used by
NR and LM (maximum absolute gradient before convergence can be declared) and nlm (similar but for a scaled gradient). For NR and LM, gradtol is multiplied by the sample size / 1000, because the gradient is proportional to sample size.
- factr
see
the stats::optim() documentation for L-BFGS-B
- eps
difference in -2 log likelihood for declaring convergence with
opt_method='NR'. At present, the old lrm.fit approach of still declaring convergence even if the -2 LL gets worse by eps/10 while the maximum absolute gradient is below 1e-9 is not implemented. This handles the case where the initial estimates are actually MLEs, and prevents endless step-halving.
- minstepsize
used with
opt_method='NR' to specify when to abandon step-halving
- trace
set to a positive integer to trace the iterative process. Some optimization methods distinguish
trace=1 from trace higher than 1.
- tol
QR singularity criterion for
opt_method='NR' updates; ignored when inverting the final information matrix because chol is used for that.
- penalty.matrix
a self-contained ready-to-use penalty matrix - see
lrm(). It is \(p \times p\) where \(p\) is the number of columns of x.
- weights
a vector (same length as
y) of possibly fractional case weights
- normwt
set to
TRUE to scale weights so they sum to \(n\), the length of y; useful for sample surveys as opposed to the default of frequency weighting
- transx
set to
TRUE to center x and QR-factor it to orthogonalize. See this for details.
- compstats
set to
FALSE to prevent the calculation of the vector of model statistics
- inclpen
set to
FALSE to not include the penalty matrix in the Hessian when the Hessian is being computed on transformed x, vs. adding the penalty after back-transforming. This should not matter.
- initglm
set to
TRUE to compute starting values for an ordinal model by using glm.fit to fit a binary logistic model for predicting the probability that y exceeds or equals the median of y. After fitting the binary model, the usual starting estimates for intercepts (log odds of cumulative raw proportions) are all adjusted so that the intercept corresponding to the median is the one from glm.fit.
- y.precision
when
y is numeric, values may need to be rounded to avoid unpredictable behavior with unique() with floating-point numbers. Default is to round floating-point y to 7 decimal places.
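As an illustration of several of these arguments used together, here is a minimal sketch (assuming the rms package, which provides lrm.fit, is installed; the data are simulated and the variable names are made up):

```r
require(rms)
set.seed(1)
n <- 200
x <- cbind(age = rnorm(n, 50, 10), male = rbinom(n, 1, 0.5))
y <- rbinom(n, 1, plogis(-1 + 0.03 * (x[, 'age'] - 50)))

# Default Newton-Raphson fit with step-halving
f <- lrm.fit(x, y)

# Fractional case weights rescaled to sum to n, quasi-Newton optimizer
w <- runif(n, 0.5, 1.5)
g <- lrm.fit(x, y, weights = w, normwt = TRUE, opt_method = 'nlminb')
```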
Value
a list with the following elements:
- call: the R call to lrm.fit
- freq: vector of y frequencies
- ymedian: median of original y values if y is numeric, otherwise the median of the integer-recoded version of y
- yunique: vector of distinct original y values, subject to rounding
- sumty: vector of weighted y frequencies
- stats: vector with a large number of indexes and model parameters (NULL if compstats=FALSE):
  - Obs: number of observations
  - Max Deriv: maximum absolute gradient
  - Model L.R.: overall model LR chi-square statistic
  - d.f.: degrees of freedom (number of non-intercepts)
  - P: p-value for the overall Model L.R. and d.f.
  - C: concordance probability between predicted probability and y
  - Dxy: Somers' Dxy rank correlation between predicted probability and y, = 2(C - 0.5)
  - Gamma:
  - Tau-a:
  - R2: documented here; the first element, with the plain 'R2' name, is Nagelkerke's \(R^2\)
  - Brier: Brier score. For ordinal models this is computed with respect to the median intercept.
  - g: g-index (Gini's mean difference of linear predictors)
  - gr: g-index on the odds ratio scale
  - gp: g-index on the probability scale
- fail: TRUE if any matrix inversion or failure to converge occurred, FALSE otherwise
- coefficients:
- info.matrix: normally a list of 3 elements a, b, ab, with a being a \(k \times 2\) matrix for \(k\) intercepts, b being \(p \times p\) for \(p\) predictors, and ab being \(k \times p\). See infoMxop() for easy ways of operating on these 3 elements. When info.matrix is not a 3-element list, as when transx=TRUE, it will have an intercepts attribute defining the number of intercepts in the model.
- u: gradient vector
- iter: number of iterations required. For some optimization methods this is a vector.
- deviance: vector of deviances: intercepts-only, intercepts + offset (if offset is present), final model (if x is used)
- non.slopes: number of intercepts in the model
- linear.predictors: vector of linear predictors at the median intercept
- penalty.matrix: penalty matrix or NULL
- weights: weights or NULL
- xbar: vector of column means of x, or NULL if transx=FALSE
- xtrans: input value of transx
- R: R matrix from QR, to be used to rotate parameters back to the original scale in the future
- Ri: inverse of R
- opt_method: input value
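A sketch of how the returned components might be used (assuming rms is installed; the invert argument to infoMxop() is an assumption based on that function's documented purpose of operating on the 3-element information matrix):

```r
require(rms)
set.seed(1)
x <- cbind(x1 = rnorm(100), x2 = rnorm(100))
y <- rbinom(100, 1, 0.5)
f <- lrm.fit(x, y)

f$coefficients           # intercept(s) followed by slopes
f$deviance               # intercepts-only and final-model -2 LL
f$stats['Model L.R.']    # overall likelihood ratio chi-square
# Operate on the (possibly 3-element sparse) information matrix,
# e.g. to obtain the variance-covariance matrix
v <- infoMxop(f$info.matrix, invert = TRUE)
```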
Details
Fits a binary or proportional odds ordinal logistic model for a given design matrix and response vector with no missing values in either. Ordinary or quadratic penalized maximum likelihood estimation is used.
lrm.fit implements a large number of optimization algorithms, with the default being Newton-Raphson with step-halving. For binary logistic regression without penalization, the iteratively reweighted least squares method in stats::glm.fit() is an option. The -2 log likelihood, gradient, and Hessian (negative information) matrix are computed in Fortran for speed. Optionally, the x matrix is mean-centered and QR-factored to help in optimization when there are strong collinearities. Parameter estimates and the covariance matrix are adjusted to the original x scale after fitting. More detail and comparisons of the various optimization methods may be found here. For ordinal regression with a large number of intercepts (distinct y values less one) you may want to use opt_method='BFGS', which does away with the need to compute the Hessian. This will be helpful if statistical tests and confidence intervals are not being computed, or when only likelihood ratio tests are done.
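For example, a continuous y yields many intercepts, and one might fit it as follows (a sketch assuming rms is installed):

```r
require(rms)
set.seed(2)
n <- 500
x <- cbind(x1 = rnorm(n))
y <- round(x[, 1] + rnorm(n), 1)  # many distinct y values -> many intercepts
# BFGS avoids computing the large Hessian; also skip model statistics
f <- lrm.fit(x, y, opt_method = 'BFGS', compstats = FALSE)
```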
When using Newton-Raphson or Levenberg-Marquardt optimization, sparse Hessian/information/variance-covariance matrices are used throughout. For nlminb the Hessian has to be expanded into full non-sparse form, so nlminb will not be very efficient for a large number of intercepts.
When there is complete separation (Hauck-Donner condition), i.e., the MLE of a coefficient is \(\pm\infty\), and y is binary and there is no penalty, glm.fit may not converge because it does not have a convergence parameter for the deviance. Setting trace=1 will reveal that the -2 LL is approaching zero but doesn't get there, relatively speaking. In such cases the default of NR with eps=5e-4, or nlminb with abstol=0.001, works well.
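The following sketch constructs complete separation in a toy dataset to show the recommended settings (assuming rms is installed; expect warnings or very large coefficients, since the MLE is infinite):

```r
require(rms)
x <- cbind(z = c(0, 0, 0, 0, 1, 1, 1, 1))
y <-           c(0, 0, 0, 0, 1, 1, 1, 1)   # z perfectly separates y
# Default NR declares convergence via the eps criterion on -2 LL
f <- lrm.fit(x, y, opt_method = 'NR', eps = 5e-4)
# nlminb with an absolute tolerance on -2 LL
g <- lrm.fit(x, y, opt_method = 'nlminb', abstol = 0.001)
```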
Author
Frank Harrell fh@fharrell.com
Examples
if (FALSE) { # \dontrun{
# Fit an additive logistic model containing numeric predictors age,
# blood.pressure, and sex, assumed to be already properly coded and
# transformed
fit <- lrm.fit(cbind(age, blood.pressure, sex == 'male'), death)
} # }