Check Parallelism Assumption of Ordinal Semiparametric Models

orm models are refitted as a series of binary models for a sequence of cutoffs on the dependent variable. The regression coefficients from this sequence are plotted against the cutoffs using ggplot2, with one panel per regression coefficient. When censoring is present, it is not always possible to determine whether Y is greater than or equal to the current cutoff; such observations are ignored.
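The idea of refitting at successive cutoffs can be illustrated with a minimal sketch in base R. This is not the code used by ordParallel: it assumes simulated data, a handful of hand-picked quantile cutoffs, and ordinary logistic regression via glm in place of orm. All object names (x, y, cuts, betas, res) are made up for the illustration.

```r
# Illustrative only: base R logistic fits stand in for the orm-based refits
set.seed(1)
n <- 400
x <- rnorm(n)
y <- x + rlogis(n)   # logistic errors: proportional odds holds, true slope is 1

# Assumed cutoffs: a few interior quantiles of y
cuts <- quantile(y, c(0.2, 0.35, 0.5, 0.65, 0.8), names = FALSE)

# One binary logistic model per cutoff, each modeling the event Y >= cutoff
betas <- sapply(cuts, function(ct) {
  fit <- glm(I(y >= ct) ~ x, family = binomial)
  unname(coef(fit)["x"])
})

# Coefficient of x at each cutoff; under parallelism these should be
# roughly constant apart from sampling variation
res <- data.frame(Ycut = cuts, beta_x = betas)
print(res)
```

Plotting beta_x against Ycut gives the kind of display the function produces for each coefficient; a clear trend across cutoffs would suggest non-parallelism.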

Usage

ordParallel(
  fit,
  which,
  terms = onlydata,
  m,
  maxcuts = 75,
  lp = FALSE,
  onlydata = FALSE,
  scale = c("iqr", "none"),
  conf.int = 0.95,
  alpha = 0.15
)

Arguments

fit: a fit object from orm with x=TRUE, y=TRUE in effect
which: specifies which columns of the design matrix are assessed. By default, all columns are analyzed.
terms: set to TRUE to collapse all components of each predictor into a single column weighted by the original regression coefficients but scaled according to scale. This means that each predictor will have a regression coefficient of 1.0 when refitting the original model on this transformed X matrix, before any further scaling. Plots will then show the relative effects over time, i.e., the slope of these combined columns over cuts on Y, so that deviations indicate non-parallelism. But since in this case only relative effects are shown, a weak predictor may be interpreted as having an exaggerated y-dependency if scale='none'. terms defaults to TRUE when onlydata=TRUE.
m: the lowest cutoff is chosen as the first Y value having at least m observations to its left, and the highest cutoff is chosen so that there are at least m observations to the right of it. Cutoffs are equally spaced between these values. If omitted, m is set to the minimum of 50 and one quarter of the sample size.
maxcuts: the maximum number of cutoffs analyzed
lp: plot the effect of the linear predictor across cutpoints instead of analyzing individual predictors
onlydata: set to TRUE to return a data frame suitable for modeling effects of cuts, instead of constructing a graph. The returned data frame has variables Ycut, Yge_cut, obs, and the original names of the predictors. Ycut has the cutpoint on the original scale. Yge_cut is TRUE/FALSE depending on whether the Y variable is greater than or equal to Ycut, with NA if censoring prevented this determination. The obs variable is useful for passing as the cluster argument to robcov() to account for the high correlations in regression coefficients across cuts. See the example, which computes Wald tests for parallelism where the Ycut dependence involves a spline function. Because terms defaults to TRUE when onlydata=TRUE, each predictor is reduced to a single degree of freedom in that example.
scale: applies to terms=TRUE; set to 'none' to leave the predictor terms scaled by regression coefficient so the coefficient of each term in the overall fit is 1.0. The default is to scale terms by the interquartile range (Gini's mean difference if the IQR is zero) of the term. This prevents changes in weak predictors over different cutoffs from appearing more impressive than they are.
conf.int: confidence level for computing Wald confidence intervals for regression coefficients. Set to 0 to suppress confidence bands.
alpha: saturation for confidence bands
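The cutoff rule described for m and maxcuts can be sketched in base R as follows. The helper choose_cuts is hypothetical and is not the package's internal code. It assumes that "to its left" and "to the right" mean strictly smaller and strictly larger observations, and that equal spacing is on the scale of Y itself; the actual implementation may differ on both points.

```r
# Hypothetical helper illustrating the documented rule for m and maxcuts
choose_cuts <- function(y, m = min(50, floor(length(y) / 4)), maxcuts = 75) {
  ys <- sort(unique(y))
  n_left  <- sapply(ys, function(v) sum(y < v))  # observations strictly below v
  n_right <- sapply(ys, function(v) sum(y > v))  # observations strictly above v
  ok_lo <- which(n_left >= m)
  ok_hi <- which(n_right >= m)
  stopifnot(length(ok_lo) > 0, length(ok_hi) > 0)
  lo <- ys[min(ok_lo)]   # first Y value with at least m observations to its left
  hi <- ys[max(ok_hi)]   # last Y value with at least m observations to its right
  stopifnot(lo <= hi)
  # Assumption: equal spacing on the Y scale, with maxcuts points in total
  seq(lo, hi, length.out = maxcuts)
}

# Example: 200 distinct values, default m = min(50, 200 / 4) = 50
cuts <- choose_cuts(1:200, maxcuts = 10)
print(cuts)
```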

Value

a ggplot2 object, or a data frame if onlydata=TRUE

Details

Whenever a cut gives rise to an extremely high standard error for a regression coefficient, the confidence limits are set to NA. Unreasonable standard errors are determined from the confidence interval width exceeding 7 times the standard error at the middle Y cut.
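One possible reading of this rule is sketched below in base R. The helper blank_wide_limits and its argument mult are hypothetical, and the sketch assumes that "the middle Y cut" is the cut at the middle index of the sequence and that the limits are ordinary Wald limits; the package may implement the check differently.

```r
# Hypothetical helper: set Wald limits to NA where the interval is implausibly wide
blank_wide_limits <- function(beta, se, conf.int = 0.95, mult = 7) {
  z <- qnorm((1 + conf.int) / 2)
  lower <- beta - z * se
  upper <- beta + z * se
  se_mid <- se[ceiling(length(se) / 2)]   # assumed: SE at the middle cut by index
  bad <- (upper - lower) > mult * se_mid  # interval width vs. 7 x middle SE
  lower[bad] <- NA
  upper[bad] <- NA
  data.frame(beta = beta, se = se, lower = lower, upper = upper)
}

# Five cuts for one coefficient; the last cut has an unstable estimate
out <- blank_wide_limits(beta = rep(1, 5), se = c(0.2, 0.2, 0.2, 0.2, 5))
print(out)
```

Under this reading, with 0.95 limits a cut is blanked when its standard error exceeds roughly 1.8 times the standard error at the middle cut, since the interval width is about 3.92 standard errors.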

Author

Frank Harrell

Examples

if (FALSE) { # \dontrun{
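library(rms)      # orm, datadist, robcov, contrast; also attaches Hmisc (getHdata, .q)
library(ggplot2)  # ggplot, geom_line, geom_ribbon used in the final plot
f <- orm(..., x=TRUE, y=TRUE)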
ordParallel(f, which=1:5)  # first 5 betas

getHdata(nhgh)
set.seed(1)
nhgh$ran <- runif(nrow(nhgh))
f <- orm(gh ~ rcs(age, 4) + ran, data=nhgh, x=TRUE, y=TRUE)
ordParallel(f)  # one panel per parameter (multiple parameters per predictor)
dd <- datadist(nhgh); options(datadist='dd')
ordParallel(f, terms=TRUE)
d <- ordParallel(f, maxcuts=30, onlydata=TRUE)
dd2 <- datadist(d); options(datadist='dd2')  # needed for plotting
g <- orm(Yge_cut ~ (age + ran) * rcs(Ycut, 4), data=d, x=TRUE, y=TRUE)
h <- robcov(g, d$obs)
anova(h)
qu <- quantile(d$age, c(1, 3)/4)
qu
cuts <- sort(unique(d$Ycut))
cuts
z <- contrast(h, list(age=qu[2], Ycut=cuts),
                 list(age=qu[1], Ycut=cuts))
z <- as.data.frame(z[.q(Ycut, Contrast, Lower, Upper)])
ggplot(z, aes(x=Ycut, y=Contrast)) + geom_line() +
  geom_ribbon(aes(ymin=Lower, ymax=Upper), alpha=0.2)
} # }