binregplot.RdCreates a display of observed and fitted values for a binary regression model with one numeric predictor, conditioned by zero or many co-factors.
binreg_plot(model, main = NULL, xlab = NULL, ylab = NULL,
xlim = NULL, ylim = NULL,
pred_var = NULL, pred_range = c("data", "xlim"),
group_vars = NULL, base_level = NULL, subset,
type = c("response", "link"), conf_level = 0.95, delta = FALSE,
pch = NULL, cex = 0.6, jitter_factor = 0.1,
lwd = 5, lty = 1, point_size = 0, col_lines = NULL, col_bands = NULL,
legend = TRUE, legend_pos = NULL, legend_inset = c(0, 0.1),
legend_vgap = unit(0.5, "lines"),
labels = FALSE, labels_pos = c("right", "left"),
labels_just = c("left","center"), labels_offset = c(0.01, 0),
gp_main = gpar(fontface = "bold", fontsize = 14),
gp_legend_frame = gpar(lwd = 1, col = "black"),
gp_legend_title = gpar(fontface = "bold"),
newpage = TRUE, pop = FALSE, return_grob = FALSE)
grid_abline(a, b, ...)a binary regression model fitted with glm.
user-specified main title.
x-axis label. Defaults to the name of the (first) numeric predictor.
y-axis label. Defaults to the name of the response - within either 'P(...)' or 'logit(...)', depending on the response type.
Range of the x-axis. Defaults to the range of the numeric predictor.
Range of the y-axis. Defaults to the unit interval on
probability scale or the fitted values range on the link scale,
depending on type.
character string of length 1 giving the name of the numeric predictor. Defaults to the first one found in the data set.
"data", "xlim", or a numeric vector.
If "data", the numeric predictor corresponds to the observed values. If
"xlim", 100 values are taken from the "xlim"
range. A numeric vector will be interpreted as the values to be predicted.
optional character string of conditioning
variables. Defaults to all factors found in the data set, response
excluded. If FALSE, no variables are used for conditioning.
vector of length one. If the response is a vector,
this specifies the base ('no effect') value of the
response variable
(e.g., "Placebo", 0, FALSE, etc.) and defaults
to the first level for
factor responses, or 0 for numeric/binary variables. This controls
which observations will be plotted on the top or the bottom of the
display. If the response is a matrix with success and failure
column, this specifies the one to be interpreted as failure
(default: 2), either as an integer, or as a
string ("success" or "failure"). The proportions of
successes will be plotted as observed values.
an optional vector specifying a subset of the data rows. The value is evaluated in the data environment, so expressions can be used to select the data (see examples).
either "response" or "link" to select the scale of the fitted values. The y-axis will be adapted accordingly.
confidence level used for calculating confidence bands.
logical; indicates whether the delta method should be employed for calculating the limits of the confidence band or not (see details).
character or numeric vector of symbols used for plotting the (possibly conditioned) observed values, recycled as needed.
size of the plot symbols (in lines).
argument passed to jitter
used for the points representing the observed values.
Line width for the fitted values.
Line type for the fitted values.
size of points for the fitted values in char units (default: 0, so no points are plotted).
character vector specifying the colors of the fitted
lines and confidence bands,
by default chosen with rainbow_hcl. The
confidence bands are using alpha blending with alpha = 0.2.
logical; if TRUE (default), a legend is drawn.
numeric vector of length 2, specifying x and y
coordinates of the legend, or a character string (e.g., "topleft",
"center" etc.). Defaults to "topleft" if the fitted curve's slope is
positive, and "topright" else.
numeric vector or length 2 specifying the inset from the legend's x and y coordinates in npc units.
vertical space between the legend's line entries.
logical; if TRUE, labels corresponding to the
factor levels are plotted next to the fitted lines.
either "right" or "left", determining on which side
of the fitted lines (start or end) the labels should be placed.
character vector of length 2, specifying the
relative justification of the labels to their coordinates. See the
documentation of the just parameter of
grid.text for more details.
numeric vector of length 2, specifying the offset of the labels' coordinates in npc units.
object of class "gpar" used for the main title.
object of class "gpar" used for the
legend frame.
object of class "gpar" used for the
legend title.
logical; if TRUE, the plot is drawn on a new page.
logical; if TRUE, all newly generated viewports are
popped after plotting.
logical. Should a snapshot of the display be returned as a grid grob?
intercept; alternatively, a regression model from which
coefficients can be extracted via coef.
slope.
Further arguments passed to grid.abline.
The primary purpose of binreg_plot() is to visualize observed and
fitted values for binary regression models (like the logistic or probit
regression model) with one numeric predictor. If one or more
categorical predictors are used in the model, the fitted values are
conditioned on them, i.e. separate curves are drawn corresponding to
the factor level combinations. Thus, it shows a full-model plot, not a
conditional plot where several models would be fit to data subsets.
The implementation relies on objects returned by
glm, as it uses its "terms" and
"model" components.
The function tries to determine suitable values for the legend and/or labels, but depending on the data, this might require some tweaking.
By default, the limits of the confidence band are determined for the
linear predictor (i.e., on the link scale) and transformed to response
scale (if this is the chosen plot type) using the inverse link
function. If delta is TRUE, the limits
are determined on the response scale. Note that the resulting band using the
delta method is symmetric around the fitted mean,
but may exceed the unit interval (on the response scale) and
will be cut off.
grid_abline() is a simple convenience wrapper for
grid.abline with similar behavior than
abline in that it extracts coefficients from
a regression model, if given instead of the intercept a.
if return_grob is TRUE, a grob object corresponding to
the plot. NULL (invisibly) else.
Michael Friendly (2000), Visualizing Categorical Data. SAS Institute, Cary, NC.
## Simple model with no conditioning variables
art.mod0 <- glm(Improved > "None" ~ Age, data = Arthritis, family = binomial)
binreg_plot(art.mod0, "Arthritis Data")
binreg_plot(art.mod0, type = "link") ## logit scale
## one conditioning factor
art.mod1 <- update(art.mod0, . ~ . + Sex)
binreg_plot(art.mod1)
binreg_plot(art.mod1, legend = FALSE, labels = TRUE, xlim = c(20, 80))
## two conditioning factors
art.mod2 <- update(art.mod1, . ~ . + Treatment)
binreg_plot(art.mod2)
binreg_plot(art.mod2, subset = Sex == "Male") ## subsetting
## some tweaking
binreg_plot(art.mod2, gp_legend_frame = gpar(col = NA, fill = "white"), col_bands = NA)
binreg_plot(art.mod2, legend = FALSE, labels = TRUE,
labels_pos = "left", labels_just = c("left", "top"))
## model with grouped response data
shuttle.mod <- glm(cbind(nFailures, 6 - nFailures) ~ Temperature,
data = SpaceShuttle, na.action = na.exclude, family = binomial)
binreg_plot(shuttle.mod, xlim = c(30, 81), pred_range = "xlim",
ylab = "O-Ring Failure Probability", xlab = "Temperature (F)")