Regression for a Parametric Survival Model

Fit a parametric survival regression model. These are location-scale models for an arbitrary transform of the time variable; the most common cases use a log transformation, leading to accelerated failure time models.

Usage

survreg(formula, data, weights, subset,
        na.action, dist="weibull", init=NULL, scale=0,
        control, parms=NULL, model=FALSE, x=FALSE,
        y=TRUE, robust=FALSE, cluster, score=FALSE, ...)

Arguments

formula: a formula expression as for other regression models. The response is usually a survival object as returned by the Surv function. See the documentation for Surv, lm and formula for details.
data: a data frame in which to interpret the variables named in the formula, weights, and subset arguments.
weights: optional vector of case weights
subset: subset of the observations to be used in the fit
na.action: a missing-data filter function, applied to the model frame after any subset argument has been used. Default is options()$na.action.
dist: the assumed distribution for the y variable. If the argument is a character string, it is assumed to name an element of survreg.distributions; these include "weibull", "exponential", "gaussian", "logistic", "lognormal" and "loglogistic". Otherwise, it is assumed to be a user-defined list conforming to the format described in survreg.distributions.
parms: a list of fixed parameters. For the t-distribution, for instance, this is the degrees of freedom; most of the distributions have no parameters.
init: optional vector of initial values for the parameters.
scale: optional fixed value for the scale. If set to a value <= 0, the scale is estimated.
control: a list of control values, in the format produced by survreg.control. The default value is survreg.control().
model,x,y: flags to control what is returned. If any of these is true, then the model frame, the model matrix, and/or the vector of response times will be returned as components of the final result, with the same names as the flag arguments.
score: return the score vector. (This is expected to be zero upon successful convergence.)
robust: use the robust sandwich estimate of the variance instead of the asymptotic formula. Defaults to TRUE if there is a cluster argument.
cluster: Optional variable that identifies groups of subjects, used in computing the robust variance. Like model variables, this is searched for in the dataset pointed to by the data argument.
...: other arguments which will be passed to survreg.control.
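
A minimal sketch of how a few of these arguments fit together, assuming the lung data shipped with the survival package. The strings accepted by dist are the names of survreg.distributions, and requesting the robust variance (here with cluster = inst) should change only the variance matrix, not the coefficients. The object names lung2, fit_naive and fit_rob are illustrative only.

```r
library(survival)

# The character strings accepted by dist= are the element names of survreg.distributions
dist_names <- names(survreg.distributions)
print(dist_names)

# Drop the single row with a missing institution code so that both fits use identical data
lung2 <- subset(lung, !is.na(inst))

# Default (asymptotic, model-based) variance
fit_naive <- survreg(Surv(time, status) ~ age + sex, data = lung2)

# Robust sandwich variance, with subjects grouped by institution;
# cluster is looked up in data=, like the model variables
fit_rob <- survreg(Surv(time, status) ~ age + sex, data = lung2,
                   robust = TRUE, cluster = inst)

# Same point estimates, different standard errors
print(cbind(naive  = sqrt(diag(vcov(fit_naive))),
            robust = sqrt(diag(vcov(fit_rob)))))
```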

Value

An object of class survreg is returned.
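
As an illustration, a few of the components and accessors commonly used on the returned object are sketched below, using the ovarian data from the Examples. This is not a complete list of components.

```r
library(survival)

fit <- survreg(Surv(futime, fustat) ~ ecog.ps + rx, data = ovarian)

print(class(fit))   # the object's class
print(coef(fit))    # regression coefficients (the location part of the model)
print(fit$scale)    # estimated scale (a single value, since there are no strata)
print(vcov(fit))    # variance matrix, with an extra row/column for Log(scale)
print(fit$loglik)   # log-likelihood: intercept-only model, then full model
```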

Details

All the distributions are cast into a location-scale framework, based on chapter 2.2 of Kalbfleisch and Prentice. The resulting parameterization of the distributions is sometimes (e.g. gaussian) identical to the usual form found in statistics textbooks, but other times (e.g. Weibull) it is not. See the book for detailed formulas.
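
As a sketch of that location-scale form for the log-transformed distributions (the symbols T, x, \beta, \sigma and W are introduced here for illustration), an event time T with covariate vector x is modeled as below, where \beta are the reported coefficients, \sigma is the reported scale, and W is a standardized error: standard normal for "lognormal", standard logistic for "loglogistic", and the standard minimum extreme value distribution for "weibull". Working through the Weibull case shows why its textbook shape is 1/\sigma and its scale is e^{x^\top\beta}; "exponential" is the special case \sigma = 1.

```latex
\log T = x^\top \beta + \sigma W, \qquad
P(W > w) = \exp\!\left(-e^{w}\right) \quad \text{(Weibull case)}

S(t \mid x) = P\!\left(W > \frac{\log t - x^\top \beta}{\sigma}\right)
            = \exp\!\left[-\left(\frac{t}{e^{x^\top \beta}}\right)^{1/\sigma}\right]
```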

When using weights, be aware of the difference between replication weights and sampling weights. In the former, a weight of '2' means that there are two identical observations, which have been combined into a single row of data. With sampling weights there is a single observed value, with a weight used to achieve balance with respect to some population. To get a proper variance, use the default variance with replication weights and the robust variance with sampling weights. Replication weights were once common (when computer memory was much smaller) but are now rare.
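
A small check of the replication-weight interpretation, assuming the ovarian data from the Examples: giving every row a weight of 2 should reproduce both the coefficients and the default variance of a fit on the data with every row duplicated. The data frame names ov_dup and ov_wt and the column w are hypothetical, chosen for this illustration.

```r
library(survival)

ov_dup <- rbind(ovarian, ovarian)       # every observation physically duplicated
ov_wt  <- transform(ovarian, w = 2)     # each observation carries a weight of 2

fit_dup <- survreg(Surv(futime, fustat) ~ ecog.ps + rx, data = ov_dup)
fit_wt  <- survreg(Surv(futime, fustat) ~ ecog.ps + rx, data = ov_wt,
                   weights = w)         # w is found in data=, like formula variables

# Replication weights: same estimates and same default (model-based) variance
print(all.equal(coef(fit_dup), coef(fit_wt)))
print(all.equal(vcov(fit_dup), vcov(fit_wt)))

# Had w been a sampling weight, the sandwich variance would be requested instead
fit_samp <- survreg(Surv(futime, fustat) ~ ecog.ps + rx, data = ov_wt,
                    weights = w, robust = TRUE)
```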

All of weights, subset and offset are evaluated in the same way as variables in formula, that is, first in data and then in the environment of formula. Note that values calculated inside the formula, such as mean(x), are evaluated before subsetting, which may lead to unexpected results if used with subset. For more information, see the Details section of model.frame.
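
The subsetting pitfall can be seen directly in the model frame kept with model=TRUE. In this sketch, which assumes the lung data, centering age inside the formula while using subset centers it at the mean over all rows, whereas subsetting the data first centers it at the subset mean. The names fit_in, fit_pre and lung_m are illustrative.

```r
library(survival)

# mean(age) is computed over all rows of lung, before subset = (sex == 1) is applied
fit_in <- survreg(Surv(time, status) ~ I(age - mean(age)), data = lung,
                  subset = (sex == 1), model = TRUE)
x_in <- fit_in$model[[2]]     # column 1 is the response, column 2 the centred age
print(mean(x_in))             # not zero: centred at the full-data mean

# Subsetting the data first makes mean(age) refer to the subset only
lung_m  <- subset(lung, sex == 1)
fit_pre <- survreg(Surv(time, status) ~ I(age - mean(age)), data = lung_m,
                   model = TRUE)
x_pre <- fit_pre$model[[2]]
print(mean(x_pre))            # zero up to rounding error
```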

References

Kalbfleisch, J. D. and Prentice, R. L., The Statistical Analysis of Failure Time Data, Wiley, 2002.

Examples

# Fit an exponential model: the two fits are the same
survreg(Surv(futime, fustat) ~ ecog.ps + rx, ovarian, dist='weibull',
                                    scale=1)
#> Call:
#> survreg(formula = Surv(futime, fustat) ~ ecog.ps + rx, data = ovarian, 
#>     dist = "weibull", scale = 1)
#> 
#> Coefficients:
#> (Intercept)     ecog.ps          rx 
#>   6.9618376  -0.4331347   0.5815027 
#> 
#> Scale fixed at 1 
#> 
#> Loglik(model)= -97.2   Loglik(intercept only)= -98
#> 	Chisq= 1.67 on 2 degrees of freedom, p= 0.434 
#> n= 26 
survreg(Surv(futime, fustat) ~ ecog.ps + rx, ovarian,
        dist="exponential")
#> Call:
#> survreg(formula = Surv(futime, fustat) ~ ecog.ps + rx, data = ovarian, 
#>     dist = "exponential")
#> 
#> Coefficients:
#> (Intercept)     ecog.ps          rx 
#>   6.9618376  -0.4331347   0.5815027 
#> 
#> Scale fixed at 1 
#> 
#> Loglik(model)= -97.2   Loglik(intercept only)= -98
#> 	Chisq= 1.67 on 2 degrees of freedom, p= 0.434 
#> n= 26 

#
# A model with different baseline survival shapes for two groups, i.e.,
#   two different scale parameters
survreg(Surv(time, status) ~ ph.ecog + age + strata(sex), lung)
#> Call:
#> survreg(formula = Surv(time, status) ~ ph.ecog + age + strata(sex), 
#>     data = lung)
#> 
#> Coefficients:
#> (Intercept)     ph.ecog         age 
#>  6.73234505 -0.32443043 -0.00580889 
#> 
#> Scale:
#>     sex=1     sex=2 
#> 0.7834211 0.6547830 
#> 
#> Loglik(model)= -1137.3   Loglik(intercept only)= -1146.2
#> 	Chisq= 17.8 on 2 degrees of freedom, p= 0.000137 
#> n=227 (1 observation deleted due to missingness)

# There are multiple ways to parameterize a Weibull distribution. The survreg
# function embeds it in a general location-scale family, which is a
# different parameterization from the one used by rweibull, and this often
# leads to confusion.
#   survreg's scale     = 1/(rweibull shape)
#   survreg's intercept = log(rweibull scale)
#   For the log-likelihood, all parameterizations lead to the same value.
y <- rweibull(1000, shape=2, scale=5)
survreg(Surv(y)~1, dist="weibull")
#> Call:
#> survreg(formula = Surv(y) ~ 1, dist = "weibull")
#> 
#> Coefficients:
#> (Intercept) 
#>    1.612813 
#> 
#> Scale= 0.4869062 
#> 
#> Loglik(model)= -2186   Loglik(intercept only)= -2186
#> n= 1000 
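
The conversion rules in the comment above can be applied to a fitted object as follows. This sketch regenerates the data with an arbitrarily chosen seed, so its numbers will not match the printed output above; it also checks that dweibull, evaluated at the converted parameters, gives the same log-likelihood that survreg reports.

```r
library(survival)

set.seed(42)                       # arbitrary seed, for reproducibility only
y <- rweibull(1000, shape = 2, scale = 5)
fit <- survreg(Surv(y) ~ 1, dist = "weibull")

# survreg scale = 1/(rweibull shape); survreg intercept = log(rweibull scale)
shape_hat <- 1 / fit$scale
scale_hat <- exp(coef(fit)[["(Intercept)"]])
print(c(shape = shape_hat, scale = scale_hat))   # should be near 2 and 5

# Log-likelihood of the fitted model is the last element of fit$loglik
ll_survreg  <- fit$loglik[length(fit$loglik)]
ll_dweibull <- sum(dweibull(y, shape = shape_hat, scale = scale_hat, log = TRUE))
print(c(survreg = ll_survreg, dweibull = ll_dweibull))
```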

# Economists fit a model called 'tobit regression', which is a standard
# linear regression with Gaussian errors and left-censored data.
tobinfit <- survreg(Surv(durable, durable>0, type='left') ~ age + quant,
              data=tobin, dist='gaussian')
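
Because tobit regression is linear regression with Gaussian errors plus censoring, a Gaussian survreg fit with no censored observations should reproduce ordinary least squares. The sketch below checks this on the mtcars data from base R (an arbitrary choice made here). The reported scale is the maximum likelihood estimate sqrt(RSS/n), not the residual standard error reported by lm.

```r
library(survival)

# Surv(mpg) with a single argument: every observation is an observed event
fit_sr <- survreg(Surv(mpg) ~ wt, data = mtcars, dist = "gaussian")
fit_lm <- lm(mpg ~ wt, data = mtcars)

print(cbind(survreg = coef(fit_sr), lm = coef(fit_lm)))

# MLE of the error standard deviation divides the RSS by n, not n - p
scale_mle <- sqrt(mean(residuals(fit_lm)^2))
print(c(survreg = fit_sr$scale, mle = scale_mle))
```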