Parametric Regression Models for Time-to-Event Data

Stata can estimate a number of parametric models. The people who wrote the estimation procedures distinguish two classes of models: proportional hazards models and accelerated failure time (AFT) models. This distinction is often, but not universally, made in the literature. In fact, two models can be expressed both as proportional hazards and as AFT models, namely the exponential and the Weibull model.


Basic elements of regression models

The models discussed here are requested by streg. Note that, just as in the case of graphing survivor functions with sts, information about time to failure and about censoring is provided via the stset command. Thus in the streg command these variables do not appear. You will start right away with indicating covariates and with options that define and specify your model.

Thus,

streg status gender i.group, distribution(e)

will estimate an exponential model with "status", "gender" and "group" (treated as a factor variable) as covariates. Option distribution(e) may be abbreviated as d(e) and stands for the exponential model.
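The two-step logic (first stset, then streg) can be sketched as follows. This is only an illustration: it assumes the example data set "cancer" that can be loaded with webuse, whose variables (studytime, died, age, drug) are not the ones used elsewhere in this guide.

```stata
* Load an example data set (an assumption made for illustration only)
webuse cancer, clear

* Declare time to failure and the failure indicator once
stset studytime, failure(died)

* Exponential model: only covariates and options appear in the command
streg age i.drug, distribution(exponential)

* The same model with the abbreviated option
streg age i.drug, d(e)
```

Both streg commands should yield identical output, reported as hazard ratios by default.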

Here is a list of the options for the available models:

d(e)  exponential model
d(w)  Weibull model
d(gom)  Gompertz model
d(ll) or d(logl)  loglogistic model
d(ln) or d(logn)  lognormal model
d(gam)  generalized gamma model (up to and including Stata version 13)
d(ggam)  generalized gamma model (as of Stata version 14)

Some further options that may be of interest:

nohr can be used with exponential, Weibull and Gompertz models to obtain regression coefficients instead of hazard ratios.
time or t can be used with exponential and Weibull models to obtain accelerated failure time (instead of proportional hazard) specification.
frailty(gamma) or fr(g) adds a term for unobserved heterogeneity (or frailty) that follows a gamma distribution.
frailty(invgaussian) or fr(i) adds a term for unobserved heterogeneity (or frailty) that follows an inverse Gaussian distribution.
anc(list of variables) the "ancillary" (or shape) parameter that is part of most models can be modelled as a function of the covariates in the variable list (cannot be used together with frailty).
anc2(list of variables) similar to "anc", this option refers to the second ancillary parameter of the generalized gamma distribution.
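A few of these options in use might look like the sketch below. It again assumes the "cancer" example data loaded with webuse, and it uses a Weibull model throughout; whether the frailty or ancillary models converge on such a small data set is not guaranteed.

```stata
* Example data (an assumption made for illustration only)
webuse cancer, clear
stset studytime, failure(died)

* Weibull model, default output: hazard ratios
streg age i.drug, d(w)

* Same model, reporting regression coefficients instead
streg age i.drug, d(w) nohr

* Same model in the accelerated failure time specification
streg age i.drug, d(w) time

* Weibull model with gamma-distributed frailty
streg age i.drug, d(w) frailty(gamma)

* Shape parameter modelled as a function of age
streg i.drug, d(w) anc(age)
```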

Postestimation

estat ic will display the AIC and BIC
stcurve, hazard will produce a graph of the estimated hazard rate. Alternatively, options surv and cumhaz are available; cif (cumulative incidence function) applies only after competing-risks regression with stcrreg.

Some special options for stcurve are exemplified in the following table:

, at1(var15=1) at2(var15=2) will display the requested graph twice, once for var15 = 1 (and all other variables set to their mean) and once for var15 = 2 (and all other variables set to their mean).
, range(0 240) will display the graph for times from 0 to 240.
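Put together, a postestimation sequence might look like this sketch. It assumes the "cancer" example data loaded with webuse; the variable drug and the time range of 0 to 40 are specific to that data set, not to this guide.

```stata
* Example data (an assumption made for illustration only)
webuse cancer, clear
stset studytime, failure(died)

* Fit a Weibull model
streg age i.drug, d(w)

* Display AIC and BIC for the model just estimated
estat ic

* Estimated hazard rate, all covariates at their means
stcurve, hazard

* Survivor functions for drug = 1 and drug = 2, for times 0 to 40
stcurve, surv at1(drug=1) at2(drug=2) range(0 40)
```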

Episode splitting

Episode splitting can have two purposes: First, it may be necessary for incorporating time-varying covariates. Second, it is mandatory for estimation of the piecewise constant exponential model.

At the moment, I can present only a few remarks:

  • Episode splitting can be achieved with the stsplit command. However, this will not always be the most convenient way. Particularly for including time-varying covariates, it may be easier to use the expand command. For instance, if you wish to split episodes at the age of first marriage, it will most likely be easier to duplicate (via expand 2) the episodes of all persons who marry during an episode and then use standard data transformation procedures to do the rest (i.e. creating the time variable, the event ["failure"] variable and the dummy variable that indicates whether or not a person is married during the respective episode).
  • stsplit is more convenient if you want to estimate piecewise constant models. Note, however, that Jesper Sørensen has written an ado file, stpiece, which makes estimation of such models much easier. This file can be installed with the command ssc install stpiece.

Note that splitting episodes means that at least some cases are represented by more than one episode. Stata has to be informed about this when the data are stset via the id option:

stset duration, failure(event) id(caseid)

Here, "duration" is the variable that informs Stata about failure (or censoring) times, "event" is the binary variable that informs Stata whether a case is censored (event=0) or not (event=1), and finally, "caseid" is the variable that uniquely identifies each case. Of course, this variable might as well be called "id".
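As a minimal sketch of how splitting and the id option work together, the following lines estimate a piecewise constant exponential model by hand. The "cancer" example data, the generated variable caseid, the new variable interval and the cut points 10 and 20 are all assumptions chosen for illustration.

```stata
* Example data (an assumption made for illustration only)
webuse cancer, clear

* Create a case identifier (hypothetical name), one per original record
generate caseid = _n

* stsplit requires that the data are stset with the id option
stset studytime, failure(died) id(caseid)

* Split episodes at times 10 and 20; "interval" records the
* lower bound of the time interval each sub-episode belongs to
stsplit interval, at(10 20)

* Exponential model with interval dummies: the baseline hazard is
* constant within each interval but may differ between intervals
streg age i.drug i.interval, d(e)
```

After stsplit, cases with a duration of more than 10 are represented by two or three records, which is why the id option is needed.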

© W. Ludwig-Mayerhofer, Stata Guide | Last update: 31 May 2018