Parametric Regression Models for Time-to-Event Data
Stata can estimate a number of parametric models. The authors of the estimation procedures distinguish two classes of models: proportional hazard models and accelerated failure time (AFT) models. This distinction is often, but not universally, made in the literature. In fact, two models, the exponential and the Weibull model, can be expressed both as proportional hazard and as AFT models.
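The dual nature of the Weibull model can be checked numerically with the streg command introduced below. The following is only a minimal sketch: it assumes Stata's bundled example dataset "cancer" (loaded via webuse, with the variables studytime, died and age), which is not part of this guide's own examples. It relies on the standard result that, for the Weibull model, the proportional hazard coefficient equals the AFT coefficient multiplied by minus the shape parameter p.

```stata
* Assumption: Stata's example dataset "cancer" stands in for real data
webuse cancer, clear
stset studytime, failure(died)

* Weibull model in the proportional hazard metric (nohr: coefficients)
streg age, distribution(weibull) nohr
scalar b_ph = _b[_t:age]
scalar sh_p = e(aux_p)              // estimated Weibull shape parameter p

* The same model in the accelerated failure time metric
streg age, distribution(weibull) time
scalar b_aft = _b[_t:age]

* For the Weibull model, b_ph should equal -p * b_aft
scalar b_check = -sh_p * b_aft
display "PH coefficient:        " b_ph
display "-p * AFT coefficient:  " b_check
```

Both displayed values should agree up to numerical precision, since the two specifications are reparameterizations of the same likelihood.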
Basic elements of regression models
The models discussed here are requested by streg. Note that, just as in the case of graphing survivor functions with sts, information about time to failure and about censoring is provided via the stset command. These variables therefore do not appear in the streg command; you start right away with the covariates and with the options that define and specify your model.
streg status gender i.group, distribution(e)
will estimate an exponential model with "status", "gender" and "group" (treated as a factor variable) as covariates. The option distribution(e) stands for the exponential model and may be abbreviated as d(e).
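To show the complete sequence from stset to streg, here is a minimal sketch. It assumes Stata's bundled example dataset "cancer" (variables studytime, died, age and drug) in place of the hypothetical variables used above; substitute your own time, event and covariate names.

```stata
* Assumption: Stata's example dataset "cancer" stands in for your own data
webuse cancer, clear

* Declare the time and failure variables once ...
stset studytime, failure(died)

* ... so that streg needs only covariates and model options
streg age i.drug, distribution(exponential)

* The same model, requested with the abbreviated option
streg age i.drug, d(e)
```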
Here is a list of the options for the available models:
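|distribution(exponential)|exponential model|
|distribution(weibull)|Weibull model|
|distribution(gompertz)|Gompertz model|
|distribution(lognormal)|log-normal model|
|distribution(loglogistic)|log-logistic model|
|distribution(gamma)|generalized gamma model – up to (and including) Stata version 13|
|distribution(ggamma)|generalized gamma model – as of Stata version 14|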
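Several distributions can be fitted to the same data and compared in one go. The sketch below is an illustration only, again assuming Stata's example dataset "cancer"; the stored model names (m_...) are arbitrary choices. It stores each model and then tabulates the log likelihood, AIC and BIC with estimates stats.

```stata
* Assumption: Stata's example dataset "cancer" stands in for your own data
webuse cancer, clear
stset studytime, failure(died)

* Fit the same covariates under several distributions and store each result
foreach dist in exponential weibull gompertz lognormal loglogistic {
    quietly streg age i.drug, distribution(`dist')
    estimates store m_`dist'
}

* One table with log likelihood, AIC and BIC for all stored models
estimates stats _all
```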
Some further options that may be of interest:
|nohr|can be used with exponential, Weibull and Gompertz models to obtain regression coefficients instead of hazard ratios.|
|time|can be used with exponential and Weibull models to obtain the accelerated failure time (instead of the proportional hazard) specification.|
|frailty(gamma)|adds a term for unobserved heterogeneity (or frailty) that follows a gamma distribution.|
|frailty(invgaussian)|adds a term for unobserved heterogeneity (or frailty) that follows an inverse Gaussian distribution.|
|ancillary(varlist)|the "ancillary" (or shape) parameter that is part of most models can be modelled as a function of the covariates in varlist (cannot be used together with frailty).|
|anc2(varlist)|similar to ancillary(varlist), this option refers to the second ancillary parameter of the generalized gamma distribution.|
|estat ic|will display the AIC and BIC (a postestimation command, issued after streg).|
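|stcurve, hazard|will produce a graph of the estimated hazard rate (a postestimation command, issued after streg). Alternatively, options survival or cumhaz will produce a graph of the survivor function or the cumulative hazard.|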
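The options listed above might be combined with a Weibull model as follows. This is a sketch for illustrating the syntax only, assuming Stata's example dataset "cancer"; with so few cases the frailty and ancillary estimates should not be interpreted substantively.

```stata
* Assumption: Stata's example dataset "cancer" stands in for your own data
webuse cancer, clear
stset studytime, failure(died)

* Regression coefficients instead of hazard ratios
streg age i.drug, distribution(weibull) nohr

* Accelerated failure time specification
streg age i.drug, distribution(weibull) time

* Shape parameter modelled as a function of a covariate
streg i.drug, distribution(weibull) ancillary(age)

* Gamma-distributed unobserved heterogeneity (frailty)
streg age i.drug, distribution(weibull) frailty(gamma)

* AIC and BIC of the most recently fitted model
estat ic
```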
Some special options for stcurve are exemplified in the following table:
|stcurve, hazard at1(var15=1) at2(var15=2)|will display two curves in the requested graph, one for var15 = 1 (and all other variables set to their mean) and one for var15 = 2 (and all other variables set to their mean).|
|stcurve, hazard range(0 240)|will display the graph for times from 0 to 240.|
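A short sketch of these stcurve options in context, assuming Stata's example dataset "cancer" (where drug takes the values 1, 2 and 3) rather than the hypothetical var15 used in the table:

```stata
* Assumption: Stata's example dataset "cancer" stands in for your own data
webuse cancer, clear
stset studytime, failure(died)
streg age i.drug, distribution(weibull)

* One hazard curve each for drug = 1 and drug = 2, age held at its mean
stcurve, hazard at1(drug=1) at2(drug=2)

* Survivor function, restricted to analysis times from 0 to 30
stcurve, survival range(0 30)
```

stcurve always refers to the most recently fitted model, so it has to be issued directly after the streg command whose curves you want to see.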
Stata version 16, as updated on 5 November 2020, includes some more features for specifying the covariate values for stcurve.
Episode splitting
Episode splitting can have two purposes: First, it may be necessary for incorporating time-varying covariates. Second, it is mandatory for estimation of the piecewise constant exponential model.
At the moment, I can present only a few remarks:
- Episode splitting can be achieved with the stsplit command. However, this will not always be the most convenient way. Particularly for including time-varying covariates, it may be easier to use the expand command. For instance, if you wish to split episodes at the age of first marriage, it will most likely be easier to duplicate (via expand 2) the episodes of all persons who marry during an episode and then use standard data transformation procedures to do the rest (i.e. creating the time variable, the event ["failure"] variable and the dummy variable that indicates whether or not a person is married during the respective episode).
- stsplit is more convenient if you want to estimate piecewise constant models. Note, however, that Jesper Sørensen has written an ado file, stpiece, with the help of which estimation of such models is much easier. This file can be installed with the command ssc install stpiece.
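A piecewise constant exponential model based on stsplit might look like the following sketch. It assumes Stata's example dataset "cancer"; the cut points 10 and 20 and the variable names caseid and tband are arbitrary choices made for illustration.

```stata
* Assumption: Stata's example dataset "cancer" stands in for your own data
webuse cancer, clear
gen long caseid = _n                  // stsplit requires an id variable
stset studytime, failure(died) id(caseid)

* Split every episode at analysis times 10 and 20;
* tband holds the lower bound of each interval (0, 10 or 20)
stsplit tband, at(10 20)

* Exponential model with a separate baseline hazard for each interval
streg i.tband age i.drug, distribution(exponential)
```

Because the interval indicator enters as a factor variable, the hazard is constant within each interval but may differ between intervals.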
Splitting episodes means that at least some cases are represented by more than one episode. Stata has to be informed about this when the data are stset, via the id() option:
stset duration, failure(event) id(caseid)
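Here, "duration" is the variable that informs Stata about failure (or censoring) times, "event" is the binary variable that informs Stata whether a case is censored (event=0) or not (event=1), and "caseid" is the variable that uniquely identifies each case. Of course, this variable might as well be called "id". With several episodes per case, "duration" must hold the time at which each episode ends; Stata takes the end of the previous episode of the same case as the start of the next one.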
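The expand approach to a time-varying covariate, combined with stset and id(), can be sketched with a few made-up records. All data values and the variable names marrtime, split, episode and married are hypothetical and serve only to show the mechanics.

```stata
* Hypothetical toy data: one record per person; marrtime is missing if never married
clear
input caseid duration event marrtime
1 10 1  4
2  8 0  .
3 12 1 12
4  6 1  2
end

* Persons who marry before their episode ends get a second record
gen byte split = marrtime < duration
expand 2 if split
bysort caseid: gen byte episode = _n

* The first record of a split case ends at marriage, without an event
replace event    = 0        if split & episode == 1
replace duration = marrtime if split & episode == 1

* Time-varying dummy: 1 only during the episode after marriage
gen byte married = split & episode == 2

stset duration, failure(event) id(caseid)
list caseid _t0 _t _d married, sepby(caseid)
```

In the listing, _t0 and _t show the start and end of each episode as Stata understands them; for the split cases the second episode starts where the first one ends.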
© W. Ludwig-Mayerhofer, Stata Guide | Last update: 13 Nov 2020