Internet Guide to Stata |
Stata can estimate a number of parametric models. The people who wrote the estimation procedures distinguish two classes of models: proportional hazard models and accelerated failure time (AFT) models. This distinction is often, but not universally, made in the literature. In fact, two models can be expressed both as proportional hazard and as AFT models, namely the exponential and the Weibull model.
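For the Weibull model, the two formulations can be written side by side. The following is only a sketch; the symbols (shape parameter p, coefficient vectors with subscripts PH and AFT) are chosen here for illustration and are not taken from the text above.

```latex
% Proportional hazard form: covariates shift the hazard multiplicatively
h(t \mid x) = p \, t^{p-1} \exp(x\beta_{PH}),
\qquad
S(t \mid x) = \exp\{-\exp(x\beta_{PH}) \, t^{p}\}

% AFT form: covariates rescale the time axis
S(t \mid x) = \exp\{-[\exp(-x\beta_{AFT}) \, t]^{p}\}

% Both survivor functions coincide if and only if
\beta_{AFT} = -\beta_{PH} / p
```

With p = 1 the Weibull model reduces to the exponential model, and the AFT coefficients are simply the PH coefficients with reversed sign.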
The models discussed here are requested by streg. Note that, just as when graphing survivor functions with sts, information about time to failure and about censoring is provided via the stset command. Thus these variables do not appear in the streg command. You start right away by listing the covariates, followed by the options that define and specify your model.
Thus,
streg status gender i.group, distribution(e)
will estimate an exponential model with "status", "gender" and "group" (treated as a factor variable) as covariates. The option distribution(e) may be abbreviated as d(e) and stands for the exponential model.
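To make the sequence of stset and streg concrete, here is a minimal sketch. It assumes Stata's example data set "cancer" (loaded with webuse) instead of the variables used above; "studytime", "died", "drug" and "age" are the variable names in that data set, not part of this guide.

```stata
* Load an example data set shipped via Stata's web repository (assumption:
* internet access is available for webuse)
webuse cancer, clear

* Declare the survival-time structure: duration and failure indicator
stset studytime, failure(died)

* Exponential model; drug is treated as a factor variable
streg i.drug age, distribution(exponential)

* The same model with the abbreviated option
streg i.drug age, d(e)
```

Both streg calls should fit the same model; by default the output shows hazard ratios.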
Here is a list of the options for the available models:
d(e) | exponential model |
d(w) | Weibull model |
d(gom) | Gompertz model |
d(ll) or d(logl) | loglogistic model |
d(ln) or d(logn) | lognormal model |
d(gam) | generalized gamma model |
Some further options and post-estimation commands that may be of interest:
nohr | can be used with exponential, Weibull and Gompertz models to obtain regression coefficients instead of hazard ratios. |
time or t | can be used with exponential and Weibull models to obtain accelerated failure time (instead of proportional hazard) specification. |
frailty(gamma) or fr(g) | adds a term for unobserved heterogeneity (or frailty) that follows a gamma distribution. |
frailty(invgaussian) or fr(i) | adds a term for unobserved heterogeneity (or frailty) that follows an inverse Gaussian distribution. |
anc(list of variables) | the "ancillary" (or shape) parameter that is part of most models can be modelled as a function of the covariates in the list of variables (cannot be used together with frailty). |
anc2(list of variables) | similar to "anc", this option refers to the second ancillary parameter of the gamma distribution. |
estat ic | will display the AIC and BIC |
stcurve, hazard | will produce a graph of the estimated hazard rate. Alternatively, options surv or cumhaz are available; cif (cumulative incidence function) is available only after stcrreg, not after streg. |
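A few of the entries in this table, put together in a short sketch. Again, the example data set "cancer" and its variables are an assumption made for illustration; whether the frailty model is sensible for these data is not the point here.

```stata
webuse cancer, clear
stset studytime, failure(died)

* Weibull model, reporting coefficients instead of hazard ratios
streg i.drug age, distribution(weibull) nohr

* The same Weibull model in the accelerated failure time metric
streg i.drug age, distribution(weibull) time

* Weibull model with gamma-distributed unobserved heterogeneity
streg i.drug age, distribution(weibull) frailty(gamma)

* Information criteria (AIC and BIC) for the model estimated last
estat ic
```

Note that estat ic always refers to the most recently estimated model, so it has to be repeated after each streg if models are to be compared.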
Some special options for stcurve are exemplified in the following table:
, at1(var15=1) at2(var15=2) | will display the requested graph twice, once for var15 = 1 (and all other variables set to their mean) and once for var15 = 2 (and all other variables set to their mean). |
, range(0 240) | will display the graph for times from 0 to 240. |
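Both stcurve options can be combined. The sketch below again assumes the example data set "cancer"; the indicator "treated" is a hypothetical variable created only for this illustration, and the range of 0 to 40 is chosen to suit the durations in that data set.

```stata
webuse cancer, clear
stset studytime, failure(died)

* Hypothetical binary covariate: 0 = first drug group, 1 = the other groups
generate byte treated = drug != 1

streg treated age, distribution(weibull)

* Survivor curves for treated = 0 and treated = 1, with age set to its
* mean, for analysis times from 0 to 40
stcurve, survival at1(treated=0) at2(treated=1) range(0 40)
```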
Episode splitting can have two purposes: First, it may be necessary for incorporating time-varying covariates. Second, it is mandatory for estimation of the piecewise constant exponential model.
At the moment, I can present only a few remarks:
Note that splitting episodes means that at least some cases are represented by more than one episode. Stata has to be informed about this via the id option when you stset your data:
stset duration, failure(event) id(caseid)
Here, "duration" is the variable that informs Stata about failure (or censoring) times, "event" is the binary variable that informs Stata whether a case is censored (event=0) or not (event=1), and finally, "caseid" is the variable that uniquely identifies each case. Of course, this variable might as well be called "id".
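As a rough sketch of how episode splitting and the piecewise constant exponential model fit together: the example below assumes the example data set "cancer", creates a case identifier because that data set has none, and uses stsplit with cut points at 10 and 20 that are chosen arbitrarily. The names "caseid" and "interval" are made up for this illustration.

```stata
webuse cancer, clear

* The data set has one record per case; create an identifier for stset
generate caseid = _n

stset studytime, failure(died) id(caseid)

* Split each episode at analysis times 10 and 20; the new variable
* interval records the lower bound of the interval a record belongs to
stsplit interval, at(10 20)

* Exponential model with a separate baseline hazard per interval,
* i.e. a piecewise constant exponential model
streg i.interval i.drug age, distribution(exponential) nohr
```

After stsplit, a case that survives beyond time 20 is represented by three records, which is why the id option is required.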
© W. Ludwig-Mayerhofer, Stata Guide | Last update: 25 Jun 2013