Setting Time-to-Event Data and Doing First Analyses

stset

For most analyses of time-to-event data, you first have to "stset" your data. This means telling Stata which variable holds the "duration" and which holds the "event" indicator, along with any other relevant details. Note that events are termed "failures" in Stata's manuals and in the help system. The "event" (or failure) variable distinguishes observations where an event occurred from censored observations (where no failure was observed).

In the case of single-event data (i.e., one record per case), you will typically use one of the following commands:

stset duration, failure(event)

informs Stata that time to event is stored in variable "duration" (of course, the variable may be named differently); it is assumed that the variable "event" (or whatever it is named) has value 0 or missing in the case of censored observations, with all other values indicating failures (events).
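As a minimal sketch of what this does, the following do-file fragment builds a tiny made-up dataset (the values are purely illustrative, and "id" is an assumed case identifier), declares it as survival-time data, and lists the variables that stset creates: _t0 (start of analysis time), _t (end of analysis time), _d (failure indicator) and _st (1 if the record is used in the analysis).

```stata
* Hypothetical toy data: one record per case, duration in months
clear
input id duration event
1 12 1
2 30 0
3 48 1
4  7 0
5 22 1
end

* Declare the data to be survival-time data
stset duration, failure(event)

* Inspect the variables that stset has created
list id duration event _t0 _t _d _st
```

In this simple setup, _t simply repeats duration and _d repeats event. The options described further below change how these variables are derived.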

stset duration, f(event==2 3)

informs Stata that time to event is stored in variable "duration" and that the "events" (or failures) are denoted by values 2 and 3, with all other values indicating censored observations (failure has been abbreviated to f).
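To check that only the intended codes are counted as failures, it can help to cross-tabulate the original event variable against _d, the failure indicator that stset creates. The sketch below uses invented data in which code 0 means censored and code 1 stands for some other outcome that should not count as a failure; these codes are assumptions made for illustration only.

```stata
* Hypothetical toy data: event codes 0 (censored), 1 (other outcome), 2 and 3 (events of interest)
clear
input id duration event
1 12 2
2 30 0
3 48 3
4  7 1
5 22 2
end

* Only codes 2 and 3 count as failures
stset duration, failure(event==2 3)

* _d should be 1 for codes 2 and 3, and 0 otherwise
tabulate event _d
```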

If all observations end in failure (something that occurs rarely in social science applications), you may omit the failure option.
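A short sketch of this case, again with made-up values: without the failure option, every record is treated as ending in a failure, so _d equals 1 throughout.

```stata
* Hypothetical toy data in which every case experiences the event
clear
input id duration
1 12
2 30
3 48
end

* No failure() option: all records are assumed to end in failure
stset duration

list id duration _t _d
```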

Some further options for single-episode data:

exit(): This option restricts observation (and thus, analysis) time. By way of example, exit(time 36) could be used to restrict analysis to the first three years (provided that time is measured in months and starts at 0). All cases with an observation time longer than 36 months will be treated as censored at 36 months. Note that instead of a fixed number you may also give the name of a variable or some other expression.
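The effect of exit() is easiest to see by comparing the original variables with _t and _d after stset. In the following sketch (invented values, duration in months), cases 3 and 4 are observed for longer than 36 months and should therefore end up censored at 36, even though case 3 originally had an event.

```stata
* Hypothetical toy data: duration in months, starting at 0
clear
input id duration event
1 12 1
2 30 0
3 48 1
4 60 0
end

* Restrict analysis to the first 36 months
stset duration, failure(event) exit(time 36)

* Cases 3 and 4 now have _t = 36 and _d = 0
list id duration event _t _d
```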

origin(): This option is required when the time at which cases become "at risk" (that is, the start of analysis time) is not 0 (which is assumed by default). Again, you may indicate a number or some expression, or alternatively a variable that indicates the beginning of analysis time, for instance, the time an individual became unemployed. An example for the latter might be origin(time uedate), with "uedate" as the variable that contains date of unemployment.
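A minimal sketch of the unemployment example: "uedate" is taken from the text above, whereas "enddate" (the date the spell ended or was last observed) and the values themselves are assumptions for illustration. Dates are entered as plain day counts to keep the example short. After stset, analysis time _t should equal the time elapsed since the origin.

```stata
* Hypothetical toy data: dates stored as day counts
clear
input id uedate enddate event
1 100 465 1
2 200 930 0
3 150 515 1
end

* Analysis time starts at the date of unemployment
stset enddate, failure(event) origin(time uedate)

* _t is enddate minus uedate; _t0 is 0 for every case
list id uedate enddate _t0 _t _d
```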

enter(): This option is required when the beginning of being "at risk" and the beginning of observation do not coincide. Imagine a study in which being "at risk" (of whatever) may start at birth, but children enter the study only after a couple of months or years.
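The study described above might be set up roughly as follows. The variable names "age_entry" (age in months when the child entered the study) and "age_exit" (age in months at the event or at the end of observation) are hypothetical, as are the values. Time at risk starts at birth (age 0), but each child only contributes to the analysis from the age at entry onwards, which shows up in _t0.

```stata
* Hypothetical toy data: ages in months, risk starts at birth
clear
input id age_entry age_exit event
1  6 40 1
2 18 60 0
3  0 25 1
end

* Cases come under observation only at age_entry (delayed entry)
stset age_exit, failure(event) enter(time age_entry)

* _t0 holds the entry time, _t the exit time
list id age_entry age_exit _t0 _t _d
```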

scale(): This option divides analysis time by a given factor. If, for example, time is measured in months but you want it to be expressed in years (e.g., in graphs), you may use scale(12).
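As a brief sketch with made-up values, the following converts durations recorded in months into analysis time in years. The original variable is left unchanged; only _t is rescaled.

```stata
* Hypothetical toy data: duration in months
clear
input id duration event
1 12 1
2 30 0
3 48 1
end

* Express analysis time in years rather than months
stset duration, failure(event) scale(12)

* _t is duration divided by 12
list id duration _t _d
```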

Note that there are more options, not least for repeated-event data (multiple records per case), which are not covered here. Also, the options explained here may behave differently in the case of multiple-record data.

Survivor or (cumulative) hazard functions

sts graph

will produce the graph of the survivor function, estimated via the Kaplan-Meier procedure. Note that the command itself contains no information about time to event or censored observations; this has already been supplied via the stset command described in the previous section.

sts graph, failure

will graph the failure function 1-S(t), whereas

sts graph, hazard by(group)

will graph a (smoothed) hazard rate for each value of variable group.

Note that a further graph option, cumhaz, will produce a graph of the cumulative hazard (the Nelson-Aalen estimate). The by option, mentioned here together with hazard, can be used for all graphs.
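To try these commands without data of your own, one possibility is the cancer example dataset that ships with Stata. The variable names used below (studytime, died, drug) belong to that dataset, not to the examples in this text, and the sketch assumes that the dataset is available in your installation via sysuse.

```stata
* Example dataset shipped with Stata (patient survival in a drug trial)
sysuse cancer, clear

* studytime is the duration, died the failure indicator
stset studytime, failure(died)

* Kaplan-Meier survivor function, separately by treatment group
sts graph, by(drug)

* Failure function 1-S(t)
sts graph, failure by(drug)

* Smoothed hazard rate
sts graph, hazard by(drug)

* Cumulative hazard
sts graph, cumhaz by(drug)
```

Each sts graph call replaces the previous graph in the graph window, so you may want to run the commands one at a time.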