Survival Analysis

Survival analysis estimates a survivor function from the time observed until some specific event occurs (which may indeed be death: these procedures have their roots in insurance statistics, and nowadays they are very common in medical research). Survival analysis can deal with the problem that the event is often not observed, either because it does not occur in all individuals (obviously I'm not talking about death now, but rather about things like marriage or becoming unemployed) or because observation time is limited (as in most medical studies, but of course also in much social science research). Cases in which the event is not observed are called censored observations. They must not be omitted from the analysis; rather, we use the information that no event occurred at least until the end of the observation period. Thus, for every case you need at least two variables: a time variable indicating how long the case was observed, and a status variable indicating whether observation ended with or without an event.
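To make the two-variable layout concrete, here is a minimal Python sketch (not SPSS syntax). The data and the names `cases`, `EVENT_CODE`, `at_risk` and `events_at` are invented for illustration. It shows that a censored case still counts as being under observation until its own observation time ends, which is exactly the information that would be lost by dropping it.

```python
# Hypothetical data: one (time, status) pair per case.
# status 1 = event observed; status 0 = censored
# (observation ended without an event).
EVENT_CODE = 1

cases = [(2, 1), (3, 0), (5, 1), (5, 0), (8, 1), (9, 0)]


def at_risk(cases, t):
    """Number of cases still under observation just before time t."""
    return sum(1 for time, _status in cases if time >= t)


def events_at(cases, t):
    """Number of events observed exactly at time t."""
    return sum(1 for time, status in cases
               if time == t and status == EVENT_CODE)


# For every time at which an event occurred, report how many cases
# (censored or not) were still being observed.
event_times = sorted({time for time, status in cases if status == EVENT_CODE})
for t in event_times:
    print(f"t={t}: {events_at(cases, t)} event(s), {at_risk(cases, t)} at risk")
```

The case censored at time 3 is part of the risk set at time 2 but no longer at time 5; both estimators discussed here build on this kind of bookkeeping.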

Here I deal with two procedures that estimate survival functions, possibly for more than one group, and that in the case of several groups also compute statistics to test whether the survival functions differ significantly. The first is the life table estimator, which is most suitable if you have a large number of cases and/or many events (and censorings) that happen at the same time, perhaps because of coarse measurement, such as when time until death is recorded in years. The second is the Kaplan-Meier estimator, which should be used if the number of observations is not too large and/or not too many events (and censorings) occur at the same time.

Example of life table estimation:

SURVIVAL TABLE = zeit BY g1(1 2)
  / INTERVAL = THRU 40 BY 2 THRU 120 BY 3
  / STATUS = ziel (1, 3)
  / PRINT = NOTABLE
  / PLOTS ( SURVIVAL LOGSURV ) = zeit BY g1
  / COMPARE = zeit BY g1.

The TABLE keyword is followed (after the equals sign) by the time variable and, optionally, after the keyword BY by a variable indicating group membership (for instance, one of several treatment regimes in a medical study). The lowest and the highest value of the grouping variable that you wish to use in the analysis follow in parentheses.

The next line tells SPSS how to form the time intervals. Here, SPSS will group the first 40 time units into intervals of 2 units and the remaining units (up to unit 120) into intervals of 3 (everything that happens after 120 units will be ignored). The STATUS line tells SPSS which value(s) of the status variable are to be considered as events (all others will be considered as censored cases).
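What a life table does with such intervals can be sketched in a few lines of Python. This is a textbook actuarial estimator written under stated assumptions, not a reproduction of SPSS's internal computations. The function name `life_table`, the data and the interval boundaries are invented, and the sketch assumes that censored cases are exposed to risk for half of the interval in which they are censored.

```python
def life_table(times, events, edges):
    """Actuarial life table (illustrative sketch).

    times  : observed time for each case
    events : True if observation ended with an event, False if censored
    edges  : ascending interval boundaries, e.g. [0, 2, 4, 6]
    Returns one dict per interval [start, end).
    """
    rows = []
    survival = 1.0
    entering = sum(1 for t in times if t >= edges[0])
    for start, end in zip(edges[:-1], edges[1:]):
        inside = [e for t, e in zip(times, events) if start <= t < end]
        n_events = sum(1 for e in inside if e)
        n_censored = len(inside) - n_events
        # Assumption: a censored case counts as half a case at risk.
        exposed = entering - n_censored / 2
        p_event = n_events / exposed if exposed > 0 else 0.0
        survival *= 1 - p_event
        rows.append({"start": start, "end": end, "entering": entering,
                     "events": n_events, "censored": n_censored,
                     "exposed": exposed, "survival": survival})
        # Cases with times beyond the last boundary stay at risk throughout;
        # what happens to them later is not used.
        entering -= len(inside)
    return rows


# Hypothetical data: eight cases, intervals of width 2.
times = [1, 1, 2, 3, 3, 4, 5, 5]
events = [True, False, True, True, False, False, True, False]
table = life_table(times, events, edges=[0, 2, 4, 6])
for row in table:
    print(row)
```

Each row's `survival` value is the estimated proportion surviving to the end of that interval; wider intervals mean fewer rows but a coarser curve.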

The PRINT line tells SPSS not to print the life table, which may be helpful if the table would be very long. Note, however, that the median survival times are displayed only if you request the life table, which you may do implicitly by omitting the PRINT subcommand (printing the table is the default) or explicitly with PRINT = TABLE. In addition, SPSS is requested to PLOT the survivor function and the logged survivor function for the groups defined (in our example) by variable g1, and to COMPARE these groups using a test statistic (the statistic used here, the Wilcoxon (Gehan) test, is quite uncommon; more common statistics are available with the Kaplan-Meier procedure).


Example of Kaplan-Meier estimation:

KM zeit BY g2
  / STATUS = ziel (1)
  / PRINT TABLE MEAN
  / PLOT SURVIVAL LOGSURV
  / TEST LOGRANK BRESLOW TARONE.

The KM keyword is followed by the time variable and, optionally, after the keyword BY by a variable indicating group membership (for instance, one of several treatment regimes in a medical study). The STATUS line names the status variable and, in parentheses, the value that marks an event (here, 1). Next, SPSS is told to print a table with the estimated survivor function (be aware that each case in your data will provide one row in this table!) and to compute the mean survival times for each group.
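The estimate behind such a table can be sketched as the usual product-limit computation. The following Python code is only an illustration: the function name `kaplan_meier` and the data are invented. Unlike the SPSS table, the sketch returns one row per distinct event time rather than one row per case.

```python
def kaplan_meier(times, events):
    """Product-limit estimate (illustrative sketch).

    times  : observed time for each case
    events : True if observation ended with an event, False if censored
    Returns (time, number at risk, number of events, survival) tuples,
    one per distinct time at which at least one event occurred.
    """
    n_at_risk = len(times)
    survival = 1.0
    steps = []
    for t in sorted(set(times)):
        n_here = sum(1 for x in times if x == t)
        n_events = sum(1 for x, e in zip(times, events) if x == t and e)
        if n_events > 0:
            # The curve only drops at times where events occur.
            survival *= 1 - n_events / n_at_risk
            steps.append((t, n_at_risk, n_events, survival))
        # Events and censored cases both leave the risk set after t.
        n_at_risk -= n_here
    return steps


# Hypothetical data: six cases, three of them censored.
km_times = [2, 3, 5, 5, 8, 9]
km_events = [True, False, True, False, True, False]
km_steps = kaplan_meier(km_times, km_events)
for time, at_risk_n, n_events, surv in km_steps:
    print(f"t={time}: at risk={at_risk_n}, events={n_events}, S(t)={surv:.4f}")
```

Censored cases never lower the curve themselves, but they shrink the risk set, so every later event produces a larger drop.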

The PLOT line tells SPSS that you want plots of the survivor function and the logged survivor function. The TEST line requests three very common test statistics: the log-rank test, the Breslow (generalized Wilcoxon) test, and the Tarone-Ware test.
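The three statistics requested by TEST belong to one family: at each event time, the observed number of events in a group is compared with the number expected if the groups did not differ, and the differences are summed with different weights. The Python sketch below is a textbook two-group version under that description, not SPSS's implementation. The function name `weighted_logrank`, its `weight` argument and the data are invented for illustration.

```python
import math


def weighted_logrank(times, events, groups, weight="logrank"):
    """Two-group weighted log-rank statistic (illustrative sketch).

    groups : 0 or 1 for each case
    weight : "logrank" (weight 1), "breslow" (number at risk) or
             "tarone" (square root of the number at risk)
    Returns (chi-square with 1 df, p-value).
    """
    u = 0.0    # weighted sum of observed minus expected events in group 1
    var = 0.0  # variance of u under the null hypothesis
    for t in sorted({x for x, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e and g == 1)
        w = {"logrank": 1.0, "breslow": float(n),
             "tarone": math.sqrt(n)}[weight]
        u += w * (d1 - d * n1 / n)
        if n > 1:
            var += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0  # no information to compare the groups
    chi2 = u * u / var
    # Upper tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical data: four cases, all with events, two per group.
lr_times = [1, 2, 3, 4]
lr_events = [True, True, True, True]
lr_groups = [1, 1, 0, 0]
chi2_logrank, p_logrank = weighted_logrank(lr_times, lr_events, lr_groups)
chi2_breslow, p_breslow = weighted_logrank(lr_times, lr_events, lr_groups,
                                           weight="breslow")
print(chi2_logrank, p_logrank)
print(chi2_breslow, p_breslow)
```

Because the Breslow weights are largest when many cases are still at risk, that statistic emphasizes early differences between the curves; the log-rank test weights all event times equally, and Tarone-Ware lies in between.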

© W. Ludwig-Mayerhofer, IGSW | Last update: 21 May 2010