Wolfgang Ludwig-Mayerhofer's Introduction to TDA

Analysis of an example data set

The data set to which I will refer in the following has been taken from Yamaguchi (1991), Table 6.6. It comes from a larger study of high school seniors, whose main results have been published by Coleman/Hoffer (1987) and Coleman/Hoffer/Kilgore (1982). The data refer to persons who entered four-year colleges and for whom follow-up information was gathered about whether they finished college or dropped out prematurely. The data are available for download as an ASCII file named yam152_a.dat. A very brief description of the data is given in chapter 1 (see Introduction, The data set); for a full discussion, I refer you to Prof Yamaguchi's book mentioned in the introduction.

This data set is interesting for several reasons. The first is technical: the data set contains one duration with value zero, and we may learn something about how such values arise and, above all, how to handle them. Second, on a more substantive level, we may discuss a few aspects of data analysis and see how different definitions of variables influence the results obtained. Third, and most important, two kinds of time-dependent covariates are involved, and we may therefore use this occasion to rehearse TDA's capabilities for dealing with this type of variable.

Although this paper is about TDA, and not about event history analysis, a word on the kind of model used is in order. Time-dependent covariates are closely associated with one particular model, the partial likelihood model developed by Cox, mainly because they are easily incorporated into it. However, the method of episode splitting may also be employed to create a data set with time-dependent covariates that can be used in the estimation of any of TDA's models. More on this may be found in Blossfeld/Hamerle/Mayer (1986, 1989), although the example they provide is misleading, at least in the German edition (I don't know the English one): it will do in the framework of Cox models and simple exponential models, but not when models with time-varying hazard rates are estimated. In the following, both methods of dealing with time-dependent covariates will be employed, and therefore we will stick to the Cox model most of the time.
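
For readers who have not met episode splitting before, the idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not TDA command syntax and not the procedure used later in this paper; the names Episode and split_episode, as well as the numbers in the usage example, are made up for the purpose. An episode is cut at every point in time where the covariate changes its value. All pieces except the last are treated as censored, and only the last piece carries the original event indicator.

```python
from typing import List, NamedTuple, Sequence


class Episode(NamedTuple):
    """One (sub-)episode; a hypothetical record layout for illustration."""
    start: float     # time at which the piece begins (original time axis)
    end: float       # time at which the piece ends (original time axis)
    event: int       # 1 = ends with the event, 0 = censored
    covariate: int   # covariate value that holds throughout this piece


def split_episode(
    start: float,
    end: float,
    event: int,
    change_times: Sequence[float],
    values: Sequence[int],
) -> List[Episode]:
    """Split one episode at the times where a covariate changes.

    change_times must be sorted in ascending order. values[0] is the
    covariate value before the first change, values[i] the value after
    change_times[i - 1].
    """
    if len(values) != len(change_times) + 1:
        raise ValueError("need exactly one more value than change times")

    pieces: List[Episode] = []
    piece_start = start
    current = values[0]
    for t, new_value in zip(change_times, values[1:]):
        if t <= start:
            # The change happened before the episode began: no split,
            # but the episode starts out with the new value.
            current = new_value
        elif t < end:
            # The change falls inside the episode: close a censored piece.
            pieces.append(Episode(piece_start, t, 0, current))
            piece_start = t
            current = new_value
        # Changes at or after `end` do not affect this episode.
    # Only the final piece keeps the original event indicator.
    pieces.append(Episode(piece_start, end, event, current))
    return pieces


# Invented example: an episode from 0 to 10 ending with an event,
# with the covariate switching from 0 to 1 at time 4.
for piece in split_episode(0, 10, 1, [4], [0, 1]):
    print(piece)
```

Note that in this sketch the pieces keep their starting and ending times on the original time axis instead of each restarting at zero. That is a design assumption of the illustration; it is the kind of detail that tends to matter once the hazard rate is allowed to vary with elapsed time.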

Last update: 1 Dec 1999