Wolfgang Ludwig-Mayerhofer's Introduction to TDA

Incorporating time-dependent covariates in the Cox model

The conventional way of using time-dependent covariates within the Partial Likelihood framework, which is implemented in several well-known statistical packages (including SPSS for Windows) as well as in TDA, makes use of the programs' ability to relate variables to the process time. This is how Yamaguchi discusses the problem, and since TDA differs little from BMDP in this respect, the solutions Yamaguchi offers are easily transferred. Note, however, that things get slightly more complicated because TDA, in contrast to BMDP, cannot handle zero durations. This also affects the way time-dependent covariates have to be defined.

Let's look first at the simpler case, marriage. The data set contains the variable MRG, which is coded as the time elapsed since a fixed reference point (January 1980) common to all subjects in the study. Thus, case # 8 has a marriage date of 33, that is, 33 months after the beginning of the year 1980. A value of 99 indicates that the subject never married during the observation period.

To relate the time of marriage to the process of attending college, we have to consider the variable STM, which gives the month in which the student started college. Case # 8, for instance, started college in month 8, which means that he married 25 months after starting college. Thus, what we have to compare to the process time, which is referred to in TDA (as in BMDP) by the name "time", is the difference between MRG and STM. If this difference does not exceed the time elapsed since the student started college, then obviously the student has married (this includes cases like # 35, who married 2 months before starting college and is thus married from the beginning). Note that student # 8, for instance, dropped out of college after 4 months, that is, about 21 months before marrying, and so he, like others, never reaches the point in time when he married, as far as our analysis of college attendance is concerned.
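
The arithmetic above can be sketched in a few lines of Python. This is only an illustration, not TDA code: the function names are made up for this example, and the values used for a student who married before starting college are hypothetical (only the 2-month gap is taken from the text).

```python
NEVER_MARRIED = 99  # MRG code for subjects who never married while observed


def marriage_offset(mrg, stm):
    """Months from college start to marriage (MRG - STM), or None if never married."""
    if mrg == NEVER_MARRIED:
        return None
    return mrg - stm


def marriage_reached(mrg, stm, duration):
    """True if the marriage falls at or before the end of the observed college spell."""
    offset = marriage_offset(mrg, stm)
    return offset is not None and offset <= duration


# Case 8: married in month 33, started college in month 8, dropped out after 4 months.
print(marriage_offset(33, 8))      # 25
print(marriage_reached(33, 8, 4))  # False: he left college long before marrying

# Hypothetical values for a student married 2 months before starting college.
print(marriage_offset(6, 8))       # -2
print(marriage_reached(6, 8, 1))   # True: married from the very beginning
```

A negative offset simply means that the marriage dummy is switched on for the whole college spell.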

Time-dependent covariates are incorporated in the edef (  ); section. In the example data set, the starting month is V9 and the time of marriage is V8. Neglecting for a moment the fact that the duration variable has been modified to account for the zero duration (see "Analysis without time-dependent covariates"), we add a line as follows:

V11 (Marriage) = ge(time + V9 - V8,0),

This means that as soon as the process time minus the time of marriage (measured relative to the start of college, i.e. V8 - V9) is greater than or equal to 0 (zero), variable V11 takes the value 1; otherwise, V11 has the value 0 (zero). Thus, we have a dummy variable with "1" indicating "married".
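
To see how this dummy switches over the course of the process time, here is a small Python imitation of the expression. It assumes that ge(x, y) returns 1 when x >= y and 0 otherwise, as described above; the Python function names are invented for this sketch and are not part of TDA.

```python
def ge(value, threshold):
    """Imitate the ge(x, y) operator as described in the text: 1 if x >= y, else 0."""
    return 1 if value >= threshold else 0


def married_dummy(time, start_month, marriage_month):
    """Python counterpart of ge(time + V9 - V8, 0)."""
    return ge(time + start_month - marriage_month, 0)


# Case 8: V9 (starting month) = 8, V8 (month of marriage) = 33.
for t in (1, 24, 25, 26):
    print(t, married_dummy(t, 8, 33))
# 1 0
# 24 0
# 25 1
# 26 1
```

The dummy turns to 1 exactly at process time 25, the number of months between starting college and marrying.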

Now we have to account for the fact that a constant of 1 was added to the duration variable to avoid the problem of the zero duration. This can be achieved by subtracting 1 from the expression that defines the marriage variable, although I should mention that doing so does not change the results substantially. Still, the line of code in my example command files looks like this:

V11 (Marriage) = ge(time + V9 - V8 - 1,0),
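
The effect of the correction can be checked with a short Python sketch. It assumes that the stored process time equals the actual number of elapsed months plus 1, as described above; the function names are hypothetical.

```python
def married_uncorrected(shifted_time, start_month, marriage_month):
    """ge(time + V9 - V8, 0) evaluated on the shifted clock."""
    return 1 if shifted_time + start_month - marriage_month >= 0 else 0


def married_corrected(shifted_time, start_month, marriage_month):
    """ge(time + V9 - V8 - 1, 0): the extra -1 undoes the +1 added to the duration."""
    return 1 if shifted_time + start_month - marriage_month - 1 >= 0 else 0


# Case 8 (V9 = 8, V8 = 33): the marriage takes place 25 months after college start.
for actual_months in (23, 24, 25):
    shifted = actual_months + 1  # the clock as it is stored after adding the constant
    print(actual_months,
          married_uncorrected(shifted, 8, 33),
          married_corrected(shifted, 8, 33))
# 23 0 0
# 24 1 0
# 25 1 1
```

Without the correction the dummy switches one month too early, which is a small shift and fits the remark that the results hardly change.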

Coding the month variables to represent the seasonal effects is a little more involved. Here, the process time (for each individual case) has to be related to the starting month in such a way that it yields the actual month in calendar time. Again, the process time has to be added to the starting month to tell us, say, that for a student who started college in month 9, 21 months later we are in month 30 of calendar time. But how can we tell TDA that month 30 is equivalent to June? Again, Yamaguchi provides a fine solution: for each point in time, we subtract a multiple of 12 such that the remainder is in the range of 1 (January) to 11 (November), with December coded as zero.

In the following, I give (parts of) my way of coding this in TDA. Note that in the BMDP example provided by Yamaguchi, every computational step is made explicit with great care, while I have tried to keep the code as compact as possible.

V12 (January) = eq ((time + V9-1)%12,1),
.
.
.
V21 (November) = eq ((time + V9-1)%12,11),
V22 (December) = eq ((time + V9-1)%12,0),

What do these commands do? The expression (time + V9-1)%12 first computes the sum of "time" (i.e., the process time) and V9 (the starting month in calendar time) and subtracts a value of 1. The subtraction is necessary because if a person actually dropped out in, say, May, for TDA this looks like June, as we have added a constant of 1 to the duration variable. The entire sum (in parentheses) is then divided by 12, and the remainder is compared to the value after the comma. In other words, % is TDA's modulus operator. Thus, for instance, if the expression time + V9-1 results in a value of 9, 9/12 yields 0, remainder 9 (September). If time + V9-1 results in 24, 24/12 yields 2, remainder 0, and month 24 is indeed December (of the second calendar year counted from January 1980).
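
The same computation can be tried out in Python. This is a sketch only: it assumes that TDA's % behaves like Python's % for the non-negative values that occur here, and the names month_code, month_dummy and MONTH_NAMES are invented for this illustration.

```python
# Remainder -> month name; December is coded as 0, January as 1, ..., November as 11.
MONTH_NAMES = [
    "December", "January", "February", "March", "April", "May",
    "June", "July", "August", "September", "October", "November",
]


def month_code(shifted_time, start_month):
    """Python counterpart of (time + V9 - 1) % 12."""
    return (shifted_time + start_month - 1) % 12


def month_dummy(shifted_time, start_month, code):
    """Python counterpart of eq((time + V9 - 1) % 12, code)."""
    return 1 if month_code(shifted_time, start_month) == code else 0


# A student who started in month 9: 21 months later is calendar month 30.
# On the shifted clock (duration + 1) this is time = 22.
code = month_code(22, 9)
print(code, MONTH_NAMES[code])   # 6 June

# time + V9 - 1 = 24 gives remainder 0, i.e. December.
print(month_code(16, 9))         # 0
print(month_dummy(16, 9, 0))     # 1
```

At any given process time exactly one of the twelve remainders applies, so at most one of the month dummies equals 1.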

Note once again that all these commands have to be included in the edef (  ); part of your command file. Do not forget to include the variables you have created here in the list of covariates in the rate (  ); part of your command file.

The example command file with all the commands is y152_2a.cf.

Last update: 28 Jan 2000