Defining key variables

Wolfgang Ludwig-Mayerhofer's Introduction to TDA

Defining key variables for event history analysis

Transition data consist of three types of variables:

1. state variables,
2. duration variables, and
3. covariates.

The state variables describe the "state space", that is, the set of states of which individuals may occupy one or several during the observation period. Many programs for transition data assume that all individuals start in the same state and that the only information about the destination state is whether a transition to that state has occurred or not. (This is the STATUS variable used in many programs.) TDA handles the state space much more flexibly. The origin state and the destination state have to be defined explicitly, so transitions from and to several states may be analyzed simultaneously. Censored observations (observations that have not left the origin state) are those in which the destination state is identical to the origin state.

There is no problem if your data are set up in the traditional way (as they are in our example), that is, without an origin state variable. Such a setup is meaningful only when the origin state is the same for all observations, so you can create your origin state variable by choosing an appropriate constant.

Note that the state variables may assume several values. For instance, your data set may consist of episodes of full-time employment, part-time employment, and unemployment, coded as, say, 1, 2 and 3. In the same way, you may wish to distinguish several destination states, be it the three states mentioned or possible additional states (for instance, retirement).
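The relation between the traditional STATUS flag and an explicit origin/destination pair can be sketched in a few lines of Python. This is only an illustration of the coding convention described above, not TDA code. The names Episode, from_status and is_censored are hypothetical, and the coding 0 = censored, 1 = event is an assumption about the traditional data layout.

```python
from typing import NamedTuple


class Episode(NamedTuple):
    org: int  # origin state
    des: int  # destination state


def from_status(status: int, origin: int = 0, event_state: int = 1) -> Episode:
    """Turn a STATUS flag (assumed: 0 = censored, 1 = event) into an
    explicit origin/destination pair with a constant origin state."""
    # A censored observation never left the origin state, so its
    # destination state is set equal to the origin state.
    des = event_state if status == 1 else origin
    return Episode(org=origin, des=des)


def is_censored(ep: Episode) -> bool:
    """An episode is censored when destination and origin coincide."""
    return ep.des == ep.org


# Three hypothetical observations: event, censored, event.
episodes = [from_status(s) for s in (1, 0, 1)]
censored_flags = [is_censored(ep) for ep in episodes]
```

With several states, the same test applies unchanged: an episode running from state 1 to state 3 is a transition, while one with origin 2 and destination 2 is censored.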

Information about duration is also handled differently from most other programs. Usually, the data contain a single variable indicating the time elapsed until an individual was observed either as experiencing an event or as censored. With TDA, the starting time and the time of last observation have to be defined explicitly, and TDA itself computes the duration as the difference between the two. Again, having only a duration variable is no problem: in this case the starting time is zero for all individuals, and the appropriate starting time variable is easily created.
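As a small Python sketch of this arithmetic (again an illustration, not TDA's own code; the function names and the numbers are hypothetical):

```python
def duration(ts: float, tf: float) -> float:
    """Duration of an episode: ending time minus starting time."""
    if tf < ts:
        raise ValueError("ending time must not precede starting time")
    return tf - ts


def from_duration_only(dur: float) -> tuple:
    """Data with a single duration variable: the starting time is the
    constant 0, and the duration itself serves as the ending time."""
    return (0.0, dur)


# A record that carries only a duration of 14 time units.
ts, tf = from_duration_only(14.0)
recovered = duration(ts, tf)

# A record with explicit starting and ending times.
explicit = duration(3.0, 10.0)
```

Setting the starting time to zero leaves the duration unchanged, which is why a constant is sufficient here.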

Covariates in the analysis of transition data are, with one important exception, very much like the independent variables in any other type of data analysis. Variables may be either interval scaled or categorical (with ordinal variables being treated as interval or as categorical according to your preferences). The exception is that event history analysis allows you to analyze the influence of covariates that change over time. Still, apart from this reference to time, time-varying covariates have to be constructed just like any other covariates.

All variables used in the analysis must already be defined in the nvar( ); section of your command file. TDA still has to be told which of these variables are the state and the duration variables. This information is provided in the

edef( );

command, which is also used for defining time-dependent covariates. More information is provided in chapter 3.3 of the TDA manual.

The edef( ); command has at least four subcommands:

ts =	definition of starting time
tf =	definition of ending time
org =	origin state
des =	destination state

On the right-hand side of the equals sign, there must be either a reference to one of the variables defined in the nvar( ); section or a constant. According to the manual, more complex expressions are possible, but it is recommended to define such variables outside the edef( ); command. The only exception is time-dependent covariates, which are defined in the usual way as described in "Data files and variables", except that they refer to variables already defined in the nvar( ); section rather than to columns in the data set.

I will postpone a description of defining time-dependent covariates to "Incorporating time-dependent covariates". Here I will just show what the four subcommands mentioned above look like in our example data set:

edef (
ts = V1,	# starting time: V1 (always 0 in the example data)
tf = V2,	# ending time: V2
org = V1,	# origin state: V1 (always 0 in the example data)
des = V3,	# destination state: V3
);

Note that instead of referring to V1 in the definition of the starting time and the origin state, a constant with value 0 could have been used (simply by writing, e.g., ts = 0). I have employed the somewhat more complicated reference to V1 to show that you can use a variable more than once in the edef( ); command. That is, once a variable has been defined in the nvar( ); command, it can be invoked several times if appropriate.
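The equivalence of the two spellings can be illustrated with a short Python sketch. It only mimics the idea that each subcommand takes either a variable reference or a constant, and it is not how TDA is implemented. The function names resolve and apply_edef and the data row are hypothetical; the row assumes, as in the example, that V1 is 0.

```python
from typing import Dict, Union

Spec = Union[str, int, float]  # a variable name such as "V1", or a constant


def resolve(spec: Spec, row: Dict[str, float]) -> float:
    """Look up a variable reference in the data row; pass constants through."""
    return row[spec] if isinstance(spec, str) else spec


def apply_edef(row: Dict[str, float], ts: Spec, tf: Spec,
               org: Spec, des: Spec) -> Dict[str, float]:
    """Build one episode record from the four definitions."""
    specs = (("ts", ts), ("tf", tf), ("org", org), ("des", des))
    return {name: resolve(spec, row) for name, spec in specs}


# Hypothetical data row in which V1 is the constant 0.
row = {"V1": 0, "V2": 12, "V3": 1}

# V1 is referenced twice, as in the example command above.
by_variable = apply_edef(row, ts="V1", tf="V2", org="V1", des="V3")

# The same definitions written with the constant 0.
by_constant = apply_edef(row, ts=0, tf="V2", org=0, des="V3")
```

Both calls produce the same episode record, which is the point of the remark above: as long as V1 is 0 for every observation, referencing it and writing the constant are interchangeable.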

Last update: 25 Nov 1999