Analysing Multiply Imputed Datasets

Note: This section refers to Stata 11 or higher. Here, multiply imputed data are analysed with commands that start with mi. For data analysis, mi is often used as a prefix (mi ...:) followed by a standard Stata command. Before version 11, such data could be analysed with user-written ado-files whose basic commands started with mim. I won't say anything about these here.

Please also note that this is a very brief section that describes only a minimum of commands and almost no options.

Basic commands (pre-analysis)

mi query

tells you whether you are dealing with a multiply imputed data set (or, to be exact, with a data set that is 'mi set') and, if so, reports the 'style' (or structure) of the mi data set (flong, mlong and so on) and the number of data sets with imputed values. flong is the simplest style: the original data and the imputed data sets are stacked below each other in one single file.

mi describe

gives some basic information about the mi data set, e.g. which variables are imputed, passive or regular, and how many observations are complete or incomplete.

mi varying

can help you verify that the process of defining your data (e.g., which variables contain imputed values and which don't) has worked correctly: it lists the variables whose values vary across the imputed data sets.
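To try the three commands on data that are already mi set, here is a minimal sketch. It assumes internet access and uses mheart1s20, an example data set from Stata's mi documentation (fictional heart attack data in which bmi was imputed); the last two lines convert it to the flong style described above. Run it as a do-file.

```stata
* Sketch, assuming Stata 11+ and internet access for webuse.
* mheart1s20 is an example data set from the Stata mi manual that is
* already mi set (its name indicates 20 imputations).
webuse mheart1s20, clear

mi query        // is the data set mi set? which style? how many imputations?
mi describe     // imputed, passive and regular variables; complete/incomplete obs.
mi varying      // which variables vary across the imputed data sets?

* Convert to the simplest style, flong (m = 0 stacked above m = 1, ..., M)
mi convert flong, clear
mi query
```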

Basic analyses

mi xeq: tab particip gender, col m

mi xeq 2 5 7: tab particip gender, col m

The first command produces a crosstabulation of particip by gender for each data set, starting with the original data (m = 0). The second command produces the crosstabulation for data sets 2, 5 and 7 only.

mi xeq can be used with many, perhaps even all, Stata commands for data analysis. In the special case of 'flongsep' data, it can also be used for data transformation.
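A small sketch of mi xeq, again assuming the mheart1s20 example data from Stata's mi documentation, in which bmi is the imputed variable. Comparing m = 0 with the imputed data sets is a quick way to see the imputations at work; several commands can be chained with semicolons.

```stata
* Sketch, assuming the mheart1s20 example data (bmi has missing values in
* the original data, m = 0, which are filled in for m = 1, ..., 20).
webuse mheart1s20, clear

* m = 0 should report fewer observations for bmi than m = 1 and m = 2
mi xeq 0 1 2: summarize bmi

* Two commands per data set, separated by a semicolon
mi xeq 0 1: count if missing(bmi); summarize bmi
```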

Estimation

Ultimately, multiply imputed data are created to be used for parameter estimation. Here, you should think not only of the coefficients of statistical models; means or proportions are likewise quantities you may wish to estimate. Estimation is based on analyzing each imputed data set separately and then pooling the results; Stata accomplishes both steps with a single command, mi estimate.
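For reference, the pooling step follows Rubin's combination rules, which Stata's documentation for mi estimate cites. With M imputations, let \hat{Q}_m be the estimate from imputed data set m and U_m its squared standard error. The pooled estimate is the mean of the M estimates, and its total variance T adds the average within-imputation variance and the (inflated) between-imputation variance:

```latex
\bar{Q} = \frac{1}{M}\sum_{m=1}^{M}\hat{Q}_m, \qquad
\bar{U} = \frac{1}{M}\sum_{m=1}^{M}U_m, \qquad
B = \frac{1}{M-1}\sum_{m=1}^{M}\left(\hat{Q}_m-\bar{Q}\right)^2, \qquad
T = \bar{U} + \left(1+\frac{1}{M}\right)B
```

The pooled standard error is \sqrt{T}. As a purely hypothetical illustration, with M = 2 estimates 1.0 and 1.2, each with variance 0.04: \bar{Q} = 1.1, \bar{U} = 0.04, B = 0.02, and T = 0.04 + 1.5 \times 0.02 = 0.07.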

mi estimate: proportion status

mi estimate: regress income educ experience gender, beta

are examples of mi estimation commands. A list of the estimation commands that can be used with mi estimate is displayed by typing help mi estimate.

A helpful option with large data sets or complex models is dots, which causes Stata to display a line of dots in the output window that acts as a sort of "progress bar". You may also use option ni(#) (for 'number of imputations') or i(numlist), where # is the number of imputations you wish to use and numlist is a list or a range of imputation numbers. So,

mi estimate, dots ni(7): regress income educ experience gender, beta

tells Stata to use only the first seven imputed data sets, whereas

mi estimate, dots i(1 2 5/7): regress income educ experience gender, beta

requests an analysis that uses only imputations 1, 2 and 5 to 7.
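The same options on real data, as a sketch that assumes the mheart1s20 example data and the logit model used in Stata's mi manual. Here I spell out the full option names nimputations() and imputations(); ni() and i() above are their abbreviations.

```stata
* Sketch, assuming the mheart1s20 example data (20 imputations).
webuse mheart1s20, clear

* All imputations, with a "progress bar" of dots
mi estimate, dots: logit attack smokes age bmi hsgrad female

* Only the first 7 imputations
mi estimate, dots nimputations(7): logit attack smokes age bmi hsgrad female

* Only imputations 1, 2, 5, 6 and 7
mi estimate, dots imputations(1 2 5/7): logit attack smokes age bmi hsgrad female
```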

Note that the set of postestimation commands available after mi estimate is quite restricted compared to the vast array that Stata usually offers. This is because many statistics that are required for postestimation are not easily defined in a multiple imputation context.
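One postestimation command that does work is mi test, which jointly tests coefficients after mi estimate. A minimal sketch, assuming the mheart1s20 example data; it tests whether the coefficients of smokes and bmi are both zero.

```stata
* Sketch, assuming the mheart1s20 example data.
webuse mheart1s20, clear
mi estimate: logit attack smokes age bmi hsgrad female

* Joint test, based on the pooled results: H0: _b[smokes] = _b[bmi] = 0
mi test smokes bmi
```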

Tests

Complex tests with multiply imputed data require special commands. For now, I will present only a single example, namely testing whether the effects of two or more categories of a variable are identical.

Let's assume that you regress income on some variables, one of which is occupation (occup). Let's further assume that you want to test whether the effect of the third category of occupation differs from that of the fourth category, i.e. whether the difference between the two coefficients is zero. This may be achieved like this:

mi estimate (diff: _b[3.occup]-_b[4.occup]), dots: regress income educ i.occup
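The (name: exp) syntax works with any expression in the coefficients. Since the mheart1s20 example data have no multi-category variable, this sketch compares the coefficients of two binary predictors instead; diff is pooled across the imputations like any other coefficient, and its p-value tests whether the difference is zero. Run it as a do-file because of the /// continuation.

```stata
* Sketch, assuming the mheart1s20 example data; smokes and hsgrad stand in
* for the occupation categories above.
webuse mheart1s20, clear

* H0: _b[smokes] - _b[hsgrad] = 0
mi estimate (diff: _b[smokes] - _b[hsgrad]), dots: ///
    logit attack smokes age bmi hsgrad female
```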

If you think you may wish to run further tests on your model, you may save the estimation results and do further tests using these results:

mi estimate (diff: _b[3.occup]-_b[4.occup]), dots saving(miest1): regress income educ i.occup
mi estimate (diff: _b[2.occup]-_b[3.occup]) using miest1, dots

You may also do two (or more) tests in a single run:

mi estimate (diff1: _b[3.occup]-_b[4.occup]) (diff2: _b[2.occup]-_b[3.occup]) ///
, dots saving(miest1, replace): regress income educ i.occup
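Putting saving() and using together on the example data, as a sketch that assumes the mheart1s20 data; miest_heart is a hypothetical file name. saving() writes the results of each imputation to miest_heart.ster (replace lets you rerun the do-file), and the second command computes two transformations from that file without refitting the model.

```stata
* Sketch, assuming the mheart1s20 example data; miest_heart is hypothetical.
webuse mheart1s20, clear
mi estimate, dots saving(miest_heart, replace): ///
    logit attack smokes age bmi hsgrad female

* Two tests from the saved results, without re-estimating
mi estimate (diff1: _b[smokes] - _b[hsgrad]) ///
    (diff2: _b[smokes] - _b[female]) using miest_heart
```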