Multiple Imputation

Multiple Imputation is a procedure to deal with missing data.

Detect patterns of missingness:

MULTIPLE IMPUTATION income educ hhsize status
  /IMPUTE METHOD=NONE
  /MISSINGSUMMARIES OVERALL VARIABLES (MAXVARS=25 MINPCTMISSING=.2) PATTERNS.

In this procedure, no imputations are performed because of the subcommand IMPUTE METHOD=NONE. The subcommand MISSINGSUMMARIES requests tables and graphs that show the amount, the location and the patterns of missing data. In particular, MINPCTMISSING=.2 indicates that only variables with more than 0.2 per cent of missing values are to be included. The default is 10 per cent, which I deem generally too high. On the other hand, 0.2 per cent may be too low; it all depends on your data and the purposes of your analyses.
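
To see how the threshold affects the output, the same request can be rerun with a different cut-off. This is only a sketch: the value 5 is purely illustrative and not a recommendation, and only the variable summary is requested.

MULTIPLE IMPUTATION income educ hhsize status
  /IMPUTE METHOD=NONE
  /MISSINGSUMMARIES VARIABLES (MAXVARS=25 MINPCTMISSING=5).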


Performing the imputations:

DATASET DECLARE ineq_mi.
DATASET DECLARE ineq_mit.
MULTIPLE IMPUTATION income educ hhsize status
  /IMPUTE METHOD=FCS NIMPUTATIONS=5 MAXPCTMISSING=NONE MAXMODELPARAM=1000
  MAXCASEDRAWS=50 MAXPARAMDRAWS=2 MAXITER=100
  /IMPUTATIONSUMMARIES MODELS descriptives
  /CONSTRAINTS income (MIN=0)
  /OUTFILE IMPUTATIONS=ineq_mi FCSITERATIONS=ineq_mit.

Note that DATASET DECLARE is not part of the MULTIPLE IMPUTATION command. However, it is a prerequisite for having the data sets (on which more below) available after execution of the imputations.
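
As a sketch of what can be done with the declared data set once the imputations have run: it can be activated and analysed like any other data set. SPSS adds the variable Imputation_ to the output data set (0 marks the original data, 1 to 5 the imputed versions). The choice of DESCRIPTIVES is just an illustration, not part of the example above.

DATASET ACTIVATE ineq_mi.
* Check that the original data and all five imputations are present.
FREQUENCIES VARIABLES=Imputation_.
* Analyse each imputation separately; procedures that support pooling add pooled results.
SPLIT FILE LAYERED BY Imputation_.
DESCRIPTIVES VARIABLES=income educ hhsize.
SPLIT FILE OFF.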

Now some comments on the most important keywords:

  • METHOD=FCS requests fully conditional specification, a method that works with any pattern of missing data but may not be necessary for some, such as monotone patterns (the default, METHOD=AUTO, leaves the decision about the appropriate method to SPSS).
  • MAXMODELPARAM=1000 is not necessary in a simple model such as the present one (assuming that all variables are metric). However, with a larger number of variables, possibly including categorical variables, the default value of 100 may be too low.
  • MAXCASEDRAWS=50 and MAXPARAMDRAWS=2 are SPSS's default values and are listed here just in case you want to change them.
  • MAXITER=100 indicates the number of iterations for each imputation cycle. The default is 10, which some may consider too low, even though Allison, in his wonderful Sage Quantitative Series volume on MI, says that compared to other issues this is a minor one. However, given the computational speed of current (and future) processors, you may as well increase the number of iterations unless you have very large data sets or very complicated models.
  • OUTFILE IMPUTATIONS=ineq_mi FCSITERATIONS=ineq_mit names one data set to which the data including the imputed values are written and one additional data set to which the parameter estimates of the metric variables from the different iterations are written. The latter is useful for diagnostic purposes (the estimates should not display a discernible pattern across iterations).
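
A minimal sketch of how the iteration history might be inspected follows. It assumes that the iteration data set contains the variables Iteration_, Imputation_ and SummaryStatistic_ next to the imputed metric variables; these names are an assumption not taken from the example above, so check the variable view of ineq_mit before running it.

DATASET ACTIVATE ineq_mit.
* One line per imputation, separate panels for means and standard deviations.
GRAPH
  /LINE(MULTIPLE)=MEAN(income) BY Iteration_ BY Imputation_
  /PANEL ROWVAR=SummaryStatistic_.

If the lines wander without trend around a stable level, that is what one would hope to see; a steady drift upwards or downwards would suggest increasing MAXITER.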

© W. Ludwig-Mayerhofer, IGSW | Last update: 29 Jun 2009