MISSING VALUES

User defined missing values indicate data values that either are in fact missing or that for some other reason should not be used in most analyses (such as 'does not apply'). System missing values occur when no value can be obtained for a variable during data transformations.

Missing values are a topic that deserves special attention. This chapter explains why they arise and how to define them. Many chapters on data transformations have special sections devoted to the treatment of missing values.

Example

missing values var17 var23 var25 (99).


User defined missing values

Missing values are values of a variable that for some reason should not be counted as 'real' data values. The two most common occasions for missing values are the following. First, even though there should be a value, there is none. This occurs regularly in social surveys, because respondents refuse, or simply forget, to answer questions. Second, there is no value because the variable does not apply. For instance, if a person has no paid work, her or his income from paid work cannot have a meaningful value. Often it is useful to distinguish these two occasions; sometimes it is not. However, usually nothing is lost if they are in fact distinguished. In the case of income, one might enter missing values in the first sense as 999999 and those in the second sense as 999998.

Usually, missing values will not be used in the analyses, except, for instance, in an analysis devoted specifically to missing values. Therefore, SPSS has to know that there is something special about certain values, e.g., 999999 and 999998. This can be achieved very easily with the command:

MISSING VALUE income (999998, 999999).

As a result, these values will be disregarded in all statistical procedures (for instance, in a comparison of mean income between groups). In frequency tables, missing values will be shown, but they will be marked as such and will not be used in the computation of statistics.
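
As a minimal sketch of this behaviour, assume the data file also contains a grouping variable called gender (a hypothetical name, not part of the example above). The commands below label the two codes, declare them missing, and then request group means and a frequency table:

* Label the two codes so that output remains readable.
VALUE LABELS income 999998 'Does not apply' 999999 'No answer'.
* Declare both codes as user defined missing values.
MISSING VALUES income (999998, 999999).
* Mean income by group (gender is an assumed variable name).
MEANS TABLES=income BY gender.
* Frequency table of income.
FREQUENCIES VARIABLES=income.

The means should be based only on cases with valid incomes, while the frequency table should list 999998 and 999999 separately as missing.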

If there are several variables that have the same missing values, you can define these in one run:

MISSING VALUE var12 to var17, var19, var203, var205 to var287 (8, 9).

If for some reason you do not want the missing values to be treated as such, you can 'unmark' them as missing in the following way:

MISSING VALUE income ().

You can also 'undefine' some values as missing and leave others defined as missing. Assuming that you have defined the missing values (as shown above) as 999998 and 999999, with the command

MISSING VALUE income (999999).

999998 will no longer be treated as a missing value, while 999999 remains missing. You can change your definitions as often as you like.
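
If you have changed the definitions several times and are no longer sure which values are currently marked as missing, one way to check is to display the dictionary information for the variable. This is only a sketch; the exact layout of the output depends on your SPSS version:

* Show the dictionary entry for income, including its missing value definitions.
DISPLAY DICTIONARY /VARIABLES=income.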


Number of missing values

The number of missing values that can be defined is restricted. You can either define up to three specific values as (user) missing, or a range of values, or a range plus one single value. Here's an example of the latter:

MISSING VALUE income (999994 thru 999997, 999999).
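
A range may also be open-ended, using the keywords LOWEST (or LO) and HIGHEST (or HI). The following sketch assumes, purely for illustration, that all negative codes of income stand for some kind of missing information in addition to 999999:

* All values up to and including -1, plus the single value 999999, are missing.
MISSING VALUES income (LOWEST THRU -1, 999999).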


System missing values

System missing values occur when a data transformation produces a constellation that is not meaningful (loosely speaking), or when a condition arises that you have not taken into account. For instance, suppose you have two variables, one indicating a person's gender and the other whether she or he is in paid work, and you create a new variable that tells you whether a person is (a) male and in paid work, (b) female and in paid work, or (c) male and not in paid work. All women who are not in paid work will then have a dot (or, in some countries, a comma) instead of a 'real' value. Of course, this may be precisely what you want for some purposes, but in many instances it will not be. In any case, these dots (or commas) will be treated like any other missing values.

In many commands, you can address system missing values specifically with the keyword 'sysmis'. System missing values are not affected by definitions (or 'undefinitions') of user defined missing values as explained above. However, they can be recoded into 'real' values, which can then in turn be marked as missing (or not, just as you please).
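
The gender and paid work example might look as follows. The variable names (gender, paid, worktype) and their codings (1 = male, 2 = female; 1 = in paid work, 0 = not in paid work) are assumptions made only for this sketch:

* worktype is a new variable, so it starts out as system missing for every case.
IF (gender = 1 AND paid = 1) worktype = 1.
IF (gender = 2 AND paid = 1) worktype = 2.
IF (gender = 1 AND paid = 0) worktype = 3.
* Women not in paid work match none of the conditions and stay system missing.
FREQUENCIES VARIABLES=worktype.
* Turn the system missing values into a 'real' value and mark it as user missing.
RECODE worktype (SYSMIS = 9).
MISSING VALUES worktype (9).

After the RECODE, the former system missing cases carry the value 9, which can be labelled, marked as missing, or 'unmarked' like any other user defined missing value.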

Note: In my opinion, it is highly advisable to distinguish system and user missing values, especially during the data entry stage. Some users seem to think: "Why should I enter any value at all if it is missing anyway?". These users should be absolutely sure that later on they will be able to tell whether the value is indeed missing or whether somebody just forgot to enter it ...
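
If explicit codes have been entered for all 'real' missing values, any remaining system missing value points to a possible data entry error. As a sketch, assuming a case identifier called id (a hypothetical variable name), such cases could be listed like this:

* TEMPORARY limits the selection to the next procedure only.
TEMPORARY.
SELECT IF SYSMIS(income).
* List the cases for which no value at all was entered for income.
LIST VARIABLES=id income.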

© W. Ludwig-Mayerhofer, WLM-SPSS | Last update: 25 Mar 1998