RECODE

The RECODE command is used to change some or all values of a variable. The new values can be stored in the same variable or in a new one.

Simple example:

RECODE var1 var2 (1,2 = 1) (3,4 = 2).

More complex example:

RECODE var1 (1,2 = 1) (3 thru 8 = 2) (missing = 99) (else = copy)
  INTO var1new.


General

Assume we have variable famimpor with values 1, 2, 3 and 4, measuring how important having a family is for the respondent.

RECODE famimpor (4=3).

This changes all occurrences of value 4 to value 3. All other values, including the original value 3, remain unchanged. (If you have value labels, you will probably want to adjust them accordingly.)

RECODE famimpor (1,2 =1) (3,4 = 2).

This creates a dichotomous variable with cut-off point between 2 and 3.

RECODE famimpor (1=4) (2=3) (3=2) (4=1).

This reverses the order of the values. Note that this can be achieved more easily with the COMPUTE command (COMPUTE famimpor = 5 - famimpor.), which is especially helpful if there are many categories.
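The constant in the COMPUTE version is simply the lowest plus the highest valid value (here 1 + 4 = 5; a scale from 1 to 7 would use 8). As a minimal sketch, assuming you want to convince yourself that both routes agree, the following writes each result into a new variable and cross-tabulates them. The names famrev1 and famrev2 are hypothetical.

```spss
* Reverse the 1 to 4 scale in two ways, into hypothetical new variables.
RECODE famimpor (1=4) (2=3) (3=2) (4=1) INTO famrev1.
COMPUTE famrev2 = 5 - famimpor.
EXECUTE.
* Every case should fall on the diagonal of this table.
CROSSTABS /TABLES=famrev1 BY famrev2.
```

Because new variables are used here, the original famimpor is left untouched.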


Creating New Variables

One problem is that all the above examples change the original values of the variable. You should do so only if either you can reconstruct the original values from the new ones (of the examples above, this is possible only for the last one, the reversal!) or if you are damn sure that you have a valid copy of your data set elsewhere. Fortunately, you can create new variables with the RECODE command by adding INTO followed by the name of the new variable. See the following examples.

RECODE famimpor (1,2 =1) (3,4 = 2) INTO famimpod.

This creates a dichotomous variable with cut-off point between 2 and 3. The old variable remains intact, however, and the dichotomous variable is added to the data set with the name famimpod, with "D" indicating that you have a dichotomous variable. (I use such mnemonic devices quite frequently, but of course you don't have to.)

Note, however, that you have to take some precautions when recoding INTO a new variable. In this case, you have to be sure that the new variable has all the valid values you wish it to have. If the RECODE command names only part of the values, all cases with other values will be system missing in the new variable. Yet, there is an easy way to deal with this problem.

WRONG: RECODE famimpor (4=3) INTO famimpon.

This would write value 3 to the new variable wherever famimpor has value 4. However, all cases with other values would be system missing in famimpon. Here's the better way:

RECODE famimpor (4=3) (ELSE=COPY) INTO famimpon.

This changes all occurrences of value 4 to value 3, and all other values are copied to the new variable. Of course, you can make as many explicit recodings as you like and then deal with the rest via ELSE=COPY.
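A quick way to check the result is to cross-tabulate the old variable against the new one. This is only a sketch of one possible check, not a required step:

```spss
RECODE famimpor (4=3) (ELSE=COPY) INTO famimpon.
EXECUTE.
* Rows show the old values and columns show the new ones.
CROSSTABS /TABLES=famimpor BY famimpon.
```

In the table, the row for old value 4 should have all its cases in the column for new value 3, and every other row should have its cases in the column with the same value.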

The ELSE part of the RECODE command is not restricted to the "COPY" case just explained. You can also use it with explicit values, as in the following example. Keep in mind that ELSE catches every value not listed before it, including missing values.

RECODE famimpor (4,3=2) (ELSE=1) INTO famimpod.

One problem with RECODE INTO, however, is that variable and value labels are not automatically created (how could they be?). So don't forget to label the new variable.
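A minimal sketch of the labelling step for the dichotomous variable from above might look like this. The label texts are placeholders made up for illustration; replace them with wording that fits your data.

```spss
RECODE famimpor (1,2 = 1) (3,4 = 2) INTO famimpod.
* The label texts below are hypothetical placeholders.
VARIABLE LABELS famimpod 'Importance of family (dichotomized)'.
VALUE LABELS famimpod 1 'Original values 1 and 2' 2 'Original values 3 and 4'.
EXECUTE.
```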


Several Variables

Several variables can be addressed in one RECODE statement, provided that all variables are to be affected in the same way, as in the following example.

RECODE famimpor workimpo chldimpo (4,3=2) (ELSE=1).

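If you wish to RECODE these variables INTO new variables, you can either write a separate RECODE command for each variable or list one new variable name per original variable after INTO, in the same order. The number of new variables must equal the number of original variables.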
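Recoding the three variables into new ones can be sketched in two alternative forms. The target names famimpod, workimpd and chldimpd are hypothetical. The second form relies on the rule in the SPSS syntax reference that INTO accepts a list of target variables, as long as it has as many names as the list of source variables.

```spss
* Form 1: one RECODE command per variable.
RECODE famimpor (4,3=2) (ELSE=1) INTO famimpod.
RECODE workimpo (4,3=2) (ELSE=1) INTO workimpd.
RECODE chldimpo (4,3=2) (ELSE=1) INTO chldimpd.

* Form 2: one command with sources and targets matched by position.
RECODE famimpor workimpo chldimpo (4,3=2) (ELSE=1)
  INTO famimpod workimpd chldimpd.
```

Use one form or the other, not both. Either way, the new variables still have to be labelled.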


Treatment of Missing Values

Missing values can be addressed via the keywords "sysmis" or "missing".

Assume that in addition to the values 1 thru 4, there are also values 8 and 9 which are defined as missing values. Here are a few examples and what they do.

RECODE famimpor (missing = 8).

All missing values (i.e., values 8 and 9 and any system missing values) will have the value 8. 8 will still be defined as missing.

RECODE famimpor (missing = 7).

All missing values (i.e., values 8 and 9 and any system missing values) will have the value 7. 7 will not be defined as missing!

If you have system missing values and want to recode only these to another value, you can use the keyword "sysmis" instead of "missing".

RECODE famimpor (sysmis = 7).

Again, if you have defined values 8 and 9 as missing, 7 will not be recognized as a (formerly) missing value. You may wish to either define 7 now as missing or to assign a value label that tells you the meaning of 7.
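Both remedies can be sketched as follows for the sysmis example, assuming 8 and 9 should stay missing as well. The label text is a made-up placeholder. Note that MISSING VALUES replaces the previous definition, so 8 and 9 have to be listed again.

```spss
* Declare 7 as missing along with 8 and 9.
MISSING VALUES famimpor (7, 8, 9).
* ADD VALUE LABELS keeps the labels that already exist.
ADD VALUE LABELS famimpor 7 'Formerly system missing'.
```

After the (missing = 7) example, where 8 and 9 no longer occur, declaring 7 alone would be enough.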


Simplifications

Often, several values of a variable have to be addressed. Some keywords help to make this process easier.

RECODE famimpor (lowest thru 3 = 3).
RECODE famimpor (2 thru highest = 2).

The first command recodes all values from the lowest value to (and including) value 3 into the value 3. The second command recodes all values from the value 2 to (and including) the highest value into the value 2. Note that "lowest" can be abbreviated as "lo" and "highest" as "hi".

CAUTION: Assuming that 8 and 9 are still defined as user missing values, these will be recoded to 2 by the second command (and of course won't be defined as missing anymore).
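One way around this, sketched here under the assumption that the valid values end at 4, is to name the upper bound explicitly instead of using "highest":

```spss
* Explicit upper bound, so the user missing codes 8 and 9 are left alone.
RECODE famimpor (2 thru 4 = 2).
```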

© W. Ludwig-Mayerhofer, WLM-SPSS | Last update: 13 Apr 1998