Conditional Transformations (IF . . .)

Conditional transformations create (or change) data values only IF a certain condition is true.

Simple example:

IF (varx eq 1) varz = 0.

More complex example:

IF ((varx eq 1 and not missing(varu) and vari eq vark) or
   (vara gt 2 and abs(varb) le 7)) varz = 0.

Special example:

DO IF version eq 1.
   compute q1r = q1 eq 4.
   compute q2r = q2 eq 3.
   compute q3r = q3 eq 1.
ELSE IF version eq 2.
   compute q1r = q1 eq 1.
   compute q2r = q2 eq 5.
   compute q3r = q3 eq 2.
END IF.


General

IF checks whether the condition or conditions listed after the IF keyword are true. If so, the variable named to the left of the equal sign receives the value given on the right-hand side of the equal sign.

The conditions after the IF clause usually compare one or several variables to numbers or other variables. In addition, the arithmetic functions that are explained in the Compute section may be used, as in the expression "abs(vara)" used above. If there are several conditions, you may specify whether only one, several, or all of these conditions must be met, by using keywords AND and/or OR. Parentheses help to structure the priority of conditions.

Usually, the variable to which a value is assigned IF the condition is true will be a new variable. However, if a variable is named that is already in the data set, the values of this variable will be replaced by those you just created.
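A minimal sketch of what this means in practice, reusing varx and varz from the simple example (the starting value 9 is an arbitrary assumption made for illustration). IF only touches cases for which the condition is true. All other cases keep whatever value the target variable already had, or remain system-missing if the variable is new:

COMPUTE varz = 9.
IF (varx EQ 1) varz = 0.

After these two commands, cases with varx equal to 1 have varz = 0 and all other cases still have varz = 9. Without the COMPUTE line, and with varz not yet in the data set, those other cases would be system-missing in varz.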

The DO IF ... ELSE (IF) ... END IF structure lets you perform one or more transformations on condition that the expression after DO IF or ELSE IF is true. It is special inasmuch as DO IF can be combined with other types of data transformation.


Comparisons and logical operators

When variables are compared to numbers or other variables, the following keywords or signs can be used:

Keyword   Symbol   Meaning
EQ        =        Equal to
NE        ~=       Not equal to
GE        >=       Greater than or equal to
GT        >        Greater than
LE        <=       Less than or equal to
LT        <        Less than

Several conditions (comparisons) may be concatenated by AND (symbol: &) and/or OR (symbol: |). If two conditions are concatenated by AND, the whole expression is true only if both conditions are met. If two conditions are concatenated by OR, the whole expression is true if at least one of the conditions is met. In addition, you may specify, instead of a condition having to be met, a condition that must NOT (symbol: ~) be met. AND, OR and NOT are called logical operators.
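To illustrate, here is the "more complex example" from the top of this page rewritten with symbols instead of keywords (& for AND, | for OR, ~ for NOT). This is only a restatement of that example and should behave identically:

IF ((varx = 1 & ~MISSING(varu) & vari = vark) |
   (vara > 2 & ABS(varb) <= 7)) varz = 0.

Keywords and symbols may be mixed freely, but sticking to one style within a syntax file tends to be easier to read.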

It is highly recommended to use parentheses to make the priority of the conditions explicit. For instance, you may be looking for mothers in your data who are not married. You first check whether a person is female (say, gender EQ 1) and whether she was never married or is divorced or widowed (say, famst EQ 3, 4 or 5); and if this is true, you check whether the number of children this person has is greater than 0 (nkids GT 0). Now if you write:

WRONG:

IF (gender EQ 1 AND famst EQ 3 OR famst EQ 4 OR famst EQ 5
  AND nkids GT 0) singlemo = 1.

you will find an amazing number of single mothers in your data. This is because the condition is true in any of the following cases:

  1. If gender eq 1 and famst eq 3
  2. If famst eq 4
  3. If famst eq 5 and nkids gt 0

and thus the following persons will get a value of "1" in the variable singlemo:

  1. all women who were never married (whether or not they have kids)
  2. all persons who are divorced (no matter whether they are female or not and no matter whether they have children or not)
  3. all widows and widowers with children

Note that SPSS is not doing anything wrong here. As in formal logic, AND is evaluated before OR unless parentheses say otherwise.
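To make the three cases listed above visible, the WRONG command can be written with the grouping that SPSS applies implicitly. This is merely the same command with the hidden parentheses spelled out:

IF ((gender EQ 1 AND famst EQ 3) OR famst EQ 4 OR
   (famst EQ 5 AND nkids GT 0)) singlemo = 1.

Each of the three parts joined by OR corresponds to one of the numbered cases, and any one of them being true is enough to set singlemo to 1.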

The right way to get what you want is:

IF (gender EQ 1 AND (famst EQ 3 OR famst EQ 4 OR famst EQ 5)
  AND nkids GT 0) singlemo = 1.

Here, all conditions concatenated by OR are counted, as it were, as one "super"-condition (that is true if any of the three conditions is met), and this condition is linked to the other conditions by AND.


Two special keywords

ANY is an abbreviation for a series of OR clauses related to one variable. That is, you can check whether a variable has one out of several values, as in:

IF ANY (vara, 1, 7, 8, 16, 18) newvar = 3.

Thus, if vara has value 1 OR value 7 OR value 8 OR value 16 OR value 18, the IF clause will be coded as true and newvar will have value 3 for all cases that fulfill this condition.
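For comparison, the same command written out with OR clauses, followed by a sketch of how ANY could shorten the single-mother condition from the previous section (gender, famst, nkids and singlemo as used there):

IF (vara EQ 1 OR vara EQ 7 OR vara EQ 8 OR vara EQ 16 OR vara EQ 18) newvar = 3.

IF (gender EQ 1 AND ANY(famst, 3, 4, 5) AND nkids GT 0) singlemo = 1.

Because ANY is a function call with its own parentheses, the OR part cannot accidentally get mixed up with the neighbouring AND conditions.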

RANGE helps you check whether the values of a variable are within a certain range. This is an abbreviation for a GE keyword combined with a LE keyword. The syntax is:

IF RANGE (vara, 1.7, 4.8) newvar = 3.

If vara has a value that is not smaller than 1.7 and not bigger than 4.8 (both limits are included in the range), the variable newvar will have the value that is specified on the right-hand side, i.e. 3.
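Spelled out with GE and LE, the RANGE example above corresponds to:

IF (vara GE 1.7 AND vara LE 4.8) newvar = 3.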


The DO IF command

Most of what is important about this command can be found in the example at the top of this page. The DO IF command serves as a shorthand that helps to make syntax more transparent, especially if several transformations have to be performed under a certain condition, and even more so if the transformations vary with a series of different conditions. My example might refer to a professor who has given his students two versions of a test and now has his computer judge whether the answers given by the students are correct. Which answer is correct of course depends on the version of the test.

The structure of this command can likewise be found in the example at the top of this page. Note that the last ELSE IF condition may be replaced by the simple keyword ELSE. In this case, the ensuing transformations are performed on all cases that do not meet any of the criteria defined by the previous DO IF and ELSE IF clauses.
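A sketch of the test example with a final ELSE branch. The code -9 for answers from an unexpected test version is a hypothetical choice made for this illustration, not part of the original example:

DO IF version EQ 1.
   COMPUTE q1r = q1 EQ 4.
ELSE IF version EQ 2.
   COMPUTE q1r = q1 EQ 1.
ELSE.
   COMPUTE q1r = -9.
END IF.

Note that ELSE is a command of its own and therefore ends with a period.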

In the lines after DO IF or ELSE IF any kind of data transformation may follow. That is, you may use other IF conditions, or the COMPUTE, COUNT, and even the RECODE command, alone or in combination.
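As an illustration, a DO IF block that mixes several kinds of transformations. The variable names are borrowed from the examples on this page, except for unmarr and ncorrect, which are hypothetical names introduced here:

DO IF (gender EQ 1).
   RECODE famst (3, 4, 5 = 1) (ELSE = 0) INTO unmarr.
   COUNT ncorrect = q1r q2r q3r (1).
   IF (unmarr EQ 1 AND nkids GT 0) singlemo = 1.
END IF.

All three transformations are applied only to cases with gender equal to 1. The other cases are left untouched.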


Missing values

The way SPSS deals with missing values in IF conditions is not always easy to understand, so it is best to try out a command on a few cases first. By and large, missing values are not treated like other data values: a comparison that involves a missing value is usually neither true nor false. The following examples show where this becomes a problem.

Let's first assume that you have a single condition. The command

IF (vara EQ 1) newvar = value.

will produce "value" in newvar whenever a case has value 1 in vara. Thus, if a case has a missing value in vara, the condition vara EQ 1 is not met and newvar will not receive "value". Perhaps less obviously, if you write:

IF (vara NE 1) newvar = value.

newvar will also not receive "value" if there is a missing value in vara, because a comparison with a missing value is not counted as true, whichever way it is phrased.

The same applies when you concatenate several conditions by AND. With the OR operator, however, a missing value does no harm as long as at least one of the other conditions is true. Assume you write:

IF (vara EQ 1 OR varb EQ 1) newvar = value.

Then if a case has value 1 in vara and a missing value in varb, it will have "value" in newvar. This is because one of the conditions is met, namely, vara EQ 1, and SPSS does not care now whether the next condition is also met. Note, however, that the command

IF (vara NE 1 OR varb NE 1) newvar = value.

will not produce "value" in newvar for cases that have missing values in vara and in varb.
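Since trying things out is the safest approach, here is a small self-contained sketch that can be run in an empty SPSS session. The five test cases, the user-missing code 9 and the names eqflag and neflag are assumptions made for this illustration:

DATA LIST FREE / vara varb.
BEGIN DATA
1 1
1 9
9 1
2 9
9 9
END DATA.
MISSING VALUES vara varb (9).
IF (vara EQ 1 OR varb EQ 1) eqflag = 1.
IF (vara NE 1 OR varb NE 1) neflag = 1.
LIST.

In the listing, the second case (vara is 1, varb is missing) should show eqflag = 1, whereas the last case (both variables missing) should show neither flag, in line with what is described above.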

As in many other commands, however, missing values may be addressed explicitly. You may check whether a case has a missing value with a command like this:

IF ( MISSING(vara) ) newvar = value.

Of course, this condition may be concatenated with other conditions. MISSING is true for user-defined as well as system-missing values. If you want to check for system-missing values only, use SYSMIS instead of MISSING.
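A short sketch of both functions, and of how an explicit check can be combined with a comparison. The names anymiss and sysmiss and the assigned value 1 are placeholders chosen for this illustration:

IF (MISSING(vara)) anymiss = 1.
IF (SYSMIS(vara)) sysmiss = 1.
IF (vara NE 1 OR MISSING(vara)) newvar = 1.

The last command assigns 1 to newvar for every case that does not have a valid value of 1 in vara, including cases where vara is missing, because the MISSING part of the condition is true for them.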

© W. Ludwig-Mayerhofer, IGSW | Last update: 26 Jul 2004