COUNT

This command counts, for each case, how often one or several specified values occur across a list of variables.

Simple example:

COUNT newvar = var_a var_b var_c var_d (1,2).

More complex example:

COUNT newvar = var_a (3) var_b (4) var_c (1,7, 10 thru 13) var_d (1,2).

General

COUNT first sets the variable named on the left-hand side of the equals sign to zero. Then, for each variable on the right-hand side, it checks whether the case has the value (or one of the values) listed in the parentheses that follow the variable. If so, the left-hand variable is incremented by 1. Thus, if there are, e.g., 4 variables on the right-hand side and a case has one of the values you are looking for in 2 of them, this case will have the value 2 in the left-hand variable.

If you are COUNTing the same values for several variables, these values do not have to be listed after each of the variables; rather, they may be listed after the last of these variables, as in the simple example above.
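The procedure described above can be tried out on a tiny data set. The following is only a sketch: the three cases are made up for illustration, and the variable names are those of the simple example.

DATA LIST FREE / var_a var_b var_c var_d.
BEGIN DATA
1 2 3 4
5 5 5 5
2 2 1 1
END DATA.
* Count how many of the four variables have the value 1 or 2.
COUNT newvar = var_a var_b var_c var_d (1,2).
LIST.

The first case has the value 1 or 2 in two of the variables, the second in none, and the third in all four, so newvar should be 2, 0 and 4, respectively.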

Some examples

The COUNT command is used less frequently than other data transformation commands. Nonetheless, there are situations where it is very useful. Here are a few examples.

In household surveys, data such as age and gender are typically recorded for each household member. To compute the net equivalent household income, you need to know, for instance, how many household members are under 18. If the ages of the household members are stored in variables age1 to age8, you can proceed as follows:

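COUNT numbkids = age1 age2 age3 age4 age5 age6 age7 age8 (0 thru 17).

The range starts at 0 so that children in their first year of life are included; this assumes that age is recorded in completed years.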

Next example: Often, we wish to know how often a respondent has given a certain answer to a series of questions, e.g., whether they are in favour of certain policies (say, values 4 and 5) or not (other values). Thus, we compute:

COUNT proenvir = env1 env2 env3 env4 env5 (4,5).

Finally, consider a teacher who has given a test to children and has coded, for each test item, whether the answer is correct (value 1) or not (value 0). The number of correct answers per child can then easily be computed with COUNT.
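For the test example, the command might look as follows. This is a sketch under assumptions: the variable names item1 to item10 and ncorrect are hypothetical, and the keyword TO only works if the item variables are stored next to each other in the data file.

* Number of correctly answered items per child.
COUNT ncorrect = item1 TO item10 (1).
EXECUTE.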

Missing values

Missing values are treated like any other values. Thus, if you are looking for value 1 in a given variable and the value for a case is missing, the left-hand variable will not be incremented. This may be problematic in some cases. Consider the example above: we have, say, 5 items about certain policies and count the respondents' agreement. A person who has answered all 5 questions and agreed with 2 of them will have the same number of agreements as a person who has answered only 2 of the questions and agreed with both.

Thus, it will usually be necessary to take missing values into account. This is easily done, since missing values can be listed among the values you are looking for. You can thus create a variable that tells you how many values are missing for each case. (This is indeed my most frequent use of the COUNT command.)

COUNT envimiss = env1 env2 env3 env4 env5 (missing).

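The keyword SYSMIS may be used in the same way. Note that SYSMIS counts only system-missing values, whereas MISSING counts both user-missing and system-missing values.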
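One possible way of taking the number of missing values into account is to relate the number of agreements to the number of items actually answered. The following is a sketch, not the only solution; the variable name propagre is made up for this illustration.

COUNT proenvir = env1 env2 env3 env4 env5 (4,5).
COUNT envimiss = env1 env2 env3 env4 env5 (MISSING).
* Share of agreement among the items that were actually answered.
* Cases with all 5 items missing remain system-missing on propagre.
IF (envimiss < 5) propagre = proenvir / (5 - envimiss).
EXECUTE.

With this variable, the two respondents from the example above are no longer treated alike: the person who agreed with 2 of 5 answered items gets 0.4, whereas the person who agreed with 2 of 2 answered items gets 1.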