AGGREGATE
Aggregating your data can be a powerful tool. It means that new data are computed from an existing data set, with each case in the new data set representing a group of cases in the original data. For instance, you may wish to have in your data, for each case, a variable measuring how much this person's income differs from the mean income of all persons in the same occupational group. You can ask SPSS to compute these mean incomes by aggregating all cases in each group, save the means to a new data set, and then match them back to your original data. The deviation of each individual income from its group mean can then easily be computed.
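The logic of this procedure can be sketched outside SPSS. The following is a minimal plain-Python illustration (not SPSS syntax) using made-up records; the variable name incdev is hypothetical. It computes one mean per occupational group, attaches that mean to every case in the group (the "matching" step), and then computes each person's deviation from it.

```python
# Plain-Python sketch of the three steps: (1) aggregate income by
# occupational group, (2) match the group means back to every case,
# (3) compute each person's deviation from the group mean.
# The records and the name "incdev" are hypothetical illustrations.
from collections import defaultdict
from statistics import mean

cases = [
    {"id": 1, "occgroup": 1, "income": 1200.0},
    {"id": 2, "occgroup": 1, "income": 1800.0},
    {"id": 3, "occgroup": 2, "income": 2500.0},
    {"id": 4, "occgroup": 2, "income": 3500.0},
    {"id": 5, "occgroup": 2, "income": 3000.0},
]

# Step 1: one aggregated value per group (the role of AGGREGATE /BREAK).
incomes_by_group = defaultdict(list)
for case in cases:
    incomes_by_group[case["occgroup"]].append(case["income"])
meaninc = {group: mean(values) for group, values in incomes_by_group.items()}

# Steps 2 and 3: attach the group mean to each case, then the deviation.
for case in cases:
    case["meaninc"] = meaninc[case["occgroup"]]
    case["incdev"] = case["income"] - case["meaninc"]

for case in cases:
    print(case["id"], case["occgroup"], case["meaninc"], case["incdev"])
```

In SPSS itself, steps 2 and 3 are done with separate commands after AGGREGATE; the sketch only shows what the finished data should contain.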
Example:
AGGREGATE
  / OUTFILE = 'C:\subdir\meaninc.sav'
  / BREAK = occgroup
  / meaninc = MEAN(income).
In the example above, the first line after the AGGREGATE command requests that the aggregated data be saved in the file "meaninc.sav" in the directory "C:\subdir". The BREAK line tells SPSS to group cases by the variable "occgroup": all cases that have the same value on that variable form one group. The next line tells SPSS to create a new variable named "meaninc" containing the mean of the variable "income" in each group. (The new variable may actually have the same name as the original one; your choice depends on the purpose of your analysis.) A line of this kind can be repeated several times to form additional new variables, each of which must have a different name. For instance, you may request the smallest or largest value of a variable in each group, or the percentage of cases in each group whose values fall within a given range (more possibilities are listed below).
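To make the structure of the aggregated file concrete, here is a minimal plain-Python sketch (not SPSS syntax) of what a request with several summary lines produces: one record per value of occgroup, holding one new variable per summary line. The names mininc and maxinc, and the data, are hypothetical.

```python
# Hypothetical sketch of several summary variables per group, analogous to
#   AGGREGATE /OUTFILE=* /BREAK=occgroup
#     /meaninc=MEAN(income) /mininc=MIN(income) /maxinc=MAX(income).
# The names mininc and maxinc and the data are illustrative only.
from collections import defaultdict
from statistics import mean

cases = [  # (occgroup, income) pairs, made-up data
    (1, 1200.0), (1, 1800.0),
    (2, 2500.0), (2, 3500.0), (2, 3000.0),
]

groups = defaultdict(list)
for occgroup, income in cases:
    groups[occgroup].append(income)

# One output record per distinct occgroup value (sorted here by group value).
aggregated = [
    {
        "occgroup": occgroup,
        "meaninc": mean(incomes),
        "mininc": min(incomes),
        "maxinc": max(incomes),
    }
    for occgroup, incomes in sorted(groups.items())
]

for record in aggregated:
    print(record)
```

Five input cases collapse into two output records, one per group; every summary line adds one column to those records.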
In the OUTFILE line, you may specify an asterisk (*) instead of a file name. In that case, the aggregated data are not saved to a file; instead, they replace your working file. You will often want to use this option to check whether the aggregation really produced what you intended. Because the working file is replaced, be sure to save it before running AGGREGATE if you have modified it and want to have access to the modified version later.
In the BREAK line, more than one grouping variable may be named (the upper limit is, I think, 10, but this is far more than you are likely to ever need). In that case, all cases with the same value on the first variable are grouped first; each of these groups is then split further by the values of the second variable, and so on. In effect, each distinct combination of values of the break variables forms one group.
You can also request that the number of cases in each group be saved as a variable in the aggregated data set. For instance, to name that variable "ningroup", add a line (beginning with a slash) containing ningroup=N.
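Grouping by two break variables and counting the cases per group can be sketched in plain Python (not SPSS syntax) as follows. The second break variable sex and the data are hypothetical; assuming no case weighting, ningroup is simply the number of cases in each group.

```python
# Hypothetical sketch of two break variables plus a case count, analogous to
#   AGGREGATE /OUTFILE=* /BREAK=occgroup sex /meaninc=MEAN(income) /ningroup=N.
# "sex" is a made-up second grouping variable; no case weights are assumed.
from collections import defaultdict
from statistics import mean

cases = [
    {"occgroup": 1, "sex": 1, "income": 1200.0},
    {"occgroup": 1, "sex": 2, "income": 1800.0},
    {"occgroup": 1, "sex": 2, "income": 2000.0},
    {"occgroup": 2, "sex": 1, "income": 3000.0},
]

# Each distinct (occgroup, sex) combination forms one group.
groups = defaultdict(list)
for case in cases:
    groups[(case["occgroup"], case["sex"])].append(case["income"])

aggregated = {
    key: {"meaninc": mean(incomes), "ningroup": len(incomes)}
    for key, incomes in sorted(groups.items())
}

for (occgroup, sex), values in aggregated.items():
    print(occgroup, sex, values["meaninc"], values["ningroup"])
```

Combinations that do not occur in the data (here occgroup 2 with sex 2) simply produce no output record.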
Here are some examples of the functions that are available. I will use variable "income" throughout to explain the effects of the different functions.
Keyword | Effect (what the new variable will contain) |
---|---|
FIRST (income) | First value of income that is encountered in each group |
LAST (income) | Last value of income that is encountered in each group |
MIN (income) | Smallest value of income that is encountered in each group |
MAX (income) | Largest value of income that is encountered in each group |
SUM (income) | Sum of variable income in each group |
SD (income) | Standard deviation of income in each group |
PGT (income 1000) | Percentage of cases in each group with income greater than 1000 |
PLT (income 1000) | Percentage of cases in each group with income less than 1000 |
FGT (income 1000) | Fraction of cases in each group with income greater than 1000 (this is simply PGT divided by 100) |
FLT (income 1000) | Fraction of cases in each group with income less than 1000 |
PIN (income 1000 2000) | Percentage of cases in each group with income of at least 1000 and not more than 2000 |
POUT (income 1000 2000) | Percentage of cases in each group with income of less than 1000 or more than 2000 |
FIN (income 1000 2000) | Fraction of cases in each group with income of at least 1000 and not more than 2000 |
FOUT (income 1000 2000) | Fraction of cases in each group with income of less than 1000 or more than 2000 |
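The definitions in the table can be pinned down with a short plain-Python sketch (not SPSS syntax), applied to the incomes of a single made-up group. It assumes there are no missing values and that SD is the sample standard deviation (divisor n - 1); both are assumptions of this sketch, not statements from this page. Note how the limits behave: a value exactly equal to 1000 counts for neither PGT nor PLT, while PIN and FIN include both limits.

```python
# Hypothetical re-implementations of the AGGREGATE functions in the table,
# each applied to the incomes of ONE group. Assumptions (not from the text):
# no missing values, and SD uses the sample (n - 1) denominator.
# MIN, MAX and SUM correspond to Python's built-in min, max and sum.
from statistics import stdev


def count_if(values, predicate):
    """Number of values for which predicate(value) is true."""
    return sum(1 for v in values if predicate(v))


def first(values):
    return values[0]  # value of the first case in the group


def last(values):
    return values[-1]  # value of the last case in the group


def sd(values):
    # stdev needs at least two values; report None (missing) otherwise.
    return stdev(values) if len(values) >= 2 else None


def pgt(values, cut):
    return 100.0 * count_if(values, lambda v: v > cut) / len(values)


def plt(values, cut):
    return 100.0 * count_if(values, lambda v: v < cut) / len(values)


def fgt(values, cut):
    return count_if(values, lambda v: v > cut) / len(values)


def flt(values, cut):
    return count_if(values, lambda v: v < cut) / len(values)


def pin(values, low, high):
    # Both limits are included: low <= v <= high.
    return 100.0 * count_if(values, lambda v: low <= v <= high) / len(values)


def pout(values, low, high):
    # Both limits are excluded: v < low or v > high.
    return 100.0 * count_if(values, lambda v: v < low or v > high) / len(values)


def fin(values, low, high):
    return count_if(values, lambda v: low <= v <= high) / len(values)


def fout(values, low, high):
    return count_if(values, lambda v: v < low or v > high) / len(values)


income = [800.0, 1000.0, 1500.0, 2000.0, 2500.0]  # one made-up group
print(first(income), last(income), min(income), max(income), sum(income))
print(pgt(income, 1000), plt(income, 1000), fgt(income, 1000), flt(income, 1000))
print(pin(income, 1000, 2000), pout(income, 1000, 2000))
print(fin(income, 1000, 2000), fout(income, 1000, 2000), sd(income))
```

In this group, PGT and PLT for 1000 add up to only 80 percent, because the case with an income of exactly 1000 falls into neither category.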
© W. Ludwig-Mayerhofer, IGSW | Last update: 02 May 1998