AGGREGATE

Aggregating your data can be a powerful tool. Aggregation computes a new data set from an existing one, such that each case in the new data set summarizes a group of cases in the original. For instance, you may wish to have in your data, for each case, a variable measuring how much this person's income differs from the mean income of all persons in the same occupational group. You can ask SPSS to compute these mean incomes by aggregating all cases in each group, save the means to a new data set, and then match them back to your original data. The deviation of each individual income from the group mean can then easily be computed.

Example:

AGGREGATE
	/ OUTFILE = 'C:\subdir\meaninc.sav'
	/ BREAK = occgroup
	/ meaninc = MEAN(income).

In the example above, the first line after the AGGREGATE command requests that the aggregated data be saved in file "meaninc.sav" in subdirectory "subdir". The BREAK line tells SPSS to group cases by variable "occgroup": all cases that have the same value in that variable form one group. The next line tells SPSS to create a new variable named "meaninc" holding the mean of variable "income" in each group (the new variable may have the same name as the variable in the original data; your choice depends on the purpose of your analysis). A line of this kind may be repeated several times to create additional new variables (which must have different names, of course); for instance, you may request the smallest or largest value of a variable in each group, or the percentage of cases in each group that have values within a given range (more possibilities are listed below).
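To complete the task described at the outset, the saved group means still have to be matched back to the original data. The following is only a sketch of one way to do this, assuming the original data are the working file; the variable name "incdev" is made up for illustration.

```spss
* Sort the original data by the grouping variable, as MATCH FILES requires.
SORT CASES BY occgroup.

* Attach the group mean to every case, using meaninc.sav as a lookup table.
MATCH FILES
	/ FILE = *
	/ TABLE = 'C:\subdir\meaninc.sav'
	/ BY occgroup.

* incdev is a hypothetical name for the deviation from the group mean.
COMPUTE incdev = income - meaninc.
EXECUTE.
```

The TABLE subcommand (rather than a second FILE subcommand) is used because each group mean has to be spread across all cases of its group.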

In the OUTFILE line, you may specify an asterisk "*" instead of a file name. In that case, the aggregated data will not be saved to a file; rather, they will replace your current working file. This option is often useful for checking whether you have really achieved what you wanted. But if you have modified your working file and wish to have access to the modified version later, be sure to save it before executing the AGGREGATE command.
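A cautious way of using the asterisk option might look like the following sketch. The file name "original.sav" is an assumption chosen for illustration; LIST simply prints the aggregated cases so that you can inspect them.

```spss
* Save the current working file first (the file name is hypothetical).
SAVE OUTFILE = 'C:\subdir\original.sav'.

* Replace the working file with the aggregated data.
AGGREGATE
	/ OUTFILE = *
	/ BREAK = occgroup
	/ meaninc = MEAN(income).

* Inspect the result, which has one case per occupational group.
LIST.
```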

In the BREAK line, more than one grouping variable may be named (there is an upper limit, 10 I think, but this is much more than you'll probably ever need). In this case, all cases with the same value in the first variable are grouped first. Then these groups are "split", as it were, into new groups by the second variable, and so on, so that each distinct combination of values forms one group. You can also request that the number of cases in each group be saved as a variable in the aggregated data set; for instance, to name that variable "ningroup", add a new line (beginning with a slash) with the specification ningroup=N.
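As an illustration, the following sketch groups by two variables and stores several summaries, including the group size. The second grouping variable "gender" and the names "mininc" and "maxinc" are assumptions made up for this example.

```spss
* One aggregated case is created per combination of occgroup and gender.
AGGREGATE
	/ OUTFILE = *
	/ BREAK = occgroup gender
	/ meaninc = MEAN(income)
	/ mininc = MIN(income)
	/ maxinc = MAX(income)
	/ ningroup = N.
```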

Here are some examples of the functions that are available. I will use variable "income" throughout to explain the effects of the different functions.

Function	Effect (value stored in the new variable)
FIRST (income)	First nonmissing value of income that is encountered in each group
LAST (income)	Last nonmissing value of income that is encountered in each group
MIN (income)	Smallest value of income that is encountered in each group
MAX (income)	Largest value of income that is encountered in each group
SUM (income)	Sum of variable income in each group
SD (income)	Standard deviation of income in each group
PGT (income 1000)	Percentage of cases in each group with income greater than 1000
PLT (income 1000)	Percentage of cases in each group with income less than 1000
FGT (income 1000)	Fraction of cases in each group with income greater than 1000 (this is simply PGT divided by 100)
FLT (income 1000)	Fraction of cases in each group with income less than 1000
PIN (income 1000 2000)	Percentage of cases in each group with income of at least 1000 and not more than 2000
POUT (income 1000 2000)	Percentage of cases in each group with income of less than 1000 or more than 2000
FIN (income 1000 2000)	Fraction of cases in each group with income of at least 1000 and not more than 2000
FOUT (income 1000 2000)	Fraction of cases in each group with income of less than 1000 or more than 2000
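
Several of these functions may be combined in one AGGREGATE command, as in the following sketch. The names of the new variables are made up for illustration, and the arguments are separated by commas here, which is how the SPSS syntax reference writes them.

```spss
* Spread and range summaries of income per occupational group.
AGGREGATE
	/ OUTFILE = *
	/ BREAK = occgroup
	/ sdinc = SD(income)
	/ pcthigh = PGT(income,1000)
	/ pctmid = PIN(income,1000,2000)
	/ fracmid = FIN(income,1000,2000).
```

In the resulting file, "pctmid" should always equal 100 times "fracmid", since the F functions report the same quantity as the P functions on a 0-to-1 scale.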