Collapsing your data means combining several cases into a single line. This is much like creating statistics for groups of cases, but collapsing creates a new data set that contains these statistics and can be put to further use. Note that the new data set replaces the data currently in memory.

By default, collapse computes the mean of one or several variables. Let's assume we have a data set of many employed people and we wish to obtain the mean income for each occupation. The simplest version of the command goes like this (since the mean is the default, "(mean)" could also be omitted):

collapse (mean) income, by(occupation)

The new data set will contain one row for each occupation, and the variable "income" will now hold the mean income of that occupation. Several other statistics are available, such as median, sd (for standard deviation), p1 to p99 (for the 1st to the 99th percentile), and others. See help collapse to find out more about these and about other options.
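
To request several statistics in one go, each result needs its own variable name, given in the form newvar=oldvar. The following is a minimal sketch; the five cases and the names mean_inc, med_inc and n are invented for illustration only. Because collapse replaces the data in memory, the sketch wraps it in preserve and restore.

```stata
* Toy data, invented for illustration only
clear
input str10 occupation income
"teacher" 3000
"teacher" 3400
"nurse"   2800
"nurse"   3000
"nurse"   3500
end

* Keep a copy of the original data, since collapse overwrites it
preserve

* One row per occupation with mean, median and number of non-missing cases
collapse (mean) mean_inc=income (median) med_inc=income (count) n=income, by(occupation)
list

* Bring back the original (uncollapsed) data
restore
```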

Note that you do not have to collapse your data if you just want to add the mean of a variable (possibly for subgroups) to your current data set. Rather, use the egen command described in the section about generate/replace.
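
As a rough sketch of the egen alternative (again with invented toy data and a hypothetical variable name mean_inc), the group mean is attached to every case while all original rows are kept:

```stata
* Toy data, invented for illustration only
clear
input str10 occupation income
"teacher" 3000
"teacher" 3400
"nurse"   2800
"nurse"   3000
"nurse"   3500
end

* Sort by occupation and compute the mean of income within each group;
* every case receives the mean of its own occupation
bysort occupation: egen mean_inc = mean(income)
list
```

In contrast to collapse, the data set still has one row per person afterwards.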


Contract creates a new dataset consisting of all combinations of the values of a number of variables, plus a new variable that represents the frequency of each combination. For instance,

contract occupation gender

will create a dataset that contains all occupation-gender combinations occurring in your original data and the frequency with which each combination occurs (stored in a variable named _freq by default). Note that by default missing values are treated as values in their own right, but this, like a number of other features, can be changed with the help of options. For further information see help contract.
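
A small sketch of two of these options, using invented toy data: freq() gives the frequency variable a name of your choice (here the hypothetical name n), and nomiss drops cases with a missing value on any of the listed variables before counting.

```stata
* Toy data, invented for illustration only; the last case has no gender
clear
input str10 occupation str6 gender
"teacher" "female"
"teacher" "female"
"teacher" "male"
"nurse"   "female"
"nurse"   ""
end

* One row per observed occupation-gender combination;
* frequencies are stored in n, cases with missing values are left out
contract occupation gender, freq(n) nomiss
list
```

Without nomiss, the nurse with missing gender would appear as a combination of its own.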

© W. Ludwig-Mayerhofer, Stata Guide | Last update: 25 Jul 2022