Sort Cases

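There are two commands for sorting cases according to the values of one or more variables. The first, sort, can sort in ascending order only, but it offers the option stable and can be restricted to a range of cases, and it is probably faster. The second, gsort, can also sort in descending order.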

sort caseid

will sort your data according to the values of the variable caseid, in ascending order. You may list more than one variable; data will then be sorted by the first variable, and cases with the same value on this variable will be sorted by the second variable, and so on for any further variables.
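
As a brief illustration of sorting by several variables, here is a sketch that uses the auto dataset shipped with Stata. The dataset and its variables (foreign, rep78, price) are not part of the example above; they are merely assumed here for demonstration.

```stata
* Load the example dataset that comes with Stata
sysuse auto, clear

* Sort by foreign first; cases with the same value of foreign are
* ordered by rep78, and remaining ties by price
sort foreign rep78 price

* Inspect the first ten cases of the sorted data
list foreign rep78 price in 1/10
```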

Options:

sort caseid, stable
sort caseid in 1/100

The first line works as follows: Assume there are several cases with the same value in caseid. Using the option stable will make Stata keep the original order of cases within the same value of caseid after sorting (that is, the first case with a given caseid in the original data will also be the first case with that caseid in the sorted data, and so on). Otherwise, the order in which cases with the same value in the sorting variable appear is subject to chance.

The second line will cause Stata to sort only the first hundred cases; all others will remain where they are.
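
Both features can be checked with a small sketch, again assuming the auto dataset shipped with Stata. The helper variable origpos is hypothetical and only records the original position of each case; the range 1/20 is used instead of 1/100 because this dataset has fewer than 100 cases.

```stata
* Stable sort: the original order is kept within ties
sysuse auto, clear
generate long origpos = _n          // original position of each case
sort foreign, stable
* Within each value of foreign, origpos must still be ascending
by foreign: assert origpos > origpos[_n-1] if _n > 1

* Sorting a range: only the first 20 cases are rearranged
sysuse auto, clear
generate long origpos = _n
sort price in 1/20
* All cases from 21 onward must still be in their original position
assert origpos == _n in 21/L
```

If either assert fails, Stata stops with an error; if the script runs through silently, the checks have passed.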


gsort + caseid - age

will sort your data by caseid in ascending order and, within each value of caseid, by age in descending order. Only gsort can sort data in descending order. Note that the plus sign may be omitted.
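
A minimal sketch of a mixed ascending/descending sort, assuming the auto dataset shipped with Stata (its variables foreign and price stand in for caseid and age):

```stata
sysuse auto, clear

* Ascending by foreign (plus sign omitted), descending by price within foreign
gsort foreign -price

* The most expensive domestic cars should now appear first
list foreign price in 1/5
```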

Options:

gsort country - region, generate(groupcreg)
gsort country - region, m

The first line will tell Stata to create a new variable "groupcreg" that numbers the groups formed by the sorted data. Assume you have sorted your data by country and, within country, by region (here in descending order). Each country-region combination will be denoted by a value of the variable "groupcreg", starting with 1.

The second line (m is short for mfirst) will place missing values first, rather than last, when a variable is sorted in descending order.
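
The two options can be tried out with the following sketch, which assumes the auto dataset shipped with Stata. Its variable rep78 contains some missing values; the variable name grp is hypothetical.

```stata
* generate(): number the groups formed by the sort variables
sysuse auto, clear
gsort foreign -rep78, generate(grp)
* Each foreign-rep78 combination has its own value of grp
tabulate grp foreign

* mfirst: placement of missing values in a descending sort
gsort -rep78
list make rep78 in 1/5              // default: missing values come last
gsort -rep78, mfirst
list make rep78 in 1/5              // with mfirst: missing values come first
```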

© W. Ludwig-Mayerhofer, Stata Guide | Last update: 05 Mar 2009