Exploratory Data Analysis

Stem-and-leaf displays

Stem-and-leaf displays are a good way of looking at the shape of your data. With Stata, this works well only if you have a small data set (say, a few hundred cases at most). With large numbers of cases you will run into trouble: Stata always displays each single case as one leaf and may therefore suppress a considerable number of individual values, indicating the number of values suppressed in parentheses. (I wouldn't be surprised if there were some user-written routine to deal with this problem; I have not yet searched for one.)

The basic command is:

stem income

To compare groups, you may use by or if.
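
As a minimal sketch of both approaches, assuming a numeric grouping variable named gender whose first category is coded 1 (the coding is an assumption, not something taken from a particular data set):

```stata
* One stem-and-leaf display per category of gender.
* bysort sorts the data by gender before running stem on each group.
bysort gender: stem income

* A display for a single group only (assumes gender == 1 is a valid code).
stem income if gender == 1
```

The by variant prints the displays one after the other, so comparing them still means reading down the output.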

Apart from the "large n" problem mentioned above, Stata usually produces a meaningful display. However, if for some reason you don't like the default outcome, you can influence the shape of the display with a few options. Consider the following example:

0* | 679
1* | 339
2* | 449
3* | 01269
4* | 44567
5* | 177789
6* | 246

This display has one line per leading digit (the digit forming the stem); put otherwise, it has a width of 10 (10 different digits may be displayed per stem). By using either

stem percfemale, lines(2)

or

stem percfemale, width(5)

you will get the following display, with two lines per stem:

0. | 679
1* | 33
1. | 9
2* | 44
2. | 9
3* | 012
3. | 69
4* | 44
4. | 567
5* | 1
5. | 77789
6* | 24
6. | 6
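
To try these options without the data used above, the following sketch uses the auto example data set that ships with Stata; the variable mpg belongs to that data set and is not part of the example above. With whole-number values such as these, lines(2) and width(5) should lead to the same display.

```stata
* Load the example data shipped with Stata.
sysuse auto, clear

* Default display.
stem mpg

* Two lines per stem, requested in two ways.
stem mpg, lines(2)
stem mpg, width(5)
```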


Box plots

Box plots (or, more formally, box-and-whisker plots) are a convenient way of displaying basic features of the distribution of a variable: they give a rough view of the bulk of the data and also display outlying values. They can be used to check differences between groups; however, subtle differences may not be detected.

graph box trust, over(gender)

will create box plots for the variable "trust", one for each gender. More than one variable can be mentioned on the left side of the comma. The "over" option may be omitted.
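
The two variations just mentioned might look as follows. The names trust_police and trust_courts are hypothetical and stand for any two numeric variables measured on a comparable scale.

```stata
* Two variables side by side, each split by gender
* (trust_police and trust_courts are hypothetical variable names).
graph box trust_police trust_courts, over(gender)

* A single box plot for all cases, with over() omitted.
graph box trust
```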

Box plots may be displayed horizontally; the command goes like this:

graph hbox trust, over(gender)
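
For a version that can be run without your own data, here is a sketch based on the auto example data set that ships with Stata; mpg and foreign are variables from that data set, not from the example above.

```stata
* Load the example data shipped with Stata.
sysuse auto, clear

* Vertical and horizontal box plots of mileage, by origin of the car.
graph box mpg, over(foreign)
graph hbox mpg, over(foreign)
```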

See the section on bivariate graphs for more detail.

For options that apply to the display of (nearly all) graphs, see the section on chart options.

© W. Ludwig-Mayerhofer, Stata Guide | Last update: 17 Sep 2010