Twoway (Bivariate) Charts
This section introduces some elementary ways of displaying bivariate relationships. Many graph commands in this category start with
twoway, but not all of them: commands for graphs that can also be used for univariate display (such as box plots) do not, and for some others (such as scatter plots)
twoway may be omitted.
Note the other entries in this section, which contain important information about graph options. The final two entries are devoted to more complex graphs, where several elements are overlaid or several graphs are combined in a single plot.
Box plots (box-and-whisker plots)
Box plots were already described in the "univariate charts" entry, but actually they are mainly used to compare distributions of two or more groups, as in:
graph box income, over(status)
graph box income, over(status) scheme(s1mono) intensity(0) aspectratio(2) ///
    outergap(*3) medtype(cline) medline(lwidth(medium) lcolor(gs0)) ///
    title("Income by social groups in 2002") ytitle("Income") ///
    note("Source: XY data, own calculations")
uses a number of options to create a look that I like.
Scatterplots
scatter income tenure
This will plot income (y axis) against tenure (x axis).
There are many options specific to scatterplots. For instance,
scatter trustcourt trustpolit, ms(d)
will use small "diamonds" instead of circles to display the data. A list of symbols available can be found via
help symbolstyle. There are also fifteen predefined marker styles that define combinations of colour, symbol and so on; see
help markerstyle. You can combine global marker styles with more specific styles. For instance, marker style "p4" displays symbols in a colour that looks beige to me, and it uses circles for symbols. If you want to use all the settings of style p4, but with diamonds instead of circles, you may write
scatter trustcourt trustpolit, mstyle(p4) ms(d)
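If you want to try this without the data used above, here is a minimal sketch based on the auto data that ships with Stata. The dataset and the variables mpg and weight are my substitution for illustration, and the colour and size values are arbitrary choices.

```stata
* Sketch only: the auto data stand in for the guide's own variables
sysuse auto, clear

* Start from marker style p4, then override symbol, size and colour
scatter mpg weight, mstyle(p4) msymbol(D) msize(small) mcolor(navy)
```

Options given after mstyle() override the corresponding settings of the style, so only the symbol, size and colour change here.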
To fit a regression line, you have to use the extended version of the command which starts with
graph twoway. It goes like this:
graph twoway (lfit income tenure) (scatter income tenure)
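The same kind of overlay can also be written with the || separator instead of parentheses. The sketch below again uses the shipped auto data (my substitution, not the data from the example above) and adds lfitci, which draws a confidence band around the fitted line.

```stata
* Sketch only: auto data and variable names are stand-ins
sysuse auto, clear

* Scatterplot with fitted regression line, using || to separate the plots
twoway scatter mpg weight || lfit mpg weight

* lfitci adds a confidence band; it is drawn first so the points stay visible
twoway (lfitci mpg weight) (scatter mpg weight)
```

Plots are drawn in the order in which they are listed, so whatever comes last ends up on top.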
A jittered plot adds some random noise around each data point, with the value in parentheses referring to the size of the noise as a percentage of the graphical area (some trial and error may be required here):
scatter trustcourt trustpolit, jitter(10)
scatter does not automatically adapt the plot axes to the wider range that results from jittering; it uses the original range of the variables. You will therefore have to extend the axes with the help of the
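One way to widen the axes is the range() suboption of xscale() and yscale(). I am assuming here that axis scale options of this kind are what is needed. The sketch uses two discrete variables from the shipped auto data, and the range values are simply picked to leave half a unit of room on each side.

```stata
* Sketch only: auto data are stand-ins; range values are illustrative
sysuse auto, clear

* rep78 (1 to 5) and foreign (0/1) are discrete, so points coincide without jitter
* jitterseed() makes the random noise reproducible from run to run
scatter rep78 foreign, jitter(10) jitterseed(12345) ///
    xscale(range(-0.5 1.5)) yscale(range(0.5 5.5))
```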
Another way to deal with data points that would overlap in a simple scatterplot is a sunflower plot. Not surprisingly, it works like this:
sunflower trustcourt trustpolit
Note that Stata uses combinations of colours and petals to signal the density in a given area. This, and many other things, can again be adjusted with various options.
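A few of these options are sketched below, again with the shipped auto data as a stand-in. binwidth() sets the width of the bins in units of the x variable, while light() and dark() set the minimum number of observations needed for a light and a dark sunflower. The particular values are arbitrary and will need adjusting for other data.

```stata
* Sketch only: auto data are stand-ins; thresholds are illustrative
sysuse auto, clear

* Bins 500 pounds wide; 2 or more observations give a light sunflower,
* 5 or more a dark one
sunflower mpg weight, binwidth(500) light(2) dark(5)
```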
A matrix of scatterplots
graph matrix income tenure educ prestige
will create a matrix in which the four variables mentioned are each plotted against all the others, with each variable appearing once on the x axis and once on the y axis. Options are available, e.g. to produce only the lower triangle of the matrix, to "jitter" the variables (add perturbations), and other stuff.
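The two options just mentioned might be used as follows. The variables come from the shipped auto data (my substitution), and the jitter value and marker symbol are arbitrary.

```stata
* Sketch only: auto data are stand-ins for income, tenure etc.
sysuse auto, clear

* half draws only the lower triangle; jitter() adds random noise;
* msymbol(p) uses small points, which helps in the small panels
graph matrix price mpg weight length, half jitter(2) msymbol(p)
```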
Twoway graphs for discrete (categorical) data
Catplot and spineplot
Percentages of a variable conditional on the values of a second (or even a third) variable may be displayed with the help of
catplot (an ado file; see the entry on univariate charts). An alternative is
spineplot, which I will perhaps explain at a later stage. A twoway graph with catplot works like this:
catplot note sex, percent(sex) recast(bar)
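Because catplot is not part of official Stata, it has to be installed first. A minimal sketch, assuming you install it from SSC and try it on the shipped auto data (the variables rep78 and foreign are stand-ins for note and sex):

```stata
* catplot is community-contributed; install it once from SSC
ssc install catplot

* Sketch only: auto data are stand-ins
sysuse auto, clear

* Percentages of rep78 within each category of foreign, as vertical bars
catplot rep78 foreign, percent(foreign) recast(bar)
```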
To produce a stacked bar chart, you have to add the options
Particularly if percentages are conditioned on more than one variable, the labels may be too large (in relation to their number) and will overlap. In this case, try the following option:
var2opts(label(labsize(small))) (or even
Twoway histogram for discrete (categorical) data
This is something very specific, and I do not recommend it. But if you wish to try it, here we go.
histogram cww151b, discrete by(cww151a, total)
In this example, the distribution of cww151b is graphed for each category of cww151a. If the distribution of the 'by' variable (in this example, cww151a) is very uneven, you may wish to create a graph that displays the absolute frequencies and thus allows you to judge the contribution of each category to the total:
twoway histogram cww151b, discrete freq by(cww151a, total)
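If you would rather see percentages than densities or counts, histogram also accepts the percent option. A small sketch with the shipped auto data (rep78 and foreign are stand-ins for cww151b and cww151a):

```stata
* Sketch only: auto data are stand-ins
sysuse auto, clear

* Percentages within each panel; total adds a panel for all observations
histogram rep78, discrete percent by(foreign, total)
```

Note that the percentages are computed within each panel, so this version shows the shape of each distribution but not the size of each group.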
© W. Ludwig-Mayerhofer, Stata Guide | Last update: 28 Apr 2017