Quantile Plots, Plots of the Cumulative Distribution Function
These plots show the ordered values of a variable against a reference (another variable, a distribution, or something else), or alternatively the cumulative distribution function (c.d.f.) of a variable. The two kinds are closely related: a quantile plot is essentially a c.d.f. plot with the axes swapped. They can help with assessing properties of a distribution, and variants have been developed for quite specific properties, such as symmetry (not shown here). A classical text is Wilk and Gnanadesikan (1968). I found a presentation by Nicholas J. Cox particularly helpful; he is generally a great source for (and of) graphs in Stata. See also section 5 of Cox (2004).
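The remark that quantile plots and c.d.f. plots are closely related can be made precise. The notation below (x_(i), p_i, F_n) is introduced only for this illustration. (i − 0.5)/n is one common choice of plotting position among several:

```latex
\begin{aligned}
&\text{Ordered values:}   && x_{(1)} \le x_{(2)} \le \dots \le x_{(n)} \\
&\text{Quantile plot:}    && \text{points } \bigl(p_i,\ x_{(i)}\bigr), \quad p_i = \frac{i - 0.5}{n} \\
&\text{Empirical c.d.f.:} && F_n(x) = \frac{1}{n}\,\#\{\, j : x_j \le x \,\}, \quad
                             F_n\bigl(x_{(i)}\bigr) = \frac{i}{n} \ \text{(if there are no ties)} \\
&\text{c.d.f. plot:}      && \text{points } \bigl(x_{(i)},\ F_n(x_{(i)})\bigr)
\end{aligned}
```

The two plots therefore differ in only two ways. The axes are swapped, and the plotted fractions are offset by 0.5/n, which becomes negligible in larger samples.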
Stata's quantile plot commands can be found under the heading "diagnostic plots" (typing help diagnostic_plots will show the way). Several ado files offer alternative routes to established goals or additional solutions.
All graphs on this page were created with the options ylabel(,angle(0)), scheme(s1mono) and plotregion(lstyle(none)). These are not repeated in the examples shown below.
Stata's diagnostic plots
The following presents only two very common charts from this collection; don't forget to look up the others (help diagnostic_plots).
Quantile plots
Here, the ordered values of the variable under investigation are shown, plus a reference line that corresponds to a uniformly distributed variable.
The following plot was created by
quantile equivinc
It can be seen that, for instance, the first quartile is in the region of 1,500, and the median somewhere near 2,300. It follows that the distribution is strongly skewed. About one half of the data lies between 0 (actually the lowest value is a bit higher) and 2,300, whereas the other half lies between 2,300 and over 6,000 (or 8,000 if we include the outlier). In other words, the upper half is spread over a considerably wider range than the lower half. It can also be seen that the data are far from uniformly distributed. (Actually, few data are.) Note that a (standardized) normal probability plot, created by pnorm equivinc, compares the empirical distribution against a normal distribution.
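The sketch below rebuilds a quantile plot by hand so you can see what quantile does. It uses Stata's built-in auto data rather than the income data above. It assumes the plotting position (i − 0.5)/N, which is, as far as I know, what quantile uses. The variable name fraction is made up for this example.

```stata
* Sketch: a quantile plot built by hand, using Stata's example data
sysuse auto, clear

* Built-in version: ordered values of price against the fraction of the data
quantile price

* By hand: sort, then give the i-th ordered value the plotting position (i - 0.5)/N
sort price
generate double fraction = (_n - 0.5) / _N
scatter price fraction, ytitle("Price") xtitle("Fraction of the data")

* The quartiles read off the graph can be checked numerically
summarize price, detail
display "Q1 = " r(p25) "   median = " r(p50) "   Q3 = " r(p75)

* Comparison against a normal instead of a uniform distribution
pnorm price
qnorm price
```

pnorm plots cumulative probabilities. qnorm instead plots the quantiles of the variable against those of a normal distribution, so it is the normal-reference counterpart of quantile.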
Quantile-quantile plots
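These plots graph quantiles of one variable against those of another variable. If the two variables have the same distribution, their quantiles coincide and the points lie on the diagonal y = x (this reference line is shown in the graph). If the distributions differ only in location and/or scale, the points still form a straight line, but one that departs from the diagonal.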
The following plot comparing (net) incomes of male and female full-time workers (in Deutsche Mark; data from the year 2000) was created by
qqplot incf incm
(plus some options). We see that incomes of male workers are considerably higher than those of female workers. For example, the quantile at which men earn 6,000 DM corresponds to an income of only 4,000 DM for women. (Note that while there is a gender pay gap in Germany, it is typically even larger in net incomes because of the specifics of German tax law.)
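qqplot expects the two groups as two separate variables. When group membership is coded in a variable of its own, as is typical, separate can split the data first. The sketch below uses the auto data (prices of domestic vs. foreign cars), not the income data discussed above.

```stata
* Sketch: quantile-quantile plot for two groups stored in one variable
sysuse auto, clear

* separate creates price0 (domestic cars) and price1 (foreign cars);
* each new variable is missing for observations of the other group
separate price, by(foreign)

* Quantiles of price1 (vertical axis) against quantiles of price0 (horizontal axis);
* the groups may differ in size (here 22 vs. 52 cars)
qqplot price1 price0
```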
Nicholas J. Cox' qplot
This is an alternative to quantile, without the latter's reference line. To install it, type search qplot and check which of the links presented offers the latest update. The basic command, of course, is simply qplot plus a variable name. The command has several additional features, which are easily accessible via help qplot once the ado file is installed.
Using cdfplot and distplot
The commands discussed thus far, quantile and qplot, show the values of the variable under investigation on the vertical axis. Cumulative distribution functions, to which we now turn, are usually displayed the other way round: the values on the horizontal axis and the associated cumulative probabilities on the vertical axis. Such plots can be obtained with cdfplot (written by Adrian Mander) or distplot, again a result of Nicholas J. Cox' seemingly inexhaustible productivity.
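To see what these commands compute, you can also obtain the empirical c.d.f. with Stata's own cumul and plot it with twoway line. The sketch below uses the auto data, and the variable name F_mpg is arbitrary. It is not meant to replace cdfplot or distplot, which offer many more options.

```stata
* Sketch: empirical c.d.f. with built-in commands only
sysuse auto, clear

* cumul stores, for each observation, the fraction of observations with mpg <= its value;
* equal gives tied values the same cumulative fraction
cumul mpg, generate(F_mpg) equal

* Values on the horizontal axis, cumulative fractions on the vertical axis;
* connect(stairstep) draws each horizontal segment first, then the vertical jump
line F_mpg mpg, sort connect(stairstep) ytitle("Cumulative fraction")
```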
cdfplot
The command cdfplot is best suited to categorical (ordered) variables, but in principle you may use it for any type of variable. It uses a step function to connect the values of the c.d.f. After installing the ado file with
ssc install cdfplot
you may obtain a c.d.f. plot:
cdfplot education
You may compare groups, as in
cdfplot education, by(sex)
Further possibilities (see help cdfplot): you may include, for comparison, the c.d.f. of a normally distributed variable, and there are some further options.
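The same two commands can be tried on data that ship with Stata. The sketch below uses the auto data, where rep78 (repair record, coded 1 to 5) serves as the ordered variable and foreign as the grouping variable. Both choices are mine, not the author's.

```stata
* Sketch: cdfplot with Stata's example data
sysuse auto, clear

* Install (or update) the user-written command from SSC
ssc install cdfplot, replace

* c.d.f. of the repair record, drawn as a step function
cdfplot rep78

* Separate c.d.f.s for domestic and foreign cars
cdfplot rep78, by(foreign)
```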
distplot
This is another package that is not part of Stata's standard distribution. Type findit distplot, look for the most recent entry on "Software update for distplot" and follow the link provided. Note that the syntax of distplot has changed over time; what I describe here works (as of 2025) with Stata version 16. The simplest version of the command is as follows:
distplot income
With discrete (ordered) variables, you may wish to display the distribution as a step function, which can be achieved as follows (the resulting graph will look very much like the one created by cdfplot, shown above):
distplot education, connect(stairstep)
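The option to change the style of the connecting line is a specific feature of distplot. stepstair is a further style available, but it is not recommended for c.d.f.s. It draws the vertical jump before the horizontal segment, so each jump appears at the preceding value instead of at the value where the cumulative probability actually increases.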
By default, the c.d.f. is computed and displayed in terms of fractions. To obtain percentages, write:
distplot education, connect(stairstep) trscale(100 * @)
trscale obviously stands for "transform scale". The "at" sign stands for the original values, i.e., the fractions; other transformations may be used.
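The commands above can be tried on Stata's example data. The sketch below uses the auto data, with mpg as a continuous variable and rep78 as an ordered one. It assumes distplot has already been installed as described above and uses only the options shown in the text.

```stata
* Sketch: distplot with Stata's example data (distplot must be installed)
sysuse auto, clear

* Continuous variable: c.d.f. of mileage
distplot mpg

* Ordered variable (repair record, 1-5): c.d.f. as a step function
distplot rep78, connect(stairstep)

* The same graph with the vertical axis in percent instead of fractions
distplot rep78, connect(stairstep) trscale(100 * @)
```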
distplot may also be used to compare the c.d.f.s of two or more groups as follows (note that stairstep .. is a shortcut for writing out stairstep three times, once for each line):
distplot equivinc, over(bildgr) connect(stairstep ..) xt(" " "Education")
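Here is a group comparison on data that ship with Stata. The sketch below uses the auto data with two groups (domestic and foreign cars), so stairstep .. expands to two repetitions. It assumes distplot is installed. For comparison, the second part draws a rough equivalent with built-in commands; the variable names F_dom and F_for are made up for this example.

```stata
* Sketch: c.d.f.s of mileage for domestic (foreign == 0) and foreign (foreign == 1) cars
sysuse auto, clear

* With distplot (user-written; must be installed)
distplot mpg, over(foreign) connect(stairstep ..)

* Rough equivalent with built-in commands: one empirical c.d.f. per group,
* each missing outside its own group, overlaid as two step functions
cumul mpg if foreign == 0, generate(F_dom) equal
cumul mpg if foreign == 1, generate(F_for) equal
line F_dom F_for mpg, sort connect(stairstep stairstep) legend(order(1 "Domestic" 2 "Foreign")) ytitle("Cumulative fraction")
```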
distplot has a few more features that may be worth investigating; for more information, see help distplot.
References
- Cox, Nicholas J. (2004): Speaking Stata: Graphing distributions. The Stata Journal 4 (1), pp. 66–88.
- Wilk, M. B./R. Gnanadesikan (1968): Probability plotting methods for the analysis of data. Biometrika 55 (1), pp. 1–17.
© W. Ludwig-Mayerhofer, Stata Guide | Last update: 20 Apr 2025