Quantile Plots, Plots of the Cumulative Distribution Function

These plots show the ordered values of a variable against a reference (another variable, a theoretical distribution, or some other quantity), or alternatively the cumulative distribution function (c.d.f.) of a variable. The two are closely related: a quantile plot is essentially a c.d.f. with the axes swapped. Such plots can help with assessing properties of a distribution, and variants have been developed for quite specific properties, such as symmetry (not shown here). A classical text is Wilk and Gnanadesikan (1968). I found a presentation by Nicholas J. Cox particularly helpful; he generally is a great source for (and of) graphs in Stata. See also section 5 of Cox (2004).

Stata's own quantile plot commands can be found under the heading "diagnostic plots" (typing help diagnostic_plots will show the way). Several user-written ado files offer alternative routes to the same goals, or additional features.

Please note: Most graphs in this entry have been created by using, among other options, the following: ylabel(,angle(0)), scheme(s1mono) and plotregion(lstyle(none)). These are not repeated in the examples shown below.

Stata's diagnostic plots

The following covers only two very common plots from this collection; don't forget to look up the others (help diagnostic_plots).

Quantile plots

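Here, the ordered values of the variable under investigation are plotted against the quantiles of a uniform distribution (i.e., the fraction of the data, running from 0 to 1), together with a reference line that corresponds to a uniformly distributed variable.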
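To see what such a plot does, it can be drawn by hand. The following is a minimal sketch that assumes quantile plots the i-th smallest of N nonmissing values against the plotting position (i - 0.5)/N. It uses Stata's shipped auto data (variable price) so that it runs as is; the variable p is a hypothetical name, and the reference line is omitted. With the data used here, you would replace price by equivinc.

```stata
sysuse auto, clear
* sort ascending; missing values (if any) end up at the bottom
sort price
* N = number of nonmissing values
quietly count if !missing(price)
* assumed plotting position: the i-th smallest value gets (i - 0.5)/N
generate double p = (_n - 0.5) / r(N) if !missing(price)
label variable p "Fraction of the data"
scatter price p
```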

The following plot was created by

quantile equivinc

Quantile plot

It can be seen, for instance, that the first quartile is in the region of 1,500 and the median somewhere near 2,300. It follows that the distribution is strongly skewed to the right: whereas about one half of the data lies between 0 (actually the lowest value is a bit higher) and 2,300, the other half lies between 2,300 and more than 6,000 (or 8,000 if we include the outlier). It can also be seen that the data are far from being uniformly distributed. (Actually, few data are.) Note that a "normal probability plot", created by pnorm equivinc, compares the empirical distribution with a normal distribution.
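Values read off a graph are only approximate. To obtain exact quartiles, you may use the official commands centile or tabstat. The sketch uses price from the auto data as a stand-in for equivinc. Note that centile interpolates between ordered values differently from tabstat (and summarize, detail), so the two may report slightly different values.

```stata
sysuse auto, clear
* first quartile, median and third quartile, with confidence intervals
centile price, centile(25 50 75)
* minimum, quartiles and maximum in one compact table
tabstat price, statistics(min p25 p50 p75 max)
```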

Quantile-quantile plots

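These plots graph the quantiles of one variable against those of another variable. If the two variables have the same distribution, their quantiles are equal and the points lie on the diagonal line y = x (this reference line is shown in the graph). If the distributions differ only in location or scale, the points still form a straight line, but a different one.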
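qqplot expects the two distributions in two separate variables, such as incf and incm below. If your data instead contain a single income variable and a grouping variable, the official command separate creates one variable per group. Here is a sketch with the auto data: separate splits price by foreign into price0 (domestic cars) and price1 (foreign cars). The two new variables have different numbers of nonmissing values, which qqplot is assumed to handle.

```stata
sysuse auto, clear
* one variable per group: price0 (foreign == 0) and price1 (foreign == 1)
separate price, by(foreign)
* quantiles of price1 against the quantiles of price0
qqplot price1 price0
```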

The following plot, comparing (net) incomes of male and female full-time workers (in Deutsche Mark; data are from the year 2000), was created by

qqplot incf incm

(plus some options). We see that incomes of male workers are considerably higher than those of female workers. For example, the quantile at which men earn 6,000 DM corresponds to an income of only about 4,000 DM for women. (Note that the gender pay gap in Germany is typically even larger in net incomes than in gross incomes because of the specifics of German tax law.)

Quantile-quantile plot
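To back up such readings with numbers, you may tabulate matching quantiles of both groups side by side. The sketch below again uses price0 and price1 from the auto data, created by separate, as stand-ins for incm and incf. With the data used here, tabstat incf incm with the same statistics() option would do the same.

```stata
sysuse auto, clear
separate price, by(foreign)
* the same quantiles for both groups, one column per variable
tabstat price0 price1, statistics(p10 p25 p50 p75 p90)
```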

Nicholas J. Cox' qplot

This is an alternative to quantile, without the latter's reference line. To install it, type search qplot and check which of the links presented offers the latest update. The basic command is simply qplot followed by a variable name. The command has several additional features, which are easily accessible via help qplot once the ado file is installed.
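This is a minimal sketch, assuming qplot has been installed as described and using the auto data. The over() option, which draws one set of quantiles per group, is an assumption about qplot's syntax; check help qplot for the version you install.

```stata
sysuse auto, clear
* ordered values against the fraction of the data, no reference line
qplot price
* assumed option: one set of quantiles for each value of foreign
qplot price, over(foreign)
```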

Using cdfplot and distplot

The commands discussed thus far, quantile and qplot, show the values of the variable under investigation on the vertical axis. Cumulative distribution functions, to which we now turn, are usually displayed the other way round: the values on the horizontal axis and the cumulative probabilities on the vertical axis. Such plots can be obtained with cdfplot (written by Adrian Mander) or distplot, again a result of Nicholas J. Cox' seemingly inexhaustible productivity.
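For a quick c.d.f. without installing anything, the official command cumul computes the cumulative fraction of observations at or below each value, which can then be plotted with twoway line. The following sketch uses the auto data, and F is a hypothetical variable name. The option equal gives tied values the same cumulative fraction, as a c.d.f. requires.

```stata
sysuse auto, clear
* F = fraction of observations with a price at or below the current value
cumul price, generate(F) equal
label variable F "Cumulative fraction"
* values on the horizontal axis, cumulative fractions on the vertical axis
twoway line F price, sort connect(stairstep)
```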

cdfplot

cdfplot is best suited to ordered categorical variables, but in principle you may use it with any type of variable. It uses a step function to connect the values of the c.d.f. After installing the ado file with ssc install cdfplot, you may obtain a c.d.f. plot:

cdfplot education

Cumulative distribution plot, created by cdfplot

You may compare groups, as in

cdfplot education, by(sex)

For further possibilities, see help cdfplot; for instance, you may include the c.d.f. of a normal distribution for comparison.
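Without cdfplot, a similar group comparison can be sketched with the official cumul command, which accepts the by prefix. This means the cumulative fractions are computed within each group. The sketch uses foreign in the auto data (0 = domestic, 1 = foreign) as a stand-in for sex. The /// continuation lines require running it from a do-file.

```stata
sysuse auto, clear
* cumulative fractions computed separately within each group
bysort foreign: cumul price, generate(F) equal
* one step function per group
twoway (line F price if foreign == 0, sort connect(stairstep)) ///
       (line F price if foreign == 1, sort connect(stairstep)), ///
       legend(order(1 "Domestic" 2 "Foreign"))
```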

distplot

distplot is another package that is not part of Stata's official distribution. Type findit distplot, look for the most recent entry "Software update for distplot", and follow the link provided. Note that the syntax of distplot has changed over time; what I describe here works (as of 2025) with Stata version 16. The simplest version of the command is:

distplot income

With discrete (ordered) variables, you may wish to display the distribution as a step function, which can be achieved as follows (the resulting graph looks very much like the one created by cdfplot, shown above):

distplot education, connect(stairstep)

The option to change the style of the connecting line is a specific feature of distplot (stepstair is a further style available, but it is not recommended in the context of c.d.f.s).

By default, the c.d.f. is computed and displayed in terms of fractions. To obtain percentages, write:

distplot education, connect(stairstep) trscale(100 * @)

trscale stands for "transformed scale". The "at" sign (@) is a placeholder for the original values, i.e., the fractions; other transformations may be used as well.

distplot may also be used to compare the c.d.f.s of two or more groups as follows (note that (stairstep ..) is a shortcut for writing out stairstep three times [one for each line]):

distplot equivinc, over(bildgr) connect(stairstep ..) xt(" " "Education")

Cumulative distribution plot for several groups, created by distplot
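The shortcut works because .. in a Stata style list repeats the preceding style for all remaining lines. Here is a sketch with the auto data, assuming distplot is installed. Because foreign has two groups, two styles are needed when they are written out.

```stata
sysuse auto, clear
* connect style written out, one entry per group (foreign has two groups)
distplot price, over(foreign) connect(stairstep stairstep)
* the same graph using the shortcut: .. repeats stairstep for all lines
distplot price, over(foreign) connect(stairstep ..)
```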

distplot has a few more features that could be worth investigating; for more information, see help distplot.


References

  • Cox, Nicholas J. (2004): Speaking Stata: Graphing distributions, The Stata Journal, 4 (1), pp. 66–88.
  • Wilk, M. B./R. Gnanadesikan (1968): Probability plotting methods for the analysis of data, Biometrika, 55 (1), pp. 1–17.

© W. Ludwig-Mayerhofer, Stata Guide | Last update: 20 Apr 2025