Graphing Univariate Distributions

This entry explains some fairly standard graphs. Note that more univariate charts can be found in the following entries. Bar charts that describe (absolute or relative) frequencies are described in the entry about frequency tables.

The data correspond roughly to the Medicaid program quality index data that can be found in Jacoby, William G. (1997). Statistical Graphics for Univariate and Bivariate Data (Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-117). Thousand Oaks: Sage, pp. 37-8.

Stem and leaf diagram

13 | 3
14 | 16
15 | 89
16 | 0679
17 | 123677
18 | 0134
19 | 01222566
20 | 1279
21 | 3799
22 | 022489
23 | 5
24 | 57
25 | 3
26 | 014

Unsurprisingly, the command is stem, as in

stem(mydata$quality)

Stem and leaf displays, as implemented in R, are not meant for huge data sets. In other words, there is no procedure to combine several values into a single leaf. A few options give you some control over the display of the graph, though.

stem(mydata$quality, width=120, scale = 2)

will increase the standard width of 80 characters per line to 120 and, in addition, try to spread out the graph by producing more stems. (The latter option was used in the display shown above.)
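To try these commands without the original file, the values can be read back off the display. This is only a sketch: it assumes that each leaf is the units digit (so that "13 | 3" stands for 133), and since stem rounds what it prints, the numbers may differ slightly from the original data. The data frame mydata is rebuilt here purely for illustration.

```r
# Values read off the stem-and-leaf display, ASSUMING each leaf is the
# units digit ("13 | 3" = 133). The original data may have been rounded.
quality <- c(
  133,
  141, 146,
  158, 159,
  160, 166, 167, 169,
  171, 172, 173, 176, 177, 177,
  180, 181, 183, 184,
  190, 191, 192, 192, 192, 195, 196, 196,
  201, 202, 207, 209,
  213, 217, 219, 219,
  220, 222, 222, 224, 228, 229,
  235,
  245, 247,
  253,
  260, 261, 264
)
mydata <- data.frame(quality = quality)

length(mydata$quality)           # 48 values in total

stem(mydata$quality)             # default number of stems
stem(mydata$quality, scale = 2)  # ask for roughly twice as many stems
```

Comparing the two calls shows what scale does to the number of stems.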

Strip chart

Strip charts are likewise helpful, particularly with small data sets. The basic command is, of course:

stripchart(mydata$quality)

It will typically help to "jitter" the data values, i.e. to add a small random component. Also, by default the strip chart is horizontal; to obtain the vertical chart that you can see here, another option is needed. So, all in all it goes like this:

stripchart(mydata$quality, method="jitter", vertical=TRUE)

Some more options are available which you can learn about via the help system.
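As a rough illustration of a few of these options, the sketch below uses simulated values as a stand-in for the quality index (the numbers are made up and serve only to make the example run). It shows method="stack" as an alternative to jittering, the jitter argument that sets the amount of random noise, and pch for the plotting symbol.

```r
# Simulated stand-in data; NOT the original quality index values.
set.seed(1)
mydata <- data.frame(quality = round(rnorm(48, mean = 195, sd = 30)))

# Stack identical values on top of each other instead of jittering them
stripchart(mydata$quality, method = "stack", pch = 16, offset = 0.5,
           xlab = "Quality index")

# Vertical chart with a larger amount of jitter and open circles
stripchart(mydata$quality, method = "jitter", jitter = 0.2,
           vertical = TRUE, pch = 1, ylab = "Quality index")
```

Stacking only has a visible effect when values are tied, which is why the simulated values are rounded here.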

Histogram

A histogram can easily adapt to large samples and can be obtained as follows:

hist(mydata$quality)

Various options help to control the display. Perhaps the most important among these determines whether the graph shows absolute frequencies (counts) or densities, i.e. relative frequencies scaled so that the areas of the bars sum to one:

hist(mydata$quality, freq=FALSE)

Note that usually a histogram is somewhat more spread out in the horizontal direction. I made this one a bit tighter (just by narrowing the graph window!) in order to provide more space for the text here.
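What the density scale means can be checked numerically. The sketch below again uses simulated stand-in values (not the original data) and asks hist to return its computations without drawing anything; the breaks argument, which suggests the number of bins, is included as one more commonly used option.

```r
# Simulated stand-in data; NOT the original quality index values.
set.seed(1)
mydata <- data.frame(quality = round(rnorm(48, mean = 195, sd = 30)))

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(mydata$quality, breaks = 10, plot = FALSE)

# Area of each bar on the density scale: height times bin width
bar_areas <- h$density * diff(h$breaks)
sum(bar_areas)                    # the areas add up to one

# Each bar area equals the relative frequency of its bin
rel_freq <- h$counts / sum(h$counts)
all.equal(bar_areas, rel_freq)
```

With bins of equal width, the density histogram therefore has the same shape as a histogram of relative frequencies; only the labelling of the vertical axis differs.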

Kernel density estimation

The minimum version is:

plot(density(mydata$quality))

The default kernel is Gaussian, and the bandwidth is computed automatically. Both can be controlled by the user. The command

plot(density(mydata$quality, bw=5, kernel="t"))

will set the bandwidth to 5 and make sure that a triangular kernel is used. An alternative method to control the bandwidth is to use adjust. Here you will indicate a factor by which the default bandwidth will be multiplied, as in

plot(density(mydata$quality, adjust=1.3, kernel="t"))

The available kernels can all be abbreviated, down to just the initial letter, as in the example above. The kernels, apart from the default (gaussian), are as follows:

epanechnikov
rectangular
triangular
biweight
cosine
optcosine
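To see how much the choice of kernel matters, several estimates can be overlaid in one plot. This is a minimal sketch with simulated stand-in values (not the original data); the line types and legend position are arbitrary choices.

```r
# Simulated stand-in data; NOT the original quality index values.
set.seed(1)
mydata <- data.frame(quality = round(rnorm(48, mean = 195, sd = 30)))

# Start with the default Gaussian kernel, then add two others
plot(density(mydata$quality), main = "Kernel comparison")
lines(density(mydata$quality, kernel = "epanechnikov"), lty = 2)
lines(density(mydata$quality, kernel = "rectangular"), lty = 3)
legend("topright", lty = 1:3,
       legend = c("gaussian", "epanechnikov", "rectangular"))

# An abbreviated kernel name gives the same estimate as the full name
d_short <- density(mydata$quality, kernel = "t")
d_full  <- density(mydata$quality, kernel = "triangular")
identical(d_short$y, d_full$y)
```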

You will have noted that density is preceded by plot in the example commands shown above. By itself, density will just compute the density estimates. If the result is assigned to an object, that object will be of class density, which is a list whose structure you can inspect with str().
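A short sketch of working with such an object, once more with simulated stand-in values rather than the original data. It also checks that adjust really multiplies the automatically chosen bandwidth.

```r
# Simulated stand-in data; NOT the original quality index values.
set.seed(1)
mydata <- data.frame(quality = round(rnorm(48, mean = 195, sd = 30)))

d <- density(mydata$quality)     # compute only, nothing is drawn
class(d)                         # "density"
str(d)                           # list with x, y, bw, n, ...

# x holds the grid points, y the estimated density at each point
d$x[which.max(d$y)]              # location of the highest peak

# adjust multiplies the default bandwidth
d_adj <- density(mydata$quality, adjust = 1.3)
all.equal(d_adj$bw, 1.3 * d$bw)

plot(d_adj)                      # plotting remains a separate step
```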

© W. Ludwig-Mayerhofer, R Guide | Last update: 30 Jan 2017