Graphing Univariate Distributions
This entry explains some fairly standard graphs. Note that more univariate charts can be found in the following entries. Bar charts that describe (absolute or relative) frequencies are described in the entry about frequency tables.
The data correspond roughly to the Medicaid program quality index data that can be found in Jacoby, William G. (1997). Statistical Graphics for Univariate and Bivariate Data (Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-117). Thousand Oaks: Sage, pp. 37-8.
Stem and leaf diagram
13 | 3
14 | 16
15 | 89
16 | 0679
17 | 123677
18 | 0134
19 | 01222566
20 | 1279
21 | 3799
22 | 022489
23 | 5
24 | 57
25 | 3
26 | 014
Unsurprisingly, the command is

    stem(mydata$quality)

Stem and leaf displays, as implemented in R, are not meant for huge data sets: there is no procedure to combine several values into a single leaf. A few options give you some control over the display, though.

    stem(mydata$quality, width=120, scale=2)

will increase the standard width of 80 characters per line to 120 and in addition try to spread out the graph by producing more stems. (The latter option was used for the display above.)
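To try these options without the original data, a minimal sketch follows. The data frame `mydata` is filled with simulated values here (an assumption made for illustration, not the Medicaid data), and `capture.output()` is used only to count the printed lines so that the effect of `scale` becomes visible.

```r
# Stand-in data: simulated values, NOT the Medicaid quality index
set.seed(42)
mydata <- data.frame(quality = round(rnorm(50, mean = 195, sd = 30)))

# Default display
stem(mydata$quality)

# Wider lines and a display that is roughly twice as long
stem(mydata$quality, width = 120, scale = 2)

# Count the printed lines of each version to compare their lengths
n_default <- length(capture.output(stem(mydata$quality)))
n_spread  <- length(capture.output(stem(mydata$quality, scale = 2)))
```

Comparing `n_default` and `n_spread` shows how many additional lines the larger `scale` value produces for a given data set.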
Strip chart
Strip charts likewise are particularly helpful with small data sets. The basic command is, of course:

    stripchart(mydata$quality)

It will typically help to "jitter" the data values, i.e. to add a small random component. Also, by default the strip chart is horizontal; to obtain a vertical chart, another option is needed. So, all in all it goes like this:

    stripchart(mydata$quality, method="jitter", vertical=TRUE)

Some more options are available, which you can learn about via the help system.
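As an illustration of some of those further options, here is a small sketch. It assumes simulated stand-in data in `mydata` (not the original data set); the chosen values for `jitter` and `pch` are arbitrary.

```r
# Stand-in data: simulated values, NOT the Medicaid quality index
set.seed(42)
mydata <- data.frame(quality = round(rnorm(50, mean = 195, sd = 30)))

# Vertical strip chart with jittered points; `jitter` sets the amount of noise
stripchart(mydata$quality, method = "jitter", jitter = 0.2,
           vertical = TRUE, pch = 1, ylab = "quality")

# Alternative for tied values: stack identical values instead of jittering
stripchart(mydata$quality, method = "stack", vertical = TRUE, pch = 1)
```

Stacking keeps every point at its exact value, which can be preferable when the data contain many ties; jittering is usually easier to read when values are spread out.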
Histogram
A histogram can easily adapt to large samples and can be obtained as follows:

    hist(mydata$quality)

Various options help control the display. Perhaps the most important among these determines whether the graph shows absolute frequencies (counts) or densities, i.e. bar heights scaled so that the total area of the bars is 1:

    hist(mydata$quality, freq=FALSE)

Note that usually a histogram is somewhat more spread out in the horizontal direction. I made this one a bit tighter (just by narrowing the graph window!) in order to provide more space for the text here.
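The difference between the density scale and relative frequencies can be checked on the object that `hist()` returns. The following sketch assumes simulated stand-in data in `mydata`; the value `breaks = 12` is an arbitrary choice for illustration.

```r
# Stand-in data: simulated values, NOT the Medicaid quality index
set.seed(42)
mydata <- data.frame(quality = round(rnorm(50, mean = 195, sd = 30)))

# Density scale: bar areas (height * width) add up to 1
h <- hist(mydata$quality, freq = FALSE)
total_area <- sum(h$density * diff(h$breaks))

# Relative frequencies per bin, computed from the counts
rel_freq <- h$counts / sum(h$counts)

# `breaks` suggests the number of bins; R may adjust it to "pretty" values
hist(mydata$quality, breaks = 12, freq = FALSE)
```

When all bins have the same width, the density and the relative frequency of a bin differ only by that constant width, so the shape of the histogram is the same on either scale.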
Kernel density estimation
The minimum version is:

    plot(density(mydata$quality))

The default kernel is Gaussian, and the bandwidth is computed automatically. Both can be controlled by the user. The command

    plot(density(mydata$quality, bw=5, kernel="t"))

will set the bandwidth to 5 and make sure that a triangular kernel is used. An alternative method to control the bandwidth is to use

    plot(density(mydata$quality, adjust=1.3, kernel="t"))

which multiplies the automatically computed bandwidth by 1.3.
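To see what the two ways of setting the bandwidth do, one can inspect the `bw` component of the returned objects. This is a sketch on simulated stand-in data (an assumption, not the original data set); the line types are arbitrary.

```r
# Stand-in data: simulated values, NOT the Medicaid quality index
set.seed(42)
mydata <- data.frame(quality = round(rnorm(50, mean = 195, sd = 30)))

d_default <- density(mydata$quality)   # Gaussian kernel, automatic bandwidth
d_fixed   <- density(mydata$quality, bw = 5, kernel = "t")        # "t" = triangular
d_adjust  <- density(mydata$quality, adjust = 1.3, kernel = "t")

# The bandwidth actually used is stored in the `bw` component
d_default$bw
d_fixed$bw
d_adjust$bw   # the automatic bandwidth multiplied by 1.3

# Overlay the three estimates for comparison
plot(d_default, main = "Kernel density estimates")
lines(d_fixed, lty = 2)
lines(d_adjust, lty = 3)
```

Using `adjust` is convenient when the automatic bandwidth is a reasonable starting point and only needs to be made somewhat smoother or rougher.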
The available kernels can all be abbreviated up to the minimum of just the initial letter, as in the example above. The kernels, apart from the default gaussian, are as follows:
epanechnikov
rectangular
triangular
biweight
cosine
optcosine
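The kernels can be compared by overlaying the estimates at a common bandwidth. The sketch below assumes simulated stand-in data and an arbitrary bandwidth of 10; it also checks that a one-letter abbreviation selects the same kernel as the full name.

```r
# Stand-in data: simulated values, NOT the Medicaid quality index
set.seed(42)
mydata <- data.frame(quality = round(rnorm(50, mean = 195, sd = 30)))

kernels <- c("gaussian", "epanechnikov", "rectangular", "triangular",
             "biweight", "cosine", "optcosine")

# Draw the Gaussian estimate, then add the others with the same bandwidth
plot(density(mydata$quality, bw = 10),
     main = "Same bandwidth, different kernels")
for (i in seq_along(kernels)[-1]) {
  lines(density(mydata$quality, bw = 10, kernel = kernels[i]), lty = i)
}

# A one-letter abbreviation selects the same kernel as the full name
d_short <- density(mydata$quality, bw = 10, kernel = "e")
d_full  <- density(mydata$quality, bw = 10, kernel = "epanechnikov")
same_estimate <- identical(d_short$y, d_full$y)
```

With the bandwidth held fixed, the curves usually differ much less between kernels than they do between different bandwidths.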
You will have noted that density is preceded by plot in the example commands shown above. By itself, density will just compute the density estimates, and if assigned to an object, the result will be of class density, which is a list the structure of which you may find out by using str().
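A short sketch of working with the result directly, again on simulated stand-in data (an assumption for illustration); the object name `d` is arbitrary.

```r
# Stand-in data: simulated values, NOT the Medicaid quality index
set.seed(42)
mydata <- data.frame(quality = round(rnorm(50, mean = 195, sd = 30)))

# Compute the estimate without plotting it
d <- density(mydata$quality)

class(d)   # class of the returned object
str(d)     # components of the list

# x holds the evaluation points, y the estimated density at each point
head(cbind(x = d$x, y = d$y))

# plot() uses the method for objects of class "density"
plot(d)
```

Storing the estimate first is useful when the same curve is needed more than once, for example to add it to a histogram with `lines(d)`.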
© W. Ludwig-Mayerhofer, R Guide | Last update: 30 Jan 2017