Scatterplots

Scatterplots are obtained by plotting two numeric vectors, which may (but need not) be two variables in a data.frame. Remember that plot is a generic function (a 'method'); it may behave very differently depending on the class of the objects it is given. In particular, if one of the variables involved is defined as a factor, plot will create a different kind of plot, depending on whether the factor is the first or the second argument. The command

plot(mydata$age, mydata$income)

will plot variable 'age' (from data.frame 'mydata') on the x axis, and the pertinent values of income on the y axis.
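
For a self-contained illustration, the following sketch builds a small simulated data.frame and shows how plot reacts to numeric vectors and to a factor. The variable 'sex', the sample size and the way the values are generated are assumptions made for this example only.

```r
# Simulated example data; all names and values are made up for illustration
set.seed(1)
mydata <- data.frame(
  age = sample(18:65, 200, replace = TRUE),
  sex = factor(sample(c("female", "male"), 200, replace = TRUE))
)
mydata$income <- 1000 + 40 * mydata$age + rnorm(200, sd = 500)

# Two numeric vectors: an ordinary scatterplot
plot(mydata$age, mydata$income)

# Factor as first argument: plot dispatches to plot.factor and draws boxplots
plot(mydata$sex, mydata$income)

# Factor as second argument: the factor is treated as its integer codes
plot(mydata$age, mydata$sex)
```

Running the three plot commands one after another should make the difference between the numeric and the factor case visible.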

With large datasets, scatterplots may become less useful because many data points overlap. You might wish to try one of the following (note that both packages have to be installed first if you have not yet done so):

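library(ggplot2)
library(hexbin)   # needed by ggplot2 for hexagonal binning
# alpha blending: semi-transparent points
ggplot(mydata, aes(x = age, y = income)) + geom_point(alpha = 0.3)
# hexagonal binning
ggplot(mydata, aes(x = age, y = income)) + stat_binhex()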

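The first ggplot command does 'alpha blending', which makes each point partly transparent, so that regions where points overlap appear darker. The second command produces hexagonal binning, a procedure similar to sunflower plots: the plotting area is divided into hexagons, and each hexagon is shaded according to the number of points it contains.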
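
If installing extra packages is not an option, similar effects can be sketched with base graphics alone. The example below is an illustration under assumptions: the data are simulated (income is rounded so that many points coincide exactly), and base R's sunflowerplot stands in for hexagonal binning, to which it is related but not identical.

```r
# Simulated data with many exactly overlapping points (values are made up)
set.seed(2)
n <- 5000
age <- sample(18:65, n, replace = TRUE)
income <- round((1000 + 40 * age + rnorm(n, sd = 500)) / 250) * 250

# Alpha blending: semi-transparent filled points, overlaps appear darker
plot(age, income, pch = 16, col = adjustcolor("black", alpha.f = 0.3))

# Sunflower plot: coinciding observations are drawn as 'sunflowers'
# with one petal per observation
sunflowerplot(age, income)
```

With several thousand observations the sunflowers can become crowded themselves, which is one reason to prefer binning for really large datasets.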

© W. Ludwig-Mayerhofer, R Guide | Last update: 02 Apr 2017