Basic Charts
Basic charts are
- Bar charts that exhibit the distribution of a categorical variable or the joint distribution of two (or perhaps three) categorical variables;
- Scatterplots that exhibit relationships between metric variables; and
- Line charts that exhibit developments over time or exhibit single data values for a number of cases.
Histograms also belong to this type of chart; they are treated in the Exploratory Data Analysis section and can also be obtained via the Frequencies command. Other simple charts such as pie charts are also available, but are often not recommended.
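As a minimal sketch of the Frequencies route, the following requests a histogram and suppresses the frequency table. The variable name var11 is only an assumption for illustration; any metric variable will do.

```spss
* Histogram via FREQUENCIES; var11 is a hypothetical metric variable.
FREQUENCIES VARIABLES=var11
  /FORMAT=NOTABLE
  /HISTOGRAM.
```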
In the following, a few examples are mentioned. A full treatment is (presently) not possible here.
Please note: SPSS has added a lot of graphing possibilities recently; regrettably, they have been changing rather fast (there was an IGRAPH command, which was then deprecated and no longer works in version 18.0; version 18.0 has a GGRAPH command instead). Most of the new stuff is not covered here, and consequently this section is not really up to date. Yet what you'll find here still works, as far as I know.
Bar Charts
Example for a simple bar chart exhibiting percentages
GRAPH
  /BAR(SIMPLE)=PCT BY var15.
Example for a bivariate (stacked) bar chart
GRAPH
  /BAR(STACK)=COUNT BY gender BY var15.
Note that this bar chart displays the counts of the values of var15 by gender. It needs some further refinement to display the association of these variables in a meaningful way. This is explained in the next entry of this guide (Special: Bar Charts).
Scatterplots
Example for a scatterplot with two variables for all cases
GRAPH
  /SCATTERPLOT(BIVAR)= var199 WITH var200.
Note that the first variable is displayed on the x axis, and the second one on the y axis.
Example for a scatterplot with two variables, grouped by a grouping variable
GRAPH
  /SCATTERPLOT(BIVAR)= var199 WITH var200 BY gender.
The two groups will be distinguished by different colours. These, as well as the symbols used, may be changed interactively.
Labeling your data
GRAPH
  /SCATTERPLOT(BIVAR)= var199 WITH var200 BY id (NAME).
It is assumed that variable "id" contains labels for the data, for instance, country abbreviations like USA, D etc. These labels will be displayed near the symbols that represent the data values.
A matrix of scatterplots (version 17 or higher).
This is one of the things that work with newer versions of SPSS only; the example uses a feature that was introduced in version 17. I am including it because older SPSS commands cannot produce this kind of graph, i.e. a number of scatterplots in one set-up. Note that the new graphics language is much more powerful than I can describe here.
I have reduced the necessary syntax as much as I could. Some explanations follow below.
GGRAPH
  /GRAPHDATASET NAME="anyname-you-like" VARIABLES = var17 var20 var25 var40
  /GRAPHSPEC SOURCE=VIZTEMPLATE(NAME="Scatterplot Matrix (SPLOM)"[LOCATION=LOCAL]
    MAPPING( "all"="var17" "all"="var20" "all"="var25" "all"="var40")) DEFAULTTEMPLATE=NO.
This command will produce a scatterplot matrix of four variables, to wit, var17, var20, var25 and var40. The GRAPHDATASET NAME subcommand serves no real purpose here, but it is required. The part following GRAPHSPEC is one way of informing SPSS that you want to produce a SPLOM. The final subcommand, DEFAULTTEMPLATE=NO, allows the plot to have a larger size than usual, so inspection is easier.
Line Charts
Example for a line chart with a single variable
GRAPH
  /LINE(SIMPLE)=VALUE( var11 ).
This chart will display, for all cases in your data set, the value of var11. This could be especially meaningful if var11 is a metric variable (say, average monthly income, or a percentage of some kind) and the cases in your data set represent a time series or something similar. Such a plot can be created more easily with a Time Series (or Sequence) chart (see below). However, a line chart with the GRAPH / LINE command is very helpful if you have raw data. Suppose that you have, for a couple of months, samples of persons who rate the performance of, say, the President of the United States, on a scale ranging from 1 to 5. The following command will display the percentage of persons who give the different ratings (stored in var17), per month:
GRAPH
  /LINE(SIMPLE)=PCT BY month BY var17.
Another important example might be regression diagnostics; you will want to display dfbetas in order to identify influential cases. Note that missing values cannot be excluded from this type of chart. Therefore you have to define a filter variable or use some other means to discard missing values. If you don't, your plot will look awkward.
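One way of defining such a filter variable might look as follows. This is only a sketch: nomiss is a hypothetical helper variable, and var11 stands for whatever variable you want to plot.

```spss
* Flag cases with a valid value on var11 (nomiss is a hypothetical helper variable).
COMPUTE nomiss = NOT(MISSING(var11)).
FILTER BY nomiss.
GRAPH
  /LINE(SIMPLE)=VALUE(var11).
* Switch the filter off again so later commands use all cases.
FILTER OFF.
```

A filter leaves the data set itself untouched, which is why it is used here rather than deleting the cases.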
Example for a line chart representing a distribution function
GRAPH
  /LINE(SIMPLE)=CUPCT BY educ.
This chart will display the (empirical) distribution function – that is, a graphical representation of the cumulative percentage – of variable "educ". Note that the distribution function usually is represented as a step function. Therefore, you will have to elaborate on your chart: Double click to have the chart in a new window, select the line by pointing with the mouse on it and clicking once, then select "format" and "interpolation" from the menu (or so I hope; I have translated this backwards from German where fortunately the same words are used). In the dialog window that now appears select "step left", click on "apply" (German: "Zuweisen") and close the window again.
Example for a line chart with several variables
GRAPH
  /LINE(MULTIPLE)= VALUE( var17 var29 var30 ).
This chart will display, for all cases in your data set, the value of several variables. Note that a display of percentages of several variables (as explained in the previous example with respect to a single variable) is not possible.
Example for a sequence or time series chart
TSPLOT VARIABLES = var11 var17 var19
  /ID = year.
This chart assumes that for each of the variables on the VARIABLES list, you have one value per year. That is, this will produce, for each variable, a "time series" chart. Note that data have to be SORTed by year (or whatever variable you wish to be displayed on the x axis). Note that this type of chart may not be available if you are using an old version of SPSS.
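The sorting requirement can be met by placing a SORT CASES command before the chart, as in this short sketch (same variable names as in the example above):

```spss
* Sort by the variable that goes on the x axis, then plot.
SORT CASES BY year.
TSPLOT VARIABLES = var11 var17 var19
  /ID = year.
```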
What does your chart look like?
Your chart may look fine on the screen. What's good on the screen may not necessarily be good for a graph that is to be printed. I cannot give detailed advice here as the default look of graphs changes from time to time. For instance, in the version I am currently using, i.e. SPSS for Windows 15.0, charts have a grey background. In other words, contrast between data and background is reduced. This is plain nonsense if you have printed charts in mind.
Of course, I am inclined to say, you cannot change this easily (why not simply have a subcommand background=white?). On the other hand, there is a not-so-complicated-after-all solution that helps permanently: save a template that looks the way you want it to and add a template subcommand to your graph command. There is even a default directory for templates, but do not think that this means you only have to specify the file name. No, the full path has to be given. So in my case I always write template= 'c:\Programme\SPSS\Looks\WLM_styles.sgt'.
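Put together with one of the bar chart examples from above, a complete command with a template might look like this sketch. The path is simply the example given in the text and has to be replaced by the full path to your own template file.

```spss
* The path is the example from the text; replace it with the full path to your template.
GRAPH
  /BAR(SIMPLE)=PCT BY var15
  /TEMPLATE='c:\Programme\SPSS\Looks\WLM_styles.sgt'.
```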
You may create different templates for different publishing specifications.
More on bar charts
The stacked bar chart mentioned above is the basic graphic equivalent of a crosstabulation. However, SPSS does not display the equivalent of column percentages by default; all you can get via commands are the absolute numbers. "Column percentages" can nevertheless be obtained via the special chart menu.
Note that the default display often looks beautiful if you have a colour printer. If your printer is black and white, it is usually advisable (if there are not more than five categories in the dependent variable) to use different shades of grey (including perhaps black and white, if the respective areas are not too large). Again, you can change the default colours via the chart menu.
Usually, cases with missing values are excluded from the display. If you are specifically interested in the missing values, you just have to add a line / missing = report before the command terminator.
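Applied to the stacked bar chart from the Bar Charts section, this might look as follows (a sketch only, using the same variable names as above):

```spss
* Stacked bar chart as before, with missing values shown as a category.
GRAPH
  /BAR(STACK)=COUNT BY gender BY var15
  /MISSING=REPORT.
```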
More on scatterplots
Scatterplots may be used to display the relationship between so-called Likert-scaled items. However, they will by default not display the number of occurrences of the respective combinations of values. SPSS offers a sunflower plot to remedy this (Note: it seems like the sunflower option has been removed from the most recent version; but perhaps I'm just too stupid to find it), but a better choice would be a "jittered" scatterplot. "Jittering" (i.e., adding a small random part to the data values) has to be done "manually" via the Compute command.
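A minimal sketch of manual jittering, assuming that var199 and var200 are Likert-type items with integer values. The new variable names var199j and var200j are hypothetical, and the jitter width of 0.2 is an arbitrary choice that you may want to adapt to your scale.

```spss
* Add uniform noise of at most 0.2 scale points to each item.
* var199j and var200j are hypothetical names; 0.2 is an arbitrary choice.
COMPUTE var199j = var199 + RV.UNIFORM(-0.2, 0.2).
COMPUTE var200j = var200 + RV.UNIFORM(-0.2, 0.2).
GRAPH
  /SCATTERPLOT(BIVAR)= var199j WITH var200j.
```

Writing the jittered values to new variables keeps the original items unchanged for any later analysis.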
© W. Ludwig-Mayerhofer, IGSW | Last update: 28 May 2012