Exploratory Data Analysis

EDA provides important first insights into the structure of your data. Especially for metric or continuous variables with many distinct values, EDA is preferable to other procedures such as frequency tables. The most important EDA tools are stem-and-leaf plots and box-and-whisker plots (henceforth box plots). The EDA procedures in SPSS also provide the most important sample statistics.

Example for stem-and-leaf plots with sample statistics

EXAMINE var203
  / PLOT STEM.

Example for comparing groups via box plots

EXAMINE var203 BY gender
  / STAT NONE
  / NOTOTAL
  / PLOT BOX.

The EXAMINE command without any additional keywords provides sample statistics, stem-and-leaf plots and box plots for the entire sample as well as for any subgroups that result from grouping BY a variable (or by several variables, each introduced by an additional BY keyword). Sometimes you want less output than this, or more. You can use the following keywords to request additional output or to suppress parts of it:

keyword                result
stat none              Suppresses descriptive statistics
plot stem              Suppresses box plot (and provides stem-and-leaf plot only)
plot stem hist         Displays histogram in addition to stem-and-leaf plot
plot box               Suppresses stem-and-leaf plot (and provides box plot only)
nototal (or not)       Suppresses output for the entire sample if only results for subgroups are desired
percentiles (or perc)  Provides the 5, 10, 25, 50, 75, 90 and 95 percentiles by default; other percentiles may be requested by specifying them in parentheses after the keyword
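
These keywords can be combined in a single command. The following is a minimal sketch rather than a tested recipe: it reuses var203 from the examples above and assumes that the percentile values are listed in parentheses directly after the keyword. It requests a histogram in addition to the stem-and-leaf plot and replaces the default percentiles with the 10th, 50th and 90th.

* Sketch only; the parenthesised percentile list is an assumption.
EXAMINE var203
  / PLOT STEM HIST
  / PERCENTILES(10, 50, 90).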

If you want to use more than one BY variable, be sure to list them in the right order. The following example first divides the data by country and then by gender. That is, the resulting plot will have several groups (the countries), and within each group the box plots for women and men will appear side by side.

EXAMINE var203 BY country BY gender
  / STAT NONE
  / NOTOTAL
  / PLOT BOX.

If, in contrast, you would like the values of all men (or women, depending on how gender is coded in your data) to appear first, country by country, followed by those of all women (or men), because you want to compare how the genders differ by country, the order of the variables has to be reversed:

EXAMINE var203 BY gender BY country
  / STAT NONE
  / NOTOTAL
  / PLOT BOX.
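
Note that it is the second BY keyword that crosses the two grouping variables. As a hedged sketch (based on how EXAMINE is generally documented, not on anything tested here), listing both variables after a single BY should instead produce two separate sets of box plots, one by gender and one by country:

* Sketch only; assumes factors listed after a single BY are analysed separately.
EXAMINE var203 BY gender country
  / STAT NONE
  / NOTOTAL
  / PLOT BOX.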

© W. Ludwig-Mayerhofer, IGSW | Last update: 30 Oct 2005