Crosstabulation

Crosstabulation displays the joint distribution of two variables. In addition, tests of significance and measures of association may be requested.


Two variables

tab var17 var18

(with tab being an abbreviation for tabulate) will display a crosstabulation with counts only.

tab var17 var18, col

will display column percentages in addition to counts.

tab var17 var18, row nofreq gamma

will display row percentages, but no counts. In addition, Goodman and Kruskal's gamma will be displayed together with its asymptotic standard error (ASE).
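
To try the three variants above on real data, the following sketch uses the auto dataset that ships with Stata. The variables rep78 and foreign are merely stand-ins for var17 and var18; they are not part of the original example.

```stata
* Load Stata's bundled example data (assumption: used here only for illustration)
sysuse auto, clear

* Counts only
tabulate rep78 foreign

* Counts plus column percentages
tabulate rep78 foreign, col

* Row percentages without counts, plus gamma and its ASE
tabulate rep78 foreign, row nofreq gamma
```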

Other options to be added after the comma include:

  • chi2: Pearson's chi-squared statistic
  • cchi2: Each cell's contribution to Pearson's chi-squared statistic
  • lrchi2: Likelihood-ratio chi-squared statistic (not displayed if at least one cell is empty)
  • exact: Significance according to Fisher's exact test
  • cell: Each cell's share of the table total (relative frequencies)
  • expected: Number of observations expected in each cell under the assumption of independence
  • taub: Kendall's tau-b measure of association
  • V: Cramér's V
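
Several of these options can be combined in one call, and tabulate leaves the statistics behind as stored results. A minimal sketch, again assuming the bundled auto data; the indicator heavy is a hypothetical variable created only for this illustration.

```stata
sysuse auto, clear

* Hypothetical binary indicator, created only for this example
generate byte heavy = weight > 3000 if !missing(weight)

* Pearson and likelihood-ratio chi-squared, expected counts, Cramér's V
tabulate foreign heavy, chi2 lrchi2 expected V

* The statistics remain available as stored results
display "Pearson chi2 = " r(chi2) "   p = " r(p)
display "Cramer's V   = " r(CramersV)

* Fisher's exact test
tabulate foreign heavy, exact
display "Fisher's exact p = " r(p_exact)
```

Typing return list after tabulate shows all results that were stored.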

Note that Kendall's tau-b can also be computed with a separate command, ktau, about which you can find more in the entry on correlations.
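
The two routes to tau-b can be compared side by side. This is a sketch on the bundled auto data; rep78 and foreign are illustrative choices, not variables from this guide.

```stata
sysuse auto, clear

* tau-b requested as an option of the crosstabulation
tabulate rep78 foreign, taub

* The dedicated command; stats() selects what is reported
ktau rep78 foreign, stats(taub p)
```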

Note also the following options that refer to the display and/or output of the table:

  • m or missing: Missing values will be treated like any other value
  • nol or nolabel: The values of the variable are displayed instead of the labels
  • nof or nofreq: Suppresses the counts, as mentioned above. If used without another display option such as row or col, nothing will be displayed!
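
The effect of these display options is easiest to see on a variable with missing values and a variable with value labels. In the bundled auto data (an assumption of this sketch), rep78 has some missing values and foreign carries value labels.

```stata
sysuse auto, clear

* Missing values of rep78 appear as an additional row
tabulate rep78 foreign, missing

* Numeric codes of foreign are shown instead of its value labels
tabulate rep78 foreign, nolabel

* nofreq combined with row: percentages only, no counts
tabulate rep78 foreign, row nofreq
```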

Probability weights

Probability weights can be used with two-way crosstabulations via the svy prefix, once the survey design (including the weight) has been declared with svyset. Most of the options described above are not available in this case. Note that with option col, estimates of the column proportions will be computed, whereas without this option, the estimated proportions refer to the entire sample. In other words, in the latter case the proportions of the entire table sum to 1.
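
A minimal sketch of the two cases follows. The auto data contain no survey weights, so the weight pw below is invented purely for illustration; with real data, the weight variable supplied with the survey would be used instead.

```stata
sysuse auto, clear

* Hypothetical probability weight, for illustration only
generate double pw = cond(foreign == 1, 2.5, 1)

* Declare the survey design with the weight
svyset [pweight = pw]

* Without options: cell proportions of the whole table
svy: tabulate rep78 foreign

* With column: proportions within each column
svy: tabulate rep78 foreign, column
```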


Tables with more than two dimensions

For higher-dimensional crosstabulations the by prefix may be used.
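
As a sketch of the by prefix, the following produces one two-way table per category of a third variable. It assumes the bundled auto data; heavy is a hypothetical indicator created only for this example.

```stata
sysuse auto, clear

* Hypothetical second categorical variable
generate byte heavy = weight > 3000 if !missing(weight)

* One rep78-by-heavy table for each value of foreign;
* bysort sorts the data first, which the by prefix requires
bysort foreign: tabulate rep78 heavy, row
```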

Alternatively, you may use the table command, which yields frequency counts (and summary statistics, see the entry on summarize) but no percentages. A three-dimensional table would look like this:

table education gender country

Tables with even more dimensions can be created using the by option, as in:

table education gender age, by(country)

Up to four variables may be included in by(). Alternatively, the by prefix may be used. Finally, with the help of foreach (not covered in this guide) a table can be repeated for a number of conditions.
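
The following sketch runs the table variants described in this section on the bundled auto data and adds a simple loop. The indicators heavy and thirsty are hypothetical variables made up for the example. It assumes the table syntax described in this section; Stata 17 and later introduced a revised table command, where these calls may need to be prefixed with version 16: to behave as described.

```stata
sysuse auto, clear

* Hypothetical categorical variables, created only for this example
generate byte heavy   = weight > 3000 if !missing(weight)
generate byte thirsty = mpg < 20 if !missing(mpg)

* Three-way table: rows, columns, supercolumns
* (syntax as described in this guide; see the note on Stata 17+ above)
table rep78 heavy foreign

* Four-way table: by() adds a superrow variable
table rep78 heavy thirsty, by(foreign)

* Loop alternative: repeat a two-way table for each value of foreign
levelsof foreign, local(origins)
foreach o of local origins {
    display as text _newline "foreign == `o'"
    tabulate rep78 heavy if foreign == `o', row
}
```

The loop has the advantage that all tabulate options, including percentages and test statistics, remain available for each subtable.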


Tables with two dimensions for more than two variables

Instead of tab we may use tab2. With this command, more than two variables can be specified.

tab2 up85 up8601 up8602 up8603, row col taub

will produce all possible crosstabulations between the variables mentioned. Note the following useful option:

tab2 up85 up8601 up8602 up8603, firstonly row col taub

Here, all crosstabulations of up85 with the remaining variables will be displayed, with up85 as the row variable.
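
The difference between the two calls can be seen with three variables, where the number of tables drops from three (all pairs) to two (pairs involving the first variable). A sketch on the bundled auto data, with heavy and thirsty as hypothetical indicators created for the example:

```stata
sysuse auto, clear

* Hypothetical categorical variables, created only for this example
generate byte heavy   = weight > 3000 if !missing(weight)
generate byte thirsty = mpg < 20 if !missing(mpg)

* All pairs: foreign-heavy, foreign-thirsty, heavy-thirsty
tab2 foreign heavy thirsty, row chi2

* Only the pairs that include the first variable, foreign
tab2 foreign heavy thirsty, firstonly row chi2
```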

© W. Ludwig-Mayerhofer, Stata Guide | Last update: 23 Apr 2017