Crosstabulation is used to display the joint distribution of two variables. In addition, tests of significance and measures of association may be requested.
tab var17 var18
(tab being an abbreviation for tabulate) will display a crosstabulation with counts only.
tab var17 var18, col
will display column percentages in addition to counts.
tab var17 var18, row nofreq gamma
will display row percentages, but no counts. In addition, Goodman and Kruskal's gamma will be displayed together with its asymptotic standard error (ASE).
Other options to be added after the comma include:
chi2: Pearson's chi squared statistic
cchi2: Each cell's contribution to the chi squared statistic
lrchi2: Likelihood-ratio chi-squared statistic (not displayed if at least one cell is empty)
exact: Significance according to Fisher's exact test
cell: Relative frequencies (each cell as a percentage of the table total)
expected: Number of observations expected in each cell under the assumption of independence
taub: Kendall's tau-b measure of association
V: Cramer's V
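As a minimal sketch of how several of these options combine, the following uses the auto dataset that ships with Stata. The dataset and its variables foreign and rep78 are stand-ins chosen for illustration; they are not part of the examples above.

```stata
* Load an example dataset shipped with Stata (an assumption for illustration)
sysuse auto, clear

* Counts plus expected frequencies, Pearson's chi-squared,
* Fisher's exact test, and Cramer's V
tabulate foreign rep78, expected chi2 exact V

* Column percentages only (no counts), with Kendall's tau-b
tabulate foreign rep78, column nofreq taub
```

Several statistics can be requested in one call; each is simply listed after the comma.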
Note that for estimating Kendall's tau-b there is also a dedicated command, ktau, about which you can find more in the entry on correlations.
Note also the following options that refer to the display and/or output of the table:
missing: Missing values will be treated like any other value
nolabel: The values of the variable are displayed instead of the labels
nof: This was already mentioned above in its long form, nofreq. If used without any other option, nothing will be displayed!
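The effect of the display options is easiest to see side by side. The sketch below again assumes the auto dataset shipped with Stata, in which rep78 has some missing values and foreign carries value labels.

```stata
sysuse auto, clear

* Default: observations with missing rep78 are dropped,
* and foreign is shown with its value labels
tabulate foreign rep78

* Missing rep78 gets its own column; foreign is shown as 0/1 instead of labels
tabulate foreign rep78, missing nolabel
```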
Probability weights can be used with two-way crosstabulations via the svy prefix. Most of the options described above are not available in this case. Note that with the option col, column proportions are estimated, whereas without it the estimated proportions refer to the entire sample. In other words, in the latter case the proportions of the entire table sum to 1.
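A rough sketch of the two variants follows. The weight variable wt is hypothetical and is generated here only so that the example runs; in real data you would declare your actual design (weights, strata, clusters) with svyset.

```stata
sysuse auto, clear

* Hypothetical probability weight, made up purely so the example runs
generate double wt = 1 + mod(_n, 3)

* Declare the survey design (weights only, no strata or clusters)
svyset [pweight = wt]

* Proportions over the whole table (all cells sum to 1)
svy: tabulate foreign rep78

* Column proportions (each column sums to 1)
svy: tabulate foreign rep78, column
```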
Tables with more than two dimensions
For higher-dimensional crosstabulations, the by prefix may be used.
Alternatively, you may use the table command, but this way you can obtain only frequency counts (and summary statistics, see the entry on summarize), not percentages. A three-dimensional table would look like this:
table education gender country
Tables with even more dimensions can be created using the by option, as in:
table education gender age, by(country)
Up to four variables may be included via by(). Alternatively, the by prefix may be used. Finally, with the help of foreach (not covered in this guide) a table can be repeated for a number of conditions.
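The by prefix and the foreach approach might look as follows. This is only a sketch, assuming the nlsw88 dataset shipped with Stata; the variables race, married, collgrad and union are illustrative choices.

```stata
sysuse nlsw88, clear

* by prefix: one race-by-married table for each value of union
bysort union: tabulate race married, row

* foreach: repeat the same table for each condition (here, each value of race)
levelsof race, local(groups)
foreach g of local groups {
    display "race == `g'"
    tabulate married collgrad if race == `g', row
}
```

Unlike table, both approaches use tabulate and therefore can show percentages.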
Tables with two dimensions for more than two variables
Instead of tab, we may use tab2. With this command, more than two variables can be specified.
tab2 up85 up8601 up8602 up8603, row col taub
will produce all possible crosstabulations between the variables mentioned. Note the following useful option:
tab2 up85 up8601 up8602 up8603, firstonly row col taub
Here, all crosstabulations of up85 with the remaining variables will be displayed, with up85 as the row variable.
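To see how firstonly reduces the number of tables, here is a small sketch with three variables from the nlsw88 dataset shipped with Stata (the dataset and variables are assumptions for illustration):

```stata
sysuse nlsw88, clear

* All pairs: race x married, race x collgrad, married x collgrad (3 tables)
tab2 race married collgrad, row taub

* firstonly: only race x married and race x collgrad (2 tables)
tab2 race married collgrad, firstonly row taub
```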
© W. Ludwig-Mayerhofer, Stata Guide | Last update: 23 Apr 2017