# Correlation

## Metric variables

correlate f17-f25 f27

pwcorr f17-f25 f27

will both display the matrix of correlations between variables f17 to f25 and f27.

The two commands differ in two respects. The first concerns missing values: with "corr", Stata uses listwise deletion. That is, the correlation matrix is computed only for those cases that do not have a missing value on any of the variables in the list. In contrast, "pwcorr" uses pairwise deletion: each correlation is computed from all cases that have no missing value on that specific pair of variables. When data are missing, the two commands may therefore report different coefficients, each based on a different number of cases.
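The Python sketch below (not Stata code) makes the difference concrete. It computes the same Pearson correlations under both deletion rules. The variable names `x`, `y`, `z` and the tiny data set are made up for illustration, and `None` stands in for Stata's missing value `.`.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's r for two equal-length lists without missing values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)

def complete_rows(data, names):
    """Indices of cases with no missing value (None) on any of `names`."""
    n = len(next(iter(data.values())))
    return [i for i in range(n) if all(data[v][i] is not None for v in names)]

def corr_matrix(data, pairwise):
    """Return {(a, b): (r, n)} using pairwise or listwise deletion."""
    names = list(data)
    listwise = complete_rows(data, names)  # complete on *all* variables
    result = {}
    for a in names:
        for b in names:
            # Pairwise: only this particular pair must be complete.
            rows = complete_rows(data, [a, b]) if pairwise else listwise
            xs = [data[a][i] for i in rows]
            ys = [data[b][i] for i in rows]
            result[(a, b)] = (round(pearson(xs, ys), 4), len(rows))
    return result

data = {
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 5, 4, None],
    "z": [1, None, 2, 3, 5],
}
print("listwise (like corr):  ", corr_matrix(data, pairwise=False)[("x", "y")])
print("pairwise (like pwcorr):", corr_matrix(data, pairwise=True)[("x", "y")])
```

With these numbers, the correlation between x and y rests on 3 cases under listwise deletion and on 4 cases under pairwise deletion. The coefficients differ as well, at about .79 versus .72.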

The second difference lies in the options available with each command. The most important ones are:

corr f17-f25 f27, m

will also display the mean, standard deviation, minimum, and maximum of each variable.

corr f17-f25 f27, c

will display the covariance matrix instead of the correlation matrix. Of course, m and c may be combined.
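The covariance matrix and the correlation matrix carry the same information on different scales. A correlation is simply a covariance divided by the product of the two standard deviations. The Python sketch below shows that relationship. It assumes the usual sample covariance with an n - 1 denominator, and the data are made up.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance (n - 1 in the denominator)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson's r as a rescaled covariance; the n - 1 factors cancel."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

x = [1, 2, 3, 4]
y = [2, 4, 5, 4]
x_times_10 = [10 * v for v in x]  # same variable, different unit

print(covariance(x, y), covariance(x_times_10, y))    # covariance changes with the unit
print(correlation(x, y), correlation(x_times_10, y))  # correlation does not
```

This is why covariances are mainly useful as input to further computations, whereas correlations can be compared across variables measured in different units.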

pwcorr f17-f25 f27, o sig p(.1) star(.05)

will display the number of observations for each correlation and its significance level (p-value). The option p(.1) tells Stata to print only correlations that are significant at the .1 level or better (i.e., with a p-value of .1 or lower); the other cells are left blank. The option star(.05) requests a star next to each correlation that is significant at the .05 level or better. Again, these options can be combined freely.
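The filtering done by p() and star() can be pictured with a small Python sketch.

- `format_entry` is a hypothetical helper, not part of Stata. It assumes that a p-value exactly equal to a threshold counts as significant.
- `t_statistic` is the standard t test of a zero population correlation with n - 2 degrees of freedom. That is the usual basis for such p-values, but the sketch does not compute the p-values themselves.

```python
from math import sqrt

def t_statistic(r, n):
    """t for H0: population correlation = 0 (df = n - 2)."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

def format_entry(r, p, print_level=0.10, star_level=0.05):
    """Hypothetical display rule mimicking the intent of pwcorr's p() and star()."""
    if p > print_level:
        return ""                 # not significant at print_level: left blank
    text = f"{r:.4f}"
    if p <= star_level:
        text += "*"               # significant at star_level or better
    return text

for r, p in [(0.12, 0.40), (0.25, 0.08), (0.41, 0.01)]:
    print(repr(format_entry(r, p)))
```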

## Non-metric variables

### Binary variables

One way to deal with binary variables is to regard them as the result of an underlying continuous variable. Respondents below a certain cut-off point respond with "0" (or whatever the lower value may be), and those at or above the cut-off respond with "1" (or, more generally, the higher category). The tetrachoric correlation estimates the correlation between two such underlying variables, assuming that they are bivariate normal.

tetrachoric t1 t2 t3, pw stats(rho se obs p)

will compute the tetrachoric correlations, their standard errors, the number of observations, and the p-values, using pairwise deletion. If only two variables are on the variable list, the `stats` option can be omitted, because all the necessary information is displayed by default. Note that a few other options are available.
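Estimating a tetrachoric correlation properly means fitting a bivariate normal distribution to the 2 x 2 table, which is what `tetrachoric` does. For intuition only, the classical cosine-pi approximation gives a quick closed-form value from the four cell counts. It is a different and rougher formula than Stata's, so its results will not match `tetrachoric` output exactly.

The Python sketch below implements that approximation. The cell labels a, b, c, d are assumptions made for the illustration.

```python
from math import cos, pi, sqrt

def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    Cell layout (counts):   t2 = 0   t2 = 1
                  t1 = 0      a        b
                  t1 = 1      c        d
    Degenerate tables (a whole row or column empty) are not handled.
    """
    if b * c == 0:
        return 1.0                # no off-diagonal cases: perfect positive association
    odds_ratio = (a * d) / (b * c)
    return cos(pi / (1 + sqrt(odds_ratio)))

print(tetrachoric_cos_pi(25, 25, 25, 25))  # independence: close to 0
print(tetrachoric_cos_pi(40, 10, 10, 40))  # strong positive association
```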

A traditional measure of association for binary variables is phi, a chi-square-based statistic that is numerically equivalent to Pearson's r computed on the two 0/1 variables. It can be obtained via the `V` option of the crosstabulation command `tabulate` (V stands for Cramér's V, which in the case of a 2 x 2 table is equivalent to phi).
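You can check numerically that phi and Pearson's r coincide for two 0/1 variables. The Python sketch below computes phi from the four cell counts of a 2 x 2 table. It also computes Pearson's r from the same data expanded into individual 0/1 observations. The counts are invented.

```python
from math import sqrt

def phi(a, b, c, d):
    """phi for the 2 x 2 table [[a, b], [c, d]] (rows: x = 0/1, columns: y = 0/1)."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    sxx = sum((u - mx) ** 2 for u in xs)
    syy = sum((v - my) ** 2 for v in ys)
    return sxy / sqrt(sxx * syy)

def expand(a, b, c, d):
    """Turn cell counts back into paired 0/1 observations."""
    pairs = [(0, 0)] * a + [(0, 1)] * b + [(1, 0)] * c + [(1, 1)] * d
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    return xs, ys

counts = (20, 10, 5, 15)
print(phi(*counts), pearson(*expand(*counts)))  # the two values agree
```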

### Ordinal variables

Ordinal variables (like the usual Likert-scaled attitude items) can also be considered as expressions of an underlying continuous attribute. In this case the *polychoric correlation* estimates the correlation between the underlying continuous variables, under the assumption that they are bivariate normal.

The polychoric correlation is not part of official Stata, but a user-written command is available.

findit polychoric

will show you how to download the command (it requires Stata 8.2 or higher). Afterwards,

polychoric var24a-var24g var24j var24m

will compute the requested polychoric correlations. Note that computation is based on an iterative procedure and therefore may take a few minutes if a large number of correlations is requested.
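The polychoric model assumes that each ordinal variable results from cutting a normally distributed latent variable at a set of thresholds. In a common two-step approach, the thresholds are obtained first from each variable's cumulative category proportions, using the inverse normal distribution. Only the second step, finding the latent correlation that best reproduces the observed cross-table, needs iteration.

The Python sketch below (Python 3.8 or later for `statistics.NormalDist`) shows the first step only. It is not claimed that the user-written `polychoric` proceeds exactly this way, and the category counts are made up.

```python
from statistics import NormalDist

def thresholds(counts):
    """Cut-offs on a standard normal latent scale for ordered categories.

    counts[k] is the number of cases in category k (lowest first);
    K categories give K - 1 thresholds. Every count must be positive.
    """
    total = sum(counts)
    cumulative = 0
    cuts = []
    for count in counts[:-1]:
        cumulative += count
        cuts.append(NormalDist().inv_cdf(cumulative / total))
    return cuts

print(thresholds([25, 50, 25]))     # symmetric cut-offs around 0
print(thresholds([10, 20, 40, 30]))
```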

The help file for `polychoric` is somewhat confusing as to how variables are treated. It says:

"If the number of the categories of one of the variables is greater than 10, polychoric treats it is (sic) continuous, so the correlation of two variables that have 10 categories each would be simply the usual Pearson moment correlation found through correlate."

In fact, the first sentence should read "... is 10 or more"; with that correction, the second sentence is accurate. As a consequence, the polyserial correlation is computed if one variable has fewer than 10 categories and the other has 10 or more.
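Read with this correction, the rule can be written out as a small decision function. The Python sketch below is a hypothetical helper, not code taken from `polychoric`. It only counts the distinct observed values of each variable and reports which coefficient the corrected rule implies.

```python
def implied_coefficient(x, y, continuous_from=10):
    """Coefficient implied by the corrected help-file rule (hypothetical helper).

    A variable with `continuous_from` or more distinct values is treated
    as continuous; one with fewer is treated as ordinal.
    """
    x_cont = len(set(x)) >= continuous_from
    y_cont = len(set(y)) >= continuous_from
    if x_cont and y_cont:
        return "Pearson"        # both continuous: the usual correlate result
    if x_cont or y_cont:
        return "polyserial"     # one continuous, one ordinal
    return "polychoric"         # both ordinal

likert = [1, 2, 3, 4, 5] * 4    # 5 categories
scale = list(range(20))         # 20 distinct values
print(implied_coefficient(likert, likert))
print(implied_coefficient(likert, scale))
print(implied_coefficient(list(range(10)), list(range(10))))
```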

More traditional measures of association between ordinal variables are also available:

For estimation of *Kendall's tau-b* there is the command `ktau`. It is somewhat slow, but it can compute one or several correlations (without the associated crosstabulations) and, if desired, the tau-a coefficient as well. It goes like this:

ktau var17 var18 var20, pw stats(taua taub obs p)

With only two variables, the `stats` option is not necessary. `pw` requests pairwise deletion and can of course be omitted if you wish to use listwise deletion. Further options are also available. Note that tau-b can also be obtained together with a crosstabulation of two variables (option `taub` of `tabulate`).
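Both tau coefficients are based on counting concordant and discordant pairs of cases. Tau-b additionally corrects the denominator for ties, which are common with ordinal items.

A brute-force Python sketch of the textbook definitions follows; it is not the `ktau` implementation. Because it examines every pair of cases, the work grows with the square of the sample size. That gives a feel for why such computations can be slow in large samples.

```python
from itertools import combinations
from math import sqrt

def kendall_tau(x, y):
    """Return (tau_a, tau_b) by comparing every pair of cases."""
    n = len(x)
    concordant = discordant = tied_x = tied_y = 0
    for i, j in combinations(range(n), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0:
            tied_x += 1        # pair tied on x
        if dy == 0:
            tied_y += 1        # pair tied on y
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = n * (n - 1) // 2
    tau_a = (concordant - discordant) / n_pairs
    tau_b = (concordant - discordant) / sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    return tau_a, tau_b

# Two ordinal items, with ties on the first one
print(kendall_tau([1, 1, 2, 2], [1, 2, 3, 4]))
```

With ties present, tau-a is pulled toward zero because its denominator counts tied pairs. Tau-b excludes them, so here it is larger.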

*Spearman's rho* can be obtained via:

spearman var17 var18 var20, pw stats(rho obs p)

Again, with only two variables, the `stats` option is not necessary. For `pw`, see the previous entries. Likewise, further options are available.
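Spearman's rho is Pearson's r computed on ranks. Tied values are conventionally given the average of the ranks they span; here that tie rule is an assumption about what `spearman` does. A Python sketch of that definition follows, with illustrative function names.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)

def spearman_rho(x, y):
    return pearson(average_ranks(x), average_ranks(y))

print(average_ranks([10, 20, 20, 30]))
print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # monotone, not linear
```

Only the ordering of the values matters. A monotone but non-linear relationship such as y = x squared therefore yields rho = 1, whereas Pearson's r on the raw values would fall short of 1.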

© W. Ludwig-Mayerhofer, Stata Guide | Last update: 29 Aug 2010