Intraclass Correlation

The intraclass correlation coefficient, or ICC, is computed to measure agreement between two or more raters (judges) on a metric scale. The raters form the columns of the data matrix; each case (each object being rated) is represented by a row. There may be two or more raters.
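
For illustration, such a data matrix might be entered as follows (a minimal sketch; the variable names patient, r1 and r2 and all values are invented):

DATA LIST LIST /patient r1 r2.
BEGIN DATA
1 3 4
2 2 2
3 4 5
4 1 1
5 5 4
END DATA.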

Typical example:

RELIABILITY
   /VARIABLES=r1 r2
   /ICC TYPE(ABSOLUTE).

Further example:

RELIABILITY
   /VARIABLES=r1 r2
   /ICC=MODEL(RANDOM) TYPE(CONSISTENCY) CIN=90 TESTVAL=0.4.

What I have termed the typical example reflects, in my opinion, the most likely use of the ICC: there are a number of raters, for example, psychologists or physicians participating in a study, and we want to know whether the diagnoses or assessments they make are in agreement. We wish them not only to agree in relative terms, that is, that if rater A judges a patient to be sicker than another one, rater B (and C, D and so on) should arrive at the same judgment; rather, we wish them to agree in absolute terms. In other words: if rater A thinks a patient is "very sick", rater B (and C, D etc.) should arrive at the same conclusion, and the same should be the case for "medium", "weak" and "no symptoms" – or whatever your classification is. In clinical studies this is normally of great importance, as patient selection and/or outcome measures depend crucially on such judgments.
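
The distinction can be made concrete with a small invented example: suppose rater r2 always judges one point higher than rater r1. The two raters are then perfectly consistent, but they never agree in absolute terms; accordingly, the consistency ICC will be 1 while the absolute agreement ICC is clearly lower. A sketch:

DATA LIST LIST /r1 r2.
BEGIN DATA
1 2
2 3
3 4
4 5
END DATA.
RELIABILITY
   /VARIABLES=r1 r2
   /ICC=MODEL(MIXED) TYPE(CONSISTENCY).
RELIABILITY
   /VARIABLES=r1 r2
   /ICC=MODEL(MIXED) TYPE(ABSOLUTE).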

In SPSS terms, this is a "mixed effects model with absolute agreement". It is called mixed effects because the raters (judges) are not considered a random sample; we do not wish to make inferences about the universe of all possible raters, but only about the particular individuals at hand. In contrast, the "objects" (e.g., patients) on which the judgments are made are (more or less) a random sample. And it is about "absolute agreement" for the reason outlined in the previous paragraph: raters should not only be consistent, but agree in absolute terms. Note that no keyword is needed to obtain a "mixed model", as this is the default (see the sketch below).
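
A sketch of the typical example with the default written out explicitly – MODEL(MIXED) merely makes the default visible and produces the same result:

RELIABILITY
   /VARIABLES=r1 r2
   /ICC=MODEL(MIXED) TYPE(ABSOLUTE).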

In the "further example", SPSS is asked to use a random model which assumes that also the raters are a random sample from a larger pool of raters, but it tests only for consistency of judgments. In addition some specifications are made concerning inferential statistics. Normally, SPSS computes a 95 per cent confidence intervall for the ICC; this value can be changed via keyword CIN. Likewise, by default SPSS computes a test whether the observed ICC is significantly different from zero; keyword TESTVAL can change the value against which the observed ICC is tested (in this example, to 0.4).

Note that SPSS displays two ICCs, one for single measures and one for average measures. The latter measures the reliability of all raters' measures combined; for the consistency type it is in fact identical to Cronbach's alpha. It should be used only if, when the instrument under scrutiny is actually applied, the assessments of all raters are combined into a single measure, which is rarely the case. Normally, rater agreement is investigated in order to find out whether we may assume that the judgment of one rater is the same as that of the others. If this is the case, the "single measures" ICC is the appropriate coefficient.
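
If the average measures coefficient is what you need, the correspondence with Cronbach's alpha can be checked directly, since SPSS reports alpha via its default reliability model (a sketch with the same hypothetical variables):

RELIABILITY
   /VARIABLES=r1 r2
   /MODEL=ALPHA
   /ICC=MODEL(MIXED) TYPE(CONSISTENCY).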

© W. Ludwig-Mayerhofer, IGSW | Last update: 26 Apr 2006