Complex Samples

Data in the social sciences often do not come from simple random sampling. Rather, complex sampling designs may be involved. In addition, the data may come with sampling weights.

Cluster sampling

Cluster sampling means that cases are not selected individually. Typically, a number of regions (or schools, or other entities) is sampled first (the primary sampling units, or PSUs), and then individuals are selected within each PSU.

To request an analysis that takes clustering into account, name the variable that indicates the cluster to which each unit belongs in the CLUSTER option of the VARIABLE command.

VARIABLE:
          NAMES ARE x1 x2 x3 region;
          CLUSTER IS region;

In the ANALYSIS command, you have to indicate:

ANALYSIS:
          TYPE IS COMPLEX;

which may of course be supplemented by other subcommands.
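
As a rough sketch of how these pieces might fit into a complete input file: the file name mydata.dat and the regression of x1 on x2 and x3 are assumptions made for illustration only, and the ESTIMATOR line is an example of an additional subcommand, not a requirement.

TITLE:
          Regression with clustered data (illustrative sketch);
DATA:
          FILE IS mydata.dat;       ! hypothetical file name
VARIABLE:
          NAMES ARE x1 x2 x3 region;
          USEVARIABLES ARE x1 x2 x3;
          CLUSTER IS region;        ! PSU identifier
ANALYSIS:
          TYPE IS COMPLEX;
          ESTIMATOR IS MLR;         ! example of an additional subcommand
MODEL:
          x1 ON x2 x3;              ! hypothetical model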

Sampling weights

Sampling weights are likewise indicated in the VARIABLE command.

VARIABLE:
          NAMES ARE x1 x2 x3 wght;
          WEIGHT IS wght;

A quote from the User's Guide, version 5, on p. 457: "Sampling weights are available for ESTIMATOR=MLR, MLM, MLMV, WLS, WLSM, WLSMV, and ULS and for ESTIMATOR=ML when the BOOTSTRAP option of the ANALYSIS command is used. There are two exceptions. They are not available for WLS when all dependent variables are continuous and are not available for MLM or MLMV for EFA."

Sampling weights can be combined with cluster sampling. In other words, CLUSTER and WEIGHT can be used together in the VARIABLE command.
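
A minimal sketch of the combined specification, reusing the variable names from the examples above (the data set is assumed to contain both region and wght):

VARIABLE:
          NAMES ARE x1 x2 x3 region wght;
          CLUSTER IS region;        ! PSU identifier
          WEIGHT IS wght;           ! sampling weight

ANALYSIS:
          TYPE IS COMPLEX;          ! requested because CLUSTER is used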

© W. Ludwig-Mayerhofer, Mplus Guide | Last update: 23 Feb 2010