Loops (foreach and forvalues)

Occasionally, a step in your work (a piece of data transformation, some analysis) has to be performed repeatedly, with some slight variation. Imagine that you wish to run several regression analyses with a given set of independent variables, for instance, in order to investigate the effect of these variables on a series of outcomes. (Note that for a fully simultaneous analysis of several outcomes, the sureg command could perhaps be used; let's just suppose that this is not what we want to do here.) Of course, you might write the first command, copy the command line, exchange the name of the dependent variable, and so on. But loops sometimes make things easier.

There are two ways of defining loops: foreach refers to a list of elements to be enumerated, whereas forvalues refers to a range of numbers, with the effect that what follows is executed for each of these numbers. The following examples will hopefully clarify this.

Loops with foreach

Let's use the example with which I started this entry. Suppose that you wish to investigate the effects of income, family status and gender on a range of dependent variables (of course this example is grossly simplified). Using foreach, this can be achieved like this:

foreach x in excl depr satisf happy activ {
    regress `x' income i.famstat gender
}

This will regress each of the variables "excl", "depr" and so on, in turn, on the same set of independent variables. The variable names on the list that follows "foreach x in" will successively replace the x in the line that starts with regress. Instead of x, any other character or string of characters might be used.

Be sure to note two things. First, the braces and their placement: the list of elements is followed by an opening brace (on the same line!). The command or commands to be executed follow on the next line(s); several commands may follow, all of which are repeated while Stata works through the list. The closing brace goes on a line of its own at the end.
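To illustrate the point about several commands inside one loop, here is a minimal sketch. It assumes the auto dataset that ships with Stata rather than the survey data discussed above, and the choice of variables (price, mpg, weight as outcomes; length and foreign as predictors) is purely for illustration.

```stata
* Illustration only: uses Stata's bundled auto data, not the survey data above
sysuse auto, clear

foreach v in price mpg weight {
    * every command between the braces is run once per list element
    display "Outcome: `v'"
    summarize `v'
    regress `v' length foreign
}
```

Each pass through the loop prints the name of the current outcome, summarizes it, and then runs the regression, before Stata moves on to the next name on the list.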

Second, within the braces, x (or whatever you may use in its place) is surrounded by a grave accent, or gravis, on the left, and an apostrophe on the right. (I just learned that programmers call it backquote, backtick or backgrave, not to be mistaken for backstop [just kidding].) Note that the Stata handbook (and other Stata sources) refer to the grave accent by the name of "single left quote", but at least on German QWERTZ keyboards (and for German users who use commas for single left quotes) I think it is more correct to talk about the grave accent (or gravis, in French: accent grave). Be that as it may, look at the glyph in the command above: what you see here is correct, whereas the Stata handbook often looks misleading (for German users!).

The list of items that follows "foreach ... in" need not be a list of variables; it may be any list of elements. If the elements happen to be a series of consecutive numbers, forvalues (see the next example) is the more convenient choice. In all other cases, stick with foreach and proceed as shown above.
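As a small sketch of foreach working on lists that are not variable names: the words and numbers below are made up for illustration, and the second loop uses the "of numlist" variant of foreach, which is standard Stata syntax but not something used elsewhere in this entry.

```stata
* Any list of elements will do, here three arbitrary words
foreach word in alpha beta gamma {
    display "Current element: `word'"
}

* A list of numbers that are not consecutive
foreach n of numlist 1 5 10 {
    display `n'^2
}
```

The first loop simply echoes each word; the second displays the square of each number on the list.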

Loops with forvalues

Sometimes you have a situation where something, e.g., a list of variables, is related to a list of consecutive numbers. Here's a real example from my recent work (some rather awkward variable names have been simplified here). In survey data, you often find information about the members of the household the respondent is living in. In this case, information about the current status of the members was stored in variables "hhms_2" to "hhms_10" (the first household member is the respondent him- or herself!). I wished to create, for each household member, a dummy variable indicating whether or not they were employed (with various types of employment coded as 1 to 4). One of several ways to do this was:

forvalues i = 2/10 {
    gen emp`i' = inrange(hhms_`i',1,4)
}

The numbers 2 to 10 are referred to by the index "i" in this example, but instead of "i" you might use any character or string of characters (such as, e.g., "number"), or even numbers. Within the loop, the index is surrounded by the grave accent (or gravis) on the left and the apostrophe on the right. This holds for both places where the index appears, i.e. after "emp" as well as after "hhms_".

The result is a number of new variables, called "emp2" to "emp10", with values of 1 if the respective household member is employed and 0 otherwise.
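If you wish to try this without the original survey data, the following sketch may help. It builds a tiny artificial dataset; the status codes produced by mod() are invented for illustration and have nothing to do with the real coding of the "hhms" variables. The last loop also shows the step syntax of forvalues, which is not used in the example above.

```stata
* Artificial data, for illustration only
clear
set obs 5

forvalues i = 2/4 {
    * made-up status codes between 0 and 5
    gen hhms_`i' = mod(_n + `i', 6)
    * 1 if the code lies between 1 and 4, 0 otherwise
    gen emp`i' = inrange(hhms_`i', 1, 4)
}

list

* forvalues also accepts a step: from 10 to 50 in steps of 10
forvalues k = 10(10)50 {
    display `k'
}
```

Note that inrange() returns 0 when the status variable is missing, so in real data you may want to treat missing values separately.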

© W. Ludwig-Mayerhofer, Stata Guide | Last update: 03 Feb 2019