Factor Variables

Please note that factor variables were introduced in Stata version 11; the material in this section applies to version 11 and later!

For all (or most) regression models, Stata offers a notation, called factor variables, that tremendously eases the construction of complex variables. By this, I mean things such as a series of dummy variables, interaction effects, quadratic effects (which are interactions of a variable with itself), cubic effects and so on. Of course, we can construct the pertinent variables prior to estimation with the help of generate/replace or by related means. But if we just want to check, say, whether an interaction effect is present, we can do so "on the fly" by using the possibilities described below.

What follows refers to terms that can be included "as is" in the list of independent variables of a regression model. Typing help fvvarlist will provide more information.


c.age#c.age

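will include the variable "age squared" (age times age) in the equation. More generally, the # sign creates interaction terms (age squared is just an interaction of age with itself), so c.age#c.age#c.age would add a cubic term. "c." stands for "continuous"; the prefix is needed here because variables in an interaction without a prefix are treated as categorical. Note that c.age#c.age adds only the squared term; age itself must be included as well (e.g., by typing c.age c.age#c.age).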
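A minimal sketch of the point above. It assumes the nlsw88 example dataset that ships with Stata, chosen only because it contains a variable age; using wage as the dependent variable is an arbitrary choice, and the scalar names b_fv and b_gen are hypothetical. The quadratic is fitted once with factor-variable notation and once with a squared term built beforehand via generate, so the two estimates of the squared term should agree.

```stata
* Assumption: nlsw88 (shipped with Stata) stands in for the reader's own data
sysuse nlsw88, clear

* Quadratic age effect "on the fly": age plus age squared
regress wage c.age c.age#c.age
scalar b_fv = _b[c.age#c.age]

* The same model with a squared term built beforehand via generate
generate age2 = age^2
regress wage age age2
scalar b_gen = _b[age2]

* Relative difference between the two estimates of the squared term
display reldif(b_fv, b_gen)
```

The factor-variable version leaves the dataset untouched, whereas the generate version adds age2 permanently.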

i.occupation

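will create a series of dummy variables from the variable "occupation", one for each category except the base category, which by default is the category with the lowest value. These are "virtual" variables; nothing is added to the dataset.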
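The sketch below compares i.occupation with dummies created beforehand by tabulate with the generate() option. It assumes the nlsw88 example data, which happens to contain an occupation variable; the stub name d_occ and the scalar names are made up. When all dummies plus a constant are entered, regress drops one dummy as collinear, so both regressions should produce the same fit on the same sample.

```stata
* Assumption: nlsw88 (shipped with Stata) stands in for the reader's own data
sysuse nlsw88, clear

* Dummies for occupation on the fly; the lowest value is the base by default
regress wage i.occupation
scalar r2_fv = e(r2)
scalar n_fv = e(N)

* The same dummies created by hand (hypothetical stub name d_occ);
* restrict to non-missing occupation so the estimation samples match
tabulate occupation, generate(d_occ)
regress wage d_occ* if !missing(occupation)
scalar r2_gen = e(r2)
scalar n_gen = e(N)
```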

ib2.occupation

will create a series of dummy variables from "occupation" with the category whose value is 2 as base category. In contrast,

ib(#2).occupation

will create a series of dummy variables from "occupation" with the second category (in ascending order of values, whatever its value is) as base category.
The "ib#." prefix (where # is a value) and the "ib(##)." prefix (where # is a position) can also be used in the more complex terms that follow.
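To make the difference between a value and a position visible, this sketch assumes the nlsw88 example data and recodes race (coded 1, 2, 3) into a hypothetical variable race10 coded 10, 20, 30. There, ib20. names the base by its value and ib(#2). by its position, and both end up selecting the category 20, so the coefficient for category 10 should be identical in the two models.

```stata
* Assumption: nlsw88 (shipped with Stata) stands in for the reader's own data
sysuse nlsw88, clear

* Hypothetical recode: race is coded 1, 2, 3; race10 is coded 10, 20, 30
generate race10 = 10*race

* ib20.   -> base is the category whose value is 20
regress wage ib20.race10
scalar b_value = _b[10.race10]

* ib(#2). -> base is the second category in ascending order, again 20 here
regress wage ib(#2).race10
scalar b_pos = _b[10.race10]
```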

i.occupation#i.gender

will create a series of interaction terms for "occupation" and "gender", both treated as categorical variables, i.e., indicators for the combinations of occupation and gender categories.
Within an interaction, "i." may be omitted: Stata treats variables without a prefix as categorical, so occupation#gender is the same as i.occupation#i.gender.
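A sketch of the equivalence just described. Since nlsw88 (the Stata example dataset assumed here) has no gender variable, the dummy collgrad (college graduate, 0/1) stands in for gender and race (three categories) for occupation; the scalar names are hypothetical. The two specifications should fit identically because the "i." prefix is implied within an interaction.

```stata
* Assumption: nlsw88 ships with Stata; collgrad and race are stand-ins
sysuse nlsw88, clear

* Interaction with explicit "i." prefixes
regress wage i.race#i.collgrad
scalar r2_prefix = e(r2)

* Without prefixes: inside an interaction both are categorical anyway
regress wage race#collgrad
scalar r2_noprefix = e(r2)
```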

occupation##gender

is an abbreviation of i.occupation i.gender occupation#gender; i.e., a series of dummy variables is built from occupation and from gender, and on top of that a set of interaction terms is created.
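The following sketch checks the abbreviation on the nlsw88 example data, again assuming race as a stand-in for occupation and collgrad for gender (scalar names are hypothetical). If ## is merely shorthand, both models should have the same number of estimated terms and the same R-squared.

```stata
* Assumption: nlsw88 ships with Stata; race and collgrad are stand-ins
sysuse nlsw88, clear

* Short form: main effects plus interaction
regress wage race##collgrad
scalar r2_short = e(r2)
scalar df_short = e(df_m)

* Long form written out term by term
regress wage i.race i.collgrad i.race#i.collgrad
scalar r2_long = e(r2)
scalar df_long = e(df_m)
```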

c.age#gender

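will build interaction terms of age with gender (gender is treated as categorical, since it has no prefix). With two gender categories, there will normally be two terms: the age slope for men and the age slope for women. If age is also included in the equation, one of these terms will be omitted from the estimation, because the two terms together add up exactly to age and are thus perfectly collinear with it.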
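A sketch of this behavior, assuming the nlsw88 example data with collgrad (0/1) as a stand-in for gender; the scalar names are hypothetical. The first model contains the two group-specific age slopes; the second adds age itself, so one slope term is omitted. The number of estimated terms and the fit should stay the same, since both models span the same columns.

```stata
* Assumption: nlsw88 ships with Stata; collgrad stands in for gender
sysuse nlsw88, clear

* (a) one age slope per collgrad category, common intercept
regress wage c.age#i.collgrad
scalar r2_sep = e(r2)
scalar k_sep = e(df_m)

* (b) age added: one of the two slope terms is now collinear and omitted
regress wage c.age c.age#i.collgrad
scalar r2_both = e(r2)
scalar k_both = e(df_m)
```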

© W. Ludwig-Mayerhofer, Stata Guide | Last update: 02 Aug 2015