Internet Guide to Stata

Logistic regression and related models

Logistic regression models deal with categorical dependent variables. Depending on the number of categories and on whether or not these categories are ordered, different models are available.

Model overview

Binary logistic regression

Example with variable "vote" (yes/no) as the dependent variable:

logit vote age education gender
logistic vote age education gender
logit vote age education gender, or

The first command reports the model estimates as logit coefficients; the second and third commands report what some people call "effect coefficients", i.e. odds ratios: the multiplicative effect each independent variable has on the odds.
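The link between the two kinds of output can be checked directly, since an odds ratio is simply the exponentiated logit coefficient. The following is a minimal sketch, not the vote example itself: it assumes the auto data shipped with Stata (sysuse auto), and the variables foreign, mpg and weight belong to that data set.

```stata
* Assumption: the shipped "auto" data stand in for the vote example
sysuse auto, clear

* Logit coefficients (changes in the log odds)
logit foreign mpg weight

* Exponentiating a coefficient by hand gives the corresponding odds ratio
display exp(_b[mpg])

* Same model, now reported as odds ratios
logit foreign mpg weight, or
```

The value displayed by hand should match the odds ratio for mpg in the last table.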

Multinomial logistic regression

With Stata's mlogit command, you may estimate the influence of variables on a dependent variable with several unordered categories (such as "Brand A", "Brand B", "Brand C", "Brand D"). Note that if these categories are ordered (such as in statements like "strongly agree" ... "strongly disagree"), an ordered logistic regression model should usually be preferred.

Example

mlogit brand age sex class, baseoutcome(2) rrr

The option baseoutcome is necessary only if you wish to depart from Stata's default base category, i.e., the most frequent category. Another option is rrr, which causes Stata to display exponentiated coefficients (relative-risk ratios, which are read like odds ratios relative to the base category) and the associated confidence intervals instead of the logit coefficients.
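To see what the two options do, it may help to run the same model with and without them. This is only a sketch under assumptions: it uses the sysdsn1 example data from the Stata website (webuse needs internet access) instead of the hypothetical brand data, with insure (three categories, coded 1 to 3) as the dependent variable.

```stata
* Assumption: example data from the Stata website, not the brand data
webuse sysdsn1, clear

* Default: the most frequent category of insure is the base outcome
mlogit insure age male nonwhite

* Category coded 2 as base outcome, results as relative-risk ratios
mlogit insure age male nonwhite, baseoutcome(2) rrr
```

Only the reference point and the display change between the two runs; the fitted model and its log-likelihood are the same.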

Ordered logistic regression

Stata offers several possibilities to analyze an ordered dependent variable, say, an attitude towards abortion. The most common model is based on cumulative logits and is estimated with ologit:

Example

ologit abortion age sex class, or

Option or will again report the effects as odds ratios.

Probit models

Probit models are alternatives to logistic regression models (or logit models). The commands for the binary, multinomial and ordered case go like this:

probit vote age education gender
mprobit brand age sex class, baseoutcome(2)
oprobit abortion age sex class

Interpretation of effects with "margins"

Stata can express the effects of independent variables on the outcome in terms of probabilities, both directly (predicted probabilities) and as marginal effects (changes in probability).

Margins for models with binary dependent variables

margins sex                      Margins for a categorical variable
margins, at(age=(10(10)80))      Margins for a metric variable
margins, dydx(_all) atmeans      Marginal effects of all independent variables at the means of the other covariates
margins, dydx(_all)              Average marginal effects of all covariates
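A minimal sketch of these commands after a binary logit model. It assumes the lbw example data from the Stata website (webuse needs internet access); low, age, lwt and smoke are variables of that data set, not of the examples above. Note that margins accepts a categorical variable only if it entered the model in factor-variable notation (here i.smoke).

```stata
* Assumption: example data from the Stata website
webuse lbw, clear

* smoke enters as a factor variable so that margins can treat it as categorical
logit low age lwt i.smoke

* Predicted probabilities for each category of smoke
margins smoke

* Predicted probabilities at selected values of a metric variable
margins, at(age=(15(5)40))

* Marginal effects with the other covariates held at their means
margins, dydx(_all) atmeans

* Average marginal effects
margins, dydx(_all)
```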

Margins for dependent variables with more than two categories

Margins are particularly important in the case of the multinomial model, as the regression coefficients may be very misleading. They must be obtained separately for each category of the dependent variable. This holds true for the ordinal model as well.

To achieve this, you can use all the commands described above, just adding an option indicating the category for which the margins are to be computed. There are two ways, which I will describe for the simplest case, a categorical independent variable:

margins sex, predict(outcome(#3))    Margins for the third category (whatever its actual value)
margins sex, predict(outcome(3))     Margins for the category that is coded as "3"
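The two variants can be tried after a multinomial model as follows. Again this is a sketch under assumptions: it uses the sysdsn1 example data from the Stata website rather than the examples above, with male declared as a factor variable. In these data the categories of insure are coded 1, 2 and 3, so both commands happen to address the same category; with a variable coded, say, 0, 1 and 2 they would not.

```stata
* Assumption: example data from the Stata website
webuse sysdsn1, clear

mlogit insure age i.male nonwhite

* Third category in sort order, whatever its code
margins male, predict(outcome(#3))

* Category whose code is 3
margins male, predict(outcome(3))
```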

Tests of significance

The significance tests on the coefficients based on the z statistic are not considered the best available. A superior test is based on the likelihood ratio (LR) statistic. Unfortunately, its computation is a bit tedious. You have to estimate your model and store the estimates first, then estimate a constrained model (e.g. a model with one parameter set to zero, or actually a model with any constraints you like) and finally perform an LR test on both models. The procedure is as follows:

(m)(o)logit ...           Estimation of first model
estimates store anyname   Estimation results are stored under the name "anyname"
(m)(o)logit ...           Estimation of model with constraints
lrtest anyname .          Performing the LR test; note the dot at the end, which indicates that the last model estimated is to be tested against model "anyname"

Of course, you may estimate several models, store the estimates (under different names) and test any models you like afterwards. Make sure that all models are estimated on the same cases and that one model is always nested within the other.
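Put together, the procedure might look like the following sketch. It assumes the lbw example data from the Stata website (webuse needs internet access); the model and the name "full" are chosen for illustration only. The constraint tested here is that the effect of age is zero, which is imposed simply by dropping age from the model.

```stata
* Assumption: example data from the Stata website
webuse lbw, clear

* First (unconstrained) model
logit low age lwt i.smoke
estimates store full

* Constrained model: age is dropped, i.e. its coefficient is set to zero
logit low lwt i.smoke

* LR test of the last model against the stored model
lrtest full .
```

The data contain no missing values on these variables, so both models are based on the same cases.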

Another way would be simply to compute the LR test "by hand" (or rather by brain) using the log-likelihoods from the Stata output.
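If you prefer to let Stata do the arithmetic, the log-likelihoods are available as e(ll) after estimation. The sketch below assumes the auto data shipped with Stata; the scalar names are my own. The LR statistic is twice the difference between the log-likelihoods of the unconstrained and the constrained model, and its degrees of freedom equal the number of constraints (here one).

```stata
* Assumption: the shipped "auto" data serve as example data
sysuse auto, clear

* Unconstrained model
logit foreign mpg weight
scalar ll_full = e(ll)

* Constrained model (mpg dropped)
logit foreign weight
scalar ll_restr = e(ll)

* LR statistic and its p-value from the chi-square distribution with 1 df
scalar lrstat = 2 * (ll_full - ll_restr)
display "LR chi2(1) = " lrstat "   p = " chi2tail(1, lrstat)
```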

Measures of fit

For many purposes, Stata's output concerning overall model fit is sufficient. Both the model chi-square (i.e., the LR test for the current model compared to the null model) and McFadden's Pseudo R-square are included in the standard output.
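Both quantities are also stored among the estimation results, which is handy if you want to use them in further computations. A small sketch, again assuming the auto data shipped with Stata:

```stata
* Assumption: the shipped "auto" data serve as example data
sysuse auto, clear
logit foreign mpg weight

* Model chi-square (LR test against the null model) and its p-value
display e(chi2) "   " e(p)

* McFadden's pseudo R-square as shown in the output header
display e(r2_p)

* The same quantity computed from the log-likelihoods of the
* current model and the null model
display 1 - e(ll)/e(ll_0)
```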

A number of additional statistics are available from the fitstat package by J. Scott Long and Jeremy Freese. This package may be installed as follows:

ssc install fitstat

See help fitstat (after installation) for more details.

© W. Ludwig-Mayerhofer, Stata Guide | Last update: 06 Jan 2013