Logistic Regression and Related Models
Logistic regression models deal with categorical dependent variables. Depending on the number of categories and on whether or not these categories are ordered, different models are available.
Binary logistic regression
Here are three examples with variable "vote" (yes/no) as the dependent variable:
logit vote age education gender
logistic vote age education gender
logit vote age education gender, or
The first command will produce the model estimates in terms of logit coefficients; the second and third commands will yield what some people call "effect coefficients", i.e., odds ratios, which express the multiplicative effect the independent variables have on the odds.
Alternatively, you may write
logistic vote age education gender
logit
Typed without any arguments, logit will "translate" the immediately preceding model (with effect coefficients) into a model with logit coefficients.
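The link between the two kinds of output can be checked directly: an odds ratio is the exponentiated logit coefficient. The following minimal sketch assumes the auto data shipped with Stata, with foreign as a stand-in binary dependent variable; the dataset and variable names are chosen for illustration only and are unrelated to the "vote" example.

```stata
* Illustration data shipped with Stata (an assumption, not the "vote" data)
sysuse auto, clear

* Model in terms of logit coefficients
logit foreign mpg weight

* Replay the same model as odds ratios, without re-estimating it
logit, or

* An odds ratio is the exponentiated logit coefficient
display exp(_b[mpg])
```

The value shown by display should match the odds ratio reported for mpg in the replayed table.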
Multinomial logistic regression
With Stata procedure mlogit, you may estimate the influence of variables on a dependent variable with several categories (such as "Brand A", "Brand B", "Brand C", "Brand D"). Note that if these categories are ordered (such as in statements like "strongly agree" ... "strongly disagree"), an ordered logistic regression model should usually be preferred.
|mlogit brand age sex class, baseoutcome(2) rrr|
The option baseoutcome is required only if you wish to depart from Stata's default, i.e., the most frequent category. Another option is rrr, which causes Stata to display the odds ratios (and the associated confidence intervals) instead of the logit coefficients. Note that for some strange reason the odds ratios are called "relative-risk ratios" here (hence the name of the option), but the formula in the manual shows that it's all about the odds, as you might expect.
Ordered logistic regression
Actually, Stata offers several possibilities to analyze an ordered dependent variable, say, an attitude towards abortion. The most common model is based on cumulative logits and goes like this:
|ologit abortion age sex class, or|
The option or will again produce effects in terms of odds ratios.
Probit models
Probit models are alternatives to logistic regression models (or logit models). The commands for the binary, multinomial and ordered case go like this:
|probit vote age education gender|
|mprobit brand age sex class, baseoutcome(2)|
|oprobit abortion age sex class|
Interpretation of effects with "margins"
Stata can compute the effects of independent variables on the outcome in terms of probabilities, either literally (predicted probabilities) or as marginal effects (predicted changes of probability).
Margins for models with binary dependent variables
|margins sex||Margins for a categorical variable (which must have been entered as a factor variable, e.g. i.sex, in the model)|
|margins, at(age=(10(10)80))||Margins for a metric variable|
|margins, dydx(_all) atmeans||Marginal effects of all independent variables at the mean of other covariates|
|margins, dydx(_all)||Average marginal effects of all covariates|
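As a runnable illustration of the four commands in the table, here is a minimal sketch. It assumes the nlsw88 data shipped with Stata, with union as the binary dependent variable and collgrad and age as covariates; none of these names come from the examples above. The categorical covariate is entered with the i. prefix, because margins only accepts a variable name in front of the comma if that variable is a factor variable in the model.

```stata
* Illustration data shipped with Stata (an assumption for this sketch)
sysuse nlsw88, clear

* i.collgrad declares collgrad as a factor variable
logit union i.collgrad age

* Predicted probabilities for each category of collgrad
margins collgrad

* Predicted probabilities at selected values of a metric variable
margins, at(age=(35(5)45))

* Marginal effects with the other covariates held at their means
margins, dydx(_all) atmeans

* Average marginal effects
margins, dydx(_all)
```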
Margins for dependent variables with more than two categories
Margins are particularly important in the case of the multinomial model, as the regression coefficients may be very misleading. They must be obtained separately for each category of the dependent variable. This holds true for the ordinal model as well.
To achieve this, you can use all the commands described above, just adding an option indicating the category for which the margins are to be computed. There are two ways to achieve this which I will describe for the simplest case, a categorical independent variable:
|margins sex, predict(outcome(#3))||Margins for the third category (whatever its actual value)|
|margins sex, predict(outcome(3))||Margins for the category that is coded as "3"|
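The difference between the two notations can be tried out with a small sketch. It assumes the nlsw88 data shipped with Stata and treats race as the dependent variable, on the assumption that it is coded 1, 2 and 3; with such a coding the third category and the category coded "3" coincide, so both commands should report the same margins. With codes such as 0, 1, 2 they would refer to different categories (and outcome(3) would not exist).

```stata
* Illustration data shipped with Stata (an assumption for this sketch)
sysuse nlsw88, clear

* Multinomial model; race is assumed to be coded 1, 2, 3
mlogit race i.collgrad age

* Margins for the third category, whatever its actual value
margins collgrad, predict(outcome(#3))

* Margins for the category that is coded as 3
margins collgrad, predict(outcome(3))
```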
Tests of significance
The significance tests on the coefficients based on the z statistic are not considered the best available. A superior test is based on the likelihood ratio statistic. Unfortunately, computation is a bit tedious (unless you resort to the procedure lrdrop1 described below, which has its own drawbacks): You have to save the estimates from your model first, then compute a constrained model (e.g. a model with one parameter set to zero, or actually a model with any constraints you like) and finally perform an LR test on both models. The procedure is as follows:
||Estimation of first model|
||Estimates are stored in matrix "anyname"|
||Estimation of model with constraints|
||Performing the LR test; note the dot at the end indicating that the last model estimated is to be tested against model "anyname".|
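The four steps listed above might look as follows. This is only a sketch: the auto data shipped with Stata and the two model specifications are assumptions chosen for illustration, while estimates store and lrtest are the standard Stata commands for storing results and for the LR test.

```stata
* Illustration data shipped with Stata (an assumption for this sketch)
sysuse auto, clear

* Estimation of first model
logit foreign mpg weight

* Estimates are stored under the name "anyname"
estimates store anyname

* Estimation of model with constraints (here: weight dropped)
logit foreign mpg

* LR test; the dot stands for the last model estimated
lrtest anyname .
```

Because a single parameter is dropped, the test has one degree of freedom.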
Of course, you may estimate several models, store the estimates (under different names) and test any models you like afterwards. Make sure that the models you estimate are based on the same cases (and hence the same number of cases) and that one model is always nested within the other.
Another way would be simply to compute the LR test "by hand" (or rather by brain) using the log-likelihoods from the Stata output.
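The computation "by hand" amounts to taking twice the difference between the two log-likelihoods and comparing it with a chi-square distribution whose degrees of freedom equal the number of constraints. A minimal sketch, again assuming the auto data and two illustrative models (the scalar names are made up for this example):

```stata
* Illustration data shipped with Stata (an assumption for this sketch)
sysuse auto, clear

* Log-likelihood of the unconstrained model
quietly logit foreign mpg weight
scalar ll_full = e(ll)

* Log-likelihood of the constrained model (weight dropped)
quietly logit foreign mpg
scalar ll_restr = e(ll)

* LR statistic: twice the difference of the log-likelihoods
scalar lr = 2*(ll_full - ll_restr)

* p-value from the chi-square distribution with 1 degree of freedom
display "LR chi2(1) = " scalar(lr) "   p = " chi2tail(1, scalar(lr))
```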
The ado file lrdrop1, which can be installed via ssc install lrdrop1, computes LR tests for all variables in the model. However, it was written at the end of the last millennium and does not support automatically created factor (dummy) variables or interaction effects. If you create the respective variables prior to your model step, though, it will work out fine in most cases. Note that multinomial logit commands must be preceded by the prefix version 10: (some lower versions will work as well) in order for lrdrop1 not to abort with an error message.
Measures of fit
For many purposes, Stata's output concerning overall model fit is sufficient. Both the model chi-square (i.e., the LR test for the current model compared to the null model) and McFadden's Pseudo R-square are included in the standard output.
A number of additional statistics are available from the fitstat package by J. Scott Long and Jeremy Freese. This package may be installed as follows:
|ssc install fitstat|
Type help fitstat (after installation) for more details.
© W. Ludwig-Mayerhofer, Stata Guide | Last update: 19 Jan 2020