Internet Guide to Stata
Logistic regression models deal with categorical dependent variables. Depending on the number of categories and on whether or not these categories are ordered, different models are available.
Example with the variable "vote" (yes/no, coded 0 = no and 1 = yes) as the dependent variable:
logit vote age education gender
logistic vote age education gender
logit vote age education gender, or
The first command reports the model estimates as logit coefficients. The second and third commands report odds ratios (the exponentiated logit coefficients), which some people call "effect coefficients", i.e., the multiplicative effect the independent variables have on the odds.
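For readers who like formulas, here is a minimal sketch of how the two kinds of coefficients relate, assuming the standard binary logit with p = P(vote = yes) and predictors x_1, ..., x_K (the symbols are illustrative and do not appear in Stata's output):

```latex
\ln\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_K x_K ,
\qquad
\frac{\mathrm{odds}(x_k + 1)}{\mathrm{odds}(x_k)} = e^{\beta_k}
```

For example, a hypothetical logit coefficient of 0.5 corresponds to an odds ratio of e^0.5 ≈ 1.65: each additional unit of x_k multiplies the odds of voting yes by about 1.65, with the other variables held constant.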
With Stata's mlogit command, you can estimate the influence of variables on a dependent variable with several unordered categories (such as "Brand A", "Brand B", "Brand C", "Brand D"). Note that if these categories are ordered (as with responses ranging from "strongly agree" to "strongly disagree"), an ordered logistic regression model should usually be preferred.
Example
mlogit brand age sex class, baseoutcome(2) rrr
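The baseoutcome option is needed only if you wish to depart from Stata's default base category, i.e., the most frequent category; baseoutcome(2) makes the category coded 2 the reference against which all other categories are compared. The rrr option causes Stata to display relative-risk ratios (the exponentiated coefficients, often interpreted as odds ratios relative to the base category) and their confidence intervals instead of the logit coefficients.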
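A sketch of what mlogit estimates, assuming outcomes j = 1, ..., J with base outcome b and a covariate vector x that includes the constant (the notation is illustrative): every non-base outcome gets its own coefficient vector, while the coefficients of the base outcome are fixed at zero.

```latex
\ln\frac{P(y = j \mid \mathbf{x})}{P(y = b \mid \mathbf{x})} = \mathbf{x}'\boldsymbol{\beta}_j ,
\qquad \boldsymbol{\beta}_b = \mathbf{0},
\qquad \mathrm{RRR}_{jk} = e^{\beta_{jk}}
```

So a hypothetical RRR of 2 for age in the equation for "Brand A" would mean that one additional year of age doubles the ratio P(Brand A)/P(base brand); it says nothing directly about P(Brand A) itself.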
Actually, Stata offers several possibilities to analyze an ordered dependent variable, say, an attitude towards abortion. The most common model is based on cumulative logits and goes like this:
Example
ologit abortion age sex class, or
The or option again reports the effects as odds ratios.
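Written out, the cumulative logit model behind ologit looks roughly like this, assuming ordered categories j = 1, ..., J, cutpoints κ_1 < ... < κ_{J-1} (labelled cut1, cut2, ... in the output) and a linear predictor x'β without a constant (notation illustrative):

```latex
\ln\frac{P(y > j \mid \mathbf{x})}{P(y \le j \mid \mathbf{x})} = \mathbf{x}'\boldsymbol{\beta} - \kappa_j ,
\qquad j = 1, \dots, J-1
```

Hence e^{β_k}, the number reported with the or option, multiplies the odds of being in a category above j rather than in j or below, and it does so by the same factor for every j. This is the proportional odds (parallel regression) assumption of the model.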
Probit models are alternatives to logistic regression models (or logit models). The commands for the binary, multinomial and ordered cases go like this:
probit vote age education gender
mprobit brand age sex class, baseoutcome(2)
oprobit abortion age sex class
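The probit counterparts replace the logistic distribution with the standard normal cumulative distribution function Φ. For the binary and the ordered case, in the same illustrative notation as for the logit models (cutpoints κ_j):

```latex
P(y = 1 \mid \mathbf{x}) = \Phi(\mathbf{x}'\boldsymbol{\beta}),
\qquad
P(y \le j \mid \mathbf{x}) = \Phi(\kappa_j - \mathbf{x}'\boldsymbol{\beta})
```

Because Φ is not the logistic function, probit coefficients cannot be exponentiated into odds ratios; they are usually interpreted via predicted probabilities and marginal effects.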
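With the margins command, Stata can express the effects of the independent variables in terms of probabilities, both directly (predicted probabilities) and by way of marginal effects (changes in probability). For margins to treat a variable as categorical, it must have been entered with the factor-variable prefix i. in the estimation command (e.g., i.sex instead of sex).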
margins sex | Margins (predicted probabilities) for each category of a categorical variable
margins, at(age=(10(10)80)) | Margins for a metric variable, here at ages 10, 20, ..., 80
margins, dydx(_all) atmeans | Marginal effects of all independent variables, evaluated at the means of all covariates
margins, dydx(_all) | Average marginal effects of all covariates (averaged over all observations)
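For the binary logit, the marginal effect of a metric covariate x_k, and the two ways margins can summarize it, are in a minimal sketch (Λ is the logistic function, x̄ the vector of sample means, N the size of the estimation sample; notation illustrative):

```latex
\begin{gathered}
\Lambda(z) = \frac{e^{z}}{1 + e^{z}},
\qquad
\frac{\partial P(y = 1 \mid \mathbf{x})}{\partial x_k}
  = \Lambda(\mathbf{x}'\boldsymbol{\beta})\,\bigl[1 - \Lambda(\mathbf{x}'\boldsymbol{\beta})\bigr]\,\beta_k
\\[4pt]
\text{atmeans: } \Lambda(\bar{\mathbf{x}}'\boldsymbol{\beta})\,\bigl[1 - \Lambda(\bar{\mathbf{x}}'\boldsymbol{\beta})\bigr]\,\beta_k ,
\qquad
\text{average: } \frac{1}{N}\sum_{i=1}^{N} \Lambda(\mathbf{x}_i'\boldsymbol{\beta})\,\bigl[1 - \Lambda(\mathbf{x}_i'\boldsymbol{\beta})\bigr]\,\beta_k
\end{gathered}
```

Since Λ(1 − Λ) reaches its maximum of 0.25 at a probability of 0.5, a given coefficient changes the probability most for cases whose predicted probability is near one half. For variables entered as factor variables (i.), margins, dydx() reports the discrete change from the base category instead of a derivative.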
Margins are particularly important in the case of the multinomial model, as the regression coefficients may be very misleading. They must be obtained separately for each category of the dependent variable. This holds true for the ordinal model as well.
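Why the coefficients can mislead becomes clear from the derivatives. A sketch, assuming the multinomial logit with P_j = P(y = j | x), and for the ordered logit the logistic density λ and cutpoints κ_j with κ_0 = −∞ and κ_J = +∞ (notation illustrative):

```latex
\begin{gathered}
P_j = \frac{e^{\mathbf{x}'\boldsymbol{\beta}_j}}{\sum_{m=1}^{J} e^{\mathbf{x}'\boldsymbol{\beta}_m}},
\qquad
\frac{\partial P_j}{\partial x_k} = P_j \Bigl(\beta_{jk} - \sum_{m=1}^{J} P_m \beta_{mk}\Bigr)
\\[4pt]
\frac{\partial P(y = j \mid \mathbf{x})}{\partial x_k}
  = \beta_k \bigl[\lambda(\kappa_{j-1} - \mathbf{x}'\boldsymbol{\beta}) - \lambda(\kappa_j - \mathbf{x}'\boldsymbol{\beta})\bigr]
\end{gathered}
```

In the multinomial model, the effect on P_j has the sign of β_jk minus a probability-weighted average of all outcomes' coefficients, so it can even be opposite in sign to β_jk. In the ordinal model, only the effects on the lowest and the highest category always have the sign of −β_k and β_k respectively; for a middle category the sign depends on the two density terms and can change with x.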
To achieve this, you can use all the commands described above, just adding an option indicating the category for which the margins are to be computed. There are two ways, which I will describe for the simplest case, a categorical independent variable:
margins sex, predict(outcome(#3)) | Margins for the third category (whatever its actual value)
margins sex, predict(outcome(3)) | Margins for the category that is coded as "3"
The significance tests on the coefficients based on the z statistic (Wald tests) are not considered the best available. A generally preferred test is based on the likelihood ratio (LR) statistic. Unfortunately, the computation is a bit tedious: you have to store the estimates from your model first, then estimate a constrained model (e.g., a model with one parameter set to zero, or indeed a model with any constraints you like) and finally perform an LR test comparing the two models. The procedure is as follows:
(m)(o)logit ... | Estimation of the first model (with logit, mlogit or ologit)
estimates store anyname | Stores the estimation results under the name "anyname"
(m)(o)logit ... | Estimation of the model with constraints
lrtest anyname . | Performs the LR test; note the dot at the end, which stands for the most recently estimated model, so that the last model is tested against model "anyname"
Of course, you may estimate several models, store their estimates under different names and test any pair of them afterwards. Make sure that the models are estimated on exactly the same cases and that one model is always nested within the other.
Alternatively, you can simply compute the LR test "by hand" (or rather in your head) from the log-likelihoods reported in the Stata output.
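The statistic behind lrtest, as a sketch in illustrative notation (ln L_u for the unconstrained model, ln L_r for the constrained one, q the number of constraints); if the constraints hold, it is approximately chi-square distributed:

```latex
LR = 2\,(\ln L_u - \ln L_r) \;\overset{a}{\sim}\; \chi^2(q)
```

With hypothetical values ln L_u = -100.0 and ln L_r = -103.2 for a model with one parameter set to zero (q = 1), LR = 2 × 3.2 = 6.4. This exceeds 3.84, the 5% critical value of χ²(1), so the constraint would be rejected at the 5% level.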
For many purposes, Stata's output concerning overall model fit is sufficient. Both the model chi-square (i.e., the LR test of the current model against the intercept-only null model) and McFadden's Pseudo R-square are included in the standard output.
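Both statistics are simple functions of the log-likelihoods of the fitted model (ln L_M) and of the intercept-only null model (ln L_0); a sketch in illustrative notation:

```latex
\chi^2_{\text{model}} = 2\,(\ln L_M - \ln L_0),
\qquad
R^2_{\text{McF}} = 1 - \frac{\ln L_M}{\ln L_0}
```

With hypothetical values ln L_0 = -120 and ln L_M = -100, the model chi-square is 40 and McFadden's R-square is 1 - 100/120 ≈ 0.17.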
A number of additional statistics are available from the fitstat package by J. Scott Long and Jeremy Freese. This package may be installed as follows:
ssc install fitstat
See help fitstat (after installation) for more details.
© W. Ludwig-Mayerhofer, Stata Guide | Last update: 06 Jan 2013