Linear Regression and Some Alternatives

Simple example

regress DEPVAR INDVAR1 INDVAR2 INDVAR3, beta


Keyword beta is required if you want to obtain standardized regression coefficients.
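
To see what a standardized coefficient is, you can reproduce it by hand: it is the raw coefficient multiplied by the standard deviation of the predictor and divided by the standard deviation of the dependent variable. The following is only a sketch; it assumes the auto data set that ships with Stata, and the scalar names sd_x and sd_y are made up for the illustration.

sysuse auto, clear
regress price mpg weight, beta
* standard deviations, restricted to the estimation sample
quietly summarize mpg if e(sample)
scalar sd_x = r(sd)
quietly summarize price if e(sample)
scalar sd_y = r(sd)
* should match the Beta column for mpg in the regression output
display _b[mpg] * sd_x / sd_y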

Example with estimation of robust (Huber-White) standard errors

regress DEPVAR INDVAR1 INDVAR2 INDVAR3, beta robust
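
A concrete version of this command might look as follows. This is a sketch that assumes the auto data set shipped with Stata; the choice of variables is arbitrary. Note that robust is shorthand for vce(robust), and that with beta the output shows standardized coefficients in place of confidence intervals.

sysuse auto, clear
* robust standard errors plus standardized coefficients
regress price mpg weight foreign, beta robust
* same robust standard errors, but with confidence intervals in the output
regress price mpg weight foreign, vce(robust)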


Regression diagnostics and much else can be obtained after estimation of a regression model. Note that some statistics and plots will not work with survey data, i.e. if the svy prefix (see complex samples) was used. Here are some useful post-estimation commands:

estat hettest performs the Breusch-Pagan/Cook-Weisberg test for heteroskedasticity.
estat vif displays the variance inflation factor (VIF) and the tolerance (1/VIF) for the independent variables.
rvfplot will display a plot of residuals vs. fitted values (helpful for assessing heteroskedasticity).
avplots will produce a tableau of added variable plots for all independent variables.
avplot experience will display an added variable plot for variable "experience".
avplot 3.group will display an added variable plot for the dummy variable that represents the category coded "3" of variable "group" (not the third value of this variable).
cprplot experience will produce a component plus residual plot for variable "experience". Options for this plot are available, such as "lowess" or "mspline".
Note that an "augmented component plus residual plot" is available with command acprplot. It is said to do better in detecting non-linearity.
predict cd1, cooksd saves the values of Cook's d in variable "cd1".
dfbeta computes dfbeta for all independent variables and stores the values in variables whose names are given in the output.
predict dfbe1, dfbeta(educ) saves the values of dfbeta for variable "educ" in variable "dfbe1".
estat ic displays the values of AIC and BIC in the output.
collin x1 x2 x3 produces additional statistics about collinearity, e.g., eigenvalues, condition number and the determinant of the correlation matrix.
Note that collin is an ado file which has to be downloaded (start with findit collin).
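
Put together, a short diagnostic session could look like the sketch below. It assumes the auto data set shipped with Stata; the variable names cd1 and dfbe1 are simply those used in the list above, and the cutoff 4/n for Cook's d is a common rule of thumb, not a fixed standard.

sysuse auto, clear
regress price mpg weight foreign
* heteroskedasticity test and collinearity statistics
estat hettest
estat vif
* residuals vs. fitted values, with a reference line at zero
rvfplot, yline(0)
* added variable plots and a component plus residual plot
avplots
cprplot mpg, lowess
* influence statistics saved as new variables
predict cd1, cooksd
predict dfbe1, dfbeta(mpg)
* information criteria
estat ic
* list observations with a large Cook's d (rule of thumb: above 4/n)
list make price cd1 if cd1 > 4/e(N) & !missing(cd1)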

Alternatives to the regress command

Two or more dependent variables

You may estimate models where two or more dependent variables are regressed on the same set of predictors. The advantage over a series of regressions with a single dependent variable is that you may test effects across regression equations. I cannot go into details here and will leave you just with the basic command:

mvreg depvar1 depvar2 = ivar1 ivar2 ivar3
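
To give an idea of what testing across equations means, here is a small sketch. It assumes the auto data set shipped with Stata, and the variables were picked for illustration only. Coefficients are addressed as [equation]variable, where the equation name is the name of the dependent variable.

sysuse auto, clear
mvreg headroom trunk = mpg weight
* is the effect of mpg the same in both equations?
test [headroom]mpg = [trunk]mpg
* is the effect of mpg zero in both equations jointly?
test [headroom]mpg [trunk]mpg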

You will not always want to use the same set of predictors, and in this case, a procedure called "seemingly unrelated regression" is the method of choice.

sureg (depvar1 ivar1 ivar2) (depvar2 ivar2 ivar3)
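
A sketch with actual variables, again assuming the auto data set shipped with Stata: each pair of parentheses holds one equation, with the dependent variable first. The option corr additionally displays the correlation of the residuals between equations together with a Breusch-Pagan test of independence.

sysuse auto, clear
sureg (price foreign mpg displacement) (weight foreign length), corr
* joint test of foreign across both equations
test [price]foreign [weight]foreign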

Ridge regression

Some people recommend "ridge regression", particularly if collinearity is high (many others do not recommend it!). If you want to give it a try, there is an ado file ridgereg which may be obtained via findit ridgereg.

© W. Ludwig-Mayerhofer, Stata Guide | Last update: 26 Feb 2018