Welcome to my revamped Stata Guide!
This guide is supposed to work as a brief "online help" for Stata for Windows that makes specific use of the possibilities of the internet. Its aim is to provide an intermediate road to learning Stata that hopefully is especially convenient for newbies (even though not for absolute beginners). Throughout, it is assumed that users have already mastered the statistical procedures I am dealing with, as no explanations of these are given. All you can learn here is how to put things into practice with Stata.
Please note also that this guide omits some parts of working with Stata; e.g., nothing is said about entering data. All in all, you will gain little in-depth knowledge about the software. This is mainly due to its richness and complexity. You will note that this guide by no means is small; yet, it covers only a tiny percentage of what Stata has to offer. If you consider that the handbook to Stata Release 16 consists of 30 volumes, most of them comprising several hundred pages (and one of them, the base reference manual, almost 3,000), you may become aware of how small my effort is.
My heartfelt thanks to David Peplow, who did a great job at re-designing this (and other) project(s).
Wolfgang Ludwig-Mayerhofer
How this guide works
The main goal of this guide is to give examples for the most common Stata procedures. Note the basic difference from the Stata help system, which often will present procedures as follows:
(STATA HELP SYSTEM:) alpha varlist [, options]
which means that "varlist" is to be replaced by a list of variables and "options" by the names of the specific options chosen (the brackets mean that options may be omitted). This guide will typically give an example -- instead of "varlist" you will find a list of variables, and you may also find one or several options that seem helpful to me, as in
(THIS GUIDE:) alpha trust1 trust2 trust3, i g(trust)
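For readers who want to try the pattern right away, here is a minimal sketch that should run as is. Two things in it are my assumptions and not part of the example above: it uses the auto dataset that ships with Stata (the variables trust1 to trust3 exist only in the illustration), and the name myscale for the new variable is arbitrary. The options are written out in full here; i and g are the abbreviations of item and generate.

```stata
* Assumption: use the example dataset shipped with Stata, so the command can run
sysuse auto, clear

* Help-system form:  alpha varlist [, options]
* "varlist" is replaced by three variables, "options" by item and generate()
* item        -> display item-test and item-rest correlations
* generate()  -> store the resulting scale in a new variable (name is arbitrary)
alpha price mpg weight, item generate(myscale)
```

These three variables are of course not a sensible scale; the point is only to show how the placeholders of the help system translate into an actual command.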
A note on different versions of Stata. As far as I could check, all of the examples I provide should work with Stata for Windows, version 14. They should also work with newer versions (currently, the latest version is 16), but new stuff from higher versions enters this guide only slowly (whenever this occurs, I will try to indicate it appropriately). Most of the stuff will also work with older versions (say, version 10 or even older), but I cannot guarantee this. (Stata regrettably is not very good at retrospectively documenting when particular changes have been introduced.)
How reliable is this guide? Well, apart from the odd typo or some unexplainable sloppiness on my part, almost everything you find here will work. The reason is quite simple: This guide arises from my own work; that is, it is primarily motivated by my own desire to put down what I have found out about Stata in order to retrieve it whenever needed. Publishing this stuff on the internet is just a way of sharing what I have learned. A possible downside is that what you find here may sometimes be incomplete or superficial, as an entry may have been written at a time when I did not yet have a full understanding of the matter under consideration.
All in all, please bear in mind: What you find here is the work of a single person who has many other duties to comply with. This guide is just a spin-off of the data analysis work I am doing and which is my main business – or rather should be, apart from dealing with the university administration, filling in forms, applying for this and that, commenting on the latest ideas of my department, my colleagues, my dean, other deans, the rector, the vice-rector, the vice-vice-rectors, and many others about how to render our university more up-to-date, designing new courses of study, preparing for meetings, going to meetings, thinking about the consequences of meetings, trying to figure out whether and when my university has already given me the money it has promised me, trying to figure out how much I will have to pay for my staff this year (last year, the administration started accounting in autumn, that is, after about three quarters of the money had already gone with me just guessing how much it might be), and so on. Therefore, there may be a lot that you will find wanting in this guide; this refers not only to content, but also to language (including typos) and design. Please accept my apologies. -- Postscript: This paragraph obviously was written during my very active years as a university professor. Regrettably, while I have more time now (i.e., since 2020), I have decided to use this time in more pleasant ways. In other words, my preferences have shifted somehow to other stuff (such as listening to or making music, reading, enjoying the sun, cooking, and a few others).
Links
Perhaps I will collect links at a later stage. For the time being, just use this page from Stata Corp. which provides links to helpful resources for learning Stata.
Acknowledgements
March 2024: This guide has now been online for 15 years, and finally someone, namely Ahmet Toprak from Santa Clara County, was kind enough to send me an email to point out some language mistakes he found in a paragraph in the entry about Stata output. Alas, I am sure that there are hundreds of attacks on the English language left, and everybody is invited to come forward with more hints of this sort.
2016: As mentioned in the introductory section, David Peplow undertook a complete re-design of this guide. In particular, he adapted the design to the age of the smartphone. He also wrote a program that helps me to adapt the guide to further enlargements (which, however, are extremely unlikely).
History
Note: Only 'major' changes (new keywords, sizable additions to entries that already exist) are reported here. I try to indicate the date of the most recent change at the bottom of each entry, but I am only moderately good at this. Minor corrections of typos, in particular, may thus go unnoticed. Note also that there may be a time lag (hopefully short) between the creation (or the changing) of a file and its upload.
July 2022
Very little has happened for almost two years (in contrast to my hopes from two years ago). Probably not much more is going to happen, as I'm retired now, with the consequence that I use Stata only occasionally and have little opportunity to learn something new which I could share.
September 2020
Added an entry on winsorizing and trimming. Slightly enlarged the entry on output.
July 2020
I am hoping to find time during the next weeks or months to check this guide for typos and other mistakes. No thorough changes are planned, even though I have some (as yet untested) ideas in mind about how to improve this guide. Oh, and Stata 16 has been out for quite some time, but nothing of the new stuff is considered here.
January 2020
This is just to show you that I'm still alive. But I almost never work on this (or any other) project, and only minor changes are made every now and then.
March 2019
Minor additions to the entries about univariate and bivariate charts.
February 2019
I've started a section "Elements of Programming". It is short, it's really only about "elements" and not about "real programming", and I do not plan to develop this much further. However, creating a section helps me (and others) not to overlook this topic.
December 2018
I'm still alive, even though you may not have noticed. Very rarely, I make some additions or amendments. For instance, I have just added a paragraph or two on line plots to the entry about twoway charts, and the entry about "Lines, Symbols, etc." now actually has a few paragraphs on marker symbols. But yes, all of this is very minor.
May 2018
Not much has happened since last spring. Stata version 15 has been out since last summer, but as yet this guide refers to version 14 at best. Nevertheless, a few minor additions are being made every now and then.
Spring 2017
Amendments are being made here and there. Most recently, I have added a new entry (at Basics) on accessing Stata results that are stored in memory. Also, the entry on confidence intervals with command ci has been updated to accommodate the changes introduced with version 14.
November 2016
Re-design of this guide.
August 2016
Added a few entries about the analysis of spatial data.
January 2016
Expanded the entry on string variables.
October 2015
Rearranged entries on graphs: Univariate and bivariate (twoway) graphs are now treated separately. The reason behind this is that the earlier entry had grown quite long; in other words, both entries now are somewhat enlarged (and updated) in comparison to previous versions.
August 2015
The entry on overlaying graphs has been expanded. It's still very small, but previously it was not even a stub, just a dummy.
July 2015
The information about multiple imputation has been expanded considerably. Most notably, whereas up to now there was only a brief entry about the analysis of multiply imputed data, now there is also a (somewhat larger) entry about the imputation step, including the preparatory steps.
June 2015
Introduced a new section about estimation in the "data analysis" part. This section does not contain much new stuff (with the exception of a few estimation commands for univariate statistics), but I think that it helps to better focus the attention on estimation issues and the many possibilities Stata offers. To achieve this, the entries on using survey design information (the svy command) and on multiple imputation have been moved to this new section, together with a few other entries.
Additionally, some other entries have been expanded or amended. This applies particularly to the section about data transformation, which has several new entries and has been rearranged.
April and May 2015
Minor updates, additions or corrections have been made to a few entries.
March 2015
Added a few extensions to the egen section of the entry on creating and modifying variables.
June 2014
Re-arranged this guide, creating an extra section for graphs, as things started to get very complex and very long. But as yet there is little that is new contentwise. All I have added are a few remarks about combining graphs. Also, I added a few sentences about line patterns. I really hope that more is going to come.
February 2014
Augmented the (previously extremely short) section about string variables in the entry on "data types" to accommodate the changes introduced in version 13.0, transforming it into what might now be called a very short section.
December 2013
Just started an entry on probability distributions (to be found under the heading of "functions").
November 2013
Yeah, I'm still alive. But currently I do not work much with Stata, and if I do, I rarely learn something new – and so there is little I can add to this guide. But I just augmented the entry about Stata output by a section that deals with the question of how to obtain elements stored in Stata's memory after estimation (and some other) procedures.
You may also wish to learn that I have acquired, and now work with, Stata 13.1. But as yet, nothing that is new to that version has entered this guide, and I am also happy to say that the Stata folks have refrained from implementing drastic changes concerning the appearance or the basic handling of Stata. So I think there is nothing, or at least not much, I have to change in my little guide at the moment.
June 2013
Added a section on (one specific type of) global macros, or "globals", to the entry "Working with Stata". Also reorganized and slightly expanded the section on graph options.
January 2013
Enlarged what up to now was the entry on the multinomial logit to include models for binary and ordinal variables. Also, more material on the postestimation phase is included now.
July 2012
Minor additions to entry about basic charts and graphs.
June 2012
Expanded the entry about data and storage types to include changes of storage type. Added entries about constraints and about count data models.
May 2012
Created an entry about data and storage types and an (as yet small) entry about collapsing and contracting datasets. Updated the entry on merging data to cover the new syntax available as of version 11.0 and added some information about appending datasets. Started an entry about parametric regression models for time-to-event data.
March 2012
Enlarged the entry about creating and modifying data. Added some clarifications to the entry on missing values.
February 2012
Enlarged the entry about life tables and other simple procedures for time-to-event data.
January 2012
Added entry on help, search and the like to section "Basics". Also, I am now working with Stata 12, and I will try to make you aware of the most important changes.
December 2011
New entries on factor variables and on multilevel models (currently for metric dependent variables only). Some additions to entries on generating variables and on crosstabulation.
September 2011
Added a small piece on packages for formatting Stata output (e.g., for LaTeX, HTML or Word) in the entry on output (section "Basics"). Created a new entry on estimation of confidence intervals (section "Data Analysis").
December 2010
Added a small entry on nonparametric tests.
November 2010
Slightly enlarged the section on crosstabulations to give a little bit more prominence to the tab2 command, and also to explain the firstonly option.
October 2010
Added a section about cumulative density plots (for empirical variables) to the entry about basic charts.
September 2010
Slightly expanded the section on EDA to explain how to influence the display of stem-and-leaf displays. Added a few words about the fre command to the entry about frequency tables.
August 2010
A small section with some basic commands for analyzing multiply imputed data sets with Stata 11 has been added. The entry on correlations has been slightly expanded.
June 2010
I have finally acquired Stata 11. I will try to accommodate this guide to any changes I encounter, but please have some patience. Presently, I notify readers of one major change, i.e. the fact that you may leave open the data window while proceeding with your work.
May 2010
This entry serves mainly to assure people that this is not a dead end. I did not do much to improve or enlarge this guide, but every now and then, small changes occur and mistakes of language are corrected (or so I hope). – Near the end of May, I added a very small entry about life table analysis.
April 2009
Every now and then, minor amendments and extensions are being made.
March 2009
What you find here has grown since late 2005. Now, I think enough stuff has accumulated to put it on the world wide web. Please note that this version is far from being satisfactory. This does not mean that it contains wrong or useless stuff; rather, it is not very comprehensive. It's really mostly for beginners.
About the Author
This page is a process initiated and maintained by
Prof. Dr. Wolfgang Ludwig-Mayerhofer
Universität Siegen / University of Siegen
Philosophische Fakultät – Soziologie / Faculty of Arts and Humanities – Sociology
57068 Siegen
Homepage at the University of Siegen
Last update: 30 Mar 2024