This guide is intended to serve as a brief "online help" for Stata for Windows that makes specific use of the possibilities of the internet. Its aim is to provide an intermediate road to learning Stata that will hopefully be especially convenient for newbies (though not for absolute beginners). Throughout, it is assumed that users have already mastered the statistical procedures I deal with, as no explanations of these are given. All you can learn here is how to put things into practice with Stata.
Please note also that this guide does not give a thorough introduction to the fundamentals of working with Stata for Windows, e.g., how to install the program, what the different "windows" are, how to set up a database, how exactly to execute commands from a do-file, etc. Of course, much of this will be mentioned, but it won't be explained in any depth, as these things are quite tiresome to explain in writing and very easy to learn simply by demonstration and practice (and some trial and error). But once you have developed a basic idea of how the program works, this guide will hopefully be of some help.
My heartfelt thanks to David Peplow, who did a great job of redesigning this (and other) project(s).
How this guide works
The main goal of this guide is to give examples of the most common Stata procedures. Note the basic difference from the Stata help system, which often presents procedures as follows:
(STATA HELP SYSTEM:) alpha varlist [, options]
which means that "varlist" is to be replaced by a list of variables and "options" by the names of the specific options chosen (the brackets mean that options may be omitted). This guide, by contrast, will typically give an example: instead of "varlist" you will find a list of variables, and you may also find one or several options that seem helpful to me, as in
(THIS GUIDE:) alpha trust1 trust2 trust3, i g(trust)
A note on different versions of Stata: As far as I could check, all of the examples I provide should work with Stata for Windows, version 14. They should also work with newer versions (currently, the latest version is 16), but new features from higher versions enter this guide only slowly (whenever this occurs, I will try to indicate it appropriately). Most of the examples will also work with older versions (say, version 10 or even older), but I cannot guarantee this.
How reliable is this guide? Well, apart from the odd typo, everything you find here will work, as I said before. The reason is quite simple: This guide arises from my own work; that is, it is primarily motivated by my own desire to write down what I have found out about Stata in order to retrieve it whenever needed. Publishing this material on the internet is just a way of sharing what I have learned. A possible downside is that what you find here may sometimes be incomplete or superficial, as an entry may have been written at a time when I did not yet have a full understanding of the matter under consideration.
All in all, please bear in mind: What you find here is the work of a single person who has many other duties to fulfil. This guide is just a spin-off of the data analysis work I am doing and which is my main business – or rather should be, apart from dealing with the university administration, filling in forms, applying for this and that, commenting on the latest ideas of my department, my colleagues, my dean, other deans, the rector, the vice-rector, the vice-vice-rectors, and many others about how to render our university more up-to-date, designing new courses of study, preparing for meetings, going to meetings, thinking about the consequences of meetings, trying to figure out whether and when my university has already given me the money it has promised me, trying to figure out how much I will have to pay for my staff this year (last year, the administration started accounting in autumn, that is, after about three quarters of the money had already been spent, with me just guessing how much it might be), and so on. Therefore, there may be a lot that you will find wanting in this guide; this refers not only to content, but also to language (including typos) and design. Please accept my apologies.
Perhaps I will collect links at a later stage. For the time being, just use this page from StataCorp, which provides links to helpful resources for learning Stata.
Note: Only 'major' changes (new keywords, sizable additions to entries that already exist) are reported here. I try to indicate the date of the most recent change at the bottom of each entry, but I am only moderately good at this. Minor typo corrections, in particular, may thus go unnoticed. Note also that there may be a time lag (hopefully short) between the creation (or the changing) of a file and its upload.
Very little has happened for almost two years. Probably not much more is going to happen, as I'm retired now, with the consequence that I use Stata only occasionally and have little opportunity to learn anything new that I could share.
Added an entry on winsorizing and trimming. Slightly enlarged the entry on output.
I am hoping to find time during the next weeks or months to check this guide for typos and other mistakes. No thorough changes are planned, even though I have some (as yet untested) ideas about how to improve this guide. Oh, and Stata 16 has been out for quite some time, but none of its new features are covered here.
This is just to show you that I'm still alive. But I almost never work on this (or any other) project, and only minor changes are made.
Minor additions to the entries about univariate and bivariate charts.
I've started a section "Elements of Programming". It is short, it's really only about "elements" and not about "real programming", and I do not plan to develop it much further. However, creating a section helps me (and others) not to overlook this topic.
I'm still alive, even though you may not have noticed. Very rarely, I make some additions or amendments. For instance, I have just added a paragraph or two on line plots to the entry about twoway charts, and the entry about "Lines, Symbols, etc." now actually has a few paragraphs on marker symbols. But yes, all of this is very minor.
Not much has happened since last spring. Stata version 15 has been out since last summer, but as yet this guide refers to version 14 at best. Nevertheless, a few minor additions are being made every now and then.
Amendments are being made here and there. Most recently, I have added a new entry (under Basics) on accessing Stata results that are stored in memory. Also, the entry on confidence intervals with command ci has been updated to accommodate the changes introduced with version 14.
Redesign of this guide.
Added a few entries about the analysis of spatial data.
Expanded the entry on string variables.
Rearranged entries on graphs: Univariate and bivariate (twoway) graphs are now treated separately. The reason behind this is that the earlier entry had grown quite long. Moreover, both entries are now somewhat enlarged (and updated) in comparison to previous versions.
The entry on overlaying graphs has been expanded. It's still very small, but previously it was not even a stub, just a dummy.
The information about multiple imputation has been expanded considerably. Most notably, whereas up to now there was only a brief entry about the analysis of multiply imputed data, now there is also a (somewhat larger) entry about the imputation step, including the preparatory steps.
Introduced a new section about estimation in the "data analysis" part. This section does not contain much new material (with the exception of a few estimation commands for univariate statistics), but I think that it helps to focus attention on estimation issues and the many possibilities Stata offers. To achieve this, the entries on using survey design information (the svy command) and on multiple imputation have been moved to this new section, together with a few other entries.
Additionally, some other entries have been expanded or amended. This applies particularly to the section about data transformation, which has several new entries and has been rearranged.
April and May 2015
Minor updates, additions or corrections have been made to a few entries.
Added a few extensions to the egen section of the entry on creating and modifying variables.
Rearranged this guide, creating an extra section for graphs, as things had started to get very complex and very long. But as yet there is little that is new content-wise. All I have added are a few remarks about combining graphs. I also added a few sentences about line patterns. I really hope that more is going to come.
Augmented the (previously extremely short) section about string variables in the entry on "data types" to accommodate the changes introduced in version 13.0, transforming it into what might now be called a very short section.
Just started an entry on probability distributions (to be found under the heading of "functions").
Yeah, I'm still alive. But currently I do not work much with Stata, and if I do, I rarely learn something new – and so there is little I can add to this guide. But I have just augmented the entry about Stata output with a section that deals with the question of how to obtain elements stored in Stata's memory after estimation (and some other) procedures.
You may also wish to learn that I have acquired, and now work with, Stata 13.1. But as yet, nothing that is new to that version has entered this guide, and I am also happy to say that the Stata folks have refrained from implementing drastic changes concerning the appearance or the basic handling of Stata. So I think there is nothing, or at least not much, I have to change in my little guide at the moment.
Added a section on (one specific type of) global macros, or "globals", to the entry "Working with Stata". Also reorganized and slightly expanded the section on graph options.
Enlarged what up to now was the entry on the multinomial logit to include models for binary and ordinal variables. Also, more material on the postestimation phase is included now.
Minor additions to the entry about basic charts and graphs.
Expanded the entry about data and storage types to include changes of storage type. Added entries about constraints and about count data models.
Created an entry about data and storage types and an (as yet small) entry about collapsing and contracting datasets. Updated the entry on merging data to cover the new syntax available as of version 11.0 and added some information about appending datasets. Started an entry about parametric regression models for time-to-event data.
Enlarged the entry about creating and modifying data. Added some clarifications to the entry on missing values.
Enlarged the entry about life tables and other simple procedures for time-to-event data.
Added an entry on help, search and the like to the section "Basics". Also, I am now working with Stata 12, and I will try to make you aware of the most important changes.
New entries on factor variables and on multilevel models (currently for metric dependent variables only). Some additions to entries on generating variables and on crosstabulation.
Added a small piece on packages for formatting Stata output (e.g., for LaTeX, HTML or Word) in the entry on output (section "Basics"). Created a new entry on estimation of confidence intervals (section "Data Analysis").
Added a small entry on nonparametric tests.
Slightly enlarged the section on crosstabulations to give a little more prominence to the tab2 command, and also to explain the
Added a section about cumulative density plots (for empirical variables) to the entry about basic charts.
Slightly expanded the section on EDA to explain how to influence the display of stem-and-leaf plots. Added a few words about the fre command to the entry about frequency tables.
A small section with some basic commands for analyzing multiply imputed data sets with Stata 11 has been added. The entry on correlations has been slightly expanded.
I have finally acquired Stata 11. I will try to adapt this guide to any changes I encounter, but please have some patience. For now, I point out one major change: you may now leave the data window open while proceeding with your work.
This entry serves mainly to assure people that this is not a dead end. I did not do much to improve or enlarge this guide, but every now and then, small changes occur and mistakes of language are corrected (or so I hope). Near the end of May, I added a very small entry about life table analysis.
Every now and then, minor amendments and extensions are being made.
What you find here has grown since late 2005. Now, I think enough stuff has accumulated to put it on the world wide web. Please note that this version is far from being satisfactory. This does not mean that it contains wrong or useless stuff; rather, it is not very comprehensive. It's really mostly for beginners.
This page is a project initiated and maintained by
Prof. Dr. Wolfgang Ludwig-Mayerhofer
Universität Siegen / University of Siegen
Philosophische Fakultät – Soziologie / Faculty of Arts and Humanities – Sociology
Last update: 28 Sep 2020