This guide is supposed to work as a brief "online help" for Stata for Windows that makes specific use of the possibilities of the internet. Its aim is to provide an intermediate road to learning Stata that hopefully is especially convenient for newbies (even though not for absolute beginners). Throughout, it is assumed that users have already mastered the statistical procedures I am dealing with, as no explanations of these are given. All you can learn here is how to put things into practice with Stata.
Please note also that this guide does not introduce you in any thorough way to the fundamentals of working with Stata for Windows, e.g., how to install the program, what the different "windows" are, how to set up a database, how exactly to execute commands from a do file, etc. Of course, much of it will be mentioned, but it won't be explained in any depth, as these are things that are quite tiresome to explain in writing and very easy to explain simply by demonstrating and rehearsing (and some trial and error). But once you have developed a basic idea of how the program works, this guide may hopefully be of some help.
My heartfelt thanks to David Peplow who did a great job at re-designing this (and other) project(s).
How this guide works
The main goal of this guide is to give examples for the most common Stata procedures. Note the basic difference from the Stata help system, which often will present procedures as follows:
(STATA HELP SYSTEM:) alpha varlist [, options]
which means that "varlist" is to be replaced by a list of variables and "options" by the names of the specific options chosen (the brackets mean that options may be omitted). This guide will typically give simply a list of variables and will also display immediately one or several options that seem helpful to me, as in
(THIS GUIDE:) alpha trust1 trust2 trust3, i g(trust)
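To make the relation between the two notations concrete, here is a minimal sketch that can be run as it stands. It assumes the auto example dataset that ships with Stata; the variables weight, length and displacement and the new variable name size are chosen purely for illustration and have nothing to do with the trust items above. The point is only that i and g() are abbreviations of the options item and generate().

```stata
* Load an example dataset shipped with Stata (assumption: any dataset
* with a few numeric items would do just as well)
sysuse auto, clear

* Abbreviated form, in the style of this guide:
* i = item (item-level statistics), g() = generate() (saves the scale)
alpha weight length displacement, i g(size)

* The same command with the options written out in full;
* the scale variable must be dropped first, as it already exists
drop size
alpha weight length displacement, item generate(size)

* The reliability coefficient is left behind in r(alpha)
display r(alpha)
```

Both alpha commands should produce the same output; which form you type is a matter of taste.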
A note on different versions of Stata. As far as I could check, all of the examples I provide should work with Stata for Windows, version 10. They should also work with higher versions (currently, the newest version is 14), but new stuff from higher versions enters this guide only slowly (whenever this occurs, I will try to indicate it appropriately).
How reliable is this guide? Well, apart from the odd typo, everything you find here will work, as I said before. The reason is quite simple: This guide arises from my own work; that is, it is primarily motivated by my own desire to put down what I have found out about Stata in order to retrieve it whenever needed. Publishing this stuff on the internet is just a way of sharing what I have learned. However, very rarely I speculate about things Stata might or might not do (for instance, when comparing it to the capabilities of other software). Typically, this means that I have looked for ways to do things in a certain manner and was unable to find them. Of course, this does not necessarily mean that these possibilities are absent; rather, it may well be the case that I was only too stupid.
All in all, please consider the following: What you find here is the work of a single person who has many other duties to comply with. This guide is just a spin-off of the data analysis work I am doing and which is my main business – or rather should be, apart from dealing with the university administration, filling in forms, applying for this and that, commenting on the latest ideas of my department, my colleagues, my dean, other deans, the rector, the vice-rector, the vice-vice-rectors, and many others about how to render our university more up-to-date, designing new courses of study, preparing for meetings, going to meetings, thinking about the consequences of meetings, trying to figure out whether and when my university has already given me the money it has promised me, trying to figure out how much I will have to pay for my staff this year (last year, the administration started accounting in autumn, that is, after about three quarters of the money had already gone with me just guessing how much it might be), and so on. Therefore, there may be a lot that you will find wanting in this guide; this refers not only to content, but also to language (including typos) and design. Please accept my apologies.
Perhaps I will collect links at a later stage. For the time being, just use this page from Stata Corp. which provides links to helpful resources for learning Stata.
Note: Only 'major' changes (new keywords, sizable additions to entries that already exist) are reported here. I try to indicate the date of the most recent change at the bottom of each entry, but I am only moderately good at this. In particular, minor corrections of typos may thus go unnoticed. Note also that there may be a time lag (hopefully short) between the creation (or the changing) of a file and its upload.
Not much has happened since last spring. Stata version 15 has been out since last summer, but as yet this guide refers to version 14 at best. Nevertheless, a few minor additions are being made every now and then.
Amendments are being made here and there. Most recently, I have added a new entry (at Basics) on accessing Stata results that are stored in memory. Also, the entry on confidence intervals with command ci has been updated to accommodate the changes introduced with version 14.
Re-design of this guide.
Added a few entries about the analysis of spatial data.
Expanded the entry on string variables.
Rearranged entries on graphs: Univariate and bivariate (twoway) graphs are now treated separately. The reason behind this is that the earlier entry had grown quite long; in other words, both entries are now somewhat enlarged (and updated) in comparison to older versions.
The entry on overlaying graphs has been expanded. It's still very small, but previously it was not even a stub, just a dummy.
The information about multiple imputation has been expanded considerably. Most notably, whereas up to now there was only a brief entry about the analysis of multiply imputed data, now there is also a (somewhat larger) entry about the imputation step, including the preparatory steps.
Introduced a new section about estimation in the "data analysis" part. This section does not contain much new stuff (with the exception of a few estimation commands for univariate statistics), but I think that it helps to better focus the attention on estimation issues and the many possibilities Stata offers. To achieve this, the entries on using survey design information (the svy command) and on multiple imputation have been moved to this new section, together with a few other entries.
Additionally, some other entries have been expanded or amended. This applies particularly to the section about data transformation, which has several new entries and has been rearranged.
April and May 2015
Minor updates, additions or corrections have been made to a few entries.
Added a few extensions to the egen section of the entry on creating and modifying variables.
Re-arranged this guide, creating an extra section for graphs, as things started to get very complex and very long. But as yet there is little that is new contentwise. All I have added are a few remarks about combining graphs. Also, I added a few sentences about line patterns. I really hope that more is going to come.
Augmented the (previously extremely short) section about string variables in the entry on "data types" to accommodate the changes introduced in version 13.0, transforming it into what now might be called a very short section.
Just started an entry on probability distributions (to be found under the heading of "functions").
Yeah, I'm still alive. But currently I do not work much with Stata, and if I do, I rarely learn something new – and so there is little I can add to this guide. But I just augmented the entry about Stata output by a section that deals with the question of how to obtain elements stored in Stata's memory after estimation (and some other) procedures.
You may also wish to learn that I have acquired, and now work with, Stata 13.1. But as yet, nothing that is new to that version has entered this guide, and I am also happy to say that the Stata folks have refrained from implementing drastic changes concerning the appearance or the basic handling of Stata. So I think there is nothing, or at least not much, I have to change in my little guide at the moment.
Added a section on (one specific type of) global macros, or "globals", to the entry "Working with Stata". Also reorganized and slightly expanded the section on graph options.
Enlarged what up to now was the entry on the multinomial logit to include models for binary and ordinal variables. Also, more material on the postestimation phase is included now.
Minor additions to entry about basic charts and graphs.
Expanded the entry about data and storage types to include changes of storage type. Added entries about constraints and about count data models.
Created an entry about data and storage types and an (as yet small) entry about collapsing and contracting datasets. Updated the entry on merging data to cover the new syntax available as of version 11.0 and added some information about appending datasets. Started an entry about parametric regression models for time-to-event data.
Enlarged the entry about creating and modifying data. Added some clarifications to the entry on missing values.
Enlarged the entry about life tables and other simple procedures for time-to-event data.
Added entry on help, search and the like to section "Basics". Also, I am now working with Stata 12, and I will try to make you aware of the most important changes.
New entries on factor variables and on multilevel models (currently for metric dependent variables only). Some additions to entries on generating variables and on crosstabulation.
Added a small piece on packages for formatting Stata output (e.g., for LaTeX, HTML or Word) in the entry on output (section "Basics"). Created a new entry on estimation of confidence intervals (section "Data Analysis").
Added a small entry on nonparametric tests.
Slightly enlarged the section on crosstabulations to give a little bit more prominence to the tab2 command, and also to explain the
Added a section about cumulative density plots (for empirical variables) to the entry about basic charts.
Slightly expanded the section on EDA to explain how to influence the display of stem-and-leaf displays. Added a few words about the fre command to the entry about frequency tables.
A small section with some basic commands for analyzing multiply imputed data sets with Stata 11 has been added. The entry on correlations has been slightly expanded.
I have finally acquired Stata 11. I will try to accommodate this guide to any changes I encounter, but please have some patience. Presently, I notify readers of one major change, i.e. the fact that you may leave open the data window while proceeding with your work.
This entry serves mainly to assure people that this is not a dead end. I did not do much to improve or enlarge this guide, but every now and then, small changes occur and mistakes of language are corrected (or so I hope). – Near the end of May, I added a very small entry about life table analysis.
Every now and then, minor amendments and extensions are being made.
What you find here has grown since late 2005. Now, I think enough stuff has accumulated to put it on the world wide web. Please note that this version is far from being satisfactory. This does not mean that it contains wrong or useless stuff; rather, it is not very comprehensive. It's really mostly for beginners.
This page is a process initiated and maintained by
Prof. Dr. Wolfgang Ludwig-Mayerhofer
Universität Siegen / University of Siegen
Philosophische Fakultät – Soziologie / Faculty of Arts and Humanities – Sociology
Last update: 31 May 2018