Working with Stata

The Stata starting screen

After starting Stata, the display will show an overall Stata window consisting of several sub-windows. The exact set-up of these windows has changed several times during Stata's history.

Version 12 and higher

By default, the center of your Stata screen is dominated by the Results window. Beneath, there is the – much smaller – Command window. To the left, there is the Review window that lists all the commands that were entered. To the right, you'll find the Variables window that provides a list of the variables in the current data set, and the Properties window, which gives information about the variable that is highlighted in the Variables window and about the data file.

[Screenshot: Stata screen with main, command, review, variables, and properties windows]

Data are not shown by default. However, upon pressing Ctrl+8 (in German: Strg+8), a window will open showing some columns of the data set plus the Variables and the Properties windows. Note that in earlier versions, Ctrl+8 served to open the do-file editor. This can now be accomplished by pressing Ctrl+9 (in German: Strg+9).

Earlier versions

I started to work with Stata's version 8, and earlier versions are not covered here. In version 8 the Stata display was green characters on a black background, and many Stata addicts still use this setup. As of version 9, the default setting is black and white. Also, the composition of the screen was somewhat different in earlier versions, as were the commands required to open a do file or the data window.

Stata commands are also changing constantly, if slowly. While I am an active Stata user and therefore likely to notice changes that are introduced in newer versions, I cannot promise that everything that's new is translated immediately into this guide. Overall, I think that most of what is covered by this guide refers to version 11 or 12, with newer versions taken into account where possible. But much of it will also work with older versions (I will try to indicate exceptions to this rule).

You should be aware that older commands, while deprecated, are never completely dropped from Stata. However, if you wish to use a command from an older version, you will often have to tell Stata so by inserting a command that indicates which version is to be used in what follows; this line is simply version XX, with XX to be replaced by the number of the version you are using. It is therefore advisable, once you have completed a do file, to insert such a line at the very start of the do file; this will ensure that the do file will still run in a couple of years, even if you are using a newer version of Stata by then.
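As a minimal sketch, the beginning of a do file written under version 12 could look as follows. The version number is only an assumption for illustration (use the number of the version you actually work with), and auto is one of the example data sets shipped with Stata.

```stata
version 12            // hypothetical: this do file was written under Stata 12
sysuse auto, clear    // example data set shipped with Stata
summarize price
```

Everything after the version line is interpreted as that version would have interpreted it, even when the do file is run in a newer Stata.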


Submitting commands

The menu system

Like all other modern software, Stata offers a menu system -- that is, you do some clicking, occasionally mixed up with some writing. This might look like this:

[Screenshot: Stata screen with a pop-up window]

However, working with the menu system is not covered by this guide.

Commands entered via the command window

The basic way of interacting with Stata that is proposed here is through commands that are written in (more or less) plain English and are 'submitted' to Stata. One way to enter commands is to use the Command window, typing a command (as can be seen here at the bottom) and pressing ENTER:

[Screenshot: Stata screen with command in command window highlighted]

Indeed, all of the commands you will find here can be used this way. However, if you aim at more than a casual glance at the data, it is advisable to collect the commands, that is, to put them together in one or several files. Such files are called "do files".

Commands entered via a do file

The best way to work with Stata's commands is to collect all your commands in a file or perhaps several files, depending on how complex your work is. In some other statistics software, such a file may be called a "syntax file" or a "script". In Stata, it is termed a "do file".

To create a do file, Stata offers a special window called the do-file editor. You can open it by any of the following:

  • Enter "doedit" (abbr.: "doed") in the command window and press the ENTER key
  • Click on the appropriate icon (moving your mouse over the icons will tell you which one to use)
  • In the menu, click on "Window" and then on "Do-file Editor"
  • (Until version 11:) Press Ctrl+8 (on German keyboards, "Ctrl" is represented by "Strg")
  • (As of version 12:) Press Ctrl+9 (on German keyboards, "Ctrl" is represented by "Strg")

Now you can begin to write your commands. A short do file might look like this:

[Screenshot: Stata do file]

Notice the color coding: Stata keywords are in blue, most of the remainder of commands is in black, and file or path names are in red. Everything that's green is a comment (see below).
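In plain text, a short do file could look like the following sketch. It is not the file from the screenshot: the commands are chosen for illustration only, and the sketch assumes the example data set auto that ships with Stata.

```stata
* A first do file (illustrative only)
sysuse auto, clear      // load an example data set shipped with Stata
describe                // list variables and their storage types
summarize price mpg     // basic descriptive statistics
tabulate foreign        // frequency table
```

Each line holds one command; the text after the double slashes and the line starting with an asterisk are comments (see below).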

You may run an entire do file by pressing Ctrl + d (in German: Strg + d). But you may also run just a part of a do file. To do this, highlight everything from the first to the last line of the section you wish to run:

[Screenshot: Stata do file with a section of the commands highlighted]

As you can see, the first and last lines need not be highlighted in their entirety; you may start and end just anywhere.


More about commands

Stata commands start with a keyword that indicates the specific procedure to be employed. As Stata is used for doing data analysis, often the names of one or several variables follow. An important notion is that of options: Options can be used to expand commands (that is, with their help Stata is informed that additional things are to be done that are not included in the default command), or to specify commands (that is, to inform Stata that things are to be done in a specific way), or both. That is, a command is a series of keyword(s), variables and options; sometimes, variables are linked by operators (such as the + or - sign, or stuff like "and", "or", "equal to"), particularly in computations.
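The following sketch illustrates these building blocks. It assumes the example data set auto that ships with Stata; the new variable name price_per_lb is made up for illustration. Note that options are separated from the rest of the command by a comma.

```stata
sysuse auto, clear

* keyword + variable names
summarize price mpg

* keyword + variable names + option: detail asks for additional statistics
summarize price mpg, detail

* keyword + a new variable computed with an operator
generate price_per_lb = price / weight

* operators in a condition: & means "and", | means "or", == means "equal to"
list make price if foreign == 1 & price > 10000
```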

Abbreviating commands and options

Many of the keywords used in commands can be abbreviated, typically by entering only the first or the first two or three characters and omitting the rest. The same goes for options.

In this guide, I will try to deal with this as follows: Upon first explanation of a command, the full keyword will be given; later on often the abbreviated version will be used. If there is a single occurrence, I will sometimes indicate the abbreviated command or option by square brackets, as in frac[tion]; this would mean that fraction can be abbreviated to frac. Options frequently will be introduced as abbreviations only.

Note that there may be quite a lot of exceptions to these rules. However, everything you see here will work with Stata.
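A short sketch of how full and abbreviated versions correspond, again assuming the example data set auto that ships with Stata:

```stata
sysuse auto, clear

summarize price, detail     // full keyword and full option
su price, d                 // the same command, both abbreviated

histogram mpg, fraction     // full version
hist mpg, frac              // abbreviated, as in frac[tion]
```

Each pair produces identical output. In do files that others will read, the full versions are often easier to follow.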

Stata is case sensitive!

In contrast to some other programs, Stata is case sensitive. More specifically,

  1. commands (keywords) have to be written in small letters,
  2. variable names have to be used in commands exactly as they appear on the variables list.

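On the other hand, I have the impression that Stata does not bother about small or capital letters in the case of directory or file names; in other words, up to now I have not encountered problems with using whatever letters I liked, regardless of how the file or directory name appears in the file explorer. (This is a property of the operating system rather than of Stata: Windows does not distinguish between small and capital letters in file names, whereas on systems with case-sensitive file names, such as Linux, the spelling has to match exactly.)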

In contrast to command keywords, which invariably have to appear in small letters, you may occasionally come across an option to be written in capital letters. As far as I can see such options are very rare. That is, most options have to be written in small letters as well. Anyway, they have to be written exactly as required by Stata. I am pretty confident that this guide is quite accurate in this respect, but as a human being I am error prone and so occasional mistakes cannot be ruled out.
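The following sketch shows what case sensitivity means in practice. It assumes the example data set auto that ships with Stata; the variable Price is created only to make the point.

```stata
sysuse auto, clear
summarize price          // works: keyword and variable name in small letters

* The next two lines would stop with an error if the asterisk were removed:
* Summarize price        // Stata knows no command called "Summarize"
* summarize Price        // at this point there is no variable called "Price"

generate Price = price / 1000   // legal: Price and price are different variables
describe price Price
```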

(Very) long commands

Note that a command cannot stretch over more than one line; normally Stata interprets the "return" key (or "carriage return", to use a term from the typewriter age that is still in use) as a command delimiter. Accordingly, as you write a command the line will not be wrapped around automatically when you reach the end of the screen; rather the window will be shifted to the left, which makes long commands difficult to read. There are two ways to deal with this problem:

  1. You may end a line with three slashes preceded by a blank, i.e. ///, and continue the command on the next line; this can be repeated ad libitum. Stata will combine all lines 'connected' via /// into a single command (the last line will not end with slashes!). Please note: The blank before the three slashes is really important! Omitting it will produce an error and thus abort your current Stata run.
  2. You may switch to using the semicolon (";") as a command delimiter as follows:
    • Write a line consisting of #delimit ;
    • Now commands may continue over as many lines as you want, with Stata interpreting the occurrence of a semicolon as the end of the command
    • If you want to switch back to 'normal' mode, write a new line with the command #delimit cr (no semicolon required at the end!)
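Both variants can be sketched as follows, to be run from a do file. The example assumes the data set auto that ships with Stata; the regression itself is arbitrary and serves only as a command long enough to be split.

```stata
sysuse auto, clear

* Variant 1: continue the command with /// (note the blank before the slashes)
regress price mpg weight length ///
    foreign headroom

* Variant 2: use the semicolon as command delimiter
#delimit ;
regress price mpg weight length
    foreign headroom ;
#delimit cr
```

The two regress commands are identical as far as Stata is concerned; only the way they are written down differs.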

Please be aware that in this guide you may see command lines wrapped around without /// and without a semicolon as command delimiter. Whether this occurs will depend on the size and the settings of your screen and your browser. I will try to use short command lines as much as possible, but this will not always be feasible. And after all, it won't be too realistic. Anyway, even what I think is a short line may be "long" in relation to your screen, and browsers should not be forced to extend lines beyond the screen width.

When you have finished, you can save your do file; this is done just as with any other software. When you restart your work, you can open the do-file editor again and use a file you have saved previously, again just like in the case of other software. Note that you might wish to save your do file repeatedly while at work, as there is no automatic saving procedure in the case of a computer crash. I may also note that in my (limited) experience Stata is running smoothly; in other words, it is rarely Stata itself that causes trouble. But other software you may be using concurrently may behave differently, or your computer simply may be prone to get messed up every now and then.

Comments

A comment is any text you like that you write in a do file in order to remember later what it was all about (or in order to let others know). Stata of course will have to know whether what it encounters in a do file is intended as something to "do", that is, a command, or as a comment only.

* Here I will recode income into categories

is a single-line comment. Stata will ignore this single line. Of course, several lines starting with an asterisk may follow each other. A more convenient solution for longer comments is the following:

/* Here I will recode income into categories using the
categories proposed by Dr Özgul-Harnischfeger. Personally,
I think we should use fewer categories */

This comment stretches over several lines. It is important not to forget the concluding asterisk-cum-slash clause.

A further possibility is

recode income (1/300 = 1) (301/600 = 2) (601/1000 = 3) /* Dr Özgul-Harnischfeger's categories */

In other words, a comment may be appended immediately to a command.

You may also use double slashes for short comments

recode income (1/300 = 1) (301/600 = 2) (601/1000 = 3) // this works!

but this type of comment must not be extended to the next line. So, the earlier version combining slashes and asterisks is more flexible. Actually, the two slashes just tell Stata to ignore everything until the end of the line, excluding the (invisible) "carriage return" (or "End of Line") character that informs Stata (just as any other software) that the end of the line has been reached.


Globals, locals

"Global" and "local" are notions (and Stata commands) you will encounter in chapters on "Stata programming". However, they can be helpful for the work of 'normal' users as well. In the Stata handbook, they are described as "local macros" and "global macros", but in what follows I will use the abbreviations "global(s)" and "local(s)". This section will describe only "globals", and only one particular type.

A "global" is something that is stored in memory and can be used anytime during your Stata session by reference to the name of the global. The type of global I am going to explain here is an expression, perhaps long and complicated, that you have to use repeatedly; here it might be helpful to store each expression to a global and to invoke the respective global whenever you have to use a specific expression. For instance, you may wish to use several files that are all stored in a certain directory on your computer. You might store the path to the directory in a global and use the global instead of the full pathname. This would work like this:

First, define the global giving it a name of your choice, e.g., "datadir":

global datadir "C:\mydirectory\mydata\GSOEP\2012\statafiles\"

Note that the quotation marks will not be stored to the global; they are only the delimiters Stata needs in order to know that a string is to be stored here.

Now, in a second step, whenever you address one of the files stored in the directory "C:\mydirectory\mydata\GSOEP\2012\statafiles\", you may use the global instead of writing down the full path. The following example refers to the (fictitious) file "householdmembers":

use "${datadir}householdmembers"

Note the braces that enclose "datadir". They make clear where the global ends and the ensuing text begins. If the global were to stand alone, the braces would not be needed. Note also the quotation marks that enclose the entire expression. These quotation marks are required only if the path or the name of the file contains (blank) spaces; but as they won't do any harm even in the absence of spaces, I consider it good practice to always use quotation marks when referring to paths and/or files.
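If you want to check what a global expands to before using it, a sketch like the following may help. The directory and file name are hypothetical, and no file is opened here; display merely shows the resulting string. The sketch uses forward slashes, which Stata also accepts in Windows paths.

```stata
global datadir "C:/mydirectory/mydata/"        // hypothetical directory

* Show the full file name Stata would build from the global
display "${datadir}householdmembers.dta"

* Show the content stored in the global
macro list datadir
```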

Globals can make your work very much easier. Recently, I had to produce a set of histograms, which all were to look alike. Instead of adding a huge number of options again and again to each of the histogram commands, I simply stored the options in a global and added the global to each histogram. This went like this (using only very few options for illustration):

global histoption `"ysc(r(0 80)) xtitle("Health Status", size(medlarge))"'

hist var15, $histoption

Please note three things:

First, the definition of the global "histoption" is now enclosed in "single plus double" quotation marks. They are needed here because the global itself contains quotation marks (see the xtitle option). Be sure to use the correct single quotation marks: The accent grave, or gravis, at the beginning, and the apostrophe at the end.

Second, as in the second line the macro "histoption" stands alone, no curly braces are needed.

Third, I have used a very short list of options here, hoping that the list will fit on a single line on most screens. In fact, the list of options is not limited. But note: In defining macros, you cannot use the three slashes that inform Stata that the command is continued on the next line. In other words, the entire global must be written in a single line.
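To try this out without data of your own, here is a sketch along the same lines that assumes the example data set auto shipped with Stata. The global name histlook, the chosen options and the graph names are made up for illustration.

```stata
sysuse auto, clear

* Store the options shared by all histograms (written on a single line)
global histlook `"frequency color(navy) ytitle("Number of cars", size(medlarge))"'

* Reuse them; name() keeps both graphs in memory under separate names
histogram price, $histlook name(hprice, replace)
histogram mpg, $histlook name(hmpg, replace)
```

Changing the look of all histograms now requires editing only the line that defines the global.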


Set more off, set mem 300m

Normally, Stata pauses to display output after a certain number of lines and waits for any key to be pressed in order to continue.

set more off

suppresses pauses, thus allowing smooth running of Stata. Please note: If set more off is used within a do file, it will be valid only for the commands executed from this do file. To make it "permanent" (for your current session), you have to execute it from the command line.

The next command is not necessary anymore in "modern" versions of Stata (I think since version 12). But older versions of Stata reserve only a small amount of memory for the data which is not always sufficient; for instance, I encountered trouble with a data set with 1200 cases and, yes, a lot of variables (say, 1000; I never did count them). The storage type of variables also plays a role here. If you are working with an older version, you may therefore wish to increase this space to, say, 300 MB with the command

set mem 300m

The option perm may be added to make Stata allocate this amount regularly from the start. In any event, don't forget the "m" following the number indicating the required memory; otherwise Stata will interpret the number as kilobytes. For large memory requests, you may use the abbreviation "g" for gigabyte(s).

As you might have guessed, there are a number of further settings. To learn more about the current settings, just type query into the command window. And yes, help set will provide more information about how to change system parameters.
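A brief sketch of how these commands might be used; the exact list of settings shown depends on your version of Stata.

```stata
query             // all current settings
query memory      // only the settings related to memory
query output      // only the output settings, including "more"
help set          // documentation on how to change settings
```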


Using Stata as a calculator

display sqrt(4)

will cause Stata to calculate the desired value (the square root of 4) and to display the result. The short version is dis. (This section obviously is just a short reminder that the display command exists. You can use any mathematical function available in Stata.)
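A few more examples of this kind, as a small sketch:

```stata
display 2 + 3 * 4         // shows 14: multiplication is carried out first
dis sqrt(4)               // shows 2
dis exp(1)                // the exponential function
dis %9.2f 10/3            // a format may be given to limit the decimals shown
```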

© W. Ludwig-Mayerhofer, Stata Guide | Last update: 12 Jun 2017