Internet Guide to Stata
After starting Stata, your display will show an overall Stata window consisting of several sub-windows. The exact set-up of these windows has changed several times during Stata's history.
By default, the center of your Stata screen is dominated by the Results window. Beneath, there is the – much smaller – Command window. To the left, there is the Review window that lists all the commands you have entered. To the right, you'll find the Variables window that provides a list of the variables in your current data set, and the Properties window, which gives information about the variable that is highlighted in the Variables window and about the data file.
Data are not shown by default. However, upon pressing Ctrl+8 (in German: Strg+8), a window will open showing some columns of your data set plus the Variables and the Properties windows. Note that in earlier versions, Ctrl+8 served to open the do-file editor. This can now be accomplished by pressing Ctrl+9 (in German: Strg+9). (Of course, Stata users have longed for this change for years. They don't want to always have the same effect if they press a key, as this will force them to concentrate on their work. They are much happier if they are disrupted and have an opportunity to wonder what stupid things programmers are inclined to do.)
The most basic windows are the Command window, where you can enter, indeed, commands, and the Results window, where you'll find ... yep (even though I think this window better be called output window). Before version 11, data were typically not shown, or only for an interim period (see entry on Data).
By default, two other windows are shown that may be helpful: The window in the upper left corner, called Review window, lists the commands you have 'sent' to Stata; so you may have a look at the sequence of commands or may retrieve a command you have used earlier. (The latter effect can also be accomplished by using the Page Up key in the Command window.) The window in the lower left corner, aptly called Variables window, shows the variables in your data set (if nothing is shown, you have not yet opened any data).
Though I started to work with Stata's version 8, earlier versions are not covered here. I think that in version 8 the Stata display was green characters on a black background, and many Stata addicts still use this setup. As to Stata commands, I think that most of what is covered by this guide will work with older versions; I will try to indicate exceptions to this rule.
The basic way of interaction with Stata is through commands that are written in (more or less) plain English and are 'submitted' to Stata. Stata also offers a menu, just like most other software that is available nowadays. However, the use of the menu is not explained here; I also advise against it. Statistics is done much more efficiently and in more transparent ways through using commands deployed in the way I just have outlined. (To do this you will have to use some of the icons, however; and for some general purpose stuff like for instance opening or saving files, icons or part of the menu system may be deployed as well. That is, you can't do without icons and menus entirely.)
It is obvious that you can enter commands in the Command window. Indeed, all of the commands you will find here can be used this way. However, if you do more than just having a casual glance at your data it will be advisable to collect your commands, as it were; even in the case of "just having a glance" this is advisable as in most cases you will like to return to some earlier work during one session. Even though you can get back to commands you have used earlier (see above), collecting your commands, that is putting them together in one or perhaps even several file(s), is often easier. Such files are called "do files" and will be explained shortly. But prior to that, some more words about Stata commands are in order.
Stata commands start with a keyword that indicates the specific procedure to be employed. As Stata is used for doing data analysis, often the names of one or several variables follow. An important notion is that of options: Options can be used to expand commands (that is, with their help Stata is informed that additional things are to be done that are not included in the default command), or to specify commands (that is, to inform Stata that things are to be done in a specific way), or both. That is, a command is a series of keyword(s), variables and options; sometimes, variables are linked by operators (such as the + or - sign, or stuff like "and", "or", "equal to"), particularly in computations.
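To illustrate how operators enter a command, here is a minimal sketch. It assumes the auto data set that ships with Stata (loaded with sysuse); the variable names come from that data set, and the new variable wl_ratio is a made-up name used only for illustration.

```stata
* load the example data set shipped with Stata
sysuse auto, clear

* arithmetic operator: divide one variable by another
generate wl_ratio = weight / length

* relational (==, >) and logical (&) operators in a condition
summarize price if foreign == 1 & mpg > 25
```

The first command uses an operator in a computation; the second uses operators to restrict the command to a subset of cases.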
Here is an example of a simple command line that contains the most common elements, with the exception of operators, which are used only with specific types of command. The command starts with a keyword, as do all commands; options are typically placed at the end, that is after the keyword and a variable or a list of variables, and are separated from the rest by a comma. This example shows the command "alpha" which refers to a statistic frequently employed in item analysis, to wit, Cronbach's alpha. We start with a command that has no options:
alpha trust1 trust2 trust3 trust4 trust5
will basically compute and display the value of Cronbach's alpha for variables "trust1" to "trust5". The next line adds two options:
alpha trust1 trust2 trust3 trust4 trust5, i g(trust)
which will request Stata to display some additional statistics (option "i", short for "item") and also to generate (hence "g", or "generate") a new variable, called "trust", that holds the scale built from the five items (by default, their sum divided by the number of nonmissing items).
Typically, the number of options is small. In some cases, it may be rather large; for instance, graphs can be combined with a considerable number of options referring to scales, axes, labels and much more.
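If you have no "trust" items at hand, the same pattern can be tried on the auto data set that ships with Stata. This is only a sketch: the three size-related variables merely stand in for items, and the name "size" for the new variable is an arbitrary choice of mine. The options are spelled out in full here; "i g(size)" would do the same.

```stata
* load the example data set shipped with Stata
sysuse auto, clear

* Cronbach's alpha without options
alpha weight length displacement

* with item statistics and a newly generated scale variable
alpha weight length displacement, item generate(size)

* have a look at the new variable
summarize size
```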
Stata commands consist of one or several keywords. Some of these keywords can be abbreviated, typically by entering only the first or the first two or three characters and omitting the rest. As you have seen above, the same goes for options.
In this guide, I will try to deal with this as follows: Upon first explanation of a command, the full keyword will be given; later on often the abbreviated version will be used. If there is a single occurrence, I will sometimes indicate the abbreviated command or option by square brackets, as in frac[tion]; this would mean that fraction can be abbreviated to frac. Options frequently will be introduced as abbreviations only.
Note that there may be quite a lot of exceptions to these rules. However, everything you see here will work with Stata.
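A quick way to see abbreviations at work is the following sketch, which again assumes the auto data set shipped with Stata. Each pair of commands should produce identical output.

```stata
sysuse auto, clear

* full keyword, then the abbreviated keyword
summarize price mpg
su price mpg

* full option name, then abbreviated keyword and option
summarize price, detail
su price, d
```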
In contrast to some other programs, Stata is case sensitive. More specifically, command keywords have to be written in small letters, and variable names are distinguished by case (income and Income are two different variables).
On the other hand, I have the impression that Stata does not care about small or capital letters in directory or file names; in other words, up to now I have not encountered problems with using whatever letters I liked, regardless of how the file or directory name appears in the file explorer. This holds at least on Windows; on operating systems with case-sensitive file names, such as Linux, the spelling has to match exactly.
In contrast to command keywords, which invariably have to appear in small letters, you may occasionally come across an option to be written in capital letters. As far as I can see such options are very rare. That is, most options have to be written in small letters as well. Anyway, they have to be written exactly as required by Stata. I am pretty confident that this guide is quite accurate in this respect, but as a human being I am error prone and so occasional mistakes cannot be ruled out.
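The following sketch shows case sensitivity in action. It assumes the auto data set shipped with Stata; the variable name "Price" is made up for the occasion. The prefix "capture noisily" lets Stata show the error message without stopping.

```stata
sysuse auto, clear

* Price and price are two different variables
generate Price = price / 1000
describe price Price

* a keyword in capital letters is not recognized
capture noisily SUMMARIZE price

* a nonzero return code signals that the last captured command failed
display _rc
```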
The best way to work with Stata's commands is to collect all your commands in a file or perhaps several files, depending on how complex your work is. In some other statistics software, such a file is called a syntax file. In Stata, it is termed a "do file".
To create a do file, Stata offers a special window called the do-file editor. You can open it, for instance, by pressing Ctrl+9 (see above) or by typing doedit in the Command window.
Now you can begin to write your commands.
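Note that a command cannot stretch over more than one line; normally Stata interprets the "return" key (or "carriage return", to use a term from the typewriter age that is still in use) as a command delimiter. Accordingly, as you write a command the line will not be wrapped around automatically when you reach the end of the screen; rather the window will be shifted to the left, which makes long commands difficult to read. There are two ways to deal with this problem: You can end a line with three slashes (///), which tells Stata that the command continues on the next line; or you can declare the semicolon as command delimiter with #delimit ; so that a command ends only where you place a semicolon (#delimit cr restores the default).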
Please be aware that in this guide you may see command lines wrapped around without /// and without a semicolon as command delimiter. Whether this occurs will depend on the size and the settings of your screen and your browser. I will try to use short command lines as much as possible, but this will not always be feasible. And after all, it won't be too realistic. Anyway, even what I think is a short line may be "long" in relation to your screen, and browsers should not be forced to extend lines beyond the screen width.
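The two ways of writing a long command, three slashes or the semicolon as command delimiter, can be sketched as follows. The example assumes the auto data set shipped with Stata and is meant to be executed from a do file; neither method works when typing into the Command window.

```stata
sysuse auto, clear

* Way 1: three slashes continue the command on the next line
summarize price mpg weight ///
    length, detail

* Way 2: make the semicolon the command delimiter
#delimit ;
summarize price mpg weight
    length, detail ;
#delimit cr
```

After "#delimit cr", the end of the line serves as command delimiter again.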
When you have finished, you can save your do file; this is done just as with any other software. When you restart your work, you can open the do-file editor again and use a previously saved file, again just like in the case of other software. Note that you might wish to save your do file repeatedly while at work, as there is no automatic saving procedure in the case of a computer crash. I may also note that in my (limited) experience Stata is running smoothly; in other words, it is rarely Stata itself that causes trouble. But other software you may be using concurrently may behave differently, or your computer simply may be prone to get messed up every now and then.
To execute a do file, i.e. to submit all commands it contains to Stata, click on the icon "do" (this should be the last icon on the right hand side). There is also an icon that is called "run"; here the commands will be executed, but you will see no output on the screen.
To run a part of a do file, you have to mark the required line(s) of your file. Now the icons mentioned before will "do" or "run" only the selected lines.
You can "do" or "run" your file or parts of it also via the menu "Tools" in the do-file editor window.
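"do" and "run" also exist as commands, so a saved do file can be executed without any icon. The sketch below is only an illustration: it first writes a tiny do file into the current working directory (the name example_guide.do is made up, and you need write permission there), executes it both ways, and removes it again.

```stata
* write a two-line do file to the working directory
file open fh using "example_guide.do", write text replace
file write fh "sysuse auto, clear" _n
file write fh "summarize price" _n
file close fh

* execute it: commands and output are shown
do example_guide.do

* execute it again: same commands, but no output on the screen
run example_guide.do

* clean up
erase example_guide.do
```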
A comment is any text you like that you write in a do file in order to remember later what it was all about (or in order to let others know). Stata of course will have to know whether what it encounters in a do file is intended as something to "do", that is, a command, or as a comment only.
* Here I will recode income into categories
is a single-line comment. Stata will ignore this single line. Of course, several lines starting with an asterisk may follow each other. A more convenient solution for longer comments is the following:
/* Here I will recode income into categories using the
categories proposed by Dr Özgul-Harnischfeger. Personally,
I think we should use fewer categories */
This comment stretches over several lines. It is important not to forget the concluding asterisk-cum-slash clause.
A further possibility is
recode income (1/300 = 1) (301/600 = 2) (601/1000 = 3) /* Dr Özgul-Harnischfeger's categories */
In other words, a comment may be appended immediately to a command.
You may also use double slashes for short comments
recode income (1/300 = 1) (301/600 = 2) (601/1000 = 3) // this works!
but this type of comment must not be extended to the next line. So, the earlier version combining slashes and asterisks is more flexible. Actually, the two slashes just tell Stata to ignore everything until the end of the line, excluding the (invisible) "carriage return" (or "End of Line") character that informs Stata (just as any other software) that the end of the line has been reached.
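Here are the comment types side by side in a form you can execute from a do file. This is a sketch that assumes the auto data set shipped with Stata; the price categories and the variable name "pricecat" are arbitrary choices for illustration.

```stata
* a single-line comment
sysuse auto, clear

/* a comment that stretches
   over two lines */
summarize price /* a comment appended to a command */

recode price (min/5000 = 1) (5001/10000 = 2) (10001/max = 3), generate(pricecat) // short comment
tabulate pricecat
```

Note that the double slashes must be preceded by a blank; otherwise Stata will not treat them as the start of a comment.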
"Global" and "local" are notions (and Stata commands) you will encounter in chapters on "Stata programming". However, they can be helpful for the work of 'normal' users as well. In the Stata handbook, they are described as "local macros" and "global macros", but in what follows I will use the abbreviations "global(s)" and "local(s)". This section will describe only "globals", and only one particular type.
A "global" is something that is stored in memory and can be used anytime during your Stata session by reference to the name of the global. The type of globals I am going to explain here are expressions, perhaps long and complicated, that you have to use repeatedly; here it might be helpful to store each expression to a global and to invoke the respective global whenever you have to use a specific expression. For instance, you may wish to use several files that are all stored in a certain directory on your computer. You might store the path to the directory in a global and use the global instead of the full pathname. This would work like this:
First, define the global giving it a name of your choice, e.g., "datadir":
global datadir "C:\mydirectory\mydata\GSOEP\2012\statafiles\"
Note that the quotation marks will not be stored to the global; they are only the delimiters Stata needs in order to know that a string is to be stored here.
Now, in a second step, whenever you address one of the files stored in the directory "C:\mydirectory\mydata\GSOEP\2012\statafiles\", you may use the global instead of writing down the full path. The following example refers to the (fictitious) file "householdmembers":
use "${datadir}householdmembers"
Note the braces that enclose "datadir". They make clear where the global ends and the ensuing text begins. If the global were to stand alone, the braces would not be needed. Note also the quotation marks that enclose the entire expression. These quotation marks are required only if the path or the name of the file contains (blank) spaces; but as they won't do any harm even in the absence of spaces, I consider it good practice to always use quotation marks when referring to paths and/or files.
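As the directory used above will not exist on your computer, here is a sketch you can actually execute. It assumes that you may write to Stata's current working directory, whose path is available as c(pwd); the file name "auto_copy" is made up. Stata accepts the forward slash as path separator on all operating systems.

```stata
sysuse auto, clear

* store the working directory, plus a trailing slash, in a global
global datadir "`c(pwd)'/"

* braces are needed because text follows the name of the global
save "${datadir}auto_copy", replace
use "${datadir}auto_copy", clear

* standing alone, the global needs no braces
display "$datadir"

* clean up
erase "${datadir}auto_copy.dta"
```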
Globals can make your work very much easier. Recently, I had to produce a set of histograms, which all were to look alike. Instead of adding a huge number of options again and again to each of the histogram commands, I simply stored the options in a global and added the global to each histogram. This went like this (using only very few options for illustration):
global histoption `"ysc(r(0 80)) xtitle("Health Status", size(medlarge))"'
hist var15, $histoption
Please note three things:
First, the definition of the global "histoption" is now enclosed in "single plus double" quotation marks. They are needed here because the global itself contains quotation marks (see the xtitle option). Be sure to use the correct single quotation marks: The accent grave, or gravis, at the beginning, and the apostrophe at the end.
Second, as in the second line the macro "histoption" stands alone, no curly braces are needed.
Third, I have used a very short list of options here, hoping that the list will fit on a single line on most screens. In fact, the list of options is not limited. But note: In defining macros, you cannot use the three slashes that inform Stata that the command is continued on the next line. In other words, the entire global must be written in a single line.
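The benefit shows as soon as several graphs share the same options. The following sketch assumes the auto data set shipped with Stata; the options, the axis range and the title are illustrative choices, not those of my original histograms.

```stata
sysuse auto, clear

* store the common options once
global histoption `"percent yscale(range(0 60)) xtitle("Value", size(medlarge))"'

* check what has been stored
display `"$histoption"'

* reuse the options for each histogram
histogram price, $histoption
histogram mpg, $histoption
```

Changing the look of all histograms now requires changing a single line only.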
Normally, Stata pauses to display output after a certain number of lines and waits for any key to be pressed in order to continue.
set more off
suppresses pauses, thus allowing smooth running of Stata. Please note: If set more off is used within a do file, it will be valid only for the commands executed from this do file. To make it "permanent" (for your current session), you have to execute it from the command line.
The next command is not necessary anymore in "modern" versions of Stata (I think since version 12). But older versions of Stata reserve only a small amount of memory for the data which is not always sufficient; for instance, I encountered trouble with a data set with 1200 cases and, yes, a lot of variables (say, 1000; I never did count them). The storage type of variables also plays a role here. If you are working with an older version, you may therefore wish to increase this space to, say, 300 MB with the command
set mem 300m
You may add the option perm to make Stata allocate this amount regularly from the start. At any event, don't forget the "m" following the number indicating the required memory; otherwise Stata will interpret the number as kilobytes. For large memory requests, you may use the abbreviation "g" for gigabyte(s).
As you might have guessed, there are a number of further settings. To learn more about the current settings, just type query into the command window. And yes, help set will provide more information about how to change system parameters.
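A short sketch of how to change a setting and check it afterwards. Besides query, it uses c(more), one of the system values Stata makes available for looking up current settings.

```stata
* switch off the pauses
set more off

* show the current value of this setting
display "`c(more)'"

* list the output-related settings only
query output
```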
display sqrt(4)
will cause Stata to calculate the desired value (the square root of 4) and to display the result. (This section obviously is just a short reminder that the display command exists. You can use any mathematical function available in Stata.)
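A few more uses of display, as a sketch; the numbers are arbitrary.

```stata
* square root of 4; displays 2
display sqrt(4)

* the usual precedence rules apply; displays 14
display 2 + 3 * 4

* text and a computed value combined
display "The mean of 3 and 5 is " (3 + 5) / 2

* a display format with two decimal places
display %9.2f 1/3
```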
© W. Ludwig-Mayerhofer, Stata Guide | Last update: 21 Jun 2015