Reading Data

Stata data sets

If you want to use an already existing Stata system file (with extension ".dta"), the appropriate command is

use name-of-data-file

If you are already using another data set and want to replace it (either having saved it or not wanting to save it) with the new data set, you will write:

use name-of-data-file, clear

Up to now, I have assumed that the data are in your working directory, which on a Windows PC is often a directory called "data" (type pwd to see which directory Stata is currently working in). If the data set is located somewhere else, you may write, for instance

use c:\mydirectory\mysubdirectory\name-of-data-file

where you have to fill in your directory and data set name. Another way is to change to the pertinent directory first and then to "use" the data file:

cd c:\mydirectory\mysubdirectory\

use name-of-data-file

Important note: If one or several of the directories or the name of the data set contain empty spaces (blanks), the entire path including the file name has to be placed within double quotation marks. Single quotation marks won't do the trick.
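
To see the quotation mark rule at work without needing any file of your own, here is a minimal sketch. It assumes that the example data set auto shipped with Stata is available and that you may write to your working directory; the file name "my auto data" is made up for this illustration.

```stata
* Sketch only: "my auto data" is a hypothetical file name containing blanks
* load an example data set that comes with Stata
sysuse auto, clear
* save it under a name with blanks (goes to the current working directory)
save "my auto data", replace
* show the current working directory
pwd
* the double quotation marks are required because of the blanks
use "my auto data", clear
```

Leaving out the quotation marks in the last line should make Stata stop with an error, because it then takes only the first word as the file name.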

Parts of a Stata data set

If you know from the outset that you need only parts of a data set, you may request Stata to limit the data to be loaded. "Limiting" the data may refer to the variables used and/or to the selection of a subsample of cases. Look at the following examples:

use var1 var17 var38 using name-of-data-file

will load only the three variables mentioned into your working memory.

use if id <= 1000 using name-of-data-file

will load only cases with a value less than or equal to 1000 in variable id.

Both types of command may be combined, such as in

use var1 var17 if id <= 1000 using name-of-data-file
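
The following sketch shows the combined form with data everybody has. It assumes the example data set auto shipped with Stata; autocopy is a hypothetical file name that I use only so that there is a file in your working directory to read from.

```stata
* Sketch only: autocopy is a hypothetical file name
sysuse auto, clear
save autocopy, replace
* look at the variables in the file without loading it
describe using autocopy
* load three variables, and only the cars with a price of 5000 or less
use make price mpg if price <= 5000 using autocopy, clear
describe
```

The describe using command is handy when you do not remember the variable names in a large file and want to decide which ones to load.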

Data from other statistics software

By default, Stata is not prepared to read data from other statistical packages. However, there are a number of workarounds. One of these is using software designed specifically to convert files from one format to another, such as Stat/Transfer or DBMS/Copy. But there are ways of doing without such software.

SPSS

If you have access to SPSS, you may save your file in Stata format and then use this version in Stata. However, with IBM SPSS version 19 problems may occur: the file may simply not be written. With later versions (currently, this is IBM SPSS 22) things seem to work well.

With Stata 12 or some earlier version, the easier way is to use the user-written routine usespss. If you have not yet installed it, type

findit usespss

and follow the directions given. Once usespss is installed, you can read your SPSS file with the following command:

usespss using spssfilename.sav, clear

Warning: At the time of this writing (May 2015), usespss is not available for the 64-bit version of Stata 13.

SAS

Stata has a command to read data that come in SAS XPORT Transport format. See help fdause (the name of the command is derived from the fact that the US FDA requires this format).

Data from spreadsheets and other programs that can create ASCII files

Software like Microsoft Excel™, but also many other programs, such as SPSS, can create so-called ASCII files that can be read by any software. But as of version 12, Stata can import Excel™ files directly.

Importing Excel™ files (Stata 12 and higher)

import excel name-of-data-file, firstrow clear

will import the first sheet from file "name-of-data-file", assuming that the first row contains the variable names. If the data you wish to import are not in the first sheet, try adding the option sheet("name-of-sheet"). There are other options as well; e.g., you might restrict import to some rows and columns with the option cellrange().
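
Here is a small sketch of these options in combination. It assumes Stata 12 or higher and the example data set auto shipped with Stata; the file name autocopy.xlsx and the sheet name cars are made up, and the Excel™ file is created with export excel only so that there is something to import.

```stata
* Sketch only: autocopy.xlsx and the sheet name "cars" are hypothetical
sysuse auto, clear
* create a small Excel file with the variable names in the first row
export excel make price mpg using "autocopy.xlsx", sheet("cars") firstrow(variables) replace
* import the named sheet, restricted to the cells A1 to C11
* (row 1 holds the variable names, rows 2 to 11 the first ten cases)
import excel using "autocopy.xlsx", sheet("cars") cellrange(A1:C11) firstrow clear
list in 1/5
```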

Stata can also export to Excel™ files, but I have not yet had any reason to try this, so please find out for yourself how this works.

Earlier versions

An ASCII file contains the data (one case per line), a delimiter to separate data, possibly the names of the variables in the very first line of the file, but no formatting or other stuff. Such files can easily be read by Stata.

insheet using name-of-data-file, c n clear

will read an ASCII file with comma separated values and names in the first line. Other options are:

t for tab delimited data
delim("X") for data delimited by X.

"X" may be replaced by any other character.

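If you want to try this without a file of your own, the following sketch writes a comma separated file first and then reads it back. It assumes the example data set auto shipped with Stata; autocopy.csv is a hypothetical file name, and the options are written out in full (comma, names) rather than abbreviated. Note that from Stata 13 on, import delimited is the official successor of insheet, although insheet should continue to work.

```stata
* Sketch only: autocopy.csv is a hypothetical file name
sysuse auto, clear
* write three variables to a comma separated ASCII file, names in the first line
outsheet make price mpg using autocopy.csv, comma replace
* read the file back in
insheet using autocopy.csv, comma names clear
describe
```
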
ASCII files in free or fixed format

Stata can read ASCII files with no delimiters between columns. These may come in free format (here, columns have to be separated by spaces) or in fixed format (where for each column, i.e. variable, an exact position can be given). For more information, see help infile or help infix.
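
As a minimal sketch of the fixed format case, the following lines first write a tiny file and then read it with infix. Everything here is made up for illustration: the file name fixedcopy.raw, the file handle fh, the variable names id and age, and the two data lines.

```stata
* Sketch only: file name, handle, variables and values are hypothetical
* write a tiny fixed-format file: columns 1-3 hold id, columns 4-5 hold age
file open fh using fixedcopy.raw, write text replace
file write fh "00125" _n
file write fh "00231" _n
file close fh
* read it back, telling Stata which columns belong to which variable
infix id 1-3 age 4-5 using fixedcopy.raw, clear
list
```

With real data you would of course skip the first part and only give infix the column positions documented in your codebook.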

© W. Ludwig-Mayerhofer, Stata Guide | Last update: 22 May 2015