Reading Data
Stata data sets
To open an already existing Stata system file (with extension ".dta"), the appropriate command is
use name-of-data-file
If you have been working with another data set that is still in memory, you have to write:
use name-of-data-file, clear
I may mention in passing that clear can be used as a stand-alone command prior to use.
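A minimal sketch of the two variants, assuming you run it as a do-file. The data set auto.dta ships with Stata; my_copy_of_auto is just a file name made up for this illustration.

```stata
* Create a small file to work with (auto.dta ships with Stata).
sysuse auto, clear
save my_copy_of_auto, replace

* Variant 1: remove the data in memory first, then load the file.
clear
use my_copy_of_auto

* Variant 2: do both in one step.
use my_copy_of_auto, clear
```

Both variants leave you with the same data in memory; which one you prefer is largely a matter of taste.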
Up to now, I have assumed that the data are in your working directory, which, as far as I know, is called "data" on a Windows PC (the command pwd displays the current working directory). If the data set is located somewhere else, you may write, for instance:
use c:\mydirectory\mysubdirectory\name-of-data-file
where you have to fill in your directory and data set name. Another way is to change to the pertinent directory first and then to "use" the data file:
cd c:\mydirectory\mysubdirectory\
use name-of-data-file
If the directory path or the name of the data set contains at least one empty space (blank), it has to be placed within double quotation marks. Single quotation marks won't do the trick.
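As a small illustration of the quoting rule, the following sketch saves a copy of the auto data (shipped with Stata) under a name containing blanks and loads it again. The file name "my auto data" is an assumption chosen only for this example.

```stata
* auto.dta ships with Stata; the file name with blanks is hypothetical.
sysuse auto, clear
save "my auto data", replace

* Double quotation marks are required because of the blanks.
use "my auto data", clear

* The following would fail, as single quotation marks are not accepted:
* use 'my auto data', clear
```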
If you make frequent use of files from other directories, it may be helpful to define the paths to these directories as global macros.
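A minimal sketch of this idea follows. The macro name datadir is my own choice, and for the sake of a runnable example the current working directory stands in for "another directory"; in practice you would put a path such as c:/mydirectory/mysubdirectory there. I use forward slashes, which Stata accepts on Windows as well; a backslash placed directly in front of the $ sign would keep the macro from being expanded.

```stata
* Hypothetical macro name; here it simply points to the working directory.
* In real use: global datadir "c:/mydirectory/mysubdirectory"
global datadir "`c(pwd)'"

* Create a file in that directory (auto.dta ships with Stata).
sysuse auto, clear
save "$datadir/auto_copy", replace

* Load the file via the global macro.
use "$datadir/auto_copy", clear
```

If the global macros are defined once at the top of a do-file (or in profile.do), a changed directory has to be edited in one place only.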
Parts of a Stata data set
If you know from the outset that you need only parts of a data set, you may request Stata to limit the data to be loaded. "Limiting" the data may refer to the variables used and/or to the selection of a subsample of cases. Look at the following examples:
use var1 var17 var38 using name-of-data-file
will load only the three variables mentioned into your working memory.
use if id <= 1000 using name-of-data-file
will load only cases with a value less than or equal to 1000 in variable id.
Both types of command may be combined, such as in
use var1 var17 if id <= 1000 using name-of-data-file
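The combined form can be tried out with the auto data that ship with Stata. This is only a sketch to be run as a do-file: the temporary file stands in for name-of-data-file, and the variables and the cut-off value are chosen arbitrarily.

```stata
* Store a copy of the auto data in a temporary file.
sysuse auto, clear
tempfile cars
save "`cars'"

* Load three variables only, and only cars that cost 5000 or less.
use make price foreign if price <= 5000 using "`cars'", clear
describe
```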
Data from other statistics software
As of version 16, Stata can import data sets that have been created by the SPSS or SAS packages. Actually, SAS files could be read from version 12 onwards; for SPSS, one or two ado files have been available since about version 10.
Data from other statistical packages may be converted to Stata data sets with the help of Stat/Transfer or perhaps similar software I am not aware of. (There used to be a program called DBMS copy, but this has been defunct for quite some time afaik.)
IBM SPSS Statistics
Stata version 16 or higher
Stata's import command, introduced in version 12 if memory serves, has been extended to include SPSS files as of version 16. Typical uses may be:
import spss name-of-data-file
import spss var1 var17 var38 using name-of-data-file // reads only var1, var17 and var38
Note that name-of-data-file may include the path to the directory in which the file is located. Note further that you may have to add the option , clear depending on whether or not another data set is currently open.
By default, this command assumes that the SPSS data are in .sav format and were created by SPSS version 16 or higher. Data in .zsav format (SPSS version 21 or higher) can be imported by adding the option , zsav.
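Put together, an import might look like the following sketch (Stata 16 or higher). The file name survey.sav is purely hypothetical, which is why the import is wrapped in a check for the file's existence; without that check, a missing file would simply stop your do-file with an error.

```stata
* Hypothetical file name; replace it with your own SPSS file.
local spssfile "survey.sav"

capture confirm file "`spssfile'"
if _rc == 0 {
    * Replace whatever is in memory with the SPSS data.
    import spss using "`spssfile'", clear
    describe
}
else {
    display "`spssfile' not found; nothing imported"
}

* For a compressed file, the zsav option would be added, e.g.:
* import spss using "survey.zsav", zsav clear
```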
Earlier Stata versions
Two user-written procedures were (and still are) available for earlier versions: usespss can still be obtained via net from http://radyakin.org/transfer/usespss/beta (I last checked this in mid-February 2024), and it seems to work with the 64-bit version of Stata 16. Quite some time ago, possibly with Stata 12, I successfully used an earlier version of this procedure.
I have never used importsav, the other user-written procedure. To obtain more information, use Stata's search function: search importsav. The procedure requires that you have a version of the R software installed on your computer. The help file says that it works with Stata version 10 or higher.
Note that if you have access to SPSS, you may save your file in Stata format and then use this version in Stata. However, it has been a long time since I quit working with SPSS, so I can't reliably tell you anything about the current state of things.
SAS
Stata version 16 or higher
As of version 16, Stata offers three commands to import different types of SAS files:
import sas name-of-data-file
import sasxport5 name-of-data-file
import sasxport8 name-of-data-file
The first command can read SAS files created by version 7 or higher (.sas7bdat). The other two commands can import what is called a SAS Transport file, created by SAS XPORT version 5 or SAS XPORT version 8, respectively.
Note the extension of these commands to read only a selection of variables as, e.g., in
import sas var1 var15 var30 using name-of-data-file
Also, do not forget to add the option , clear if another file is currently open.
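A sketch combining variable selection and the clear option (Stata 16 or higher). Both the file name mydata.sas7bdat and the variable names are assumptions, so the import is only attempted if the file actually exists.

```stata
* Hypothetical file and variable names; adapt them to your own data.
local sasfile "mydata.sas7bdat"

capture confirm file "`sasfile'"
if _rc == 0 {
    * Read three variables only and replace the data in memory.
    import sas var1 var15 var30 using "`sasfile'", clear
    describe
}
else {
    display "`sasfile' not found; nothing imported"
}
```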
Earlier Stata versions
With Stata versions 12 to 15, SAS "Transport" files could be read via the command
import sasxport name-of-data-file
In still earlier versions of Stata, such data could be read with the fdause command (the name of the command is derived from the fact that the US FDA requires this format). help fdause should provide the necessary information.
Data from spreadsheets and ASCII (text) files
Importing Excel™-files (Stata 12 and higher)
import excel name-of-data-file, firstrow clear
will import the first sheet from file "name-of-data-file", assuming that the first row contains the variable names. If the data you wish to import are not in the first sheet, try adding the option sheet("name-of-sheet"). There are other options as well; e.g., you might restrict import to some rows and columns.
Stata can also export to Excel™ files, but since I have not yet had any reason to try this, please find out for yourself how it works (help export).
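To have something to import, the following sketch first writes part of the auto data (shipped with Stata) to an Excel file and then reads it back, once in full and once restricted to a range of cells. The file name auto_demo.xlsx and the sheet name cars are made up for this example.

```stata
* Write three variables to a sheet named "cars", variable names in row 1.
sysuse auto, clear
export excel make price mpg using "auto_demo.xlsx", sheet("cars") firstrow(variables) replace

* Read the whole sheet back.
import excel using "auto_demo.xlsx", sheet("cars") firstrow clear
describe

* Read only columns A and B, header row plus the first ten data rows.
import excel using "auto_demo.xlsx", sheet("cars") cellrange(A1:B11) firstrow clear
describe
```

The cellrange() option is the one that restricts the import to certain rows and columns; see help import excel for the details.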
For earlier versions of Stata (that cannot read Excel data as such), you may consider that Excel can create so-called ".csv" files, i.e. files with raw numbers / characters that are separated by delimiters (usually commas, hence "csv" for "comma separated values"). Look at the following paragraphs.
Text / ASCII files
Text or ASCII files can represent data by plain numbers or characters. They can come in different shapes:
Data separated by delimiters
Often (as in the .csv files mentioned above) the numbers/characters are separated by delimiters; by default, this delimiter is a comma (other separators are possible, the main caveat being that it is a symbol that cannot be part of a data value). So, a list of six variables from two individuals may look like this:
1, m, 35, 3700, 80000, 30
2, f, 25, 900, 0, 21
With Stata version 13 or higher, such data can be read as follows:
import delimited name-of-data-file, clear
See help import_delimited or the Stata handbook for additional options.
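The following sketch writes the two example cases to a small file and imports them (to be run as a do-file). The file name people_demo.csv is my own invention, and I have written the values without blanks after the commas. Since the file has no header line, I ask Stata not to look for variable names; the variables are then named v1 to v6.

```stata
* Write the two example cases to a comma-separated file.
tempname fh
file open `fh' using "people_demo.csv", write text replace
file write `fh' "1,m,35,3700,80000,30" _n
file write `fh' "2,f,25,900,0,21" _n
file close `fh'

* Import the file; there are no variable names in the first line.
import delimited using "people_demo.csv", varnames(nonames) clear
list
```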
For earlier versions of Stata, the insheet command provides similar functionality:
insheet using name-of-data-file, c n clear
will read an ASCII file with comma separated values and names in the first line. Other options are:
t for tab delimited data
delim("X") for data delimited by X. "X" may be replaced by any other character.
The insheet command is no longer an official part of Stata (as of version 13), but it still works. Look for more information with help insheet.
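Here is a sketch of insheet with a delimiter other than the comma (to be run as a do-file). The file name, the variable names and the choice of the semicolon are all assumptions made for this example, and I spell the option out as delimiter() instead of abbreviating it.

```stata
* Write a small semicolon-delimited file with names in the first line.
tempname fh
file open `fh' using "people_semicolon.txt", write text replace
file write `fh' "id;sex;age" _n
file write `fh' "1;m;35" _n
file write `fh' "2;f;25" _n
file close `fh'

* Read it, telling insheet about the delimiter and the header line.
insheet using "people_semicolon.txt", delimiter(";") names clear
list
```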
ASCII files in fixed or free format
Data need not necessarily be separated by a delimiter; a blank may be enough. We say that data are in fixed format if the data are aligned in such a way that variables are stacked precisely above/beneath each other, as in:
1 m 35 3700 80000 30
2 f 25  900     0 21
Such a format makes it easier to deal with missing values (they might even be represented simply by blanks, i.e. "nothing", even though I consider this bad practice).
In free format, the data may look just like those separated by a delimiter, but here the 'delimiter' is just a blank:
1 m 35 1700 80000 30
2 f 25 900 0 21
Here it is absolutely mandatory that each variable is represented in each single case by a value (number or character), at the least as a symbol for missing values. This format works because you inform Stata that there are (in our example) six variables per case (line), and thus Stata counts six consecutive data values as belonging to one case.
Current versions of Stata (i.e. version 16 or higher) use the infile and infix commands for such data. For more information, see help infile or help infix.
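The following sketch shows both commands at work on the two example cases (to be run as a do-file). The file names and the variable names (id, sex, age, income, wealth, hours) are my own assumptions, as the example data come without names. For infile, Stata simply reads six values per case; for infix, I tell Stata in which columns each variable is located.

```stata
* Free format: values separated by blanks, six values per case.
tempname fh
file open `fh' using "people_free.txt", write text replace
file write `fh' "1 m 35 1700 80000 30" _n
file write `fh' "2 f 25 900 0 21" _n
file close `fh'

infile id str1 sex age income wealth hours using "people_free.txt", clear
list

* Fixed format: every variable occupies the same columns in each line.
file open `fh' using "people_fixed.txt", write text replace
file write `fh' "1 m 35 3700 80000 30" _n
file write `fh' "2 f 25  900     0 21" _n
file close `fh'

infix id 1 str sex 3 age 5-6 income 8-11 wealth 13-17 hours 19-20 ///
    using "people_fixed.txt", clear
list
```

Note that with infix a missing value could be left blank in its columns, whereas the free-format infile call depends on every case supplying all six values.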
The older insheet command mentioned above may also be used for free format data (or fixed format data, provided each variable is represented by some data value for all cases). For more information, see help insheet.
© W. Ludwig-Mayerhofer, Stata Guide | Last update: 15 Feb 2024