Data Files

Data consist of cases and variables arranged in a matrix. That is, you have a number of cases, which are the units of your analysis, and each case has one or more variables attached. Cases may be anything you like: persons, countries, geographical units (cities, regions, lakes), or any other entity (trees, birds, observations from chemical experiments, temperature measurements ... really anything). For each case, there is at least one observation. Usually the same information is obtained for all cases, although "missing values" are pervasive, especially in the social sciences. These are pieces of information that are missing because respondents did not answer, because of errors in data processing, and so on.
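To make the case-by-variable idea concrete, here is a minimal Python sketch (not SPSS) of a tiny data matrix. Each row is a case, each column is a variable, and None stands in for a missing value. The variable names and values are invented purely for illustration.

```python
# Hypothetical example data: variable names and values are made up.
variables = ["id", "age", "income"]

# One row per case; None marks a missing value
# (e.g. a respondent who did not report age or income).
cases = [
    [1, 34, 2500],
    [2, 51, None],
    [3, None, 1800],
]

def count_missing(matrix, var_names):
    """Return a dict mapping each variable name to its number of missing values."""
    counts = {name: 0 for name in var_names}
    for row in matrix:
        for name, value in zip(var_names, row):
            if value is None:
                counts[name] += 1
    return counts

if __name__ == "__main__":
    print(count_missing(cases, variables))
```

Counting missing values per variable (column) rather than per case (row) mirrors the usual first check on a new data set.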

A data file, then, is a rectangular matrix in which each case usually occupies one row and the variables form the columns. Variables have to be named. Cases do not have "names", but there should be one or more variables that "identify" each case, for instance an identification number that refers to the questionnaire from which the data were entered. Variable names in SPSS may have at most 8 characters, and the first character has to be a letter of the (English) alphabet. The following characters may also be digits. Most special characters are not permitted, with the exception of the underscore "_" and the dollar sign "$". CAUTION: Some languages, such as German, Danish or Swedish, contain umlauts and similar letters (ä, ö, å, ø and so on). Always avoid these letters in variable names and in the names of data files.
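The naming rules above can be expressed as a small check. The following Python function is a hypothetical helper, not part of SPSS. It encodes only the rules as stated in this guide: at most 8 characters, a first character from the English alphabet, then letters, digits, "_" or "$". As a result, it rejects names containing umlauts. SPSS itself applies further restrictions, for example reserved keywords such as AND or BY, and later SPSS releases relaxed the 8-character limit.

```python
import re

# Hypothetical helper: encodes the naming rules as described in this guide,
# not SPSS's own validation logic.
#   [A-Za-z]       first character: a letter of the English alphabet
#   [A-Za-z0-9_$]  following characters: letters, digits, underscore, dollar sign
#   {0,7}          so that the name has at most 8 characters in total
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_$]{0,7}")

def is_valid_name(name: str) -> bool:
    """Return True if `name` satisfies the 8-character naming rules."""
    return _NAME_PATTERN.fullmatch(name) is not None

if __name__ == "__main__":
    for candidate in ["income", "q12_a", "var$1", "1stvar", "einkommen", "größe"]:
        print(candidate, is_valid_name(candidate))
```

Here "1stvar" fails because it starts with a digit, "einkommen" because it has nine characters, and "größe" because of the umlaut and the "ß".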

How do data become available for analysis as a matrix in the data window? There are several possibilities. First, data may be entered directly into the SPSS data window. To do this, you have to define the variables first. This can only be done interactively, so it is easier to demonstrate than to explain. Some of the sites referred to in the "links" section of my introduction to this guide show how this works. Second, data may already be stored in an SPSS data file. This may be because you saved data you entered in the data window, or because you have access to someone else's data. SPSS data files are opened with the GET FILE command. Third, SPSS can read data from a number of other programs that store data, such as spreadsheets or databases; most notably, SPSS can read Excel and dBase files. Fourth, data are sometimes available as a so-called "ASCII file" (or "text file"). This is a file that is not specific to any program but consists merely of numbers and/or characters; see the "Read Raw Data" chapter in the section "Handling Data Files". Fifth, there is a special SPSS export/import format that allows data sets to be transferred from one operating system to another; some other statistical packages can also read it.
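To illustrate how an ASCII file maps onto the case-by-variable matrix, here is a minimal Python sketch (not SPSS's own reader). It parses whitespace-separated text, one case per line, into a list of rows. The sample content, the column order and the use of "." as a missing-value code are all assumptions made for this example. Real raw-data files may instead use fixed column positions or other delimiters.

```python
import io

# Hypothetical raw data: one case per line, values separated by blanks,
# columns in the order id, age, income; "." marks a missing value.
RAW_TEXT = """\
1 34 2500
2 51 .
3 . 1800
"""

def read_free_format(stream, n_vars):
    """Parse whitespace-separated numbers into rows; '.' becomes None."""
    rows = []
    for line_no, line in enumerate(stream, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != n_vars:
            raise ValueError(
                f"line {line_no}: expected {n_vars} values, got {len(fields)}"
            )
        rows.append([None if f == "." else float(f) for f in fields])
    return rows

if __name__ == "__main__":
    print(read_free_format(io.StringIO(RAW_TEXT), n_vars=3))
```

The check on the number of fields per line catches the most common problem with text files: a line that does not contain exactly one value per variable.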

If your data are in a format SPSS cannot read, a data conversion utility such as DBMSCOPY may help, if you have access to one. This program can convert data from a large number of formats into SPSS format (and vice versa). Other programs that do the same job presumably exist.

Note that in SPSS you have access to only one data file at a time; this file is called the "working file". Data often come in several data sets. To analyse data from different data sets together, you first have to merge two or more files. This is explained in the "Merging Files" chapter in the section "Handling Files".
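The idea behind merging can be sketched in a few lines of Python. Two hypothetical data sets share an identification variable and are combined case by case, so that each case ends up with the variables from both files. This only illustrates the idea and does not show how SPSS implements its merge commands. The names and data are invented. In this sketch, cases missing from the second file get None for its variables, and cases that appear only in the second file are dropped; that is a design choice of the example.

```python
# Hypothetical data sets keyed by an identification variable "id".
persons = [
    {"id": 1, "age": 34},
    {"id": 2, "age": 51},
    {"id": 3, "age": 27},
]
incomes = [
    {"id": 1, "income": 2500},
    {"id": 3, "income": 1800},
]

def merge_by_id(left, right, key="id"):
    """Add the right file's variables to each case of the left file, matched on `key`.

    Cases of `left` with no partner in `right` get None for the right file's variables.
    """
    right_by_key = {row[key]: row for row in right}
    right_vars = sorted({var for row in right for var in row if var != key})
    merged = []
    for row in left:
        partner = right_by_key.get(row[key], {})
        combined = dict(row)
        for var in right_vars:
            combined[var] = partner.get(var)
        merged.append(combined)
    return merged

if __name__ == "__main__":
    for case in merge_by_id(persons, incomes):
        print(case)
```

Matching on an identification variable, rather than on row order, is what makes the merge safe when the files contain different cases or list them in a different order.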

© W. Ludwig-Mayerhofer, IGSW | Last update: 28 April 2002