Working with SPSS via the syntax window

Working with SPSS means "submitting" a series of commands to the program and receiving output that displays either the desired results of your analysis or some error messages. (The latter, of course, usually means that something went wrong; see below.)

These commands may be submitted via the SPSS menu system, but as I have explained in my introductory remarks, it is necessary to learn to work with the syntax window. Of course everyone uses the menu all the time for a quick look at the data or for trying something out, but any serious data analysis work requires setting up a syntax file with a series of commands. This section points out some implications of working with the syntax window.

0 Save your syntax file(s) frequently!

Like all computer programs, SPSS is bound to go haywire from time to time. (Of course, this depends on your hardware and software configuration and many other circumstances, including the size of your data set and especially the tasks you have created for SPSS.) Even though what you have done thus far may not be lost entirely, the easiest way to deal with this problem is to save your syntax file at regular intervals. As far as I can see, SPSS has not cared to implement a function to do this automatically. Luckily, all you have to do is click on the appropriate icon once your syntax file has a name.

In order to be able to retrieve your work in case you have lost a larger segment of your syntax file, please check the following:

Select "Edit" (in German: "Bearbeiten") from the menu system, and then "Options".
The pop-up window that opens should display a number of items, among which you will find something like "Session Journal" (in German: "Sitzungs-Journal"). Make sure that the check-box "Record Syntax in Journal" or similar (in German: "Syntax in Journal aufzeichnen") is ticked.
You may also give this journal a new name (and path) in order to retrieve it more easily.

This "journal" will now record all your steps automatically, and you can retrieve them after a breakdown. Note that warnings and other messages may also be contained in the journal, so it is always easier to have your complete syntax file at hand.

1 Commands and Subcommands

Some SPSS commands consist of one or two keywords only, followed by the specifications of what exactly has to be done. For instance, the command

COMPUTE income = rawincom - tax.

has one keyword, COMPUTE, telling SPSS that a computation is to be performed; in this example, for all cases in the data file, the value of variable "tax" is subtracted from the value of "rawincom", and the result is stored in variable "income" (which may be an existing variable whose values will be overwritten by the freshly computed values, or a new variable; note that SPSS will not warn you when you attempt to overwrite an existing variable.)
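
Because SPSS overwrites silently, one cautious way of working is to compute into a name you know to be new and to look at the result before replacing anything. The following is only a sketch; "netinc" is a hypothetical variable name that is assumed not to exist in the data file yet:

COMPUTE netinc = rawincom - tax.
DESCRIPTIVES VARIABLES = netinc rawincom tax.

If the descriptive statistics look plausible, you can go on working with "netinc" and leave the original "income" untouched.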

In most commands, especially those for data analysis, the initial keyword(s) is/are followed by several subcommands that also contain keywords. These subcommands are usually separated by a slash. It is very good practice to start a new line with each subcommand and to put the slash at the beginning of the new line. This will enable you to see very easily whether you have forgotten a slash. It is also very strongly recommended that each line following the initial line with the command keyword and other specifications be indented. Not only will your syntax be much more readable (for yourself!); this way of setting up your syntax file is also mandatory in larger projects where you have several syntax files that are addressed by the INCLUDE command (which I will explain below). Thus, a typical command with subcommands will look like this:

LOGISTIC REGRESSION a10
	/METHOD=ENTER a132 a15 a159 a15*a159 a16
	/CONTRAST (a16)=INDICATOR(2)
	/SAVE COOK.

Throughout this introduction, I will write the SPSS commands and subcommands in capital letters to distinguish them from variable names, values and the like. This is not necessary when actually writing your syntax file. Still, some users find it helpful to adhere to this rule. Also, I have tried throughout to keep to the rule of indenting all lines of multiple-line commands except the first one. However, depending on your screen and your browser setup, this may not have worked out in all circumstances.

2 Command terminator

Each SPSS command has to start on a new line, and it has to be terminated by a period. Omitting the period (or inserting a period at the end of a line which is not the last line of a command) will often cause SPSS errors, i.e. your current run will abort. A much greater peril is to omit the period at the end of a (longer) comment (see the next section). In this case, SPSS will interpret the next command after the comment as a continuation of the comment and thus will not execute this command. However, it will continue processing further commands. Sometimes the one command that is not executed may be vital to what happens afterwards, so take care that all comments are closed by a period. (On the other hand, in longer comments it may happen that a period is at the end of a line simply because a sentence ends there. In this case, the next line of your comment will be interpreted as a command by SPSS, thus causing an error message. But you will easily get used to avoiding periods at the end of any line except the last one.)
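
To illustrate the peril just described, here is a minimal sketch that reuses the variables from the COMPUTE example above and assumes that the lines are run from the syntax window. In the first version, the period after the comment is missing:

* Compute net income
COMPUTE income = rawincom - tax.
FREQUENCIES VARIABLES = income.

Here the COMPUTE line is read as the second line of the comment and is not executed, while FREQUENCIES is processed as usual and will either fail (if "income" does not exist) or report old values. With the period in place, both commands are executed:

* Compute net income.
COMPUTE income = rawincom - tax.
FREQUENCIES VARIABLES = income.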

3 Comments

Comments are "memos" written into the syntax file to remind you what a specific command, or, more frequently, a group of commands, is meant to do. You are advised to use comments frequently when you are engaged in larger projects (meaning anything that goes beyond, say, 30 lines of SPSS syntax). As long as you are in the process of writing your syntax, it will be clear to you what you want to do (or so I hope), but things may look very different after a couple of days, weeks, or months. Comments have to begin with an asterisk; thus, a typical comment may look like this:

* The next section recodes respondents' and respondents' partners'
  occupation according to John Goldthorpe's class scheme.

Remember that if your comment extends over more than one line, all lines after the first line should be indented.

Short comments can be added anywhere (even in the middle of a command) as follows:

SURV TABLE = dauer BY bildgr (0,3)
  BY sex (2) /* for both sexes, use (1,2)
  / INTERVAL THRU 50 BY 1
  / STATUS = ziel (2)
  / PLOT (SURV) = dauer BY bildgr.

Everything following the sign /* up to the end of that line will be ignored by the command interpreter.

4 Abbreviation of commands

Most SPSS commands can be abbreviated. Usually, SPSS interprets only the first three characters of a command keyword, which is very helpful because misspellings after the first three characters do not cause the program run to abort. I employ abbreviations frequently in this guide. For instance, VAR LAB is an abbreviation for VARIABLE LABELS. Note, however, that this does not always work. For instance, for some strange reason one of the most frequent commands, COMPUTE, cannot be abbreviated as COM; doing so will cause the program run to abort.
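
As a small sketch of what this means in practice, the following two lines should be treated as the same command. The variable "income" is taken from the COMPUTE example above; the label text is merely an illustration:

VAR LAB income 'Net income after tax'.
VARIABLE LABELS income 'Net income after tax'.

When in doubt whether an abbreviation is accepted by your version of SPSS, writing the command out in full is the safe choice.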

5 Different versions of commands

SPSS commands permit some, but not too much, variability. For instance, many commands contain a VARIABLE keyword which often is not necessary. Thus, in the EXAMINE command, you may write either of the following two versions:

EXAMINE variable = var184.

EXAMINE var184.

However, while in the first version you may omit the equals sign, inserting an equals sign in the second version will lead to an error message and abortion of the program.

6 Variable names

SPSS uses the standard set-up for a data matrix: Each row in the data set contains different variables for a single case (depending on what constitutes a "case", cases may extend over more than one row, but we need not discuss this here), and the different variables are represented by different columns of the data set. Now, to address these columns SPSS uses variable names (a different possibility might be to use numbers, such as c1 or #1 or whatever for the first column, and so on). A few hints about names may be helpful.

In earlier versions, variable names could have 8 characters. More recent versions of SPSS permit variable names to have 64 bytes, which in most Western languages should mean 64 characters. I don't know what exactly "recent" means, but the average student now working at a university computer pool will most likely encounter a "recent" version. I am quite sure that versions 12.0 or higher will cause no trouble at all.
I still recommend using short names, as you might wish to use the "variable label" command to be more explicit about the content of your variables (see section on handling data files). Long variable names may cause trouble when exchanging data, especially when converting them to other data formats. Also, the window with your data spreadsheet may look pretty awkward if you use long variable names. And finally, the style of work advocated here (i.e. working with syntax files) becomes very tedious with long variable names.
It is a good rule to start variable names with a letter and to use no other characters than letters and numbers, with one exception. Even though other special characters are permitted, I encourage readers to use the underscore sign ("_") only. Var_1, Var_2, Var_3 and so on may help you to read your variable names faster than Var1, Var2, Var3.
Think about these rules when entering your data in a different program. For instance, with Microsoft Excel, you might enter whatever you like in the first row of your spreadsheet and ask SPSS to use this as variable names when reading in the data. Even though I have seen SPSS cope remarkably well with strange variable names, it is difficult to judge in advance what SPSS will accept and what not. Beginners therefore should exercise some caution.

7 Abbreviations of variable and value lists

In most data analysis commands, variables are addressed; sometimes it is also necessary to list a number of (possible) values of a variable. Lists of variables and of data values can be abbreviated.

Lists of variables like

var1 var2 var3 var4 var5

can be abbreviated as

var1 TO var5.

Caution: The "TO" keyword means that in addition to both variables mentioned explicitly, all variables "between" these variables will enter the analysis. "Between" refers to the "physical" setup of the data as they appear in the data window. Thus, if your data set contains a series of variables as follows: var1 vara varb varc vard vare var2 var3 var4 var5, writing var1 TO var5 will also address variables vara varb varc vard vare.
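
A simple way to avoid surprises is to check the order of the variables in the file before relying on TO. The following is a sketch only, using the variable names from the example above:

DISPLAY DICTIONARY.
DESCRIPTIVES VARIABLES = var1 TO var5.

The first command lists the variables in the order in which they are stored in the data file; the number of variables reported by the second command will also tell you whether more than the five intended variables have entered the analysis.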

Value lists like

1 2 3 4 5

can be abbreviated as

1 THRU 5

which may be useful especially in the RECODE command. In addition, keywords LO and HI can help you further, such as in LO THRU 5. This may be helpful if you do not know exactly the lowest value of this specific variable. When using these abbreviations, be sure that the range of values does not comprise missing values that should be excluded from the data transformation or your analysis. SPSS will include these in the list of values to be modified even if they are defined as missing values.
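
One way to keep missing values out of such a range is to deal with them first, since RECODE applies the first specification that matches a value. The following sketch assumes that var1 contains integer codes and that "var1_r" is a new (hypothetical) variable name; all missing values are set to system-missing in the new variable before the ranges are applied:

RECODE var1
  (MISSING = SYSMIS)
  (LO THRU 5 = 1)
  (6 THRU HI = 2)
  INTO var1_r.
FREQUENCIES VARIABLES = var1 var1_r.

Recoding INTO a new variable leaves var1 itself unchanged, so the two frequency tables can be compared to check the result.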

8 Error messages

If SPSS encounters characters, words or anything else it cannot interpret, it will abort processing and will issue one or several error message(s). It is not uncommon that you will not be able to make sense of them. This is because it is very difficult for a computer program to recognize what actually is wrong with your syntax. A series of error messages does not always mean that you have committed several errors. Many error messages arise if one faulty command is the prerequisite for the following commands. Take, for instance, the case that a variable is defined with one command, and this variable thereafter is addressed by other commands. If the command to define this variable contains an error, all the subsequent commands will also result in an error message. It is therefore paramount that you start amending your syntax with regard to the first error message. Misspellings of variable names are among the most common sources of error messages.
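
As an illustration of such a chain of error messages, here is a minimal sketch. It assumes a data file that contains rawincom and tax but no variable called income yet; the misspelling "rawincm" is deliberate:

COMPUTE income = rawincm - tax.
FREQUENCIES VARIABLES = income.

The first command fails because rawincm does not exist, so income is never created; the second command then fails as well, although there is nothing wrong with it. Correcting the first command to rawincom should make both messages disappear.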

9 Commands that are executed immediately vs. commands that need other (ensuing) commands in order to be executed

SPSS commands for data analysis usually are executed immediately when they are "submitted" to the SPSS processor. A number of other commands, mostly those that transform data or merge several data sets, are executed only when they are followed by a command of the 'immediately executable' type. However, sometimes you will not want to run such a command yet; you may wish to compute a new variable, or to recode an existing one, and to have a look at the result before proceeding with your analysis. For this purpose, the EXECUTE command (abbreviated as EXE) is very helpful, as it will result in immediate processing of those commands that by themselves would have to wait for a data analysis procedure in order to be executed.
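
In its simplest form, and using the COMPUTE example from above, this looks as follows:

COMPUTE income = rawincom - tax.
EXECUTE.

Without the second line, the COMPUTE command would remain pending until the next data analysis command is submitted; with it, the computed values can be inspected in the data window right away.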

10 Accessing syntax files via the INCLUDE command

In larger projects, you will often wish to have several syntax files simply because having all your syntax in one single file will result in a very large file in which you may easily get lost. It is not necessary, however, to load one syntax file after the other into the syntax window; rather you may address them with the INCLUDE command. Let's suppose all your data preparation work is distributed over three syntax files; you then can insert the following lines in your present syntax window and proceed with your analysis:

INCLUDE 'd:\mydirectory\mysubdirectory\syntax1.sps'.
INCLUDE 'd:\mydirectory\mysubdirectory\syntax2.sps'.
INCLUDE 'd:\mydirectory\mysubdirectory\syntax3.sps'.

Of course, all directory, subdirectory and syntax file names should be replaced by the appropriate names.

When you INCLUDE syntax files, it is mandatory that in commands that stretch over more than one line, all lines except the first one are indented, whereas the first line of each command has to start at the beginning of a line, that is, in the "first column" of that line. Some SPSS information says that the lines in such files may not be longer than 80 characters, but at least Version 10 can cope with longer lines. Yet long lines are difficult to read anyway, and therefore it is advisable to get accustomed to staying within the 80-character limit.
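
To make these layout rules concrete, here is a sketch of what the contents of a file such as syntax1.sps might look like. The commands themselves are only illustrations built from the variables used earlier in this section:

* Data preparation, part 1.
COMPUTE income = rawincom - tax.
VARIABLE LABELS income
  'Net income after tax'.
FREQUENCIES VARIABLES = income
  /STATISTICS = MEAN MEDIAN.

Each command, including the comment, starts in the first column and ends with a period; every continuation line is indented, and no line exceeds 80 characters.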