Spatial Data

Important note

In what follows, I describe some user-written modules for spatial data that were available before Stata introduced its own sp commands for the analysis of spatial data in version 15.

As far as I can judge, the modules I describe here are still available and working. For information about the built-in sp commands, see Stata's help system (help sp provides access to the introductory remarks and an overview of all available commands).

Preamble

Spatial analysis has roots in geography, spatial economics and related disciplines. It is still widely considered a field for specialists and is therefore not a standard part of most statistical software. As far as Stata is concerned, a number of user-written tools are available that can be downloaded either from the SSC server or from other sources.

The ado files considered in this and the next entries are:

`shp2dta`	from `ssc`
`sppack`	from `ssc`; contains `spmat`, `spreg` and `spivreg`
`spatwmat`	use `search spatwmat`
`spatgsa`, `spatlsa` and `spatcorr`	use `search ...`
`spatdiag` and `spatreg`	use `search ...`
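
As a minimal sketch of how these tools can be obtained (assuming an internet connection and a Stata installation that is allowed to install user-written packages), the two SSC packages can be installed directly, while the other modules are located via search:

```stata
* Install the packages hosted on SSC
ssc install shp2dta
ssc install sppack

* The other modules are not installed via ssc here; search lists the
* sources from which they can be installed (the option "all" makes sure
* that net resources are searched as well)
search spatwmat, all
```

After installation, `which shp2dta` reports where the ado file was stored, which is a quick way to check that the installation succeeded.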

Note that there are some other packages around which are not described here.

What is special about spatial data?

Spatial data can mean several things, but they all have in common that they are about entities that can be described by their location in space. Most social scientists think about area data, i.e. about regions, neighborhoods, districts. But actually, spatial data may also be about single points (locations of events or of objects – points are of course abstractions here).

For files of such data, there is a worldwide de facto standard, the shapefile format, which originates from the ArcGIS software. Shapefiles can be read into Stata with the command shp2dta. Another format is the MapInfo Interchange Format, and there is a command mif2dta that helps you deal with such data. In what follows, I will describe the more common case of shapefiles.

Before we start, note that a shapefile actually consists of several files, typically with the same name but different extensions. Three files are mandatory: mydata.shp, which contains the coordinates; mydata.dbf, which describes the objects; and mydata.shx, which holds an index of the objects. There may also be mydata.prj, which indicates the projection (or spatial reference) system used. Why we need such a multitude of files can best be understood in the case of area data: An area, such as a region, can be described by its boundaries, and these form a polygon, geometrically speaking. Now a polygon can be very simple, as in the case of a rectangle, or it can be very complex. Compare some of the US states, such as Colorado or Wyoming, whose boundaries form simple rectangles, with others, such as West Virginia, whose contours form a very irregular shape. Therefore, different numbers of coordinates are required to describe such heterogeneous polygons, and they cannot sensibly be stored in the same file as the one-record-per-object description.

Reading a shapefile

The minimum command is

shp2dta using name-of-shapefile, database(db-new-name) coordinates(co-new-name) genid(id-var)

This command reads a shapefile called name-of-shapefile and writes its contents to two Stata files (with extension .dta), which I have called db-new-name and co-new-name here. The first contains the description of the objects (mostly variables that were measured at the different locations), whereas the second contains the coordinates. id-var is a name of your choice for an id variable that will be created. Note that both Stata files will be saved to disk in your current working directory.
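
To make the syntax more concrete, here is a minimal sketch with made-up names: it assumes a shapefile mydata (that is, mydata.shp, mydata.dbf and mydata.shx) in the current working directory, and the output names mydb, mycoord and id are arbitrary choices of mine.

```stata
* Check that the three mandatory parts of the (hypothetical) shapefile exist
foreach ext in shp dbf shx {
    confirm file "mydata.`ext'"
}

* Convert the shapefile; replace overwrites mydb.dta and mycoord.dta
* if they already exist
shp2dta using mydata, database(mydb) coordinates(mycoord) genid(id) replace

* Inspect the database file: one observation per object
use mydb, clear
describe

* Inspect the coordinates file: usually many observations per object
use mycoord, clear
describe
```

In the files I have worked with, the coordinates file holds the variables _ID, _X and _Y, with _ID corresponding to the id variable created in the database file.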

These new files can be treated like any other Stata file; you may rename variables, drop variables or cases, transform variables, or merge them with other files. Be careful with such changes, however, unless you are sure about what you can and should do. In particular, the id variable links the objects to their coordinates and should not be altered. The file with the coordinates is normally best left untouched.
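
As an illustration of such a merge, the following sketch adds external variables to the database file. All names are hypothetical: mydb.dta is assumed to be the database file written by shp2dta, regiondata.dta a file with additional regional variables, and region_code a key variable that uniquely identifies the objects in both files.

```stata
* Load the database file created by shp2dta
use mydb, clear

* Add variables from an external file via a common key
merge 1:1 region_code using regiondata

* Check how many objects were matched before going on
tabulate _merge

* Save under a new name so that the original database file stays intact
save mydb_merged, replace
```

Saving the result under a new name keeps the file written by shp2dta available in case something goes wrong with the merge.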