Objects

Preamble

Nearly everything you will work with in R will be an 'object', and during an R session quite a number of objects will often be available. Indeed, this is one of the most significant differences from conventional statistical software. When using the latter, you will open a data set, and henceforth all procedures will be applied to this data set. (With SPSS, several data sets can be opened, but you will still specify one of these as the current working data set.) In R, you can have several objects, and you can do anything you want to any of these objects (provided you apply procedures that fit the object; of course you can't compute the sum of a sequence of characters). The downside of this is that anytime you want to apply a statistical procedure to a (part of a) data set, you have to state explicitly the name of the object containing the data. (There is a workaround that consists in attaching the data set with attach(), but many people advise against it.)

The aim of this section, however, is to introduce the notion of objects in a more general way.

Basic ideas about objects

To store something in an object means to assign it to a name. For instance, you may wish to read a data set with the command read.csv("somedata.csv",header=TRUE), but this will only result in the contents of "somedata.csv" being printed to the console; the data will not be available for analysis. To use the data for analysis, they must be assigned to an object, as in:

dt <- read.csv("somedata.csv",header=TRUE)

The object "dt" has to be referred to during the following steps, such as in

outp1 <- lm(depvar ~ indvar1 + indvar2, data=dt)

Here, a regression model was estimated, based on the data contained in object 'dt'. The results of the model were stored in object 'outp1', and this object again may be accessed in the next steps, and so on. If the output is not assigned to an object, it is merely printed to the console and cannot be reused in later steps! As a quite different example, you may just write gg <- 5, and henceforth an object named "gg" will be around that consists of the number 5 (typing 10/gg will yield 2!).
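The following minimal sketch illustrates the difference between assigning and not assigning. Since "somedata.csv" is only a placeholder, the built-in data set mtcars and its variables mpg, wt and hp are used here as stand-ins; they are assumptions made for illustration only.

```r
# 'mtcars' ships with R and stands in for a file read via read.csv()
dt <- mtcars

# Not assigned: the result is printed to the console and then lost
lm(mpg ~ wt + hp, data = dt)

# Assigned: the fitted model is kept in 'outp1' and can be reused
outp1 <- lm(mpg ~ wt + hp, data = dt)
coef(outp1)        # coefficients extracted from the stored object

# A plain number is an object as well
gg <- 5
10 / gg            # yields 2
```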

The gist of all this is that you may have several data sets open, you may have several results of quite different statistical procedures available, and in addition there may be many other objects around. You may also extract pieces from a data set and store these pieces to some object. You may create a vector of numbers and apply some procedures to this, or you may attach this vector to another vector, or indeed to a data set. R is extremely versatile in this respect. But at the same time, this makes R difficult for beginners.
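A brief sketch of the operations just mentioned, again using the built-in mtcars data as a stand-in; the object names heavy, v1, v2 and the column id are hypothetical.

```r
dt <- mtcars

# Extract a piece of the data set: heavy cars, two variables only
heavy <- dt[dt$wt > 4, c("mpg", "wt")]

# Create a vector and apply a procedure to it
v1 <- c(1, 2, 3)
mean(v1)

# Attach the vector to another vector
v2 <- c(v1, 4, 5)

# Attach a vector to a data set as a new column
dt$id <- seq_len(nrow(dt))
```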

As the objects you encounter may be, and often indeed are, very heterogeneous, it is very helpful (and indeed important) to know that the R people have tried to bring some order to this chaos. This has been accomplished by defining classes of objects, with objects belonging to the same class having some features in common.

Class

Objects may belong to different classes. For instance, some data you have entered may have the class matrix, but others may belong to the class data.frame. But as indicated above, classes extend far beyond what we may loosely call data. For instance, the output of many statistical procedures likewise belongs to a class.

The basic idea of defining classes is that certain functions, or "methods", may be associated with a given class. Also, generic methods like summary or plot do different things, depending on the class of an object. Thus, the summary of an object (see below) may look very different according to the class of the object. You may also try plot() on objects; it will do different things to different objects, depending on what the creator of the class had in mind.
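As a small illustration of a generic function doing different things, here is summary() applied to two made-up objects (the names x_num and x_fac are hypothetical):

```r
x_num <- c(1, 2, 3, 4, 100)            # a numeric vector
x_fac <- factor(c("a", "b", "a"))      # a factor

s_num <- summary(x_num)   # quartiles, mean, minimum and maximum
s_fac <- summary(x_fac)   # counts per category

s_num
s_fac
```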

There are several "systems" of defining classes, called "S3", "S4" and "R5". Classes in "S3" seem to be rather loosely defined; in "S4", they have a formal definition, which clearly states the components of a class. Thus, in "S4" there are no uncertainties about the format of a given class. "R5" classes, officially called reference classes, or refclasses for short, are quite abstract things, and I gather that normal people need not understand what it's all about.
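To give an impression of how loosely "S3" works, here is a minimal sketch. The class name "temperature" and its components are invented for this example; all that is needed is a class attribute and a function whose name follows the pattern generic.class.

```r
# An S3 object is an ordinary object carrying a class attribute
temp <- structure(list(value = 21.5, unit = "C"), class = "temperature")

# A method for the generic summary(), found via its name
summary.temperature <- function(object, ...) {
  paste0(object$value, " degrees ", object$unit)
}

res <- summary(temp)   # dispatches to summary.temperature()
res
```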

Which class?: Simply type class(name-of-object).
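For example (the objects are taken from, or built on, the built-in mtcars data and serve as illustrations only):

```r
class(mtcars)                            # "data.frame"
class(lm(mpg ~ wt, data = mtcars))       # "lm"
class(5)                                 # "numeric"
class("five")                            # "character"
```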

Basic information about an object:

str(name-of-object)

will display the basic structure of the object, which in the case of a formal class (defined according to the "S4" standard) refers to the components, or "slots", of the respective class (some classes, such as vectors, of course have a very simple structure consisting of a single component). Note that the objects that fit a "slot" may, on their part, belong to other classes (e.g., "matrix", "numeric", or whatever). This holds true particularly for lists or other complex objects that can combine very heterogeneous elements.

getClass("name-of-class")

will yield a description of the "slots" that make up the respective class, plus perhaps some additional information.
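The following sketch shows str() and getClass() at work on a formal ("S4") class. The class "Person" and its slots name and age are hypothetical and defined here only for the sake of the example.

```r
library(methods)   # usually loaded already; made explicit here

# Define a formal class with two slots
setClass("Person", slots = c(name = "character", age = "numeric"))

# Create an object of that class
p <- new("Person", name = "Ann", age = 30)

str(p)               # shows the slots and their contents
getClass("Person")   # describes the slots that make up the class
p@age                # a single slot is accessed with @
```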

summary(name-of-object)

will do very different things depending on the class of the object. In the case of a matrix of numbers, it will produce summary statistics for each column (five-number summary plus mean). In the case of a model output, it will yield the results of the model in a formatted way (and often more comprehensively than just typing the name of the object).
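A short sketch of both cases; the matrix m is made up, and the model uses the built-in mtcars data as a stand-in.

```r
# A small numeric matrix with two columns
m <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)
sm <- summary(m)      # summary statistics, column by column
sm

# A fitted regression model
fit <- lm(mpg ~ wt, data = mtcars)
sf <- summary(fit)    # coefficient table, R-squared, and more
sf
sf$r.squared          # parts of the summary can be accessed, too
```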

is.matrix(name-of-object)

yields TRUE or FALSE. Analogous functions exist for R's other standard classes, such as is.vector() for vectors or is.data.frame() for data.frames.
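A few such checks, using a made-up matrix m and the built-in mtcars data:

```r
m <- matrix(1:4, nrow = 2)

is.matrix(m)             # TRUE
is.matrix(mtcars)        # FALSE: mtcars is a data.frame
is.data.frame(mtcars)    # TRUE
is.vector(1:3)           # TRUE
```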

Changing class

Changing the class of an object of course depends on a number of circumstances. Put simply, the object, or parts of it, must fit into the new class. For instance, no problems will occur if you change a matrix into a data.frame, as a data.frame can be described as a matrix that can accommodate different types of variables (see section Lists and Data Frames).

The method to change the class of an object is as.object-type(). Thus, changing a matrix to a data.frame is achieved by:

new-name <- as.data.frame(old-name)

In fact, new-name can be identical with old-name.
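As a concrete sketch, with a made-up matrix m whose columns are named a, b and c:

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(NULL, c("a", "b", "c")))
class(m)

df <- as.data.frame(m)   # store the converted object under a new name
class(df)                # "data.frame"
names(df)                # the column names are carried over

m <- as.data.frame(m)    # or overwrite the old object

# The same pattern works for other conversions
as.numeric("3.5")        # character to number
as.character(1:3)        # numbers to characters
```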

For illustration, here is another, more complicated example. A SpatialPolygonsDataFrame is an object that delineates areas (mathematically described as polygons) in space, but also contains variables that measure properties of these areas (such as the percentage of poor people living in the area, or the average temperature in July, or whatever). Writing

new-data <- as.data.frame(name-of-spatial-polygons-data-frame)

will extract all variables describing properties and put them into the data.frame called new-data. Actually, these things are a bit more complicated (a SpatialPolygonsDataFrame object has several parts, and one of these is a data.frame part), but let this example suffice for the moment.

Type and mode

Less decisive than the class of an object, but still possibly important, are an object's type and its mode. It's not always very easy to distinguish these, and sometimes the class and the type of an object are the same, but a simple example may make clear what it's all about.

Say we have a matrix consisting of a few numbers, arranged in rows and columns (as is a prerequisite for matrices). But a matrix may also consist of characters (like "a", "b", etc.) or some other types of element.

The information on whether the matrix consists of numbers or of characters (or other elements) is provided by its mode. So,

mode(my-matrix)

will yield "numeric" if matrix my-matrix contains numbers, and "character" if it contains characters.

But perhaps you have heard that numbers can be stored in different ways. Numbers like 1, 10, or 51735 can be stored as "integers", whereas numbers like 1.17, 10.73495434 or 51735.6 have decimals and therefore need more space, both in memory and when stored to disk. Be that as it may, numeric information can be stored in different ways, and this is what R calls the "type" of an object.

Now, R typically will allocate much space, or "precision", even to simple numbers like 1 or 10. So, actually both series of numbers will have the same type, which is "double" (for double precision). You can obtain this information with

typeof(my-matrix)

with my-matrix again being the name of the matrix. In the case of a matrix (or other object) consisting of characters, the type will be "character", just like the mode.

Note that there are both other types and other modes, the most important for 'normal' users perhaps being logical.
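The following sketch compares what mode() and typeof() report for a few made-up objects (all names are hypothetical). Note the suffix L, which asks R to store a number as an integer.

```r
m_dbl <- matrix(c(1, 10, 51735, 1.17), nrow = 2)   # ordinary numbers
m_int <- matrix(c(1L, 10L, 51735L, 2L), nrow = 2)  # explicit integers
m_chr <- matrix(c("a", "b", "c", "d"), nrow = 2)   # characters
x_lgl <- c(TRUE, FALSE)                            # logical values

mode(m_dbl);  typeof(m_dbl)    # "numeric",   "double"
mode(m_int);  typeof(m_int)    # "numeric",   "integer"
mode(m_chr);  typeof(m_chr)    # "character", "character"
mode(x_lgl);  typeof(x_lgl)    # "logical",   "logical"

typeof(1)                      # "double": even a plain 1 is stored this way
```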

Keeping track

A list of all objects in the current environment is provided by

ls()

If you think that there are too many objects around, you may remove some of them with

rm(object1, object2, object3)

or clear the entire working space with

rm(list=ls())
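A small sketch of keeping track, with three throwaway objects created just for this purpose. The last command is left as a comment here because it would remove everything in the workspace.

```r
object1 <- 1:3
object2 <- "some text"
object3 <- TRUE

ls()                      # lists the objects currently defined

rm(object1, object2)      # remove two of them
exists("object1")         # FALSE: it is gone
exists("object3")         # TRUE: it is still around

# rm(list = ls())         # would clear the entire workspace
```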