Lists and Data Frames

Lists

Lists differ from vectors and matrices in two ways:

They may mix elements of different modes.
They may be considered as consisting of rows and columns, but not all rows or all columns need to have the same number of elements. In fact, it's better to abandon the notion of rows and columns here and talk simply about components.

So, a list resembles a storeroom where you may find lots of different items, the most amazing thing being that all these heterogeneous items are kept within the same boundaries. Think of several racks in your storeroom, some large, some small, with different kinds of objects stored on different shelves. A list can contain objects of all sorts, including lists! Note that in the case of a list, the things it consists of are called components.
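To make the storeroom picture concrete, here is a minimal sketch of a list that mixes several kinds of objects. The object name storeroom and all component names are made up for illustration.

```r
# A hypothetical list mixing objects of different modes and shapes
storeroom <- list(
  label   = "rack A",                 # a character string
  counts  = c(3, 7, 12),              # a numeric vector
  checked = TRUE,                     # a logical value
  grid    = matrix(1:6, nrow = 2),    # a matrix
  inner   = list(a = 1, b = "two")    # a list inside the list
)

length(storeroom)   # number of components
names(storeroom)    # names of the components
str(storeroom)      # compact overview of the whole structure
```

Note that the components differ in length and type, which a vector or a matrix would not allow.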

Most users aiming at doing statistical analyses on a single data set will not encounter such lists too frequently, and so for the time being I will not say much more about them. Yet if you do encounter a list, you should know that it is a legitimate R object.

Plus, there is a special type of list which is very important for such users, and this is the data frame to be introduced in the next section, with more information in the ensuing entries. A data frame is a list, but it is so special that it has been given a class of its own, very appropriately named data.frame. Luckily, data frames are very regular lists.
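That a data frame really is a list with a class of its own can be checked directly. The following is a small sketch with a made-up data frame df; the column names are arbitrary.

```r
# A hypothetical data frame with two columns of different modes
df <- data.frame(id = 1:3, name = c("a", "b", "c"))

is.list(df)          # TRUE: a data frame is a list ...
class(df)            # "data.frame": ... with a class of its own
sapply(df, length)   # every component (column) has the same length
```

The last line illustrates what 'very regular' means here: all components have the same number of elements.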

Extracting elements ('components') from a list

Just in case you have to deal with a list proper, here's some basic information about how to come to terms with it. The first thing, of course, will be to investigate its structure with, say, str(this.stupid.list). Now, a list as it stands is often not very useful, so you may wish to extract components from it. As I understand it, even this is not always an easy topic. However, you can go a long way with the two commands that follow.

xl.1 <- this.stupid.list[1]

This will extract the first component as an object of class list. If this component is, e.g., a matrix, you will get a list with a single component, which is this matrix.

xl.1 <- this.stupid.list[[1]]

This will extract the content of the first component. If this component is, e.g., a matrix, you will get a matrix.
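The difference between the two commands is easiest to see side by side. Here is a minimal sketch, assuming a small made-up list; the result names xl.a and xl.b and the component names m and txt are chosen only for this example.

```r
# A hypothetical list whose first component is a matrix
this.stupid.list <- list(m = matrix(1:4, nrow = 2), txt = "hello")

xl.a <- this.stupid.list[1]     # single brackets: a list with one component
xl.b <- this.stupid.list[[1]]   # double brackets: the component itself

is.list(xl.a)      # TRUE
is.matrix(xl.a)    # FALSE
is.matrix(xl.b)    # TRUE

# Named components can also be extracted by name
this.stupid.list[["txt"]]
this.stupid.list$txt
```

In most cases the double brackets (or the $ form) are what you want, because the result can be used directly in further computations.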

Data frames

First, for clarity's sake let me repeat once more: For R, "they [to wit, data] come in all shapes and sizes". In other words, a piece of data may be available as a vector, as a matrix, as a list, or as an object of a special class such as SpatialPolygonsDataFrame. All of this can be analyzed statistically, as long as the procedure you wish to apply is appropriate (of course, you cannot apply multiple regression to a single vector). But in many cases, data will be available as a data.frame. The main reason is that vectors and matrices cannot contain elements of different modes (such as numbers and characters, or numbers and logical values), whereas lists and data.frames can. Data.frames, in addition, appear more 'tidy' than (some) lists. (For instance, a list can contain a data frame as one of its components, whereas the columns of a data frame are usually plain vectors.)

There are basically three ways to obtain (or create) a data.frame:

The data obtained from a website, or forwarded by a colleague, may already be formatted as a data.frame.
Some procedures for reading or importing data from other formats automatically create data.frames.
You may transform a vector, a matrix or a list into a data.frame if it meets the requirements (as far as I can see, all rectangular objects do).
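The third way can be sketched as follows. The objects v, m and l are made up for illustration; the list works here because its components all have the same length.

```r
# Hypothetical starting objects
v <- c(2.5, 3.1, 4.8)                       # a numeric vector
m <- matrix(1:6, nrow = 3)                  # a 3 x 2 matrix
l <- list(x = 1:3, y = c("a", "b", "c"))    # a list with equal-length components

df.v <- data.frame(value = v)   # vector becomes a one-column data frame
df.m <- as.data.frame(m)        # matrix columns become variables
df.l <- as.data.frame(l)        # list components become variables

dim(df.v)   # 3 rows, 1 column
dim(df.m)   # 3 rows, 2 columns
dim(df.l)   # 3 rows, 2 columns
```

A list whose components have different lengths that cannot be recycled to a common length is not 'rectangular' in this sense, and the conversion will fail with an error.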

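A data frame will always have column (i.e., variable) names. If no names are available, R will create them automatically, for example "V1", "V2", "V3" and so on when a matrix without column names is converted with as.data.frame() or when a file without a header line is read with read.table().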
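The automatic naming can be observed with a matrix that has no column names. This is a small sketch; the matrix m.noname is made up, and the exact default names depend on the function used for the conversion.

```r
# A hypothetical matrix without column names
m.noname <- matrix(1:6, nrow = 3)

names(as.data.frame(m.noname))   # "V1" "V2"
names(data.frame(m.noname))      # "X1" "X2": data.frame() uses a different prefix
```

Either way, you may assign more telling names afterwards, e.g. with names(df) <- c("height", "weight") for a data frame df of your own.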

For more information on data.frames, look at the entries in the next section, particularly about saving and reading data.