Looking at Data
This section deals mainly with data organized as a data.frame. Some of it also applies to matrices.
Information about the structure of datasets
Variable names: The columns of a data set represent variables, and they typically have names by which they can be addressed. To find out the names in a given data object, use
names(name-of-data-object)
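This works for data.frames (and lists); applied to a matrix, names() normally returns NULL. A matrix may or may not have row or column names. To query them, use colnames(name-of-matrix) or rownames(name-of-matrix); each returns NULL if no such names have been set. Both functions also work on data.frames.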
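As a minimal sketch of the difference between data.frames and matrices, consider the following. The objects schools.data and m, and their contents, are made up purely for illustration.

```r
# Hypothetical toy data; column names and values are illustrative only
schools.data <- data.frame(school   = c("A", "B", "C"),
                           pupils   = c(120, 85, NA),
                           teachers = c(10, 7, 6))

names(schools.data)      # variable (column) names of the data.frame
colnames(schools.data)   # same result for a data.frame

# A plain matrix starts out without any names
m <- matrix(1:6, nrow = 2)
names(m)                 # NULL
colnames(m)              # NULL until column names are assigned

# Assign names, then query them
colnames(m) <- c("x", "y", "z")
rownames(m) <- c("r1", "r2")
colnames(m)
rownames(m)
```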
Some additional information (the following should be self-explanatory):
ncol(schools.data)
nrow(schools.data)
complete.cases(schools.data)
Note that the last command yields a logical vector of length nrow(schools.data), indicating for each row whether it is complete (TRUE), i.e., contains no missing values (NA), or has at least one missing value (FALSE).
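The following small sketch shows what these commands return and how the result of complete.cases() might be used to count or drop incomplete rows. The contents of schools.data are invented for illustration, as are the helper names ok and schools.complete.

```r
# Hypothetical toy data with some missing values
schools.data <- data.frame(school   = c("A", "B", "C", "D"),
                           pupils   = c(120, NA, 95, 60),
                           teachers = c(10, 7, NA, 5))

ncol(schools.data)   # number of variables: 3
nrow(schools.data)   # number of cases: 4

# TRUE for rows without any NA, FALSE otherwise
ok <- complete.cases(schools.data)
ok                   # TRUE FALSE FALSE TRUE

sum(!ok)             # number of rows with at least one NA: 2

# Keep only the complete rows
schools.complete <- schools.data[ok, ]
nrow(schools.complete)   # 2
```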
The spreadsheet
You can have a look at data that are organized as a data frame or a matrix with a built-in spreadsheet.
fix(mydata)
Note that the data in this spreadsheet may be modified. Furthermore, upon clicking on a column name you can change this name, and you can change the mode of the column from numeric to character or vice versa.
In the case of a data.frame, you will not be asked whether or not you want to save the changes upon quitting the spreadsheet; all changes will persist (at least in the current workspace). So, cautious people might wish to use the following instead:
mydata2 <- edit(mydata)
This will show the contents of mydata in the spreadsheet and assign the (possibly edited) result to a new object, here called mydata2, while mydata itself remains unchanged. You may, of course, simply write edit(mydata), but this will not only open the spreadsheet but also print the entire data to the R console, and any changes you made will not be stored in an object. The form with an assignment is therefore the one to use if you plan to modify the data manually. (Note that after leaving the spreadsheet, any changes you have made will be present in the new object, but they will not have been saved to disk; this is a separate step in the workflow.)
Currently (i.e. in R version 3.3.0) the editor is limited to 65,535 rows.
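A cautious workflow along these lines could be sketched as follows. The toy data are invented, and the interactive() guard is an addition made here so that the snippet also runs in a non-interactive session, where no spreadsheet can be opened.

```r
# Hypothetical toy data
mydata <- data.frame(V1 = 1:3, V2 = c("a", "b", "c"))

# Start from an unmodified copy (used as-is when no editor is available)
mydata2 <- mydata

if (interactive()) {
  # Opens the spreadsheet; the edited result goes into mydata2,
  # mydata itself is left untouched
  mydata2 <- edit(mydata)
}

# Check afterwards whether anything was changed
identical(mydata, mydata2)
```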
Other ways to look at data
Let's assume that you have assigned a data set to a data.frame object called mydata. You can inspect rows, columns or other extracts from that data set as follows (note that there are many similarities, but also some differences, to the case of a matrix):
head(mydata) | lists the first six rows of mydata (the default number) |
tail(mydata) | lists the last six rows of mydata (the default number) |
mydata$V2 | lists the variable (column) named V2 as one or several continuous lines of values |
mydata[c("V2","V4")] | lists the variables (columns) named V2 and V4 in column format |
mydata[3:9, c("V2","V4")] | lists rows 3 to 9 of the variables (columns) named V2 and V4 in column format |
mydata[3,] | lists the third row (case) of mydata |
mydata[100:150,] | lists rows 100 to 150 of mydata |
mydata[2] | lists the second column of mydata in column format |
mydata[,2] | lists the second column of mydata as one or several continuous lines of values |
mydata[,2:3] | lists the second and third column of mydata in column format. Same as mydata[2:3] |
mydata[1:5,2:3] | lists the second and third column of the first five rows of mydata in column format |
mydata | lists the complete data |
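The difference between "column format" and "continuous lines" in the table above comes down to whether the result is still a data.frame or a plain vector. The following sketch uses invented toy data to illustrate this; the names col_df and col_vec are hypothetical, and the arguments n and drop are standard arguments of head()/tail() and of the data.frame subsetting operator.

```r
# Hypothetical toy data with columns V1 to V4
mydata <- data.frame(V1 = 1:10,
                     V2 = seq(10, 100, by = 10),
                     V3 = letters[1:10],
                     V4 = (1:10)^2)

head(mydata, n = 3)    # first three rows instead of the default six
tail(mydata, n = 2)    # last two rows

col_df  <- mydata[2]   # still a data.frame with one column
col_vec <- mydata[, 2] # a plain numeric vector

class(col_df)          # "data.frame"
class(col_vec)         # "numeric"

# drop = FALSE keeps the data.frame structure with the comma notation
mydata[, 2, drop = FALSE]

# Rows 3 to 5 of the columns named V2 and V4
mydata[3:5, c("V2", "V4")]
```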
© W. Ludwig-Mayerhofer, R Guide | Last update: 05 Dec 2016