This page briefly explains the principles of column operations in tables, using the data.table syntax. For a general primer on data tables, check the relevant page.

The information of a data table is accessed with square brackets next to its name. Any operation involving columns is determined by the second argument within the square brackets.

For the examples below, we want to include all rows. Because of this, the first argument (meant for row filtering) will be left blank (e.g. table.name[,...]). However, rows can be easily selected if needed, by applying a filter in the first argument.

In most cases, the second argument can be one of two things:

  • A list of columns to display - existing or new. Any data.table is essentially a list of columns, even tables with a single column. In the data.table syntax, list() is synonymous to the more compact .(), so that’s what we’ll use.

  • An expression (calculation or function), determining what to do - including the creation of new columns.

Example data

Let’s use one of R’s built-in data sets, called iris. In order to use data.table’s syntax, we will first convert it from data.frame to data.table.

library(data.table)
iris.dt <- as.data.table(iris)
iris.dt
##      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
##   1:          5.1         3.5          1.4         0.2    setosa
##   2:          4.9         3.0          1.4         0.2    setosa
##   3:          4.7         3.2          1.3         0.2    setosa
##   4:          4.6         3.1          1.5         0.2    setosa
##   5:          5.0         3.6          1.4         0.2    setosa
##  ---                                                            
## 146:          6.7         3.0          5.2         2.3 virginica
## 147:          6.3         2.5          5.0         1.9 virginica
## 148:          6.5         3.0          5.2         2.0 virginica
## 149:          6.2         3.4          5.4         2.3 virginica
## 150:          5.9         3.0          5.1         1.8 virginica

Selecting columns

Selecting columns to display

The iris.dt table has 5 columns in total, but let’s assume that we only want to display 3: petal length, petal width and species. We can choose which columns to show as follows:

iris.dt[,.(Petal.Length,Petal.Width,Species)]
##      Petal.Length Petal.Width   Species
##   1:          1.4         0.2    setosa
##   2:          1.4         0.2    setosa
##   3:          1.3         0.2    setosa
##   4:          1.5         0.2    setosa
##   5:          1.4         0.2    setosa
##  ---                                   
## 146:          5.2         2.3 virginica
## 147:          5.0         1.9 virginica
## 148:          5.2         2.0 virginica
## 149:          5.4         2.3 virginica
## 150:          5.1         1.8 virginica

Omitting columns from display

To reach the same displayed columns, we can also choose to omit Sepal.Length and Sepal.Width. The easiest way to do this is the following:

iris.dt[,!c("Sepal.Length","Sepal.Width")]
##      Petal.Length Petal.Width   Species
##   1:          1.4         0.2    setosa
##   2:          1.4         0.2    setosa
##   3:          1.3         0.2    setosa
##   4:          1.5         0.2    setosa
##   5:          1.4         0.2    setosa
##  ---                                   
## 146:          5.2         2.3 virginica
## 147:          5.0         1.9 virginica
## 148:          5.2         2.0 virginica
## 149:          5.4         2.3 virginica
## 150:          5.1         1.8 virginica

This is a slightly different syntax from the list symbol used above (.()). Here, we combine the names of the columns we don’t want in a vector and exclude them with the ! symbol - the NOT operator in R (see logical operations).

Extracting a single column as vector

We can also call the name of a column without enclosing it in .() or list(). This will return the column as a vector:

iris.dt[,Petal.Length]
##   [1] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 1.5 1.6 1.4 1.1 1.2 1.5 1.3 1.4
##  [19] 1.7 1.5 1.7 1.5 1.0 1.7 1.9 1.6 1.6 1.5 1.4 1.6 1.6 1.5 1.5 1.4 1.5 1.2
##  [37] 1.3 1.4 1.3 1.5 1.3 1.3 1.3 1.6 1.9 1.4 1.6 1.4 1.5 1.4 4.7 4.5 4.9 4.0
##  [55] 4.6 4.5 4.7 3.3 4.6 3.9 3.5 4.2 4.0 4.7 3.6 4.4 4.5 4.1 4.5 3.9 4.8 4.0
##  [73] 4.9 4.7 4.3 4.4 4.8 5.0 4.5 3.5 3.8 3.7 3.9 5.1 4.5 4.5 4.7 4.4 4.1 4.0
##  [91] 4.4 4.6 4.0 3.3 4.2 4.2 4.2 4.3 3.0 4.1 6.0 5.1 5.9 5.6 5.8 6.6 4.5 6.3
## [109] 5.8 6.1 5.1 5.3 5.5 5.0 5.1 5.3 5.5 6.7 6.9 5.0 5.7 4.9 6.7 4.9 5.7 6.0
## [127] 4.8 4.9 5.6 5.8 6.1 6.4 5.6 5.1 5.6 6.1 5.6 5.5 4.8 5.4 5.6 5.1 5.1 5.9
## [145] 5.7 5.2 5.0 5.2 5.4 5.1

Performing calculations

Let’s assume we want to calculate the petal area for each observation: petal area = petal length · petal width

There are two different ways to perform calculations, depending on our needs: simply display the calculation, or create a new column with the calculation and store it in memory.

Calculating to display

It’s important to consider what exactly we want to display. We could just display the calculated petal area, but it would be meaningless without the species name - it would be just a set of numbers.

Instead, we can calculate and show the petal area and the corresponding species, enclosed within .():

iris.dt[,.(Petal.Area = Petal.Length*Petal.Width,
           Species)]
##      Petal.Area   Species
##   1:       0.28    setosa
##   2:       0.28    setosa
##   3:       0.26    setosa
##   4:       0.30    setosa
##   5:       0.28    setosa
##  ---                     
## 146:      11.96 virginica
## 147:       9.50 virginica
## 148:      10.40 virginica
## 149:      12.42 virginica
## 150:       9.18 virginica

Creating new columns

We might want to store our calculation in memory as a new column of the table. To create new columns, we use the := assignment operator, which is specifically used for the data.table package. Let’s create a new column named Petal.Area:

iris.dt[,Petal.Area := Petal.Length*Petal.Width]

This creates a new column without displaying the table. To see the table, we simply need to call its name:

iris.dt
##      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species Petal.Area
##   1:          5.1         3.5          1.4         0.2    setosa       0.28
##   2:          4.9         3.0          1.4         0.2    setosa       0.28
##   3:          4.7         3.2          1.3         0.2    setosa       0.26
##   4:          4.6         3.1          1.5         0.2    setosa       0.30
##   5:          5.0         3.6          1.4         0.2    setosa       0.28
##  ---                                                                       
## 146:          6.7         3.0          5.2         2.3 virginica      11.96
## 147:          6.3         2.5          5.0         1.9 virginica       9.50
## 148:          6.5         3.0          5.2         2.0 virginica      10.40
## 149:          6.2         3.4          5.4         2.3 virginica      12.42
## 150:          5.9         3.0          5.1         1.8 virginica       9.18

To create multiple columns at once, the easiest way is the functional form of :=. We will use it to create 2 additional columns:

  • Sepal.Area: similarly calculated sepal area.

  • index: a simple index of each observation (i.e. each row).

iris.dt[
  ,`:=`(Sepal.Area = Sepal.Length*Sepal.Width,
        index = .I)
  ]
iris.dt
##      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species Petal.Area
##   1:          5.1         3.5          1.4         0.2    setosa       0.28
##   2:          4.9         3.0          1.4         0.2    setosa       0.28
##   3:          4.7         3.2          1.3         0.2    setosa       0.26
##   4:          4.6         3.1          1.5         0.2    setosa       0.30
##   5:          5.0         3.6          1.4         0.2    setosa       0.28
##  ---                                                                       
## 146:          6.7         3.0          5.2         2.3 virginica      11.96
## 147:          6.3         2.5          5.0         1.9 virginica       9.50
## 148:          6.5         3.0          5.2         2.0 virginica      10.40
## 149:          6.2         3.4          5.4         2.3 virginica      12.42
## 150:          5.9         3.0          5.1         1.8 virginica       9.18
##      Sepal.Area index
##   1:      17.85     1
##   2:      14.70     2
##   3:      15.04     3
##   4:      14.26     4
##   5:      18.00     5
##  ---                 
## 146:      20.10   146
## 147:      15.75   147
## 148:      19.50   148
## 149:      21.08   149
## 150:      17.70   150

Note: .I is a special symbol in data.table, returning an index number for each row.

In this instance, := is used as a function. Its arguments are the new columns we create, separated by comma.

Deleting columns

We can also use := to delete columns. We will most likely not need the index column we just created, so let’s delete it:

iris.dt[,index := NULL]