We can arrange a data table in different ways to represent the same information, depending on what is convenient for us. Wide and long table formats are both useful, depending on our needs, and in this page we’ll see how to easily convert a table between them.

Example data

Let’s divide a group of co-workers from a small regional branch of a paper company in two teams for a friendly match of basketball. To encourage cross-departmental collaboration, each team will have one person from each type of position. We’ll first create a table, where each row corresponds to a team.

# Load data.table library
library(data.table)

# Create a data.table
example.wide <- data.table(team=c("Scranton.1","Scranton.2"),
                           sales=c("Jim","Stanley"),
                           accounting=c("Angela","Oscar"),
                           HR=c("Toby","Holly"),
                           manager=c("Michael","Deangelo"),
                           warehouse=c("Darryl","Roy"))

example.wide
##          team   sales accounting    HR  manager warehouse
## 1: Scranton.1     Jim     Angela  Toby  Michael    Darryl
## 2: Scranton.2 Stanley      Oscar Holly Deangelo       Roy

This is considered a wide table format, as it expands in width if more players are added. This format is fine to display the teams’ lineups, but very limiting for common admin tasks. For example, it’s impossible to add more information on each player, such as height, or an ID number. A more practical way to organize the data is in the long format, so that each row corresponds to a player.

From wide to long format

The melt function converts a table from the wide to the long format:

example.long <- example.wide |>
  melt(id.vars="team",
       variable.name="department",
       value.name="name")

example.long
##           team department     name
##  1: Scranton.1      sales      Jim
##  2: Scranton.2      sales  Stanley
##  3: Scranton.1 accounting   Angela
##  4: Scranton.2 accounting    Oscar
##  5: Scranton.1         HR     Toby
##  6: Scranton.2         HR    Holly
##  7: Scranton.1    manager  Michael
##  8: Scranton.2    manager Deangelo
##  9: Scranton.1  warehouse   Darryl
## 10: Scranton.2  warehouse      Roy

Arguments of melt function:

  • id.vars: The name of the column(s) that work as row identifiers. They must not overlap with measure.vars (see below). Defaults to all columns except measure.vars if unspecified. Example: id.vars="id.column" or id.vars=c("id.column.1","id.column.2").

  • measure.vars (not used here): The names of the columns with the values. They must not overlap with id.vars. Defaults to all columns except id.vars if unspecified, as in this case. Example: measure.vars="measure.column" or measure.vars=c("measure.column.1","measure.column.2").

  • variable.name: Name that will be given to the column with the names of measure.vars columns of the wide table. Defaults to “variable” if unspecified.

  • value.name: Name that will be given to the column with the values of measure.vars columns of the wide table. Defaults to “value” if unspecified.

Now that each player corresponds to a row, we can add more information on each one. Let’s create two new columns to give an ID number to each player and fill in their heights.

example.long[,`:=`(ID=.I,
                   height.cm=c(187,180,160,173,176,167,182,190,180,189))]

example.long
##           team department     name ID height.cm
##  1: Scranton.1      sales      Jim  1       187
##  2: Scranton.2      sales  Stanley  2       180
##  3: Scranton.1 accounting   Angela  3       160
##  4: Scranton.2 accounting    Oscar  4       173
##  5: Scranton.1         HR     Toby  5       176
##  6: Scranton.2         HR    Holly  6       167
##  7: Scranton.1    manager  Michael  7       182
##  8: Scranton.2    manager Deangelo  8       190
##  9: Scranton.1  warehouse   Darryl  9       180
## 10: Scranton.2  warehouse      Roy 10       189

From long to wide format

We can go back to the wide format with the dcast function. Let’s use it to arrange the table somewhat differently.

example.wide.2 <- example.long |>
  dcast(formula=department~team,
        value.var="name")

example.wide.2
##    department Scranton.1 Scranton.2
## 1:      sales        Jim    Stanley
## 2: accounting     Angela      Oscar
## 3:         HR       Toby      Holly
## 4:    manager    Michael   Deangelo
## 5:  warehouse     Darryl        Roy

Arguments of dcast function:

formula: The arrangement that we want in the resulting table, based on the column names of the long table, in the form rows~columns.

value.var: The name of the long table column from which the wide one will take its values.

Each row of our new table corresponds to a department, not a team. The new information we entered (ID number and height) is not there, as there is no way to show more than one attribute for each player (here we show their names). That’s why wide tables are limiting.

If we choose to show just player height in a wide format, this is how we would do it:

example.wide.height <- example.long |>
  dcast(formula=department~team,
        value.var="height.cm")

example.wide.height
##    department Scranton.1 Scranton.2
## 1:      sales        187        180
## 2: accounting        160        173
## 3:         HR        176        167
## 4:    manager        182        190
## 5:  warehouse        180        189

We can also revert the long table back to our original wide table:

example.wide.original <- example.long |>
  dcast(formula=team~department,
        value.var="name")

example.wide.original
##          team   sales accounting    HR  manager warehouse
## 1: Scranton.1     Jim     Angela  Toby  Michael    Darryl
## 2: Scranton.2 Stanley      Oscar Holly Deangelo       Roy