Wide/long table format
Featured in 1 main post
We can arrange a data table in different ways to represent the same information, depending on what is convenient for us. Wide and long table formats are both useful, depending on our needs, and in this page we’ll see how to easily convert a table between them.
Example data
Let’s divide a group of co-workers from a small regional branch of a paper company in two teams for a friendly match of basketball. To encourage cross-departmental collaboration, each team will have one person from each type of position. We’ll first create a table, where each row corresponds to a team.
# Load data.table library
library(data.table)
# Create a data.table
example.wide <- data.table(team=c("Scranton.1","Scranton.2"),
sales=c("Jim","Stanley"),
accounting=c("Angela","Oscar"),
HR=c("Toby","Holly"),
manager=c("Michael","Deangelo"),
warehouse=c("Darryl","Roy"))
example.wide
## team sales accounting HR manager warehouse
## 1: Scranton.1 Jim Angela Toby Michael Darryl
## 2: Scranton.2 Stanley Oscar Holly Deangelo Roy
This is considered a wide table format, as it expands in width if more players are added. This format is fine to display the teams’ lineups, but very limiting for common admin tasks. For example, it’s impossible to add more information on each player, such as height, or an ID number. A more practical way to organize the data is in the long format, so that each row corresponds to a player.
From wide to long format
The melt function converts a table from the wide to the long format:
example.long <- example.wide |>
melt(id.vars="team",
variable.name="department",
value.name="name")
example.long
## team department name
## 1: Scranton.1 sales Jim
## 2: Scranton.2 sales Stanley
## 3: Scranton.1 accounting Angela
## 4: Scranton.2 accounting Oscar
## 5: Scranton.1 HR Toby
## 6: Scranton.2 HR Holly
## 7: Scranton.1 manager Michael
## 8: Scranton.2 manager Deangelo
## 9: Scranton.1 warehouse Darryl
## 10: Scranton.2 warehouse Roy
Arguments of melt function:
-
id.vars: The name of the column(s) that work as row identifiers. They must not overlap withmeasure.vars(see below). Defaults to all columns exceptmeasure.varsif unspecified. Example:id.vars="id.column"orid.vars=c("id.column.1","id.column.2"). -
measure.vars(not used here): The names of the columns with the values. They must not overlap withid.vars. Defaults to all columns exceptid.varsif unspecified, as in this case. Example:measure.vars="measure.column"ormeasure.vars=c("measure.column.1","measure.column.2"). -
variable.name: Name that will be given to the column with the names ofmeasure.varscolumns of the wide table. Defaults to “variable” if unspecified. -
value.name: Name that will be given to the column with the values ofmeasure.varscolumns of the wide table. Defaults to “value” if unspecified.
Now that each player corresponds to a row, we can add more information on each one. Let’s create two new columns to give an ID number to each player and fill in their heights.
example.long[,`:=`(ID=.I,
height.cm=c(187,180,160,173,176,167,182,190,180,189))]
example.long
## team department name ID height.cm
## 1: Scranton.1 sales Jim 1 187
## 2: Scranton.2 sales Stanley 2 180
## 3: Scranton.1 accounting Angela 3 160
## 4: Scranton.2 accounting Oscar 4 173
## 5: Scranton.1 HR Toby 5 176
## 6: Scranton.2 HR Holly 6 167
## 7: Scranton.1 manager Michael 7 182
## 8: Scranton.2 manager Deangelo 8 190
## 9: Scranton.1 warehouse Darryl 9 180
## 10: Scranton.2 warehouse Roy 10 189
From long to wide format
We can go back to the wide format with the dcast function. Let’s use it to arrange the table somewhat differently.
example.wide.2 <- example.long |>
dcast(formula=department~team,
value.var="name")
example.wide.2
## department Scranton.1 Scranton.2
## 1: sales Jim Stanley
## 2: accounting Angela Oscar
## 3: HR Toby Holly
## 4: manager Michael Deangelo
## 5: warehouse Darryl Roy
Arguments of dcast function:
formula: The arrangement that we want in the resulting table, based on the column names of the long table, in the form rows~columns.
value.var: The name of the long table column from which the wide one will take its values.
Each row of our new table corresponds to a department, not a team. The new information we entered (ID number and height) is not there, as there is no way to show more than one attribute for each player (here we show their names). That’s why wide tables are limiting.
If we choose to show just player height in a wide format, this is how we would do it:
example.wide.height <- example.long |>
dcast(formula=department~team,
value.var="height.cm")
example.wide.height
## department Scranton.1 Scranton.2
## 1: sales 187 180
## 2: accounting 160 173
## 3: HR 176 167
## 4: manager 182 190
## 5: warehouse 180 189
We can also revert the long table back to our original wide table:
example.wide.original <- example.long |>
dcast(formula=team~department,
value.var="name")
example.wide.original
## team sales accounting HR manager warehouse
## 1: Scranton.1 Jim Angela Toby Michael Darryl
## 2: Scranton.2 Stanley Oscar Holly Deangelo Roy