Cross-join
There are cases in which we want to create a data table containing every possible combination of values drawn from several sets (otherwise called the Cartesian product).
Example
We want to list shirts of different colors, sizes and fabrics. Each shirt can be:
- one of 5 colors: black, white, red, green, blue
- one of 4 sizes: small, medium, large, extra large
- one of 3 fabrics: cotton, linen, wool
To represent shirts with all combinations of these properties, we need to cross-join them.
The CJ function of the data.table package cross-joins given vectors into a data table. In this particular case, we can use the following three vectors:
color <- c("black","white","red","green","blue")
size <- c("small","medium","large","extra large")
fabric <- c("cotton","linen","wool")
The resulting table should have the following structure:
- Number of columns: number of vectors (in this case 3)
- Number of rows: product of the vector lengths (in this case 5 · 4 · 3 = 60)
# Load data.table library
library(data.table)
# Create all possible combinations of color, size, and fabric
CJ(color,size,fabric)
## color size fabric
## 1: black extra large cotton
## 2: black extra large linen
## 3: black extra large wool
## 4: black large cotton
## 5: black large linen
## 6: black large wool
## 7: black medium cotton
## 8: black medium linen
## 9: black medium wool
## 10: black small cotton
## 11: black small linen
## 12: black small wool
## 13: blue extra large cotton
## 14: blue extra large linen
## 15: blue extra large wool
## 16: blue large cotton
## 17: blue large linen
## 18: blue large wool
## 19: blue medium cotton
## 20: blue medium linen
## 21: blue medium wool
## 22: blue small cotton
## 23: blue small linen
## 24: blue small wool
## 25: green extra large cotton
## 26: green extra large linen
## 27: green extra large wool
## 28: green large cotton
## 29: green large linen
## 30: green large wool
## 31: green medium cotton
## 32: green medium linen
## 33: green medium wool
## 34: green small cotton
## 35: green small linen
## 36: green small wool
## 37: red extra large cotton
## 38: red extra large linen
## 39: red extra large wool
## 40: red large cotton
## 41: red large linen
## 42: red large wool
## 43: red medium cotton
## 44: red medium linen
## 45: red medium wool
## 46: red small cotton
## 47: red small linen
## 48: red small wool
## 49: white extra large cotton
## 50: white extra large linen
## 51: white extra large wool
## 52: white large cotton
## 53: white large linen
## 54: white large wool
## 55: white medium cotton
## 56: white medium linen
## 57: white medium wool
## 58: white small cotton
## 59: white small linen
## 60: white small wool
## color size fabric
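As a quick check of the structure described above, the following sketch counts the rows and columns of the result. It also illustrates why the rows above appear in alphabetical order rather than in the order the vectors were written: CJ sorts each input by default (its `sorted` argument defaults to TRUE), and passing `sorted = FALSE` keeps the input order. The names `shirts` and `shirts_unsorted` are introduced here for illustration only.

```r
library(data.table)

color  <- c("black", "white", "red", "green", "blue")
size   <- c("small", "medium", "large", "extra large")
fabric <- c("cotton", "linen", "wool")

# Name the arguments explicitly so the column names do not depend on
# the data.table version in use
shirts <- CJ(color = color, size = size, fabric = fabric)

# One column per vector, one row per combination
ncol(shirts)   # 3
nrow(shirts)   # 60
nrow(shirts) == length(color) * length(size) * length(fabric)

# sorted = FALSE keeps the values in the order they were supplied,
# so the first rows start with "black", "small"
shirts_unsorted <- CJ(color = color, size = size, fabric = fabric,
                      sorted = FALSE)
head(shirts_unsorted, 3)
```

In both cases the last vector (fabric) varies fastest and the first (color) slowest.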
There is no limit to the number and size of vectors we can use, except the computer’s memory.
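Because the row count is the product of the vector lengths, it can be worth estimating the size of the result before building it. The following is a minimal sketch using base R only; the names `vectors` and `n_rows` are illustrative and not part of data.table.

```r
# Hypothetical list holding the vectors we intend to cross-join
vectors <- list(color  = c("black", "white", "red", "green", "blue"),
                size   = c("small", "medium", "large", "extra large"),
                fabric = c("cotton", "linen", "wool"))

# Number of rows the cross-join would produce
n_rows <- prod(lengths(vectors))
n_rows   # 60

# Each additional vector multiplies the row count by its length,
# e.g. a fourth property with 10 values would give 600 rows
n_rows * 10
```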
Sometimes the number of vectors can vary, or it’s just too cumbersome to write them out one by one. If they are contained in a list, we can pass that list to CJ using do.call:
# Create a list containing the vectors
properties <- list(color=c("black","white","red","green","blue"),
size=c("small","medium","large","extra large"),
fabric=c("cotton","linen","wool"))
# Give the list as arguments to CJ, using do.call
do.call(what=CJ,args=properties)
## color size fabric
## 1: black extra large cotton
## 2: black extra large linen
## 3: black extra large wool
## 4: black large cotton
## 5: black large linen
## 6: black large wool
## 7: black medium cotton
## 8: black medium linen
## 9: black medium wool
## 10: black small cotton
## 11: black small linen
## 12: black small wool
## 13: blue extra large cotton
## 14: blue extra large linen
## 15: blue extra large wool
## 16: blue large cotton
## 17: blue large linen
## 18: blue large wool
## 19: blue medium cotton
## 20: blue medium linen
## 21: blue medium wool
## 22: blue small cotton
## 23: blue small linen
## 24: blue small wool
## 25: green extra large cotton
## 26: green extra large linen
## 27: green extra large wool
## 28: green large cotton
## 29: green large linen
## 30: green large wool
## 31: green medium cotton
## 32: green medium linen
## 33: green medium wool
## 34: green small cotton
## 35: green small linen
## 36: green small wool
## 37: red extra large cotton
## 38: red extra large linen
## 39: red extra large wool
## 40: red large cotton
## 41: red large linen
## 42: red large wool
## 43: red medium cotton
## 44: red medium linen
## 45: red medium wool
## 46: red small cotton
## 47: red small linen
## 48: red small wool
## 49: white extra large cotton
## 50: white extra large linen
## 51: white extra large wool
## 52: white large cotton
## 53: white large linen
## 54: white large wool
## 55: white medium cotton
## 56: white medium linen
## 57: white medium wool
## 58: white small cotton
## 59: white small linen
## 60: white small wool
## color size fabric
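Since do.call turns the list elements into arguments, other CJ arguments can be appended to the same list. The sketch below is one possible way to do this, not the only one: it appends `sorted = FALSE` so that the values keep the order in which they were supplied. The names `cj_args` and `combos` are illustrative.

```r
library(data.table)

properties <- list(color  = c("black", "white", "red", "green", "blue"),
                   size   = c("small", "medium", "large", "extra large"),
                   fabric = c("cotton", "linen", "wool"))

# Append extra named arguments to the list before handing it to CJ
cj_args <- c(properties, list(sorted = FALSE))
combos  <- do.call(what = CJ, args = cj_args)

# Column names come from the names of the list elements
names(combos)
nrow(combos)   # 60
head(combos, 3)
```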
See the documentation file on data.table’s CJ for more details.