1.Joins on Encoded and Partitioned DataJae-Gil Lee2* Gopi Attaluri3 Ronald Barber1 Naresh Chainani3 Oliver Draese3 Frederick Ho5 Stratos Idreos4* Min-Soo Kim6* Sam Lightstone3 Guy Lohman1 Konstantinos Morfonios8* Keshava Murthy10*Ippokratis Pandis7* Lin Qiao9* Vijayshankar Raman1 Vincent Kulandai Samy3 Richard Sidle1 Knut Stolze3 Liping Zhang31IBM Almaden Research Center 2KAIST, Korea 3IBM Software Group4Harvard University 5IBM Informix 6DGIST, Korea 7Cloudera 8Oracle 9LinkedIn 10MapR* Work was done while the author was with IBM Almaden Research Center
VLDB 2014 Industrial Track
3.Blink Project
Accelerator technology developed by IBM Almaden Research Center since 2007
Main features
Storing a compressed copy of a (portion of a) data warehouse
Exploiting (i) large main memories, (ii) commodity multi-core processors, and (iii) proprietary compression
Improving the performance of typical business intelligence(BI) SQL queries by 10 to 100 times
Not requiring the tuning of indexes, materialized views, etc.
Products offered by IBM based upon Blink
Informix Warehouse Accelerator: released on March 2011
IBM Smart Analytics Optimizer for DB2 for z/OS V1.1
A predecessor to today’s IBM DB2 Analytics Accelerator for DB2 for z/OS
4.Informix Warehouse Accelerator(IWA)
A main-memory accelerator to the disk-based Informix database server product, packaged as the Informix Ultimate Warehouse Edition(IUWE)
System Architecture Data Loading and Query Execution
5.Main Features Related to Joins
Performing joins directly on encoded data
Join method: hash joins
Encoding method: dictionary encoding
Handling join columns encoded differently: encoding translation
Partitioning a column to support incremental updates and achieve better compression: frequency partitioning
Encoding non-join(payload) columns on the fly
6.Hash Joins
Build phase
Scan each dimension table, applying local predicates
Hash to an empty bucket in the hash table
Store the values of join columns as well as “payload” columns
Probe phase
Scan the fact table, applying local predicates
Look up the hash table with the foreign key per dimension
Retrieve the values of payload columns
Example
A simple join query betweenLINEITEM and ORDERS
scan(ORDERS)
σ(O_OrderDate …)
scan(LINEITEM)
σ(L_ShipDate …)
σ(L_OrderKey IN …)
Look up the values of O_OrderDate
Group by, Aggregation
O_OrderKey O_OrderDate
Dimension
Fact
Hash Table
7.Dictionary Encoding
A value of a column is replaced by an encoded value requiring only a few bits
Example
Dictionary
Encoding
10bytes
6bits
9.Updates in Dictionary Encoding
Option 1: leaving room for future values
Downside: overestimation of the number of future values will waste bits; underestimation will require re-encoding all values to add additional ones beyond the capacity
Option 2: partitioning the domain and creating separate dictionaries for each partition our approach
Upside: the impact of adding new values can be isolated from the dictionaries of any existing partitions
New values are simply added to a partition that will be created on the fly, as values arrive
We leave the values in that partition unencoded
10.Frequency Partitioning
Achieving better compression: approximate Huffman
Defining fixed-length codes within a partition
Example
Top 64 traded goods –6 bit code
Rest
origin
product
ChinaUSA
GER,FRA,…
Rest
Column partitions
Cell 4
Cell 1
Cell 2
Cell 3
Cell 5
Cell 6
Sales
China, USA: 1bit
EU: 5bits
Rest: 8bits
1M, 100K, 10K occurrences
of each group
Frequency partitioning=
8bits for all countries=
1.58Mbits
8.88Mbits
11.Catch-All Cell (1/2)
Cell: an intersection of the partitions for each column
The rows having one of the values from each corresponding partition, where each row is formed by concatenating the fixed-length code for each of its columns
Potential problem: proliferation of cells
e.g., 2 partitions for each column (one for encoded, one for unencoded) 2 𝐶 , 𝐶 is the number of columns
Catch-all cell: a special cell for unencoded values
Any rows containing an unencoded value in any column
Benefit: minimizing the number of cells for unencoded values
12.Catch-All Cell (2/2)
Example
Containing the 5th and 6th rows in unencoded form
LINEITEM
Encoding
100
200
100
300
100
400
8/2/2010
9/4/2010
9/4/2010
8/2/2010
5/1/2010
8/2/2010
Cell 0: K0 X D0
Cell 1: K1 X D0
Catch-All Cell
0
0
0
1
0
1
1
0
100
400
5/1/2010
8/2/2010
Dictionary of LINEITEM
L_OrderKey
Partition K0: 100
Partition K1: 200 300
L_ShipDate
Partition D0: 8/2/2010 9/4/2010
L_OrderKey L_ShipDate
L_OrderKey L_ShipDate
unencodable
same value
14.Joins on Encoded Values (1/2)
Option 1: per-domain encoding
Encoding join columns identically on disk
𝑉 1 = 𝑉 2 ⟺𝑀 𝑉 1 =𝑀( 𝑉 2 ), 𝑀 is an encoding scheme
Not clear which column’s distribution should be picked up
Option 2: translation to common code
Translating both join columns to a new common encoding at runtime
Incurring the CPU cost of decoding and re-encoding both columns
⊳⊲
⊳⊲
⊳⊲
Encoded using the same scheme
15.Joins on Encoded Values (2/2)
Option 3: per-column encoding our approach
Encoding join columns independently on disk
Translating only one join column to the encoding of the other at runtime
Encoding translation: 𝑀 𝐹𝐾 ( 𝑀 𝑃𝐾 −1 𝐸 𝑃𝐾 )
Typically, translating from the encoding of the build side to the encoding of the probe side
⊳⊲
⊳⊲
Encoding Translation
build probe build probe
16.Advantages of Per-Column Encoding
Better compression
The ideal encoding for one column may not be ideal for the other (see next page)
Flexible reorganization
Any tables sharing a common dictionary are inextricably linked
Ad hoc querying
Which columns might be joined in a query may not be known when the data is encoded
17.Better Compression of Skewed Data
33~50% gain
21% gain
per-column per-domain
18.Encoding Translation
Challenge
Dealing with the multiple representations of the same value caused by the catch-all cell
At least, one encoded and one unencoded
Two variants
DTRANS(Dimension TRANSlation)
Resolving the multiple representations in the dimension-table scan
Reducing the overhead of the probe phase
FTRANS(Fact TRANSlation)
Resolving the multiple representations during the fact-table scan
Reducing the overhead of the build phase
19.Encoding Translation: DTRANS
Partition 0
Partition 1
Catch-All Cell
0
0
0
1
100
400
HT[0] HT[1] HT[2]
0
0
1
100
200
300
400
Hash Tables
Direct Probes
Data
ORDERS
O_OrderKey O_OrderStatus
"S"
"S"
"S"
"S"
"R"
100
200
300
400
500
0
0
1
100
200
300
400
Hash Tables
HT[0] HT[1] HT[2]
Build Phase:
Probe Phase:
Having all qualifying key values in unencoded form
1 hash table per fact-table partition
Encodable
Unencodable
22.On-the-Fly(OTF) Encoding (1/2)
Reasons for encoding payload columns
The join key is usually just an integer, whereas the payloads are often wider strings higher impact of compression
Benefits of the on-the-fly(OTF) encoding
Updates: a mixture of encoded and unencoded payloads are hard to maintain using hash tables
Expressions: the results of an expression, e.g., MONTH(ShipDate), can be encoded very compactly
Correlation: correlated columns in a query, e.g., City, State, ZIPCode, and Country, can be used to create a tighter code
Predicates: local/join predicates will likely reduce the cardinality of each column, allowing a more compact representation
23.On-the-Fly(OTF) Encoding (2/2)
Mechanism
Use a mapping table that consists of a list of hash tables
Return an index into the bucket where the value was inserted an OTF code
The OTF code is not changed, even if the hash table is resized
Example
600+1024+2048+40=3712
Size:
1024
Size:
2048
Size:
4096
Hash Tables
40
value
Original Dictionary
Size:
600
25.Experimental Setting
Five alternative configurations
Data set and queries: a simplified TPC-H data set and queries
Measure: time for (i) build phase, (ii) probe phase, and (iii) scan
𝑡 𝑏𝑢𝑖𝑙𝑑
𝑡 𝑝𝑟𝑜𝑏𝑒
𝑡 𝑏𝑎𝑠𝑒
26.Per-Domain vs. Per-Column
DTRANS(per-column) outperforms:
DECODE in query performance
1DICT(per-domain) in compression ratio
27.When Does DTRANS Win?
wall clock time (sec)
DTRANS outperforms FTRANS when:
Dimension tables are small , OR
High ratio of rows are left unencoded
Varying the dimension size Varying the ratio of unencoded rows
28.Summary of the Results
DTRANS or FTRANS outperform traditional DECODE for most cases by up to 40% of query performance
DTRANS or FTRANS improve the compression ratio by at least 16%(or up to 50% in skewed data), with negligible overhead in query processing, in comparison with having one dictionary for both join columns(1DICT)
DTRANS is preferred when dimension tables are small
FTRANS is preferred when a fact table is small or local predicates on a fact table are very selective
DTRANS is preferred when high ratio of unencoded rows
30.Conclusions
Partitioning column domains benefits:
Compression ratio (partition by frequency)
Incremental update without changing dictionaries
Independently encoding join columns:
Optimizes compression of each
Requires translation at run time
Translating dimension table's values preferred when
| Dimension table | ≪ | Fact table |, OR
High ratio of unencoded rows
Encoding payload columns on the fly reduces hash-table space
Implemented in Informix Warehouse Accelerator
31.Blink Refereed Publications
Jae-Gil Lee et al.: Joins on Encoded and Partitioned Data. PVLDB 7(13): 1355-1366 (2014)
Vijayshankar Raman et al.: DB2 with BLU Acceleration: So Much More than Just a Column Store. PVLDB 6(11): 1080-1091 (2013)
Lin Qiao, Vijayshankar Raman, Frederick Reiss, Peter J. Haas, Guy M. Lohman: Main-memory scan sharing for multi-core CPUs. PVLDB 1(1): 610-621 (2008)
Ryan Johnson, Vijayshankar Raman, Richard Sidle, Garret Swart: Row-wise parallel predicate evaluation. PVLDB 1(1): 622-634 (2008)
Vijayshankar Raman, Garret Swart, Lin Qiao, Frederick Reiss, Vijay Dialani, Donald Kossmann, Inderpal Narang, Richard Sidle: Constant-Time Query Processing. ICDE 2008: 60-69
Allison L. Holloway, Vijayshankar Raman, Garret Swart, David J. DeWitt: How to barter bits for chronons: compression and bandwidth trade offs for database scans. SIGMOD Conference 2007: 389-400
Vijayshankar Raman, Garret Swart: How to Wring a Table Dry: Entropy Compression of Relations and Querying of Compressed Relations. VLDB 2006: 858-869