Improving Point-Lookup Using Data Block Hash Index

- Author:: [[Fenggang Wu]] - Source:: https://www.instapaper.com/read/1307869075 - Source:: [Improving Point-Lookup Using Data Block Hash Index](https://rocksdb.org/blog/2018/08/23/data-block-hash-index.html) - Recommended By:: [[Artem Krylysov]] - rel:: [[Databases MOC]] [[RocksDB]] [[Performance]] - # Summary - "We’ve designed and implemented a ==__data block hash index__== in [[RocksDB]] that has the benefit of both reducing the CPU util and increasing the throughput for point lookup queries with a reasonable and tunable space overhead." - ![[dQgnv2Ml2W.png]] - At the end of `Put` and `Delete` record types, appends a compact hash table of hashed keys to byte offsets in the block - backwards compatible - on hash collisions falls back to binary search - ![[m2aluaVvbC.png]] - Configuration - `BlockBasedTableOptions::data_block_index_type` - `DataBlockBinaryAndHash` - `BlockBasedTableOptions::data_block_hash_table_util_ratio` - ratio of keys/buckets - Tuning hash lookup - "The default bytewise comparator orders the keys in alphabetical order and works well with hash index, as different keys will never be regarded as equal." - "However, some specially crafted comparators will do. For example, say, a `StringToIntComparator` can convert a string into an integer, and use the integer to perform the comparison." - "```virtual bool CanKeysWithDifferentByteContentsBeEqual() const { return true; }```" - Tuning "`BlockBasedTableOptions::data_block_hash_table_util_ratio`" - "==Adding the hash index to the end of the data block essentially takes up the data block cache space, making the effective data block cache size smaller and increasing the data block cache miss ratio==. Therefore, a very small util ratio will result in a large data block cache miss ratio, and the extra I/O may drag down the throughput gain achieved by the hash index lookup" - NB: cache misses are especially CPU-costly for compressed db - "In our experiment, we found util ratio = 0.5 ~ 1 is a good range to explore that brings both CPU and throughput gains." - Experimental Results - "We can see that if cache size is greater than 8GB, hash index can bring throughput gain. Cache size greater than 8GB can be translated to a cache miss ratio smaller than 40%. So if the workload has a cache miss ratio smaller than 40%, hash index is able to increase the throughput." - # Highlights - We’ve designed and implemented a ==__data block hash index__== in [[RocksDB]] that has the benefit of both reducing the CPU util and increasing the throughput for point lookup queries with a reasonable and tunable space overhead. - Specifially, we ==append a compact hash table to the end of the data block for efficient indexing.== It is ==backward compatible with the data base created without this feature==. After turned on the hash index feature, existing data will be gradually converted to the hash index format. - Benchmarks with `db_bench` show the CPU utilization of one of the main functions in the point lookup code path, `DataBlockIter::Seek()`, is reduced by 21.8%, and the overall RocksDB throughput is increased by 10% under purely cached workloads, at an overhead of 4.6% more space. Shadow testing with Facebook production traffic shows good CPU improvements too. - The hash index is disabled by default unless `BlockBasedTableOptions::data_block_index_type` is set to `data_block_index_type = kDataBlockBinaryAndHash`. The hash table utilization ratio is adjustable using `BlockBasedTableOptions::data_block_hash_table_util_ratio`, which is valid only if `data_block_index_type = kDataBlockBinaryAndHash`. - Each binary search branching triggers CPU cache miss, causing much CPU utilization. - RocksDB does a binary search when performing point lookup for keys in data blocks to find the right restart interval the key may reside. - ==We implemented a hash map at the end of the block to index the key to reduce the CPU overhead of the binary search.== The hash index is just an array of pointers pointing into the binary seek index. - ==When multiple keys happen to hash into the same bucket (hash collision), we just mark the bucket as “collision”.== So that when later querying on that key, the hash table lookup knows that there was a hash collision happened so it can^^ fall back to the traditional binary search^^ to find the location of the key. - ==define hash table utilization ratio as the keys/buckets==. If a utilization ratio is 0.5 and there are 100 buckets, 50 keys are stored in the bucket. The less the util ratio, the less hash collision, and the less chance for a point lookup falls back to binary seek (fall back ratio) due to the collision. - Hash index will hash different keys (keys with different content, or byte sequence) into different hash values. This assumes the comparator will not treat different keys as equal if they have different content. - The default bytewise comparator orders the keys in alphabetical order and works well with hash index, as different keys will never be regarded as equal. - However, some specially crafted comparators will do. For example, say, a `StringToIntComparator` can convert a string into an integer, and use the integer to perform the comparison. - We add a new function member to the comparator interface: - ```virtual bool CanKeysWithDifferentByteContentsBeEqual() const { return true; }``` - ==Adding the hash index to the end of the data block essentially takes up the data block cache space, making the effective data block cache size smaller and increasing the data block cache miss ratio==. Therefore, a very small util ratio will result in a large data block cache miss ratio, and the extra I/O may drag down the throughput gain achieved by the hash index lookup - ==when compression is enabled, cache miss also incurs data block decompression, which is CPU-consuming.== Therefore the CPU may even increase if using a too small util ratio. - In our experiment, we found util ratio = 0.5 ~ 1 is a good range to explore that brings both CPU and throughput gains. - RocksDB supports many types of records, such as `Put`, `Delete`, `Merge`, etc (visit [here](https://github.com/facebook/rocksdb/wiki/rocksdb-basics) for more information). Currently we only support `Put` and `Delete`, but not `Merge`. - `kPutRecord` - `kDeleteRecord` - `kSingleDeleteRecord` - `kTypeBlobIndex` - For records not supported, the searching process will fall back to the traditional binary seek. - We can see that if cache size is greater than 8GB, hash index can bring throughput gain. Cache size greater than 8GB can be translated to a cache miss ratio smaller than 40%. So if the workload has a cache miss ratio smaller than 40%, hash index is able to increase the throughput.