🐦 Cuckoo Filter
Problems of Bloom Filter
- Bloom filters cannot remove existing items without rebuilding the entire filter
- Efforts to extend Bloom filters to support deletions introduce significant space and performance overhead
Contributions of Cuckoo Filter
- Cuckoo filters support adding and removing items dynamically
- Cuckoo filters provide higher lookup performance
- Cuckoo filters use less space than Bloom filters under many scenarios.
Cuckoo Hashing Basics
Components:
- Two hash tables T1 and T2, each with m elements
- Two hash functions: h1 and h2,
Note: Every element will either be at position in or in
Operations:
- Lookup: check and , test if they are equal to x.
- Insert
- First, check and
- If one of these 2 buckets is available, insert the entry into that slot.
- If no buckets are available
- choose one hash table (e.g. ) and kick out the element (element ).
- Insert element e to (because it is just replaced from )
- If there is still a collision, repeat the steps above until a maximum number of displacements is reached.
Performance:
- space: The table place can be 95% filled with proper configuration
- time
- insertion: amortized
Cuckoo Hashing Variants
As shown in the following figure
- 2 hash functions and 1 hash table
- 2 hash functions and 1 4-way hash table
How does Cuckoo Filter Do?
- To make cuckoo filters highly space-efficient, use a multi-way associative cuckoo hash table.
- To reduce hash table size, each item is first hashed into a constant-sized fingerprint before being inserted into the hash table.
Insertion
Cuckoo filters use original data to calculate hash value but only store the fingerprint of an element, which makes it hard to relocate. Relocations require the original data value.
To overcome this problem, cuckoo filters utilize a technique called partial-key cuckoo hashing to derive an item’s alternate location based on its fingerprint.
For each item x, the cuckoo filter calculates the indexed of the two candidate buckets:
With the above hash function, we are able to find an alternative bucket j with current bucket index i
Lookup – like cuckoo hashing
Delete – Simple
References
[1] Cuckoo Filter: Practically Better Than Bloom
本博客所有文章除特别声明外,均采用 CC BY-NC-SA 4.0 许可协议。转载请注明来自 Explorer!