Problems of Bloom Filter

  • Bloom filters cannot remove existing items without rebuilding the entire filter
  • Efforts to extend Bloom filters to support deletions introduce significant space and performance overhead

Contributions of Cuckoo Filter

  • Cuckoo filters support adding and removing items dynamically
  • Cuckoo filters provide higher lookup performance
  • Cuckoo filters use less space than Bloom filters under many scenarios.

Cuckoo Hashing Basics

Components:

  • Two hash tables T1 and T2, each with m elements
  • Two hash functions: h1 and h2, U[m]\mathcal{U}\rightarrow [m]

Note: Every element xUx\in\mathcal{U} will either be at position h1(x)h1(x) in T1T1 or h2(x)h2(x) in T2T2

Operations:

  • Lookup: check T1[h1(x)]T1[h1(x)] and T2[h2(x)]T2[h2(x)], test if they are equal to x.
  • Insert
    • First, check T1[h1(x)]T1[h1(x)] and T2[h2(x)]T2[h2(x)]
    • If one of these 2 buckets is available, insert the entry into that slot.
    • If no buckets are available
      • choose one hash table (e.g. T1T1) and kick out the element (element e=T1[h1(x)]e = T1[h1(x)]).
      • Insert element e to T2T2 (because it is just replaced from T1T1)
        • If there is still a collision, repeat the steps above until a maximum number of displacements is reached.

imgimg

Performance:

  • space: The table place can be 95% filled with proper configuration
  • time
    • insertion: amortized O(1)O(1)

Cuckoo Hashing Variants

As shown in the following figure

  • 2 hash functions and 1 hash table
  • 2 hash functions and 1 4-way hash table
img

How does Cuckoo Filter Do?

  • To make cuckoo filters highly space-efficient, use a multi-way associative cuckoo hash table.
  • To reduce hash table size, each item is first hashed into a constant-sized fingerprint before being inserted into the hash table.

Insertion

Cuckoo filters use original data to calculate hash value but only store the fingerprint of an element, which makes it hard to relocate. Relocations require the original data value.

To overcome this problem, cuckoo filters utilize a technique called partial-key cuckoo hashing to derive an item’s alternate location based on its fingerprint.

For each item x, the cuckoo filter calculates the indexed of the two candidate buckets:

h1(x)=hash(x),h2(x)=h1(x)hash(x’s fingerprint)h_1(x)=\text{hash}(x),\\ h_2(x)=h_1(x)\oplus \text{hash}(x\text{'s fingerprint})

With the above hash function, we are able to find an alternative bucket j with current bucket index i

j=ihash(fingerprint)j=i\oplus \text{hash}(\text{fingerprint})

img

Lookup – like cuckoo hashing

img

Delete – Simple

img

References

[1] Cuckoo Filter: Practically Better Than Bloom

[2] https://codecapsule.com/2013/07/20/cuckoo-hashing/

[3] https://zhuanlan.zhihu.com/p/442498412