Universal Hashing

Cards (12)

  • Although hash tables are very efficient and have average cost per operation, assuming appropriate load factors, their analysis and performance depend on the hash function having two fundamental properties:

    The hash function must be computable in constant time (i.e., independent of the number of items in the hash table).
    The hash function must distribute its items uniformly among the array slots.
  • Universal Hashing
    A technique that allows hash functions to be chosen randomly in such a way that they distribute items uniformly among the array slots
  • Properties of hash functions for efficient hash tables
    • Must be computable in constant time
    • Must distribute items uniformly among the array slots
  • If the hash function is poor, the cost per operation can be linear
  • Universal hash function

    Allows choosing the hash function randomly in such a way that items are distributed uniformly among the array slots
  • Universal hash functions can be used in applications that require a high level of robustness, where worst-case or degraded performance cannot be tolerated
  • Universal hash function family
    For any x ≠ y, the number of hash functions h in the family for which h(x) = h(y) is at most |H|/M
  • Cuckoo hashing requires an O(logN)-universal hash function
  • Designing a simple universal hash function
    1. Assume mapping large integers to smaller integers 0 to M-1
    2. Let p be a prime larger than the largest input key
    3. H = {H(x) = ((ax + b) mod p) mod M, where 1 ≤ a ≤ p - 1, 0 ≤ b ≤ p - 1}
  • There are p(p-1) possible hash functions that can be chosen in this family
  • The hash family H = {H(x) = ((ax + b) mod p) mod M, where 1 ≤ a ≤ p - 1, 0 ≤ b ≤ p - 1} is universal
  • Implementation of universal hash function

    1. Requires two mod operations: mod p and mod M
    2. Can use a Mersenne prime p = 2^k - 1 to simplify the mod p operation
    3. Use the Carter-Wegman trick: r ≡ q'(p + 1) + r' (mod p), so r ≡ q' + r' (mod p)