Hash tables with worst-case O(1) Access

Cards (43)

  • Caching the hash code
    Technique to avoid expensive recomputation of hash code
  • Caching the hash code works only because Strings are immutable
  • Caching the hash code helps when the keys are only Strings that were stored in the original array of Strings
  • Hash tables can have O(1) worst-case access time
  • Perfect hashing

    Technique to achieve O(1) worst-case access time by using a primary hash table with secondary hash tables to resolve collisions
  • The total size of the secondary hash tables in perfect hashing has expected value at most 2N
  • Cuckoo hashing
    Technique that maintains two hash tables and two hash functions, with the invariant that an item is always stored in one of the two locations
  • Cuckoo hashing requires at most two table accesses for a search, and a remove is trivial once the item is located
  • Building a cuckoo hash table can be challenging when there are not enough available locations in the two tables
  • A search in a cuckoo hash table requires at most two table accesses
  • Removing an item is trivial, once the item is located
  • The cuckoo hashing algorithm is simple: To insert a new item x, first make sure it is not already there, then use the first hash function and if the (first) table location is empty, the item can be placed
  • If the table's load factor is below 0.5, the probability of a cycle is very low, the expected number of displacements is a small constant, and it is extremely unlikely that a successful insertion would require more than O(logN) displacements
  • If the table's load factor is at 0.5 or higher, the probability of a cycle becomes drastically higher, and this scheme is unlikely to work well at all
  • Cuckoo hashing variations

    • Use a higher number of tables (e.g. 3 or 4)
    • Allow each table to store multiple keys
    • Implement as one giant table with two (or more) hash functions that probe the entire table
  • HashFamily interface

    Provides a collection of hash functions for cuckoo hashing
  • The cuckoo hash table implementation uses a single array that is addressed by all the hash functions, rather than separately addressable hash tables
  • The maximum load for the table is set to 0.4, and if the load factor is about to exceed this limit, an automatic table expansion is performed
  • The implementation allows a maximum number of rehashes (ALLOWED_REHASHES) before expanding the table
  • Inserting an item

    1. Check if item is already present
    2. If table is fully loaded, expand it
    3. Call helper routine to do the insertion
  • The helper insertion routine keeps track of the number of rehashes and is mutually recursive with the rehash routine
  • Cuckoo hashing insertion
    1. Select appropriate hash function
    2. Scale hash function into valid array index
    3. Consult all hash functions to return index containing item x, or -1 if x is not found
    4. Check if item is already present, return if so
    5. Check if table is fully loaded, expand if so
    6. Call helper routine to do insertion work
  • Cuckoo hashing insertion helper routine
    1. Declare rehashes variable to track attempts to rehash
    2. Check if any valid positions are empty, place item in first available position if so
    3. Evict one of the existing items if no empty positions
    4. Avoid evicting first, last, or in sequence items
    5. Maintain last evicted position, select new random item if last evicted
    6. Limit rehash loop to 5 iterations
  • Cuckoo hashing expand and rehash
    1. Expand creates larger array but keeps same hash functions
    2. Rehash leaves array size unchanged but creates new array with newly chosen hash functions
  • StringHashFamily
    Provides simple hash functions for strings, replacing constant 37 with randomly chosen numbers
  • Benefits of cuckoo hashing include worst-case constant lookup and deletion times, avoidance of lazy deletion and extra data, and potential for parallelism
  • Cuckoo hashing is extremely sensitive to the choice of hash functions, many standard hash functions perform poorly
  • Insertion time in cuckoo hashing is expected to be constant time as long as load factor is below 1/2, but bound for expected insertion cost deteriorates rapidly as load factor approaches 1/2
  • Using lower load factors or more than two hash functions seems like a reasonable alternative in cuckoo hashing
  • Hopscotch hashing
    New algorithm that tries to improve on classic linear probing
  • Hopscotch hashing

    • Bounds the maximal length of the probe sequence by a predetermined constant optimized for the underlying computer's architecture
    • Gives constant-time lookups in the worst case
    • Lookup can be parallelized to simultaneously check the bounded set of possible locations
  • Hopscotch hashing insertion

    1. If insertion would place new item too far from its hash location, efficiently go backward toward the hash location, evicting potential items
    2. Evictions are done quickly and guarantee evicted items are not placed too far from their hash locations
    3. Deterministic algorithm, either items can be evicted or table is too crowded and needs rehash
  • MAX_DIST

    Chosen bound on maximum probe sequence, item x must be found somewhere in the MAX_DIST positions listed in hash(x), hash(x) + 1, ..., hash(x) + (MAX_DIST - 1)
  • For a table with load factor 1/2, the failure probability in hopscotch hashing is almost zero
  • Insertion routine for cuckoo hashing

    1. Choose item to evict randomly, attempting not to re-evict the last item
    2. Table will attempt to select new hash functions (rehash) if there are too many evictions
    3. Table will expand if there are too many rehashes
  • Rehashes
    Number of times the hash functions have been changed
  • Random
    Used to randomly select an item to evict
  • Insertion helper

    1. Check if item can be inserted in current positions
    2. If not, evict a random item and insert the new item
    3. If too many rehashes, expand the table
    4. If still too many rehashes, rehash with new hash functions
  • Expand
    Increase the size of the hash table
  • Rehash
    Generate new hash functions for the same size hash table