Extendible Hashing

Cards (5)

  • Universal hash functions exist for strings also. First, choose any prime p, larger than M (and larger than the largest character code). Then use our standard string hashing function, choosing the multiplier randomly between 1 and p − 1 and returning an intermediate hash value between 0 and p − 1, inclusive. Finally, apply a universal hash function to generate the final hash value between 0 and M − 1.
  • Our last topic in this chapter deals with the case where the amount of data is too large to fit in main memory. As we saw in Chapter 4, the main consideration then is the number of disk accesses required to retrieve data.
  • If either probing hashing or separate chaining hashing is used, the major problem is that collisions could cause several blocks to be examined during a search, even for a well-distributed hash table. Furthermore, when the table gets too full, an extremely expensive rehashing step must be performed, which requires O(N) disk accesses.
  • A clever alternative, known as extendible hashing, allows a search to be performed in two disk accesses. Insertions also require few disk accesses.
  • We recall from Chapter 4 that a B-tree has depth O(log N). As M increases, the depth of a B-tree decreases. We could in theory choose M to be so large that the depth of the B-tree would be 1. Then any search after the first would take one disk access, since, presumably, the root node could be stored in main memory. The problem with this strategy is that the branching factor is so high that it would take considerable processing to determine which leaf the data were in. If the time to perform this step could be reduced, then we would have a practical scheme. This is exactly the strategy used by extendible hashing.