By the end of this chapter you should be able to:
Now that we have an idea of how hash tables work, more specifically how hashing functions work, lets imagine we reach a point where our hashing function produces the same index for two distinct keys. This can cause serious problems for our hash table, because if there is already a value for that key, we do not want to overwrite it. So how can we manage this issue?
While there are quite a few other ways to manage collisions, we will start by examining these two potential solutions: separate chaining and linear probing.
Before moving on, watch this video for a more in-depth introduction.
The first solution to handling collisions involves storing multiple pieces of data at each index. This can be done using linked lists, balanced binary search trees or, even another entire hash table! The algorithm that we would have to implement for separate chaining to work look something like this (assuming we're using linked lists):
You can read more about separate chaining here.
The second solution to handling collisions involves a form of what is called "open addressing," which is the idea that when new key has to be inserted and there is a collision, the algorithm finds another open slot to place that key. Linear probing searches the hash table for the closest free location following the collision and inserts the new key there. Lookups are performed in the same way, by searching the table sequentially starting at the position given by the hash function, until finding a cell with a matching key or an empty cell. The algorithm that we would have to implement for linear probing to work would be
You can read more about linear probing here.
Note that both of these approaches emphasize the need for a hashing function which tends to distribute indices uniformly across the array. For example, if our hashing function always returns the index 0, the separate chaining technique reduces to just creating a linked list, and our linear probing technique just reduces to creating an array. In either case, we're losing many of the advantages of using a hash table in the first place!
When you're ready, move on to Hash Tables Exercises