{ Handling Collisions. }

Objectives

By the end of this chapter you should be able to:

  • Understand what a hashing collision is
  • Define separate chaining and understand how to implement it
  • Define linear probing and understand how to implement it

Collisions

Now that we have an idea of how hash tables work, more specifically how hashing functions work, lets imagine we reach a point where our hashing function produces the same index for two distinct keys. This can cause serious problems for our hash table, because if there is already a value for that key, we do not want to overwrite it. So how can we manage this issue?

While there are quite a few other ways to manage collisions, we will start by examining these two potential solutions: separate chaining and linear probing.

Before moving on, watch this video for a more in-depth introduction.

Separate Chaining

The first solution to handling collisions involves storing multiple pieces of data at each index. This can be done using linked lists, balanced binary search trees or, even another entire hash table! The algorithm that we would have to implement for separate chaining to work look something like this (assuming we're using linked lists):

  • Create a large array of prime length,
  • To add a key-value pair, hash the key using the hash function to obtain an index.
  • If there is no data in the array at that index, create a new linked list and put the key-value pair inside the link list's head,
  • If there is already a non-empty linked list in the array at that index (i.e. if there's a collision), insert the key-value pair at the next node in the linked list.

You can read more about separate chaining here.

Linear Probing

The second solution to handling collisions involves a form of what is called "open addressing," which is the idea that when new key has to be inserted and there is a collision, the algorithm finds another open slot to place that key. Linear probing searches the hash table for the closest free location following the collision and inserts the new key there. Lookups are performed in the same way, by searching the table sequentially starting at the position given by the hash function, until finding a cell with a matching key or an empty cell. The algorithm that we would have to implement for linear probing to work would be

  • Create a large array of prime length,
  • To add a key-value pair, hash the key using the hash function to obtain an index.
  • If there is no data in the array at that index, insert the data into the array
  • If there is a collision, move to the next index in the hash table and if there is nothing there, place the key and value
  • Otherwise, keep iterating through the hash table to find an empty spot and then place the key and value there

You can read more about linear probing here.

Note that both of these approaches emphasize the need for a hashing function which tends to distribute indices uniformly across the array. For example, if our hashing function always returns the index 0, the separate chaining technique reduces to just creating a linked list, and our linear probing technique just reduces to creating an array. In either case, we're losing many of the advantages of using a hash table in the first place!

When you're ready, move on to Hash Tables Exercises

Continue