By the end of this chapter, you should be able to:
So far, all of our sql table examples have used a single table. However, one of the most important features of relational databases is the ability to associate (relate) one table to another. The first thing we need to understand in order to create these associations of data is a foreign key constraint.
A foreign key is a constraint placed on a column. It associates the data in one column to a column in a different table. Let's look at an example:
CREATE TABLE people (id SERIAL PRIMARY KEY, first_name TEXT, last_name TEXT); CREATE TABLE interests (id SERIAL PRIMARY KEY, interest TEXT, people_id INTEGER REFERENCES people);
In our table definition, we have a people table and we have an interests table. Using the foreign key constraint in the interest table we have associated the
people_id column with the
id column from the
Let's look at the two tables visually with some sample data:
In our sample data, the values in the
people_id column reference the
id from the people table. Take the row form the
people table with the
id of 3 as an example. We can see that id 3, Grace Hopper, is associated with 2 rows in the interests table because the
people_id matches the id of 3. The two interests are software and debugging.
But why is a foreign key considered a constraint? Well the database actually ensures that the data in your foreign key column exists in the table that the foreign key references. Try inserting into the interests table with data that does not exist in the people table:
INSERT INTO interests (interest, people_id) VALUES ('scuba diving', 30);
After you run the insert statement, you should see an error like this:
ERROR: insert or update on table "interests" violates foreign key constraint
In our foreign key example above, we actually setup a one-to-many (1:M) association. The way we structured the database implies that one person can have many interests, and that each interest belongs to exactly one person.
One-to-many associations are extremely common in database and they association usually implies that one row from a table owns or has rows from another table.
CREATE TABLE users (id SERIAL PRIMARY KEY, email TEXT, password TEXT); CREATE TABLE photos (id SERIAL PRIMARY KEY, url TEXT, user_id INTEGER REFERENCES people);
In the table above, each user in the users table, can have many photos and one photo from the photos table belongs to exactly one user based on the value of
CREATE TABLE states (id SERIAL PRIMARY KEY, name TEXT); CREATE TABLE cities (id SERIAL PRIMARY KEY, name TEXT, state_id INTEGER REFERENCES states);
In this example, a state has many cities and a city belongs to a state. So this is a one to many relationship as well.
Often your database needs to model a more complex association in which an item from one table can associate to many rows in another table and vice versa. This is a many to many (M:N) relationship.
Many-to-Many in SQL
Let's create an association between students and courses in SQL. In order to make this association work, we will need a third table that maps a foreign key from courses to a foreign key from students. We will call that third table
enrollments in this case:
CREATE TABLE students (id SERIAL PRIMARY KEY, name TEXT); CREATE TABLE courses (id SERIAL PRIMARY KEY, name TEXT, teacher_name TEXT); CREATE TABLE enrollments (id SERIAL PRIMARY KEY, student_id INTEGER REFERENCES students, course_id INTEGER REFERENCES courses);
In the students and courses example, one student ID can be associated with one course ID by a single row in the
enrollments table. If that student is in another course, there will be another row in the
enrollments table that associates the same student ID to another course ID.
Let's look at some sample data for the example:
|4||Computer Science||Miss. Flurry|
So based on this data, Billy Jean King is taking P.E. and Biology, Dawn Riley is only taking P.E. and Grace Hopper is taking Computer Science, Biology, and Physics.
The same foreign key constraints apply to the enrollments table. If a
student_id does not exist in the
students table, then the data will not be allowed to be inserted into the
enrollments table. Likewise, if a
course_id does not exist in the
courses table, the data will not be allowed to be inserted into the
A query with a join combines data from two different tables. Typically, the join combines data based on a matching row.
For example, if we have a table like the following:
If we were to do a join on the
id column from the
people table with the
people_id column from the
interests table, our results would be:
people joined with interests on
Notice that we get two rows with the person Grace Hopper because in this data set, Grace Hopper has two matches in the
interests table, therefore, the match is represented with two different rows in the resulting join.
Let's start by creating a database and filling it with our sample data. Make sure to delete the database and recreate it if you have made it already. It is important that the
ids from people start counting from 1.
CREATE TABLE people (id SERIAL PRIMARY KEY, first_name TEXT, last_name TEXT); CREATE TABLE interests (id SERIAL PRIMARY KEY, interest TEXT, people_id INTEGER REFERENCES people); INSERT INTO people (first_name, last_name) VALUES ('Billy Jean', 'King'), ('Dawn', 'Riley'), ('Grace', 'Hopper'); INSERT INTO interests (interest, people_id) VALUES ('sailing', 2), ('tennis', 1), ('software', 3), ('debugging', 3);
Now that we have some data, let's do our first join:
SELECT * FROM people INNER JOIN interests ON people.id=interests.people_id;
Your output for the query should be the same as our example tables from above. So what are we doing? We are combining all rows from people with all possible rows from interests where the
people.id is equals to the
interests.people_id. In our example, the
INNER part of our join is not necessary. You could also write the join this way:
SELECT * FROM people JOIN interests ON people.id=interests.people_id;
Even thought the
INNER keyword is not necessary, it helps us remember what kind of join we are doing. Next we'll see other types of joins
So far we have looked a data set where all data from the first table has at least 1 match in the second table, and vice versa. Sometimes that is not the case though. Consider the following data set (if you are following along, you should drop the previous tables and create a new ones):
CREATE TABLE people (id SERIAL PRIMARY KEY, first_name TEXT, last_name TEXT); CREATE TABLE interests (id SERIAL PRIMARY KEY, interest TEXT, people_id INTEGER REFERENCES people); INSERT INTO people (first_name, last_name) VALUES ('Billy Jean', 'King'), ('Dawn', 'Riley'), ('Grace', 'Hopper'), ('Moxie', 'Garcia'); INSERT INTO interests (interest, people_id) VALUES ('sailing', 2), ('tennis', 1), ('software', 3), ('debugging', 3), ('snow boarding',null), ('ham radio',null);
In this data set, Moxie Garcia does not have any interests. Also, none of the people are interested in snow boarding or ham radios.
When we do an inner join, the results are only rows where a person matches an interest. But sometimes we may want to include people without interests, or only find people with no interests. We can solve this problem with left or right joins.
Here is how we would answer the following questions:
SELECT * FROM people LEFT JOIN interests ON people.id=interests.people_id;
Here are the results:
LEFT JOIN gets us all the rows from the left (the left in this case is the
people table) that have a match in the interests table as well as all the people that do not have a match.
We could get the same results by doing a right join. In this case, a right join gets all the matches from the right (the people table) plus all of the people without any interest matches.
SELECT * FROM interests RIGHT JOIN people ON people.id=interests.people_id;
We can also add the keyword
OUTER to remind ourselves that this is an outer join. Just like
INNER, this is an optional keyword that you can add, but it doesn't change the query:
SELECT * FROM interests RIGHT OUTER JOIN people ON people.id=interests.people_id;
SELECT * FROM people LEFT JOIN interests ON people.id=interests.people_id WHERE interests.id IS NULL;
In this query, we first get all of the people and all of their interests (including people with no interests), then we limit that result to only the rows without an interest id. In other words, the results where people do not have a matching interest. Your output should be:
SELECT * FROM people FULL JOIN interests ON people.id=interests.people_id;
Here are the results:
So with our
FULL JOIN we are getting all matching combinations of people and interests, plus interests that do not have any matching people and people that do not have any matching interests.
Here is a useful diagram from C.L. Moffet's article that illustrates the common joins which we just covered:
A cross join is not the most useful join, but it may come in handy once and a while. It gives you all possible combinations of rows from the first table and rows from the second table. Here is the query:
SELECT * FROM people CROSS JOIN interests;
And this is the output:
|1||Billy Jean||King||5||snow boarding|
|1||Billy Jean||King||6||ham radio|
A self join is actually just doing a normal join. The big difference is that the two tables that you are joining are the same. The classic example is an employee table with an ID for a boss. The
boss_id in the table references another employee in the same table:
Here is the sql for the table:
CREATE table employees (id SERIAL PRIMARY KEY, first_name TEXT, last_name TEXT, boss_id INTEGER); INSERT INTO employees (first_name, last_name, boss_id) VALUES ('Billy Jean', 'King', 3), ('Dawn', 'Riley', 3), ('Grace', 'Hopper', 4), ('Ada', 'Lovelace', NULL), ('Emmy', 'Noether', 4), ('Marie', 'Curie', 5);
SELECT e.id, e.first_name, e.last_name, e.boss_id, b.first_name as b_first_name, b.last_name as b_last_name FROM employees e LEFT JOIN employees b ON e.boss_id=b.id;
When you're ready, move on to Normalization