Introduction
This tutorial teaches the fundamentals of SQL (Structured Query Language) by using Python to build applications that interact with a relational database (PostgreSQL).
This blog post is Part 3 of the "Learn SQL with Python" series:
- SQL Fundamentals
- Filtering, Sorting, and Grouping
- Joins and Set Operators
The source code for this tutorial can be found on GitLab: https://gitlab.com/patkennedy79/learn_sql. There are separate folders for each of the 3 parts of this tutorial.
Project Structure
In this tutorial, a PostgreSQL database is being used for storing data about a farmer's market that sells vegetables, fruits, and herbs.
Here are the files used in Part 3 (Joins and Set Operators) of this tutorial:
- 'database.py' - contains the
Database
andDatabaseContextManager
classes - 'models.py' - contains the
Products
,Suppliers
, andLocations
classes - 'part3_locations_data.csv' - contains the data to be loaded into the 'locations' table
- 'part3_products_data.csv' - contains the data to be loaded into the 'products' table
- 'part3_suppliers_data.csv' - contains the data to be loaded into the 'suppliers' table
- 'part3.py' - application created in this tutorial (Part 3)
Database Relationships
In relational databases, different types of relationships can exist between tables. The most common types are:
One-to-One Relationship
In a one-to-one relationship, each record in the first table is related to only one record in the second table, and vice versa. This type of relationship is not very common because it often makes sense to combine the tables into a single table.
One-to-Many Relationship
In a one-to-many relationship, each record in the first table can be related to multiple records in the second table, but each record in the second table is related to only one record in the first table. One-to-many relationships are the most common relationship.
In this tutorial, a suppliers
table will be added to the database to indicate where each product in the products
table comes from. This relationship will have one supplier linked to many products. Each product will
be from a single supplier, but each supplier will provide many products.
Many-to-Many Relationship
In a many-to-many relationship, each record in the first table can be related to multiple records in the second table, and vice versa.
To implement a many-to-many relationship, a junction table (also called a linking or associative table) is used which contains foreign key references to the primary keys of the two related tables.
Database Tables
In part 3 of this SQL tutorial, there will be three tables in the PostgreSQL database:
A great resource for creating SQL diagrams is: https://drawsql.app/
Here's a summary of each table:
products
- list of all the products being sold at all locationssuppliers
- list of suppliers that provide productslocations
- list of locations (i.e. farmer's market stands) where products are sold to customers
In Part 1 (SQL Fundamentals), the CREATE TABLE
command was introduced for creating a new table in a database:
CREATE TABLE table_name (
column1 datatype,
column2 datatype,
....
);
The CREATE TABLE
command can be used to create the 'suppliers' and 'locations' tables:
CREATE TABLE suppliers(
s_id SERIAL PRIMARY KEY,
s_name VARCHAR(255) NOT NULL,
s_city VARCHAR(100),
s_state VARCHAR(50)
);
CREATE TABLE locations(
l_id SERIAL PRIMARY KEY,
l_name VARCHAR(255) NOT NULL,
l_street VARCHAR(100),
l_city VARCHAR(100),
l_state VARCHAR(50)
);
The primary key of a table is used to uniquely identify each record (i.e. row) in the table. For example, the 's_id' column in the 'suppliers' table will be an incrementing integer that is unique for each row.
The 'products' table can be created using:
CREATE TABLE products(
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
quantity INT,
type VARCHAR(50),
price FLOAT
);
However, this code doesn't define the relationships with the 'suppliers' and the 'locations' tables. The 'products' table needs columns for identifying the supplier and location for each row:
CREATE TABLE products(
...
supplier_id INT,
location_id INT
};
For the relationship between the 'products' and the 'suppliers' table, the 'products' table contains a column for the primary key of the 'suppliers' table. To establish this relationship, a constraint needs to be added to the 'products' table to create a link (known as a "foreign key"):
CREATE TABLE products(
...
supplier_id INT,
...
CONSTRAINT fk_supplier FOREIGN KEY(supplier_id) REFERENCES suppliers(s_id) ON DELETE SET NULL,
...
);
The foreign key links the 'products.supplier_id' column to the 'suppliers.s_id' column:
The table containing the primary key ('suppliers') is the parent table and the table containing the foreign key ('products') is the child table.
For completeness, here is the full SQL command to create the 'products' table:
CREATE TABLE products(
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
quantity INT,
type VARCHAR(50),
price FLOAT,
supplier_id INT,
location_id INT,
CONSTRAINT fk_supplier FOREIGN KEY(supplier_id) REFERENCES suppliers(s_id) ON DELETE SET NULL,
CONSTRAINT fk_location FOREIGN KEY(location_id) REFERENCES locations(l_id) ON DELETE SET NULL
);
The 'products' table contains a column 'fk_supplier' (fk == foreign key) that links to the 's_id' column in the 'products' table:
CONSTRAINT fk_supplier FOREIGN KEY(supplier_id) REFERENCES suppliers(s_id) ON DELETE SET NULL,
The end of this line (ON DELETE
) specifies what should happen when a record in the parent table (i.e. the
'suppliers' table) is deleted. By using ON DELETE SET NULL
, the 'products.supplier_id' field will be set to NULL if
the linked record in the parent table (i.e. the 'suppliers' table) is deleted.
Another common option is ON DELETE CASCADE
, which means that when a record is deleted from the parent table (i.e. the
'suppliers' table), any related records in the child table (i.e. the 'products' table) will be deleted. This approach
is much more impactful to the database, as deleting a record in a parent table can cause multiple records in a child
table to be deleted.
One additional note about creating these tables... the order that these tables are created in is critical, as the foreign keys created in the 'products' table need to link to valid primary keys in the 'suppliers' and 'locations' tables. Therefore, the 'suppliers' and 'locations' tables need to be created before the 'products' table to allow the foreign key constraints to be valid.
SQL Joins
A JOIN operation in SQL combines rows from two or more tables based on a related column. The purpose of joining tables is to retrieve data that spans multiple tables, enabling more complex and meaningful queries.
There are (4) primary types of joins in SQL:
- INNER JOIN (default)
- LEFT JOIN
- RIGHT JOIN
- FULL OUTER JOIN
The syntax for joining two tables in SQL is:
SELECT *
FROM left_table
JOIN right_table
ON left_table.foreign_id = right_table.id;
In the following examples for the (4) primary types of JOINs, the products
and suppliers
tables will be used. The
products
table contains 10 products, but some of them do not have suppliers specified yet:
+----+--------------+----------+-----------+-------+-------------+-------------+
| id | name | quantity | type | price | supplier_id | location_id |
+----+--------------+----------+-----------+-------+-------------+-------------+
| 1 | banana | 30 | fruit | 0.79 | 1 | 3 |
| 2 | pumpkin | 12 | vegetable | 8.49 | 4 | 1 |
| 3 | corn | 85 | vegetable | 0.89 | None | 2 |
| 4 | broccoli | 22 | vegetable | 2.99 | 3 | 2 |
| 5 | orange | 87 | fruit | 0.75 | 5 | 1 |
| 6 | strawberries | 28 | fruit | 5.49 | 4 | 3 |
| 7 | grapes | 35 | fruit | 4.79 | 5 | 3 |
| 8 | pineapple | 12 | fruit | 4.59 | None | 3 |
| 9 | carrot | 43 | vegetable | 0.39 | None | 1 |
| 10 | tomato | 17 | vegetable | 1.56 | 6 | 1 |
+----+--------------+----------+-----------+-------+-------------+-------------+
The suppliers
table contains 8 suppliers:
+------+----------------------+---------------+------------+
| s_id | s_name | s_city | s_state |
+------+----------------------+---------------+------------+
| 1 | Woodland Ranch | Santa Barbara | California |
| 2 | Organic Village Farm | Coalinga | California |
| 3 | Murray Farm | Santa Barbara | California |
| 4 | Berry Farm | Eugene | Oregon |
| 5 | Valley View Farm | Auburn | Washington |
| 6 | Butler Apple Orchard | Tacoma | Washington |
| 7 | Winter Green Farm | Medford | Oregon |
| 8 | Santa Barbara Farm | Santa Barbara | California |
+------+----------------------+---------------+------------+
There are several suppliers (s_id: 2, 7, and 8) listed that do not supply any products (to the products
table).
INNER JOIN
An INNER JOIN returns only the rows where there is a match in both tables based on the specified condition.
NOTE: INNER JOIN is the default JOIN type in SQL, so only
JOIN
needs to be utilized:
An INNER JOIN of the products
table and the suppliers
table is written as:
db.execute_sql_query("""
SELECT products.id, products.name, products.supplier_id, suppliers.s_id, suppliers.s_name
FROM products
JOIN suppliers
ON products.supplier_id = suppliers.s_id;
""")
Output:
+----+--------------+-------------+------+----------------------+
| id | name | supplier_id | s_id | s_name |
+----+--------------+-------------+------+----------------------+
| 1 | banana | 1 | 1 | Woodland Ranch |
| 2 | pumpkin | 4 | 4 | Berry Farm |
| 4 | broccoli | 3 | 3 | Murray Farm |
| 5 | orange | 5 | 5 | Valley View Farm |
| 6 | strawberries | 4 | 4 | Berry Farm |
| 7 | grapes | 5 | 5 | Valley View Farm |
| 10 | tomato | 6 | 6 | Butler Apple Orchard |
+----+--------------+-------------+------+----------------------+
The INNER JOIN results in a table with the columns from both tables. However, not all 10 of the products in the
products
table are listed, as the INNER JOIN returns only the rows that have a match on the specified condition
(ON products.supplier_id = suppliers.s_id
). The missing products do not have supplier IDs specified:
+----+--------------+----------+-----------+-------+-------------+-------------+
| id | name | quantity | type | price | supplier_id | location_id |
+----+--------------+----------+-----------+-------+-------------+-------------+
| 3 | corn | 85 | vegetable | 0.89 | None | 2 |
| 8 | pineapple | 12 | fruit | 4.59 | None | 3 |
| 9 | carrot | 43 | vegetable | 0.39 | None | 1 |
+----+--------------+----------+-----------+-------+-------------+-------------+
***********
LEFT JOIN
A LEFT JOIN returns all rows from the left table and the matched rows from the right table. If there is no match in the right table, NULL values are returned for columns from the right table.
A LEFT JOIN of the products
table and the suppliers
table is written as:
db.execute_sql_query("""
SELECT products.id, products.name, products.supplier_id, suppliers.s_id, suppliers.s_name
FROM products
LEFT JOIN suppliers
ON products.supplier_id = suppliers.s_id;
""")
Output:
+----+--------------+-------------+------+----------------------+
| id | name | supplier_id | s_id | s_name |
+----+--------------+-------------+------+----------------------+
| 1 | banana | 1 | 1 | Woodland Ranch |
| 2 | pumpkin | 4 | 4 | Berry Farm |
| 3 | corn | None | None | None |
| 4 | broccoli | 3 | 3 | Murray Farm |
| 5 | orange | 5 | 5 | Valley View Farm |
| 6 | strawberries | 4 | 4 | Berry Farm |
| 7 | grapes | 5 | 5 | Valley View Farm |
| 8 | pineapple | None | None | None |
| 9 | carrot | None | None | None |
| 10 | tomato | 6 | 6 | Butler Apple Orchard |
+----+--------------+-------------+------+----------------------+
The LEFT JOIN returns all the rows in the left table (products
) and the rows that do not have a match to the right
table (suppliers
) are filled in with NULL (None
in Python) values. For example, the 'carrot' product does not have
a supplier specified, so all the supplier fields are None
.
RIGHT JOIN
A RIGHT JOIN returns all rows from the right table and the matched rows from the left table. If there is no match in the left table, NULL values are returned for columns from the left table.
NOTE: RIGHT JOINs are not very common, as it's easy to switch the left and right tables to make a LEFT JOIN. LEFT JOINs are more common, as most people find them to be more intuitive.
A RIGHT JOIN of the products
table and the suppliers
table is written as:
db.execute_sql_query("""
SELECT products.id, products.name, products.supplier_id, suppliers.s_id, suppliers.s_name
FROM products
RIGHT JOIN suppliers
ON products.supplier_id = suppliers.s_id;
""")
Output:
+------+--------------+-------------+------+----------------------+
| id | name | supplier_id | s_id | s_name |
+------+--------------+-------------+------+----------------------+
| 1 | banana | 1 | 1 | Woodland Ranch |
| 2 | pumpkin | 4 | 4 | Berry Farm |
| 4 | broccoli | 3 | 3 | Murray Farm |
| 5 | orange | 5 | 5 | Valley View Farm |
| 6 | strawberries | 4 | 4 | Berry Farm |
| 7 | grapes | 5 | 5 | Valley View Farm |
| 10 | tomato | 6 | 6 | Butler Apple Orchard |
| None | None | None | 2 | Organic Village Farm |
| None | None | None | 8 | Santa Barbara Farm |
| None | None | None | 7 | Winter Green Farm |
+------+--------------+-------------+------+----------------------+
The RIGHT JOIN returns all the rows in the right table (suppliers
) and the rows that do not have a match to the left
table (products
) are filled in with NULL (None
in Python) values. For example, the 'Winter Green Farm' supplier
does not supply any products, so all the product fields are None
.
FULL OUTER JOIN
A FULL OUTER JOIN returns all rows when there is a match in either the left or right table. If there is no match, NULL values are returned for columns from the table without a match. A FULL OUTER JOIN will return the full set of rows from both tables.
A FULL OUTER JOIN of the products
table and the suppliers
table is written as:
db.execute_sql_query("""
SELECT *
FROM products
FULL OUTER JOIN suppliers
ON products.supplier_id = suppliers.s_id;
""")
Output:
+------+--------------+-------------+------+----------------------+
| id | name | supplier_id | s_id | s_name |
+------+--------------+-------------+------+----------------------+
| 1 | banana | 1 | 1 | Woodland Ranch |
| 2 | pumpkin | 4 | 4 | Berry Farm |
| 3 | corn | None | None | None |
| 4 | broccoli | 3 | 3 | Murray Farm |
| 5 | orange | 5 | 5 | Valley View Farm |
| 6 | strawberries | 4 | 4 | Berry Farm |
| 7 | grapes | 5 | 5 | Valley View Farm |
| 8 | pineapple | None | None | None |
| 9 | carrot | None | None | None |
| 10 | tomato | 6 | 6 | Butler Apple Orchard |
| None | None | None | 2 | Organic Village Farm |
| None | None | None | 8 | Santa Barbara Farm |
| None | None | None | 7 | Winter Green Farm |
+------+--------------+-------------+------+----------------------+
The FULL OUTER JOIN returns all the rows in the left table (products
) and the right table (suppliers
). Any fields
that are not populated in the database are filled in with NULL (None
in Python) values.
Join Examples
Let's use these different JOIN operations to learn about the data in the database!
To print all the products that have suppliers specified:
# Print all the products with suppliers
db.execute_sql_query("""
SELECT products.id, products.name, products.price, suppliers.s_name AS supplier_name
FROM products
JOIN suppliers
ON products.supplier_id = suppliers.s_id;
""")
Output:
+----+--------------+-------+----------------------+
| id | name | price | supplier_name |
+----+--------------+-------+----------------------+
| 1 | banana | 0.79 | Woodland Ranch |
| 2 | pumpkin | 8.49 | Berry Farm |
| 4 | broccoli | 2.99 | Murray Farm |
| 5 | orange | 0.75 | Valley View Farm |
| 6 | strawberries | 5.49 | Berry Farm |
| 7 | grapes | 4.79 | Valley View Farm |
| 10 | tomato | 1.56 | Butler Apple Orchard |
+----+--------------+-------+----------------------+
To print all the products that are sold by the farmer's market stand on Main Street:
# Print all the products sold at the farmer's market stand on Main Street
db.execute_sql_query("""
SELECT locations.l_name AS location_name, locations.l_street, locations.l_city, products.name AS product_name, products.price, products.quantity
FROM locations
JOIN products
ON locations.l_id = products.location_id
WHERE locations.l_name LIKE 'Main%';
""")
Output:
+---------------+-------------+---------------+--------------+-------+----------+
| location_name | l_street | l_city | product_name | price | quantity |
+---------------+-------------+---------------+--------------+-------+----------+
| Main Stand | Main Street | Santa Barbara | pumpkin | 8.49 | 12 |
| Main Stand | Main Street | Santa Barbara | orange | 0.75 | 87 |
| Main Stand | Main Street | Santa Barbara | carrot | 0.39 | 43 |
| Main Stand | Main Street | Santa Barbara | tomato | 1.56 | 17 |
+---------------+-------------+---------------+--------------+-------+----------+
To print all the fruits and where they are supplied from, but including the fruits that don't have a supplier, a LEFT JOIN is the best option (an INNER JOIN would not include the fruit without a supplier):
# Print all the fruits including where they are supplied from
db.execute_sql_query("""
SELECT products.id, products.name AS products_name, suppliers.s_name AS supplier_name, suppliers.s_city, suppliers.s_state
FROM products
LEFT JOIN suppliers
ON products.supplier_id = suppliers.s_id
WHERE products.type = 'fruit';
""")
Output:
+----+---------------+------------------+---------------+------------+
| id | products_name | supplier_name | s_city | s_state |
+----+---------------+------------------+---------------+------------+
| 1 | banana | Woodland Ranch | Santa Barbara | California |
| 5 | orange | Valley View Farm | Auburn | Washington |
| 6 | strawberries | Berry Farm | Eugene | Oregon |
| 7 | grapes | Valley View Farm | Auburn | Washington |
| 8 | pineapple | None | None | None |
+----+---------------+------------------+---------------+------------+
To print all the products supplied by the Berry Farm:
# Print all the products supplied by the Berry Farm
db.execute_sql_query("""
SELECT products.id, products.name AS products_from_berry_farm, products.price, products.quantity, suppliers.s_name AS supplier_name
FROM products
JOIN suppliers
ON products.supplier_id = suppliers.s_id
WHERE suppliers.s_name = 'Berry Farm'
""")
Output:
+----+--------------------------+-------+----------+---------------+
| id | products_from_berry_farm | price | quantity | supplier_name |
+----+--------------------------+-------+----------+---------------+
| 2 | pumpkin | 8.49 | 12 | Berry Farm |
| 6 | strawberries | 5.49 | 28 | Berry Farm |
+----+--------------------------+-------+----------+---------------+
Set Operators
Set operators allow you to combine the results of two or more SELECT statements. Set operators operate on the entire result sets produced by the SELECT statements rather than on individual rows.
For all set operations, the number of selected columns and their respective data types must be identical.
The primary set operators in SQL are:
- UNION
- UNION ALL
- INTERSECT
- EXCEPT
UNION
The UNION operator combines the results of two or more SELECT statements into a single result, but with duplicates not included in the result:
SELECT list_of_columns
FROM left_table
UNION
SELECT list_of_columns
FROM right_table;
The number, order, and data type of the columns from the two queries being combined must be identical.
Before looking at an example, here's a refresher on the contents of the locations
table:
+------+----------------------+-------------+----------------+------------+
| l_id | l_name | l_street | l_city | l_state |
+------+----------------------+-------------+----------------+------------+
| 1 | Main Stand | Main Street | Santa Barbara | California |
| 2 | Mission Canyon Stand | 2nd Street | Mission Canyon | California |
| 3 | Zoo Stand | Zoo Street | Santa Barbara | California |
| 4 | Beach Stand | Cliff Drive | Santa Barbara | California |
+------+----------------------+-------------+----------------+------------+
And the suppliers
tables:
+------+----------------------+---------------+------------+
| s_id | s_name | s_city | s_state |
+------+----------------------+---------------+------------+
| 1 | Woodland Ranch | Santa Barbara | California |
| 2 | Organic Village Farm | Coalinga | California |
| 3 | Murray Farm | Santa Barbara | California |
| 4 | Berry Farm | Eugene | Oregon |
| 5 | Valley View Farm | Auburn | Washington |
| 6 | Butler Apple Orchard | Tacoma | Washington |
| 7 | Winter Green Farm | Medford | Oregon |
| 8 | Santa Barbara Farm | Santa Barbara | California |
+------+----------------------+---------------+------------+
Let's look at an example by printing the records (based on city and state) that are in both the locations
and
suppliers
tables:
db.execute_sql_query("""
SELECT locations.l_city AS city, locations.l_state AS state
FROM locations
UNION
SELECT suppliers.s_city, suppliers.s_state
FROM suppliers;
""")
Output:
+----------------+------------+
| city | state |
+----------------+------------+
| Medford | Oregon |
| Mission Canyon | California |
| Santa Barbara | California |
| Coalinga | California |
| Eugene | Oregon |
| Auburn | Washington |
| Tacoma | Washington |
+----------------+------------+
This query returns all the city/state combinations in the locations
or suppliers
tables, but note that any
duplicate values are not included.
UNION ALL
The UNION ALL operator keeps the duplicate values that UNION does not:
SELECT list_of_columns
FROM left_table
UNION ALL
SELECT list_of_columns
FROM right_table;
Let's re-run the previous query (to print all the locations and suppliers based on city/state) but this time keep all the duplicate entries by using UNION ALL:
db.execute_sql_query("""
SELECT locations.l_city AS city, locations.l_state AS state
FROM locations
UNION ALL
SELECT suppliers.s_city, suppliers.s_state
FROM suppliers;
""")
Output:
+----------------+------------+
| city | state |
+----------------+------------+
| Santa Barbara | California |
| Mission Canyon | California |
| Santa Barbara | California |
| Santa Barbara | California |
| Santa Barbara | California |
| Coalinga | California |
| Santa Barbara | California |
| Eugene | Oregon |
| Auburn | Washington |
| Tacoma | Washington |
| Medford | Oregon |
| Santa Barbara | California |
+----------------+------------+
It's nice to see all the values, including the duplicates, but this data is hard to interpret. Let's add a new column
to the query results to show whether the row is from the locations
or suppliers
table:
db.execute_sql_query("""
SELECT locations.l_city AS city, locations.l_state AS state, 'Location' as type
FROM locations
UNION ALL
SELECT suppliers.s_city, suppliers.s_state, 'Supplier'
FROM suppliers;
""")
Output:
+----------------+------------+----------+
| city | state | type |
+----------------+------------+----------+
| Santa Barbara | California | Location |
| Mission Canyon | California | Location |
| Santa Barbara | California | Location |
| Santa Barbara | California | Location |
| Santa Barbara | California | Supplier |
| Coalinga | California | Supplier |
| Santa Barbara | California | Supplier |
| Eugene | Oregon | Supplier |
| Auburn | Washington | Supplier |
| Tacoma | Washington | Supplier |
| Medford | Oregon | Supplier |
| Santa Barbara | California | Supplier |
+----------------+------------+----------+
A new column can be added to the query results by manually including it in the SQL query (Location
and Supplier
):
SELECT locations.l_city AS city, locations.l_state AS state, 'Location' as type
...
SELECT suppliers.s_city, suppliers.s_state, 'Supplier'
Another way to distinguish between the records from the locations
and suppliers
table is to print the ID of each
record and prepend either l
(location) or s
(supplier) to the ID using CONCAT():
db.execute_sql_query("""
SELECT CONCAT('l', locations.l_id) AS id, locations.l_city AS city, locations.l_state AS state
FROM locations
UNION ALL
SELECT CONCAT('s', suppliers.s_id), suppliers.s_city, suppliers.s_state
FROM suppliers;
""")
Output:
+----+----------------+------------+
| id | city | state |
+----+----------------+------------+
| l1 | Santa Barbara | California |
| l2 | Mission Canyon | California |
| l3 | Santa Barbara | California |
| l4 | Santa Barbara | California |
| s1 | Santa Barbara | California |
| s2 | Coalinga | California |
| s3 | Santa Barbara | California |
| s4 | Eugene | Oregon |
| s5 | Auburn | Washington |
| s6 | Tacoma | Washington |
| s7 | Medford | Oregon |
| s8 | Santa Barbara | California |
+----+----------------+------------+
INTERSECT
The INTERSECT operator returns the rows that are in both tables:
SELECT list_of_columns
FROM left_table
INTERSECT
SELECT list_of_columns
FROM right_table;
Let's use INTERSECT to print all the locations (by city and state) that are in both the locations
and suppliers
tables:
db.execute_sql_query("""
SELECT locations.l_city AS city, locations.l_state AS state
FROM locations
INTERSECT
SELECT suppliers.s_city, suppliers.s_state
FROM suppliers;
""")
Output:
+---------------+------------+
| city | state |
+---------------+------------+
| Santa Barbara | California |
+---------------+------------+
The INTERSECT operator shows that there is only one city/state combination that is in both the locations
and
suppliers
tables.
EXCEPT
The EXCEPT operator returns all the rows in the left table, except the ones that are in the right table:
SELECT list_of_columns
FROM left_table
EXCEPT
SELECT list_of_columns
FROM right_table;
Let's print all the locations (by city and state) that are in the locations
table, but not in the suppliers
table:
db.execute_sql_query("""
SELECT locations.l_city AS city, locations.l_state AS state
FROM locations
EXCEPT
SELECT suppliers.s_city, suppliers.s_state
FROM suppliers;
""")
Output:
+----------------+------------+
| city | state |
+----------------+------------+
| Mission Canyon | California |
+----------------+------------+
The EXCEPT operator tells us that Mission Canyon, California, is the only city/state combination in the locations
table
that is not in the suppliers
table.
Sub-Queries
SQL queries can be nested within another query, which are called sub-queries (also referred to as nested queries).
Let's look at an example of when a subquery can be beneficial... say we want to find the products in the products
table that are less than the average price of all the products. The first step is to figure out the average price:
db.execute_sql_query("""
SELECT AVG(price)
FROM products;
""")
Output:
+-------+
| avg |
+-------+
| 3.073 |
+-------+
Now we can use that value to find all the products in the products
table that have a lower price than this average:
db.execute_sql_query("""
SELECT name, price, quantity
FROM products
WHERE price < 3.07;
""")
Output:
+----------+-------+----------+
| name | price | quantity |
+----------+-------+----------+
| banana | 0.79 | 30 |
| corn | 0.89 | 85 |
| broccoli | 2.99 | 22 |
| orange | 0.75 | 87 |
| carrot | 0.39 | 43 |
| tomato | 1.56 | 17 |
+----------+-------+----------+
That approach of two separate SQL queries works, but it's a very manual process that can be simplified by stringing together these two SQL queries:
db.execute_sql_query("""
SELECT name, price, quantity
FROM products
WHERE price < (
SELECT AVG(price)
FROM products
);
""")
The first query (SELECT AVG(price) FROM products
) to find the average price is run first and then that value is used
to print all the products with a price less than this average.
Let's look at another example... if we want to print all the products grown in Oregon and Washington:
db.execute_sql_query("""
SELECT name, price, quantity
FROM products
WHERE supplier_id IN (
SELECT s_id
FROM suppliers
WHERE s_state IN ('Oregon', 'Washington')
);
""")
Output:
+--------------+-------+----------+
| name | price | quantity |
+--------------+-------+----------+
| pumpkin | 8.49 | 12 |
| orange | 0.75 | 87 |
| strawberries | 5.49 | 28 |
| grapes | 4.79 | 35 |
| tomato | 1.56 | 17 |
+--------------+-------+----------+
In this example, the nested query returns a set of IDs, not just a single value, so the outer query needs to use the
IN operator (WHERE supplier_id IN (...nested query...)
.
Summary
Here's a summary of all the SQL statements learned in Part III of this tutorial:
Creating Tables with Relationships
CREATE TABLE
- create a new tableFOREIGN KEY
- create a link between tablesON DELETE SET NULL
- specify that the child records are set to NULL when a linked record is deleted in a parent tableON DELETE CASCADE
- specify that the child records are deleted when a linked record is deleted in a parent table
Joins
JOIN
/INNER JOIN
- returns only the rows where there is a match in bothLEFT JOIN
- returns all rows from the left table and the matched rows from the right tableRIGHT JOIN
- returns all rows from the right table and the matched rows from the left tableFULL OUTER JOIN
- returns all rows when there is a match in either the left or right table
Set Operators
UNION
- combines the results of two or more SELECT statements without duplicatesUNION ALL
- combines the results of two or more SELECT statements with duplicatesINTERSECT
- returns the rows that are in both tablesEXCEPT
- returns all the rows in the left table, except the ones that are in the right table