Introduction
This tutorial teaches the fundamentals of SQL (Structured Query Language) by using Python to build applications that interact with a relational database (PostgreSQL).
This blog post is Part 2 of the "Learn SQL with Python" series:
- SQL Fundamentals
- Filtering, Sorting, and Grouping
- Joins and Set Operators
The source code for this tutorial can be found on GitLab: https://gitlab.com/patkennedy79/learn_sql. There are separate folders for each of the 3 parts of this tutorial.
Project Structure
In this tutorial, a PostgreSQL database is being used for storing data about a farmer's market that sells vegetables, fruits, and herbs.
In Part 1 (SQL Fundamentals), the 'part1.py' script contained the following classes:
Database
- class for interacting with a PostgreSQL databaseDatabaseContextManager
- context manager for interacting with a PostgreSQL database using theDatabase
classProducts
- class for working with the 'products' table in the PostgreSQL database
These classes are being split into separate files to help organize the application. Here are the files used in Part 2 (Filtering, Sorting, and Grouping) of this tutorial:
- 'database.py' - contains the
Database
andDatabaseContextManager
classes - 'models.py' - contains the
Products
class - 'part2_data.csv' - contains the data to be loaded into the PostgreSQL database
- 'part2.py' - application created in this tutorial (Part 2)
Database
The 'database.py' file contains two classes interacting with a PostgreSQL database:
Database
- class for opening/closing a connection to a PostgreSQL database and for executing SQL commandsDatabaseContextManager
- context manager for theDatabase
class to ensure proper opening/closing of the database connection
Here is the source code for the Database
class:
import psycopg2
from prettytable import from_db_cursor
class Database:
"""Class for interacting with a PostgreSQL database."""
def __init__(self, host: str, name: str, user: str, password: str):
"""Initialize the configuration for connecting to a PostgreSQL database."""
self.database_host = host
self.database_name = name
self.database_user = user
self.database_password = password
self.db_connection = None
self.db_cursor = None
def open_database_connection(self):
"""Connect to the PostgreSQL database."""
print(f'Connecting to the PostgreSQL database ({self.database_name})...')
# The `db_connection` object encapsulates a database session. It is used to:
# 1. create a new cursor object
# 2. terminate database transactions using either:
# - commit() - success condition where database updates are saved (persist)
# - rollback() - failure condition where database updates are discarded
self.db_connection = psycopg2.connect(
host=self.database_host,
database=self.database_name,
user=self.database_user,
password=self.database_password
)
# The `db_cursor` object allows the following interactions with the database:
# 1. send SQL commands to the database using `execute()` or `executemany()`
# 2. retrieve data from the database using `fetchone()`, `fetchmany()`, or `fetchall()`
self.db_cursor = self.db_connection.cursor()
def print_database_version(self):
"""Print the database version information."""
self.db_cursor.execute('SELECT version();')
print(f'PostgreSQL database version: {self.db_cursor.fetchone()}')
def close_database_connection(self):
"""Close the database connection."""
print(f'\nClosing connection to the PostgreSQL database ({self.database_name}).')
self.db_cursor.close()
self.db_connection.close()
def commit(self):
"""Commit the changes to the database to make them persistent."""
self.db_connection.commit()
def rollback(self):
"""Discard the changes to the database to revert to the previous state."""
self.db_connection.rollback()
def execute_sql_query(self, query: str):
"""Execute a SQL query and print the result."""
self.db_cursor.execute(query)
table = from_db_cursor(self.db_cursor)
print(table)
The execute_sql_query()
method will be used extensively in this tutorial to execute SQL queries and print their
results in a table format.
The source code for the DatabaseContextManager
is here:
class DatabaseContextManager:
"""Context Manager for interacting with a PostgreSQL database."""
def __init__(self, host: str, name: str, user: str, password: str):
self.database = Database(host, name, user, password)
def __enter__(self):
self.database.open_database_connection()
return self.database
def __exit__(self, exc_type, exc_val, exc_tb):
self.database.commit()
self.database.close_database_connection()
The context manager should be used to ensure that the database connection is opened and closed properly. For example:
with DatabaseContextManager(HOST, NAME, USER, PASSWORD) as db:
db.print_database_version()
Models
The 'models.py' file contains the Products
class, which includes the methods for working with the 'products' table in
the PostgreSQL database. Here is the source code for the 'models.py' file:
from prettytable import from_db_cursor
# ---------------
# Database Tables
# ---------------
class Products:
"""Class for working with the products table in the PostgreSQL database."""
def __init__(self, db_cursor):
self.db_cursor = db_cursor
def create_table_schema(self):
"""Create the schema for the products table."""
# Create the schema (i.e. columns) for the products table
# NOTE: The keyword SERIAL is used in PostgreSQL for an
# auto-incrementing primary key. In other relational
# database systems, this would be "PRIMARY KEY AUTO_INCREMENT".
self.db_cursor.execute("""
CREATE TABLE products(
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
quantity INT,
type VARCHAR(50),
price FLOAT
);
""")
def drop_table(self):
"""Drop the products table."""
self.db_cursor.execute('DROP TABLE IF EXISTS products;')
def add_column(self, column_name: str, column_type: str):
"""Add a new column to the products table."""
self.db_cursor.execute(f"""
ALTER TABLE products
ADD {column_name} {column_type};
""")
def delete_column(self, column_name: str):
"""Delete a column from the products table."""
self.db_cursor.execute(f"""
ALTER TABLE products
DROP COLUMN {column_name};
""")
def add_records_from_file(self, filename: str):
"""Add records by reading from a file."""
with open(filename) as f:
lines = [line for line in f]
for line in lines:
elements = line.split(',')
self.add_new_record(elements[0], int(elements[1]), elements[2], float(elements[3]))
def add_new_record(self, name: str, quantity: int, product_type: str, price_per_unit: float):
"""Add a new record to the products table."""
self.db_cursor.execute(f"""
INSERT INTO products(name, quantity, type, price)
VALUES ('{name}', {quantity}, '{product_type}', {price_per_unit});
""")
def change_product_price(self, name: str, new_price_per_unit: float):
"""Change the price per unit of the specified record in the products table."""
self.db_cursor.execute(f"""
UPDATE products
SET price = {new_price_per_unit}
WHERE name = '{name}';
""")
def delete_product(self, index: int):
"""Delete the specified product from in the products table."""
self.db_cursor.execute(f"""
DELETE FROM products
WHERE id = {index};
""")
def print_all_records(self):
"""Print all the records in the products table."""
print('\nproducts:')
self.db_cursor.execute("""
SELECT *
FROM products;
""")
table = from_db_cursor(self.db_cursor)
print(table)
def print_table_schema(self):
"""Print the schema for the products table."""
print('\nproducts table schema:')
self.db_cursor.execute("""
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'products';
""")
table = from_db_cursor(self.db_cursor)
print(table)
def get_product_id(self, name: str) -> int:
"""Return the ID of the specified product in the products table."""
self.db_cursor.execute(f"""
SELECT id
FROM products
WHERE name = '{name}';
""")
index = self.db_cursor.fetchone()
return index[0] # fetchone() returns a tuple, but only the first element is needed
The only method that is new to this class from Part 1 (SQL Fundamentals) is the add_records_from_file
method:
def add_records_from_file(self, filename: str):
"""Add records by reading from a CSV file."""
with open(filename) as f:
lines = [line for line in f]
for line in lines:
elements = line.split(',')
self.add_new_record(elements[0], int(elements[1]), elements[2], float(elements[3]))
This method takes the path to a CSV file, reads each line of the CSV file, and adds the data from that row as a new record in the 'products' table.
For this tutorial, the data for the 'products' table is found in the 'part2/part2.csv' file GitLab link.
Here are the first few lines of the file to show how each row corresponds to a record in the 'products' table:
banana,30,fruit,0.79
pumpkin,12,vegetable,8.49
corn,85,vegetable,0.89
Application
Using the DatabaseContextManager
and Products
classes, the main application that we'll be creating in this tutorial
is created in part2/part2.py
. Here's the initial version of the file to initialize the database:
from database import DatabaseContextManager
from models import Products
# ----------------------
# Database Configuration
# ----------------------
DATABASE_HOST = '127.0.0.1'
DATABASE_NAME = 'learn-sql'
DATABASE_USER = 'postgres'
DATABASE_PASSWORD = 'tomatoes'
# ------------
# Main Program
# ------------
if __name__ == '__main__':
with DatabaseContextManager(DATABASE_HOST, DATABASE_NAME, DATABASE_USER, DATABASE_PASSWORD) as db:
db.print_database_version()
# Create the model
products = Products(db.db_cursor)
# Drop all the tables
products.drop_table()
# Create the `products` table schema
products.create_table_schema()
# Read all the data from the input file and add to the database
products.add_records_from_file('part2/part2_data.csv')
products.print_all_records()
Output:
(venv) $ python part2.py
Connecting to the PostgreSQL database (learn-sql)...
PostgreSQL database version: ('PostgreSQL 16.0 ...',)
products:
+----+--------------------+----------+-----------+-------+
| id | name | quantity | type | price |
+----+--------------------+----------+-----------+-------+
| 1 | banana | 30 | fruit | 0.79 |
| 2 | pumpkin | 12 | vegetable | 8.49 |
| 3 | corn | 85 | vegetable | 0.89 |
...
| 42 | mint | 39 | herb | 1.45 |
| 43 | pear | 12 | fruit | 1.09 |
| 44 | oranges | 12 | fruit | 0.75 |
+----+--------------------+----------+-----------+-------+
Closing connection to the PostgreSQL database (learn-sql).
Throughout the rest of Part 2 (Filtering, Sorting, and Grouping), code will be added to the main application section within the context manager.
Let's dive into how to write SQL queries for this data...
Filtering
WHERE
The WHERE
statement is used to filter the results of a query in SQL:
SELECT id
FROM table_name
WHERE column_name = 'specified_column_name';
To see how filtering works, add the following code in part2.py:
# ------------
# Main Program
# ------------
if __name__ == '__main__':
with DatabaseContextManager(DATABASE_HOST, DATABASE_NAME, DATABASE_USER, DATABASE_PASSWORD) as db:
...
# Print all the herbs in the `products` table
db.execute_sql_query("""
SELECT *
FROM products
WHERE type = 'herb';
""")
Output:
+----+----------+----------+------+-------+
| id | name | quantity | type | price |
+----+----------+----------+------+-------+
| 38 | basil | 45 | herb | 1.99 |
| 39 | rosemary | 33 | herb | 1.59 |
| 40 | thyme | 28 | herb | 2.35 |
| 41 | cilantro | 47 | herb | 1.49 |
| 42 | mint | 39 | herb | 1.45 |
+----+----------+----------+------+-------+
The following comparison operators are available in SQL:
=
- equal<>
- not equal<
- less than>
- greater than<=
- less than or equal to>=
- greater than or equal to
Another example to print all the records that are not vegetables:
# Print all the products that are not vegetables in the `products` table
db.execute_sql_query("""
SELECT *
FROM products
WHERE type <> 'vegetable';
""")
COUNT
The COUNT
function returns the number of records returned by a SELECT
statement:
SELECT COUNT(*)
FROM table_name
WHERE condition;
For example, to find the number of records in the products
table:
# Find the number of records in the `products` table
db.execute_sql_query("""
SELECT COUNT(*)
FROM products;
""")
Output:
+-------+
| count |
+-------+
| 44 |
+-------+
AS
The AS
keyword can be used to create an alias, which then gets printed as the heading for the query result. To find
the number of vegetables in the products
table with a descriptive heading in the output:
# Find the number of vegetables in the `products` table
db.execute_sql_query("""
SELECT COUNT(*) AS num_of_veggies
FROM products
WHERE type = 'vegetable';
""")
Output:
+----------------+
| num_of_veggies |
+----------------+
| 19 |
+----------------+
DISTINCT
The DISTINCT
keyword is used to remove duplicate records based on a specific column:
SELECT DISTINCT column_name
FROM table_name
WHERE condition;
DISTINCT
can be combined with COUNT
to find the count of unique records based on a specific column:
SELECT COUNT(DISTINCT column_name)
FROM table_name
WHERE condition;
The total number of records that are fruits in the products
table can be found using:
# Find the total number of fruits in the `products` table
db.execute_sql_query("""
SELECT COUNT(*) AS num_of_fruits
FROM products
WHERE type = 'fruit';
""")
However, if we're interested in knowing the number of unique fruits (by name) in the products
table:
# Find the distinct number of fruits in the `products` table
db.execute_sql_query("""
SELECT COUNT(DISTINCT name) AS num_of_distinct_fruit
FROM products
WHERE type = 'fruit';
""")
Output:
+---------------+
| num_of_fruits |
+---------------+
| 20 |
+---------------+
+-----------------------+
| num_of_distinct_fruit |
+-----------------------+
| 18 |
+-----------------------+
There are 2 records that are fruits in the products
database that have duplicate names (pear and orange).
To get a list of all the distinct fruit names in the products
database:
# Print all the distinct fruits in the `products` table
db.execute_sql_query("""
SELECT DISTINCT name
FROM products
WHERE type = 'fruit';
""")
Output (notice that pear and orange are only included once each!):
+--------------+
| name |
+--------------+
| apple |
| avocado |
| banana |
| blackberries |
| blueberries |
| cantaloupe |
| grapes |
| kiwi |
| lemon |
| lime |
| orange |
| peach |
| pear |
| pineapple |
| plum |
| raspberries |
| strawberries |
| watermelon |
+--------------+
LIMIT
The LIMIT
keyword is used to limit the number of records in the query result:
SELECT *
FROM table_name
WHERE condition
LIMIT maximum_number_of_records;
For example, to find the herbs in the products
table, but limit the result to the first 3 records:
# Print all the herbs in the `products` table with a limit of 3 records
db.execute_sql_query("""
SELECT name, quantity, price
FROM products
WHERE type = 'herb'
LIMIT 3;
""")
Output:
+----------+----------+-------+
| name | quantity | price |
+----------+----------+-------+
| basil | 45 | 1.99 |
| rosemary | 33 | 1.59 |
| thyme | 28 | 2.35 |
+----------+----------+-------+
ADD and OR
The AND
and OR
keywords are typically used with WHERE
for more complex conditional statements:
SELECT *
FROM table_name
WHERE conditional1 AND/OR conditional2;
For example, to print all the vegetables that cost more than $2 in the products
table:
# Print all the vegetables that cost more than $2.00
db.execute_sql_query("""
SELECT name, price
FROM products
WHERE type = 'vegetable' AND price > 2.00;
""")
Output:
+-----------+-------+
| name | price |
+-----------+-------+
| pumpkin | 8.49 |
| broccoli | 2.99 |
| eggplant | 3.49 |
| artichoke | 3.58 |
+-----------+-------+
Another example of using OR
to print all the vegetables and herbs in the products
table:
# Print all the vegetables and herbs (Part I)
db.execute_sql_query("""
SELECT *
FROM products
WHERE type = 'vegetable' OR type = 'herb';
""")
Output:
+----+--------------------+----------+-----------+-------+
| id | name | quantity | type | price |
+----+--------------------+----------+-----------+-------+
| 2 | pumpkin | 12 | vegetable | 8.49 |
| 3 | corn | 85 | vegetable | 0.89 |
| 4 | broccoli | 22 | vegetable | 2.99 |
| 9 | carrot | 43 | vegetable | 0.39 |
| 10 | tomato | 17 | vegetable | 1.56 |
...
| 38 | basil | 45 | herb | 1.99 |
| 39 | rosemary | 33 | herb | 1.59 |
| 40 | thyme | 28 | herb | 2.35 |
| 41 | cilantro | 47 | herb | 1.49 |
| 42 | mint | 39 | herb | 1.45 |
+----+--------------------+----------+-----------+-------+
IN
The IN
keyword is used in a conditional statement to check whether a value matches any value in a list of values:
SELECT *
FROM table_name
WHERE value in (value1, value2, ...);
The previous example to find all the vegetables and herbs (WHERE type = 'vegetable' OR type = 'herb'
) can be
re-written using IN
:
# Print all the vegetables and herbs (Part II)
db.execute_sql_query("""
SELECT *
FROM products
WHERE type IN ('vegetable', 'herb');
""")
BETWEEN
The BETWEEN
keyword is used in a conditional statement to check if a value is within the specified range (inclusive
of the start and end values):
SELECT *
FROM products
WHERE value BETWEEN lower_limit AND upper_limit;
For example, to print all the fruits that have a quantity between 30 and 40 (inclusive):
# Print all the fruits that have a quantity between 30 and 40 (inclusive)
db.execute_sql_query("""
SELECT *
FROM products
WHERE type = 'fruit' AND quantity BETWEEN 30 and 40;
""")
Output:
+----+-------------+----------+-------+-------+
| id | name | quantity | type | price |
+----+-------------+----------+-------+-------+
| 1 | banana | 30 | fruit | 0.79 |
| 7 | grapes | 35 | fruit | 4.79 |
| 13 | plum | 40 | fruit | 0.86 |
| 15 | lime | 33 | fruit | 0.75 |
| 21 | raspberries | 30 | fruit | 2.99 |
| 24 | avocado | 34 | fruit | 1.79 |
+----+-------------+----------+-------+-------+
LIKE
When working with strings (i.e. VARCHAR) in SQL, the LIKE
statement finds strings that match a specified pattern:
SELECT *
FROM table_name
WHERE column_name LIKE "%pattern_";
The two wildcard patterns supported with the LIKE
command in PostgreSQL are:
%
- matches any sequence of zero or more characters_
- matches any single character
For example, to find all the products that contain the word "lettuce" in the products
table:
# Print all lettuce products
db.execute_sql_query("""
SELECT name, price, quantity
FROM products
WHERE name LIKE '%lettuce%';
""")
The conditional (WHERE name LIKE '%lettuce%'
) uses two of the %
wildcards to match any string that has zero or more
characters before the word "lettuce" and has zero or more characters after the word "lettuce".
Output:
+--------------------+-------+----------+
| name | price | quantity |
+--------------------+-------+----------+
| red leaf lettuce | 1.49 | 56 |
| iceberg lettuce | 1.49 | 67 |
| romaine lettuce | 1.39 | 43 |
| butterhead lettuce | 1.89 | 32 |
+--------------------+-------+----------+
To find the products that end with "berries":
# Print all the berries
db.execute_sql_query("""
SELECT name, price, quantity
FROM products
WHERE name LIKE '%berries';
""")
Output:
+--------------+-------+----------+
| name | price | quantity |
+--------------+-------+----------+
| strawberries | 5.49 | 28 |
| blackberries | 3.45 | 14 |
| raspberries | 2.99 | 30 |
| blueberries | 2.99 | 89 |
+--------------+-------+----------+
To find all the products that start with the letter "b":
# Print all the products that start with 'b'
db.execute_sql_query("""
SELECT *
FROM products
WHERE name LIKE 'b%';
""")
Output:
+----+--------------------+----------+-----------+-------+
| id | name | quantity | type | price |
+----+--------------------+----------+-----------+-------+
| 1 | banana | 30 | fruit | 0.79 |
| 4 | broccoli | 22 | vegetable | 2.99 |
| 12 | bell pepper | 35 | vegetable | 1.79 |
| 20 | blackberries | 14 | fruit | 3.45 |
| 33 | butterhead lettuce | 32 | vegetable | 1.89 |
| 36 | blueberries | 89 | fruit | 2.99 |
| 38 | basil | 45 | herb | 1.99 |
+----+--------------------+----------+-----------+-------+
NOT LIKE
The NOT LIKE
statement is the negation of the LIKE
command, so it returns the records that do NOT match the pattern:
SELECT *
FROM table_name
WHERE column_name NOT LIKE "%pattern_";
To find the number of products that do not start with the letter "t":
# Print the number of products that do not start with 't'
db.execute_sql_query("""
SELECT COUNT(*) AS products_not_starting_with_t
FROM products
WHERE name NOT LIKE 't%';
""")
Output ("tomato" and "thyme" are not included!):
+------------------------------+
| products_not_starting_with_t |
+------------------------------+
| 42 |
+------------------------------+
AVG
The AVG
function calculates the average of a set of data, which should be integers or decimal point numbers:
SELECT AVG(column_name)
FROM table_name;
For example, to find the average quantity of fruits in the products
table:
# Print the average quantity of fruits
db.execute_sql_query("""
SELECT AVG(quantity) AS average_quantity_fruits
FROM products
WHERE type = 'fruit';
""")
Output:
+-------------------------+
| average_quantity_fruits |
+-------------------------+
| 42.7000000000000000 |
+-------------------------+
ROUND
The AVG
function prints a result with a lot of unnecessary decimal points, which is where the ROUND
function helps:
SELECT ROUND(AVG(column_name), decimal_points_to_include)
FROM table_name;
The ROUND
function takes an optional argument for the number of decimal points to include when rounding:
- 0 (default) - do not include any decimal points (i.e.
ROUND(42.1234)
results in "42") - Positive Integer - round to the specified decimal points (i.e.
ROUND(42.1234123, 4)
results in "42.1234") - Negative Integer - round to the specified place value (i.e.
ROUND(4321.1234, 3)
results in "4200")
Re-writing the last query to use ROUND
:
# Print the average quantity of fruits rounded to one decimal place
db.execute_sql_query("""
SELECT ROUND(AVG(quantity), 1) AS average_quantity_fruits
FROM products
WHERE type = 'fruit';
""")
Output:
+-------------------------+
| average_quantity_fruits |
+-------------------------+
| 42.7 |
+-------------------------+
In PostgreSQL, the use of ROUND
with float values (such as with prices) is tricky. Here's how to get the average
price of the vegetables in the products
table:
# Print the average price of vegetables
db.execute_sql_query("""
SELECT AVG(price) AS average_vegetable_price
FROM products
WHERE type = 'vegetable';
""")
Output:
+-------------------------+
| average_vegetable_price |
+-------------------------+
| 1.9905263157894737 |
+-------------------------+
To round the average price of vegetables, the ROUND
keyword should be used with an argument of 2 decimal places,
but specifying a precision argument requires that the value being rounded is a numeric
data type:
# Print the average price of vegetables rounded to the nearest cent
# NOTE: In PostgreSQL, the ROUND statement can only accept a precision
# when using the `numeric` data type. Therefore, CAST the average
# price to a `numeric` before calling ROUND.
db.execute_sql_query("""
SELECT ROUND( CAST(AVG(price) as numeric), 2 ) AS average_vegetable_price
FROM products
WHERE type = 'vegetable';
""")
Output:
+-------------------------+
| average_vegetable_price |
+-------------------------+
| 1.99 |
+-------------------------+
SUM
The SUM
keyword is used to add up the specified column values:
SELECT SUM(column_name)
FROM table_name
WHERE conditional;
For example, to find the total quantity of herbs available in the products
table:
# Print the total quantity of herbs in stock
db.execute_sql_query("""
SELECT SUM(quantity) AS total_quantity_herbs
FROM products
WHERE type = 'herb';
""")
Output:
+----------------------+
| total_quantity_herbs |
+----------------------+
| 192 |
+----------------------+
MIN and MAX
The MIN
and MAX
keywords are used to find the minimum and maximum values (respectively) in a set of data:
SELECT MAX(column_name)
FROM table_name
WHERE conditional;
SELECT MIN(column_name)
FROM table_name
WHERE conditional;
For example, to find the lowest price of any vegetable in the products
table:
# Print the lowest price of any vegetable
db.execute_sql_query("""
SELECT MIN(price)
FROM products
WHERE type = 'vegetable';
""")
Output:
+------+
| min |
+------+
| 0.35 |
+------+
Similarly, to find the highest price of any fruit in the products
table:
# Print the highest price of any fruit
db.execute_sql_query("""
SELECT MAX(price)
FROM products
WHERE type = 'fruit';
""")
Output:
+------+
| max |
+------+
| 5.49 |
+------+
Sorting
Viewing the results from a query can often be improved by ordering the results in either ascending (lowest to highest) or descending (highest to lowest) order.
ORDER BY
The ORDER BY
statement specifies what column should be used to sort the results:
SELECT column_name
FROM table_name
ORDER BY sort_expression1 [ASC | DESC];
For example, to print the herbs in the products
table ordered by price:
# Print the herbs ordered by the price (lowest to highest)
# NOTE: ORDER BY sorts the records in ascending (ASC) order from lowest to highest
db.execute_sql_query("""
SELECT *
FROM products
WHERE type = 'herb'
ORDER BY price;
""")
Output:
+----+----------+----------+------+-------+
| id | name | quantity | type | price |
+----+----------+----------+------+-------+
| 42 | mint | 39 | herb | 1.45 |
| 41 | cilantro | 47 | herb | 1.49 |
| 39 | rosemary | 33 | herb | 1.59 |
| 38 | basil | 45 | herb | 1.99 |
| 40 | thyme | 28 | herb | 2.35 |
+----+----------+----------+------+-------+
ASC vs. DESC
When using ORDER BY
, the default ordering is ascending (ASC
). The ordering can be changed to descending (DESC
):
# Print the herbs ordered by the price (highest to lowest)
db.execute_sql_query("""
SELECT *
FROM products
WHERE type = 'herb'
ORDER BY price DESC;
""")
Output:
+----+----------+----------+------+-------+
| id | name | quantity | type | price |
+----+----------+----------+------+-------+
| 40 | thyme | 28 | herb | 2.35 |
| 38 | basil | 45 | herb | 1.99 |
| 39 | rosemary | 33 | herb | 1.59 |
| 41 | cilantro | 47 | herb | 1.49 |
| 42 | mint | 39 | herb | 1.45 |
+----+----------+----------+------+-------+
Grouping
When executing SQL queries, it can be beneficial to group records together by a specific column to gather new information about your data.
GROUP BY
The GROUP BY
statement is used to group records together by a specific column type:
SELECT column_name_1, column_nmae_2, ..., function(column_name_3)
FROM table_name
GROUP BY column_name_1;
Let's look at some examples to help solidify how grouping records works.
First, here's the SQL query for printing the number of each type of product in the products
table:
# Print the number of each type of product in the `products` table
db.execute_sql_query("""
SELECT type, COUNT(id) AS num_of_products
FROM products
GROUP BY type;
""")
Output:
+-----------+-----------------+
| type | num_of_products |
+-----------+-----------------+
| fruit | 20 |
| vegetable | 19 |
| herb | 5 |
+-----------+-----------------+
In the query, the GROUP BY type
statement groups all the records based on their type. Since there are 3 types
(vegetable, fruit, herb), the COUNT
of each type is calculated and output.
To print the average price of each type of product in the products
table:
# Print the average price of each type of product in the `products` table
db.execute_sql_query("""
SELECT type, ROUND(CAST(AVG(price) as numeric), 2) AS avg_price
FROM products
GROUP BY type;
""")
Output:
+-----------+-----------+
| type | avg_price |
+-----------+-----------+
| fruit | 2.12 |
| vegetable | 1.99 |
| herb | 1.77 |
+-----------+-----------+
HAVING
The HAVING
statement is used to create a conditional statement when using GROUP BY
:
SELECT column_name_1, column_nmae_2, ..., function(column_name_3)
FROM table_name
GROUP BY column_name_1;
HAVING condition;
For example, to find the count of each type of product that has at least 10 items in the products
table:
# Print find the count of each type of product that has at least 10 items in the `products` table
db.execute_sql_query("""
SELECT type, COUNT(id)
FROM products
GROUP BY type
HAVING COUNT(id) > 10;
""")
Output:
+-----------+-------+
| type | count |
+-----------+-------+
| fruit | 20 |
| vegetable | 19 |
+-----------+-------+
The herb
type is excluded since it only has 5 records (< 10).
Summary
SQL statements learned in this tutorial:
Filtering
WHERE
- filter the results of a query in SQLCOUNT
- return the number of records from aSELECT
statementAS
- create an alias as the heading for the query resultDISTINCT
- remove duplicate records based on a specific columnLIMIT
- limit the number of records in the query resultAND
/OR
- used withWHERE
for more complex conditional statementsIN
- used withWHERE
to check whether a value is in a listBETWEEN
- used withWHERE
to check if a value is within the specified rangeLIKE
- used withWHERE
to match the specified string patternNOT LIKE
- used withWHERE
to NOT match the specified string patternAVG
- calculate the average of a set of dataROUND
- round the specified valueSUM
- add up the specified column valuesMIN
/MAX
- find the minimum and maximum values (respectively) in a set of data
Sorting
ORDER BY
- specifies what column should be used to sort the resultsASC
/DESC
- specifies the ordering as ascending or descending
Grouping
GROUP BY
- used to group records together by a specific column typeHAVING
- used to create a conditional statement when usingGROUP BY