Patrick's Software Blog

Learn SQL with Python (Part 2)

Title of the blog post around a laptop computer being opened in the dark.

Introduction

This tutorial teaches the fundamentals of SQL (Structured Query Language) by using Python to build applications that interact with a relational database (PostgreSQL).

This blog post is Part 2 of the "Learn SQL with Python" series:

  1. SQL Fundamentals
  2. Filtering, Sorting, and Grouping
  3. Joins and Set Operators

The source code for this tutorial can be found on GitLab: https://gitlab.com/patkennedy79/learn_sql. There are separate folders for each of the 3 parts of this tutorial.

Project Structure

In this tutorial, a PostgreSQL database is being used for storing data about a farmer's market that sells vegetables, fruits, and herbs.

In Part 1 (SQL Fundamentals), the 'part1.py' script contained the following classes:

  • Database - class for interacting with a PostgreSQL database
  • DatabaseContextManager - context manager for interacting with a PostgreSQL database using the Database class
  • Products - class for working with the 'products' table in the PostgreSQL database

These classes are being split into separate files to help organize the application. Here are the files used in Part 2 (Filtering, Sorting, and Grouping) of this tutorial:

  • 'database.py' - contains the Database and DatabaseContextManager classes
  • 'models.py' - contains the Products class
  • 'part2_data.csv' - contains the data to be loaded into the PostgreSQL database
  • 'part2.py' - application created in this tutorial (Part 2)

Database

The 'database.py' file contains two classes interacting with a PostgreSQL database:

  1. Database - class for opening/closing a connection to a PostgreSQL database and for executing SQL commands
  2. DatabaseContextManager - context manager for the Database class to ensure proper opening/closing of the database connection

Here is the source code for the Database class:

import psycopg2
from prettytable import from_db_cursor


class Database:
    """Class for interacting with a PostgreSQL database."""
    def __init__(self, host: str, name: str, user: str, password: str):
        """Initialize the configuration for connecting to a PostgreSQL database."""
        self.database_host = host
        self.database_name = name
        self.database_user = user
        self.database_password = password
        self.db_connection = None
        self.db_cursor = None

    def open_database_connection(self):
        """Connect to the PostgreSQL database."""
        print(f'Connecting to the PostgreSQL database ({self.database_name})...')

        # The `db_connection` object encapsulates a database session.  It is used to:
        #   1. create a new cursor object
        #   2. terminate database transactions using either:
        #      - commit() - success condition where database updates are saved (persist)
        #      - rollback() - failure condition where database updates are discarded
        self.db_connection = psycopg2.connect(
            host=self.database_host,
            database=self.database_name,
            user=self.database_user,
            password=self.database_password
        )

        # The `db_cursor` object allows the following interactions with the database:
        #   1. send SQL commands to the database using `execute()` or `executemany()`
        #   2. retrieve data from the database using `fetchone()`, `fetchmany()`, or `fetchall()`
        self.db_cursor = self.db_connection.cursor()

    def print_database_version(self):
        """Print the database version information."""
        self.db_cursor.execute('SELECT version();')
        print(f'PostgreSQL database version: {self.db_cursor.fetchone()}')

    def close_database_connection(self):
        """Close the database connection."""
        print(f'\nClosing connection to the PostgreSQL database ({self.database_name}).')
        self.db_cursor.close()
        self.db_connection.close()

    def commit(self):
        """Commit the changes to the database to make them persistent."""
        self.db_connection.commit()

    def rollback(self):
        """Discard the changes to the database to revert to the previous state."""
        self.db_connection.rollback()

    def execute_sql_query(self, query: str):
        """Execute a SQL query and print the result."""
        self.db_cursor.execute(query)
        table = from_db_cursor(self.db_cursor)
        print(table)

The execute_sql_query() method will be used extensively in this tutorial to execute SQL queries and print their results in a table format.

The source code for the DatabaseContextManager is here:

class DatabaseContextManager:
    """Context Manager for interacting with a PostgreSQL database."""
    def __init__(self, host: str, name: str, user: str, password: str):
        self.database = Database(host, name, user, password)

    def __enter__(self):
        self.database.open_database_connection()
        return self.database

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.database.commit()
        self.database.close_database_connection()

The context manager should be used to ensure that the database connection is opened and closed properly. For example:

    with DatabaseContextManager(HOST, NAME, USER, PASSWORD) as db:
        db.print_database_version()

Models

The 'models.py' file contains the Products class, which includes the methods for working with the 'products' table in the PostgreSQL database. Here is the source code for the 'models.py' file:

from prettytable import from_db_cursor


# ---------------
# Database Tables
# ---------------

class Products:
    """Class for working with the products table in the PostgreSQL database."""

    def __init__(self, db_cursor):
        self.db_cursor = db_cursor

    def create_table_schema(self):
        """Create the schema for the products table."""

        # Create the schema (i.e. columns) for the products table
        # NOTE: The keyword SERIAL is used in PostgreSQL for an
        #       auto-incrementing primary key.  In other relational
        #       database systems, this would be "PRIMARY KEY AUTO_INCREMENT".
        self.db_cursor.execute("""
            CREATE TABLE products(
                id SERIAL PRIMARY KEY,
                name VARCHAR(255) NOT NULL,
                quantity INT,
                type VARCHAR(50),
                price FLOAT
            );
        """)

    def drop_table(self):
        """Drop the products table."""
        self.db_cursor.execute('DROP TABLE IF EXISTS products;')

    def add_column(self, column_name: str, column_type: str):
        """Add a new column to the products table."""
        self.db_cursor.execute(f"""
            ALTER TABLE products
            ADD {column_name} {column_type};
        """)

    def delete_column(self, column_name: str):
        """Delete a column from the products table."""
        self.db_cursor.execute(f"""
            ALTER TABLE products
            DROP COLUMN {column_name};
        """)

    def add_records_from_file(self, filename: str):
        """Add records by reading from a file."""
        with open(filename) as f:
            lines = [line for line in f]
            for line in lines:
                elements = line.split(',')
                self.add_new_record(elements[0], int(elements[1]), elements[2], float(elements[3]))

    def add_new_record(self, name: str, quantity: int, product_type: str, price_per_unit: float):
        """Add a new record to the products table."""
        self.db_cursor.execute(f"""
            INSERT INTO products(name, quantity, type, price)
            VALUES ('{name}', {quantity}, '{product_type}', {price_per_unit});
        """)

    def change_product_price(self, name: str, new_price_per_unit: float):
        """Change the price per unit of the specified record in the products table."""
        self.db_cursor.execute(f"""
            UPDATE products
            SET price = {new_price_per_unit}
            WHERE name = '{name}';
        """)

    def delete_product(self, index: int):
        """Delete the specified product from in the products table."""
        self.db_cursor.execute(f"""
            DELETE FROM products
            WHERE id = {index};
        """)

    def print_all_records(self):
        """Print all the records in the products table."""
        print('\nproducts:')
        self.db_cursor.execute("""
            SELECT *
            FROM products;
        """)
        table = from_db_cursor(self.db_cursor)
        print(table)

    def print_table_schema(self):
        """Print the schema for the products table."""
        print('\nproducts table schema:')
        self.db_cursor.execute("""
            SELECT column_name, data_type
            FROM information_schema.columns
            WHERE table_name = 'products';
        """)
        table = from_db_cursor(self.db_cursor)
        print(table)

    def get_product_id(self, name: str) -> int:
        """Return the ID of the specified product in the products table."""
        self.db_cursor.execute(f"""
            SELECT id
            FROM products
            WHERE name = '{name}';
        """)
        index = self.db_cursor.fetchone()
        return index[0]  # fetchone() returns a tuple, but only the first element is needed

The only method that is new to this class from Part 1 (SQL Fundamentals) is the add_records_from_file method:

def add_records_from_file(self, filename: str):
    """Add records by reading from a CSV file."""
    with open(filename) as f:
        lines = [line for line in f]
        for line in lines:
            elements = line.split(',')
            self.add_new_record(elements[0], int(elements[1]), elements[2], float(elements[3]))

This method takes the path to a CSV file, reads each line of the CSV file, and adds the data from that row as a new record in the 'products' table.

For this tutorial, the data for the 'products' table is found in the 'part2/part2.csv' file GitLab link.

Here are the first few lines of the file to show how each row corresponds to a record in the 'products' table:

banana,30,fruit,0.79
pumpkin,12,vegetable,8.49
corn,85,vegetable,0.89

Application

Using the DatabaseContextManager and Products classes, the main application that we'll be creating in this tutorial is created in part2/part2.py. Here's the initial version of the file to initialize the database:

from database import DatabaseContextManager
from models import Products


# ----------------------
# Database Configuration
# ----------------------
DATABASE_HOST = '127.0.0.1'
DATABASE_NAME = 'learn-sql'
DATABASE_USER = 'postgres'
DATABASE_PASSWORD = 'tomatoes'


# ------------
# Main Program
# ------------
if __name__ == '__main__':
    with DatabaseContextManager(DATABASE_HOST, DATABASE_NAME, DATABASE_USER, DATABASE_PASSWORD) as db:
        db.print_database_version()

        # Create the model
        products = Products(db.db_cursor)

        # Drop all the tables
        products.drop_table()

        # Create the `products` table schema
        products.create_table_schema()

        # Read all the data from the input file and add to the database
        products.add_records_from_file('part2/part2_data.csv')
        products.print_all_records()

Output:

(venv) $ python part2.py
Connecting to the PostgreSQL database (learn-sql)...
PostgreSQL database version: ('PostgreSQL 16.0 ...',)

products:
+----+--------------------+----------+-----------+-------+
| id |        name        | quantity |    type   | price |
+----+--------------------+----------+-----------+-------+
| 1  |       banana       |    30    |   fruit   |  0.79 |
| 2  |      pumpkin       |    12    | vegetable |  8.49 |
| 3  |        corn        |    85    | vegetable |  0.89 |
...
| 42 |        mint        |    39    |    herb   |  1.45 |
| 43 |        pear        |    12    |   fruit   |  1.09 |
| 44 |      oranges       |    12    |   fruit   |  0.75 |
+----+--------------------+----------+-----------+-------+

Closing connection to the PostgreSQL database (learn-sql).

Throughout the rest of Part 2 (Filtering, Sorting, and Grouping), code will be added to the main application section within the context manager.

Let's dive into how to write SQL queries for this data...

Filtering

WHERE

The WHERE statement is used to filter the results of a query in SQL:

SELECT id
FROM table_name
WHERE column_name = 'specified_column_name';

To see how filtering works, add the following code in part2.py:

# ------------
# Main Program
# ------------

if __name__ == '__main__':
    with DatabaseContextManager(DATABASE_HOST, DATABASE_NAME, DATABASE_USER, DATABASE_PASSWORD) as db:
        ...

        # Print all the herbs in the `products` table
        db.execute_sql_query("""
            SELECT *
            FROM products
            WHERE type = 'herb';
        """)

Output:

+----+----------+----------+------+-------+
| id |   name   | quantity | type | price |
+----+----------+----------+------+-------+
| 38 |  basil   |    45    | herb |  1.99 |
| 39 | rosemary |    33    | herb |  1.59 |
| 40 |  thyme   |    28    | herb |  2.35 |
| 41 | cilantro |    47    | herb |  1.49 |
| 42 |   mint   |    39    | herb |  1.45 |
+----+----------+----------+------+-------+

The following comparison operators are available in SQL:

  • = - equal
  • <> - not equal
  • < - less than
  • > - greater than
  • <= - less than or equal to
  • >= - greater than or equal to

Another example to print all the records that are not vegetables:

# Print all the products that are not vegetables in the `products` table
db.execute_sql_query("""
    SELECT *
    FROM products
    WHERE type <> 'vegetable';
""")

COUNT

The COUNT function returns the number of records returned by a SELECT statement:

SELECT COUNT(*) 
FROM table_name
WHERE condition;

For example, to find the number of records in the products table:

# Find the number of records in the `products` table
db.execute_sql_query("""
    SELECT COUNT(*)
    FROM products;
""")

Output:

+-------+
| count |
+-------+
|   44  |
+-------+

AS

The AS keyword can be used to create an alias, which then gets printed as the heading for the query result. To find the number of vegetables in the products table with a descriptive heading in the output:

# Find the number of vegetables in the `products` table
db.execute_sql_query("""
    SELECT COUNT(*) AS num_of_veggies
    FROM products
    WHERE type = 'vegetable';
""")

Output:

+----------------+
| num_of_veggies |
+----------------+
|       19       |
+----------------+

DISTINCT

The DISTINCT keyword is used to remove duplicate records based on a specific column:

SELECT DISTINCT column_name
FROM table_name
WHERE condition;

DISTINCT can be combined with COUNT to find the count of unique records based on a specific column:

SELECT COUNT(DISTINCT column_name)
FROM table_name
WHERE condition;

The total number of records that are fruits in the products table can be found using:

# Find the total number of fruits in the `products` table
db.execute_sql_query("""
    SELECT COUNT(*) AS num_of_fruits
    FROM products
    WHERE type = 'fruit';
""")

However, if we're interested in knowing the number of unique fruits (by name) in the products table:

# Find the distinct number of fruits in the `products` table
db.execute_sql_query("""
    SELECT COUNT(DISTINCT name) AS num_of_distinct_fruit
    FROM products
    WHERE type = 'fruit';
""")

Output:

+---------------+
| num_of_fruits |
+---------------+
|       20      |
+---------------+
+-----------------------+
| num_of_distinct_fruit |
+-----------------------+
|           18          |
+-----------------------+

There are 2 records that are fruits in the products database that have duplicate names (pear and orange).

To get a list of all the distinct fruit names in the products database:

# Print all the distinct fruits in the `products` table
db.execute_sql_query("""
    SELECT DISTINCT name
    FROM products
    WHERE type = 'fruit';
""")

Output (notice that pear and orange are only included once each!):

+--------------+
|     name     |
+--------------+
|    apple     |
|   avocado    |
|    banana    |
| blackberries |
| blueberries  |
|  cantaloupe  |
|    grapes    |
|     kiwi     |
|    lemon     |
|     lime     |
|    orange    |
|    peach     |
|     pear     |
|  pineapple   |
|     plum     |
| raspberries  |
| strawberries |
|  watermelon  |
+--------------+

LIMIT

The LIMIT keyword is used to limit the number of records in the query result:

SELECT * 
FROM table_name 
WHERE condition
LIMIT maximum_number_of_records;

For example, to find the herbs in the products table, but limit the result to the first 3 records:

# Print all the herbs in the `products` table with a limit of 3 records
db.execute_sql_query("""
    SELECT name, quantity, price
    FROM products
    WHERE type = 'herb'
    LIMIT 3;
""")

Output:

+----------+----------+-------+
|   name   | quantity | price |
+----------+----------+-------+
|  basil   |    45    |  1.99 |
| rosemary |    33    |  1.59 |
|  thyme   |    28    |  2.35 |
+----------+----------+-------+

ADD and OR

The AND and OR keywords are typically used with WHERE for more complex conditional statements:

SELECT *
FROM table_name
WHERE conditional1 AND/OR conditional2;

For example, to print all the vegetables that cost more than $2 in the products table:

# Print all the vegetables that cost more than $2.00
db.execute_sql_query("""
    SELECT name, price
    FROM products
    WHERE type = 'vegetable' AND price > 2.00;
""")

Output:

+-----------+-------+
|    name   | price |
+-----------+-------+
|  pumpkin  |  8.49 |
|  broccoli |  2.99 |
|  eggplant |  3.49 |
| artichoke |  3.58 |
+-----------+-------+

Another example of using OR to print all the vegetables and herbs in the products table:

# Print all the vegetables and herbs (Part I)
db.execute_sql_query("""
    SELECT *
    FROM products
    WHERE type = 'vegetable' OR type = 'herb';
""")

Output:

+----+--------------------+----------+-----------+-------+
| id |        name        | quantity |    type   | price |
+----+--------------------+----------+-----------+-------+
| 2  |      pumpkin       |    12    | vegetable |  8.49 |
| 3  |        corn        |    85    | vegetable |  0.89 |
| 4  |      broccoli      |    22    | vegetable |  2.99 |
| 9  |       carrot       |    43    | vegetable |  0.39 |
| 10 |       tomato       |    17    | vegetable |  1.56 |
...
| 38 |       basil        |    45    |    herb   |  1.99 |
| 39 |      rosemary      |    33    |    herb   |  1.59 |
| 40 |       thyme        |    28    |    herb   |  2.35 |
| 41 |      cilantro      |    47    |    herb   |  1.49 |
| 42 |        mint        |    39    |    herb   |  1.45 |
+----+--------------------+----------+-----------+-------+

IN

The IN keyword is used in a conditional statement to check whether a value matches any value in a list of values:

SELECT *
FROM table_name
WHERE value in (value1, value2, ...);

The previous example to find all the vegetables and herbs (WHERE type = 'vegetable' OR type = 'herb') can be re-written using IN:

# Print all the vegetables and herbs (Part II)
db.execute_sql_query("""
    SELECT *
    FROM products
    WHERE type IN ('vegetable', 'herb');
""")

BETWEEN

The BETWEEN keyword is used in a conditional statement to check if a value is within the specified range (inclusive of the start and end values):

SELECT *
FROM products
WHERE value BETWEEN lower_limit AND upper_limit;

For example, to print all the fruits that have a quantity between 30 and 40 (inclusive):

# Print all the fruits that have a quantity between 30 and 40 (inclusive)
db.execute_sql_query("""
    SELECT *
    FROM products
    WHERE type = 'fruit' AND quantity BETWEEN 30 and 40;
""")

Output:

+----+-------------+----------+-------+-------+
| id |     name    | quantity |  type | price |
+----+-------------+----------+-------+-------+
| 1  |    banana   |    30    | fruit |  0.79 |
| 7  |    grapes   |    35    | fruit |  4.79 |
| 13 |     plum    |    40    | fruit |  0.86 |
| 15 |     lime    |    33    | fruit |  0.75 |
| 21 | raspberries |    30    | fruit |  2.99 |
| 24 |   avocado   |    34    | fruit |  1.79 |
+----+-------------+----------+-------+-------+

LIKE

When working with strings (i.e. VARCHAR) in SQL, the LIKE statement finds strings that match a specified pattern:

SELECT *
FROM table_name
WHERE column_name LIKE "%pattern_";

The two wildcard patterns supported with the LIKE command in PostgreSQL are:

  • % - matches any sequence of zero or more characters
  • _ - matches any single character

For example, to find all the products that contain the word "lettuce" in the products table:

# Print all lettuce products
db.execute_sql_query("""
    SELECT name, price, quantity
    FROM products
    WHERE name LIKE '%lettuce%';
""")

The conditional (WHERE name LIKE '%lettuce%') uses two of the % wildcards to match any string that has zero or more characters before the word "lettuce" and has zero or more characters after the word "lettuce".

Output:

+--------------------+-------+----------+
|        name        | price | quantity |
+--------------------+-------+----------+
|  red leaf lettuce  |  1.49 |    56    |
|  iceberg lettuce   |  1.49 |    67    |
|  romaine lettuce   |  1.39 |    43    |
| butterhead lettuce |  1.89 |    32    |
+--------------------+-------+----------+

To find the products that end with "berries":

# Print all the berries
db.execute_sql_query("""
    SELECT name, price, quantity
    FROM products
    WHERE name LIKE '%berries';
""")

Output:

+--------------+-------+----------+
|     name     | price | quantity |
+--------------+-------+----------+
| strawberries |  5.49 |    28    |
| blackberries |  3.45 |    14    |
| raspberries  |  2.99 |    30    |
| blueberries  |  2.99 |    89    |
+--------------+-------+----------+

To find all the products that start with the letter "b":

# Print all the products that start with 'b'
db.execute_sql_query("""
    SELECT *
    FROM products
    WHERE name LIKE 'b%';
""")

Output:

+----+--------------------+----------+-----------+-------+
| id |        name        | quantity |    type   | price |
+----+--------------------+----------+-----------+-------+
| 1  |       banana       |    30    |   fruit   |  0.79 |
| 4  |      broccoli      |    22    | vegetable |  2.99 |
| 12 |    bell pepper     |    35    | vegetable |  1.79 |
| 20 |    blackberries    |    14    |   fruit   |  3.45 |
| 33 | butterhead lettuce |    32    | vegetable |  1.89 |
| 36 |    blueberries     |    89    |   fruit   |  2.99 |
| 38 |       basil        |    45    |    herb   |  1.99 |
+----+--------------------+----------+-----------+-------+

NOT LIKE

The NOT LIKE statement is the negation of the LIKE command, so it returns the records that do NOT match the pattern:

SELECT *
FROM table_name
WHERE column_name NOT LIKE "%pattern_";

To find the number of products that do not start with the letter "t":

# Print the number of products that do not start with 't'
db.execute_sql_query("""
    SELECT COUNT(*) AS products_not_starting_with_t
    FROM products
    WHERE name NOT LIKE 't%';
""")

Output ("tomato" and "thyme" are not included!):

+------------------------------+
| products_not_starting_with_t |
+------------------------------+
|              42              |
+------------------------------+

AVG

The AVG function calculates the average of a set of data, which should be integers or decimal point numbers:

SELECT AVG(column_name)
FROM table_name;

For example, to find the average quantity of fruits in the products table:

# Print the average quantity of fruits
db.execute_sql_query("""
    SELECT AVG(quantity) AS average_quantity_fruits
    FROM products
    WHERE type = 'fruit';
""")

Output:

+-------------------------+
| average_quantity_fruits |
+-------------------------+
|   42.7000000000000000   |
+-------------------------+

ROUND

The AVG function prints a result with a lot of unnecessary decimal points, which is where the ROUND function helps:

SELECT ROUND(AVG(column_name), decimal_points_to_include)
FROM table_name;

The ROUND function takes an optional argument for the number of decimal points to include when rounding:

  • 0 (default) - do not include any decimal points (i.e. ROUND(42.1234) results in "42")
  • Positive Integer - round to the specified decimal points (i.e. ROUND(42.1234123, 4) results in "42.1234")
  • Negative Integer - round to the specified place value (i.e. ROUND(4321.1234, 3) results in "4200")

Re-writing the last query to use ROUND:

# Print the average quantity of fruits rounded to one decimal place
db.execute_sql_query("""
    SELECT ROUND(AVG(quantity), 1) AS average_quantity_fruits
    FROM products
    WHERE type = 'fruit';
""")

Output:

+-------------------------+
| average_quantity_fruits |
+-------------------------+
|           42.7          |
+-------------------------+

In PostgreSQL, the use of ROUND with float values (such as with prices) is tricky. Here's how to get the average price of the vegetables in the products table:

# Print the average price of vegetables
db.execute_sql_query("""
    SELECT AVG(price) AS average_vegetable_price
    FROM products
    WHERE type = 'vegetable';
""")

Output:

+-------------------------+
| average_vegetable_price |
+-------------------------+
|    1.9905263157894737   |
+-------------------------+

To round the average price of vegetables, the ROUND keyword should be used with an argument of 2 decimal places, but specifying a precision argument requires that the value being rounded is a numeric data type:

# Print the average price of vegetables rounded to the nearest cent
# NOTE: In PostgreSQL, the ROUND statement can only accept a precision
#       when using the `numeric` data type.  Therefore, CAST the average
#       price to a `numeric` before calling ROUND.
db.execute_sql_query("""
    SELECT ROUND( CAST(AVG(price) as numeric), 2 ) AS average_vegetable_price
    FROM products
    WHERE type = 'vegetable';
""")

Output:

+-------------------------+
| average_vegetable_price |
+-------------------------+
|           1.99          |
+-------------------------+

SUM

The SUM keyword is used to add up the specified column values:

SELECT SUM(column_name)
FROM table_name
WHERE conditional;

For example, to find the total quantity of herbs available in the products table:

# Print the total quantity of herbs in stock
db.execute_sql_query("""
    SELECT SUM(quantity) AS total_quantity_herbs
    FROM products
    WHERE type = 'herb';
""")

Output:

+----------------------+
| total_quantity_herbs |
+----------------------+
|         192          |
+----------------------+

MIN and MAX

The MIN and MAX keywords are used to find the minimum and maximum values (respectively) in a set of data:

SELECT MAX(column_name) 
FROM table_name
WHERE conditional;

SELECT MIN(column_name) 
FROM table_name
WHERE conditional;

For example, to find the lowest price of any vegetable in the products table:

# Print the lowest price of any vegetable
db.execute_sql_query("""
    SELECT MIN(price)
    FROM products
    WHERE type = 'vegetable';
""")

Output:

+------+
| min  |
+------+
| 0.35 |
+------+

Similarly, to find the highest price of any fruit in the products table:

# Print the highest price of any fruit
db.execute_sql_query("""
    SELECT MAX(price)
    FROM products
    WHERE type = 'fruit';
""")

Output:

+------+
| max  |
+------+
| 5.49 |
+------+

Sorting

Viewing the results from a query can often be improved by ordering the results in either ascending (lowest to highest) or descending (highest to lowest) order.

ORDER BY

The ORDER BY statement specifies what column should be used to sort the results:

SELECT column_name
FROM table_name 
ORDER BY sort_expression1 [ASC | DESC];

For example, to print the herbs in the products table ordered by price:

# Print the herbs ordered by the price (lowest to highest)
# NOTE: ORDER BY sorts the records in ascending (ASC) order from lowest to highest
db.execute_sql_query("""
    SELECT *
    FROM products
    WHERE type = 'herb'
    ORDER BY price;
""")

Output:

+----+----------+----------+------+-------+
| id |   name   | quantity | type | price |
+----+----------+----------+------+-------+
| 42 |   mint   |    39    | herb |  1.45 |
| 41 | cilantro |    47    | herb |  1.49 |
| 39 | rosemary |    33    | herb |  1.59 |
| 38 |  basil   |    45    | herb |  1.99 |
| 40 |  thyme   |    28    | herb |  2.35 |
+----+----------+----------+------+-------+

ASC vs. DESC

When using ORDER BY, the default ordering is ascending (ASC). The ordering can be changed to descending (DESC):

# Print the herbs ordered by the price (highest to lowest)
db.execute_sql_query("""
    SELECT *
    FROM products
    WHERE type = 'herb'
    ORDER BY price DESC;
""")

Output:

+----+----------+----------+------+-------+
| id |   name   | quantity | type | price |
+----+----------+----------+------+-------+
| 40 |  thyme   |    28    | herb |  2.35 |
| 38 |  basil   |    45    | herb |  1.99 |
| 39 | rosemary |    33    | herb |  1.59 |
| 41 | cilantro |    47    | herb |  1.49 |
| 42 |   mint   |    39    | herb |  1.45 |
+----+----------+----------+------+-------+

Grouping

When executing SQL queries, it can be beneficial to group records together by a specific column to gather new information about your data.

GROUP BY

The GROUP BY statement is used to group records together by a specific column type:

SELECT column_name_1, column_nmae_2, ..., function(column_name_3)
FROM  table_name
GROUP BY column_name_1;

Let's look at some examples to help solidify how grouping records works.

First, here's the SQL query for printing the number of each type of product in the products table:

# Print the number of each type of product in the `products` table
db.execute_sql_query("""
    SELECT type, COUNT(id) AS num_of_products
    FROM products
    GROUP BY type;
""")

Output:

+-----------+-----------------+
|    type   | num_of_products |
+-----------+-----------------+
|   fruit   |        20       |
| vegetable |        19       |
|    herb   |        5        |
+-----------+-----------------+

In the query, the GROUP BY type statement groups all the records based on their type. Since there are 3 types (vegetable, fruit, herb), the COUNT of each type is calculated and output.

To print the average price of each type of product in the products table:

# Print the average price of each type of product in the `products` table
db.execute_sql_query("""
    SELECT type, ROUND(CAST(AVG(price) as numeric), 2) AS avg_price
    FROM products
    GROUP BY type;
""")

Output:

+-----------+-----------+
|    type   | avg_price |
+-----------+-----------+
|   fruit   |    2.12   |
| vegetable |    1.99   |
|    herb   |    1.77   |
+-----------+-----------+

HAVING

The HAVING statement is used to create a conditional statement when using GROUP BY:

SELECT column_name_1, column_nmae_2, ..., function(column_name_3)
FROM  table_name
GROUP BY column_name_1;
HAVING  condition;

For example, to find the count of each type of product that has at least 10 items in the products table:

# Print find the count of each type of product that has at least 10 items in the `products` table
db.execute_sql_query("""
    SELECT type, COUNT(id)
    FROM products
    GROUP BY type
    HAVING COUNT(id) > 10;
""")

Output:

+-----------+-------+
|    type   | count |
+-----------+-------+
|   fruit   |   20  |
| vegetable |   19  |
+-----------+-------+

The herb type is excluded since it only has 5 records (< 10).

Summary

SQL statements learned in this tutorial:

Filtering

  • WHERE - filter the results of a query in SQL
  • COUNT - return the number of records from a SELECT statement
  • AS - create an alias as the heading for the query result
  • DISTINCT - remove duplicate records based on a specific column
  • LIMIT - limit the number of records in the query result
  • AND/OR - used with WHERE for more complex conditional statements
  • IN - used with WHERE to check whether a value is in a list
  • BETWEEN - used with WHERE to check if a value is within the specified range
  • LIKE - used with WHERE to match the specified string pattern
  • NOT LIKE - used with WHERE to NOT match the specified string pattern
  • AVG - calculate the average of a set of data
  • ROUND - round the specified value
  • SUM - add up the specified column values
  • MIN/MAX - find the minimum and maximum values (respectively) in a set of data

Sorting

  • ORDER BY - specifies what column should be used to sort the results
  • ASC/DESC - specifies the ordering as ascending or descending

Grouping

  • GROUP BY - used to group records together by a specific column type
  • HAVING - used to create a conditional statement when using GROUP BY