Introduction to DBMS

A Database Management System (DBMS) is software that provides an interface for users to interact with databases. It enables the management, storage, and manipulation of data within databases. DBMS can be broadly classified into two main types: Relational DBMS and NoSQL DBMS.

In a Relational DBMS, data is stored in structured tables and is manipulated using SQL (Structured Query Language). Examples of relational DBMS inlcude MySQL, Oracle, and PostgreSQL.

In a NoSQL DBMS, data is typically unstructured or semi-structured, offering flexibility to handle diverse data types. Examples of NoSQL DBMS include MongoDB and Redis.


MySQL

MySQL is a relational database management system (RDBMS) that uses Structured Query Language (SQL) to interact with databases. It stores data in tables made up of rows and columns, allowing for efficient organization, retrival, and manipulation of data.


SQL Commands

SQL Commands are used to interact with relational databases. These commands can be classified into five types: DDL, DQL, DML, DCL, and TCL.

SQL Commands
  • DDL (Data Definition Language): Used to define and manage the structure of database instances, such as creating, altering, and deleting tables, and defining constraints.
  • DQL (Data Query Language): Used to retrive data from the database, primarily through the SELECT statement.
  • DML (Data Manipulation Language): Used to manipulate data within tables, such as inserting, updating, and deleting records.
  • DCL (Data Control Language): Used to control access to data, manage user priviledges, and ensure data security. It includes commands like GRANT and REVOKE.
  • TCL (Transaction Control Language): Used to manage transactions, ensuring data consistency. It includes commands like COMMIT, ROLLBACK, and SAVEPOINT.

SQL Query Execution Order

MySQL Execution Order: In MySQL, the execution order of an SQL query differs from the order in which components (e.g., SELECT, FROM, WHERE) are written in the query. MySQL begins by identifying the tables or views specified in the FROM clause. If there are any JOIN operation, MySQL performs the joins first and applies the ON condition. After completing the join operations, MySQL filters the based on the conditions specified in the WHERE clause.

If a GROUP BY clause is present, MySQL groups the rows based on the specified column(s) and then executes any aggregation functions (e.g., SUM(), COUNT(), etc.). Following the grouping, if a HAVING clause is provided, MySQL further filters the grouped results based on the specified condition.

After filtering grouped results, MySQL selects the specified columns in the SELECT clause. If the DISTINCT keyword is used, MySQL removes duplicate rows from the selected result. Once the rows are selected and duplicates are removed, MySQL sorts the result set according to the rules defined in the ORDER BY clause.

Finally, if a LIMIT or OFFSET clause is included, MySQL limits the number of rows returned, starting from the specified OFFSET.

SQL Query Execution Order

Select

Find Customer Referee

SHOW PROBLEM

Table: Customer

Column Name Type
id int
name varchar
referee_id int
  • The id column is the primary key.
  • Each row represents a customer, their name, and the id of the customer who referred them. If a customer wasn’t referred by anyone, the referee_id is NULL.

Problem:

Find the names of customers who were not referred by the customer with id = 2.

The result can be returned in any order.


Example:

Input: Customer table:

id name referee_id
1 Will NULL
2 Jane NULL
3 Alex 2
4 Bill NULL
5 Zack 1
6 Mark 2

Output:

name
Will
Jane
Bill
Zack

Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT name FROM Customer
WHERE referee_id != 2 OR referee_id IS NULL;

SHOW NOTES

In SQL, NULL is not equal to any value, including 2. Therefore, the condition referee_id != 2 will exclude records where referee_id is NULL, because comparisons with NULL always result in an unknown NULL value, which is neither TRUE nor FALSE. It is crucial to consider the handling of NULL values when working with databases.


Article Views I

SHOW PROBLEM

Table: Views

Column Name Type
article_id int
author_id int
viewer_id int
view_date date
  • There is no primary key in this table, and the table may contain duplicate rows.
  • Each row indicates that a viewer viewed an article, where the article is written by an author, on a specific date.
  • If author_id and viewer_id are the same, it indicates that the author viewed their own article.

Problem:

Write a query to find the IDs of authors who have viewed at least one of their own articles.

Return the result table sorted by author_id in ascending order.


Example:

Input:

Views table:

article_id author_id viewer_id view_date
1 3 5 2019-08-01
1 3 6 2019-08-02
2 7 7 2019-08-01
2 7 6 2019-08-02
4 7 1 2019-07-22
3 4 4 2019-07-21
3 4 4 2019-07-21

Output:

id
4
7

Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT DISTINCT author_id AS id FROM Views
WHERE author_id = viewer_id
ORDER BY id ASC;

SHOW NOTES

In SQL, the DISTINCT keyword is used to remove duplicate records from the result set, ensuring that only unique values are returned. It is commonly used in scenarios such as aggregating distinct values and ensuring uniqueness in join queries.

# Aggregating unique values
SELECT DISTINCT product_id
FROM Order_Items;

# Ensuring unique results in join queries
SELECT DISTINCT c.customer_id, c.name
FROM Customers c
JOIN Orders o ON c.customer_id = o.customer_id;


Baisc Joins

LEFT JOIN: Replace Employee ID With The Unique Identifier

In MySQL, LEFT JOIN is used to combine rows from two or more tables based on a related column between them. It returns all rows from the left table and the matching rows from the right table. If there is no match, the result will include NULL values for the column from the right table.

SHOW PROBLEM

Tables:

Employees

Column Name Type
id int
name varchar
  • id is the primary key for this table.
  • Each row represents an employee in the company, containing their id and name.

EmployeeUNI

Column Name Type
id int
unique_id int
  • (id, unique_id) is the primary key, meaning the combination of id and unique_id is unique across the table.
  • Each row represents an employee’s id and their corresponding unique_id in the company.

Task:

Write a SQL query that returns the unique_id of each employee from the Employees table. If an employee does not have a corresponding unique_id, return NULL for that employee.

Query Requirements:

  1. If an employee has a unique_id, show it.
  2. If an employee does not have a unique_id, return NULL.
  3. The result should display the unique_id and the name of each employee.

The results can be returned in any order.


Example:

Input:

Employees table:

id name
1 Alice
7 Bob
11 Meir
90 Winston
3 Jonathan

EmployeeUNI table:

id unique_id
3 1
11 2
90 3

Output:

unique_id name
NULL Alice
NULL Bob
2 Meir
3 Winston
1 Jonathan

Explanation:

  • Alice and Bob do not have a unique_id, so NULL is shown for both.
  • Meir has a unique_id of 2.
  • Winston has a unique_id of 3.
  • Jonathan has a unique_id of 1.

Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT euni.unique_id, e.name
FROM Employees e
LEFT JOIN EmployeeUNI euni
ON e.id = euni.id;

LEFT JOIN: Confirmation Rate

SHOW PROBLEM

Table: Signups

Column Name Type
user_id int
time_stamp datetime
  • user_id is a unique identifier for each user.
  • Each row represents the signup time for the user identified by user_id.

Table: Confirmations

Column Name Type
user_id int
time_stamp datetime
action ENUM
  • The primary key is a combination of user_id and time_stamp.
  • user_id is a foreign key referencing the Signups table.
  • action is an ENUM with two possible values: 'confirmed' or 'timeout'.
  • Each row represents a confirmation request made by the user with user_id, indicating whether the confirmation message was confirmed ('confirmed') or timed out ('timeout').

Problem Description

The confirmation rate of a user is calculated as the ratio of 'confirmed' actions to the total number of confirmation requests. If a user has not made any confirmation requests, the confirmation rate is 0. The rate should be rounded to two decimal places.

Goal

Write a SQL query to find the confirmation rate for each user.

Return the result in any order.


Input

Signups table:

user_id time_stamp
3 2020-03-21 10:16:13
7 2020-01-04 13:57:59
2 2020-07-29 23:09:44
6 2020-12-09 10:39:37

Confirmations table:

user_id time_stamp action
3 2021-01-06 03:30:46 timeout
3 2021-07-14 14:00:00 timeout
7 2021-06-12 11:57:29 confirmed
7 2021-06-13 12:58:28 confirmed
7 2021-06-14 13:59:27 confirmed
2 2021-01-22 00:00:00 confirmed
2 2021-02-28 23:59:59 timeout

Output

user_id confirmation_rate
6 0.00
3 0.00
7 1.00
2 0.50

Explanation:

  • User 6 did not request any confirmation messages, so their confirmation rate is 0.
  • User 3 made two confirmation requests, but both timed out, so their confirmation rate is 0.
  • User 7 made three requests, and all were confirmed, so their confirmation rate is 1.00.
  • User 2 made two requests: one confirmed and one timed out, so their confirmation rate is 1/2 = 0.50.

Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT
    user_id,
    ROUND(IFNULL(SUM(action = 'confirmed') / COUNT(1), 0), 2) AS confirmation_rate
FROM
    SignUps
    LEFT JOIN Confirmations USING (user_id)
GROUP BY 1;

SELECT s.user_id, 
    CASE WHEN c.user_id IS NULL THEN 0.00 
        ELSE ROUND(SUM(CASE WHEN action = 'confirmed' THEN 1 ELSE 0 END) / COUNT(*),2) 
    END 
    AS confirmation_rate
FROM Signups s
LEFT JOIN Confirmations c ON c.user_id = s.user_id
GROUP BY s.user_id

SHOW NOTES
  • SUM(action = 'confirmed'): action = 'confirmed' returns 1 for confirmed actions and 0 for 'timeout' action.
  • COUNT(1): Counts the total number of rows. (Use COUNT(*) for counting rows is more clear and idiomatic).
  • LEFT JOIN Confirmations USING (user_id): Joins the Signups table with the Confirmations table on the user_id column.
  • GROUP BY 1: A shorthand for GROUP BY user_id. It groups the results by the first column in the SELECT list, which is user_id in this case.


INNER JOIN: Product Sales Analysis I

In MySQL, the INNER JOIN is used to combine rows from two or more tables based on a related column between them. It only returns rows where there is a match in all the joined tables.

SHOW PROBLEM

Tables:

Sales

Column Name Type
sale_id int
product_id int
year int
quantity int
price int
  • (sale_id, year) is the primary key, meaning this combination of columns is unique.
  • product_id is a foreign key referencing the Product table.
  • Each row represents a sale of a product identified by product_id in a particular year.
  • The price is per unit of the product.

Product

Column Name Type
product_id int
product_name varchar
  • product_id is the primary key for this table.
  • Each row represents a product with its product_name.

Task:

Write a SQL query to report the product_name, year, and price for each sale in the Sales table.

Query Requirements:

  1. Return the product_name from the Product table, corresponding to the product_id in the Sales table.
  2. Include the year and price from the Sales table for each sale.
  3. The result can be returned in any order.

Example:

Input:

Sales table:

sale_id product_id year quantity price
1 100 2008 10 5000
2 100 2009 12 5000
7 200 2011 15 9000

Product table:

product_id product_name
100 Nokia
200 Apple
300 Samsung

Output:

product_name year price
Nokia 2008 5000
Nokia 2009 5000
Apple 2011 9000

Explanation:

  • From sale_id = 1, we can conclude that Nokia was sold for 5000 in 2008.
  • From sale_id = 2, we can conclude that Nokia was sold for 5000 in 2009.
  • From sale_id = 7, we can conclude that Apple was sold for 9000 in 2011.

Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT p.product_name, s.year, s.price
FROM Sales s
INNER JOIN Product p
ON s.product_id = p.product_id;

LEFT JOIN: Customer Who Visited but Did Not Make Any Transactions

SHOW PROBLEM

Table: Visits

Column Name Type
visit_id int
customer_id int
  • visit_id is the unique identifier for each visit.
  • This table contains information about the customers who visited the mall.

Table: Transactions

Column Name Type
transaction_id int
visit_id int
amount int
  • transaction_id is the unique identifier for each transaction.
  • This table contains information about the transactions made during each visit.

Task:

Write a SQL query to find the IDs of customers who visited the mall without making any transactions, and count how many times they made these visits.

The result should include:

  • customer_id โ€” the ID of the customer.
  • count_no_trans โ€” the number of visits where no transaction was made.

Query Requirements:

  1. Return the result sorted in any order.

Input:

Visits table:

visit_id customer_id
1 23
2 9
4 30
5 54
6 96
7 54
8 54

Transactions table:

transaction_id visit_id amount
2 5 310
3 5 300
9 5 200
12 1 910
13 2 970

Output:

customer_id count_no_trans
54 2
30 1
96 1

Explanation:

  • Customer with id = 23 visited once and made a transaction during the visit with id = 12.
  • Customer with id = 9 visited once and made a transaction during the visit with id = 13.
  • Customer with id = 30 visited once and did not make any transactions.
  • Customer with id = 54 visited three times. During two visits, they did not make any transactions, and during one visit, they made three transactions.
  • Customer with id = 96 visited once and did not make any transactions.

In the output, customers with id = 30, id = 96, and id = 54 (for two of their visits) made visits without transactions.


Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT v.customer_id, COUNT(v.visit_id) as count_no_trans
FROM Visits v
LEFT JOIN Transactions t
ON v.visit_id = t.visit_id
WHERE t.transaction_id IS NULL
GROUP BY v.customer_id;

SHOW NOTES

In the above SQL query, MySQL first identifies the Visits table and performs a LEFT JOIN with the Transactions table based on the condition v.visit_id = t.visit_id. The LEFT JOIN operation combines the two tables and fills NULL in rows where there is no match in the Transactions table. The result looks like this:

SELECT *
FROM Visits v
LEFT JOIN Transactions t
ON v.visit_id = t.visit_id;
visit_id customer_id transaction_id visit_id amount
1 23 12 1 910
2 9 13 2 970
4 30 null null null
5 54 9 5 200
5 54 3 5 300
5 54 2 5 310
6 96 null null null
7 54 null null null
8 54 null null null

After joining the tables, MySQL applies the WHERE clause to filter rows where t.transaction_id IS NULL, which means visits without any transactions.

SELECT *
FROM Visits v
LEFT JOIN Transactions t
ON v.visit_id = t.visit_id
WHERE t.transaction_id IS NULL;
visit_id customer_id transaction_id visit_id amount
4 30 null null null
6 96 null null null
7 54 null null null
8 54 null null null

Next, MySQL groups the filtered rows by customer_id and calculates the number of visits per customer using the COUNT(v.visit_id) function.

SELECT *, COUNT(v.visit_id)
FROM Visits v
LEFT JOIN Transactions t
ON v.visit_id = t.visit_id
WHERE t.transaction_id IS NULL
GROUP BY v.customer_id;
visit_id customer_id transaction_id visit_id amount COUNT(v.visit_id)
4 30 null null null 1
6 96 null null null 1
7 54 null null null 2

Finally, MySQL selects the desired columns and renames the COUNT(v.visit_id) column using the AS clause.

SELECT v.customer_id, COUNT(v.visit_id) AS count_no_trans
FROM Visits v
LEFT JOIN Transactions t
ON v.visit_id = t.visit_id
WHERE t.transaction_id IS NULL
GROUP BY v.customer_id;
customer_id count_no_trans
30 1
96 1
54 2


SELF JOIN: Rising Temperature

In MySQL, a self join is an operation where a table joins itself. It is commonly used when comparing rows within the same table based on a specific condition or when working with hierarchical data, such as relating employees to their managers.

SHOW PROBLEM

Table: Weather

Column Name Type
id int
recordDate date
temperature int
  • id is a column with unique values.
  • There are no duplicate rows for the same recordDate.
  • This table contains information about the temperature on specific dates.

Problem Statement

Write a solution to find the id of all dates where the temperature is higher compared to the previous day (yesterday).

Return

  • Return the result table in any order.

Input:

Weather Table:

id recordDate temperature
1 2015-01-01 10
2 2015-01-02 25
3 2015-01-03 20
4 2015-01-04 30

Output:

id
2
4

Explanation:

  • On 2015-01-02, the temperature was higher than the previous day (10 -> 25).
  • On 2015-01-04, the temperature was higher than the previous day (20 -> 30).

Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT w1.id
FROM Weather w1
JOIN Weather w2
  ON DATE_ADD(w2.recordDate, INTERVAL 1 DAY) = w1.recordDate
WHERE w1.temperature > w2.temperature;

SHOW NOTES

In the above query, the DATE_ADD(w2.recordDate, INTERVAL 1 DAY) = w1.recordDate condition ensures that the recordDate of w1 matches the day following w2’s recordDate.

SELECT *
FROM Weather w1
JOIN Weather w2
  ON DATE_ADD(w2.recordDate, INTERVAL 1 DAY) = w1.recordDate
id recordDate temperature id recordDate temperature
1 2015-01-01 10
2 2015-01-02 25 1 2015-01-01 10
3 2015-01-03 20 2 2015-01-02 25
4 2015-01-04 30 3 2015-01-03 20
4 2015-01-04 30

Result of the Join:

id recordDate temperature id recordDate temperature
2 2015-01-02 25 1 2015-01-01 10
3 2015-01-03 20 2 2015-01-02 25
4 2015-01-04 30 3 2015-01-03 20
  • when id equals to 2, 25 > 10, so it is selcted.
  • when id equals to 3, 20 < 25, so it is not selcted.
  • when id equals to 4, 25 > 10, so it is selcted.

Therefore, the result will be:

id
2
4

Example: Relating Employees to Their Managers

Employees Table:

id name manager_id
1 Alice NULL
2 Bob 1
3 Charlie 1
4 David 2
SELECT e1.name AS Employee, e2.name AS Manager
FROM Employees e1
JOIN Employees e2
  ON e1.manager_id = e2.id;

Result of the Join:

id name manager_id id name manager_id
2 Bob 1 1 Alice NULL
3 Charlie 1 1 Alice NULL
4 David 2 2 Bob 1

Final Result:

Employee Manager
Bob Alice
Charlie Alice
David Bob


SELF JOIN: Managers with at Least 5 Direct Reports

SHOW PROBLEM

Problem Description

Table: Employee

Column Name Data Type Description
id int Unique identifier for each employee.
name varchar Name of the employee.
department varchar Department to which the employee belongs.
managerId int ID of the employee’s manager. If managerId is null, the employee does not have a manager.
  • id is the primary key, ensuring each value is unique.
  • Each row represents an employee, including their name, department, and their managerโ€™s ID.
  • If managerId is null, the employee does not report to any manager.
  • An employee cannot be their own manager.

Task

Write a query to find the managers who have at least five direct reports.

Return the result in any order.


Input:

Employee Table:

id name department managerId
101 John A null
102 Dan A 101
103 James A 101
104 Amy A 101
105 Anne A 101
106 Ron B 101

Output:

name
John

Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT b.name
FROM Employee a
JOIN Employee b ON a.managerId = b.id
GROUP BY b.id
HAVING COUNT(*) >= 5

SHOW NOTES

Table: Employee:

id name department managerId
101 John A null
102 Dan A 101
103 James A 101
104 Amy A 101
105 Anne A 101
106 Ron B 101

Perform a self join, the table becomes:

SELECT *
FROM Employee a
JOIN Employee b 
ON a.managerId = b.id

Table: Slef Joined Result:

id name department managerId id name department managerId
106 Ron B 101 101 John A null
105 Anne A 101 101 John A null
104 Amy A 101 101 John A null
103 James A 101 101 John A null
102 Dan A 101 101 John A null

Group the self joined result:

SELECT *
FROM Employee a
JOIN Employee b 
ON a.managerId = b.id
GROUP BY b.id

Table: Grouped Result:

id name department managerId id name department managerId
106 Ron B 101 101 John A null

Filter grouped result:

SELECT *
FROM Employee a
JOIN Employee b 
ON a.managerId = b.id
GROUP BY b.id
HAVING COUNT(*) >= 5

Table: Filtered Grouped Result:

id name department managerId id name department managerId
106 Ron B 101 101 John A null

Select the specified column(s):

SELECT b.name
FROM Employee a
JOIN Employee b 
ON a.managerId = b.id
GROUP BY b.id
HAVING COUNT(*) >= 5

Final Result:

name
John


SELF JOIN: Average Time of Process per Machine

SHOW PROBLEM

Table: Activity

Column Name Type
machine_id int
process_id int
activity_type enum
timestamp float
  • The table records user activities for machines on a factory website.
  • The combination of (machine_id, process_id, activity_type) is the primary key, ensuring uniqueness in the table.
  • machine_id is the ID of the machine.
  • process_id is the ID of the process running on the machine with ID machine_id.
  • activity_type is an ENUM type with values 'start' and 'end', representing the start and end of a process.
  • timestamp is a float value representing the time (in seconds) the event occurred.
  • The 'start' timestamp is always earlier than the 'end' timestamp for each (machine_id, process_id) pair.
  • It is guaranteed that each (machine_id, process_id) pair has a corresponding 'start' and 'end' timestamp.

Problem:

There are several machines on a factory website, and each machine runs the same number of processes. Your task is to write a SQL query that calculates the average time each machine takes to complete a process.

  • The time to complete a process is the difference between the 'end' timestamp and the 'start' timestamp.
  • The average time for each machine is calculated by dividing the total time for all processes on that machine by the number of processes.

The result should contain the following columns:

  • machine_id โ€” the ID of the machine.
  • processing_time โ€” the average processing time, rounded to 3 decimal places.

Input:

Activity table:
|------------|------------|---------------|-----------|
| machine_id | process_id | activity_type | timestamp |
|------------|------------|---------------|-----------|
| 0          | 0          | start         | 0.712     |
| 0          | 0          | end           | 1.520     |
| 0          | 1          | start         | 3.140     |
| 0          | 1          | end           | 4.120     |
| 1          | 0          | start         | 0.550     |
| 1          | 0          | end           | 1.550     |
| 1          | 1          | start         | 0.430     |
| 1          | 1          | end           | 1.420     |
| 2          | 0          | start         | 4.100     |
| 2          | 0          | end           | 4.512     |
| 2          | 1          | start         | 2.500     |
| 2          | 1          | end           | 5.000     |
|------------|------------|---------------|-----------|

Output:

|------------|-----------------|
| machine_id | processing_time |
|------------|-----------------|
| 0          | 0.894           |
| 1          | 0.995           |
| 2          | 1.456           |
|------------|-----------------|

Explanation:

  • Machine 0:

    • Process 0: End time 1.520, Start time 0.712 โ†’ Time taken = 1.520 - 0.712 = 0.808
    • Process 1: End time 4.120, Start time 3.140 โ†’ Time taken = 4.120 - 3.140 = 0.980
    • Average time = (0.808 + 0.980) / 2 = 0.894
  • Machine 1:

    • Process 0: End time 1.550, Start time 0.550 โ†’ Time taken = 1.550 - 0.550 = 1.000
    • Process 1: End time 1.420, Start time 0.430 โ†’ Time taken = 1.420 - 0.430 = 0.990
    • Average time = (1.000 + 0.990) / 2 = 0.995
  • Machine 2:

    • Process 0: End time 4.512, Start time 4.100 โ†’ Time taken = 4.512 - 4.100 = 0.412
    • Process 1: End time 5.000, Start time 2.500 โ†’ Time taken = 5.000 - 2.500 = 2.500
    • Average time = (0.412 + 2.500) / 2 = 1.456

Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT machine_id,
       ROUND(AVG(end_time - start_time), 3) AS processing_time
FROM (
    SELECT a.machine_id,
           a.process_id,
           a.timestamp AS start_time,
           b.timestamp AS end_time
    FROM Activity a
    JOIN Activity b ON a.machine_id = b.machine_id
                    AND a.process_id = b.process_id
                    AND a.activity_type = 'start'
                    AND b.activity_type = 'end'
) AS process_times
GROUP BY machine_id;

SHOW NOTES

In the following SQL query, the condition a.activity_type = 'start' and b.activity_type = 'end' ensures that the start and end timestamp are correctly matched for the same process on the same machine.

SELECT *
FROM Activity a
JOIN Activity b ON a.machine_id = b.machine_id
                AND a.process_id = b.process_id
                AND a.activity_type = 'start'
                AND b.activity_type = 'end'  

Result of the Join:

machine_id process_id activity_type timestamp machine_id process_id activity_type timestamp
0 0 start 0.712 0 0 end 1.52
0 1 start 3.14 0 1 end 4.12
1 0 start 0.55 1 0 end 1.55
1 1 start 0.43 1 1 end 1.42
2 0 start 4.1 2 0 end 4.512
2 1 start 2.5 2 1 end 5

The result is selected from of the joined result:

SELECT machine_id,
       ROUND(AVG(end_time - start_time), 3) AS processing_time
FROM (
    SELECT a.machine_id,
           a.process_id,
           a.timestamp AS start_time,
           b.timestamp AS end_time
    FROM Activity a
    JOIN Activity b ON a.machine_id = b.machine_id
                    AND a.process_id = b.process_id
                    AND a.activity_type = 'start'
                    AND b.activity_type = 'end'
) AS process_times
GROUP BY machine_id;

Final Result:

machine_id processing_time
0 0.894
1 0.995
2 1.456


CROSS JOIN: Students and Examinations

In MySQL, a CROSS JOIN is a type of join that returns the Cartesian product of two tables. It combines each row from the first table with every row from the second table, without the need for a join condition. For example, consider two tables, Students and Subjects:

Table 1: Students

student_id student_name
1 Alice
2 Bob

Table 2: Subjects

subject_name
Math
Physics

When a Cross Join is performed between these two tables, the result will be:

Table: Result of Cross Join

student_id student_name subject_name
1 Alice Math
1 Alice Physics
2 Bob Math
2 Bob Physics

SHOW PROBLEM

Table: Students

Column Name Type
student_id int
student_name varchar
  • student_id is the primary key (unique values) for this table.
  • Each row represents a student with their unique ID and name.

Table: Subjects

Column Name Type
subject_name varchar
  • subject_name is the primary key (unique values) for this table.
  • Each row represents the name of a subject offered in the school.

Table: Examinations

Column Name Type
student_id int
subject_name varchar
  • There is no primary key for this table, and it may contain duplicates.
  • Each row indicates that a student with student_id attended the exam for the subject subject_name.
  • Every student takes every course from the Subjects table.

Task:

Write a query to find the number of times each student attended each exam, ordered by student_id and subject_name.


Input:

Students Table:

student_id student_name
1 Alice
2 Bob
13 John
6 Alex

Subjects Table:

subject_name
Math
Physics
Programming

Examinations Table:

student_id subject_name
1 Math
1 Physics
1 Programming
2 Programming
1 Physics
1 Math
13 Math
13 Programming
13 Physics
2 Math
1 Math

Output:

student_id student_name subject_name attended_exams
1 Alice Math 3
1 Alice Physics 2
1 Alice Programming 1
2 Bob Math 1
2 Bob Physics 0
2 Bob Programming 1
6 Alex Math 0
6 Alex Physics 0
6 Alex Programming 0
13 John Math 1
13 John Physics 1
13 John Programming 1

Explanation:

The result table contains all students and all subjects.

  • Alice attended the Math exam 3 times, the Physics exam 2 times, and the Programming exam 1 time.
  • Bob attended the Math exam 1 time, the Programming exam 1 time, and did not attend the Physics exam.
  • Alex did not attend any exams.
  • John attended the Math exam 1 time, the Physics exam 1 time, and the Programming exam 1 time.

Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT 
    s.student_id,
    s.student_name,
    sub.subject_name,
    COUNT(e.student_id) AS attended_exams
FROM 
    Students s
CROSS JOIN 
    Subjects sub
LEFT JOIN 
    Examinations e 
ON s.student_id = e.student_id AND sub.subject_name = e.subject_name
GROUP BY 
    s.student_id, 
    s.student_name, 
    sub.subject_name
ORDER BY 
    s.student_id, 
    sub.subject_name;    

SHOW NOTES

Table: Students

student_id student_name
1 Alice
2 Bob
13 John
6 Alex

Table: Subjects:

subject_name
Math
Physics
Programming

When a CROSS JOIN is performed between the Students and Subjects tables, the result will be:

SELECT *
FROM 
    Students s
CROSS JOIN 
    Subjects sub

Table: Result of Cross Join:

student_id student_name subject_name
1 Alice Programming
1 Alice Physics
1 Alice Math
2 Bob Programming
2 Bob Physics
2 Bob Math
13 John Programming
13 John Physics
13 John Math
6 Alex Programming
6 Alex Physics
6 Alex Math

Table: Examniations:

student_id subject_name
1 Math
1 Physics
1 Programming
2 Programming
1 Physics
1 Math
13 Math
13 Programming
13 Physics
2 Math
1 Math

Perform a LEFT JOIN on the cross-joined tables with the Examinations table:

SELECT *
FROM 
    Students s
CROSS JOIN 
    Subjects sub
LEFT JOIN 
    Examinations e 
ON s.student_id = e.student_id AND sub.subject_name = e.subject_name

Table: Left Join Result:

student_id student_name subject_name student_id subject_name
1 Alice Programming 1 Programming
1 Alice Physics 1 Physics
1 Alice Physics 1 Physics
1 Alice Math 1 Math
1 Alice Math 1 Math
1 Alice Math 1 Math
2 Bob Programming 2 Programming
2 Bob Physics null null
2 Bob Math 2 Math
13 John Programming 13 Programming
13 John Physics 13 Physics
13 John Math 13 Math
6 Alex Programming null null
6 Alex Physics null null
6 Alex Math null null

After grouping the left joined result and count the e.studentId, the table will look like:

SELECT *, COUNT(e.student_id) AS attended_exams
FROM 
    Students s
CROSS JOIN 
    Subjects sub
LEFT JOIN 
    Examinations e 
ON s.student_id = e.student_id AND sub.subject_name = e.subject_name
GROUP BY 
    s.student_id, 
    s.student_name, 
    sub.subject_name

Table: Grouped Result with Count:

student_id student_name subject_name student_id subject_name attended_exams
1 Alice Programming 1 Programming 1
1 Alice Physics 1 Physics 2
1 Alice Math 1 Math 3
2 Bob Programming 2 Programming 1
2 Bob Physics null null 0
2 Bob Math 2 Math 1
13 John Programming 13 Programming 1
13 John Physics 13 Physics 1
13 John Math 13 Math 1
6 Alex Programming null null 0
6 Alex Physics null null 0
6 Alex Math null null 0

After selecting the specified columns, the result table becomes:

SELECT 
    s.student_id,
    s.student_name,
    sub.subject_name,
    COUNT(e.student_id) AS attended_exams
FROM 
    Students s
CROSS JOIN 
    Subjects sub
LEFT JOIN 
    Examinations e 
ON s.student_id = e.student_id AND sub.subject_name = e.subject_name
GROUP BY 
    s.student_id, 
    s.student_name, 
    sub.subject_name
student_id student_name subject_name attended_exams
1 Alice Programming 1
1 Alice Physics 2
1 Alice Math 3
2 Bob Programming 1
2 Bob Physics 0
2 Bob Math 1
13 John Programming 1
13 John Physics 1
13 John Math 1
6 Alex Programming 0
6 Alex Physics 0
6 Alex Math 0

Sort the selcted table based on the condition s.student_id and sub.subject_name, return the final result:

Table: Final Result:

student_id student_name subject_name attended_exams
1 Alice Math 3
1 Alice Physics 2
1 Alice Programming 1
2 Bob Math 1
2 Bob Physics 0
2 Bob Programming 1
6 Alex Math 0
6 Alex Physics 0
6 Alex Programming 0
13 John Math 1
13 John Physics 1
13 John Programming 1


Basic Aggregate Functions

Average Selling Price

SHOW PROBLEM

Table: Prices

Column Name Type
product_id int
start_date date
end_date date
price int
  • The primary key for this table is the combination of (product_id, start_date, end_date), ensuring unique periods for each product.
  • Each row represents the price of a product for a specific period, from start_date to end_date.
  • For each product, no two periods will overlap.

Table: UnitsSold

Column Name Type
product_id int
purchase_date date
units int
  • This table may contain duplicate rows.
  • Each row represents the date, number of units, and product_id for a product sold on that date.

Problem:

Write a SQL query to find the average selling price for each product. The average_price should be rounded to two decimal places. If a product does not have any sold units, its average selling price should be considered 0.

Return the result table in any order.


Input:

Prices table:

product_id start_date end_date price
1 2019-02-17 2019-02-28 5
1 2019-03-01 2019-03-22 20
2 2019-02-01 2019-02-20 15
2 2019-02-21 2019-03-31 30

UnitsSold table:

product_id purchase_date units
1 2019-02-25 100
1 2019-03-01 15
2 2019-02-10 200
2 2019-03-22 30

Output:

product_id average_price
1 6.96
2 16.96

Explanation:

  • For product 1:

    • From 2019-02-17 to 2019-02-28, 100 units were sold at a price of 5.
    • From 2019-03-01 to 2019-03-22, 15 units were sold at a price of 20.
    • Average price is calculated based on the total price and total units sold.
  • For product 2:

    • From 2019-02-01 to 2019-02-20, 200 units were sold at a price of 15.
    • From 2019-02-21 to 2019-03-31, 30 units were sold at a price of 30.
    • Average price is calculated based on the total price and total units sold.

Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT
    p.product_id,
    ROUND(IFNULL(SUM(price * units) / SUM(units), 0), 2) AS average_price
FROM
    Prices AS p
LEFT JOIN UnitsSold AS u
ON p.product_id = u.product_id 
    AND purchase_date BETWEEN start_date AND end_date
GROUP BY 1;

SHOW NOTES: Right Answer

MySQL Query Walkthrough:

Table: Prices

product_id start_date end_date price
1 2019-02-17 2019-02-28 5
1 2019-03-01 2019-03-22 20
2 2019-02-01 2019-02-20 15
2 2019-02-21 2019-03-31 30
3 2019-02-21 2019-03-31 30

Table: UnitsSold

product_id purchase_date units
1 2019-02-25 100
1 2019-03-01 15
2 2019-02-10 200
2 2019-03-22 30

Join the two tables based on the product_id and the purchase_date, where the purchase_date falls between the start_date and the end_date:

SELECT * 
FROM Prices AS p
LEFT JOIN UnitsSold AS u
ON p.product_id = u.product_id AND purchase_date BETWEEN start_date AND end_date;
product_id start_date end_date price product_id purchase_date units
1 2019-02-17 2019-02-28 5 1 2019-02-25 100
1 2019-03-01 2019-03-22 20 1 2019-03-01 15
2 2019-02-01 2019-02-20 15 2 2019-02-10 200
2 2019-02-21 2019-03-31 30 2 2019-03-22 30
3 2019-02-21 2019-03-31 30 null null null

Group the joined result and then use aggregation functions to calculate the average price:

SELECT 
    *,
    ROUND(IFNULL(SUM(price * units) / SUM(units), 0), 2) AS average_price
FROM Prices AS p
LEFT JOIN UnitsSold AS u
ON p.product_id = u.product_id AND purchase_date BETWEEN start_date AND end_date
GROUP BY 1;
product_id start_date end_date price product_id purchase_date units average_price
1 2019-02-17 2019-02-28 5 1 2019-02-25 100 6.96
2 2019-02-01 2019-02-20 15 2 2019-02-10 200 16.96
3 2019-02-21 2019-03-31 30 null null null 0

Select the desired columns:

SELECT 
    p.product_id,
    ROUND(IFNULL(SUM(price * units) / SUM(units), 0), 2) AS average_price
FROM Prices AS p
LEFT JOIN UnitsSold AS u
ON p.product_id = u.product_id AND purchase_date BETWEEN start_date AND end_date
GROUP BY 1;
product_id average_price
1 6.96
2 16.96
3 0

SHOW NOTES: Wrong Answer
# Wrong Answer
SELECT
    p.product_id,
    ROUND(IFNULL(SUM(price * units) / SUM(units), 0), 2) AS average_price
FROM
    Prices AS p
LEFT JOIN UnitsSold AS u ON p.product_id = u.product_id
WHERE purchase_date BETWEEN start_date AND end_date
GROUP BY 1;

Join the two tables based on the condition p.product_id = u.product_id:

SELECT *
FROM Prices AS p
LEFT JOIN UnitsSold AS u ON p.product_id = u.product_id;
product_id start_date end_date price product_id purchase_date units
1 2019-02-17 2019-02-28 5 1 2019-03-01 15
1 2019-02-17 2019-02-28 5 1 2019-02-25 100
1 2019-03-01 2019-03-22 20 1 2019-03-01 15
1 2019-03-01 2019-03-22 20 1 2019-02-25 100
2 2019-02-01 2019-02-20 15 2 2019-03-22 30
2 2019-02-01 2019-02-20 15 2 2019-02-10 200
2 2019-02-21 2019-03-31 30 2 2019-03-22 30
2 2019-02-21 2019-03-31 30 2 2019-02-10 200
3 2019-02-21 2019-03-31 30 null null null

Filter out the rows that do not fall between start_date and end_date:

SELECT *
FROM Prices AS p
LEFT JOIN UnitsSold AS u ON p.product_id = u.product_id
WHERE purchase_date BETWEEN start_date AND end_date;
product_id start_date end_date price product_id purchase_date units
1 2019-02-17 2019-02-28 5 1 2019-02-25 100
1 2019-03-01 2019-03-22 20 1 2019-03-01 15
2 2019-02-01 2019-02-20 15 2 2019-02-10 200
2 2019-02-21 2019-03-31 30 2 2019-03-22 30

Group the filtered result and then use aggregation functions to calculate average price:

SELECT 
    *,
    ROUND(IFNULL(SUM(price * units) / SUM(units), 0), 2) AS average_price
FROM Prices AS p
LEFT JOIN UnitsSold AS u ON p.product_id = u.product_id
WHERE purchase_date BETWEEN start_date AND end_date
GROUP BY 1;
product_id start_date end_date price product_id purchase_date units average_price
1 2019-02-17 2019-02-28 5 1 2019-02-25 100 6.96
2 2019-02-01 2019-02-20 15 2 2019-02-10 200 16.96

Select the desired columns:

SELECT 
    p.product_id,
    ROUND(IFNULL(SUM(price * units) / SUM(units), 0), 2) AS average_price
FROM Prices AS p
LEFT JOIN UnitsSold AS u ON p.product_id = u.product_id
WHERE purchase_date BETWEEN start_date AND end_date
GROUP BY 1;
product_id average_price
1 6.96
2 16.96

The correct answer is:

product_id average_price
1 6.96
2 16.96
3 0

Summary: The LEFT JOIN operation will return NULL results for rows without without matching records in the UnitsSold table. In the condition p.product_id = u.product_id AND purchase_date BETWEEN start_date AND end_date, the date range check is applied during the join operation, while WHERE purchase_date BETWEEN start_date AND end_date is applied after the join operation. The WHERE clause exlcudes rows with NULL values, which means products with no matching sales records (i.e., where purchase_date doesn’t match) will be excluded from the result. When grouping the result, no rows containing NULL values remain, leading to an incorrect result (missing product_id = 3 with average_price = 0).


Percentage of Users Attended a Contest

SHOW PROBLEM

Table: Users

Column Name Type
user_id int
user_name varchar
  • user_id is the primary key (a unique identifier) for this table.
  • Each row in this table represents a user, with their unique ID and name.

Table: Register

Column Name Type
contest_id int
user_id int
  • (contest_id, user_id) is the primary key (a unique combination of columns) for this table.
  • Each row in this table represents the registration of a user in a specific contest.

Task

Write a solution to calculate the percentage of users registered for each contest, rounded to two decimal places.

Return the result table sorted by percentage in descending order. If there is a tie in percentage, order by contest_id in ascending order.


Input:

Users table:

user_id user_name
6 Alice
2 Bob
7 Alex

Register table:

contest_id user_id
215 6
209 2
208 2
210 6
208 6
209 7
209 6
215 7
208 7
210 2
207 2
210 7

Output:

contest_id percentage
208 100.0
209 100.0
210 100.0
215 66.67
207 33.33

Explanation:

  • Contests 208, 209, and 210 had 100% user registration. The results are sorted by contest_id in ascending order.
  • Contest 215 had a registration rate of 66.67%, as Alice and Alex registered, out of a total of three users.
  • Contest 207 had a registration rate of 33.33%, as only Bob registered, out of a total of three users.

Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT 
    contest_id,
    ROUND(COUNT(user_id) * 100 / (SELECT COUNT(user_id) FROM Users),2) percentage
FROM Register a 
GROUP BY contest_id
ORDER BY percentage DESC,contest_id ASC;

SHOW NOTES

$$ \text{percentage} = \frac{\text{number of users attended in the contest}}{\text{total number of users}} \times 100 \% $$

  • Compute the total number of users: SELECT COUNT(user_id) FROM Users
  • Compute the number of users attended in the contest: Group the table based on the column contest_id, then count the number of users in each group.


Queries Quality and Percentage

SHOW PROBLEM

Table: Queries

Column Name Type
query_name varchar
result varchar
position int
rating int
  • This table may contain duplicate rows.
  • It contains information collected from various queries executed on a database.
  • The position column has a value ranging from 1 to 500.
  • The rating column has a value between 1 and 5. Queries with a rating less than 3 are considered poor queries.

Definitions:

  • Query quality: The average of the ratio between the query’s rating and its position.
  • Poor query percentage: The percentage of queries with a rating less than 3.

Objective: Write a solution to find:

  • The query_name,
  • The quality of each query (rounded to 2 decimal places),
  • The poor_query_percentage for each query (rounded to 2 decimal places).

Input: Queries table:

query_name result position rating
Dog Golden Retriever 1 5
Dog German Shepherd 2 5
Dog Mule 200 1
Cat Shirazi 5 2
Cat Siamese 3 3
Cat Sphynx 7 4

Output:

query_name quality poor_query_percentage
Dog 2.50 33.33
Cat 0.66 33.33

Explanation:

  • Dog queries:

    • Quality:
      $$ \left( \frac{5}{1} + \frac{5}{2} + \frac{1}{200} \right) / 3 = 2.50 $$
    • Poor query percentage:
      $$ \frac{1}{3} \times 100 = 33.33 $$
  • Cat queries:

    • Quality:
      $$ \left( \frac{2}{5} + \frac{3}{3} + \frac{4}{7} \right) / 3 = 0.66 $$
    • Poor query percentage:
      $$ \frac{1}{3} \times 100 = 33.33 $$

Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT query_name,
       ROUND(AVG(rating / position), 2) AS quality,
       ROUND(SUM(CASE WHEN rating < 3 THEN 1 ELSE 0 END) / COUNT(*) * 100, 2) AS poor_query_percentage
FROM Queries
GROUP BY query_name;

Monthly Transactions I

SHOW PROBLEM

Table: Transactions

Column Name Type
id int
country varchar
state enum
amount int
trans_date date
  • id is the primary key of this table.
  • The table stores information about incoming transactions.
  • The state column is an enum with values [“approved”, “declined”].

Task:

For each month and country, find the following information:

  • The total number of transactions (trans_count).
  • The total amount of all transactions (trans_total_amount).
  • The number of approved transactions (approved_count).
  • The total amount of approved transactions (approved_total_amount).

The results should be returned in any order.


Example Input:

id country state amount trans_date
121 US approved 1000 2018-12-18
122 US declined 2000 2018-12-19
123 US approved 2000 2019-01-01
124 DE approved 2000 2019-01-07

Example Output:

month country trans_count approved_count trans_total_amount approved_total_amount
2018-12 US 2 1 3000 1000
2019-01 US 1 1 2000 2000
2019-01 DE 1 1 2000 2000

Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT
    DATE_FORMAT(trans_date, '%Y-%m') AS month,
    country,
    COUNT(*) AS trans_count,
    SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END) AS approved_count,
    SUM(amount) AS trans_total_amount,
    SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) AS approved_total_amount
FROM Transactions
GROUP BY month, country;

SELECT
    DATE_FORMAT(trans_date, '%Y-%m') AS month,
    country,
    COUNT(*) AS trans_count,
    COUNT(IF(state = 'approved', 1, NULL)) AS approved_count,
    SUM(amount) AS trans_total_amount,
    SUM(IF(state = 'approved', amount, 0)) AS approved_total_amount
FROM Transactions
GROUP BY month, country;

SHOW NOTES
  • The DATE_FORMAT function in SQL is used to format a date into a specific string format. For example, DATE_FORMAT('2024-12-19', '%Y-%m') will output '2024-12', extracting the year and month from the date.
  • The IF function in SQL returns one value if the condition is true and another value if the condition is false. For example, IF(amount > 1000, 1, 0) will return 1 if the amount is greater than 1000, otherwise, it will return 0.
  • The CASE...WHEN...THEN...ELSE...END statement is similar to the switch statement in other programming languages like Java. It works like the IF function, but with the ability to handle multiple conditions. It checks each condition in sequence and returns the corresponding result for the first true condition.


Immediate Food Delivery II

SHOW PROBLEM

Table: Delivery

Column Name Type
delivery_id int
customer_id int
order_date date
customer_pref_delivery_date date
  • delivery_id is the unique identifier for each delivery.
  • The table contains information about food deliveries, where customers place orders on a specific date and specify a preferred delivery date (either on the same day or later).
  • If the customer_pref_delivery_date is the same as the order_date, the order is considered immediate; otherwise, it is scheduled.
  • The first order of a customer is defined as the one with the earliest order_date. Each customer has exactly one first order.

Task:

Find the percentage of immediate orders among the first orders of all customers, rounded to two decimal places.

The result should follow the format shown below:


Example Input:

delivery_id customer_id order_date customer_pref_delivery_date
1 1 2019-08-01 2019-08-02
2 2 2019-08-02 2019-08-02
3 1 2019-08-11 2019-08-12
4 3 2019-08-24 2019-08-24
5 3 2019-08-21 2019-08-22
6 2 2019-08-11 2019-08-13
7 4 2019-08-09 2019-08-09

Example Output:

immediate_percentage
50.00

Explanation:

  • Customer 1 has their first order with delivery_id 1, which is scheduled.
  • Customer 2 has their first order with delivery_id 2, which is immediate.
  • Customer 3 has their first order with delivery_id 5, which is scheduled.
  • Customer 4 has their first order with delivery_id 7, which is immediate.

Hence, 50% of the first orders are immediate.


Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT 
    ROUND((SUM(IF(order_date = customer_pref_delivery_date, 1, 0)) / COUNT(*)) * 100, 2) AS immediate_percentage
FROM 
    (SELECT 
        customer_id, 
        MIN(order_date) AS first_order_date
    FROM Delivery
    GROUP BY customer_id) AS first_orders
INNER JOIN Delivery AS d 
    ON first_orders.customer_id = d.customer_id 
    AND first_orders.first_order_date = d.order_date;

Game Play Analysis IV

SHOW PROBLEM

Table: Activity

Column Name Type
player_id int
device_id int
event_date date
games_played int
  • The combination of (player_id, event_date) is the primary key of this table, ensuring each player can only log in once per day.
  • This table tracks the activity of players in games, where each record represents a player’s login and the number of games played (which could be zero) before logging out on a specific day using a particular device.

Task:

Calculate the fraction of players who logged in again on the day after their first login date. The result should be rounded to two decimal places.

In other words, identify the players who logged in on consecutive days starting from their first login date, then divide that count by the total number of players.


Example Input:

player_id device_id event_date games_played
1 2 2016-03-01 5
1 2 2016-03-02 6
2 3 2017-06-25 1
3 1 2016-03-02 0
3 4 2018-07-03 5

Example Output:

fraction
0.33

Explanation:

  • Player 1 logged in on 2016-03-01 and 2016-03-02, meaning they logged in for at least two consecutive days, starting from their first login.
  • Player 2 did not log in again on the day after their first login.
  • Player 3 did not log in on consecutive days, as there was a gap between their logins.

Thus, only player 1 satisfies the condition, and the fraction is calculated as 1/3 = 0.33.


Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT 
    ROUND(
        SUM(IF(DATEDIFF(event_date, min_event_date) = 1, 1, 0)) / COUNT(DISTINCT player_id), 2) AS fraction
FROM (
    SELECT 
        player_id,
        event_date,
        MIN(event_date) OVER (PARTITION BY player_id) AS min_event_date
    FROM activity
) AS activity_with_min_date;

SHOW NOTES
  • The DATEDIFF function in MySQL calculates the difference in days between two dates. For example, DATEDIFF('2024-12-30', '2024-12-25') will output 5, indicating there are 5 days between the two dates.
  • The PARTITION BY clause is used in window functions to divide the result ser into partitions (groups) based on a specified column. It applies window functions, such as RANK(), SUM(), ROW_NUMBER(), and others, to each partition. This works similarly to GROUP BY, but unlike GROUP BY, the PARTITION BY clause allows retaining the row-level data while applying the window functions.

Input Table: employees

department_id employee_id salary
101 1 5000
101 2 4000
101 3 6000
102 4 4500
102 5 5500
102 6 4800
SELECT 
    department_id, 
    employee_id, 
    salary, 
    RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS salary_rank
FROM employees;

Output (Result of Query):

department_id employee_id salary salary_rank
101 3 6000 1
101 1 5000 2
101 2 4000 3
102 5 5500 1
102 6 4800 2
102 4 4500 3


Sorting and Grouping

User Activity for the Past 30 Days I

SHOW PROBLEM

Table: Activity

Column Name Type
user_id int
session_id int
activity_date date
activity_type enum
  • The activity_type column is an ENUM with values: ‘open_session’, ’end_session’, ‘scroll_down’, ‘send_message’.
  • This table logs user activities for a social media website, where each session is linked to exactly one user.
  • The table may contain duplicate rows.

Task:

You need to find the count of unique active users per day for the 30-day period ending on 2019-07-27, inclusive. A user is considered active on a particular day if they perform at least one activity on that day.


Example:

Input: Activity table

user_id session_id activity_date activity_type
1 1 2019-07-20 open_session
1 1 2019-07-20 scroll_down
1 1 2019-07-20 end_session
2 4 2019-07-20 open_session
2 4 2019-07-21 send_message
2 4 2019-07-21 end_session
3 2 2019-07-21 open_session
3 2 2019-07-21 send_message
3 2 2019-07-21 end_session
4 3 2019-06-25 open_session
4 3 2019-06-25 end_session

Output:

day active_users
2019-07-20 2
2019-07-21 2

Explanation:

  • Only the days with active users (those who performed at least one activity) are included in the output.
  • For 2019-07-20, users 1 and 2 were active, and for 2019-07-21, users 2 and 3 were active.

Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT activity_date AS day,
       COUNT(DISTINCT user_id) AS active_users
FROM Activity
WHERE activity_date <= '2019-07-27' AND DATEDIFF('2019-07-27', activity_date) < 30
GROUP BY activity_date

SELECT activity_date AS day,
       COUNT(DISTINCT user_id) AS active_users
FROM Activity
WHERE activity_date <= '2019-07-27' AND activity_date> DATE_SUB('2019-07-27',INTERVAL 30 day)
GROUP BY activity_date

SHOW NOTES

In MySQL, both DATEDIFF and DATE_SUB are used for date calculations. DATEDIFF calculates the difference between two dates, returning the result in days. On the other hand, DATE_SUB is used to subtract a specified time interval (such as days, months, or years) from a given date.

SELECT DATEDIFF('2023-12-31', '2023-01-01'); 
-- Result: 364

SELECT DATE_SUB('2023-12-31', INTERVAL 30 DAY); 
-- Result: '2023-12-01'

SELECT DATE_SUB('2023-12-31', INTERVAL 2 MONTH);
-- Result: '2023-10-31'


Product Sales Analysis III

SHOW PROBLEM

Table: Sales

Column Name Type
sale_id int
product_id int
year int
quantity int
price int
  • The combination of sale_id and year is the primary key of this table, ensuring each record is unique for a given sale.
  • product_id is a foreign key referencing the Product table.
  • Each row in this table represents a sale of a specific product (product_id) in a given year.
  • The price refers to the price per unit of the product.

Table: Product

Column Name Type
product_id int
product_name varchar
  • product_id is the primary key of this table, ensuring unique product identifiers.
  • Each row represents the name of a product associated with the corresponding product_id.

Task: Write a query to select the product_id, year, quantity, and price for the first year in which each product was sold.

Return the result in any order.


Example:

Input: Sales table

sale_id product_id year quantity price
1 100 2008 10 5000
2 100 2009 12 5000
7 200 2011 15 9000

Product table:

product_id product_name
100 Nokia
200 Apple
300 Samsung

Output:

product_id first_year quantity price
100 2008 10 5000
200 2011 15 9000

Explanation:

  • For product_id = 100, the first sale occurred in 2008 with a quantity of 10 and a price of 5000.
  • For product_id = 200, the first sale occurred in 2011 with a quantity of 15 and a price of 9000.

The query should return the first sale year for each product, along with the corresponding quantity and price.


Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT
    product_id,
    year AS first_year,
    quantity,
    price
FROM Sales
WHERE
    (product_id, year) IN (
        SELECT
            product_id,
            MIN(year) AS year
        FROM Sales
        GROUP BY product_id
    );

SELECT product_id, first_year, quantity, price
FROM (
    SELECT 
        product_id,
        year AS first_year,
        quantity,
        price,
        RANK() OVER (PARTITION BY product_id ORDER BY year) AS row_num
    FROM Sales
) subquery
WHERE row_num = 1;

SELECT s.product_id,
       s.year AS first_year,
       s.quantity,
       s.price
FROM Sales s
JOIN (
    SELECT product_id, MIN(year) AS first_year
    FROM Sales
    GROUP BY product_id
) first_sale ON s.product_id = first_sale.product_id
             AND s.year = first_sale.first_year;

SHOW NOTES


Biggest Single Number

SHOW PROBLEM

Table: MyNumbers

Column Name Type
num int

This table may contain duplicates (i.e., there is no primary key in the SQL table). Each row contains an integer.

Problem Description

A single number is a number that appears only once in the MyNumbers table.

The task is to find the largest single number. If there is no single number, return null.


Example 1:

Input: MyNumbers table:

num
8
8
3
3
1
4
5
6

Output:

num
6

Explanation:
The single numbers are 1, 4, 5, and 6. Since 6 is the largest single number, we return it.

Example 2:

Input: MyNumbers table:

num
8
8
7
7
3
3
3

Output:

num
null

Explanation:
There are no single numbers in the input table, so we return null.


Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT MAX(num) AS num
FROM
    (
        SELECT num
        FROM MyNumbers
        GROUP BY 1
        HAVING COUNT(1) = 1
    ) AS t;

SELECT
    IF(COUNT(num) = 1, num, null) AS num
FROM MyNumbers
GROUP BY num
ORDER BY 1 DESC
LIMIT 1;

Advanced Select & Joins

The Number of Employees Which Report to Each Employee

SHOW PROBLEM

Table: Employees

Column Name Type
employee_id int
name varchar
reports_to int
age int
  • employee_id is the unique identifier for each employee in the table.
  • This table stores information about employees and the ID of the manager they report to. Some employees may not report to anyone (reports_to is null).
  • A manager is defined as an employee who has at least one other employee reporting to them.

Problem Description

You are required to write a solution that reports the following information for each manager:

  • The manager’s employee_id and name.
  • The number of employees directly reporting to the manager.
  • The average age of these employees, rounded to the nearest integer.

The results should be ordered by employee_id.


Example 1:

Input:

Employees table:

employee_id name reports_to age
9 Hercy null 43
6 Alice 9 41
4 Bob 9 36
2 Winston null 37

Output:

employee_id name reports_count average_age
9 Hercy 2 39

Explanation:

  • Hercy manages 2 employees (Alice and Bob).
  • The average age of Alice and Bob is (41 + 36) / 2 = 38.5, which rounds to 39.

Example 2:

Input:

Employees table:

employee_id name reports_to age
1 Michael null 45
2 Alice 1 38
3 Bob 1 42
4 Charlie 2 34
5 David 2 40
6 Eve 3 37
7 Frank null 50
8 Grace null 48

Output:

employee_id name reports_count average_age
1 Michael 2 40
2 Alice 2 37
3 Bob 1 37

Explanation:

  • Michael manages Alice and Bob. The average age of Alice and Bob is (38 + 42) / 2 = 40.
  • Alice manages Charlie and David, with an average age of (34 + 40) / 2 = 37.
  • Bob manages Eve, with an average age of 37.

Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT 
    e1.employee_id,
    e1.name, 
    COUNT(e1.employee_id) AS reports_count,
    ROUND(AVG(e2.age)) AS average_age
FROM Employees e1
INNER JOIN Employees e2
ON e1.employee_id = e2.reports_to
GROUP BY e1.employee_id
ORDER BY e1.employee_id;

Primary Department for Each Employee

SHOW PROBLEM

Table: Employee

Column Name Type
employee_id int
department_id int
primary_flag varchar
  • The combination of (employee_id, department_id) is the primary key for this table, meaning each employee can belong to multiple departments, but each (employee_id, department_id) pair is unique.
  • employee_id: The ID of the employee.
  • department_id: The ID of the department to which the employee belongs.
  • primary_flag: A flag indicating whether the department is the primary department for the employee. It can be one of the following:
    • 'Y': The department is the primary department.
    • 'N': The department is not the primary department.

Problem Description

Employees may belong to multiple departments, and when they do, they must designate one department as their primary. If an employee belongs to only one department, the primary_flag for that department will be 'N'.

You are tasked with reporting the primary department for each employee. If an employee has only one department, report that department as their primary.


Input:

Employee table:

employee_id department_id primary_flag
1 1 N
2 1 Y
2 2 N
3 3 N
4 2 N
4 3 Y
4 4 N

Output:

employee_id department_id
1 1
2 1
3 3
4 3

Explanation:

  • For employee 1, their only department is department 1, so it is reported as their primary department.
  • Employee 2 belongs to two departments (1 and 2). The primary_flag for department 1 is 'Y', so department 1 is their primary department.
  • Employee 3 only belongs to department 3, so it is reported as their primary department.
  • Employee 4 belongs to three departments (2, 3, and 4). The primary_flag for department 3 is 'Y', so department 3 is reported as their primary department.

Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT employee_id, department_id
FROM Employee
WHERE primary_flag = 'Y'
UNION
SELECT employee_id, department_id
FROM Employee
GROUP BY employee_id
HAVING COUNT(1) = 1;

SHOW NOTES

The UNION operator is used to combine the results of two or more SELECT queries into a single result set. By default, UNION removes duplicate rows, ensuring that the final result contains only distinct records. In contrast, UNION ALL includes all rows, even if they are duplicates. It’s important to note that each SELECT statement involved in a UNION operation must contain the same number of columns, and the corresponding columns must have compatable data types.


Triangle Judgement

SHOW PROBLEM

Table: Triangle

Column Name Type
x int
y int
z int
  • Primary Key: (x, y, z)
    Each row in this table represents the lengths of three line segments.

Task:
Determine whether the three line segments (x, y, z) from each row can form a triangle.
Return the result table in any order, including a column that specifies if the segments form a triangle.

Triangle Formation Rule:
Three segments can form a triangle if and only if the following conditions are met:

  1. $x + y > z$
  2. $x + z > y$
  3. $y + z > x$

Example:

Input:

**Triangle table: **

x y z
13 15 30
10 20 15

Output:

x y z triangle
13 15 30 No
10 20 15 Yes

Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT
    *,
    IF(x + y > z AND x + z > y AND y + z > x, 'Yes', 'No') AS triangle
FROM
Triangle;

SELECT 
    *, 
    CASE 
        WHEN (x + y > z) AND (x + z > y) AND (y + z > x) THEN 'Yes'
        ELSE 'No'
    END AS triangle
FROM 
Triangle;

Consecutive Numbers

SHOW PROBLEM

Table: Logs

Column Name Type
id int
num varchar
  • id is the primary key for this table.
  • The id column is an auto-increment column that starts from 1.

Problem Statement:

Find all numbers (num) that appear at least three times consecutively in the table.

Return the result table in any order.


Example:

Input:

Logs table:

id num
1 1
2 1
3 1
4 2
5 1
6 2
7 2

Output:

ConsecutiveNums
1

Explanation:

The number 1 is the only value that appears at least three times consecutively in the table.


Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT DISTINCT l1.num AS ConsecutiveNums
FROM Logs l1
JOIN Logs l2 ON l1.id = l2.id - 1
JOIN Logs l3 ON l2.id = l3.id - 1
WHERE l1.num = l2.num AND l2.num = l3.num;

SELECT DISTINCT l2.num AS ConsecutiveNums
FROM Logs AS l1
JOIN Logs AS l2 ON l1.id = l2.id - 1 AND l1.num = l2.num
JOIN Logs AS l3 ON l2.id = l3.id - 1 AND l2.num = l3.num;

Product Price at a Given Date

SHOW PROBLEM

Table: Products

Column Name Type
product_id int
new_price int
change_date date
  • (product_id, change_date) is the primary key (a combination of columns with unique values) of this table.
  • Each row indicates that the price of a product was changed to a new price on a specific date.

Problem Statement

Write a query to find the prices of all products on 2019-08-16. Assume the price of all products before any change is 10.

Return the result in any order.


Example:

Input:

Products table:

product_id new_price change_date
1 20 2019-08-14
2 50 2019-08-14
1 30 2019-08-15
1 35 2019-08-16
2 65 2019-08-17
3 20 2019-08-18

Output:

product_id price
2 50
1 35
3 10

Go to Leetcode ๐Ÿ”—
SHOW CODE
WITH
    T AS (SELECT DISTINCT product_id FROM Products),
    P AS (
        SELECT product_id, new_price AS price
        FROM Products
        WHERE
            (product_id, change_date) IN (
                SELECT product_id, MAX(change_date) AS change_date
                FROM Products
                WHERE change_date <= '2019-08-16'
                GROUP BY 1
            )
    )

SELECT product_id, IFNULL(price, 10) AS price
FROM T
LEFT JOIN P USING (product_id);

WITH
    P AS (
        SELECT p1.product_id, new_price, change_date
        FROM
            (
                SELECT DISTINCT product_id
                FROM Products
            ) AS p1
            LEFT JOIN Products AS p2
                ON p1.product_id = p2.product_id AND p2.change_date <= '2019-08-16'
    ),
    T AS (
        SELECT
            *,
            RANK() OVER (
                PARTITION BY product_id
                ORDER BY change_date DESC
            ) AS rk
        FROM P
    )
SELECT product_id, IFNULL(new_price, 10) AS price
FROM T
WHERE rk = 1;

SHOW NOTES


Last Person to Fit in the Bus

SHOW PROBLEM

Table: Queue

Column Name Type
person_id int
person_name varchar
weight int
turn int
  • The person_id column contains unique values.
  • This table holds information about people waiting to board a bus.
  • The person_id and turn columns contain values from 1 to n, where n is the number of rows in the table.
  • The turn column determines the order in which people will board the bus. A turn value of 1 indicates the first person to board, and turn = n indicates the last person to board.
  • The weight column represents the personโ€™s weight in kilograms.

The bus has a weight limit of 1000 kilograms, so some people may not be able to board if the total weight exceeds this limit. The goal is to identify the name of the last person who can board the bus without exceeding the weight limit.

  • Only one person can board the bus at any given time, based on their turn.

Input:

Queue table:

person_id person_name weight turn
5 Alice 250 1
4 Bob 175 5
3 Alex 350 2
6 John Cena 400 3
1 Winston 500 6
2 Marie 200 4

Output:

person_name
John Cena

Explanation:

The following table shows the order in which people board the bus, ordered by the turn column for simplicity:

Turn person_id person_name weight Total Weight
1 5 Alice 250 250
2 3 Alex 350 600
3 6 John Cena 400 1000
4 2 Marie 200 1200
5 4 Bob 175 ___
6 1 Winston 500 ___

John Cena is the last person who can board the bus without exceeding the weight limit of 1000 kilograms.


Go to Leetcode ๐Ÿ”—
SHOW CODE
WITH
    T AS (
        SELECT
            person_name,
            SUM(weight) OVER (ORDER BY turn) AS acumulated_weight
        FROM Queue
    )
SELECT person_name
FROM T
WHERE acumulated_weight <= 1000
ORDER BY acumulated_weight DESC
LIMIT 1;

Count Salary Categories

SHOW PROBLEM

Table: Accounts

Column Name Type
account_id int
income int
  • The account_id column is the primary key, meaning it contains unique values for each row.
  • Each row represents the monthly income of a specific bank account.

Task:

Write a solution to calculate the number of bank accounts in each salary category. The salary categories are defined as:

  • “Low Salary”: Salaries strictly less than $20,000.
  • “Average Salary”: Salaries in the inclusive range [$20,000, $50,000].
  • “High Salary”: Salaries strictly greater than $50,000.

The result table should contain all three categories. If no accounts fall into a category, return 0 for that category.

The result can be returned in any order.


Input:

Accounts table:

account_id income
3 108939
2 12747
8 87709
6 91796

Output:

category accounts_count
Low Salary 1
Average Salary 0
High Salary 3

Explanation:

  • Low Salary: Account 2 has an income of $12,747, which is below $20,000.
  • Average Salary: There are no accounts with an income between $20,000 and $50,000.
  • High Salary: Accounts 3, 6, and 8 have incomes above $50,000.

Go to Leetcode ๐Ÿ”—
SHOW CODE
WITH
    S AS (
        SELECT 'Low Salary' AS category
        UNION
        SELECT 'Average Salary'
        UNION
        SELECT 'High Salary'
    ),
    T AS (
        SELECT
            CASE
                WHEN income < 20000 THEN 'Low Salary'
                WHEN income > 50000 THEN 'High Salary'
                ELSE 'Average Salary'
            END AS category,
            COUNT(1) AS accounts_count
        FROM Accounts
        GROUP BY category
    )
SELECT category, IFNULL(accounts_count, 0) AS accounts_count
FROM S
LEFT JOIN T USING (category);

Subqueries

Exchange Seats

SHOW PROBLEM

Table: Seat

Column Name Type
id int
student varchar
  • id is the primary key for this table.
  • Each row represents a student’s name and their seat ID. The id starts from 1 and increases consecutively.

Problem Description:

Write a query to swap the seat IDs of every two consecutive students in the Seat table. If the number of students is odd, the last student’s seat ID should remain unchanged.

The result should be ordered by the id column in ascending order.


Input:

Seat table:

id student
1 Abbot
2 Doris
3 Emerson
4 Green
5 Jeames

Output:

id student
1 Doris
2 Abbot
3 Green
4 Emerson
5 Jeames

Explanation:

  • The students Abbot and Doris swap places, as well as Emerson and Green.
  • Since there are an odd number of students, Jeames’s seat remains unchanged.

Go to Leetcode ๐Ÿ”—
SHOW CODE
SELECT 
    CASE
        WHEN id % 2 = 1 AND id < (SELECT MAX(id) FROM Seat) THEN id + 1
        WHEN id % 2 = 0 THEN id - 1
        ELSE id
    END AS id,
    student
FROM Seat
ORDER BY id;

Movie Rating

SHOW PROBLEM

Table: Movies

Column Name Type
movie_id int
title varchar

movie_id is the primary key (column with unique values) for this table. title is the name of the movie.

Table: Users

Column Name Type
user_id int
name varchar

user_id is the primary key (column with unique values) for this table. The column ’name’ has unique values.

Table: MovieRating

Column Name Type
movie_id int
user_id int
rating int
created_at date

(movie_id, user_id) is the primary key (column with unique values) for this table. This table contains the rating of a movie by a user in their review. created_at is the user’s review date.

Problem:

  1. Find the name of the user who has rated the greatest number of movies. In case of a tie, return the lexicographically smaller user name.
  2. Find the movie name with the highest average rating in February 2020. In case of a tie, return the lexicographically smaller movie name.

Result format:

The result should contain two columns:

  1. results โ€“ containing the user name with the greatest number of movie ratings.
  2. results โ€“ containing the movie name with the highest average rating in February 2020.

Input:

Movies table:

movie_id title
1 Avengers
2 Frozen 2
3 Joker

Users table:

user_id name
1 Daniel
2 Monica
3 Maria
4 James

MovieRating table:

movie_id user_id rating created_at
1 1 3 2020-01-12
1 2 4 2020-02-11
1 3 2 2020-02-12
1 4 1 2020-01-01
2 1 5 2020-02-17
2 2 2 2020-02-01
2 3 2 2020-03-01
3 1 3 2020-02-22
3 2 4 2020-02-25

Output:

results
Daniel
Frozen 2

Explanation:

  • Daniel and Monica have rated 3 movies (“Avengers”, “Frozen 2” and “Joker”), but Daniel is lexicographically smaller.
  • “Frozen 2” and “Joker” both have an average rating of 3.5 in February, but “Frozen 2” is lexicographically smaller.

Go to Leetcode ๐Ÿ”—
SHOW CODE
(
    SELECT name AS results
    FROM
        Users
        JOIN MovieRating USING (user_id)
    GROUP BY user_id
    ORDER BY COUNT(1) DESC, name
    LIMIT 1
)
UNION ALL
(
    SELECT title
    FROM
        MovieRating
        JOIN Movies USING (movie_id)
    WHERE DATE_FORMAT(created_at, '%Y-%m') = '2020-02'
    GROUP BY movie_id
    ORDER BY AVG(rating) DESC, title
    LIMIT 1
);

Index

In MySQL, an index is a database structure that improves the speed of data retrival operations on a table. Indexes are created on one or more columns of a table and allow the database to quickly locate data without scanning the entire table. Indexes are typically implemented using data structures, with the B+ tree being the most common.

There are four main types of indexes in MySQL: primary index, unique index, ordinary index, and full-text index.

  • A primary index is a unique index that identifies each row in a table. It is automatically created when a primary key is defined.
  • A unique index ensures that all values in the indexed columns(s) are unique.
  • An ordinary index improves query performance but does not enforce uniqueness.
  • A full-text index is used for full-text searches on text-based columns.
SHOW CODE
-- Primary Index
CREATE TABLE employees (
    id INT PRIMARY KEY, -- Primary index
    name VARCHAR(100)
);

-- Unique Index
CREATE UNIQUE INDEX idx_email ON employees(email);

-- Ordinary Index
CREATE INDEX idx_name ON employees(name);

-- Full Text Index
CREATE FULLTEXT INDEX idx_description ON products(description);
SELECT * FROM products
WHERE MATCH(description) AGAINST('database');

Clustered Index vs. Secondary Index

In MySQL, indexes can be categorized into two types: clustered index and secondary index.

  • A clustered index determines the physical order of data rows in a table. The table data is stored in the order of the clustered index.
  • A secondary index does not affect the physical order of data rows in a table. Instead, it stores a separate structure that points to the actual data rows.
Clustered Index vs. Secondary Index
SHOW CODE
CREATE TABLE employees (
    id INT PRIMARY KEY, -- Clustered index
    name VARCHAR(100),
    salary DECIMAL(10, 2),
    INDEX idx_name (name) -- Secondary index
);

Back-to-Table Queries


SQL Performance Analysis:

  1. SQL Execution Frequency
  2. Slow Query Log
  3. show profiles
  4. explain

Stored Procedure

A stored procedure is a precompiled SQL code block that is stored in a database and can be executed repeatedly. It is particularly useful in scenarios involving complex SQL logic, as it helps streamline operations, improve performance, and enhance code reusability.

SHOW CODE
-- Create a Stored Procedure:
DELIMITER //
CREATE PROCEDURE get_all_users()
BEGIN
    SELECT * FROM users;
END //
DELIMITER ;

-- Call a Procedure**:
CALL get_all_users();

-- List all procedures
show procedure status where db = 'mydatabase';

-- View the specific stored procedure
show create procedure get_all_users;

-- Delete a Stored Procedure:
drop procedure get_all_users;

Variables

Variables in MySQL can be categorized into four types based on their scope: global, session, user-defined, and local.

  • Global variables and session variables are used to control server behavior.
    • Global variables affect all sessions connected to the server.
    • Session variables apply only to the current session.
  • User-defined variables are defined within a session and persist until the session ends.
  • Local variables are used within stored procedures and have a scope limited to the procedure in which they are declared.
SHOW CODE
# Global Variables
SET GLOBAL max_connections = 200;

-- Session Variables
SET SESSION sql_mode = 'STRICT_TRANS_TABLES';

-- User-Defined Variables
SET @my_var = 100;
SELECT @my_var;

-- Local Variables
DELIMITER //
CREATE PROCEDURE test_proc()
BEGIN
    DECLARE my_local_var INT DEFAULT 10;
    SELECT my_local_var;
END //
DELIMITER ;

Control Flow Statements

MySQL supports various control flow statements, including IF, CASE...WHEN, WHILE, REPEAT, and LOOP.

SHOW CODE
# IF statement
DELIMITER //
CREATE PROCEDURE check_salary(IN emp_id INT)
BEGIN
    DECLARE emp_salary INT;
    SELECT salary INTO emp_salary FROM employees WHERE id = emp_id;
    
    IF emp_salary > 5000 THEN
        SELECT 'High Salary';
    ELSEIF emp_salary BETWEEN 3000 AND 5000 THEN
        SELECT 'Medium Salary';
    ELSE
        SELECT 'Low Salary';
    END IF;
END //
DELIMITER ;

-- CASE WHEN statement
DELIMITER //
CREATE PROCEDURE check_grade(IN marks INT)
BEGIN
    DECLARE grade CHAR(1);
    
    CASE 
        WHEN marks >= 90 THEN SET grade = 'A';
        WHEN marks >= 80 THEN SET grade = 'B';
        WHEN marks >= 70 THEN SET grade = 'C';
        ELSE SET grade = 'F';
    END CASE;
    
    SELECT grade;
END //
DELIMITER ;

-- WHILE
DELIMITER //
CREATE PROCEDURE count_down()
BEGIN
    DECLARE x INT DEFAULT 5;
    
    WHILE x > 0 DO
        SELECT x;
        SET x = x - 1;
    END WHILE;
END //
DELIMITER ;

-- REPEAT (similar to do...while in Java)
DELIMITER //
CREATE PROCEDURE repeat_example()
BEGIN
    DECLARE x INT DEFAULT 1;

    REPEAT
        SELECT x;
        SET x = x + 1;
    UNTIL x > 5
    END REPEAT;
END //
DELIMITER ;

-- LOOP (Use an explicit LEAVE to exit)
DELIMITER //
CREATE PROCEDURE loop_example()
BEGIN
    DECLARE x INT DEFAULT 1;
    
    my_loop: LOOP
        SELECT x;
        SET x = x + 1;
        
        IF x > 5 THEN
            LEAVE my_loop;
        END IF;
    END LOOP;
END //
DELIMITER ;

Curosor and Handler

In MySQL, a cursor is a database object that enables row-by-row processing of a result set. On the other hand, a handler is used to manage exceptions or errors that may occur during execution. Cursors and handlers are commonly used in stored procedures when dealing with result sets that contain multiple rows and require row-by-row processing.

SHOW CODE
DELIMITER //
CREATE PROCEDURE fetch_salaries()
BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE emp_name VARCHAR(100);
    DECLARE emp_salary INT;

    DECLARE emp_cursor CURSOR FOR SELECT name, salary FROM employees;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    OPEN emp_cursor;

    read_loop: LOOP
        FETCH emp_cursor INTO emp_name, emp_salary;
        IF done THEN
            LEAVE read_loop;
        END IF;

        SELECT emp_name, emp_salary;
    END LOOP;

    CLOSE emp_cursor;
END //
DELIMITER ;

Trigger

In MySQL, a trigger is a data object that automatically executes a specified set of sql statements before or after certain events occurred, such as INSERT, UPDATE, or DELETE. It is commonly used in audit logging. MySQL only supports row-level trigger, meaning a trigger is activated FOR EACH ROW affected by INSERT, UPDATE, or DELETE operation.

SHOW CODE
-- Create a trigger
CREATE TRIGGER before_employee_update
BEFORE UPDATE ON employees
FOR EACH ROW
BEGIN
    INSERT INTO audit_log (employee_id, action, change_date)
    VALUES (OLD.id, 'UPDATE', NOW());
END;

-- View triggers
SHOW TRIGGERS;
SHOW TRIGGERS FROM database_name;

-- Delete a trigger
DROP TRIGGER IF EXISTS before_employee_update;

Locking

In MySQL, locking is a mechanism used to control database access, ensuring data integrity and consistency when multiple sessions interact with the databases. MySQL supports three main types of locks: global-level locks, table-level locks, and low-level locks.

Global-Level Lock

The global-level lock is a coarse-grained lock that blocks the entire database. It is commonly used during operations like data backups. Due to its coarse granularity, any updates to the database are blocked while the lock is active. Additionally, when the global lock is in place, secondary databases cannot read the binary log (binlog) from the primary database, leading to replication delays between the primary and secondary databases.

For InnoDB tables, it is generally recommended to use the --single-transaction option during data backups. This approach ensures data consistency without acquiring a global read lock, allowing updates to continue uninterrupted.

SHOW CODE
-- From mysql client
FLUSH TABLES WITH READ LOCK;

-- From bash
mysqldump -u username -p --single-transaction database_name > backup.sql

UNLOCK TABLES;

Table-Level Lock

The table-level lock is a mechanism that locks an entire table to control database access when multiple sessions interact with the database. There are three main types of table-level locks: table locks, metadata locks, and intention locks.

  • Table locks include read locks and write locks:
    • When a session acquires a read lock, other sessions can also read data but cannot modify it.
    • When a session acquire a write lock, other sessions cannot read or write data from the locked table.
  • Metadata lock is a type of lock that protects the structure of a database object when it is being accessed or modified by a session. It it automatically managed by MySQL. There are two main types of metadata locks: shared metadata locks and exclusive metadata locks.
    • Shared metadata locks include SHARED_READ and SHARED_WRITE locks. These are acquired by queries, modification, or transactions.
      • A SHARED_READ lock allows other sessions to read the locked data.
      • A SHARED_WRITE lock allows other sessions to read but not modify the locked data.
    • The exclusive metadata lock is acquired by operations that modify the table’s structure, such as ALTER TABLE. It prevents other sessions from accessing the table until the operation completes.
  • An intention lock is a type of lock used by the InnoDB storage engine to indicate a transaction’s intention to acquire row-level locks at a finer granularity. It works in conjunction with row-level locks, helping to avoid confilicts between transactions that might lock the same table at different levels (e.g., table-level vs. row-level locks). There are two main types of intention locks: Intention Shared Lock (IS) and Intention Exclusive Lock (IX).
    • The Intention Shared Lock (IS) indicates that a transaction intends to place shared wor-level locks on some rows in the table. It allows other transactions to acquire IS locks or table-level shared locks but blocks table-level exclusive locks.
    • The Intention Exclusive Lock (IX) indicates that a transaction intends to place exlcusive row-level locks on some rows in the table. It allows other transactions to acquire IS locks but blocks table-level shared locks and table-level exclusive locks.
      Lock Type IS IX S X
      IS โœ… โœ… โœ… โŒ
      IX โœ… โœ… โŒ โŒ
      S (Shared) โœ… โŒ โœ… โŒ
      X (Exclusive) โŒ โŒ โŒ โŒ
SHOW CODE
--- TABLE LOCK ---
-- Read lock
LOCK TABLES sales READ;
SELECT * FROM sales WHERE sale_date = '2023-10-01';
UNLOCK TABLES;

-- Write lock
LOCK TABLES products WRITE;
UPDATE products SET price = price * 1.1; 
UNLOCK TABLES;
--- META DATA LOCK ---
-- Session 1 starts a transaction and reads from a table
START TRANSACTION;
SELECT * FROM employees;
-- Session 2 attempts to alter the table
ALTER TABLE employees ADD COLUMN salary DECIMAL(10, 2);
-- Session 2 will block until Session 1 releases its shared metadata lock

-- Session 1 starts an `ALTER TABLE` operation
ALTER TABLE employees ADD COLUMN department VARCHAR(50);
-- session 2 attempts to query the table
SELECT * FROM employees;
-- Session 2 will block until Session 1 comoletes the `ALTER TABLE` operation
--- INTENSION LOCK ---
-- Transaction A
-- Transaction A acquires an Intention Shared Lock (IS) on the employees table.
-- It then acquires a Shared Row-Level Lock (S) on the row with id = 1.
START TRANSACTION;
SELECT * FROM employees WHERE id = 1 LOCK IN SHARE MODE;

-- Transaction B
-- Transaction B acquires an Intention Exclusive Lock (IX) on the employees table.
-- It then acquires an Exclusive Row-Level Lock (X) on the row with id = 2.
START TRANSACTION;
UPDATE employees SET salary = 65000 WHERE id = 2;

-- Transaction C
-- Transaction C attempts to acquire an Exclusive Table-Level Lock (X) on the employees table.
-- It will block until Transaction A and Transaction B release their locks.
ALTER TABLE employees ADD COLUMN department VARCHAR(50);

Row-Level Lock

In MySQL, a row-level lock is a fine-grained locking mechanism that locks individual rows instead of the entire table. There are two main types of row-level locks: Shared Lock and Exclusive Lock.

  • A Shared Lock allows other transactions to read the same row but prevents them from modifying it.
  • An **Exclusive Lock prevents other transactions from reading or modifying the same row until the lock is released.

Row-level locks are supported by the InnoDB storage engine to ensure data consistency and integrity. They are automatically acquired and released based on the transaction’s operations.

SHOW CODE
-- Transaction A reads a row with a shared lock
START TRANSACTION;
SELECT * FROM employees WHERE id = 1 LOCK IN SHARE MODE;

-- Transaction B can also reads the same row
START TRANSACTION;
SELECT * FROM employees WHERE id = 1 LOCK IN SHARE MODE;

-- Transaction C cannot modify the same row until the lock is released
START TRANSACTION;
UPDATE employees SET salary = 55000 WHERE id = 1; -- Blocks
-- Transaction A updates a row with an exclusive lock
START TRANSACTION;
UPDATE employees SET salary = 55000 WHERE id = 1;

-- Transaction B cannot read or modify the same row until the lock is released
START TRANSACTION;
SELECT * FROM employees WHERE id = 1 LOCK IN SHARE MODE; -- Blocks

InnoDB provides additional locking mechanisms, such as gap locks and next-key locks, to ensure data consistency and integrity and prevent issues like phantom reads. By default, InnoDB operates at the REPEATABLE READ isolation level, where it uses next-key locks for searches and index scans to avoid phantom reads.

  • A gap lock locks the intervals between rows in an index, preventing new rows from being inserted into those gaps. It does not lock the rows themselves but rather the gaps between them. The primary purpose of a gap lock is to prevent other transactions from inseting data into the blocked gap. Gap locks are coexistent, meaning they do not block other transactions from locking the same gap.
  • A next-key lock is a combination of a row lock and a gap lock. It locks both the row and the gap before it.
SHOW CODE
| id  | name      | salary  |
|-----|-----------|---------|
| 1   | John Doe  | 50000   |
| 3   | Jane Smith| 60000   |
| 5   | Alice Lee | 55000   |
SELECT * FROM employees WHERE id BETWEEN 2 AND 4 FOR UPDATE;
-- The gap lock will be placed on the interval (1, 3) and (3, 5)
SELECT * FROM employees WHERE id = 3 FOR UPDATE;
-- The next-key lock will be placed on the row with id = 3 and the gap (1, 3)