Introduction to DBMS

A Database Management System (DBMS) is software that provides an interface for users to interact with databases. It enables the management, storage, and manipulation of data within databases. DBMS can be broadly classified into two main types: Relational DBMS and NoSQL DBMS.

In a Relational DBMS, data is stored in structured tables and is manipulated using SQL (Structured Query Language). Examples of relational DBMS inlcude MySQL, Oracle, and PostgreSQL.

In a NoSQL DBMS, data is typically unstructured or semi-structured, offering flexibility to handle diverse data types. Examples of NoSQL DBMS include MongoDB and Redis.

MySQL

MySQL is a relational database management system (RDBMS) that uses Structured Query Language (SQL) to interact with databases. It stores data in tables made up of rows and columns, allowing for efficient organization, retrival, and manipulation of data.

SQL Commands

SQL Commands are used to interact with relational databases. These commands can be classified into five types: DDL, DQL, DML, DCL, and TCL.

DDL (Data Definition Language): Used to define and manage the structure of database instances, such as creating, altering, and deleting tables, and defining constraints.
DQL (Data Query Language): Used to retrive data from the database, primarily through the SELECT statement.
DML (Data Manipulation Language): Used to manipulate data within tables, such as inserting, updating, and deleting records.
DCL (Data Control Language): Used to control access to data, manage user priviledges, and ensure data security. It includes commands like GRANT and REVOKE.
TCL (Transaction Control Language): Used to manage transactions, ensuring data consistency. It includes commands like COMMIT, ROLLBACK, and SAVEPOINT.

SQL Query Execution Order

MySQL Execution Order: In MySQL, the execution order of an SQL query differs from the order in which components (e.g., SELECT, FROM, WHERE) are written in the query. MySQL begins by identifying the tables or views specified in the FROM clause. If there are any JOIN operation, MySQL performs the joins first and applies the ON condition. After completing the join operations, MySQL filters the based on the conditions specified in the WHERE clause.

If a GROUP BY clause is present, MySQL groups the rows based on the specified column(s) and then executes any aggregation functions (e.g., SUM(), COUNT(), etc.). Following the grouping, if a HAVING clause is provided, MySQL further filters the grouped results based on the specified condition.

After filtering grouped results, MySQL selects the specified columns in the SELECT clause. If the DISTINCT keyword is used, MySQL removes duplicate rows from the selected result. Once the rows are selected and duplicates are removed, MySQL sorts the result set according to the rules defined in the ORDER BY clause.

Finally, if a LIMIT or OFFSET clause is included, MySQL limits the number of rows returned, starting from the specified OFFSET.

Select

Find Customer Referee

SHOW PROBLEM

Table: Customer

Column Name	Type
id	int
name	varchar
referee_id	int

The id column is the primary key.
Each row represents a customer, their name, and the id of the customer who referred them. If a customer wasn’t referred by anyone, the referee_id is NULL.

Problem:

Find the names of customers who were not referred by the customer with id = 2.

The result can be returned in any order.

Example:

Input: Customer table:

id	name	referee_id
1	Will	NULL
2	Jane	NULL
3	Alex	2
4	Bill	NULL
5	Zack	1
6	Mark	2

Output:

name
Will
Jane
Bill
Zack

Go to Leetcode 🔗

SHOW CODE

SELECT name FROM Customer
WHERE referee_id != 2 OR referee_id IS NULL;

SHOW NOTES

In SQL, NULL is not equal to any value, including 2. Therefore, the condition referee_id != 2 will exclude records where referee_id is NULL, because comparisons with NULL always result in an unknown NULL value, which is neither TRUE nor FALSE. It is crucial to consider the handling of NULL values when working with databases.

Article Views I

SHOW PROBLEM

Table: Views

Column Name	Type
article_id	int
author_id	int
viewer_id	int
view_date	date

There is no primary key in this table, and the table may contain duplicate rows.
Each row indicates that a viewer viewed an article, where the article is written by an author, on a specific date.
If author_id and viewer_id are the same, it indicates that the author viewed their own article.

Problem:

Write a query to find the IDs of authors who have viewed at least one of their own articles.

Return the result table sorted by author_id in ascending order.

Example:

Input:

Views table:

article_id	author_id	viewer_id	view_date
1	3	5	2019-08-01
1	3	6	2019-08-02
2	7	7	2019-08-01
2	7	6	2019-08-02
4	7	1	2019-07-22
3	4	4	2019-07-21
3	4	4	2019-07-21

Output:

id
4
7

Go to Leetcode 🔗

SHOW CODE

SELECT DISTINCT author_id AS id FROM Views
WHERE author_id = viewer_id
ORDER BY id ASC;

SHOW NOTES

In SQL, the DISTINCT keyword is used to remove duplicate records from the result set, ensuring that only unique values are returned. It is commonly used in scenarios such as aggregating distinct values and ensuring uniqueness in join queries.

# Aggregating unique values
SELECT DISTINCT product_id
FROM Order_Items;

# Ensuring unique results in join queries
SELECT DISTINCT c.customer_id, c.name
FROM Customers c
JOIN Orders o ON c.customer_id = o.customer_id;

Baisc Joins

LEFT JOIN: Replace Employee ID With The Unique Identifier

In MySQL, LEFT JOIN is used to combine rows from two or more tables based on a related column between them. It returns all rows from the left table and the matching rows from the right table. If there is no match, the result will include NULL values for the column from the right table.

SHOW PROBLEM

Tables:

Employees

Column Name	Type
id	int
name	varchar

id is the primary key for this table.
Each row represents an employee in the company, containing their id and name.

EmployeeUNI

Column Name	Type
id	int
unique_id	int

(id, unique_id) is the primary key, meaning the combination of id and unique_id is unique across the table.
Each row represents an employee’s id and their corresponding unique_id in the company.

Task:

Write a SQL query that returns the unique_id of each employee from the Employees table. If an employee does not have a corresponding unique_id, return NULL for that employee.

Query Requirements:

If an employee has a unique_id, show it.
If an employee does not have a unique_id, return NULL.
The result should display the unique_id and the name of each employee.

The results can be returned in any order.

Example:

Input:

Employees table:

id	name
1	Alice
7	Bob
11	Meir
90	Winston
3	Jonathan

EmployeeUNI table:

id	unique_id
3	1
11	2
90	3

Output:

unique_id	name
NULL	Alice
NULL	Bob
2	Meir
3	Winston
1	Jonathan

Explanation:

Alice and Bob do not have a unique_id, so NULL is shown for both.
Meir has a unique_id of 2.
Winston has a unique_id of 3.
Jonathan has a unique_id of 1.

Go to Leetcode 🔗

SHOW CODE

SELECT euni.unique_id, e.name
FROM Employees e
LEFT JOIN EmployeeUNI euni
ON e.id = euni.id;

LEFT JOIN: Confirmation Rate

SHOW PROBLEM

Table: Signups

Column Name	Type
user_id	int
time_stamp	datetime

user_id is a unique identifier for each user.
Each row represents the signup time for the user identified by user_id.

Table: Confirmations

Column Name	Type
user_id	int
time_stamp	datetime
action	ENUM

The primary key is a combination of user_id and time_stamp.
user_id is a foreign key referencing the Signups table.
action is an ENUM with two possible values: 'confirmed' or 'timeout'.
Each row represents a confirmation request made by the user with user_id, indicating whether the confirmation message was confirmed ('confirmed') or timed out ('timeout').

Problem Description

The confirmation rate of a user is calculated as the ratio of 'confirmed' actions to the total number of confirmation requests. If a user has not made any confirmation requests, the confirmation rate is 0. The rate should be rounded to two decimal places.

Goal

Write a SQL query to find the confirmation rate for each user.

Return the result in any order.

Input

Signups table:

user_id	time_stamp
3	2020-03-21 10:16:13
7	2020-01-04 13:57:59
2	2020-07-29 23:09:44
6	2020-12-09 10:39:37

Confirmations table:

user_id	time_stamp	action
3	2021-01-06 03:30:46	timeout
3	2021-07-14 14:00:00	timeout
7	2021-06-12 11:57:29	confirmed
7	2021-06-13 12:58:28	confirmed
7	2021-06-14 13:59:27	confirmed
2	2021-01-22 00:00:00	confirmed
2	2021-02-28 23:59:59	timeout

Output

user_id	confirmation_rate
6	0.00
3	0.00
7	1.00
2	0.50

Explanation:

User 6 did not request any confirmation messages, so their confirmation rate is 0.
User 3 made two confirmation requests, but both timed out, so their confirmation rate is 0.
User 7 made three requests, and all were confirmed, so their confirmation rate is 1.00.
User 2 made two requests: one confirmed and one timed out, so their confirmation rate is 1/2 = 0.50.

Go to Leetcode 🔗

SHOW CODE

SELECT
    user_id,
    ROUND(IFNULL(SUM(action = 'confirmed') / COUNT(1), 0), 2) AS confirmation_rate
FROM
    SignUps
    LEFT JOIN Confirmations USING (user_id)
GROUP BY 1;

SELECT s.user_id, 
    CASE WHEN c.user_id IS NULL THEN 0.00 
        ELSE ROUND(SUM(CASE WHEN action = 'confirmed' THEN 1 ELSE 0 END) / COUNT(*),2) 
    END 
    AS confirmation_rate
FROM Signups s
LEFT JOIN Confirmations c ON c.user_id = s.user_id
GROUP BY s.user_id

SHOW NOTES

SUM(action = 'confirmed'): action = 'confirmed' returns 1 for confirmed actions and 0 for 'timeout' action.
COUNT(1): Counts the total number of rows. (Use COUNT(*) for counting rows is more clear and idiomatic).
LEFT JOIN Confirmations USING (user_id): Joins the Signups table with the Confirmations table on the user_id column.
GROUP BY 1: A shorthand for GROUP BY user_id. It groups the results by the first column in the SELECT list, which is user_id in this case.

INNER JOIN: Product Sales Analysis I

In MySQL, the INNER JOIN is used to combine rows from two or more tables based on a related column between them. It only returns rows where there is a match in all the joined tables.

SHOW PROBLEM

Tables:

Sales

Column Name	Type
sale_id	int
product_id	int
year	int
quantity	int
price	int

(sale_id, year) is the primary key, meaning this combination of columns is unique.
product_id is a foreign key referencing the Product table.
Each row represents a sale of a product identified by product_id in a particular year.
The price is per unit of the product.

Product

Column Name	Type
product_id	int
product_name	varchar

product_id is the primary key for this table.
Each row represents a product with its product_name.

Task:

Write a SQL query to report the product_name, year, and price for each sale in the Sales table.

Query Requirements:

Return the product_name from the Product table, corresponding to the product_id in the Sales table.
Include the year and price from the Sales table for each sale.
The result can be returned in any order.

Example:

Input:

Sales table:

sale_id	product_id	year	quantity	price
1	100	2008	10	5000
2	100	2009	12	5000
7	200	2011	15	9000

Product table:

product_id	product_name
100	Nokia
200	Apple
300	Samsung

Output:

product_name	year	price
Nokia	2008	5000
Nokia	2009	5000
Apple	2011	9000

Explanation:

From sale_id = 1, we can conclude that Nokia was sold for 5000 in 2008.
From sale_id = 2, we can conclude that Nokia was sold for 5000 in 2009.
From sale_id = 7, we can conclude that Apple was sold for 9000 in 2011.

Go to Leetcode 🔗

SHOW CODE

SELECT p.product_name, s.year, s.price
FROM Sales s
INNER JOIN Product p
ON s.product_id = p.product_id;

LEFT JOIN: Customer Who Visited but Did Not Make Any Transactions

SHOW PROBLEM

Table: Visits

Column Name	Type
visit_id	int
customer_id	int

visit_id is the unique identifier for each visit.
This table contains information about the customers who visited the mall.

Table: Transactions

Column Name	Type
transaction_id	int
visit_id	int
amount	int

transaction_id is the unique identifier for each transaction.
This table contains information about the transactions made during each visit.

Task:

Write a SQL query to find the IDs of customers who visited the mall without making any transactions, and count how many times they made these visits.

The result should include:

customer_id — the ID of the customer.
count_no_trans — the number of visits where no transaction was made.

Query Requirements:

Return the result sorted in any order.

Input:

Visits table:

visit_id	customer_id
1	23
2	9
4	30
5	54
6	96
7	54
8	54

Transactions table:

transaction_id	visit_id	amount
2	5	310
3	5	300
9	5	200
12	1	910
13	2	970

Output:

customer_id	count_no_trans
54	2
30	1
96	1

Explanation:

Customer with id = 23 visited once and made a transaction during the visit with id = 12.
Customer with id = 9 visited once and made a transaction during the visit with id = 13.
Customer with id = 30 visited once and did not make any transactions.
Customer with id = 54 visited three times. During two visits, they did not make any transactions, and during one visit, they made three transactions.
Customer with id = 96 visited once and did not make any transactions.

In the output, customers with id = 30, id = 96, and id = 54 (for two of their visits) made visits without transactions.

Go to Leetcode 🔗

SHOW CODE

SELECT v.customer_id, COUNT(v.visit_id) as count_no_trans
FROM Visits v
LEFT JOIN Transactions t
ON v.visit_id = t.visit_id
WHERE t.transaction_id IS NULL
GROUP BY v.customer_id;

SHOW NOTES

In the above SQL query, MySQL first identifies the Visits table and performs a LEFT JOIN with the Transactions table based on the condition v.visit_id = t.visit_id. The LEFT JOIN operation combines the two tables and fills NULL in rows where there is no match in the Transactions table. The result looks like this:

SELECT *
FROM Visits v
LEFT JOIN Transactions t
ON v.visit_id = t.visit_id;

visit_id	customer_id	transaction_id	visit_id	amount
1	23	12	1	910
2	9	13	2	970
4	30	null	null	null
5	54	9	5	200
5	54	3	5	300
5	54	2	5	310
6	96	null	null	null
7	54	null	null	null
8	54	null	null	null

After joining the tables, MySQL applies the WHERE clause to filter rows where t.transaction_id IS NULL, which means visits without any transactions.

SELECT *
FROM Visits v
LEFT JOIN Transactions t
ON v.visit_id = t.visit_id
WHERE t.transaction_id IS NULL;

visit_id	customer_id	transaction_id	visit_id	amount
4	30	null	null	null
6	96	null	null	null
7	54	null	null	null
8	54	null	null	null

Next, MySQL groups the filtered rows by customer_id and calculates the number of visits per customer using the COUNT(v.visit_id) function.

SELECT *, COUNT(v.visit_id)
FROM Visits v
LEFT JOIN Transactions t
ON v.visit_id = t.visit_id
WHERE t.transaction_id IS NULL
GROUP BY v.customer_id;

visit_id	customer_id	transaction_id	visit_id	amount	COUNT(v.visit_id)
4	30	null	null	null	1
6	96	null	null	null	1
7	54	null	null	null	2

Finally, MySQL selects the desired columns and renames the COUNT(v.visit_id) column using the AS clause.

SELECT v.customer_id, COUNT(v.visit_id) AS count_no_trans
FROM Visits v
LEFT JOIN Transactions t
ON v.visit_id = t.visit_id
WHERE t.transaction_id IS NULL
GROUP BY v.customer_id;

customer_id	count_no_trans
30	1
96	1
54	2

SELF JOIN: Rising Temperature

In MySQL, a self join is an operation where a table joins itself. It is commonly used when comparing rows within the same table based on a specific condition or when working with hierarchical data, such as relating employees to their managers.

SHOW PROBLEM

Table: Weather

Column Name	Type
id	int
recordDate	date
temperature	int

id is a column with unique values.
There are no duplicate rows for the same recordDate.
This table contains information about the temperature on specific dates.

Problem Statement

Write a solution to find the id of all dates where the temperature is higher compared to the previous day (yesterday).

Return

Return the result table in any order.

Input:

Weather Table:

id	recordDate	temperature
1	2015-01-01	10
2	2015-01-02	25
3	2015-01-03	20
4	2015-01-04	30

Output:

id
2
4

Explanation:

On 2015-01-02, the temperature was higher than the previous day (10 -> 25).
On 2015-01-04, the temperature was higher than the previous day (20 -> 30).

Go to Leetcode 🔗

SHOW CODE

SELECT w1.id
FROM Weather w1
JOIN Weather w2
  ON DATE_ADD(w2.recordDate, INTERVAL 1 DAY) = w1.recordDate
WHERE w1.temperature > w2.temperature;

SHOW NOTES

In the above query, the DATE_ADD(w2.recordDate, INTERVAL 1 DAY) = w1.recordDate condition ensures that the recordDate of w1 matches the day following w2’s recordDate.

SELECT *
FROM Weather w1
JOIN Weather w2
  ON DATE_ADD(w2.recordDate, INTERVAL 1 DAY) = w1.recordDate

id	recordDate	temperature	id	recordDate	temperature
1	2015-01-01	10
2	2015-01-02	25	1	2015-01-01	10
3	2015-01-03	20	2	2015-01-02	25
4	2015-01-04	30	3	2015-01-03	20
			4	2015-01-04	30

Result of the Join:

id	recordDate	temperature	id	recordDate	temperature
2	2015-01-02	25	1	2015-01-01	10
3	2015-01-03	20	2	2015-01-02	25
4	2015-01-04	30	3	2015-01-03	20

when id equals to 2, 25 > 10, so it is selcted.
when id equals to 3, 20 < 25, so it is not selcted.
when id equals to 4, 25 > 10, so it is selcted.

Therefore, the result will be:

id
2
4

Example: Relating Employees to Their Managers

Employees Table:

id	name	manager_id
1	Alice	NULL
2	Bob	1
3	Charlie	1
4	David	2

SELECT e1.name AS Employee, e2.name AS Manager
FROM Employees e1
JOIN Employees e2
  ON e1.manager_id = e2.id;

Result of the Join:

id	name	manager_id	id	name	manager_id
2	Bob	1	1	Alice	NULL
3	Charlie	1	1	Alice	NULL
4	David	2	2	Bob	1

Final Result:

Employee	Manager
Bob	Alice
Charlie	Alice
David	Bob

SELF JOIN: Managers with at Least 5 Direct Reports

SHOW PROBLEM

Problem Description

Table: Employee

Column Name	Data Type	Description
id	int	Unique identifier for each employee.
name	varchar	Name of the employee.
department	varchar	Department to which the employee belongs.
managerId	int	ID of the employee’s manager. If `managerId` is null, the employee does not have a manager.

id is the primary key, ensuring each value is unique.
Each row represents an employee, including their name, department, and their manager’s ID.
If managerId is null, the employee does not report to any manager.
An employee cannot be their own manager.

Task

Write a query to find the managers who have at least five direct reports.

Return the result in any order.

Input:

Employee Table:

id	name	department	managerId
101	John	A	null
102	Dan	A	101
103	James	A	101
104	Amy	A	101
105	Anne	A	101
106	Ron	B	101

Output:

name
John

Go to Leetcode 🔗

SHOW CODE

SELECT b.name
FROM Employee a
JOIN Employee b ON a.managerId = b.id
GROUP BY b.id
HAVING COUNT(*) >= 5

SHOW NOTES

Table: Employee:

id	name	department	managerId
101	John	A	null
102	Dan	A	101
103	James	A	101
104	Amy	A	101
105	Anne	A	101
106	Ron	B	101

Perform a self join, the table becomes:

SELECT *
FROM Employee a
JOIN Employee b 
ON a.managerId = b.id

Table: Slef Joined Result:

id	name	department	managerId	id	name	department	managerId
106	Ron	B	101	101	John	A	null
105	Anne	A	101	101	John	A	null
104	Amy	A	101	101	John	A	null
103	James	A	101	101	John	A	null
102	Dan	A	101	101	John	A	null

Group the self joined result:

SELECT *
FROM Employee a
JOIN Employee b 
ON a.managerId = b.id
GROUP BY b.id

Table: Grouped Result:

id	name	department	managerId	id	name	department	managerId
106	Ron	B	101	101	John	A	null

Filter grouped result:

SELECT *
FROM Employee a
JOIN Employee b 
ON a.managerId = b.id
GROUP BY b.id
HAVING COUNT(*) >= 5

Table: Filtered Grouped Result:

id	name	department	managerId	id	name	department	managerId
106	Ron	B	101	101	John	A	null

Select the specified column(s):

SELECT b.name
FROM Employee a
JOIN Employee b 
ON a.managerId = b.id
GROUP BY b.id
HAVING COUNT(*) >= 5

Final Result:

name
John

SELF JOIN: Average Time of Process per Machine

SHOW PROBLEM

Table: Activity

Column Name	Type
machine_id	int
process_id	int
activity_type	enum
timestamp	float

The table records user activities for machines on a factory website.
The combination of (machine_id, process_id, activity_type) is the primary key, ensuring uniqueness in the table.
machine_id is the ID of the machine.
process_id is the ID of the process running on the machine with ID machine_id.
activity_type is an ENUM type with values 'start' and 'end', representing the start and end of a process.
timestamp is a float value representing the time (in seconds) the event occurred.
The 'start' timestamp is always earlier than the 'end' timestamp for each (machine_id, process_id) pair.
It is guaranteed that each (machine_id, process_id) pair has a corresponding 'start' and 'end' timestamp.

Problem:

There are several machines on a factory website, and each machine runs the same number of processes. Your task is to write a SQL query that calculates the average time each machine takes to complete a process.

The time to complete a process is the difference between the 'end' timestamp and the 'start' timestamp.
The average time for each machine is calculated by dividing the total time for all processes on that machine by the number of processes.

The result should contain the following columns:

machine_id — the ID of the machine.
processing_time — the average processing time, rounded to 3 decimal places.

Input:

Activity table:
|------------|------------|---------------|-----------|
| machine_id | process_id | activity_type | timestamp |
|------------|------------|---------------|-----------|
| 0          | 0          | start         | 0.712     |
| 0          | 0          | end           | 1.520     |
| 0          | 1          | start         | 3.140     |
| 0          | 1          | end           | 4.120     |
| 1          | 0          | start         | 0.550     |
| 1          | 0          | end           | 1.550     |
| 1          | 1          | start         | 0.430     |
| 1          | 1          | end           | 1.420     |
| 2          | 0          | start         | 4.100     |
| 2          | 0          | end           | 4.512     |
| 2          | 1          | start         | 2.500     |
| 2          | 1          | end           | 5.000     |
|------------|------------|---------------|-----------|

Output:

|------------|-----------------|
| machine_id | processing_time |
|------------|-----------------|
| 0          | 0.894           |
| 1          | 0.995           |
| 2          | 1.456           |
|------------|-----------------|

Explanation:

Machine 0:
- Process 0: End time 1.520, Start time 0.712 → Time taken = 1.520 - 0.712 = 0.808
- Process 1: End time 4.120, Start time 3.140 → Time taken = 4.120 - 3.140 = 0.980
- Average time = (0.808 + 0.980) / 2 = 0.894
Machine 1:
- Process 0: End time 1.550, Start time 0.550 → Time taken = 1.550 - 0.550 = 1.000
- Process 1: End time 1.420, Start time 0.430 → Time taken = 1.420 - 0.430 = 0.990
- Average time = (1.000 + 0.990) / 2 = 0.995
Machine 2:
- Process 0: End time 4.512, Start time 4.100 → Time taken = 4.512 - 4.100 = 0.412
- Process 1: End time 5.000, Start time 2.500 → Time taken = 5.000 - 2.500 = 2.500
- Average time = (0.412 + 2.500) / 2 = 1.456

Go to Leetcode 🔗

SHOW CODE

SELECT machine_id,
       ROUND(AVG(end_time - start_time), 3) AS processing_time
FROM (
    SELECT a.machine_id,
           a.process_id,
           a.timestamp AS start_time,
           b.timestamp AS end_time
    FROM Activity a
    JOIN Activity b ON a.machine_id = b.machine_id
                    AND a.process_id = b.process_id
                    AND a.activity_type = 'start'
                    AND b.activity_type = 'end'
) AS process_times
GROUP BY machine_id;

SHOW NOTES

In the following SQL query, the condition a.activity_type = 'start' and b.activity_type = 'end' ensures that the start and end timestamp are correctly matched for the same process on the same machine.

SELECT *
FROM Activity a
JOIN Activity b ON a.machine_id = b.machine_id
                AND a.process_id = b.process_id
                AND a.activity_type = 'start'
                AND b.activity_type = 'end'

Result of the Join:

machine_id	process_id	activity_type	timestamp	machine_id	process_id	activity_type	timestamp
0	0	start	0.712	0	0	end	1.52
0	1	start	3.14	0	1	end	4.12
1	0	start	0.55	1	0	end	1.55
1	1	start	0.43	1	1	end	1.42
2	0	start	4.1	2	0	end	4.512
2	1	start	2.5	2	1	end	5

The result is selected from of the joined result:

SELECT machine_id,
       ROUND(AVG(end_time - start_time), 3) AS processing_time
FROM (
    SELECT a.machine_id,
           a.process_id,
           a.timestamp AS start_time,
           b.timestamp AS end_time
    FROM Activity a
    JOIN Activity b ON a.machine_id = b.machine_id
                    AND a.process_id = b.process_id
                    AND a.activity_type = 'start'
                    AND b.activity_type = 'end'
) AS process_times
GROUP BY machine_id;

Final Result:

machine_id	processing_time
0	0.894
1	0.995
2	1.456

CROSS JOIN: Students and Examinations

In MySQL, a CROSS JOIN is a type of join that returns the Cartesian product of two tables. It combines each row from the first table with every row from the second table, without the need for a join condition. For example, consider two tables, Students and Subjects:

Table 1: Students

student_id	student_name
1	Alice
2	Bob

Table 2: Subjects

subject_name
Math
Physics

When a Cross Join is performed between these two tables, the result will be:

Table: Result of Cross Join

student_id	student_name	subject_name
1	Alice	Math
1	Alice	Physics
2	Bob	Math
2	Bob	Physics

SHOW PROBLEM

Table: Students

Column Name	Type
student_id	int
student_name	varchar

student_id is the primary key (unique values) for this table.
Each row represents a student with their unique ID and name.

Table: Subjects

Column Name	Type
subject_name	varchar

subject_name is the primary key (unique values) for this table.
Each row represents the name of a subject offered in the school.

Table: Examinations

Column Name	Type
student_id	int
subject_name	varchar

There is no primary key for this table, and it may contain duplicates.
Each row indicates that a student with student_id attended the exam for the subject subject_name.
Every student takes every course from the Subjects table.

Task:

Write a query to find the number of times each student attended each exam, ordered by student_id and subject_name.

Input:

Students Table:

student_id	student_name
1	Alice
2	Bob
13	John
6	Alex

Subjects Table:

subject_name
Math
Physics
Programming

Examinations Table:

student_id	subject_name
1	Math
1	Physics
1	Programming
2	Programming
1	Physics
1	Math
13	Math
13	Programming
13	Physics
2	Math
1	Math

Output:

student_id	student_name	subject_name	attended_exams
1	Alice	Math	3
1	Alice	Physics	2
1	Alice	Programming	1
2	Bob	Math	1
2	Bob	Physics	0
2	Bob	Programming	1
6	Alex	Math	0
6	Alex	Physics	0
6	Alex	Programming	0
13	John	Math	1
13	John	Physics	1
13	John	Programming	1

Explanation:

The result table contains all students and all subjects.

Alice attended the Math exam 3 times, the Physics exam 2 times, and the Programming exam 1 time.
Bob attended the Math exam 1 time, the Programming exam 1 time, and did not attend the Physics exam.
Alex did not attend any exams.
John attended the Math exam 1 time, the Physics exam 1 time, and the Programming exam 1 time.

Go to Leetcode 🔗

SHOW CODE

SELECT 
    s.student_id,
    s.student_name,
    sub.subject_name,
    COUNT(e.student_id) AS attended_exams
FROM 
    Students s
CROSS JOIN 
    Subjects sub
LEFT JOIN 
    Examinations e 
ON s.student_id = e.student_id AND sub.subject_name = e.subject_name
GROUP BY 
    s.student_id, 
    s.student_name, 
    sub.subject_name
ORDER BY 
    s.student_id, 
    sub.subject_name;

SHOW NOTES

Table: Students

student_id	student_name
1	Alice
2	Bob
13	John
6	Alex

Table: Subjects:

subject_name
Math
Physics
Programming

When a CROSS JOIN is performed between the Students and Subjects tables, the result will be:

SELECT *
FROM 
    Students s
CROSS JOIN 
    Subjects sub

Table: Result of Cross Join:

student_id	student_name	subject_name
1	Alice	Programming
1	Alice	Physics
1	Alice	Math
2	Bob	Programming
2	Bob	Physics
2	Bob	Math
13	John	Programming
13	John	Physics
13	John	Math
6	Alex	Programming
6	Alex	Physics
6	Alex	Math

Table: Examniations:

student_id	subject_name
1	Math
1	Physics
1	Programming
2	Programming
1	Physics
1	Math
13	Math
13	Programming
13	Physics
2	Math
1	Math

Perform a LEFT JOIN on the cross-joined tables with the Examinations table:

SELECT *
FROM 
    Students s
CROSS JOIN 
    Subjects sub
LEFT JOIN 
    Examinations e 
ON s.student_id = e.student_id AND sub.subject_name = e.subject_name

Table: Left Join Result:

student_id	student_name	subject_name	student_id	subject_name
1	Alice	Programming	1	Programming
1	Alice	Physics	1	Physics
1	Alice	Physics	1	Physics
1	Alice	Math	1	Math
1	Alice	Math	1	Math
1	Alice	Math	1	Math
2	Bob	Programming	2	Programming
2	Bob	Physics	null	null
2	Bob	Math	2	Math
13	John	Programming	13	Programming
13	John	Physics	13	Physics
13	John	Math	13	Math
6	Alex	Programming	null	null
6	Alex	Physics	null	null
6	Alex	Math	null	null

After grouping the left joined result and count the e.studentId, the table will look like:

SELECT *, COUNT(e.student_id) AS attended_exams
FROM 
    Students s
CROSS JOIN 
    Subjects sub
LEFT JOIN 
    Examinations e 
ON s.student_id = e.student_id AND sub.subject_name = e.subject_name
GROUP BY 
    s.student_id, 
    s.student_name, 
    sub.subject_name

Table: Grouped Result with Count:

student_id	student_name	subject_name	student_id	subject_name	attended_exams
1	Alice	Programming	1	Programming	1
1	Alice	Physics	1	Physics	2
1	Alice	Math	1	Math	3
2	Bob	Programming	2	Programming	1
2	Bob	Physics	null	null	0
2	Bob	Math	2	Math	1
13	John	Programming	13	Programming	1
13	John	Physics	13	Physics	1
13	John	Math	13	Math	1
6	Alex	Programming	null	null	0
6	Alex	Physics	null	null	0
6	Alex	Math	null	null	0

After selecting the specified columns, the result table becomes:

SELECT 
    s.student_id,
    s.student_name,
    sub.subject_name,
    COUNT(e.student_id) AS attended_exams
FROM 
    Students s
CROSS JOIN 
    Subjects sub
LEFT JOIN 
    Examinations e 
ON s.student_id = e.student_id AND sub.subject_name = e.subject_name
GROUP BY 
    s.student_id, 
    s.student_name, 
    sub.subject_name

student_id	student_name	subject_name	attended_exams
1	Alice	Programming	1
1	Alice	Physics	2
1	Alice	Math	3
2	Bob	Programming	1
2	Bob	Physics	0
2	Bob	Math	1
13	John	Programming	1
13	John	Physics	1
13	John	Math	1
6	Alex	Programming	0
6	Alex	Physics	0
6	Alex	Math	0

Sort the selcted table based on the condition s.student_id and sub.subject_name, return the final result:

Table: Final Result:

student_id	student_name	subject_name	attended_exams
1	Alice	Math	3
1	Alice	Physics	2
1	Alice	Programming	1
2	Bob	Math	1
2	Bob	Physics	0
2	Bob	Programming	1
6	Alex	Math	0
6	Alex	Physics	0
6	Alex	Programming	0
13	John	Math	1
13	John	Physics	1
13	John	Programming	1

Basic Aggregate Functions

Average Selling Price

SHOW PROBLEM

Table: Prices

Column Name	Type
product_id	int
start_date	date
end_date	date
price	int

The primary key for this table is the combination of (product_id, start_date, end_date), ensuring unique periods for each product.
Each row represents the price of a product for a specific period, from start_date to end_date.
For each product, no two periods will overlap.

Table: UnitsSold

Column Name	Type
product_id	int
purchase_date	date
units	int

This table may contain duplicate rows.
Each row represents the date, number of units, and product_id for a product sold on that date.

Problem:

Write a SQL query to find the average selling price for each product. The average_price should be rounded to two decimal places. If a product does not have any sold units, its average selling price should be considered 0.

Return the result table in any order.

Input:

Prices table:

product_id	start_date	end_date	price
1	2019-02-17	2019-02-28	5
1	2019-03-01	2019-03-22	20
2	2019-02-01	2019-02-20	15
2	2019-02-21	2019-03-31	30

UnitsSold table:

product_id	purchase_date	units
1	2019-02-25	100
1	2019-03-01	15
2	2019-02-10	200
2	2019-03-22	30

Output:

product_id	average_price
1	6.96
2	16.96

Explanation:

For product 1:
- From 2019-02-17 to 2019-02-28, 100 units were sold at a price of 5.
- From 2019-03-01 to 2019-03-22, 15 units were sold at a price of 20.
- Average price is calculated based on the total price and total units sold.
For product 2:
- From 2019-02-01 to 2019-02-20, 200 units were sold at a price of 15.
- From 2019-02-21 to 2019-03-31, 30 units were sold at a price of 30.
- Average price is calculated based on the total price and total units sold.

Go to Leetcode 🔗

SHOW CODE

SELECT
    p.product_id,
    ROUND(IFNULL(SUM(price * units) / SUM(units), 0), 2) AS average_price
FROM
    Prices AS p
LEFT JOIN UnitsSold AS u
ON p.product_id = u.product_id 
    AND purchase_date BETWEEN start_date AND end_date
GROUP BY 1;

SHOW NOTES: Right Answer

MySQL Query Walkthrough:

Table: Prices

product_id	start_date	end_date	price
1	2019-02-17	2019-02-28	5
1	2019-03-01	2019-03-22	20
2	2019-02-01	2019-02-20	15
2	2019-02-21	2019-03-31	30
3	2019-02-21	2019-03-31	30

Table: UnitsSold

product_id	purchase_date	units
1	2019-02-25	100
1	2019-03-01	15
2	2019-02-10	200
2	2019-03-22	30

Join the two tables based on the product_id and the purchase_date, where the purchase_date falls between the start_date and the end_date:

SELECT * 
FROM Prices AS p
LEFT JOIN UnitsSold AS u
ON p.product_id = u.product_id AND purchase_date BETWEEN start_date AND end_date;

product_id	start_date	end_date	price	product_id	purchase_date	units
1	2019-02-17	2019-02-28	5	1	2019-02-25	100
1	2019-03-01	2019-03-22	20	1	2019-03-01	15
2	2019-02-01	2019-02-20	15	2	2019-02-10	200
2	2019-02-21	2019-03-31	30	2	2019-03-22	30
3	2019-02-21	2019-03-31	30	null	null	null

Group the joined result and then use aggregation functions to calculate the average price:

SELECT 
    *,
    ROUND(IFNULL(SUM(price * units) / SUM(units), 0), 2) AS average_price
FROM Prices AS p
LEFT JOIN UnitsSold AS u
ON p.product_id = u.product_id AND purchase_date BETWEEN start_date AND end_date
GROUP BY 1;

product_id	start_date	end_date	price	product_id	purchase_date	units	average_price
1	2019-02-17	2019-02-28	5	1	2019-02-25	100	6.96
2	2019-02-01	2019-02-20	15	2	2019-02-10	200	16.96
3	2019-02-21	2019-03-31	30	null	null	null	0

Select the desired columns:

SELECT 
    p.product_id,
    ROUND(IFNULL(SUM(price * units) / SUM(units), 0), 2) AS average_price
FROM Prices AS p
LEFT JOIN UnitsSold AS u
ON p.product_id = u.product_id AND purchase_date BETWEEN start_date AND end_date
GROUP BY 1;

product_id	average_price
1	6.96
2	16.96
3	0

SHOW NOTES: Wrong Answer

# Wrong Answer
SELECT
    p.product_id,
    ROUND(IFNULL(SUM(price * units) / SUM(units), 0), 2) AS average_price
FROM
    Prices AS p
LEFT JOIN UnitsSold AS u ON p.product_id = u.product_id
WHERE purchase_date BETWEEN start_date AND end_date
GROUP BY 1;

Join the two tables based on the condition p.product_id = u.product_id:

SELECT *
FROM Prices AS p
LEFT JOIN UnitsSold AS u ON p.product_id = u.product_id;

product_id	start_date	end_date	price	product_id	purchase_date	units
1	2019-02-17	2019-02-28	5	1	2019-03-01	15
1	2019-02-17	2019-02-28	5	1	2019-02-25	100
1	2019-03-01	2019-03-22	20	1	2019-03-01	15
1	2019-03-01	2019-03-22	20	1	2019-02-25	100
2	2019-02-01	2019-02-20	15	2	2019-03-22	30
2	2019-02-01	2019-02-20	15	2	2019-02-10	200
2	2019-02-21	2019-03-31	30	2	2019-03-22	30
2	2019-02-21	2019-03-31	30	2	2019-02-10	200
3	2019-02-21	2019-03-31	30	null	null	null

Filter out the rows that do not fall between start_date and end_date:

SELECT *
FROM Prices AS p
LEFT JOIN UnitsSold AS u ON p.product_id = u.product_id
WHERE purchase_date BETWEEN start_date AND end_date;

product_id	start_date	end_date	price	product_id	purchase_date	units
1	2019-02-17	2019-02-28	5	1	2019-02-25	100
1	2019-03-01	2019-03-22	20	1	2019-03-01	15
2	2019-02-01	2019-02-20	15	2	2019-02-10	200
2	2019-02-21	2019-03-31	30	2	2019-03-22	30

Group the filtered result and then use aggregation functions to calculate average price:

SELECT 
    *,
    ROUND(IFNULL(SUM(price * units) / SUM(units), 0), 2) AS average_price
FROM Prices AS p
LEFT JOIN UnitsSold AS u ON p.product_id = u.product_id
WHERE purchase_date BETWEEN start_date AND end_date
GROUP BY 1;

product_id	start_date	end_date	price	product_id	purchase_date	units	average_price
1	2019-02-17	2019-02-28	5	1	2019-02-25	100	6.96
2	2019-02-01	2019-02-20	15	2	2019-02-10	200	16.96

Select the desired columns:

SELECT 
    p.product_id,
    ROUND(IFNULL(SUM(price * units) / SUM(units), 0), 2) AS average_price
FROM Prices AS p
LEFT JOIN UnitsSold AS u ON p.product_id = u.product_id
WHERE purchase_date BETWEEN start_date AND end_date
GROUP BY 1;

product_id	average_price
1	6.96
2	16.96

The correct answer is:

product_id	average_price
1	6.96
2	16.96
3	0

Summary: The LEFT JOIN operation will return NULL results for rows without without matching records in the UnitsSold table. In the condition p.product_id = u.product_id AND purchase_date BETWEEN start_date AND end_date, the date range check is applied during the join operation, while WHERE purchase_date BETWEEN start_date AND end_date is applied after the join operation. The WHERE clause exlcudes rows with NULL values, which means products with no matching sales records (i.e., where purchase_date doesn’t match) will be excluded from the result. When grouping the result, no rows containing NULL values remain, leading to an incorrect result (missing product_id = 3 with average_price = 0).

Percentage of Users Attended a Contest

SHOW PROBLEM

Table: Users

Column Name	Type
user_id	int
user_name	varchar

user_id is the primary key (a unique identifier) for this table.
Each row in this table represents a user, with their unique ID and name.

Table: Register

Column Name	Type
contest_id	int
user_id	int

(contest_id, user_id) is the primary key (a unique combination of columns) for this table.
Each row in this table represents the registration of a user in a specific contest.

Task

Write a solution to calculate the percentage of users registered for each contest, rounded to two decimal places.

Return the result table sorted by percentage in descending order. If there is a tie in percentage, order by contest_id in ascending order.

Input:

Users table:

user_id	user_name
6	Alice
2	Bob
7	Alex

Register table:

contest_id	user_id
215	6
209	2
208	2
210	6
208	6
209	7
209	6
215	7
208	7
210	2
207	2
210	7

Output:

contest_id	percentage
208	100.0
209	100.0
210	100.0
215	66.67
207	33.33

Explanation:

Contests 208, 209, and 210 had 100% user registration. The results are sorted by contest_id in ascending order.
Contest 215 had a registration rate of 66.67%, as Alice and Alex registered, out of a total of three users.
Contest 207 had a registration rate of 33.33%, as only Bob registered, out of a total of three users.

Go to Leetcode 🔗

SHOW CODE

SELECT 
    contest_id,
    ROUND(COUNT(user_id) * 100 / (SELECT COUNT(user_id) FROM Users),2) percentage
FROM Register a 
GROUP BY contest_id
ORDER BY percentage DESC,contest_id ASC;

SHOW NOTES

$$ \text{percentage} = \frac{\text{number of users attended in the contest}}{\text{total number of users}} \times 100 \% $$

Compute the total number of users: SELECT COUNT(user_id) FROM Users
Compute the number of users attended in the contest: Group the table based on the column contest_id, then count the number of users in each group.

Queries Quality and Percentage

SHOW PROBLEM

Table: Queries

Column Name	Type
query_name	varchar
result	varchar
position	int
rating	int

This table may contain duplicate rows.
It contains information collected from various queries executed on a database.
The position column has a value ranging from 1 to 500.
The rating column has a value between 1 and 5. Queries with a rating less than 3 are considered poor queries.

Definitions:

Query quality: The average of the ratio between the query’s rating and its position.
Poor query percentage: The percentage of queries with a rating less than 3.

Objective: Write a solution to find:

The query_name,
The quality of each query (rounded to 2 decimal places),
The poor_query_percentage for each query (rounded to 2 decimal places).

Input: Queries table:

query_name	result	position	rating
Dog	Golden Retriever	1	5
Dog	German Shepherd	2	5
Dog	Mule	200	1
Cat	Shirazi	5	2
Cat	Siamese	3	3
Cat	Sphynx	7	4

Output:

query_name	quality	poor_query_percentage
Dog	2.50	33.33
Cat	0.66	33.33

Explanation:

Dog queries:
- Quality:
  $$ \left( \frac{5}{1} + \frac{5}{2} + \frac{1}{200} \right) / 3 = 2.50 $$
- Poor query percentage:
  $$ \frac{1}{3} \times 100 = 33.33 $$
Cat queries:
- Quality:
  $$ \left( \frac{2}{5} + \frac{3}{3} + \frac{4}{7} \right) / 3 = 0.66 $$
- Poor query percentage:
  $$ \frac{1}{3} \times 100 = 33.33 $$

Go to Leetcode 🔗

SHOW CODE

SELECT query_name,
       ROUND(AVG(rating / position), 2) AS quality,
       ROUND(SUM(CASE WHEN rating < 3 THEN 1 ELSE 0 END) / COUNT(*) * 100, 2) AS poor_query_percentage
FROM Queries
GROUP BY query_name;

Monthly Transactions I

SHOW PROBLEM

Table: Transactions

Column Name	Type
id	int
country	varchar
state	enum
amount	int
trans_date	date

id is the primary key of this table.
The table stores information about incoming transactions.
The state column is an enum with values [“approved”, “declined”].

Task:

For each month and country, find the following information:

The total number of transactions (trans_count).
The total amount of all transactions (trans_total_amount).
The number of approved transactions (approved_count).
The total amount of approved transactions (approved_total_amount).

The results should be returned in any order.

Example Input:

id	country	state	amount	trans_date
121	US	approved	1000	2018-12-18
122	US	declined	2000	2018-12-19
123	US	approved	2000	2019-01-01
124	DE	approved	2000	2019-01-07

Example Output:

month	country	trans_count	approved_count	trans_total_amount	approved_total_amount
2018-12	US	2	1	3000	1000
2019-01	US	1	1	2000	2000
2019-01	DE	1	1	2000	2000

Go to Leetcode 🔗

SHOW CODE

SELECT
    DATE_FORMAT(trans_date, '%Y-%m') AS month,
    country,
    COUNT(*) AS trans_count,
    SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END) AS approved_count,
    SUM(amount) AS trans_total_amount,
    SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) AS approved_total_amount
FROM Transactions
GROUP BY month, country;

SELECT
    DATE_FORMAT(trans_date, '%Y-%m') AS month,
    country,
    COUNT(*) AS trans_count,
    COUNT(IF(state = 'approved', 1, NULL)) AS approved_count,
    SUM(amount) AS trans_total_amount,
    SUM(IF(state = 'approved', amount, 0)) AS approved_total_amount
FROM Transactions
GROUP BY month, country;

SHOW NOTES

The DATE_FORMAT function in SQL is used to format a date into a specific string format. For example, DATE_FORMAT('2024-12-19', '%Y-%m') will output '2024-12', extracting the year and month from the date.
The IF function in SQL returns one value if the condition is true and another value if the condition is false. For example, IF(amount > 1000, 1, 0) will return 1 if the amount is greater than 1000, otherwise, it will return 0.
The CASE...WHEN...THEN...ELSE...END statement is similar to the switch statement in other programming languages like Java. It works like the IF function, but with the ability to handle multiple conditions. It checks each condition in sequence and returns the corresponding result for the first true condition.

Immediate Food Delivery II

SHOW PROBLEM

Table: Delivery

Column Name	Type
delivery_id	int
customer_id	int
order_date	date
customer_pref_delivery_date	date

delivery_id is the unique identifier for each delivery.
The table contains information about food deliveries, where customers place orders on a specific date and specify a preferred delivery date (either on the same day or later).
If the customer_pref_delivery_date is the same as the order_date, the order is considered immediate; otherwise, it is scheduled.
The first order of a customer is defined as the one with the earliest order_date. Each customer has exactly one first order.

Task:

Find the percentage of immediate orders among the first orders of all customers, rounded to two decimal places.

The result should follow the format shown below:

Example Input:

delivery_id	customer_id	order_date	customer_pref_delivery_date
1	1	2019-08-01	2019-08-02
2	2	2019-08-02	2019-08-02
3	1	2019-08-11	2019-08-12
4	3	2019-08-24	2019-08-24
5	3	2019-08-21	2019-08-22
6	2	2019-08-11	2019-08-13
7	4	2019-08-09	2019-08-09

Example Output:

immediate_percentage
50.00

Explanation:

Customer 1 has their first order with delivery_id 1, which is scheduled.
Customer 2 has their first order with delivery_id 2, which is immediate.
Customer 3 has their first order with delivery_id 5, which is scheduled.
Customer 4 has their first order with delivery_id 7, which is immediate.

Hence, 50% of the first orders are immediate.

Go to Leetcode 🔗

SHOW CODE

SELECT 
    ROUND((SUM(IF(order_date = customer_pref_delivery_date, 1, 0)) / COUNT(*)) * 100, 2) AS immediate_percentage
FROM 
    (SELECT 
        customer_id, 
        MIN(order_date) AS first_order_date
    FROM Delivery
    GROUP BY customer_id) AS first_orders
INNER JOIN Delivery AS d 
    ON first_orders.customer_id = d.customer_id 
    AND first_orders.first_order_date = d.order_date;

Game Play Analysis IV

SHOW PROBLEM

Table: Activity

Column Name	Type
player_id	int
device_id	int
event_date	date
games_played	int

The combination of (player_id, event_date) is the primary key of this table, ensuring each player can only log in once per day.
This table tracks the activity of players in games, where each record represents a player’s login and the number of games played (which could be zero) before logging out on a specific day using a particular device.

Task:

Calculate the fraction of players who logged in again on the day after their first login date. The result should be rounded to two decimal places.

In other words, identify the players who logged in on consecutive days starting from their first login date, then divide that count by the total number of players.

Example Input:

player_id	device_id	event_date	games_played
1	2	2016-03-01	5
1	2	2016-03-02	6
2	3	2017-06-25	1
3	1	2016-03-02	0
3	4	2018-07-03	5

Example Output:

fraction
0.33

Explanation:

Player 1 logged in on 2016-03-01 and 2016-03-02, meaning they logged in for at least two consecutive days, starting from their first login.
Player 2 did not log in again on the day after their first login.
Player 3 did not log in on consecutive days, as there was a gap between their logins.

Thus, only player 1 satisfies the condition, and the fraction is calculated as 1/3 = 0.33.

Go to Leetcode 🔗

SHOW CODE

SELECT 
    ROUND(
        SUM(IF(DATEDIFF(event_date, min_event_date) = 1, 1, 0)) / COUNT(DISTINCT player_id), 2) AS fraction
FROM (
    SELECT 
        player_id,
        event_date,
        MIN(event_date) OVER (PARTITION BY player_id) AS min_event_date
    FROM activity
) AS activity_with_min_date;

SHOW NOTES

The DATEDIFF function in MySQL calculates the difference in days between two dates. For example, DATEDIFF('2024-12-30', '2024-12-25') will output 5, indicating there are 5 days between the two dates.
The PARTITION BY clause is used in window functions to divide the result ser into partitions (groups) based on a specified column. It applies window functions, such as RANK(), SUM(), ROW_NUMBER(), and others, to each partition. This works similarly to GROUP BY, but unlike GROUP BY, the PARTITION BY clause allows retaining the row-level data while applying the window functions.

Input Table: employees

department_id	employee_id	salary
101	1	5000
101	2	4000
101	3	6000
102	4	4500
102	5	5500
102	6	4800

SELECT 
    department_id, 
    employee_id, 
    salary, 
    RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS salary_rank
FROM employees;

Output (Result of Query):

department_id	employee_id	salary	salary_rank
101	3	6000	1
101	1	5000	2
101	2	4000	3
102	5	5500	1
102	6	4800	2
102	4	4500	3

Sorting and Grouping

User Activity for the Past 30 Days I

SHOW PROBLEM

Table: Activity

Column Name	Type
user_id	int
session_id	int
activity_date	date
activity_type	enum

The activity_type column is an ENUM with values: ‘open_session’, ’end_session’, ‘scroll_down’, ‘send_message’.
This table logs user activities for a social media website, where each session is linked to exactly one user.
The table may contain duplicate rows.

Task:

You need to find the count of unique active users per day for the 30-day period ending on 2019-07-27, inclusive. A user is considered active on a particular day if they perform at least one activity on that day.

Example:

Input: Activity table

user_id	session_id	activity_date	activity_type
1	1	2019-07-20	open_session
1	1	2019-07-20	scroll_down
1	1	2019-07-20	end_session
2	4	2019-07-20	open_session
2	4	2019-07-21	send_message
2	4	2019-07-21	end_session
3	2	2019-07-21	open_session
3	2	2019-07-21	send_message
3	2	2019-07-21	end_session
4	3	2019-06-25	open_session
4	3	2019-06-25	end_session

Output:

day	active_users
2019-07-20	2
2019-07-21	2

Explanation:

Only the days with active users (those who performed at least one activity) are included in the output.
For 2019-07-20, users 1 and 2 were active, and for 2019-07-21, users 2 and 3 were active.

Go to Leetcode 🔗

SHOW CODE

SELECT activity_date AS day,
       COUNT(DISTINCT user_id) AS active_users
FROM Activity
WHERE activity_date <= '2019-07-27' AND DATEDIFF('2019-07-27', activity_date) < 30
GROUP BY activity_date

SELECT activity_date AS day,
       COUNT(DISTINCT user_id) AS active_users
FROM Activity
WHERE activity_date <= '2019-07-27' AND activity_date> DATE_SUB('2019-07-27',INTERVAL 30 day)
GROUP BY activity_date

SHOW NOTES

In MySQL, both DATEDIFF and DATE_SUB are used for date calculations. DATEDIFF calculates the difference between two dates, returning the result in days. On the other hand, DATE_SUB is used to subtract a specified time interval (such as days, months, or years) from a given date.

SELECT DATEDIFF('2023-12-31', '2023-01-01'); 
-- Result: 364

SELECT DATE_SUB('2023-12-31', INTERVAL 30 DAY); 
-- Result: '2023-12-01'

SELECT DATE_SUB('2023-12-31', INTERVAL 2 MONTH);
-- Result: '2023-10-31'

Product Sales Analysis III

SHOW PROBLEM

Table: Sales

Column Name	Type
sale_id	int
product_id	int
year	int
quantity	int
price	int

The combination of sale_id and year is the primary key of this table, ensuring each record is unique for a given sale.
product_id is a foreign key referencing the Product table.
Each row in this table represents a sale of a specific product (product_id) in a given year.
The price refers to the price per unit of the product.

Table: Product

Column Name	Type
product_id	int
product_name	varchar

product_id is the primary key of this table, ensuring unique product identifiers.
Each row represents the name of a product associated with the corresponding product_id.

Task: Write a query to select the product_id, year, quantity, and price for the first year in which each product was sold.

Return the result in any order.

Example:

Input: Sales table

sale_id	product_id	year	quantity	price
1	100	2008	10	5000
2	100	2009	12	5000
7	200	2011	15	9000

Product table:

product_id	product_name
100	Nokia
200	Apple
300	Samsung

Output:

product_id	first_year	quantity	price
100	2008	10	5000
200	2011	15	9000

Explanation:

For product_id = 100, the first sale occurred in 2008 with a quantity of 10 and a price of 5000.
For product_id = 200, the first sale occurred in 2011 with a quantity of 15 and a price of 9000.

The query should return the first sale year for each product, along with the corresponding quantity and price.

Go to Leetcode 🔗

SHOW CODE

SELECT
    product_id,
    year AS first_year,
    quantity,
    price
FROM Sales
WHERE
    (product_id, year) IN (
        SELECT
            product_id,
            MIN(year) AS year
        FROM Sales
        GROUP BY product_id
    );

SELECT product_id, first_year, quantity, price
FROM (
    SELECT 
        product_id,
        year AS first_year,
        quantity,
        price,
        RANK() OVER (PARTITION BY product_id ORDER BY year) AS row_num
    FROM Sales
) subquery
WHERE row_num = 1;

SELECT s.product_id,
       s.year AS first_year,
       s.quantity,
       s.price
FROM Sales s
JOIN (
    SELECT product_id, MIN(year) AS first_year
    FROM Sales
    GROUP BY product_id
) first_sale ON s.product_id = first_sale.product_id
             AND s.year = first_sale.first_year;

SHOW NOTES

Biggest Single Number

SHOW PROBLEM

Table: MyNumbers

Column Name	Type
num	int

This table may contain duplicates (i.e., there is no primary key in the SQL table). Each row contains an integer.

Problem Description

A single number is a number that appears only once in the MyNumbers table.

The task is to find the largest single number. If there is no single number, return null.

Example 1:

Input: MyNumbers table:

num
8
8
3
3
1
4
5
6

Output:

num
6

Explanation:
The single numbers are 1, 4, 5, and 6. Since 6 is the largest single number, we return it.

Example 2:

Input: MyNumbers table:

num
8
8
7
7
3
3
3

Output:

num
null

Explanation:
There are no single numbers in the input table, so we return null.

Go to Leetcode 🔗

SHOW CODE

SELECT MAX(num) AS num
FROM
    (
        SELECT num
        FROM MyNumbers
        GROUP BY 1
        HAVING COUNT(1) = 1
    ) AS t;

SELECT
    IF(COUNT(num) = 1, num, null) AS num
FROM MyNumbers
GROUP BY num
ORDER BY 1 DESC
LIMIT 1;

Advanced Select & Joins

The Number of Employees Which Report to Each Employee

SHOW PROBLEM

Table: Employees

Column Name	Type
employee_id	int
name	varchar
reports_to	int
age	int

employee_id is the unique identifier for each employee in the table.
This table stores information about employees and the ID of the manager they report to. Some employees may not report to anyone (reports_to is null).
A manager is defined as an employee who has at least one other employee reporting to them.

Problem Description

You are required to write a solution that reports the following information for each manager:

The manager’s employee_id and name.
The number of employees directly reporting to the manager.
The average age of these employees, rounded to the nearest integer.

The results should be ordered by employee_id.

Example 1:

Input:

Employees table:

employee_id	name	reports_to	age
9	Hercy	null	43
6	Alice	9	41
4	Bob	9	36
2	Winston	null	37

Output:

employee_id	name	reports_count	average_age
9	Hercy	2	39

Explanation:

Hercy manages 2 employees (Alice and Bob).
The average age of Alice and Bob is (41 + 36) / 2 = 38.5, which rounds to 39.

Example 2:

Input:

Employees table:

employee_id	name	reports_to	age
1	Michael	null	45
2	Alice	1	38
3	Bob	1	42
4	Charlie	2	34
5	David	2	40
6	Eve	3	37
7	Frank	null	50
8	Grace	null	48

Output:

employee_id	name	reports_count	average_age
1	Michael	2	40
2	Alice	2	37
3	Bob	1	37

Explanation:

Michael manages Alice and Bob. The average age of Alice and Bob is (38 + 42) / 2 = 40.
Alice manages Charlie and David, with an average age of (34 + 40) / 2 = 37.
Bob manages Eve, with an average age of 37.

Go to Leetcode 🔗

SHOW CODE

SELECT 
    e1.employee_id,
    e1.name, 
    COUNT(e1.employee_id) AS reports_count,
    ROUND(AVG(e2.age)) AS average_age
FROM Employees e1
INNER JOIN Employees e2
ON e1.employee_id = e2.reports_to
GROUP BY e1.employee_id
ORDER BY e1.employee_id;

Primary Department for Each Employee

SHOW PROBLEM

Table: Employee

Column Name	Type
employee_id	int
department_id	int
primary_flag	varchar

The combination of (employee_id, department_id) is the primary key for this table, meaning each employee can belong to multiple departments, but each (employee_id, department_id) pair is unique.
employee_id: The ID of the employee.
department_id: The ID of the department to which the employee belongs.
primary_flag: A flag indicating whether the department is the primary department for the employee. It can be one of the following:
- 'Y': The department is the primary department.
- 'N': The department is not the primary department.

Problem Description

Employees may belong to multiple departments, and when they do, they must designate one department as their primary. If an employee belongs to only one department, the primary_flag for that department will be 'N'.

You are tasked with reporting the primary department for each employee. If an employee has only one department, report that department as their primary.

Input:

Employee table:

employee_id	department_id	primary_flag
1	1	N
2	1	Y
2	2	N
3	3	N
4	2	N
4	3	Y
4	4	N

Output:

employee_id	department_id
1	1
2	1
3	3
4	3

Explanation:

For employee 1, their only department is department 1, so it is reported as their primary department.
Employee 2 belongs to two departments (1 and 2). The primary_flag for department 1 is 'Y', so department 1 is their primary department.
Employee 3 only belongs to department 3, so it is reported as their primary department.
Employee 4 belongs to three departments (2, 3, and 4). The primary_flag for department 3 is 'Y', so department 3 is reported as their primary department.

Go to Leetcode 🔗

SHOW CODE

SELECT employee_id, department_id
FROM Employee
WHERE primary_flag = 'Y'
UNION
SELECT employee_id, department_id
FROM Employee
GROUP BY employee_id
HAVING COUNT(1) = 1;

SHOW NOTES

The UNION operator is used to combine the results of two or more SELECT queries into a single result set. By default, UNION removes duplicate rows, ensuring that the final result contains only distinct records. In contrast, UNION ALL includes all rows, even if they are duplicates. It’s important to note that each SELECT statement involved in a UNION operation must contain the same number of columns, and the corresponding columns must have compatable data types.

Triangle Judgement

SHOW PROBLEM

Table: Triangle

Column Name	Type
x	int
y	int
z	int

Primary Key: (x, y, z)
Each row in this table represents the lengths of three line segments.

Task:
Determine whether the three line segments (x, y, z) from each row can form a triangle.
Return the result table in any order, including a column that specifies if the segments form a triangle.

Triangle Formation Rule:
Three segments can form a triangle if and only if the following conditions are met:

$x + y > z$
$x + z > y$
$y + z > x$

Example:

Input:

**Triangle table: **

x	y	z
13	15	30
10	20	15

Output:

x	y	z	triangle
13	15	30	No
10	20	15	Yes

Go to Leetcode 🔗

SHOW CODE

SELECT
    *,
    IF(x + y > z AND x + z > y AND y + z > x, 'Yes', 'No') AS triangle
FROM
Triangle;

SELECT 
    *, 
    CASE 
        WHEN (x + y > z) AND (x + z > y) AND (y + z > x) THEN 'Yes'
        ELSE 'No'
    END AS triangle
FROM 
Triangle;

Consecutive Numbers

SHOW PROBLEM

Table: Logs

Column Name	Type
id	int
num	varchar

id is the primary key for this table.
The id column is an auto-increment column that starts from 1.

Problem Statement:

Find all numbers (num) that appear at least three times consecutively in the table.

Return the result table in any order.

Example:

Input:

Logs table:

id	num
1	1
2	1
3	1
4	2
5	1
6	2
7	2

Output:

ConsecutiveNums
1

Explanation:

The number 1 is the only value that appears at least three times consecutively in the table.

Go to Leetcode 🔗

SHOW CODE

SELECT DISTINCT l1.num AS ConsecutiveNums
FROM Logs l1
JOIN Logs l2 ON l1.id = l2.id - 1
JOIN Logs l3 ON l2.id = l3.id - 1
WHERE l1.num = l2.num AND l2.num = l3.num;

SELECT DISTINCT l2.num AS ConsecutiveNums
FROM Logs AS l1
JOIN Logs AS l2 ON l1.id = l2.id - 1 AND l1.num = l2.num
JOIN Logs AS l3 ON l2.id = l3.id - 1 AND l2.num = l3.num;

Product Price at a Given Date

SHOW PROBLEM

Table: Products

Column Name	Type
product_id	int
new_price	int
change_date	date

(product_id, change_date) is the primary key (a combination of columns with unique values) of this table.
Each row indicates that the price of a product was changed to a new price on a specific date.

Problem Statement

Write a query to find the prices of all products on 2019-08-16. Assume the price of all products before any change is 10.

Return the result in any order.

Example:

Input:

Products table:

product_id	new_price	change_date
1	20	2019-08-14
2	50	2019-08-14
1	30	2019-08-15
1	35	2019-08-16
2	65	2019-08-17
3	20	2019-08-18

Output:

product_id	price
2	50
1	35
3	10

Go to Leetcode 🔗

SHOW CODE

WITH
    T AS (SELECT DISTINCT product_id FROM Products),
    P AS (
        SELECT product_id, new_price AS price
        FROM Products
        WHERE
            (product_id, change_date) IN (
                SELECT product_id, MAX(change_date) AS change_date
                FROM Products
                WHERE change_date <= '2019-08-16'
                GROUP BY 1
            )
    )

SELECT product_id, IFNULL(price, 10) AS price
FROM T
LEFT JOIN P USING (product_id);

WITH
    P AS (
        SELECT p1.product_id, new_price, change_date
        FROM
            (
                SELECT DISTINCT product_id
                FROM Products
            ) AS p1
            LEFT JOIN Products AS p2
                ON p1.product_id = p2.product_id AND p2.change_date <= '2019-08-16'
    ),
    T AS (
        SELECT
            *,
            RANK() OVER (
                PARTITION BY product_id
                ORDER BY change_date DESC
            ) AS rk
        FROM P
    )
SELECT product_id, IFNULL(new_price, 10) AS price
FROM T
WHERE rk = 1;

SHOW NOTES

Last Person to Fit in the Bus

SHOW PROBLEM

Table: Queue

Column Name	Type
person_id	int
person_name	varchar
weight	int
turn	int

The person_id column contains unique values.
This table holds information about people waiting to board a bus.
The person_id and turn columns contain values from 1 to n, where n is the number of rows in the table.
The turn column determines the order in which people will board the bus. A turn value of 1 indicates the first person to board, and turn = n indicates the last person to board.
The weight column represents the person’s weight in kilograms.

The bus has a weight limit of 1000 kilograms, so some people may not be able to board if the total weight exceeds this limit. The goal is to identify the name of the last person who can board the bus without exceeding the weight limit.

Only one person can board the bus at any given time, based on their turn.

Input:

Queue table:

person_id	person_name	weight	turn
5	Alice	250	1
4	Bob	175	5
3	Alex	350	2
6	John Cena	400	3
1	Winston	500	6
2	Marie	200	4

Output:

person_name
John Cena

Explanation:

The following table shows the order in which people board the bus, ordered by the turn column for simplicity:

Turn	person_id	person_name	weight	Total Weight
1	5	Alice	250	250
2	3	Alex	350	600
3	6	John Cena	400	1000
4	2	Marie	200	1200
5	4	Bob	175	___
6	1	Winston	500	___

John Cena is the last person who can board the bus without exceeding the weight limit of 1000 kilograms.

Go to Leetcode 🔗

SHOW CODE

WITH
    T AS (
        SELECT
            person_name,
            SUM(weight) OVER (ORDER BY turn) AS acumulated_weight
        FROM Queue
    )
SELECT person_name
FROM T
WHERE acumulated_weight <= 1000
ORDER BY acumulated_weight DESC
LIMIT 1;

Count Salary Categories

SHOW PROBLEM

Table: Accounts

Column Name	Type
account_id	int
income	int

The account_id column is the primary key, meaning it contains unique values for each row.
Each row represents the monthly income of a specific bank account.

Task:

Write a solution to calculate the number of bank accounts in each salary category. The salary categories are defined as:

“Low Salary”: Salaries strictly less than $20,000.
“Average Salary”: Salaries in the inclusive range [$20,000, $50,000].
“High Salary”: Salaries strictly greater than $50,000.

The result table should contain all three categories. If no accounts fall into a category, return 0 for that category.

The result can be returned in any order.

Input:

Accounts table:

account_id	income
3	108939
2	12747
8	87709
6	91796

Output:

category	accounts_count
Low Salary	1
Average Salary	0
High Salary	3

Explanation:

Low Salary: Account 2 has an income of $12,747, which is below $20,000.
Average Salary: There are no accounts with an income between $20,000 and $50,000.
High Salary: Accounts 3, 6, and 8 have incomes above $50,000.

Go to Leetcode 🔗

SHOW CODE

WITH
    S AS (
        SELECT 'Low Salary' AS category
        UNION
        SELECT 'Average Salary'
        UNION
        SELECT 'High Salary'
    ),
    T AS (
        SELECT
            CASE
                WHEN income < 20000 THEN 'Low Salary'
                WHEN income > 50000 THEN 'High Salary'
                ELSE 'Average Salary'
            END AS category,
            COUNT(1) AS accounts_count
        FROM Accounts
        GROUP BY category
    )
SELECT category, IFNULL(accounts_count, 0) AS accounts_count
FROM S
LEFT JOIN T USING (category);

Subqueries

Exchange Seats

SHOW PROBLEM

Table: Seat

Column Name	Type
id	int
student	varchar

id is the primary key for this table.
Each row represents a student’s name and their seat ID. The id starts from 1 and increases consecutively.

Problem Description:

Write a query to swap the seat IDs of every two consecutive students in the Seat table. If the number of students is odd, the last student’s seat ID should remain unchanged.

The result should be ordered by the id column in ascending order.

Input:

Seat table:

id	student
1	Abbot
2	Doris
3	Emerson
4	Green
5	Jeames

Output:

id	student
1	Doris
2	Abbot
3	Green
4	Emerson
5	Jeames

Explanation:

The students Abbot and Doris swap places, as well as Emerson and Green.
Since there are an odd number of students, Jeames’s seat remains unchanged.

Go to Leetcode 🔗

SHOW CODE

SELECT 
    CASE
        WHEN id % 2 = 1 AND id < (SELECT MAX(id) FROM Seat) THEN id + 1
        WHEN id % 2 = 0 THEN id - 1
        ELSE id
    END AS id,
    student
FROM Seat
ORDER BY id;

Movie Rating

SHOW PROBLEM

Table: Movies

Column Name	Type
movie_id	int
title	varchar

movie_id is the primary key (column with unique values) for this table. title is the name of the movie.

Table: Users

Column Name	Type
user_id	int
name	varchar

user_id is the primary key (column with unique values) for this table. The column ’name’ has unique values.

Table: MovieRating

Column Name	Type
movie_id	int
user_id	int
rating	int
created_at	date

(movie_id, user_id) is the primary key (column with unique values) for this table. This table contains the rating of a movie by a user in their review. created_at is the user’s review date.

Problem:

Find the name of the user who has rated the greatest number of movies. In case of a tie, return the lexicographically smaller user name.
Find the movie name with the highest average rating in February 2020. In case of a tie, return the lexicographically smaller movie name.

Result format:

The result should contain two columns:

results – containing the user name with the greatest number of movie ratings.
results – containing the movie name with the highest average rating in February 2020.

Input:

Movies table:

movie_id	title
1	Avengers
2	Frozen 2
3	Joker

Users table:

user_id	name
1	Daniel
2	Monica
3	Maria
4	James

MovieRating table:

movie_id	user_id	rating	created_at
1	1	3	2020-01-12
1	2	4	2020-02-11
1	3	2	2020-02-12
1	4	1	2020-01-01
2	1	5	2020-02-17
2	2	2	2020-02-01
2	3	2	2020-03-01
3	1	3	2020-02-22
3	2	4	2020-02-25

Output:

results
Daniel
Frozen 2

Explanation:

Daniel and Monica have rated 3 movies (“Avengers”, “Frozen 2” and “Joker”), but Daniel is lexicographically smaller.
“Frozen 2” and “Joker” both have an average rating of 3.5 in February, but “Frozen 2” is lexicographically smaller.

Go to Leetcode 🔗

SHOW CODE

(
    SELECT name AS results
    FROM
        Users
        JOIN MovieRating USING (user_id)
    GROUP BY user_id
    ORDER BY COUNT(1) DESC, name
    LIMIT 1
)
UNION ALL
(
    SELECT title
    FROM
        MovieRating
        JOIN Movies USING (movie_id)
    WHERE DATE_FORMAT(created_at, '%Y-%m') = '2020-02'
    GROUP BY movie_id
    ORDER BY AVG(rating) DESC, title
    LIMIT 1
);

Index

In MySQL, an index is a database structure that improves the speed of data retrival operations on a table. Indexes are created on one or more columns of a table and allow the database to quickly locate data without scanning the entire table. Indexes are typically implemented using data structures, with the B+ tree being the most common.

There are four main types of indexes in MySQL: primary index, unique index, ordinary index, and full-text index.

A primary index is a unique index that identifies each row in a table. It is automatically created when a primary key is defined.
A unique index ensures that all values in the indexed columns(s) are unique.
An ordinary index improves query performance but does not enforce uniqueness.
A full-text index is used for full-text searches on text-based columns.

SHOW CODE

-- Primary Index
CREATE TABLE employees (
    id INT PRIMARY KEY, -- Primary index
    name VARCHAR(100)
);

-- Unique Index
CREATE UNIQUE INDEX idx_email ON employees(email);

-- Ordinary Index
CREATE INDEX idx_name ON employees(name);

-- Full Text Index
CREATE FULLTEXT INDEX idx_description ON products(description);
SELECT * FROM products
WHERE MATCH(description) AGAINST('database');

Clustered Index vs. Secondary Index

In MySQL, indexes can be categorized into two types: clustered index and secondary index.

A clustered index determines the physical order of data rows in a table. The table data is stored in the order of the clustered index.
A secondary index does not affect the physical order of data rows in a table. Instead, it stores a separate structure that points to the actual data rows.

SHOW CODE

CREATE TABLE employees (
    id INT PRIMARY KEY, -- Clustered index
    name VARCHAR(100),
    salary DECIMAL(10, 2),
    INDEX idx_name (name) -- Secondary index
);

Back-to-Table Queries

SQL Performance Analysis:

SQL Execution Frequency
Slow Query Log
show profiles
explain

Stored Procedure

A stored procedure is a precompiled SQL code block that is stored in a database and can be executed repeatedly. It is particularly useful in scenarios involving complex SQL logic, as it helps streamline operations, improve performance, and enhance code reusability.

SHOW CODE

-- Create a Stored Procedure:
DELIMITER //
CREATE PROCEDURE get_all_users()
BEGIN
    SELECT * FROM users;
END //
DELIMITER ;

-- Call a Procedure**:
CALL get_all_users();

-- List all procedures
show procedure status where db = 'mydatabase';

-- View the specific stored procedure
show create procedure get_all_users;

-- Delete a Stored Procedure:
drop procedure get_all_users;

Variables

Variables in MySQL can be categorized into four types based on their scope: global, session, user-defined, and local.

Global variables and session variables are used to control server behavior.
- Global variables affect all sessions connected to the server.
- Session variables apply only to the current session.
User-defined variables are defined within a session and persist until the session ends.
Local variables are used within stored procedures and have a scope limited to the procedure in which they are declared.

SHOW CODE

# Global Variables
SET GLOBAL max_connections = 200;

-- Session Variables
SET SESSION sql_mode = 'STRICT_TRANS_TABLES';

-- User-Defined Variables
SET @my_var = 100;
SELECT @my_var;

-- Local Variables
DELIMITER //
CREATE PROCEDURE test_proc()
BEGIN
    DECLARE my_local_var INT DEFAULT 10;
    SELECT my_local_var;
END //
DELIMITER ;

Control Flow Statements

MySQL supports various control flow statements, including IF, CASE...WHEN, WHILE, REPEAT, and LOOP.

SHOW CODE

# IF statement
DELIMITER //
CREATE PROCEDURE check_salary(IN emp_id INT)
BEGIN
    DECLARE emp_salary INT;
    SELECT salary INTO emp_salary FROM employees WHERE id = emp_id;
    
    IF emp_salary > 5000 THEN
        SELECT 'High Salary';
    ELSEIF emp_salary BETWEEN 3000 AND 5000 THEN
        SELECT 'Medium Salary';
    ELSE
        SELECT 'Low Salary';
    END IF;
END //
DELIMITER ;

-- CASE WHEN statement
DELIMITER //
CREATE PROCEDURE check_grade(IN marks INT)
BEGIN
    DECLARE grade CHAR(1);
    
    CASE 
        WHEN marks >= 90 THEN SET grade = 'A';
        WHEN marks >= 80 THEN SET grade = 'B';
        WHEN marks >= 70 THEN SET grade = 'C';
        ELSE SET grade = 'F';
    END CASE;
    
    SELECT grade;
END //
DELIMITER ;

-- WHILE
DELIMITER //
CREATE PROCEDURE count_down()
BEGIN
    DECLARE x INT DEFAULT 5;
    
    WHILE x > 0 DO
        SELECT x;
        SET x = x - 1;
    END WHILE;
END //
DELIMITER ;

-- REPEAT (similar to do...while in Java)
DELIMITER //
CREATE PROCEDURE repeat_example()
BEGIN
    DECLARE x INT DEFAULT 1;

    REPEAT
        SELECT x;
        SET x = x + 1;
    UNTIL x > 5
    END REPEAT;
END //
DELIMITER ;

-- LOOP (Use an explicit LEAVE to exit)
DELIMITER //
CREATE PROCEDURE loop_example()
BEGIN
    DECLARE x INT DEFAULT 1;
    
    my_loop: LOOP
        SELECT x;
        SET x = x + 1;
        
        IF x > 5 THEN
            LEAVE my_loop;
        END IF;
    END LOOP;
END //
DELIMITER ;

Curosor and Handler

In MySQL, a cursor is a database object that enables row-by-row processing of a result set. On the other hand, a handler is used to manage exceptions or errors that may occur during execution. Cursors and handlers are commonly used in stored procedures when dealing with result sets that contain multiple rows and require row-by-row processing.

SHOW CODE

DELIMITER //
CREATE PROCEDURE fetch_salaries()
BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE emp_name VARCHAR(100);
    DECLARE emp_salary INT;

    DECLARE emp_cursor CURSOR FOR SELECT name, salary FROM employees;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    OPEN emp_cursor;

    read_loop: LOOP
        FETCH emp_cursor INTO emp_name, emp_salary;
        IF done THEN
            LEAVE read_loop;
        END IF;

        SELECT emp_name, emp_salary;
    END LOOP;

    CLOSE emp_cursor;
END //
DELIMITER ;

Trigger

In MySQL, a trigger is a data object that automatically executes a specified set of sql statements before or after certain events occurred, such as INSERT, UPDATE, or DELETE. It is commonly used in audit logging. MySQL only supports row-level trigger, meaning a trigger is activated FOR EACH ROW affected by INSERT, UPDATE, or DELETE operation.

SHOW CODE

-- Create a trigger
CREATE TRIGGER before_employee_update
BEFORE UPDATE ON employees
FOR EACH ROW
BEGIN
    INSERT INTO audit_log (employee_id, action, change_date)
    VALUES (OLD.id, 'UPDATE', NOW());
END;

-- View triggers
SHOW TRIGGERS;
SHOW TRIGGERS FROM database_name;

-- Delete a trigger
DROP TRIGGER IF EXISTS before_employee_update;

Locking

In MySQL, locking is a mechanism used to control database access, ensuring data integrity and consistency when multiple sessions interact with the databases. MySQL supports three main types of locks: global-level locks, table-level locks, and low-level locks.

Global-Level Lock

The global-level lock is a coarse-grained lock that blocks the entire database. It is commonly used during operations like data backups. Due to its coarse granularity, any updates to the database are blocked while the lock is active. Additionally, when the global lock is in place, secondary databases cannot read the binary log (binlog) from the primary database, leading to replication delays between the primary and secondary databases.

For InnoDB tables, it is generally recommended to use the --single-transaction option during data backups. This approach ensures data consistency without acquiring a global read lock, allowing updates to continue uninterrupted.

SHOW CODE

-- From mysql client
FLUSH TABLES WITH READ LOCK;

-- From bash
mysqldump -u username -p --single-transaction database_name > backup.sql

UNLOCK TABLES;

Table-Level Lock

The table-level lock is a mechanism that locks an entire table to control database access when multiple sessions interact with the database. There are three main types of table-level locks: table locks, metadata locks, and intention locks.

Table locks include read locks and write locks:
- When a session acquires a read lock, other sessions can also read data but cannot modify it.
- When a session acquire a write lock, other sessions cannot read or write data from the locked table.
Metadata lock is a type of lock that protects the structure of a database object when it is being accessed or modified by a session. It it automatically managed by MySQL. There are two main types of metadata locks: shared metadata locks and exclusive metadata locks.
- Shared metadata locks include SHARED_READ and SHARED_WRITE locks. These are acquired by queries, modification, or transactions.
  - A SHARED_READ lock allows other sessions to read the locked data.
  - A SHARED_WRITE lock allows other sessions to read but not modify the locked data.
- The exclusive metadata lock is acquired by operations that modify the table’s structure, such as ALTER TABLE. It prevents other sessions from accessing the table until the operation completes.
An intention lock is a type of lock used by the InnoDB storage engine to indicate a transaction’s intention to acquire row-level locks at a finer granularity. It works in conjunction with row-level locks, helping to avoid confilicts between transactions that might lock the same table at different levels (e.g., table-level vs. row-level locks). There are two main types of intention locks: Intention Shared Lock (IS) and Intention Exclusive Lock (IX).
- The Intention Shared Lock (IS) indicates that a transaction intends to place shared wor-level locks on some rows in the table. It allows other transactions to acquire IS locks or table-level shared locks but blocks table-level exclusive locks.
- The Intention Exclusive Lock (IX) indicates that a transaction intends to place exlcusive row-level locks on some rows in the table. It allows other transactions to acquire IS locks but blocks table-level shared locks and table-level exclusive locks.
  
  Lock Type IS IX S X
  
  IS ✅ ✅ ✅ ❌
  
  IX ✅ ✅ ❌ ❌
  
  S (Shared) ✅ ❌ ✅ ❌
  
  X (Exclusive) ❌ ❌ ❌ ❌

SHOW CODE

--- TABLE LOCK ---
-- Read lock
LOCK TABLES sales READ;
SELECT * FROM sales WHERE sale_date = '2023-10-01';
UNLOCK TABLES;

-- Write lock
LOCK TABLES products WRITE;
UPDATE products SET price = price * 1.1; 
UNLOCK TABLES;

--- META DATA LOCK ---
-- Session 1 starts a transaction and reads from a table
START TRANSACTION;
SELECT * FROM employees;
-- Session 2 attempts to alter the table
ALTER TABLE employees ADD COLUMN salary DECIMAL(10, 2);
-- Session 2 will block until Session 1 releases its shared metadata lock

-- Session 1 starts an `ALTER TABLE` operation
ALTER TABLE employees ADD COLUMN department VARCHAR(50);
-- session 2 attempts to query the table
SELECT * FROM employees;
-- Session 2 will block until Session 1 comoletes the `ALTER TABLE` operation

--- INTENSION LOCK ---
-- Transaction A
-- Transaction A acquires an Intention Shared Lock (IS) on the employees table.
-- It then acquires a Shared Row-Level Lock (S) on the row with id = 1.
START TRANSACTION;
SELECT * FROM employees WHERE id = 1 LOCK IN SHARE MODE;

-- Transaction B
-- Transaction B acquires an Intention Exclusive Lock (IX) on the employees table.
-- It then acquires an Exclusive Row-Level Lock (X) on the row with id = 2.
START TRANSACTION;
UPDATE employees SET salary = 65000 WHERE id = 2;

-- Transaction C
-- Transaction C attempts to acquire an Exclusive Table-Level Lock (X) on the employees table.
-- It will block until Transaction A and Transaction B release their locks.
ALTER TABLE employees ADD COLUMN department VARCHAR(50);

Row-Level Lock

In MySQL, a row-level lock is a fine-grained locking mechanism that locks individual rows instead of the entire table. There are two main types of row-level locks: Shared Lock and Exclusive Lock.

A Shared Lock allows other transactions to read the same row but prevents them from modifying it.
An **Exclusive Lock prevents other transactions from reading or modifying the same row until the lock is released.

Row-level locks are supported by the InnoDB storage engine to ensure data consistency and integrity. They are automatically acquired and released based on the transaction’s operations.

SHOW CODE

-- Transaction A reads a row with a shared lock
START TRANSACTION;
SELECT * FROM employees WHERE id = 1 LOCK IN SHARE MODE;

-- Transaction B can also reads the same row
START TRANSACTION;
SELECT * FROM employees WHERE id = 1 LOCK IN SHARE MODE;

-- Transaction C cannot modify the same row until the lock is released
START TRANSACTION;
UPDATE employees SET salary = 55000 WHERE id = 1; -- Blocks

-- Transaction A updates a row with an exclusive lock
START TRANSACTION;
UPDATE employees SET salary = 55000 WHERE id = 1;

-- Transaction B cannot read or modify the same row until the lock is released
START TRANSACTION;
SELECT * FROM employees WHERE id = 1 LOCK IN SHARE MODE; -- Blocks

InnoDB provides additional locking mechanisms, such as gap locks and next-key locks, to ensure data consistency and integrity and prevent issues like phantom reads. By default, InnoDB operates at the REPEATABLE READ isolation level, where it uses next-key locks for searches and index scans to avoid phantom reads.

A gap lock locks the intervals between rows in an index, preventing new rows from being inserted into those gaps. It does not lock the rows themselves but rather the gaps between them. The primary purpose of a gap lock is to prevent other transactions from inseting data into the blocked gap. Gap locks are coexistent, meaning they do not block other transactions from locking the same gap.
A next-key lock is a combination of a row lock and a gap lock. It locks both the row and the gap before it.

SHOW CODE

| id  | name      | salary  |
|-----|-----------|---------|
| 1   | John Doe  | 50000   |
| 3   | Jane Smith| 60000   |
| 5   | Alice Lee | 55000   |

SELECT * FROM employees WHERE id BETWEEN 2 AND 4 FOR UPDATE;
-- The gap lock will be placed on the interval (1, 3) and (3, 5)

SELECT * FROM employees WHERE id = 3 FOR UPDATE;
-- The next-key lock will be placed on the row with id = 3 and the gap (1, 3)

Lock Type	IS	IX	S	X
IS	✅	✅	✅	❌
IX	✅	✅	❌	❌
S (Shared)	✅	❌	✅	❌
X (Exclusive)	❌	❌	❌	❌

Introduction to DBMS#

MySQL#

SQL Commands#

SQL Query Execution Order#

Select#

Find Customer Referee#

Article Views I#

Baisc Joins#

LEFT JOIN: Replace Employee ID With The Unique Identifier#

LEFT JOIN: Confirmation Rate#

INNER JOIN: Product Sales Analysis I#

LEFT JOIN: Customer Who Visited but Did Not Make Any Transactions#

SELF JOIN: Rising Temperature#

SELF JOIN: Managers with at Least 5 Direct Reports#

SELF JOIN: Average Time of Process per Machine#

CROSS JOIN: Students and Examinations#

Basic Aggregate Functions#

Average Selling Price#

Percentage of Users Attended a Contest#

Queries Quality and Percentage#

Monthly Transactions I#

Immediate Food Delivery II#

Game Play Analysis IV#

Sorting and Grouping#

User Activity for the Past 30 Days I#

Product Sales Analysis III#

Biggest Single Number#

Advanced Select & Joins#

The Number of Employees Which Report to Each Employee#

Primary Department for Each Employee#

Triangle Judgement#

Consecutive Numbers#

Product Price at a Given Date#

Last Person to Fit in the Bus#

Count Salary Categories#

Subqueries#

Exchange Seats#

Movie Rating#

Index#

Clustered Index vs. Secondary Index#

Back-to-Table Queries#

Stored Procedure#

Variables#

Control Flow Statements#

Curosor and Handler#

Trigger#

Locking#

Global-Level Lock#

Table-Level Lock#

Row-Level Lock#

Introduction to DBMS

MySQL

SQL Commands

SQL Query Execution Order

Select

Find Customer Referee

Article Views I

Baisc Joins

LEFT JOIN: Replace Employee ID With The Unique Identifier

LEFT JOIN: Confirmation Rate

INNER JOIN: Product Sales Analysis I

LEFT JOIN: Customer Who Visited but Did Not Make Any Transactions

SELF JOIN: Rising Temperature

SELF JOIN: Managers with at Least 5 Direct Reports

SELF JOIN: Average Time of Process per Machine

CROSS JOIN: Students and Examinations

Basic Aggregate Functions

Average Selling Price

Percentage of Users Attended a Contest

Queries Quality and Percentage

Monthly Transactions I

Immediate Food Delivery II

Game Play Analysis IV

Sorting and Grouping

User Activity for the Past 30 Days I

Product Sales Analysis III

Biggest Single Number

Advanced Select & Joins

The Number of Employees Which Report to Each Employee

Primary Department for Each Employee

Triangle Judgement

Consecutive Numbers

Product Price at a Given Date

Last Person to Fit in the Bus

Count Salary Categories

Subqueries

Exchange Seats

Movie Rating

Index

Clustered Index vs. Secondary Index

Back-to-Table Queries

Stored Procedure

Variables

Control Flow Statements

Curosor and Handler

Trigger

Locking

Global-Level Lock

Table-Level Lock

Row-Level Lock