A Database Management System (DBMS) is software that provides an interface for users to interact with databases. It enables the management, storage, and manipulation of data within databases. DBMS can be broadly classified into two main types: Relational DBMS and NoSQL DBMS.
In a Relational DBMS, data is stored in structured tables and is manipulated using SQL (Structured Query Language). Examples of relational DBMS inlcude MySQL, Oracle, and PostgreSQL.
In a NoSQL DBMS, data is typically unstructured or semi-structured, offering flexibility to handle diverse data types. Examples of NoSQL DBMS include MongoDB and Redis.
MySQL is a relational database management system (RDBMS) that uses Structured Query Language (SQL) to interact with databases. It stores data in tables made up of rows and columns, allowing for efficient organization, retrival, and manipulation of data.
SQL Commands are used to interact with relational databases. These commands can be classified into five types: DDL, DQL, DML, DCL, and TCL.
SQL Commands
DDL (Data Definition Language): Used to define and manage the structure of database instances, such as creating, altering, and deleting tables, and defining constraints.
DQL (Data Query Language): Used to retrive data from the database, primarily through the SELECT statement.
DML (Data Manipulation Language): Used to manipulate data within tables, such as inserting, updating, and deleting records.
DCL (Data Control Language): Used to control access to data, manage user priviledges, and ensure data security. It includes commands like GRANT and REVOKE.
TCL (Transaction Control Language): Used to manage transactions, ensuring data consistency. It includes commands like COMMIT, ROLLBACK, and SAVEPOINT.
MySQL Execution Order: In MySQL, the execution order of an SQL query differs from the order in which components (e.g., SELECT, FROM, WHERE) are written in the query. MySQL begins by identifying the tables or views specified in the FROM clause. If there are any JOIN operation, MySQL performs the joins first and applies the ON condition. After completing the join operations, MySQL filters the based on the conditions specified in the WHERE clause.
If a GROUP BY clause is present, MySQL groups the rows based on the specified column(s) and then executes any aggregation functions (e.g., SUM(), COUNT(), etc.). Following the grouping, if a HAVING clause is provided, MySQL further filters the grouped results based on the specified condition.
After filtering grouped results, MySQL selects the specified columns in the SELECT clause. If the DISTINCT keyword is used, MySQL removes duplicate rows from the selected result. Once the rows are selected and duplicates are removed, MySQL sorts the result set according to the rules defined in the ORDER BY clause.
Finally, if a LIMIT or OFFSET clause is included, MySQL limits the number of rows returned, starting from the specified OFFSET.
Each row represents a customer, their name, and the id of the customer who referred them. If a customer wasn’t referred by anyone, the referee_id is NULL.
Problem:
Find the names of customers who were not referred by the customer with id = 2.
SELECT name FROM Customer
WHERE referee_id != 2 OR referee_id IS NULL;
SHOW NOTES
In SQL, NULL is not equal to any value, including 2. Therefore, the condition referee_id != 2 will exclude records where referee_id is NULL, because comparisons with NULL always result in an unknown NULL value, which is neither TRUE nor FALSE. It is crucial to consider the handling of NULL values when working with databases.
SELECT DISTINCT author_id AS id FROM Views
WHERE author_id = viewer_id
ORDER BY id ASC;
SHOW NOTES
In SQL, the DISTINCT keyword is used to remove duplicate records from the result set, ensuring that only unique values are returned. It is commonly used in scenarios such as aggregating distinct values and ensuring uniqueness in join queries.
# Aggregating uniquevaluesSELECTDISTINCT product_id
FROM Order_Items;
# Ensuring unique results injoin queries
SELECTDISTINCTc.customer_id, c.name
FROM Customers cJOIN Orders o ONc.customer_id = o.customer_id;
LEFT JOIN: Replace Employee ID With The Unique Identifier#
In MySQL, LEFT JOIN is used to combine rows from two or more tables based on a related column between them. It returns all rows from the left table and the matching rows from the right table. If there is no match, the result will include NULL values for the column from the right table.
SHOW PROBLEM
Tables:
Employees
Column Name
Type
id
int
name
varchar
id is the primary key for this table.
Each row represents an employee in the company, containing their id and name.
EmployeeUNI
Column Name
Type
id
int
unique_id
int
(id, unique_id) is the primary key, meaning the combination of id and unique_id is unique across the table.
Each row represents an employee’s id and their corresponding unique_id in the company.
Task:
Write a SQL query that returns the unique_id of each employee from the Employees table. If an employee does not have a corresponding unique_id, return NULL for that employee.
Query Requirements:
If an employee has a unique_id, show it.
If an employee does not have a unique_id, return NULL.
The result should display the unique_id and the name of each employee.
The results can be returned in any order.
Example:
Input:
Employees table:
id
name
1
Alice
7
Bob
11
Meir
90
Winston
3
Jonathan
EmployeeUNI table:
id
unique_id
3
1
11
2
90
3
Output:
unique_id
name
NULL
Alice
NULL
Bob
2
Meir
3
Winston
1
Jonathan
Explanation:
Alice and Bob do not have a unique_id, so NULL is shown for both.
Each row represents the signup time for the user identified by user_id.
Table: Confirmations
Column Name
Type
user_id
int
time_stamp
datetime
action
ENUM
The primary key is a combination of user_id and time_stamp.
user_id is a foreign key referencing the Signups table.
action is an ENUM with two possible values: 'confirmed' or 'timeout'.
Each row represents a confirmation request made by the user with user_id, indicating whether the confirmation message was confirmed ('confirmed') or timed out ('timeout').
Problem Description
The confirmation rate of a user is calculated as the ratio of 'confirmed' actions to the total number of confirmation requests. If a user has not made any confirmation requests, the confirmation rate is 0. The rate should be rounded to two decimal places.
Goal
Write a SQL query to find the confirmation rate for each user.
Return the result in any order.
Input
Signups table:
user_id
time_stamp
3
2020-03-21 10:16:13
7
2020-01-04 13:57:59
2
2020-07-29 23:09:44
6
2020-12-09 10:39:37
Confirmations table:
user_id
time_stamp
action
3
2021-01-06 03:30:46
timeout
3
2021-07-14 14:00:00
timeout
7
2021-06-12 11:57:29
confirmed
7
2021-06-13 12:58:28
confirmed
7
2021-06-14 13:59:27
confirmed
2
2021-01-22 00:00:00
confirmed
2
2021-02-28 23:59:59
timeout
Output
user_id
confirmation_rate
6
0.00
3
0.00
7
1.00
2
0.50
Explanation:
User 6 did not request any confirmation messages, so their confirmation rate is 0.
User 3 made two confirmation requests, but both timed out, so their confirmation rate is 0.
User 7 made three requests, and all were confirmed, so their confirmation rate is 1.00.
User 2 made two requests: one confirmed and one timed out, so their confirmation rate is 1/2 = 0.50.
SELECT
user_id,
ROUND(IFNULL(SUM(action = 'confirmed') / COUNT(1), 0), 2) AS confirmation_rate
FROM
SignUps
LEFT JOIN Confirmations USING (user_id)
GROUP BY 1;
SELECT s.user_id,
CASE WHEN c.user_id IS NULL THEN 0.00
ELSE ROUND(SUM(CASE WHEN action = 'confirmed' THEN 1 ELSE 0 END) / COUNT(*),2)
END
AS confirmation_rate
FROM Signups s
LEFT JOIN Confirmations c ON c.user_id = s.user_id
GROUP BY s.user_id
SHOW NOTES
SUM(action = 'confirmed'): action = 'confirmed' returns 1 for confirmed actions and 0 for 'timeout' action.
COUNT(1): Counts the total number of rows. (Use COUNT(*) for counting rows is more clear and idiomatic).
LEFT JOIN Confirmations USING (user_id): Joins the Signups table with the Confirmations table on the user_id column.
GROUP BY 1: A shorthand for GROUP BY user_id. It groups the results by the first column in the SELECT list, which is user_id in this case.
In MySQL, the INNER JOIN is used to combine rows from two or more tables based on a related column between them. It only returns rows where there is a match in all the joined tables.
SHOW PROBLEM
Tables:
Sales
Column Name
Type
sale_id
int
product_id
int
year
int
quantity
int
price
int
(sale_id, year) is the primary key, meaning this combination of columns is unique.
product_id is a foreign key referencing the Product table.
Each row represents a sale of a product identified by product_id in a particular year.
The price is per unit of the product.
Product
Column Name
Type
product_id
int
product_name
varchar
product_id is the primary key for this table.
Each row represents a product with its product_name.
Task:
Write a SQL query to report the product_name, year, and price for each sale in the Sales table.
Query Requirements:
Return the product_name from the Product table, corresponding to the product_id in the Sales table.
Include the year and price from the Sales table for each sale.
The result can be returned in any order.
Example:
Input:
Sales table:
sale_id
product_id
year
quantity
price
1
100
2008
10
5000
2
100
2009
12
5000
7
200
2011
15
9000
Product table:
product_id
product_name
100
Nokia
200
Apple
300
Samsung
Output:
product_name
year
price
Nokia
2008
5000
Nokia
2009
5000
Apple
2011
9000
Explanation:
From sale_id = 1, we can conclude that Nokia was sold for 5000 in 2008.
From sale_id = 2, we can conclude that Nokia was sold for 5000 in 2009.
From sale_id = 7, we can conclude that Apple was sold for 9000 in 2011.
SELECT v.customer_id, COUNT(v.visit_id) as count_no_trans
FROM Visits v
LEFT JOIN Transactions t
ON v.visit_id = t.visit_id
WHERE t.transaction_id IS NULL
GROUP BY v.customer_id;
SHOW NOTES
In the above SQL query, MySQL first identifies the Visits table and performs a LEFT JOIN with the Transactions table based on the condition v.visit_id = t.visit_id. The LEFT JOIN operation combines the two tables and fills NULL in rows where there is no match in the Transactions table. The result looks like this:
SELECT*FROM Visits v
LEFTJOIN Transactions t
ON v.visit_id = t.visit_id;
visit_id
customer_id
transaction_id
visit_id
amount
1
23
12
1
910
2
9
13
2
970
4
30
null
null
null
5
54
9
5
200
5
54
3
5
300
5
54
2
5
310
6
96
null
null
null
7
54
null
null
null
8
54
null
null
null
After joining the tables, MySQL applies the WHERE clause to filter rows where t.transaction_id IS NULL, which means visits without any transactions.
SELECT*FROM Visits v
LEFTJOIN Transactions t
ON v.visit_id = t.visit_id
WHERE t.transaction_id ISNULL;
visit_id
customer_id
transaction_id
visit_id
amount
4
30
null
null
null
6
96
null
null
null
7
54
null
null
null
8
54
null
null
null
Next, MySQL groups the filtered rows by customer_id and calculates the number of visits per customer using the COUNT(v.visit_id) function.
SELECT*, COUNT(v.visit_id)
FROM Visits v
LEFTJOIN Transactions t
ON v.visit_id = t.visit_id
WHERE t.transaction_id ISNULLGROUPBY v.customer_id;
visit_id
customer_id
transaction_id
visit_id
amount
COUNT(v.visit_id)
4
30
null
null
null
1
6
96
null
null
null
1
7
54
null
null
null
2
Finally, MySQL selects the desired columns and renames the COUNT(v.visit_id) column using the AS clause.
SELECT v.customer_id, COUNT(v.visit_id) AS count_no_trans
FROM Visits v
LEFTJOIN Transactions t
ON v.visit_id = t.visit_id
WHERE t.transaction_id ISNULLGROUPBY v.customer_id;
In MySQL, a self join is an operation where a table joins itself. It is commonly used when comparing rows within the same table based on a specific condition or when working with hierarchical data, such as relating employees to their managers.
SHOW PROBLEM
Table: Weather
Column Name
Type
id
int
recordDate
date
temperature
int
id is a column with unique values.
There are no duplicate rows for the same recordDate.
This table contains information about the temperature on specific dates.
Problem Statement
Write a solution to find the id of all dates where the temperature is higher compared to the previous day (yesterday).
Return
Return the result table in any order.
Input:
Weather Table:
id
recordDate
temperature
1
2015-01-01
10
2
2015-01-02
25
3
2015-01-03
20
4
2015-01-04
30
Output:
id
2
4
Explanation:
On 2015-01-02, the temperature was higher than the previous day (10 -> 25).
On 2015-01-04, the temperature was higher than the previous day (20 -> 30).
SELECT w1.id
FROM Weather w1
JOIN Weather w2
ON DATE_ADD(w2.recordDate, INTERVAL 1 DAY) = w1.recordDate
WHERE w1.temperature > w2.temperature;
SHOW NOTES
In the above query, the DATE_ADD(w2.recordDate, INTERVAL 1 DAY) = w1.recordDate condition ensures that the recordDate of w1 matches the day following w2’s recordDate.
The table records user activities for machines on a factory website.
The combination of (machine_id, process_id, activity_type) is the primary key, ensuring uniqueness in the table.
machine_id is the ID of the machine.
process_id is the ID of the process running on the machine with ID machine_id.
activity_type is an ENUM type with values 'start' and 'end', representing the start and end of a process.
timestamp is a float value representing the time (in seconds) the event occurred.
The 'start' timestamp is always earlier than the 'end' timestamp for each (machine_id, process_id) pair.
It is guaranteed that each (machine_id, process_id) pair has a corresponding 'start' and 'end' timestamp.
Problem:
There are several machines on a factory website, and each machine runs the same number of processes. Your task is to write a SQL query that calculates the average time each machine takes to complete a process.
The time to complete a process is the difference between the 'end' timestamp and the 'start' timestamp.
The average time for each machine is calculated by dividing the total time for all processes on that machine by the number of processes.
The result should contain the following columns:
machine_id โ the ID of the machine.
processing_time โ the average processing time, rounded to 3 decimal places.
SELECT machine_id,
ROUND(AVG(end_time - start_time), 3) AS processing_time
FROM (
SELECT a.machine_id,
a.process_id,
a.timestamp AS start_time,
b.timestamp AS end_time
FROM Activity a
JOIN Activity b ON a.machine_id = b.machine_id
AND a.process_id = b.process_id
AND a.activity_type = 'start'
AND b.activity_type = 'end'
) AS process_times
GROUP BY machine_id;
SHOW NOTES
In the following SQL query, the condition a.activity_type = 'start' and b.activity_type = 'end' ensures that the start and end timestamp are correctly matched for the same process on the same machine.
SELECT*FROM Activity a
JOIN Activity b ON a.machine_id = b.machine_id
AND a.process_id = b.process_id
AND a.activity_type ='start'AND b.activity_type ='end'
Result of the Join:
machine_id
process_id
activity_type
timestamp
machine_id
process_id
activity_type
timestamp
0
0
start
0.712
0
0
end
1.52
0
1
start
3.14
0
1
end
4.12
1
0
start
0.55
1
0
end
1.55
1
1
start
0.43
1
1
end
1.42
2
0
start
4.1
2
0
end
4.512
2
1
start
2.5
2
1
end
5
The result is selected from of the joined result:
SELECT machine_id,
ROUND(AVG(end_time - start_time), 3) AS processing_time
FROM (
SELECT a.machine_id,
a.process_id,
a.timestampAS start_time,
b.timestampAS end_time
FROM Activity a
JOIN Activity b ON a.machine_id = b.machine_id
AND a.process_id = b.process_id
AND a.activity_type ='start'AND b.activity_type ='end') AS process_times
GROUPBY machine_id;
In MySQL, a CROSS JOIN is a type of join that returns the Cartesian product of two tables. It combines each row from the first table with every row from the second table, without the need for a join condition. For example, consider two tables, Students and Subjects:
Table 1: Students
student_id
student_name
1
Alice
2
Bob
Table 2: Subjects
subject_name
Math
Physics
When a Cross Join is performed between these two tables, the result will be:
Table: Result of Cross Join
student_id
student_name
subject_name
1
Alice
Math
1
Alice
Physics
2
Bob
Math
2
Bob
Physics
SHOW PROBLEM
Table: Students
Column Name
Type
student_id
int
student_name
varchar
student_id is the primary key (unique values) for this table.
Each row represents a student with their unique ID and name.
Table: Subjects
Column Name
Type
subject_name
varchar
subject_name is the primary key (unique values) for this table.
Each row represents the name of a subject offered in the school.
Table: Examinations
Column Name
Type
student_id
int
subject_name
varchar
There is no primary key for this table, and it may contain duplicates.
Each row indicates that a student with student_id attended the exam for the subject subject_name.
Every student takes every course from the Subjects table.
Task:
Write a query to find the number of times each student attended each exam, ordered by student_id and subject_name.
Input:
Students Table:
student_id
student_name
1
Alice
2
Bob
13
John
6
Alex
Subjects Table:
subject_name
Math
Physics
Programming
Examinations Table:
student_id
subject_name
1
Math
1
Physics
1
Programming
2
Programming
1
Physics
1
Math
13
Math
13
Programming
13
Physics
2
Math
1
Math
Output:
student_id
student_name
subject_name
attended_exams
1
Alice
Math
3
1
Alice
Physics
2
1
Alice
Programming
1
2
Bob
Math
1
2
Bob
Physics
0
2
Bob
Programming
1
6
Alex
Math
0
6
Alex
Physics
0
6
Alex
Programming
0
13
John
Math
1
13
John
Physics
1
13
John
Programming
1
Explanation:
The result table contains all students and all subjects.
Alice attended the Math exam 3 times, the Physics exam 2 times, and the Programming exam 1 time.
Bob attended the Math exam 1 time, the Programming exam 1 time, and did not attend the Physics exam.
Alex did not attend any exams.
John attended the Math exam 1 time, the Physics exam 1 time, and the Programming exam 1 time.
SELECT
s.student_id,
s.student_name,
sub.subject_name,
COUNT(e.student_id) AS attended_exams
FROM
Students s
CROSS JOIN
Subjects sub
LEFT JOIN
Examinations e
ON s.student_id = e.student_id AND sub.subject_name = e.subject_name
GROUP BY
s.student_id,
s.student_name,
sub.subject_name
ORDER BY
s.student_id,
sub.subject_name;
SHOW NOTES
Table: Students
student_id
student_name
1
Alice
2
Bob
13
John
6
Alex
Table: Subjects:
subject_name
Math
Physics
Programming
When a CROSS JOIN is performed between the Students and Subjects tables, the result will be:
SELECT*FROM Students s
CROSSJOIN Subjects sub
Table: Result of Cross Join:
student_id
student_name
subject_name
1
Alice
Programming
1
Alice
Physics
1
Alice
Math
2
Bob
Programming
2
Bob
Physics
2
Bob
Math
13
John
Programming
13
John
Physics
13
John
Math
6
Alex
Programming
6
Alex
Physics
6
Alex
Math
Table: Examniations:
student_id
subject_name
1
Math
1
Physics
1
Programming
2
Programming
1
Physics
1
Math
13
Math
13
Programming
13
Physics
2
Math
1
Math
Perform a LEFT JOIN on the cross-joined tables with the Examinations table:
SELECT*FROM Students s
CROSSJOIN Subjects sub
LEFTJOIN Examinations e
ON s.student_id = e.student_id AND sub.subject_name = e.subject_name
Table: Left Join Result:
student_id
student_name
subject_name
student_id
subject_name
1
Alice
Programming
1
Programming
1
Alice
Physics
1
Physics
1
Alice
Physics
1
Physics
1
Alice
Math
1
Math
1
Alice
Math
1
Math
1
Alice
Math
1
Math
2
Bob
Programming
2
Programming
2
Bob
Physics
null
null
2
Bob
Math
2
Math
13
John
Programming
13
Programming
13
John
Physics
13
Physics
13
John
Math
13
Math
6
Alex
Programming
null
null
6
Alex
Physics
null
null
6
Alex
Math
null
null
After grouping the left joined result and count the e.studentId, the table will look like:
SELECT*, COUNT(e.student_id) AS attended_exams
FROM Students s
CROSSJOIN Subjects sub
LEFTJOIN Examinations e
ON s.student_id = e.student_id AND sub.subject_name = e.subject_name
GROUPBY s.student_id,
s.student_name,
sub.subject_name
Table: Grouped Result with Count:
student_id
student_name
subject_name
student_id
subject_name
attended_exams
1
Alice
Programming
1
Programming
1
1
Alice
Physics
1
Physics
2
1
Alice
Math
1
Math
3
2
Bob
Programming
2
Programming
1
2
Bob
Physics
null
null
0
2
Bob
Math
2
Math
1
13
John
Programming
13
Programming
1
13
John
Physics
13
Physics
1
13
John
Math
13
Math
1
6
Alex
Programming
null
null
0
6
Alex
Physics
null
null
0
6
Alex
Math
null
null
0
After selecting the specified columns, the result table becomes:
SELECT s.student_id,
s.student_name,
sub.subject_name,
COUNT(e.student_id) AS attended_exams
FROM Students s
CROSSJOIN Subjects sub
LEFTJOIN Examinations e
ON s.student_id = e.student_id AND sub.subject_name = e.subject_name
GROUPBY s.student_id,
s.student_name,
sub.subject_name
student_id
student_name
subject_name
attended_exams
1
Alice
Programming
1
1
Alice
Physics
2
1
Alice
Math
3
2
Bob
Programming
1
2
Bob
Physics
0
2
Bob
Math
1
13
John
Programming
1
13
John
Physics
1
13
John
Math
1
6
Alex
Programming
0
6
Alex
Physics
0
6
Alex
Math
0
Sort the selcted table based on the condition s.student_id and sub.subject_name, return the final result:
The primary key for this table is the combination of (product_id, start_date, end_date), ensuring unique periods for each product.
Each row represents the price of a product for a specific period, from start_date to end_date.
For each product, no two periods will overlap.
Table: UnitsSold
Column Name
Type
product_id
int
purchase_date
date
units
int
This table may contain duplicate rows.
Each row represents the date, number of units, and product_id for a product sold on that date.
Problem:
Write a SQL query to find the average selling price for each product. The average_price should be rounded to two decimal places. If a product does not have any sold units, its average selling price should be considered 0.
Return the result table in any order.
Input:
Prices table:
product_id
start_date
end_date
price
1
2019-02-17
2019-02-28
5
1
2019-03-01
2019-03-22
20
2
2019-02-01
2019-02-20
15
2
2019-02-21
2019-03-31
30
UnitsSold table:
product_id
purchase_date
units
1
2019-02-25
100
1
2019-03-01
15
2
2019-02-10
200
2
2019-03-22
30
Output:
product_id
average_price
1
6.96
2
16.96
Explanation:
For product 1:
From 2019-02-17 to 2019-02-28, 100 units were sold at a price of 5.
From 2019-03-01 to 2019-03-22, 15 units were sold at a price of 20.
Average price is calculated based on the total price and total units sold.
For product 2:
From 2019-02-01 to 2019-02-20, 200 units were sold at a price of 15.
From 2019-02-21 to 2019-03-31, 30 units were sold at a price of 30.
Average price is calculated based on the total price and total units sold.
SELECT
p.product_id,
ROUND(IFNULL(SUM(price * units) / SUM(units), 0), 2) AS average_price
FROM
Prices AS p
LEFT JOIN UnitsSold AS u
ON p.product_id = u.product_id
AND purchase_date BETWEEN start_date AND end_date
GROUP BY 1;
SHOW NOTES: Right Answer
MySQL Query Walkthrough:
Table: Prices
product_id
start_date
end_date
price
1
2019-02-17
2019-02-28
5
1
2019-03-01
2019-03-22
20
2
2019-02-01
2019-02-20
15
2
2019-02-21
2019-03-31
30
3
2019-02-21
2019-03-31
30
Table: UnitsSold
product_id
purchase_date
units
1
2019-02-25
100
1
2019-03-01
15
2
2019-02-10
200
2
2019-03-22
30
Join the two tables based on the product_id and the purchase_date, where the purchase_date falls between the start_date and the end_date:
SELECT*FROM Prices AS p
LEFTJOIN UnitsSold AS u
ON p.product_id = u.product_id AND purchase_date BETWEEN start_date AND end_date;
product_id
start_date
end_date
price
product_id
purchase_date
units
1
2019-02-17
2019-02-28
5
1
2019-02-25
100
1
2019-03-01
2019-03-22
20
1
2019-03-01
15
2
2019-02-01
2019-02-20
15
2
2019-02-10
200
2
2019-02-21
2019-03-31
30
2
2019-03-22
30
3
2019-02-21
2019-03-31
30
null
null
null
Group the joined result and then use aggregation functions to calculate the average price:
SELECT*,
ROUND(IFNULL(SUM(price * units) /SUM(units), 0), 2) AS average_price
FROM Prices AS p
LEFTJOIN UnitsSold AS u
ON p.product_id = u.product_id AND purchase_date BETWEEN start_date AND end_date
GROUPBY1;
product_id
start_date
end_date
price
product_id
purchase_date
units
average_price
1
2019-02-17
2019-02-28
5
1
2019-02-25
100
6.96
2
2019-02-01
2019-02-20
15
2
2019-02-10
200
16.96
3
2019-02-21
2019-03-31
30
null
null
null
0
Select the desired columns:
SELECT p.product_id,
ROUND(IFNULL(SUM(price * units) /SUM(units), 0), 2) AS average_price
FROM Prices AS p
LEFTJOIN UnitsSold AS u
ON p.product_id = u.product_id AND purchase_date BETWEEN start_date AND end_date
GROUPBY1;
product_id
average_price
1
6.96
2
16.96
3
0
SHOW NOTES: Wrong Answer
# Wrong Answer
SELECT p.product_id,
ROUND(IFNULL(SUM(price * units) /SUM(units), 0), 2) AS average_price
FROM Prices AS p
LEFTJOIN UnitsSold AS u ON p.product_id = u.product_id
WHERE purchase_date BETWEEN start_date AND end_date
GROUPBY1;
Join the two tables based on the condition p.product_id = u.product_id:
SELECT*FROM Prices AS p
LEFTJOIN UnitsSold AS u ON p.product_id = u.product_id;
product_id
start_date
end_date
price
product_id
purchase_date
units
1
2019-02-17
2019-02-28
5
1
2019-03-01
15
1
2019-02-17
2019-02-28
5
1
2019-02-25
100
1
2019-03-01
2019-03-22
20
1
2019-03-01
15
1
2019-03-01
2019-03-22
20
1
2019-02-25
100
2
2019-02-01
2019-02-20
15
2
2019-03-22
30
2
2019-02-01
2019-02-20
15
2
2019-02-10
200
2
2019-02-21
2019-03-31
30
2
2019-03-22
30
2
2019-02-21
2019-03-31
30
2
2019-02-10
200
3
2019-02-21
2019-03-31
30
null
null
null
Filter out the rows that do not fall between start_date and end_date:
SELECT*FROM Prices AS p
LEFTJOIN UnitsSold AS u ON p.product_id = u.product_id
WHERE purchase_date BETWEEN start_date AND end_date;
product_id
start_date
end_date
price
product_id
purchase_date
units
1
2019-02-17
2019-02-28
5
1
2019-02-25
100
1
2019-03-01
2019-03-22
20
1
2019-03-01
15
2
2019-02-01
2019-02-20
15
2
2019-02-10
200
2
2019-02-21
2019-03-31
30
2
2019-03-22
30
Group the filtered result and then use aggregation functions to calculate average price:
SELECT*,
ROUND(IFNULL(SUM(price * units) /SUM(units), 0), 2) AS average_price
FROM Prices AS p
LEFTJOIN UnitsSold AS u ON p.product_id = u.product_id
WHERE purchase_date BETWEEN start_date AND end_date
GROUPBY1;
product_id
start_date
end_date
price
product_id
purchase_date
units
average_price
1
2019-02-17
2019-02-28
5
1
2019-02-25
100
6.96
2
2019-02-01
2019-02-20
15
2
2019-02-10
200
16.96
Select the desired columns:
SELECT p.product_id,
ROUND(IFNULL(SUM(price * units) /SUM(units), 0), 2) AS average_price
FROM Prices AS p
LEFTJOIN UnitsSold AS u ON p.product_id = u.product_id
WHERE purchase_date BETWEEN start_date AND end_date
GROUPBY1;
product_id
average_price
1
6.96
2
16.96
The correct answer is:
product_id
average_price
1
6.96
2
16.96
3
0
Summary: The LEFT JOIN operation will return NULL results for rows without without matching records in the UnitsSold table. In the condition p.product_id = u.product_id AND purchase_date BETWEEN start_date AND end_date, the date range check is applied during the join operation, while WHERE purchase_date BETWEEN start_date AND end_date is applied after the join operation. The WHERE clause exlcudes rows with NULL values, which means products with no matching sales records (i.e., where purchase_date doesn’t match) will be excluded from the result. When grouping the result, no rows containing NULL values remain, leading to an incorrect result (missing product_id = 3 with average_price = 0).
SELECT
contest_id,
ROUND(COUNT(user_id) * 100 / (SELECT COUNT(user_id) FROM Users),2) percentage
FROM Register a
GROUP BY contest_id
ORDER BY percentage DESC,contest_id ASC;
SHOW NOTES
$$
\text{percentage} = \frac{\text{number of users attended in the contest}}{\text{total number of users}} \times 100 \%
$$
Compute the total number of users: SELECT COUNT(user_id) FROM Users
Compute the number of users attended in the contest: Group the table based on the column contest_id, then count the number of users in each group.
SELECT query_name,
ROUND(AVG(rating / position), 2) AS quality,
ROUND(SUM(CASE WHEN rating < 3 THEN 1 ELSE 0 END) / COUNT(*) * 100, 2) AS poor_query_percentage
FROM Queries
GROUP BY query_name;
SELECT
DATE_FORMAT(trans_date, '%Y-%m') AS month,
country,
COUNT(*) AS trans_count,
SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END) AS approved_count,
SUM(amount) AS trans_total_amount,
SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) AS approved_total_amount
FROM Transactions
GROUP BY month, country;
SELECT
DATE_FORMAT(trans_date, '%Y-%m') AS month,
country,
COUNT(*) AS trans_count,
COUNT(IF(state = 'approved', 1, NULL)) AS approved_count,
SUM(amount) AS trans_total_amount,
SUM(IF(state = 'approved', amount, 0)) AS approved_total_amount
FROM Transactions
GROUP BY month, country;
SHOW NOTES
The DATE_FORMAT function in SQL is used to format a date into a specific string format. For example, DATE_FORMAT('2024-12-19', '%Y-%m') will output '2024-12', extracting the year and month from the date.
The IF function in SQL returns one value if the condition is true and another value if the condition is false. For example, IF(amount > 1000, 1, 0) will return 1 if the amount is greater than 1000, otherwise, it will return 0.
The CASE...WHEN...THEN...ELSE...END statement is similar to the switch statement in other programming languages like Java. It works like the IF function, but with the ability to handle multiple conditions. It checks each condition in sequence and returns the corresponding result for the first true condition.
delivery_id is the unique identifier for each delivery.
The table contains information about food deliveries, where customers place orders on a specific date and specify a preferred delivery date (either on the same day or later).
If the customer_pref_delivery_date is the same as the order_date, the order is considered immediate; otherwise, it is scheduled.
The first order of a customer is defined as the one with the earliest order_date. Each customer has exactly one first order.
Task:
Find the percentage of immediate orders among the first orders of all customers, rounded to two decimal places.
The result should follow the format shown below:
Example Input:
delivery_id
customer_id
order_date
customer_pref_delivery_date
1
1
2019-08-01
2019-08-02
2
2
2019-08-02
2019-08-02
3
1
2019-08-11
2019-08-12
4
3
2019-08-24
2019-08-24
5
3
2019-08-21
2019-08-22
6
2
2019-08-11
2019-08-13
7
4
2019-08-09
2019-08-09
Example Output:
immediate_percentage
50.00
Explanation:
Customer 1 has their first order with delivery_id 1, which is scheduled.
Customer 2 has their first order with delivery_id 2, which is immediate.
Customer 3 has their first order with delivery_id 5, which is scheduled.
Customer 4 has their first order with delivery_id 7, which is immediate.
SELECT
ROUND((SUM(IF(order_date = customer_pref_delivery_date, 1, 0)) / COUNT(*)) * 100, 2) AS immediate_percentage
FROM
(SELECT
customer_id,
MIN(order_date) AS first_order_date
FROM Delivery
GROUP BY customer_id) AS first_orders
INNER JOIN Delivery AS d
ON first_orders.customer_id = d.customer_id
AND first_orders.first_order_date = d.order_date;
The combination of (player_id, event_date) is the primary key of this table, ensuring each player can only log in once per day.
This table tracks the activity of players in games, where each record represents a player’s login and the number of games played (which could be zero) before logging out on a specific day using a particular device.
Task:
Calculate the fraction of players who logged in again on the day after their first login date. The result should be rounded to two decimal places.
In other words, identify the players who logged in on consecutive days starting from their first login date, then divide that count by the total number of players.
Example Input:
player_id
device_id
event_date
games_played
1
2
2016-03-01
5
1
2
2016-03-02
6
2
3
2017-06-25
1
3
1
2016-03-02
0
3
4
2018-07-03
5
Example Output:
fraction
0.33
Explanation:
Player 1 logged in on 2016-03-01 and 2016-03-02, meaning they logged in for at least two consecutive days, starting from their first login.
Player 2 did not log in again on the day after their first login.
Player 3 did not log in on consecutive days, as there was a gap between their logins.
Thus, only player 1 satisfies the condition, and the fraction is calculated as 1/3 = 0.33.
SELECT
ROUND(
SUM(IF(DATEDIFF(event_date, min_event_date) = 1, 1, 0)) / COUNT(DISTINCT player_id), 2) AS fraction
FROM (
SELECT
player_id,
event_date,
MIN(event_date) OVER (PARTITION BY player_id) AS min_event_date
FROM activity
) AS activity_with_min_date;
SHOW NOTES
The DATEDIFF function in MySQL calculates the difference in days between two dates. For example, DATEDIFF('2024-12-30', '2024-12-25') will output 5, indicating there are 5 days between the two dates.
The PARTITION BY clause is used in window functions to divide the result ser into partitions (groups) based on a specified column. It applies window functions, such as RANK(), SUM(), ROW_NUMBER(), and others, to each partition. This works similarly to GROUP BY, but unlike GROUP BY, the PARTITION BY clause allows retaining the row-level data while applying the window functions.
Input Table: employees
department_id
employee_id
salary
101
1
5000
101
2
4000
101
3
6000
102
4
4500
102
5
5500
102
6
4800
SELECT department_id,
employee_id,
salary,
RANK() OVER (PARTITION BY department_id ORDERBY salary DESC) AS salary_rank
FROM employees;
The activity_type column is an ENUM with values: ‘open_session’, ’end_session’, ‘scroll_down’, ‘send_message’.
This table logs user activities for a social media website, where each session is linked to exactly one user.
The table may contain duplicate rows.
Task:
You need to find the count of unique active users per day for the 30-day period ending on 2019-07-27, inclusive. A user is considered active on a particular day if they perform at least one activity on that day.
Example:
Input: Activity table
user_id
session_id
activity_date
activity_type
1
1
2019-07-20
open_session
1
1
2019-07-20
scroll_down
1
1
2019-07-20
end_session
2
4
2019-07-20
open_session
2
4
2019-07-21
send_message
2
4
2019-07-21
end_session
3
2
2019-07-21
open_session
3
2
2019-07-21
send_message
3
2
2019-07-21
end_session
4
3
2019-06-25
open_session
4
3
2019-06-25
end_session
Output:
day
active_users
2019-07-20
2
2019-07-21
2
Explanation:
Only the days with active users (those who performed at least one activity) are included in the output.
For 2019-07-20, users 1 and 2 were active, and for 2019-07-21, users 2 and 3 were active.
SELECT activity_date AS day,
COUNT(DISTINCT user_id) AS active_users
FROM Activity
WHERE activity_date <= '2019-07-27' AND DATEDIFF('2019-07-27', activity_date) < 30
GROUP BY activity_date
SELECT activity_date AS day,
COUNT(DISTINCT user_id) AS active_users
FROM Activity
WHERE activity_date <= '2019-07-27' AND activity_date> DATE_SUB('2019-07-27',INTERVAL 30 day)
GROUP BY activity_date
SHOW NOTES
In MySQL, both DATEDIFF and DATE_SUB are used for date calculations. DATEDIFF calculates the difference between two dates, returning the result in days. On the other hand, DATE_SUB is used to subtract a specified time interval (such as days, months, or years) from a given date.
SELECT
product_id,
year AS first_year,
quantity,
price
FROM Sales
WHERE
(product_id, year) IN (
SELECT
product_id,
MIN(year) AS year
FROM Sales
GROUP BY product_id
);
SELECT product_id, first_year, quantity, price
FROM (
SELECT
product_id,
year AS first_year,
quantity,
price,
RANK() OVER (PARTITION BY product_id ORDER BY year) AS row_num
FROM Sales
) subquery
WHERE row_num = 1;
SELECT s.product_id,
s.year AS first_year,
s.quantity,
s.price
FROM Sales s
JOIN (
SELECT product_id, MIN(year) AS first_year
FROM Sales
GROUP BY product_id
) first_sale ON s.product_id = first_sale.product_id
AND s.year = first_sale.first_year;
SELECT MAX(num) AS num
FROM
(
SELECT num
FROM MyNumbers
GROUP BY 1
HAVING COUNT(1) = 1
) AS t;
SELECT
IF(COUNT(num) = 1, num, null) AS num
FROM MyNumbers
GROUP BY num
ORDER BY 1 DESC
LIMIT 1;
SELECT
e1.employee_id,
e1.name,
COUNT(e1.employee_id) AS reports_count,
ROUND(AVG(e2.age)) AS average_age
FROM Employees e1
INNER JOIN Employees e2
ON e1.employee_id = e2.reports_to
GROUP BY e1.employee_id
ORDER BY e1.employee_id;
The combination of (employee_id, department_id) is the primary key for this table, meaning each employee can belong to multiple departments, but each (employee_id, department_id) pair is unique.
employee_id: The ID of the employee.
department_id: The ID of the department to which the employee belongs.
primary_flag: A flag indicating whether the department is the primary department for the employee. It can be one of the following:
'Y': The department is the primary department.
'N': The department is not the primary department.
Problem Description
Employees may belong to multiple departments, and when they do, they must designate one department as their primary. If an employee belongs to only one department, the primary_flag for that department will be 'N'.
You are tasked with reporting the primary department for each employee. If an employee has only one department, report that department as their primary.
Input:
Employee table:
employee_id
department_id
primary_flag
1
1
N
2
1
Y
2
2
N
3
3
N
4
2
N
4
3
Y
4
4
N
Output:
employee_id
department_id
1
1
2
1
3
3
4
3
Explanation:
For employee 1, their only department is department 1, so it is reported as their primary department.
Employee 2 belongs to two departments (1 and 2). The primary_flag for department 1 is 'Y', so department 1 is their primary department.
Employee 3 only belongs to department 3, so it is reported as their primary department.
Employee 4 belongs to three departments (2, 3, and 4). The primary_flag for department 3 is 'Y', so department 3 is reported as their primary department.
SELECT employee_id, department_id
FROM Employee
WHERE primary_flag = 'Y'
UNION
SELECT employee_id, department_id
FROM Employee
GROUP BY employee_id
HAVING COUNT(1) = 1;
SHOW NOTES
The UNION operator is used to combine the results of two or more SELECT queries into a single result set. By default, UNION removes duplicate rows, ensuring that the final result contains only distinct records. In contrast, UNION ALL includes all rows, even if they are duplicates. It’s important to note that each SELECT statement involved in a UNION operation must contain the same number of columns, and the corresponding columns must have compatable data types.
Primary Key: (x, y, z)
Each row in this table represents the lengths of three line segments.
Task:
Determine whether the three line segments (x, y, z) from each row can form a triangle.
Return the result table in any order, including a column that specifies if the segments form a triangle.
Triangle Formation Rule:
Three segments can form a triangle if and only if the following conditions are met:
SELECT
*,
IF(x + y > z AND x + z > y AND y + z > x, 'Yes', 'No') AS triangle
FROM
Triangle;
SELECT
*,
CASE
WHEN (x + y > z) AND (x + z > y) AND (y + z > x) THEN 'Yes'
ELSE 'No'
END AS triangle
FROM
Triangle;
SELECT DISTINCT l1.num AS ConsecutiveNums
FROM Logs l1
JOIN Logs l2 ON l1.id = l2.id - 1
JOIN Logs l3 ON l2.id = l3.id - 1
WHERE l1.num = l2.num AND l2.num = l3.num;
SELECT DISTINCT l2.num AS ConsecutiveNums
FROM Logs AS l1
JOIN Logs AS l2 ON l1.id = l2.id - 1 AND l1.num = l2.num
JOIN Logs AS l3 ON l2.id = l3.id - 1 AND l2.num = l3.num;
WITH
T AS (SELECT DISTINCT product_id FROM Products),
P AS (
SELECT product_id, new_price AS price
FROM Products
WHERE
(product_id, change_date) IN (
SELECT product_id, MAX(change_date) AS change_date
FROM Products
WHERE change_date <= '2019-08-16'
GROUP BY 1
)
)
SELECT product_id, IFNULL(price, 10) AS price
FROM T
LEFT JOIN P USING (product_id);
WITH
P AS (
SELECT p1.product_id, new_price, change_date
FROM
(
SELECT DISTINCT product_id
FROM Products
) AS p1
LEFT JOIN Products AS p2
ON p1.product_id = p2.product_id AND p2.change_date <= '2019-08-16'
),
T AS (
SELECT
*,
RANK() OVER (
PARTITION BY product_id
ORDER BY change_date DESC
) AS rk
FROM P
)
SELECT product_id, IFNULL(new_price, 10) AS price
FROM T
WHERE rk = 1;
This table holds information about people waiting to board a bus.
The person_id and turn columns contain values from 1 to n, where n is the number of rows in the table.
The turn column determines the order in which people will board the bus. A turn value of 1 indicates the first person to board, and turn = n indicates the last person to board.
The weight column represents the personโs weight in kilograms.
The bus has a weight limit of 1000 kilograms, so some people may not be able to board if the total weight exceeds this limit. The goal is to identify the name of the last person who can board the bus without exceeding the weight limit.
Only one person can board the bus at any given time, based on their turn.
Input:
Queue table:
person_id
person_name
weight
turn
5
Alice
250
1
4
Bob
175
5
3
Alex
350
2
6
John Cena
400
3
1
Winston
500
6
2
Marie
200
4
Output:
person_name
John Cena
Explanation:
The following table shows the order in which people board the bus, ordered by the turn column for simplicity:
Turn
person_id
person_name
weight
Total Weight
1
5
Alice
250
250
2
3
Alex
350
600
3
6
John Cena
400
1000
4
2
Marie
200
1200
5
4
Bob
175
___
6
1
Winston
500
___
John Cena is the last person who can board the bus without exceeding the weight limit of 1000 kilograms.
WITH
T AS (
SELECT
person_name,
SUM(weight) OVER (ORDER BY turn) AS acumulated_weight
FROM Queue
)
SELECT person_name
FROM T
WHERE acumulated_weight <= 1000
ORDER BY acumulated_weight DESC
LIMIT 1;
WITH
S AS (
SELECT 'Low Salary' AS category
UNION
SELECT 'Average Salary'
UNION
SELECT 'High Salary'
),
T AS (
SELECT
CASE
WHEN income < 20000 THEN 'Low Salary'
WHEN income > 50000 THEN 'High Salary'
ELSE 'Average Salary'
END AS category,
COUNT(1) AS accounts_count
FROM Accounts
GROUP BY category
)
SELECT category, IFNULL(accounts_count, 0) AS accounts_count
FROM S
LEFT JOIN T USING (category);
Each row represents a student’s name and their seat ID. The id starts from 1 and increases consecutively.
Problem Description:
Write a query to swap the seat IDs of every two consecutive students in the Seat table. If the number of students is odd, the last student’s seat ID should remain unchanged.
The result should be ordered by the id column in ascending order.
Input:
Seat table:
id
student
1
Abbot
2
Doris
3
Emerson
4
Green
5
Jeames
Output:
id
student
1
Doris
2
Abbot
3
Green
4
Emerson
5
Jeames
Explanation:
The students Abbot and Doris swap places, as well as Emerson and Green.
Since there are an odd number of students, Jeames’s seat remains unchanged.
SELECT
CASE
WHEN id % 2 = 1 AND id < (SELECT MAX(id) FROM Seat) THEN id + 1
WHEN id % 2 = 0 THEN id - 1
ELSE id
END AS id,
student
FROM Seat
ORDER BY id;
movie_id is the primary key (column with unique values) for this table.
title is the name of the movie.
Table: Users
Column Name
Type
user_id
int
name
varchar
user_id is the primary key (column with unique values) for this table.
The column ’name’ has unique values.
Table: MovieRating
Column Name
Type
movie_id
int
user_id
int
rating
int
created_at
date
(movie_id, user_id) is the primary key (column with unique values) for this table.
This table contains the rating of a movie by a user in their review.
created_at is the user’s review date.
Problem:
Find the name of the user who has rated the greatest number of movies. In case of a tie, return the lexicographically smaller user name.
Find the movie name with the highest average rating in February 2020. In case of a tie, return the lexicographically smaller movie name.
Result format:
The result should contain two columns:
results โ containing the user name with the greatest number of movie ratings.
results โ containing the movie name with the highest average rating in February 2020.
Input:
Movies table:
movie_id
title
1
Avengers
2
Frozen 2
3
Joker
Users table:
user_id
name
1
Daniel
2
Monica
3
Maria
4
James
MovieRating table:
movie_id
user_id
rating
created_at
1
1
3
2020-01-12
1
2
4
2020-02-11
1
3
2
2020-02-12
1
4
1
2020-01-01
2
1
5
2020-02-17
2
2
2
2020-02-01
2
3
2
2020-03-01
3
1
3
2020-02-22
3
2
4
2020-02-25
Output:
results
Daniel
Frozen 2
Explanation:
Daniel and Monica have rated 3 movies (“Avengers”, “Frozen 2” and “Joker”), but Daniel is lexicographically smaller.
“Frozen 2” and “Joker” both have an average rating of 3.5 in February, but “Frozen 2” is lexicographically smaller.
(
SELECT name AS results
FROM
Users
JOIN MovieRating USING (user_id)
GROUP BY user_id
ORDER BY COUNT(1) DESC, name
LIMIT 1
)
UNION ALL
(
SELECT title
FROM
MovieRating
JOIN Movies USING (movie_id)
WHERE DATE_FORMAT(created_at, '%Y-%m') = '2020-02'
GROUP BY movie_id
ORDER BY AVG(rating) DESC, title
LIMIT 1
);
In MySQL, an index is a database structure that improves the speed of data retrival operations on a table. Indexes are created on one or more columns of a table and allow the database to quickly locate data without scanning the entire table. Indexes are typically implemented using data structures, with the B+ tree being the most common.
There are four main types of indexes in MySQL: primary index, unique index, ordinary index, and full-text index.
A primary index is a unique index that identifies each row in a table. It is automatically created when a primary key is defined.
A unique index ensures that all values in the indexed columns(s) are unique.
An ordinary index improves query performance but does not enforce uniqueness.
A full-text index is used for full-text searches on text-based columns.
SHOW CODE
-- Primary Index
CREATE TABLE employees (
id INT PRIMARY KEY, -- Primary index
name VARCHAR(100)
);
-- Unique Index
CREATE UNIQUE INDEX idx_email ON employees(email);
-- Ordinary Index
CREATE INDEX idx_name ON employees(name);
-- Full Text Index
CREATE FULLTEXT INDEX idx_description ON products(description);
SELECT * FROM products
WHERE MATCH(description) AGAINST('database');
In MySQL, indexes can be categorized into two types: clustered index and secondary index.
A clustered index determines the physical order of data rows in a table. The table data is stored in the order of the clustered index.
A secondary index does not affect the physical order of data rows in a table. Instead, it stores a separate structure that points to the actual data rows.
Clustered Index vs. Secondary Index
SHOW CODE
CREATE TABLE employees (
id INT PRIMARY KEY, -- Clustered index
name VARCHAR(100),
salary DECIMAL(10, 2),
INDEX idx_name (name) -- Secondary index
);
A stored procedure is a precompiled SQL code block that is stored in a database and can be executed repeatedly. It is particularly useful in scenarios involving complex SQL logic, as it helps streamline operations, improve performance, and enhance code reusability.
SHOW CODE
-- Create a Stored Procedure:
DELIMITER //
CREATE PROCEDURE get_all_users()
BEGIN
SELECT * FROM users;
END //
DELIMITER ;
-- Call a Procedure**:
CALL get_all_users();
-- List all procedures
show procedure status where db = 'mydatabase';
-- View the specific stored procedure
show create procedure get_all_users;
-- Delete a Stored Procedure:
drop procedure get_all_users;
Variables in MySQL can be categorized into four types based on their scope: global, session, user-defined, and local.
Global variables and session variables are used to control server behavior.
Global variables affect all sessions connected to the server.
Session variables apply only to the current session.
User-defined variables are defined within a session and persist until the session ends.
Local variables are used within stored procedures and have a scope limited to the procedure in which they are declared.
SHOW CODE
# Global Variables
SET GLOBAL max_connections = 200;
-- Session Variables
SET SESSION sql_mode = 'STRICT_TRANS_TABLES';
-- User-Defined Variables
SET @my_var = 100;
SELECT @my_var;
-- Local Variables
DELIMITER //
CREATE PROCEDURE test_proc()
BEGIN
DECLARE my_local_var INT DEFAULT 10;
SELECT my_local_var;
END //
DELIMITER ;
MySQL supports various control flow statements, including IF, CASE...WHEN, WHILE, REPEAT, and LOOP.
SHOW CODE
# IF statement
DELIMITER //
CREATE PROCEDURE check_salary(IN emp_id INT)
BEGIN
DECLARE emp_salary INT;
SELECT salary INTO emp_salary FROM employees WHERE id = emp_id;
IF emp_salary > 5000 THEN
SELECT 'High Salary';
ELSEIF emp_salary BETWEEN 3000 AND 5000 THEN
SELECT 'Medium Salary';
ELSE
SELECT 'Low Salary';
END IF;
END //
DELIMITER ;
-- CASE WHEN statement
DELIMITER //
CREATE PROCEDURE check_grade(IN marks INT)
BEGIN
DECLARE grade CHAR(1);
CASE
WHEN marks >= 90 THEN SET grade = 'A';
WHEN marks >= 80 THEN SET grade = 'B';
WHEN marks >= 70 THEN SET grade = 'C';
ELSE SET grade = 'F';
END CASE;
SELECT grade;
END //
DELIMITER ;
-- WHILE
DELIMITER //
CREATE PROCEDURE count_down()
BEGIN
DECLARE x INT DEFAULT 5;
WHILE x > 0 DO
SELECT x;
SET x = x - 1;
END WHILE;
END //
DELIMITER ;
-- REPEAT (similar to do...while in Java)
DELIMITER //
CREATE PROCEDURE repeat_example()
BEGIN
DECLARE x INT DEFAULT 1;
REPEAT
SELECT x;
SET x = x + 1;
UNTIL x > 5
END REPEAT;
END //
DELIMITER ;
-- LOOP (Use an explicit LEAVE to exit)
DELIMITER //
CREATE PROCEDURE loop_example()
BEGIN
DECLARE x INT DEFAULT 1;
my_loop: LOOP
SELECT x;
SET x = x + 1;
IF x > 5 THEN
LEAVE my_loop;
END IF;
END LOOP;
END //
DELIMITER ;
In MySQL, a cursor is a database object that enables row-by-row processing of a result set. On the other hand, a handler is used to manage exceptions or errors that may occur during execution. Cursors and handlers are commonly used in stored procedures when dealing with result sets that contain multiple rows and require row-by-row processing.
SHOW CODE
DELIMITER //
CREATE PROCEDURE fetch_salaries()
BEGIN
DECLARE done INT DEFAULT 0;
DECLARE emp_name VARCHAR(100);
DECLARE emp_salary INT;
DECLARE emp_cursor CURSOR FOR SELECT name, salary FROM employees;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN emp_cursor;
read_loop: LOOP
FETCH emp_cursor INTO emp_name, emp_salary;
IF done THEN
LEAVE read_loop;
END IF;
SELECT emp_name, emp_salary;
END LOOP;
CLOSE emp_cursor;
END //
DELIMITER ;
In MySQL, a trigger is a data object that automatically executes a specified set of sql statements before or after certain events occurred, such as INSERT, UPDATE, or DELETE. It is commonly used in audit logging. MySQL only supports row-level trigger, meaning a trigger is activated FOR EACH ROW affected by INSERT, UPDATE, or DELETE operation.
SHOW CODE
-- Create a trigger
CREATE TRIGGER before_employee_update
BEFORE UPDATE ON employees
FOR EACH ROW
BEGIN
INSERT INTO audit_log (employee_id, action, change_date)
VALUES (OLD.id, 'UPDATE', NOW());
END;
-- View triggers
SHOW TRIGGERS;
SHOW TRIGGERS FROM database_name;
-- Delete a trigger
DROP TRIGGER IF EXISTS before_employee_update;
In MySQL, locking is a mechanism used to control database access, ensuring data integrity and consistency when multiple sessions interact with the databases. MySQL supports three main types of locks: global-level locks, table-level locks, and low-level locks.
The global-level lock is a coarse-grained lock that blocks the entire database. It is commonly used during operations like data backups. Due to its coarse granularity, any updates to the database are blocked while the lock is active. Additionally, when the global lock is in place, secondary databases cannot read the binary log (binlog) from the primary database, leading to replication delays between the primary and secondary databases.
For InnoDB tables, it is generally recommended to use the --single-transaction option during data backups. This approach ensures data consistency without acquiring a global read lock, allowing updates to continue uninterrupted.
SHOW CODE
-- From mysql client
FLUSH TABLES WITH READ LOCK;
-- From bash
mysqldump -u username -p --single-transaction database_name > backup.sql
UNLOCK TABLES;
The table-level lock is a mechanism that locks an entire table to control database access when multiple sessions interact with the database. There are three main types of table-level locks: table locks, metadata locks, and intention locks.
Table locks include read locks and write locks:
When a session acquires a read lock, other sessions can also read data but cannot modify it.
When a session acquire a write lock, other sessions cannot read or write data from the locked table.
Metadata lock is a type of lock that protects the structure of a database object when it is being accessed or modified by a session. It it automatically managed by MySQL. There are two main types of metadata locks: shared metadata locks and exclusive metadata locks.
Shared metadata locks include SHARED_READ and SHARED_WRITE locks. These are acquired by queries, modification, or transactions.
A SHARED_READ lock allows other sessions to read the locked data.
A SHARED_WRITE lock allows other sessions to read but not modify the locked data.
The exclusive metadata lock is acquired by operations that modify the table’s structure, such as ALTER TABLE. It prevents other sessions from accessing the table until the operation completes.
An intention lock is a type of lock used by the InnoDB storage engine to indicate a transaction’s intention to acquire row-level locks at a finer granularity. It works in conjunction with row-level locks, helping to avoid confilicts between transactions that might lock the same table at different levels (e.g., table-level vs. row-level locks). There are two main types of intention locks: Intention Shared Lock (IS) and Intention Exclusive Lock (IX).
The Intention Shared Lock (IS) indicates that a transaction intends to place shared wor-level locks on some rows in the table. It allows other transactions to acquire IS locks or table-level shared locks but blocks table-level exclusive locks.
The Intention Exclusive Lock (IX) indicates that a transaction intends to place exlcusive row-level locks on some rows in the table. It allows other transactions to acquire IS locks but blocks table-level shared locks and table-level exclusive locks.
--- META DATA LOCK ---
-- Session 1 starts a transaction and reads from a table
START TRANSACTION;
SELECT * FROM employees;
-- Session 2 attempts to alter the table
ALTER TABLE employees ADD COLUMN salary DECIMAL(10, 2);
-- Session 2 will block until Session 1 releases its shared metadata lock
-- Session 1 starts an `ALTER TABLE` operation
ALTER TABLE employees ADD COLUMN department VARCHAR(50);
-- session 2 attempts to query the table
SELECT * FROM employees;
-- Session 2 will block until Session 1 comoletes the `ALTER TABLE` operation
--- INTENSION LOCK ---
-- Transaction A
-- Transaction A acquires an Intention Shared Lock (IS) on the employees table.
-- It then acquires a Shared Row-Level Lock (S) on the row with id = 1.
START TRANSACTION;
SELECT * FROM employees WHERE id = 1 LOCK IN SHARE MODE;
-- Transaction B
-- Transaction B acquires an Intention Exclusive Lock (IX) on the employees table.
-- It then acquires an Exclusive Row-Level Lock (X) on the row with id = 2.
START TRANSACTION;
UPDATE employees SET salary = 65000 WHERE id = 2;
-- Transaction C
-- Transaction C attempts to acquire an Exclusive Table-Level Lock (X) on the employees table.
-- It will block until Transaction A and Transaction B release their locks.
ALTER TABLE employees ADD COLUMN department VARCHAR(50);
In MySQL, a row-level lock is a fine-grained locking mechanism that locks individual rows instead of the entire table. There are two main types of row-level locks: Shared Lock and Exclusive Lock.
A Shared Lock allows other transactions to read the same row but prevents them from modifying it.
An **Exclusive Lock prevents other transactions from reading or modifying the same row until the lock is released.
Row-level locks are supported by the InnoDB storage engine to ensure data consistency and integrity. They are automatically acquired and released based on the transaction’s operations.
SHOW CODE
-- Transaction A reads a row with a shared lock
START TRANSACTION;
SELECT * FROM employees WHERE id = 1 LOCK IN SHARE MODE;
-- Transaction B can also reads the same row
START TRANSACTION;
SELECT * FROM employees WHERE id = 1 LOCK IN SHARE MODE;
-- Transaction C cannot modify the same row until the lock is released
START TRANSACTION;
UPDATE employees SET salary = 55000 WHERE id = 1; -- Blocks
-- Transaction A updates a row with an exclusive lock
START TRANSACTION;
UPDATE employees SET salary = 55000 WHERE id = 1;
-- Transaction B cannot read or modify the same row until the lock is released
START TRANSACTION;
SELECT * FROM employees WHERE id = 1 LOCK IN SHARE MODE; -- Blocks
InnoDB provides additional locking mechanisms, such as gap locks and next-key locks, to ensure data consistency and integrity and prevent issues like phantom reads. By default, InnoDB operates at the REPEATABLE READ isolation level, where it uses next-key locks for searches and index scans to avoid phantom reads.
A gap lock locks the intervals between rows in an index, preventing new rows from being inserted into those gaps. It does not lock the rows themselves but rather the gaps between them. The primary purpose of a gap lock is to prevent other transactions from inseting data into the blocked gap. Gap locks are coexistent, meaning they do not block other transactions from locking the same gap.
A next-key lock is a combination of a row lock and a gap lock. It locks both the row and the gap before it.
SHOW CODE
| id | name | salary |
|-----|-----------|---------|
| 1 | John Doe | 50000 |
| 3 | Jane Smith| 60000 |
| 5 | Alice Lee | 55000 |
SELECT * FROM employees WHERE id BETWEEN 2 AND 4 FOR UPDATE;
-- The gap lock will be placed on the interval (1, 3) and (3, 5)
SELECT * FROM employees WHERE id = 3 FOR UPDATE;
-- The next-key lock will be placed on the row with id = 3 and the gap (1, 3)