In this topic, I'll talk about the procedure of execution for a SQL SELECT statement.
Basically there're 6 steps for a query:
1. Get Data - FROM, JOIN
2. Filter Row - WHERE
3. Group - GROUP BY
4. Group Filter - HAVING
5. Return Selected Result - SELECT
6. Order or Page - ORDER BY/LIMIT/OFFSET
0. Prepare the data.
Here I will use two tables to demonstrate the process.
CREATE DATABASE dbtest;
USE dbtest;
CREATE TABLE employees(employee_name VARCHAR(30) NOT NULL, department_id INT);
CREATE TABLE departments(department_id INT NOT NULL, department_name VARCHAR(30));
INSERT INTO departments VALUES(1, 'HR');
INSERT INTO departments VALUES(2, 'R&D');
INSERT INTO departments VALUES(3, 'Marketing');
INSERT INTO departments VALUES(4, 'Sales');
INSERT INTO employees VALUES('Andy', 1);
INSERT INTO employees VALUES('Bob', 2);
INSERT INTO employees VALUES('Candy',3);
INSERT INTO employees VALUES('David',1);
INSERT INTO employees VALUES('Ella',2);
INSERT INTO employees VALUES('Fann',1);
INSERT INTO employees VALUES('Gary',4);
INSERT INTO employees VALUES('Henry',4);
INSERT INTO employees VALUES('Ivy',4);
INSERT INTO employees VALUES('Jack',4);
We use the below statement to explain the process.
SELECT d.department_name AS "Department Name"
FROM employees e
JOIN departments d
ON e.department_id = d.department_id
WHERE d.department_name != 'Marketing'
GROUP BY d.department_name
HAVING COUNT(*) >= 3
ORDER BY d.department_name DESC
LIMIT 1;
1. Get data
The first step is to get the data by FROM and JOIN. It will return the cardisian product. We have 10 * 4 = 40 rows in this result. It's equal to:
SELECT *
FROM employees e
JOIN departments d;
employee_name | department_id | department_id | department_name |
Andy | 1 | 4 | Sales |
Andy | 1 | 3 | Marketing |
Andy | 1 | 2 | R&D |
Andy | 1 | 1 | HR |
Bob | 2 | 4 | Sales |
Bob | 2 | 3 | Marketing |
Bob | 2 | 2 | R&D |
Bob | 2 | 1 | HR |
Candy | 3 | 4 | Sales |
Candy | 3 | 3 | Marketing |
Candy | 3 | 2 | R&D |
Candy | 3 | 1 | HR |
David | 1 | 4 | Sales |
David | 1 | 3 | Marketing |
David | 1 | 2 | R&D |
David | 1 | 1 | HR |
Ella | 2 | 4 | Sales |
Ella | 2 | 3 | Marketing |
Ella | 2 | 2 | R&D |
Ella | 2 | 1 | HR |
Fann | 1 | 4 | Sales |
Fann | 1 | 3 | Marketing |
Fann | 1 | 2 | R&D |
Fann | 1 | 1 | HR |
Gary | 4 | 4 | Sales |
Gary | 4 | 3 | Marketing |
Gary | 4 | 2 | R&D |
Gary | 4 | 1 | HR |
Henry | 4 | 4 | Sales |
Henry | 4 | 3 | Marketing |
Henry | 4 | 2 | R&D |
Henry | 4 | 1 | HR |
Ivy | 4 | 4 | Sales |
Ivy | 4 | 3 | Marketing |
Ivy | 4 | 2 | R&D |
Ivy | 4 | 1 | HR |
Jack | 4 | 4 | Sales |
Jack | 4 | 3 | Marketing |
Jack | 4 | 2 | R&D |
Jack | 4 | 1 | HR |
Based on above cardisian product, the ON condition will be applied to it. It will return 10 rows.
It's like the statement.
SELECT *
FROM employees e
JOIN departments d
ON e.department_id = d.department_id;
employee_name | department_id | department_id | department_name |
Andy | 1 | 1 | HR |
Bob | 2 | 2 | R&D |
Candy | 3 | 3 | Marketing |
David | 1 | 1 | HR |
Ella | 2 | 2 | R&D |
Fann | 1 | 1 | HR |
Gary | 4 | 4 | Sales |
Henry | 4 | 4 | Sales |
Ivy | 4 | 4 | Sales |
Jack | 4 | 4 | Sales |
2. Filter Rows by WHERE
Then WHERE clause is applied to get the qualified rows. It will return 9 rows.
SELECT *
FROM employees e
JOIN departments d
ON e.department_id = d.department_id
WHERE d.department_name != 'Marketing';
employee_name | department_id | department_id | department_name |
Andy | 1 | 1 | HR |
Bob | 2 | 2 | R&D |
David | 1 | 1 | HR |
Ella | 2 | 2 | R&D |
Fann | 1 | 1 | HR |
Gary | 4 | 4 | Sales |
Henry | 4 | 4 | Sales |
Ivy | 4 | 4 | Sales |
Jack | 4 | 4 | Sales |
3. Group the data with GROUP BY
GROUP BY d.department_name;
Based on the department name, it will group the data. It also has 3 rows like the way:
employee_name | department_id | department_id | department_name |
Andy | 1 | 1 | HR |
Fann | 1 | 1 | |
David | 1 | 1 | |
Bob | 2 | 2 | R&D |
Ella | 2 | 2 | |
Gary | 4 | 4 | Sales |
Henry | 4 | 4 | |
Ivy | 4 | 4 | |
Jack | 4 | 4 |
4. Group filter by HAVING
HAVING COUNT(*) >= 3
Only HR and Sales which has at least 3 employees are left.
employee_name | department_id | department_id | department_name |
Andy | 1 | 1 | HR |
Fann | 1 | 1 | |
David | 1 | 1 | |
Gary | 4 | 4 | Sales |
Henry | 4 | 4 | |
Ivy | 4 | 4 | |
Jack | 4 | 4 |
5. Return selected result
Then execute SELECT clause:
SELECT d.department_name AS "Department Name"
FROM employees e
JOIN departments d
ON e.department_id = d.department_id
WHERE d.department_name != 'Marketing'
GROUP BY d.department_name
HAVING COUNT(*) >= 3;
Department Name |
HR |
Sales |
6. Order or Page
ORDER BY is applied to the result in Step 5:
SELECT d.department_name AS "Department Name"
FROM employees e
JOIN departments d
ON e.department_id = d.department_id
WHERE d.department_name != 'Marketing'
GROUP BY d.department_name
HAVING COUNT(*) >= 3
ORDER BY d.department_name DESC;
Department Name |
Sales |
HR |
Finally LIMIT is applied. Only the first record is returned after LIMIT 1:
SELECT d.department_name AS "Department Name"
FROM employees e
JOIN departments d
ON e.department_id = d.department_id
WHERE d.department_name != 'Marketing'
GROUP BY d.department_name
HAVING COUNT(*) >= 3
ORDER BY d.department_name DESC
LIMIT 1;
Department Name |
Sales |