Categorizing Numerical Values in Data Sets | MySQL Learning | Leetcode Problem

Intro

        I have ran into a leetcode problem (1907. Count Salary Categories) on data structure which requires me to categorize a numerical columns into given categories. At first glance, it seems easy enough to be solven by a simple CASE Function. However, as i dug deeper into the question, I realize the tricky part is that it requires to report count of each category even if there is no matching sample in the category. That is, the result table must contain all categories. If there are no counts in a category, then report 0.

        I realized that such problem can often occur in our daily working/study, so I have decided to write a blog about it and mark my ideas and approaches in handling such problems.

Problem

        Table:

+-------------+------+
| Column Name | Type |
+-------------+------+
| account_id  | int  |
| income      | int  |
+-------------+------+
account_id is the primary key for this table.
Each row contains information about the monthly income for one bank account.

        Request:

Write an SQL query to report the number of bank accounts of each salary category. The salary categories are:

  • "Low Salary": All the salaries strictly less than $20000.
  • "Average Salary": All the salaries in the inclusive range [$20000, $50000].
  • "High Salary": All the salaries strictly greater than $50000.

The result table must contain all three categories. If there are no accounts in a category, then report 0.

Return the result table in any order.

        Example Outputs: 

Output: 
+----------------+----------------+
| category       | accounts_count |
+----------------+----------------+
| Low Salary     | 1              |
| Average Salary | 0              |
| High Salary    | 3              |
+----------------+----------------+
Explanation: 
Low Salary: Account 2.
Average Salary: No accounts.
High Salary: Accounts 3, 6, and 8.

My Intuition

A simple CASE statement will not do the job given that there exists category which has no matching sample. We will need to CASE WHEN first and then transpose the output for final result.

My Solution:

  •  We would need to categorize the income column as requested, and I decided to build a column for each categroy and using 1 and 0 to label if this sample belong to any of these categories (similar to one-hot encoding).
WITH count AS(
SELECT
*,
CASE WHEN income<20000 THEN 1 ELSE 0 END as`Low Salary`,
CASE WHEN income>=20000 and income<=50000 THEN 1 ELSE 0 END as`Average Salary`,
CASE WHEN income>50000 THEN 1 ELSE 0 END as`High Salary`
FROM Accounts)

Count Table will be something like this:

account_idincomeLow SalaryAverage SalaryHigh Salary
3        108939001
212747100
887709001
69179600 
  • Then, Basically, I decided to manually assign all the categories we have to the output table by the syntax:  SELECT 'col_value' as col_name FROM table .
  • Lastly, we can union all categories and count from the temp table we prepared and sum(counts) for each category.
SELECT
category, sum(count) as accounts_count
FROM(
SELECT 'Low Salary' Category , `Low Salary` as count from count 
union all
SELECT 'Average Salary' Category , `Average Salary` as count from count
union all
SELECT 'High Salary' Category , `High Salary` as count from count
) temp
GROUP BY category

Final Code:

WITH count AS(
SELECT
*,
CASE WHEN income<20000 THEN 1 ELSE 0 END as`Low Salary`,
CASE WHEN income>=20000 and income<=50000 THEN 1 ELSE 0 END as`Average Salary`,
CASE WHEN income>50000 THEN 1 ELSE 0 END as`High Salary`
FROM Accounts)

SELECT
category, sum(count) as accounts_count
FROM(
SELECT 'Low Salary' Category , `Low Salary` as count from count 
union all
SELECT 'Average Salary' Category , `Average Salary` as count from count
union all
SELECT 'High Salary' Category , `High Salary` as count from count
) temp
GROUP BY category

Reflection:

  •  Procedure of constructing a categorical table
    1. Label each sample with the matching category
    2. Apply 'UNION ALL' to create a table containing all the categories we assign and meanwhile, matching the labeled example
    3. Summarize or directly output the category table.

Reference:

- LeetCode 

MySQL CASE Function

评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值