【Leetcode 30天Pandas挑战】学习记录上

原创

已于 2023-08-30 17:04:37 修改 · 1k 阅读

3 ·

CC 4.0 BY-SA版权

文章标签：

#leetcode #pandas #学习

于 2023-08-02 16:46:05 首次发布

该博客围绕LeetCode上的SQL题目展开，涵盖条件筛选、字符串函数、数据操作等类型。针对每道题给出题目大意，并分别提供pandas和MySQL的解题思路与实现方法，还补充了SQL查询中子句执行顺序、常见排序方式等知识。

题目列表：

条件筛选：
字符串函数：
数据操作:

条件筛选：

595. Big Countries

原题链接：595. Big Countries
考察：行筛选 or

Table: World

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| name        | varchar |
| continent   | varchar |
| area        | int     |
| population  | int     |
| gdp         | bigint  |
+-------------+---------+

In SQL, name is the primary key column for this table.
Each row of this table gives information about the name of a country, the continent to which it belongs, its area, the population, and its GDP value.

A country is big if:

it has an area of at least three million (i.e., 3000000 km2), or
it has a population of at least twenty-five million (i.e., 25000000).
Find the name, population, and area of the big countries.

Return the result table in any order.

The result format is in the following example.

Example 1:

Input:

World table:
+-------------+-----------+---------+------------+--------------+
| name        | continent | area    | population | gdp          |
+-------------+-----------+---------+------------+--------------+
| Afghanistan | Asia      | 652230  | 25500100   | 20343000000  |
| Albania     | Europe    | 28748   | 2831741    | 12960000000  |
| Algeria     | Africa    | 2381741 | 37100000   | 188681000000 |
| Andorra     | Europe    | 468     | 78115      | 3712000000   |
| Angola      | Africa    | 1246700 | 20609294   | 100990000000 |
+-------------+-----------+---------+------------+--------------+

Output:

+-------------+------------+---------+
| name        | population | area    |
+-------------+------------+---------+
| Afghanistan | 25500100   | 652230  |
| Algeria     | 37100000   | 2381741 |
+-------------+------------+---------+

题目大意：
找到所有的大国家，一个大国家需要满足给定的条件A或者条件B

pandas 思路1：
用两个条件进行行筛选，注意是或

pandas 实现1：

import pandas as pd

def big_countries(world: pd.DataFrame) -> pd.DataFrame:
    res = world[(world['area'] >= 3000000) | (world['population'] >= 25000000)]
    return res[['name', 'population', 'area']]

pandas 思路2：
也是两个条件的筛选，但是用 loc[]

pandas 实现2：

import pandas as pd

def big_countries(world: pd.DataFrame) -> pd.DataFrame:
    return world.loc[(world['area'] >= 3000000) | (world['population'] >= 25000000), ['name', 'population', 'area']]

MySQL 思路：
用 where 筛选，两个条件是或的关系，用 OR

MySQL 实现：

SELECT
	name,
	population,
	area 
FROM
	World 
WHERE
	area >= 3000000 
	OR population >= 25000000

1757. Recyclable and Low Fat Products

原题链接：1757. Recyclable and Low Fat Products
考察：行筛选 and

Table: Products

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| product_id  | int     |
| low_fats    | enum    |
| recyclable  | enum    |
+-------------+---------+
In SQL, product_id is the primary key for this table.
low_fats is an ENUM of type ('Y', 'N') where 'Y' means this product is low fat and 'N' means it is not.
recyclable is an ENUM of types ('Y', 'N') where 'Y' means this product is recyclable and 'N' means it is not.

Find the ids of products that are both low fat and recyclable.

Return the result table in any order.

The result format is in the following example.

Example 1:

input:

Products table:
+-------------+----------+------------+
| product_id  | low_fats | recyclable |
+-------------+----------+------------+
| 0           | Y        | N          |
| 1           | Y        | Y          |
| 2           | N        | Y          |
| 3           | Y        | Y          |
| 4           | N        | N          |
+-------------+----------+------------+

Output:

+-------------+
| product_id  |
+-------------+
| 1           |
| 3           |
+-------------+

Explanation: Only products 1 and 3 are both low fat and recyclable.

题目大意：
返回既满足条件A也满足条件B的产品编号

pandas 思路：
两个条件进行筛选，和上一题就是 and 和 or 的区别，也可以用 loc

pandas 写法：

import pandas as pd

def find_products(products: pd.DataFrame) -> pd.DataFrame:
    res = products[(products['low_fats'] == 'Y') & (products['recyclable'] == 'Y')]
    return res[['product_id']]

MySQL 思路：
用 where 筛选，两个条件是且的关系，用 AND

MySQL 写法：

SELECT
	product_id 
FROM
	Products 
WHERE
	low_fats = 'Y' 
	AND recyclable = 'Y'

183. Customers Who Never Order

原题链接：183. Customers Who Never Order
考察：合并、选取非空、排除条件

Table: Customers

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| name        | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table indicates the ID and name of a customer.

Table: Orders

+-------------+------+
| Column Name | Type |
+-------------+------+
| id          | int  |
| customerId  | int  |
+-------------+------+
id is the primary key (column with unique values) for this table.
customerId is a foreign key (reference columns) of the ID from the Customers table.
Each row of this table indicates the ID of an order and the ID of the customer who ordered it.

Write a solution to find all customers who never order anything.

Return the result table in any order .

The result format is in the following example.

Example 1:

Input:

Customers table:
+----+-------+
| id | name  |
+----+-------+
| 1  | Joe   |
| 2  | Henry |
| 3  | Sam   |
| 4  | Max   |
+----+-------+

Orders table:
+----+------------+
| id | customerId |
+----+------------+
| 1  | 3          |
| 2  | 1          |
+----+------------+

Output:

+-----------+
| Customers |
+-----------+
| Henry     |
| Max       |
+-----------+

题目大意：
给了两个表，一个是顾客表，一个是订单表，要求返回没有点过单的顾客名称

pandas 思路1：
合并，没有点过单的在 customerId 里会为空，行筛选即可

pandas 实现1：

import pandas as pd

def find_customers(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    tmp = pd.merge(customers, orders, how='left', left_on='id', right_on='customerId')
    tmp2 = tmp[tmp['customerId'].isna() == True]
    tmp2.rename(columns={
   
   'name':'Customers'}, inplace=True)
    
    return tmp2[['Customers']]

pandas 思路2：
在customers表中选取没有在orders的id中出现过的

pandas 实现2：

import pandas as pd

def find_customers(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    # 选择id没有在orders中出现过的
    df = customers[~customers['id'].isin(orders['customerId'])]

    # 重命名
    df = df[['name']].rename(columns={
   
   'name': 'Customers'})
    return df

MySQL 思路1：
左连接两个表，where筛选 customerId 为空的

MySQL 实现1：

SELECT
	name AS Customers 
FROM
	Customers a
	LEFT JOIN Orders b ON a.id = b.customerId 
WHERE
	b.customerId IS NULL

MySQL 思路2：
子查询orders中的id，然后用 not in

MySQL 实现2：

SELECT
	customers.name AS Customers 
FROM
	customers 
WHERE
	customers.id NOT IN ( SELECT customerid FROM orders )

1148. Article Views I

原题链接：1148. Article Views I
考察：去重、排序

Table: Views

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| article_id    | int     |
| author_id     | int     |
| viewer_id     | int     |
| view_date     | date    |
+---------------+---------+
There is no primary key (column with unique values) for this table, the table may have duplicate rows.
Each row of this table indicates that some viewer viewed an article (written by some author) on some