找到活跃的商家
Write an SQL query to find all active businesses.
An active business is a business that has more than one event type with occurences greater than the average occurences of that event type among all businesses.
Table: Events
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| business_id | int |
| event_type | varchar |
| occurences | int |
+---------------+---------+
(business_id, event_type) is the primary key of this table.
Each row in the table logs the info that an event of some type occured at
some business for a number of times.
例子:
Events table:
+-------------+------------+------------+
| business_id | event_type | occurences |
+-------------+------------+------------+
| 1 | reviews | 7 |
| 3 | reviews | 3 |
| 1 | ads | 11 |
| 2 | ads | 7 |
| 3 | ads | 6 |
| 1 | page views | 3 |
| 2 | page views | 12 |
+-------------+------------+------------+
解法
思路:
- 计算新的列叫做average occurences,代表每个event的平均出现次数
- 筛选大于平均值的数据,即occurences > average occurences
- group by business_id, 找出event type数量大于1的商家
SQL
用window function计算average occurences
with avg_occurences as
(select
*,
avg(occurences) over (partition by event_type) as avg_occurences
from events)
select business_id
from avg_occurences
where occurences > avg_occurences
group by business_id
having count(event_type) > 1
Pandas
用transform计算average occurences
events = pd.DataFrame(columns=['business_id', 'event_type', 'occurences'],
data = [[1, 'reviews', 7],
[3, 'reviews', 3],
[1, 'ads', 11],
[2, 'ads', 7],
[3, 'ads', 6],
[1, 'page views', 3],
[2, 'page views', 12]
])
events['avg_occurences'] = events.groupby('event_type').occurences.transform('mean')
res = events[events.occurences > events.avg_occurences].groupby('business_id').event_type.nunique().reset_index()
res[res.event_type > 1][['business_id']]