mysql 内积_数据分析笔试题系列(5)

本文提供了一系列基于 MySQL 的数据分析笔试题目,包括计算稀疏向量的内积、统计宿舍楼各部门人数、找数列中位数以及处理班级成绩中位数等。通过 SQL 查询展示了如何解决这些问题,涉及到分组、窗口函数和连接操作等技巧。

来几个比较有意思的笔试题,比较注重灵活性;

1、我们把用户对商品的评分用稀疏向量表示,保存在数据库表t里面: t的字段有:uid,goods_id,star; uid是用户id;goods_id是商品id;star是用户对该商品的评分,值为1-5。 现在我们想要计算向量两两之间的内积,内积在这里的语义为:对于不同的用户,如果他们都对同样的一批商品打了分,那么对于这里面的每个人的分数乘起来,并对这些乘积求和。比如g0:2*2+2*4+2*5+4*2+4*4+4*5+5*2+5*4+5*5=121

建表语句:

create table B(uid VARCHAR(12),good_id VARCHAR(12),star int(10));

INSERT INTO B (uid,good_id,star) VALUES('u0','g0',2),('u0','g1',3),('u1','g0',4),('u1','g1',3),('u2','g0',5),('u2','g1',1);

表结构如下:

select t.good_id,sum(xiangchen)

from (

select t1.uid,t1.good_id,t1.star*t2.star as xiangchen --3*6

from (

select *

from B

)t1

inner join (

select *

from B

)t2

on t1.good_id=t2.good_id

)t

group by t.good_id;

2、员工表,宿舍表,部门表,统计出宿舍楼各部门人数表

设员工表为employee,字段为id,employee_name,belong_dormitory_id,belong_department_id;

宿舍表为dormitory,字段为id,dormitory_number;

部门表为department,字段为id,department_name

思路:

select t3.dormitory_name,t2.department_number,count(distinct t1.employee_name)

from (

select *

from employee

)t1

inner join (

select *

from department

)t2

on t1.belong_department_id=t2.id

inner join (

select *

from dormitory

)t3

on t3.id=t1.belong_dormitory_id

group by t3.dormitory_name,t2.department_number

3、给出一堆数和频数的表格,统计这一堆数中位数

设表table中字段为id,number,frequency

create table table_a (id VARCHAR(2),number int(12),frequency int(12));

insert into table_a values ('b',2,4),('c',3,3),('d',4,5),('a',1,2);建立表格时,注意将number,frequency的顺序打乱,以免出现歧义!

思路:

2+3+4+5)+1)/2=7.5;我们需要知道排序在第7,第8位的数字,然后求其平均数,即为需要求的中位数!

分两种情况考虑:

a:1,1,2,2的情况,中位数是1+2)/2=1.5

b:1,1,2,2,3的情况,中位数是2!

select (t1.number+lead(t1.number,1)over(order by t1.number))/2

from (

select number,frequency,sum(frequency) over( order by number) as ran

from table_a

)t1

inner join (

select floor(((sum(frequency)+1) / 2) ) as fre_int

from table_a

)t2

on t1.ran =t2.fre_int

union

select t3.number

from (

select number,frequency,sum(frequency) over( order by number) as ran

from table_a

)t3

inner join (

select floor(((sum(frequency)+1) / 2 ) ) as fre_int

from table_a

)t4

on t3.ran >t4.fre_int

limit 1union 或者union all是用来求并集合的,但是不能与order by 连用!所以我们采用sum()over()借助over中的order by函数完成对原始number的排序!

floor()向下取整!

3、三个班级合在一起的一张成绩单,统计每个班级成绩中位数

设表table中字段为id,class,score

create table all_scores (id VARCHAR(12),class VARCHAR(12),score int(10));

insert into all_scores VALUES('zhang shan','A',96),('li si','C',68),('lao wang','B',100),('zhao liu','A',98),('wu xiao','C',86),('xiao ma','A',84),('xiao pang','B',89);

select * from all_scores

代码如下:(oracle,hive中可以很方便使用row_number()over())

select a.class,

a.score * (1 - b.float_part) + a.next_feature1 * (b.float_part-0) as median

from

(select

class,score

row_number() over( partition by class order by score asc) as rank,

lead(score,1) over(partition by class order by score asc) as next_feature1

from

all_scores

) a

inner join

(

select class,

floor((count(score) + 1) / 2) int_part,

(count(score)+1) / 2 % 1 as float_part

from

all_scores

group by class

) b

on a.rank = b.int_part and a.class=b.class

group by a.class没测试,理论上应该正确!

Mysql版本:

select a.class,

a.score * (1 - b.float_part) + a.next_feature1 * (b.float_part-0) as median

from

(select

if(@class=class,@score:=@score+1,@score:=1) as ran,

@class:=class as class,

score,

lead(score,1) over(partition by class order by score asc) as next_feature1

from

all_scores as a,(select @class:='',@score:=0) as c

order by class

) a

inner join

(

select class,

floor((count(score) + 1) / 2) int_part,

(count(score)+1) / 2 % 1 as float_part

from

all_scores

group by class

) b

on a.ran = b.int_part and a.class=b.class

group by a.class此种写法,有一个瑕疵:当B班只有一个学生的时候,最终结果显示null(实际上B班不可能只有一个学生;当初测试的时候,意外发现这个bug)

各位看官,如果有兴趣可以研究下;

为了对Mysql版本写法有更清晰的认识,我将比较陌生的部分单独抽出来进行说明:

select

if(@class=class,@score:=@score+1,@score:=1) as ran,--class 是分组键(不能有:),score是排序键

@class:=class as class,

score,

lead(score,1) over(partition by class order by score asc) as next_feature1

from

all_scores as a,(select @class:='',@score:=0) as c --一定要起别名

order by class --必不可少row_number的mysql改写:

有set 与case when的结合;也有本文if的写法,二者效果相同;

两种方法我都尝试了,个人感觉if写法更加容易理解!

4、给定如下表结构,请用SQL写出中位数:(四分位数改写?)

自己练习一下吧(比上述两题都简单啦)

参考答案如下:求知鸟:数据分析|容易被忽视的统计值​zhuanlan.zhihu.com5d508bc13e8098ec92296268403c365f.png

5、交易表结构为user_id,id,paid_time,amount

1、写sql查询过去一个月付款用户量(提示 用户量需去重)最高的3天分别是哪几天

2、写sql查询做昨天每个用户最后付款的订单ID及金额

表结构:

create table order_record (id int, user_id int,paid_time datetime,amount bigint);

insert into order_record(id,user_id,paid_time,amount) values(1,1028898,'2018-01-04 22:32:07.0',9600);

insert into order_record(id,user_id,paid_time,amount) values(2,1030621,'2018-02-24 23:04:58.0',5400);

第一问:

Select date(paid_time),amount

From order_record

Where month(paid_time)='07'

Group by date(paid_time)

Order by count(distinct user_id) desc

Limit 3

第二问:

Select t2.id,t2.amount

From (

Select user_id,max(paid_time) as max_date

From order_record

Where datediff(now(),paid_time)=1

Group by user_id

)t1

Inner join (

Select user_id,amount,id,paid_time

From order_record

Where datediff(now(),paid_time)=1

)t2

On t1.user_id=t2.user_id

And t1.max_date=t2.paid_time

看的不尽兴,那就关注我的专栏,第一时间知道笔试题状态数据分析​zhuanlan.zhihu.com82df546db3a72e137e73bd98e701508b.png

看完,点赞再走@假装看到你!!!

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值