hive sql笔试题

来源于某多社招。

原文:https://blog.youkuaiyun.com/qq_24206673/article/details/108282465

题目:

有一张表记录了一场篮球赛的得分情况,主要有以下字段:队名(team)、队员名(name)、队员号(num)、得分(score)、得分时间 秒级(score_time)。要求用sql/hive

  1. 连续三次为本队得分的球员;

  2. 为本队比分反超的球员,以及对应的完成反超的时刻

思路整理:

拆分问题(1),连续三次得分转换为:将得分表按照升序排列,让每一行记录的上一行,且上上行记录都是同一球员。第一步,可以用lag()over()函数。第二步,因为每个时间的得分球员是一一对应的,类似于一个键值对,这样的话,把第一步得到的结果打标。最后选出3次都是同样的球员即可。

拆分问题(2),计算每一个时间点的两队的总分情况,以及上一个时间点的两队得分情况,比较这两个时间点的得分是否有逆转。意即,上一次A队领先,下一次B队总分领先,反之亦然。

代码:

--建表
use test;
create table basketball_game_score_detail(
team  string,
number int,
score_time  string,
score  int,
name  string
);

--插入测试数据
insert into table basketball_game_score_detail values
('A',1,'2020/8/28 9:01:14',1,'A1')
,('A',5,'2020/8/28 9:02:28',1,'A5')
,('B',4,'2020/8/28 9:03:42',3,'B4')
,('A',4,'2020/8/28 9:04:55',3,'A4')
,('B',1,'2020/8/28 9:06:09',3,'B1')
,('A',3,'2020/8/28 9:07:23',3,'A3')
,('A',4,'2020/8/28 9:08:37',3,'A4')
,('B',1,'2020/8/28 9:09:51',2,'B1')
,('B',2,'2020/8/28 9:11:05',2,'B2')
,('B',4,'2020/8/28 9:12:18',1,'B4')
,('A',1,'2020/8/28 9:13:32',2,'A1')
,('A',1,'2020/8/28 9:14:46',1,'A1')
,('A',4,'2020/8/28 9:16:00',1,'A4')
,('B',3,'2020/8/28 9:17:14',3,'B3')
,('B',2,'2020/8/28 9:18:28',3,'B2')
,('A',2,'2020/8/28 9:19:42',3,'A2')
,('A',1,'2020/8/28 9:20:55',1,'A1')
,('B',3,'2020/8/28 9:22:09',2,'B3')
,('B',3,'2020/8/28 9:23:23',3,'B3')
,('A',5,'2020/8/28 9:24:37',2,'A5')
,('B',1,'2020/8/28 9:25:51',3,'B1')
,('B',2,'2020/8/28 9:27:05',1,'B2')
,('A',3,'2020/8/28 9:28:18',1,'A3')
,('B',4,'2020/8/28 9:29:32',1,'B4')
,('A',1,'2020/8/28 9:30:46',3,'A1')
,('B',1,'2020/8/28 9:32:00',1,'B1')
,('A',4,'2020/8/28 9:33:14',2,'A4')
,('B',1,'2020/8/28 9:34:28',1,'B1')
,('B',5,'2020/8/28 9:35:42',2,'B5')
,('A',1,'2020/8/28 9:36:55',1,'A1')
,('B',1,'2020/8/28 9:38:09',3,'B1')
,('A',1,'2020/8/28 9:39:23',3,'A1')
,('B',2,'2020/8/28 9:40:37',3,'B2')
,('A',3,'2020/8/28 9:41:51',3,'A3')
,('A',1,'2020/8/28 9:43:05',2,'A1')
,('B',3,'2020/8/28 9:44:18',3,'B3')
,('A',5,'2020/8/28 9:45:32',2,'A5')
,('B',5,'2020/8/28 9:46:46',3,'B5');
--第一问
----第一步,求出每个时间点的上一个时间点和上上个时间点
select * 
       ,lag(score_time,1)over(order by score_time asc) as m1
       ,lag(score_time,2)over(order by score_time asc) as m2
       from basketball_game_score_detail;
----第二步,对m1时间点和m2时间点打标上对应的球员名
select a.*,b.name,c.name
from 
(select * 
       ,lag(score_time,1)over(order by score_time asc) as m1
       ,lag(score_time,2)over(order by score_time asc) as m2
       from basketball_game_score_detail

)a left join basketball_game_score_detail b on a.m1=b.score_time
 left join basketball_game_score_detail c on a.m2=c.score_time;

----第三步,求出这三个时间点都是同一名球员的情况
select a.*,b.name,c.name
from 
(select * 
       ,lag(score_time,1)over(order by score_time asc) as m1
       ,lag(score_time,2)over(order by score_time asc) as m2
       from basketball_game_score_detail

)a left join basketball_game_score_detail b on a.m1=b.score_time
left join basketball_game_score_detail c on a.m2=c.score_time
where a.name=b.name and b.name=c.name;

--第二问
----第一步,添加两列,分别是每个时间点的得分,如果时间点1,A队得分那么B队就是0分,反之亦然。
select *,case 
        when team='A' then score else 0 end as a_score
       ,case 
       when team='B' then score 
       else 0 end as b_score 
       from basketball_game_score_detai;
----第二步,对每个时间点各球队的分数汇总
select a.team
    ,a.number
    ,a.score_time
    ,a.score
    ,a.name
    ,sum(a_score)over(partition by team order by score_time asc) as a_score_sum
    ,sum(b_score)over(partition by team order by score_time asc) as b_score_sum
from 
(select *,case 
        when team='A' then score else 0 end as a_score
       ,case 
       when team='B' then score 
       else 0 end as b_score 
       from basketball_game_score_detail
)a order by a.score_time asc;

----第三步,求出每个时间点的本次总分差值和上一次总分差值
select b.*
       ,a_score_sum-b_score_sum as diff_score
       ,lag(a_score_sum-b_score_sum,1)over(order by score_time asc) as last_score_diff
from 
(select a.team
    ,a.number
    ,a.score_time
    ,a.score
    ,a.name
    ,sum(a_score)over(partition by team order by score_time asc) as a_score_sum
    ,sum(b_score)over(partition by team order by score_time asc) as b_score_sum
from 
(select *,case 
        when team='A' then score else 0 end as a_score
       ,case 
       when team='B' then score 
       else 0 end as b_score 
       from basketball_game_score_detail
)a order by a.score_time asc
)b;
----第四步,两个差值的乘积小于0,即意味着有逆转情况发生
select *
from 
(select b.*
       ,a_score_sum-b_score_sum as diff_score
       ,lag(a_score_sum-b_score_sum,1)over(order by score_time asc) as last_score_diff
from 
(select a.team
    ,a.number
    ,a.score_time
    ,a.score
    ,a.name
    ,sum(a_score)over(partition by team order by score_time asc) as a_score_sum
    ,sum(b_score)over(partition by team order by score_time asc) as b_score_sum
from 
(select *,case 
        when team='A' then score else 0 end as a_score
       ,case 
       when team='B' then score 
       else 0 end as b_score 
       from basketball_game_score_detail
)a order by a.score_time asc
)b
)c
where diff_score*last_score_diff<0;





### 大数据开发 HiveSQL 笔试题目及答案 #### 题目 1: 自连接实现 自连接是一种特殊的连接方式,与其自身进行连接。这通常用于查找具有层次结构的数据。 ```sql SELECT e.employee_id, m.manager_name FROM employees AS e JOIN employees AS m ON e.manager_id = m.employee_id; ``` 此查询展示了如何通过员工找到每位员工对应的经理称[^1]。 #### 题目 2: 窗口分析函数实现 窗口函数允许执行聚合计算而不减少输入行的数量。下面的例子展示了一个典型的排场景: ```sql SELECT employee_id, salary, DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) as rank FROM salaries; ``` 这段代码按部门分区并根据薪水降序排列来分配密集等级。 #### 题目 3: Top N 查询 为了获取每个类别的前几记录,可以使用窗口函数配合 `LIMIT` 或者子查询的方法: ```sql WITH RankedSales AS ( SELECT product_category, sales_amount, ROW_NUMBER() OVER(PARTITION BY product_category ORDER BY sales_amount DESC) rn FROM sales_data ) SELECT * FROM RankedSales WHERE rn <= 5; -- 获取每种类目的前五销售量商品 ``` 这里利用了CTE(公用达式)以及ROW_NUMBER窗口函数实现了分类下的top n选取. #### 题目 4: 使用 CONCAT 和 CONCAT_WS 函数 这两个字符串拼接函数的区别在于后者可以在多个参数间加入指定分隔符。 ```sql -- 单纯地把两列内容连在一起 SELECT concat(firstname,' ',lastname) full_name FROM users; -- 加入逗作为分隔符链接多列 SELECT concat_ws(',', firstname, lastname, email) info_string FROM users; ``` 上述例子分别演示了两种不同情况下应该如何选用合适的字符串组合方法[^3]. #### 题目 5: LOAD DATA 命令语法 当需要加载外部文件到Hive,可采用如下命令格式: ```sql LOAD DATA LOCAL INPATH '/path/to/local/file' INTO TABLE my_table; ``` 这条语句指定了本地路径中的文件被导入至为my_table的目标格中[^4]. #### 题目 6: WHERE vs HAVING 的区别 理解两者之间的差异对于写出高效的查询至关重要。WHERE适用于过滤单条记录;而HAVING则针对已经完成GROUP BY操作后的结果集做进一步筛选。 ```sql -- 正确用法示范 SELECT region, COUNT(*) num_stores FROM stores WHERE open_date >= '2020-01-01' GROUP BY region HAVING COUNT(*) > 10; ``` 该片段说明了何应该应用WHERE条件而非HAVING来进行初步的数据裁剪[^5].
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值