hive窗口函数练习题

原创

已于 2024-03-24 17:52:54 修改 · 3.1k 阅读

48 ·

CC 4.0 BY-SA版权

爱小刘，爱生活

文章标签：

#hive

于 2020-11-08 12:41:33 首次发布

有作业问题可私信小刘，欢迎探讨

一、第一套练习

需求：
1、求用户明细并统计每天的用户总数
2、计算从第一天到现在的所有 score 大于80分的用户总数
3、计算每个用户到当前日期分数大于80的天数

test_window.txt数据：

20191020,11111,85
20191020,22222,83
20191020,33333,86
20191021,11111,87
20191021,22222,65
20191021,33333,98
20191022,11111,67
20191022,22222,34
20191022,33333,88
20191023,11111,99
20191023,22222,33

建表：

0: jdbc:hive2://hadoop:11240> create table test_window(logday string,userid string,score int)
. . . . . . . . . . . . . . > row format delimited
. . . . . . . . . . . . . . > fields terminated by ',';
0: jdbc:hive2://hadoop:11240> load data local inpath '/home/xiaokang/hivedata/test_window.txt'
. . . . . . . . . . . . . . > into table test_window;
0: jdbc:hive2://hadoop:11240> select * from test_window;
+---------------------+---------------------+--------------------+
| test_window.logday  | test_window.userid  | test_window.score  |
+---------------------+---------------------+--------------------+
| 20191020            | 11111               | 85                 |
| 20191020            | 22222               | 83                 |
| 20191020            | 33333               | 86                 |
| 20191021            | 11111               | 87                 |
| 20191021            | 22222               | 65                 |
| 20191021            | 33333               | 98                 |
| 20191022            | 11111               | 67                 |
| 20191022            | 22222               | 34                 |
| 20191022            | 33333               | 88                 |
| 20191023            | 11111               | 99                 |
| 20191023            | 22222               | 33                 |
+---------------------+---------------------+--------------------+

1、求用户明细并统计每天的用户总数

0: jdbc:hive2://hadoop:11240> select *,count()over(partition by logday)as day_total from test_window;
+---------------------+---------------------+--------------------+------------+
| test_window.logday  | test_window.userid  | test_window.score  | day_total  |
+---------------------+---------------------+--------------------+------------+
| 20191020            | 33333               | 86                 | 3          |
| 20191020            | 22222               | 83                 | 3          |
| 20191020            | 11111               | 85                 | 3          |
| 20191021            | 33333               | 98                 | 3          |
| 20191021            | 22222               | 65                 | 3          |
| 20191021            | 11111               | 87                 | 3          |
| 20191022            | 33333               | 88                 | 3          |
| 20191022            | 22222               | 34                 | 3          |
| 20191022            | 11111               | 67                 | 3          |
| 20191023            | 22222               | 33                 | 2          |
| 20191023            | 11111               | 99                 | 2          |
+---------------------+---------------------+--------------------+------------+

2、计算从第一天到现在的所有 score 大于80分的用户总数

0: jdbc:hive2://hadoop:11240> select *,count()over(order by logday rows between unbounded preceding and current row) as total from test_window where score>80;
+---------------------+---------------------+--------------------+--------+
| test_window.logday  | test_window.userid  | test_window.score  | total  |
+---------------------+---------------------+--------------------+--------+
| 20191020            | 33333               | 86                 | 1      |
| 20191020            | 22222               | 83                 | 2      |
| 20191020            | 11111               | 85                 | 3      |
| 20191021            | 33333               | 98                 | 4      |
| 20191021            | 11111               | 87                 | 5      |
| 20191022            | 33333               | 88                 | 6      |
| 20191023            | 11111               | 99                 | 7      |
+---------------------+---------------------+--------------------+--------+

3、计算每个用户到当前日期分数大于80的天数

0: jdbc:hive2://hadoop:11240> select *,count()over(partition by userid order by logday rows between unbounded preceding and current row) as total
. . . . . . . . . . . . . . > from test_window where score>80 order by logday,userid;
+---------------------+---------------------+--------------------+--------+
| test_window.logday  | test_window.userid  | test_window.score  | total  |
+--------