有作业问题可私信小刘,欢迎探讨
一、第一套练习
需求:
1、求用户明细并统计每天的用户总数
2、计算从第一天到现在的所有 score 大于80分的用户总数
3、计算每个用户到当前日期分数大于80的天数
test_window.txt数据:
20191020,11111,85
20191020,22222,83
20191020,33333,86
20191021,11111,87
20191021,22222,65
20191021,33333,98
20191022,11111,67
20191022,22222,34
20191022,33333,88
20191023,11111,99
20191023,22222,33
建表:
0: jdbc:hive2://hadoop:11240> create table test_window(logday string,userid string,score int)
. . . . . . . . . . . . . . > row format delimited
. . . . . . . . . . . . . . > fields terminated by ',';
0: jdbc:hive2://hadoop:11240> load data local inpath '/home/xiaokang/hivedata/test_window.txt'
. . . . . . . . . . . . . . > into table test_window;
0: jdbc:hive2://hadoop:11240> select * from test_window;
+---------------------+---------------------+--------------------+
| test_window.logday | test_window.userid | test_window.score |
+---------------------+---------------------+--------------------+
| 20191020 | 11111 | 85 |
| 20191020 | 22222 | 83 |
| 20191020 | 33333 | 86 |
| 20191021 | 11111 | 87 |
| 20191021 | 22222 | 65 |
| 20191021 | 33333 | 98 |
| 20191022 | 11111 | 67 |
| 20191022 | 22222 | 34 |
| 20191022 | 33333 | 88 |
| 20191023 | 11111 | 99 |
| 20191023 | 22222 | 33 |
+---------------------+---------------------+--------------------+
1、求用户明细并统计每天的用户总数
0: jdbc:hive2://hadoop:11240> select *,count()over(partition by logday)as day_total from test_window;
+---------------------+---------------------+--------------------+------------+
| test_window.logday | test_window.userid | test_window.score | day_total |
+---------------------+---------------------+--------------------+------------+
| 20191020 | 33333 | 86 | 3 |
| 20191020 | 22222 | 83 | 3 |
| 20191020 | 11111 | 85 | 3 |
| 20191021 | 33333 | 98 | 3 |
| 20191021 | 22222 | 65 | 3 |
| 20191021 | 11111 | 87 | 3 |
| 20191022 | 33333 | 88 | 3 |
| 20191022 | 22222 | 34 | 3 |
| 20191022 | 11111 | 67 | 3 |
| 20191023 | 22222 | 33 | 2 |
| 20191023 | 11111 | 99 | 2 |
+---------------------+---------------------+--------------------+------------+
2、计算从第一天到现在的所有 score 大于80分的用户总数
0: jdbc:hive2://hadoop:11240> select *,count()over(order by logday rows between unbounded preceding and current row) as total from test_window where score>80;
+---------------------+---------------------+--------------------+--------+
| test_window.logday | test_window.userid | test_window.score | total |
+---------------------+---------------------+--------------------+--------+
| 20191020 | 33333 | 86 | 1 |
| 20191020 | 22222 | 83 | 2 |
| 20191020 | 11111 | 85 | 3 |
| 20191021 | 33333 | 98 | 4 |
| 20191021 | 11111 | 87 | 5 |
| 20191022 | 33333 | 88 | 6 |
| 20191023 | 11111 | 99 | 7 |
+---------------------+---------------------+--------------------+--------+
3、计算每个用户到当前日期分数大于80的天数
0: jdbc:hive2://hadoop:11240> select *,count()over(partition by userid order by logday rows between unbounded preceding and current row) as total
. . . . . . . . . . . . . . > from test_window where score>80 order by logday,userid;
+---------------------+---------------------+--------------------+--------+
| test_window.logday | test_window.userid | test_window.score | total |
+--------

最低0.47元/天 解锁文章
3492

被折叠的 条评论
为什么被折叠?



