Hive (III.2) Case Study: JD.com Shop Visit Metrics
I. Preparation
1.0 Start the cluster, services, and client
vim $HIVE_HOME/conf/hive-site.xml
<!--
<property>
<name>hive.metastore.uris</name>
<value>thrift://hadoop11:9083</value>
</property>
-->
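# Start HiveServer2 in the background, sending stdout and stderr to the log.
# The two commands are equivalent; run only one of them.
nohup hive --service hiveserver2 > /opt/logs/hiveserver2.log 2>&1 &
# nohup $HIVE_HOME/bin/hiveserver2 > /opt/logs/hiveserver2.log 2>&1 &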
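1.1 Requirements
There are 500,000 JD.com shops. Every time a customer views any product in any shop, one visit log record is produced. The visit log is stored in a table named Visit, where user_id is the visitor's user ID, shop is the name of the visited shop, and visit_time is the time of the visit. Sample data: 'huawei','1001','2017-02-10','apple','1001','2017-02-11'……
Compute:
1) The UV (number of unique visitors) of each shop.
2) The top 3 visitors by visit count for each shop. Output the shop name, visitor ID, and visit count.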
1.2 Table
| Shop name | User ID | Visit time |
|---|---|---|
| shop | user_id | visit_time |
drop table if exists Visit;
create table Visit(
shop string COMMENT 'shop name',
user_id string COMMENT 'user ID',
visit_time string COMMENT 'visit time'
)
row format delimited fields terminated by '\t';
1.3 Insert data
insert into table Visit values ('huawei','1005','2017-02-10');
insert into table Visit values ('huawei','1005','2017-02-10');
insert into table Visit values ('huawei','1005','2017-02-10');
insert into table Visit values ('huawei','1005','2017-02-10');
insert into table Visit values ('huawei','1004','2017-02-10');
insert into table Visit values ('huawei','1004','2017-02-10');
insert into table Visit values ('huawei','1003','2017-02-10');
insert into table Visit values ('huawei','1003','2017-02-10');
insert into table Visit values ('huawei','1001','2017-02-10');
insert into table Visit values ('huawei','1002','2017-02-10');
insert into table Visit values ('huawei','1006','2017-02-10');
insert into table Visit values ('apple','1001','2017-02-10');
insert into table Visit values ('apple','1001','2017-02-10');
insert into table Visit values ('apple','1001','2017-02-10');
insert into table Visit values ('apple','1001','2017-02-10');
insert into table Visit values ('apple','1002','2017-02-10');
insert into table Visit values ('apple','1002','2017-02-10');
insert into table Visit values ('apple','1005','2017-02-10');
insert into table Visit values ('apple','1005','2017-02-10');
insert into table Visit values ('apple','1006','2017-02-10');
insert into table Visit values ('apple','1004','2017-02-10');
insert into table Visit values ('meizu','1006','2017-02-10');
insert into table Visit values ('meizu','1006','2017-02-10');
insert into table Visit values ('meizu','1006','2017-02-10');
insert into table Visit values ('meizu','1006','2017-02-10');
insert into table Visit values ('meizu','1003','2017-02-10');
insert into table Visit values ('meizu','1003','2017-02-10');
insert into table Visit values ('meizu','1003','2017-02-10');
insert into table Visit values ('meizu','1002','2017-02-10');
insert into table Visit values ('meizu','1002','2017-02-10');
insert into table Visit values ('meizu','1004','2017-02-10');
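Each single-row INSERT above is typically executed as its own job, which can make loading even this small sample slow. As an alternative (not in addition, or the rows would be duplicated), the rows can be grouped into one multi-row VALUES list. This is a minimal sketch that assumes a Hive version supporting INSERT ... VALUES with multiple rows (0.14 or later); it replaces only the ten 'meizu' statements.

```sql
-- Replaces the ten single-row 'meizu' inserts with one statement.
insert into table Visit values
    ('meizu','1006','2017-02-10'),
    ('meizu','1006','2017-02-10'),
    ('meizu','1006','2017-02-10'),
    ('meizu','1006','2017-02-10'),
    ('meizu','1003','2017-02-10'),
    ('meizu','1003','2017-02-10'),
    ('meizu','1003','2017-02-10'),
    ('meizu','1002','2017-02-10'),
    ('meizu','1002','2017-02-10'),
    ('meizu','1004','2017-02-10');
```

The 'huawei' and 'apple' rows can be grouped the same way.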
II. Implementation
2.1 UV (number of unique visitors) of each shop
-- UV counts each visitor once per shop, so user_id must be de-duplicated.
select shop,
count(distinct user_id) shop_user_view
from visit
group by shop;
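UV counts each visitor once per shop, whereas counting every log row gives the number of visits (often called PV). The following sketch puts the two side by side on the sample data from section 1.3; the aliases shop_pv and shop_uv are names chosen here for illustration, and it assumes user_id is never NULL.

```sql
-- Total visits (PV) versus distinct visitors (UV) per shop.
select shop,
       count(*)                as shop_pv,
       count(distinct user_id) as shop_uv
from visit
group by shop;
-- With the sample rows from section 1.3 (row order may vary):
--   apple   10   5
--   huawei  11   6
--   meizu   10   4
```

The gap between the two columns shows why a plain count(user_id) overstates the number of visitors.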
2.2 Top 3 visitors by visit count for each shop. Output the shop name, visitor ID, and visit count
-- Step 1: count the visits of each user in each shop.
select shop,
user_id,
count(user_id) shop_user_count
from visit
group by shop, user_id;
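-- Step 2: rank the users within each shop by visit count.
-- rank() gives tied users the same rank, so a rank <= 3 filter can return more than three rows per shop.
select t1.shop,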
t1.user_id,
t1.shop_user_count,
rank() over (partition by shop order by shop_user_count desc) shop_user_count_rk
from (
select shop,
user_id,
count(user_id) shop_user_count
from visit
group by shop, user_id
) t1;
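-- Step 3: keep the top 3 per shop.
-- row_number() numbers rows 1, 2, 3, ... without ties, so at most three rows per shop are returned.
select *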
from (
select t1.shop,
t1.user_id,
t1.shop_user_count,
row_number() over (partition by shop order by shop_user_count desc) shop_user_count_rk
from (
select shop,
user_id,
count(user_id) shop_user_count
from visit
group by shop, user_id
) t1
) t2
where t2.shop_user_count_rk <= 3;
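The queries above use rank() in one place and row_number() in another, and the two differ only when visit counts tie. The sketch below shows rank(), dense_rank(), and row_number() side by side on the sample data. The extra user_id sort key in row_number() is an addition made here, not part of the original queries, so that tied users are numbered in a repeatable order.

```sql
-- Compare the three ranking functions on the per-shop visit counts.
select t1.shop,
       t1.user_id,
       t1.shop_user_count,
       rank()       over (partition by t1.shop order by t1.shop_user_count desc) as rk,
       dense_rank() over (partition by t1.shop order by t1.shop_user_count desc) as dense_rk,
       -- user_id is an added tie-breaker (assumption) for deterministic numbering
       row_number() over (partition by t1.shop order by t1.shop_user_count desc, t1.user_id) as rn
from (
    select shop, user_id, count(*) as shop_user_count
    from visit
    group by shop, user_id
) t1;
-- For shop 'huawei' in the sample data (counts 4, 2, 2, 1, 1, 1):
--   user 1005 (4 visits): rk 1, dense_rk 1, rn 1
--   user 1003 (2 visits): rk 2, dense_rk 2, rn 2
--   user 1004 (2 visits): rk 2, dense_rk 2, rn 3
--   users 1001, 1002, 1006 (1 visit each): rk 4, dense_rk 3, rn 4, 5, 6
```

Which function to filter on depends on how "top 3" should treat ties: row_number() returns at most three rows per shop, rank() can return more when the tie falls on third place, and dense_rank() returns every user whose count is among the three highest distinct counts.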