Hive行列互转

1 多行转多列

  • 数据:
张三	语文	90
张三	数学	85
张三	英语	92
李四	语文	75
李四	数学	90
李四	英语	80
王五	语文	95
王五	数学	100
王五	英语	98
  • 建表:
create table stu(
name string,
subject string,
score int
)
row format delimited
fields terminated by '\t'
;
  • 加载数据:
load data local inpath '/myfile/db1116/stu.txt' into table stu;
  • 查询:
select
name,
sum(case when subject='语文' then score else 0 end ) as Chinese,
sum(case when subject='数学' then score else 0 end ) as Math,
sum(case when subject='英语' then score else 0 end ) as English
from stu
group by name
;
  • 结果:
result:
OK
name    chinese math    english
张三    90      85      92
李四    75      90      80
王五    95      100     98
Time taken: 4.507 seconds, Fetched: 3 row(s)

2 多列转多行 (一张map表映射成两张表)

  • 数据:
datev	uv	newuv	video	newvideo	vv	vip_num	new_vip_num
2019-05-10	5000000	200000	3000000	10000	20000000	500000	80000
  • 建表:
create table video(
datev date,
uv int,
newuv int,
video int,
newvideo int,
vv int,
vip_num int,
new_vip_num int
)
row format delimited
fields terminated by '\t'
;
  • 加载数据:
load data local inpath '/myfile/db1116/video.txt' into table video;
  • 查询:
select
datev,
label,
value
from video
lateral view explode(
map('uv',uv,
'新增UV',newuv,
'视频存量',video,
'新增视频',newvideo,
'播放量',vv,
'会员数',vip_num,
'新增会员数',new_vip_num
) ) k as label,value
;
  • 结果:
date	label	value
2019-05-10	UV	5000000
2019-05-10	新增UV	200000
2019-05-10	视频存量	3000000
2019-05-10	新增视频	10000
2019-05-10	播放量	20000000
2019-05-10	会员数	500000
2019-05-10	新增会员数	80000

3 多行转多列

  • 假设数据表:
row1:
col1   col2    col3
a	c	1
a	d	2
a	e	3
b	c	4
b	d	5
b	e	6
  • 现在要将其转化为:
col1   c      d      e
a      1      2      3
b      4      5      6
  • 建表:
create table row1(
col1 string,
col2 string,
col3 int
)
row format delimited
fields terminated by '\t'
;
  • 加载数据:
load data local inpath '/myfile/db1116/row1.txt' into table row1;
  • 查询:
select
col1,
sum(case when col2='c' then col3 else 0 end) as c,
sum(case when col2='d' then col3 else 0 end) as d,
sum(case when col2='e' then col3 else 0 end) as e
from row1
group by col1
;

注:需要使用到max(case … when … then … else 0 end),仅限于转化的字段为数值类型,且为正值的情况。

4 多列转多行

  • 数据表 :
row2:
col1	c d	e
a	1	2	3
b	4	5	6
  • 现要将其转化为:
col1   col2    col3
a      c       1
a      d       2
a      e       3
b      c       4
b      d       5
b      e       6
  • 建表:
create table row2(
col1 string,
c int,
d int,
e int
)
row format delimited
fields terminated by '\t'
;
  • 加载数据:
load data local inpath '/myfile/db1116/row2.txt' overwrite into table row2;
  • 查询:
select
col1,
'c' as col2,
c as col3
from row2
union
select
col1,
'd' as col2,
d as col3
from row2
union
select
col1,
'e' as col2,
e as col3
from row2
;

注:这里需要使用union进行拼接。

5 订单及订单类型行列互换

  • 数据:
order_id order_type order_time
111 N 10:00
111 A 10:05
111 B 10:10
  • 使用hql获取结果如下:
order_id order_type_1 order_type_2 order_time_1 order_time_2
111 N A 10:00 10:05
111 A B 10:05 10:10
  • 建表:
create table deal(
id int,
type string,
order_time string
)
row format delimited 
fields terminated by '\t'
;
  • 加载数据:
load data local inpath '/myfile/db1116/order.txt' into table deal;
  • 查询(方法1):
select
id,
type type1,
lead(type) over(distribute by id sort by order_time) type2,
order_time time1,
lead(order_time) over(partition by id order by order_time) time2
from deal
;
  • 结果:
OK
id      type1   type2   time1   time2
111     N       A       10:00   10:05
111     A       B       10:05   10:10
111     B       NULL    10:10   NULL
Time taken: 1.996 seconds, Fetched: 3 row(s)
  • 查询(方法2):
select *
from (
select
id,
type type1,
lead(type) over(distribute by id sort by order_time) type2,
order_time time1,
lead(order_time) over(partition by id order by order_time) time2
from deal
) t 
where time2 is not null
;
  • 结果:
OK
t.id    t.type1 t.type2 t.time1 t.time2
111     N       A       10:00   10:05
111     A       B       10:05   10:10
Time taken: 2.132 seconds, Fetched: 2 row(s)

行列转换
行转列:
使用case when 查询出多列即可,即可增加列。
列转行:
1、lateral view explode(),使用炸裂函数可以将1列转成多行,被转换列适用于array、map等类型。
lateral view posexplode(数组),如有排序需求,则需要索引。将数组炸开成两行(索引 , 值),需要as 两个别名。
2、case when 结合concat_ws与collect_set/collect_list实现。内层用case when,外层用
collect_set/list收集,对搜集完后用concat_ws分割连接形成列。

6 编写sql实现行列互换

  • 数据:
id userid subject score
1 001 语文 90
2 001 数学 92
3 001 英语 80
4 002 语文 88
5 002 数学 90
6 002 英语 75.5
7 003 语文 70
8 003 数学 85se
9 003 英语 90
10 003 政治 82
  • 建表:
create table if not exists student1(
id int,
userid string,
subject string,
score double
)
row format delimited 
fields terminated by ' '
;
  • 加载数据:
load data local inpath '/myfile/db1116/student1.txt' into table student1;
  • 查询(方法1):
select
userid,
sum(case when subject='语文' then score else 0 end) s1,
sum(case when subject='数学' then score else 0 end) s2,
sum(case when subject='英语' then score else 0 end) s3,
sum(case when subject='政治' then score else 0 end) s4,
sum(score) s5
from
student1
group by userid
;
  • 结果:
OK
userid  s1      s2      s3      s4      s5
001     90.0    92.0    80.0    0.0     262.0
002     88.0    90.0    75.5    0.0     253.5
003     70.0    85.0    90.0    82.0    327.0
Time taken: 2.272 seconds, Fetched: 3 row(s)
  • 查询(方法2):
select
userid,
sum(case when subject='语文' then score else 0 end) s1,
sum(case when subject='数学' then score else 0 end) s2,
sum(case when subject='英语' then score else 0 end) s3,
sum(case when subject='政治' then score else 0 end) s4,
sum(score) s5
from
student1
group by userid
union
select
if(1=1,'total','t'),
sum(tmp.s1),
sum(tmp.s2),
sum(tmp.s3),
sum(tmp.s4),
sum(tmp.s5)
from
(select
userid,
sum(case when subject='语文' then score else 0 end) s1,
sum(case when subject='数学' then score else 0 end) s2,
sum(case when subject='英语' then score else 0 end) s3,
sum(case when subject='政治' then score else 0 end) s4,
sum(score) s5
from
student1
group by userid) tmp
;
  • 结果:
OK
_u2.userid      _u2.s1  _u2.s2  _u2.s3  _u2.s4  _u2.s5
001     90.0    92.0    80.0    0.0     262.0
002     88.0    90.0    75.5    0.0     253.5
003     70.0    85.0    90.0    82.0    327.0
total   248.0   267.0   245.5   82.0    842.5
Time taken: 7.138 seconds, Fetched: 4 row(s)

7 用户标签连接查询

  • 数据:
T1表:
Tags
1,2,3
1,2
2,3
T2表:
Id lab
1 A
2 B
3 C
  • 根据T1和T2表的数据,编写sql实现如下结果

ids tags
1,2,3 A,B,C
1,2 A,B
2,3 B,C
  • 建表:
create table t1(
tags string
)
row format delimited 
fields terminated by '\t'
;
create table t2(
id int,
lab string
)
row format delimited 
fields terminated by ' '
;
  • 加载数据:
load data local inpath '/myfile/db1116/t1.txt' overwrite into table t1;
load data local inpath '/myfile/db1116/t2.txt' overwrite into table t2;
  • 查询:
select
t.tags tags,
concat_ws(',',collect_set(t.lab)) labs
from (
select
tmp1.tags tags,
tmp1.tag tag,
t2.lab lab
from (
select
tags,
tag
from t1
lateral view explode(split(tags,',')) tags as tag
) tmp1
join t2
on tmp1.tag=t2.id
) t
group by t.tags --group by tags
;
  • 结果:
OK
tags    labs
1,2     A,B
1,2,3   A,B,C
2,3     B,C
Time taken: 8.54 seconds, Fetched: 3 row(s)

8 用户标签组合

  • 数据:
t3表:
id tag flag
a b 2
a b 1
a b 3
c d 6
c d 8
c d 8
  • 编写sql实现如下结果:
id tag flag
a b 1|2|3
c d 6|8
  • 建表:
create table t3(
id string,
tag string,
flag string
)
row format delimited 
fields terminated by ' '
;
  • 加载数据:
load data local inpath '/myfile/db1116/t3.txt' overwrite into table t3;
  • 查询:
select
id,
tag,
concat_ws('|',collect_set(flag)) flag
from t3
group by id,tag
;

9 hive实现词频统计

  • 数据:
uid contents
1 i|love|china
2 china|is|good|i|i|like
  • 统计结果如下,如果出现次数一样,则按照content名称排序:
content cnt
i 3
china 2
good 1
like 1
love 1
is 1
  • 建表:
create table t4(
uid int,
contents string
)
row format delimited 
fields terminated by ' '
;
  • 加载数据:
load data local inpath '/myfile/db1116/t4.txt' overwrite into table t4;
  • 查询:
select 
word word,
count(*) ctime
from t4
lateral view explode(split(contents,'\\|')) wc as word
group by word
order by ctime desc,word asc
;

10 课程行转列

  • 数据:
t5表
id course
1,a
1,b
1,c
1,e
2,a
2,c
2,d
2,f
3,a
3,b
3,c
3,e
  • 根据编写sql,得到结果如下(表中的1表示选修,表中的0表示未选修):
id a b c d e f
1 1 1 1 0 1 0
2 1 0 1 1 0 1
3 1 1 1 0 1 0
  • 建表:
create table t5(
id string,
course string
)
row format delimited 
fields terminated by ','
;
  • 加载数据:
load data local inpath '/myfile/db1116/t5.txt' overwrite into table t5;
  • 查询:
select
id,
sum(case when course='a' then 1 else 0 end) a,
sum(case when course='b' then 1 else 0 end) b,
sum(case when course='c' then 1 else 0 end) c,
sum(case when course='d' then 1 else 0 end) d,
sum(case when course='e' then 1 else 0 end) e,
sum(case when course='f' then 1 else 0 end) f
from t5
group by id
;
  • 结果:
OK
id      a       b       c       d       e       f
1       1       1       1       0       1       0
2       1       0       1       1       0       1
3       1       1       1       0       1       0
Time taken: 1.93 seconds, Fetched: 3 row(s)
  • 不加sum()函数查询:
select
id,
case when course='a' then 1 else 0 end a,
case when course='b' then 1 else 0 end b,
case when course='c' then 1 else 0 end c,
case when course='d' then 1 else 0 end d,
case when course='e' then 1 else 0 end e,
case when course='f' then 1 else 0 end f
from t5
;
  • 结果:
OK
id      a       b       c       d       e       f
1       1       0       0       0       0       0
1       0       1       0       0       0       0
1       0       0       1       0       0       0
1       0       0       0       0       1       0
2       1       0       0       0       0       0
2       0       0       1       0       0       0
2       0       0       0       1       0       0
2       0       0       0       0       0       1
3       1       0       0       0       0       0
3       0       1       0       0       0       0
3       0       0       1       0       0       0
3       0       0       0       0       1       0
Time taken: 0.135 seconds, Fetched: 12 row(s)

11 成绩课程行列互换

  • 数据:
t6表:
name course score
aa English 75
bb math 85
aa math 90
  • 使用hql输出以下结果:
name English math
aa 75 90
bb 0 85
  • 建表:
create table t6(
name string,
course string,
score int
)
row format delimited 
fields terminated by ' '
;
  • 加载数据:
load data local inpath '/myfile/db1116/t6.txt' overwrite into table t6;
  • 查询:
select
name,
sum(case when course='English' then score else 0 end) English,
sum(case when course='math' then score else 0 end) Math
from t6
group by name
; 

12 求top3英雄及其pick率

  • 数据:
 id names
1 亚索,挖掘机,艾瑞莉娅,洛,卡莎
2 亚索,盖伦,奥巴马,牛头,皇子
3 亚索,盖伦,艾瑞莉娅,宝石,琴女
4 亚索,盖伦,赵信,老鼠,锤石
  • 建表:
create table top3(
id int,
names string
)
row format delimited 
fields terminated by ' '
;
  • 加载数据:
load data local inpath '/myfile/db1116/top3.txt' overwrite into table top3;
  • 查询:
select
name,
count(name) c
from (
select 
name
from top3
lateral view explode(split(names,',')) v as name
) tmp 
group by name
order by c desc,name asc
limit 3
;
  • 结果:
OK
name    c
亚索    4
盖伦    3
艾瑞莉娅        2
Time taken: 2.745 seconds, Fetched: 3 row(s)
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值