1 多行转多列
- 数据:
张三 语文 90
张三 数学 85
张三 英语 92
李四 语文 75
李四 数学 90
李四 英语 80
王五 语文 95
王五 数学 100
王五 英语 98
- 建表:
create table stu(
name string,
subject string,
score int
)
row format delimited
fields terminated by '\t'
;
- 加载数据:
load data local inpath '/myfile/db1116/stu.txt' into table stu;
- 查询:
select
name,
sum(case when subject='语文' then score else 0 end ) as Chinese,
sum(case when subject='数学' then score else 0 end ) as Math,
sum(case when subject='英语' then score else 0 end ) as English
from stu
group by name
;
- 结果:
result:
OK
name chinese math english
张三 90 85 92
李四 75 90 80
王五 95 100 98
Time taken: 4.507 seconds, Fetched: 3 row(s)
2 多列转多行 (一张map表映射成两张表)
- 数据:
datev uv newuv video newvideo vv vip_num new_vip_num
2019-05-10 5000000 200000 3000000 10000 20000000 500000 80000
- 建表:
create table video(
datev date,
uv int,
newuv int,
video int,
newvideo int,
vv int,
vip_num int,
new_vip_num int
)
row format delimited
fields terminated by '\t'
;
- 加载数据:
load data local inpath '/myfile/db1116/video.txt' into table video;
- 查询:
select
datev,
label,
value
from video
lateral view explode(
map('uv',uv,
'新增UV',newuv,
'视频存量',video,
'新增视频',newvideo,
'播放量',vv,
'会员数',vip_num,
'新增会员数',new_vip_num
) ) k as label,value
;
- 结果:
date label value
2019-05-10 UV 5000000
2019-05-10 新增UV 200000
2019-05-10 视频存量 3000000
2019-05-10 新增视频 10000
2019-05-10 播放量 20000000
2019-05-10 会员数 500000
2019-05-10 新增会员数 80000
3 多行转多列
- 假设数据表:
row1:
col1 col2 col3
a c 1
a d 2
a e 3
b c 4
b d 5
b e 6
- 现在要将其转化为:
col1 c d e
a 1 2 3
b 4 5 6
- 建表:
create table row1(
col1 string,
col2 string,
col3 int
)
row format delimited
fields terminated by '\t'
;
- 加载数据:
load data local inpath '/myfile/db1116/row1.txt' into table row1;
- 查询:
select
col1,
sum(case when col2='c' then col3 else 0 end) as c,
sum(case when col2='d' then col3 else 0 end) as d,
sum(case when col2='e' then col3 else 0 end) as e
from row1
group by col1
;
注:需要使用到max(case … when … then … else 0 end),仅限于转化的字段为数值类型,且为正值的情况。
4 多列转多行
- 数据表 :
row2:
col1 c d e
a 1 2 3
b 4 5 6
- 现要将其转化为:
col1 col2 col3
a c 1
a d 2
a e 3
b c 4
b d 5
b e 6
- 建表:
create table row2(
col1 string,
c int,
d int,
e int
)
row format delimited
fields terminated by '\t'
;
- 加载数据:
load data local inpath '/myfile/db1116/row2.txt' overwrite into table row2;
- 查询:
select
col1,
'c' as col2,
c as col3
from row2
union
select
col1,
'd' as col2,
d as col3
from row2
union
select
col1,
'e' as col2,
e as col3
from row2
;
注:这里需要使用union进行拼接。
5 订单及订单类型行列互换
- 数据:
order_id order_type order_time
111 N 10:00
111 A 10:05
111 B 10:10
- 使用hql获取结果如下:
order_id order_type_1 order_type_2 order_time_1 order_time_2
111 N A 10:00 10:05
111 A B 10:05 10:10
- 建表:
create table deal(
id int,
type string,
order_time string
)
row format delimited
fields terminated by '\t'
;
- 加载数据:
load data local inpath '/myfile/db1116/order.txt' into table deal;
- 查询(方法1):
select
id,
type type1,
lead(type) over(distribute by id sort by order_time) type2,
order_time time1,
lead(order_time) over(partition by id order by order_time) time2
from deal
;
- 结果:
OK
id type1 type2 time1 time2
111 N A 10:00 10:05
111 A B 10:05 10:10
111 B NULL 10:10 NULL
Time taken: 1.996 seconds, Fetched: 3 row(s)
- 查询(方法2):
select *
from (
select
id,
type type1,
lead(type) over(distribute by id sort by order_time) type2,
order_time time1,
lead(order_time) over(partition by id order by order_time) time2
from deal
) t
where time2 is not null
;
- 结果:
OK
t.id t.type1 t.type2 t.time1 t.time2
111 N A 10:00 10:05
111 A B 10:05 10:10
Time taken: 2.132 seconds, Fetched: 2 row(s)
行列转换
行转列:
使用case when 查询出多列即可,即可增加列。
列转行:
1、lateral view explode(),使用炸裂函数可以将1列转成多行,被转换列适用于array、map等类型。
lateral view posexplode(数组),如有排序需求,则需要索引。将数组炸开成两行(索引 , 值),需要as 两个别名。
2、case when 结合concat_ws与collect_set/collect_list实现。内层用case when,外层用
collect_set/list收集,对搜集完后用concat_ws分割连接形成列。
6 编写sql实现行列互换
- 数据:
id userid subject score
1 001 语文 90
2 001 数学 92
3 001 英语 80
4 002 语文 88
5 002 数学 90
6 002 英语 75.5
7 003 语文 70
8 003 数学 85se
9 003 英语 90
10 003 政治 82
- 建表:
create table if not exists student1(
id int,
userid string,
subject string,
score double
)
row format delimited
fields terminated by ' '
;
- 加载数据:
load data local inpath '/myfile/db1116/student1.txt' into table student1;
- 查询(方法1):
select
userid,
sum(case when subject='语文' then score else 0 end) s1,
sum(case when subject='数学' then score else 0 end) s2,
sum(case when subject='英语' then score else 0 end) s3,
sum(case when subject='政治' then score else 0 end) s4,
sum(score) s5
from
student1
group by userid
;
- 结果:
OK
userid s1 s2 s3 s4 s5
001 90.0 92.0 80.0 0.0 262.0
002 88.0 90.0 75.5 0.0 253.5
003 70.0 85.0 90.0 82.0 327.0
Time taken: 2.272 seconds, Fetched: 3 row(s)
- 查询(方法2):
select
userid,
sum(case when subject='语文' then score else 0 end) s1,
sum(case when subject='数学' then score else 0 end) s2,
sum(case when subject='英语' then score else 0 end) s3,
sum(case when subject='政治' then score else 0 end) s4,
sum(score) s5
from
student1
group by userid
union
select
if(1=1,'total','t'),
sum(tmp.s1),
sum(tmp.s2),
sum(tmp.s3),
sum(tmp.s4),
sum(tmp.s5)
from
(select
userid,
sum(case when subject='语文' then score else 0 end) s1,
sum(case when subject='数学' then score else 0 end) s2,
sum(case when subject='英语' then score else 0 end) s3,
sum(case when subject='政治' then score else 0 end) s4,
sum(score) s5
from
student1
group by userid) tmp
;
- 结果:
OK
_u2.userid _u2.s1 _u2.s2 _u2.s3 _u2.s4 _u2.s5
001 90.0 92.0 80.0 0.0 262.0
002 88.0 90.0 75.5 0.0 253.5
003 70.0 85.0 90.0 82.0 327.0
total 248.0 267.0 245.5 82.0 842.5
Time taken: 7.138 seconds, Fetched: 4 row(s)
7 用户标签连接查询
- 数据:
T1表:
Tags
1,2,3
1,2
2,3
T2表:
Id lab
1 A
2 B
3 C
- 根据T1和T2表的数据,编写sql实现如下结果
:
ids tags
1,2,3 A,B,C
1,2 A,B
2,3 B,C
- 建表:
create table t1(
tags string
)
row format delimited
fields terminated by '\t'
;
create table t2(
id int,
lab string
)
row format delimited
fields terminated by ' '
;
- 加载数据:
load data local inpath '/myfile/db1116/t1.txt' overwrite into table t1;
load data local inpath '/myfile/db1116/t2.txt' overwrite into table t2;
- 查询:
select
t.tags tags,
concat_ws(',',collect_set(t.lab)) labs
from (
select
tmp1.tags tags,
tmp1.tag tag,
t2.lab lab
from (
select
tags,
tag
from t1
lateral view explode(split(tags,',')) tags as tag
) tmp1
join t2
on tmp1.tag=t2.id
) t
group by t.tags --group by tags
;
- 结果:
OK
tags labs
1,2 A,B
1,2,3 A,B,C
2,3 B,C
Time taken: 8.54 seconds, Fetched: 3 row(s)
8 用户标签组合
- 数据:
t3表:
id tag flag
a b 2
a b 1
a b 3
c d 6
c d 8
c d 8
- 编写sql实现如下结果:
id tag flag
a b 1|2|3
c d 6|8
- 建表:
create table t3(
id string,
tag string,
flag string
)
row format delimited
fields terminated by ' '
;
- 加载数据:
load data local inpath '/myfile/db1116/t3.txt' overwrite into table t3;
- 查询:
select
id,
tag,
concat_ws('|',collect_set(flag)) flag
from t3
group by id,tag
;
9 hive实现词频统计
- 数据:
uid contents
1 i|love|china
2 china|is|good|i|i|like
- 统计结果如下,如果出现次数一样,则按照content名称排序:
content cnt
i 3
china 2
good 1
like 1
love 1
is 1
- 建表:
create table t4(
uid int,
contents string
)
row format delimited
fields terminated by ' '
;
- 加载数据:
load data local inpath '/myfile/db1116/t4.txt' overwrite into table t4;
- 查询:
select
word word,
count(*) ctime
from t4
lateral view explode(split(contents,'\\|')) wc as word
group by word
order by ctime desc,word asc
;
10 课程行转列
- 数据:
t5表
id course
1,a
1,b
1,c
1,e
2,a
2,c
2,d
2,f
3,a
3,b
3,c
3,e
- 根据编写sql,得到结果如下(表中的1表示选修,表中的0表示未选修):
id a b c d e f
1 1 1 1 0 1 0
2 1 0 1 1 0 1
3 1 1 1 0 1 0
- 建表:
create table t5(
id string,
course string
)
row format delimited
fields terminated by ','
;
- 加载数据:
load data local inpath '/myfile/db1116/t5.txt' overwrite into table t5;
- 查询:
select
id,
sum(case when course='a' then 1 else 0 end) a,
sum(case when course='b' then 1 else 0 end) b,
sum(case when course='c' then 1 else 0 end) c,
sum(case when course='d' then 1 else 0 end) d,
sum(case when course='e' then 1 else 0 end) e,
sum(case when course='f' then 1 else 0 end) f
from t5
group by id
;
- 结果:
OK
id a b c d e f
1 1 1 1 0 1 0
2 1 0 1 1 0 1
3 1 1 1 0 1 0
Time taken: 1.93 seconds, Fetched: 3 row(s)
- 不加sum()函数查询:
select
id,
case when course='a' then 1 else 0 end a,
case when course='b' then 1 else 0 end b,
case when course='c' then 1 else 0 end c,
case when course='d' then 1 else 0 end d,
case when course='e' then 1 else 0 end e,
case when course='f' then 1 else 0 end f
from t5
;
- 结果:
OK
id a b c d e f
1 1 0 0 0 0 0
1 0 1 0 0 0 0
1 0 0 1 0 0 0
1 0 0 0 0 1 0
2 1 0 0 0 0 0
2 0 0 1 0 0 0
2 0 0 0 1 0 0
2 0 0 0 0 0 1
3 1 0 0 0 0 0
3 0 1 0 0 0 0
3 0 0 1 0 0 0
3 0 0 0 0 1 0
Time taken: 0.135 seconds, Fetched: 12 row(s)
11 成绩课程行列互换
- 数据:
t6表:
name course score
aa English 75
bb math 85
aa math 90
- 使用hql输出以下结果:
name English math
aa 75 90
bb 0 85
- 建表:
create table t6(
name string,
course string,
score int
)
row format delimited
fields terminated by ' '
;
- 加载数据:
load data local inpath '/myfile/db1116/t6.txt' overwrite into table t6;
- 查询:
select
name,
sum(case when course='English' then score else 0 end) English,
sum(case when course='math' then score else 0 end) Math
from t6
group by name
;
12 求top3英雄及其pick率
- 数据:
id names
1 亚索,挖掘机,艾瑞莉娅,洛,卡莎
2 亚索,盖伦,奥巴马,牛头,皇子
3 亚索,盖伦,艾瑞莉娅,宝石,琴女
4 亚索,盖伦,赵信,老鼠,锤石
- 建表:
create table top3(
id int,
names string
)
row format delimited
fields terminated by ' '
;
- 加载数据:
load data local inpath '/myfile/db1116/top3.txt' overwrite into table top3;
- 查询:
select
name,
count(name) c
from (
select
name
from top3
lateral view explode(split(names,',')) v as name
) tmp
group by name
order by c desc,name asc
limit 3
;
- 结果:
OK
name c
亚索 4
盖伦 3
艾瑞莉娅 2
Time taken: 2.745 seconds, Fetched: 3 row(s)