Hive行列互转

最新推荐文章于 2024-01-26 20:28:53 发布

置顶 The Great Bear

最新推荐文章于 2024-01-26 20:28:53 发布

阅读量329

点赞数 2

分类专栏： hive hadoop 大数据工程师文章标签： hive 行转列列转行行列互转

本文链接：https://blog.youkuaiyun.com/weixin_45971146/article/details/103437559

版权

大数据工程师同时被 3 个专栏收录

11 篇文章

订阅专栏

hive

7 篇文章

订阅专栏

hadoop

6 篇文章

订阅专栏

1 多行转多列

数据：

张三	语文	90
张三	数学	85
张三	英语	92
李四	语文	75
李四	数学	90
李四	英语	80
王五	语文	95
王五	数学	100
王五	英语	98

建表：

create table stu(
name string,
subject string,
score int
)
row format delimited
fields terminated by '\t'
;

加载数据：

load data local inpath '/myfile/db1116/stu.txt' into table stu;

查询：

select
name,
sum(case when subject='语文' then score else 0 end ) as Chinese,
sum(case when subject='数学' then score else 0 end ) as Math,
sum(case when subject='英语' then score else 0 end ) as English
from stu
group by name
;

结果：

result:
OK
name    chinese math    english
张三    90      85      92
李四    75      90      80
王五    95      100     98
Time taken: 4.507 seconds, Fetched: 3 row(s)

2 多列转多行 (一张map表映射成两张表)

数据：

datev	uv	newuv	video	newvideo	vv	vip_num	new_vip_num
2019-05-10	5000000	200000	3000000	10000	20000000	500000	80000

建表：

create table video(
datev date,
uv int,
newuv int,
video int,
newvideo int,
vv int,
vip_num int,
new_vip_num int
)
row format delimited
fields terminated by '\t'
;

加载数据：

load data local inpath '/myfile/db1116/video.txt' into table video;

查询：

select
datev,
label,
value
from video
lateral view explode(
map('uv',uv,
'新增UV',newuv,
'视频存量',video,
'新增视频',newvideo,
'播放量',vv,
'会员数',vip_num,
'新增会员数',new_vip_num
) ) k as label,value
;

结果：

date	label	value
2019-05-10	UV	5000000
2019-05-10	新增UV	200000
2019-05-10	视频存量	3000000
2019-05-10	新增视频	10000
2019-05-10	播放量	20000000
2019-05-10	会员数	500000
2019-05-10	新增会员数	80000

3 多行转多列

假设数据表：

row1：
col1   col2    col3
a	c	1
a	d	2
a	e	3
b	c	4
b	d	5
b	e	6

现在要将其转化为：

col1   c      d      e
a      1      2      3
b      4      5      6

建表：

create table row1(
col1 string,
col2 string,
col3 int
)
row format delimited
fields terminated by '\t'
;

加载数据：

load data local inpath '/myfile/db1116/row1.txt' into table row1;

查询：

select
col1,
sum(case when col2='c' then col3 else 0 end) as c,
sum(case when col2='d' then col3 else 0 end) as d,
sum(case when col2='e' then col3 else 0 end) as e
from row1
group by col1
;

注：需要使用到max(case … when … then … else 0 end)，仅限于转化的字段为数值类型，且为正值的情况。

4 多列转多行

数据表 :

row2：
col1	c d	e
a	1	2	3
b	4	5	6

现要将其转化为：

col1   col2    col3
a      c       1
a      d       2
a      e       3
b      c       4
b      d       5
b      e       6

建表：

create table row2(
col1 string,
c int,
d int,
e int
)
row format delimited
fields terminated by '\t'
;

加载数据：

load data local inpath '/myfile/db1116/row2.txt' overwrite into table row2;

查询：

select
col1,
'c' as col2,
c as col3
from row2
union
select
col1,
'd' as col2,
d as col3
from row2
union
select
col1,
'e' as col2,
e as col3
from row2
;

注：这里需要使用union进行拼接。

5 订单及订单类型行列互换

数据：

order_id order_type order_time
111 N 10:00
111 A 10:05
111 B 10:10

使用hql获取结果如下：

order_id order_type_1 order_type_2 order_time_1 order_time_2
111 N A 10:00 10:05
111 A B 10:05 10:10

建表：

create table deal(
id int,
type string,
order_time string
)
row format delimited 
fields terminated by '\t'
;

加载数据：

load data local inpath '/myfile/db1116/order.txt' into table deal;

查询（方法1）：

select
id,
type type1,
lead(type) over(distribute by id sort by order_time) type2,
order_time time1,
lead(order_time) over(partition by id order by order_time) time2
from deal
;

结果：

OK
id      type1   type2   time1   time2
111     N       A       10:00   10:05
111     A       B       10:05   10:10
111     B       NULL    10:10   NULL
Time taken: 1.996 seconds, Fetched: 3 row(s)

查询（方法2）：

select *
from (
select
id,
type type1,
lead(type) over(distribute by id sort by order_time) type2,
order_time time1,
lead(order_time) over(partition by id order by order_time) time2
from deal
) t 
where time2 is not null
;

结果：

OK
t.id    t.type1 t.type2 t.time1 t.time2
111     N       A       10:00   10:05
111     A       B       10:05   10:10
Time taken: 2.132 seconds, Fetched: 2 row(s)

行列转换
行转列：
使用case when 查询出多列即可，即可增加列。
列转行:
1、lateral view explode()，使用炸裂函数可以将1列转成多行，被转换列适用于array、map等类型。
lateral view posexplode(数组)，如有排序需求，则需要索引。将数组炸开成两行(索引 , 值),需要as 两个别名。
2、case when 结合concat_ws与collect_set/collect_list实现。内层用case when，外层用
collect_set/list收集，对搜集完后用concat_ws分割连接形成列。

6 编写sql实现行列互换

数据：

id userid subject score
1 001 语文 90
2 001 数学 92
3 001 英语 80
4 002 语文 88
5 002 数学 90
6 002 英语 75.5
7 003 语文 70
8 003 数学 85se
9 003 英语 90
10 003 政治 82

建表：

create table if not exists student1(
id int,
userid string,
subject string,
score double
)
row format delimited 
fields terminated by ' '
;

加载数据：

load data local inpath '/myfile/db1116/student1.txt' into table student1;

查询（方法1）：

select
userid,
sum(case when subject='语文' then score else 0 end) s1,
sum(case when subject='数学' then score else 0 end) s2,
sum(case when subject='英语' then score else 0 end) s3,
sum(case when subject='政治' then score else 0 end) s4,
sum(score) s5
from
student1
group by userid
;

结果：

OK
userid  s1      s2      s3      s4      s5
001     90.0    92.0    80.0    0.0     262.0
002     88.0    90.0    75.5    0.0     253.5
003     70.0    85.0    90.0    82.0    327.0
Time taken: 2.272 seconds, Fetched: 3 row(s)

查询（方法2）：

select
userid,
sum(case when subject='语文' then score else 0 end) s1,
sum(case when subject='数学' then score else 0 end) s2,
sum(case when subject='英语' then score else 0 end) s3,
sum(case when subject='政治' then score else 0 end) s4,
sum(score) s5
from
student1
group by userid
union
select
if(1=1,'total','t'),
sum(tmp.s1),
sum(tmp.s2),
sum(tmp.s3),
sum(tmp.s4),
sum(tmp.s5)
from
(select
userid,
sum(case when subject='语文' then score else 0 end) s1,
sum(case when subject='数学' then score else 0 end) s2,
sum(case when subject='英语' then score else 0 end) s3,
sum(case when subject='政治' then score else 0 end) s4,
sum(score) s5
from
student1
group by userid) tmp
;

结果：

OK
_u2.userid      _u2.s1  _u2.s2  _u2.s3  _u2.s4  _u2.s5
001     90.0    92.0    80.0    0.0     262.0
002     88.0    90.0    75.5    0.0     253.5
003     70.0    85.0    90.0    82.0    327.0
total   248.0   267.0   245.5   82.0    842.5
Time taken: 7.138 seconds, Fetched: 4 row(s)

7 用户标签连接查询

数据：

T1表:
Tags
1,2,3
1,2
2,3

T2表:
Id lab
1 A
2 B
3 C

根据T1和T2表的数据，编写sql实现如下结果

：

ids tags
1,2,3 A,B,C
1,2 A,B
2,3 B,C

建表：

create table t1(
tags string
)
row format delimited 
fields terminated by '\t'
;

create table t2(
id int,
lab string
)
row format delimited 
fields terminated by ' '
;

加载数据：

load data local inpath '/myfile/db1116/t1.txt' overwrite into table t1;
load data local inpath '/myfile/db1116/t2.txt' overwrite into table t2;

查询：

select
t.tags tags,
concat_ws(',',collect_set(t.lab)) labs
from (
select
tmp1.tags tags,
tmp1.tag tag,
t2.lab lab
from (
select
tags,
tag
from t1
lateral view explode(split(tags,',')) tags as tag
) tmp1
join t2
on tmp1.tag=t2.id
) t
group by t.tags --group by tags
;

结果：

OK
tags    labs
1,2     A,B
1,2,3   A,B,C
2,3     B,C
Time taken: 8.54 seconds, Fetched: 3 row(s)

8 用户标签组合

数据：

t3表：
id tag flag
a b 2
a b 1
a b 3
c d 6
c d 8
c d 8

编写sql实现如下结果：

id tag flag
a b 1|2|3
c d 6|8

建表：

create table t3(
id string,
tag string,
flag string
)
row format delimited 
fields terminated by ' '
;

加载数据：

load data local inpath '/myfile/db1116/t3.txt' overwrite into table t3;

查询：

select
id,
tag,
concat_ws('|',collect_set(flag)) flag
from t3
group by id,tag
;

9 hive实现词频统计

数据：

uid contents
1 i|love|china
2 china|is|good|i|i|like

统计结果如下,如果出现次数一样，则按照content名称排序：

content cnt
i 3
china 2
good 1
like 1
love 1
is 1

建表：

create table t4(
uid int,
contents string
)
row format delimited 
fields terminated by ' '
;

加载数据：

load data local inpath '/myfile/db1116/t4.txt' overwrite into table t4;

查询：

select 
word word,
count(*) ctime
from t4
lateral view explode(split(contents,'\\|')) wc as word
group by word
order by ctime desc,word asc
;

10 课程行转列

数据：

t5表
id course
1,a
1,b
1,c
1,e
2,a
2,c
2,d
2,f
3,a
3,b
3,c
3,e

根据编写sql，得到结果如下(表中的1表示选修，表中的0表示未选修)：

id a b c d e f
1 1 1 1 0 1 0
2 1 0 1 1 0 1
3 1 1 1 0 1 0

建表：

create table t5(
id string,
course string
)
row format delimited 
fields terminated by ','
;

加载数据：

load data local inpath '/myfile/db1116/t5.txt' overwrite into table t5;

查询：

select
id,
sum(case when course='a' then 1 else 0 end) a,
sum(case when course='b' then 1 else 0 end) b,
sum(case when course='c' then 1 else 0 end) c,
sum(case when course='d' then 1 else 0 end) d,
sum(case when course='e' then 1 else 0 end) e,
sum(case when course='f' then 1 else 0 end) f
from t5
group by id
;

结果：

OK
id      a       b       c       d       e       f
1       1       1       1       0       1       0
2       1       0       1       1       0       1
3       1       1       1       0       1       0
Time taken: 1.93 seconds, Fetched: 3 row(s)

不加sum（）函数查询：

select
id,
case when course='a' then 1 else 0 end a,
case when course='b' then 1 else 0 end b,
case when course='c' then 1 else 0 end c,
case when course='d' then 1 else 0 end d,
case when course='e' then 1 else 0 end e,
case when course='f' then 1 else 0 end f
from t5
;

结果：

OK
id      a       b       c       d       e       f
1       1       0       0       0       0       0
1       0       1       0       0       0       0
1       0       0       1       0       0       0
1       0       0       0       0       1       0
2       1       0       0       0       0       0
2       0       0       1       0       0       0
2       0       0       0       1       0       0
2       0       0       0       0       0       1
3       1       0       0       0       0       0
3       0       1       0       0       0       0
3       0       0       1       0       0       0
3       0       0       0       0       1       0
Time taken: 0.135 seconds, Fetched: 12 row(s)

11 成绩课程行列互换

数据：

t6表：
name course score
aa English 75
bb math 85
aa math 90

使用hql输出以下结果：

name English math
aa 75 90
bb 0 85

建表：

create table t6(
name string,
course string,
score int
)
row format delimited 
fields terminated by ' '
;

加载数据：

load data local inpath '/myfile/db1116/t6.txt' overwrite into table t6;

查询：

select
name,
sum(case when course='English' then score else 0 end) English,
sum(case when course='math' then score else 0 end) Math
from t6
group by name
;

12 求top3英雄及其pick率

数据：

 id names
1 亚索,挖掘机,艾瑞莉娅,洛,卡莎
2 亚索,盖伦,奥巴马,牛头,皇子
3 亚索,盖伦,艾瑞莉娅,宝石,琴女
4 亚索,盖伦,赵信,老鼠,锤石

建表：

create table top3(
id int,
names string
)
row format delimited 
fields terminated by ' '
;

加载数据：

load data local inpath '/myfile/db1116/top3.txt' overwrite into table top3;

查询：

select
name,
count(name) c
from (
select 
name
from top3
lateral view explode(split(names,',')) v as name
) tmp 
group by name
order by c desc,name asc
limit 3
;

结果：

OK
name    c
亚索    4
盖伦    3
艾瑞莉娅        2
Time taken: 2.745 seconds, Fetched: 3 row(s)