transpose and pivot data in hive

1. 键-值对的形式存储数据

有时候,我们在hive中编写hql的时候,可以用键-值对的形式存储结果。比如有一些同学在一些课程中的学习行为,我们可以记录成如下形式:

studentclass_namekeyvalue
Aenglishanswer_num9
Achineseanswer_num15
Benglishanswer_num12
Aenglishhomework_num5
Benglishhomework_num7
Bchinesehomework_num10

这样做的好处有以下几点。一是可以避免hql中出现大量的join连接,使得hql的冗长;二是便于修改表,比如说,在有个同学的学习特征提取错误的时候,可以直接删除这个key,而不用再重写修改hql,重新跑hql。在将数据存储成这种形式后,我们需要将其转换成普通的形式。本文主要记录如何通过hive中的内置函数,来进行数据的转换。

假设上述的表格存储在tmp的表中。

create table tmp as
select 'A' as student, 'english' as class_name, 'answer_num' as key, 9 as value from dual
union all
select 'A' as student, 'chinese' as class_name, 'answer_num' as key, 15 as value from dual
union all
select 'B' as student, 'english' as class_name, 'answer_num' as key, 12 as value from dual
union all
select 'A' as student, 'english' as class_name, 'homework_num' as key, 5 as value from dual
union all
select 'B' as student, 'english' as class_name, 'homework_num' as key, 7 as value from dual
union all
select 'B' as student, 'chinese' as class_name, 'homework_num' as key, 10 as value from dual;

则转化步骤如下:

2. step1 –> map()函数

用map()函数来将key,value映射成<键,值>对的形式。

select student, class_name,
       map(key,value) as group_map
from   tmp
;

运行结果为:

A       english {"answer_num":9}
A       chinese {"answer_num":15}
B       english {"answer_num":12}
A       english {"homework_num":5}
B       english {"homework_num":7}
B       chinese {"homework_num":10}

2. step2 –> collect_list()函数

用collect_list() 函数将key值拆解出来作为字段,而相应的value值转换为list。

select t1.student, t1.class_name,
       collect_list(t1.group_map['answer_num']) as answer_num,
       collect_list(t1.group_map['homework_num']) as homework_num
from   ( -- step1
         select student, class_name,
                map(key,value) as group_map
         from   tmp
       ) t1
group by t1.student, t1.class_name
;

运行结果为

A       chinese [15]    []
A       english [9]     [5]
B       chinese []      [10]
B       english [12]    [7]

另外,也可以用 collect_set() 函数。

3. step3 –> 处理step2的结果

set hive.cli.print.header = true
;

select t1.student,
       t1.class_name,
       answer_num[0] as answer_num,
       homework_num[0] as homework_num
from   ( -- step2
         select t1.student, t1.class_name,
                collect_list(t1.group_map['answer_num']) as answer_num,
                collect_list(t1.group_map['homework_num']) as homework_num
         from   ( -- step1
                  select student, class_name,
                         map(key,value) as group_map
                  from   tmp
                ) t1
         group by t1.student, t1.class_name
       ) t1
;

运行结果为

t1.student  t1.class_name   answer_num     homework_num
A             chinese          15              NULL
A             english          9               5
B             chinese          NULL            10
B             english          12              7
Hive中没有直接的行列互换函数,但可以通过以下方式实现: 1. 使用CASE WHEN语句将列转换为行 例如,假设有以下表格: ``` id | name | age | gender -----|------|-----|------- 1 | Alice| 25 | F 2 | Bob | 30 | M 3 | Chris| 27 | M ``` 我们想要将它转换为以下形式: ``` field | 1 | 2 | 3 -------|------|------|------ name | Alice| Bob | Chris age | 25 | 30 | 27 gender | F | M | M ``` 可以使用以下查询语句: ``` SELECT 'name' AS field, MAX(CASE WHEN id = 1 THEN name END) AS 1, MAX(CASE WHEN id = 2 THEN name END) AS 2, MAX(CASE WHEN id = 3 THEN name END) AS 3 FROM my_table UNION ALL SELECT 'age' AS field, MAX(CASE WHEN id = 1 THEN age END) AS 1, MAX(CASE WHEN id = 2 THEN age END) AS 2, MAX(CASE WHEN id = 3 THEN age END) AS 3 FROM my_table UNION ALL SELECT 'gender' AS field, MAX(CASE WHEN id = 1 THEN gender END) AS 1, MAX(CASE WHEN id = 2 THEN gender END) AS 2, MAX(CASE WHEN id = 3 THEN gender END) AS 3 FROM my_table; ``` 2. 使用LATERAL VIEW EXPLODE函数将行转换为列 假设有以下表格: ``` id | category | value -----|----------|------ 1 | A | 10 1 | B | 20 1 | C | 30 2 | A | 40 2 | B | 50 2 | C | 60 ``` 我们想要将它转换为以下形式: ``` id | A | B | C -----|---|---|--- 1 | 10| 20| 30 2 | 40| 50| 60 ``` 可以使用以下查询语句: ``` SELECT id, MAX(CASE WHEN category = 'A' THEN value END) AS A, MAX(CASE WHEN category = 'B' THEN value END) AS B, MAX(CASE WHEN category = 'C' THEN value END) AS C FROM my_table LATERAL VIEW EXPLODE(array('A', 'B', 'C')) exploded_table AS category GROUP BY id; ```
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值