Hive中行列转换

最新推荐文章于 2024-01-30 14:46:05 发布

原创最新推荐文章于 2024-01-30 14:46:05 发布 · 5.4k 阅读

7 ·

CC 4.0 BY-SA版权

文章标签：

#Hive中行列转换

大数据专栏收录该内容

93 篇文章

订阅专栏

Hive中行列转换

行转列

多行转多列

数据表 row2col

col1   col2    col3
a      c       1
a      d       2
a      e       3  
b      c       4
b      d       5
b      e       6

现在要将其转化为：
col1   c      d      e
a      1      2      3
b      4      5      6

此时需要使用到max(case … when … then … else 0 end)，仅限于转化的字段为数值类型且为正值的情况

创建表：
create table row2col(col1 string,col2 string,col3 int)
row format delimited
fields terminated by ',';

加载数据：
load data local inpath '/root/hivedata/row2col.txt' into table row2col;
a,c,1
a,d,2
a,e,3
b,c,4
b,d,5
b,e,6

select col1,
max(case col2 when 'c' then col3 else 0 end) as c,
max(case col2 when 'd' then col3 else 0 end) as d,
max(case col2 when 'e' then col3 else 0 end) as e
from row2col
group by col1;

多行转单列(重要)

数据表 row2col_1：
col1    col2    col3
a       b       1
a       b       2
a       b       3
c       d       4
c       d       5
c       d       6

将其转化为:
col1    col2    col3
a       b       1,2,3
c       d       4,5,6

此时需要两个内置的函数：
a）concat_ws(参数1，参数2)，用于进行字符的拼接 
	参数1—指定分隔符 
	参数2—拼接的内容 
b）collect_set(col3)，它的主要作用是将某字段的值进行去重汇总，产生array类型字段
   如果不想去重可用collect_list()

创建表：
create table row2col_1(col1 string,col2 string,col3 int)
row format delimited
fields terminated by ',';

加载数据：
load data local inpath '/root/hivedata/row2col_1.txt' into table row2col_1;
a,b,1
a,b,2
a,b,3
c,d,4
c,d,5
c,d,6

select col1, col2, concat_ws('|', collect_set(cast(col3 as string))) as col3
from row2col_1
group by col1, col2;

列转行

多列转多行

数据表 col2row：
col1   c      d      e
a      1      2      3
b      4      5      6

现要将其转化为：
col1   col2    col3
a      c       1
a      d       2
a      e       3
b      c       4
b      d       5
b      e       6

这里需要使用union进行拼接。
union 可以结合多个select语句 返回共同的结果集
保证每个select语句返回的数据类型个数是一致的。

创建表：
create table col2row(col1 string,c int,d int,e int)
row format delimited
fields terminated by ',';

加载数据：
load data local inpath '/root/hivedata/col2row.txt' into table col2row;
a,1,2,3
b,4,5,6

select col1, 'c' as col2, c as col3 from col2row
UNION
select col1, 'd' as col2, d as col3 from col2row
UNION
select col1, 'e' as col2, e as col3 from col2row
order by col1, col2;

单列转多行（重要）

数据表 col2row_2：
col1    col2    col3
a       b       1,2,3
c       d       4,5,6

现要将其转化为：
col1    col2    col3
a       b       1
a       b       2
a       b       3
c       d       4
c       d       5
c       d       6

这里需要使用UDTF（表生成函数）explode()，该函数接受array类型的参数，其作用恰好与collect_set相反，实现将array类型数据行转列。explode配合lateral view实现将某列数据拆分成多行。

创建表：
create table col2row_2(col1 string,col2 string,col3 Array<string>)
row format delimited
fields terminated by '\t'
collection items terminated by ',';

create table col2row_2(col1 string,col2 string,col3 string)
row format delimited
fields terminated by '\t';

加载数据：
load data local inpath ‘/root/hivedata/col2row_2.txt’ into table col2row_2;
a b 1,2,3
c d 4,5,6

select col1, col2, lv.col3 as col3
from col2row_2
lateral view explode(split(col3, ‘,’)) lv as col3;