Hive中行列转换
行转列
-
多行转多列
数据表 row2col col1 col2 col3 a c 1 a d 2 a e 3 b c 4 b d 5 b e 6 现在要将其转化为: col1 c d e a 1 2 3 b 4 5 6 此时需要使用到max(case … when … then … else 0 end),仅限于转化的字段为数值类型且为正值的情况创建表: create table row2col(col1 string,col2 string,col3 int) row format delimited fields terminated by ','; 加载数据: load data local inpath '/root/hivedata/row2col.txt' into table row2col; a,c,1 a,d,2 a,e,3 b,c,4 b,d,5 b,e,6select col1, max(case col2 when 'c' then col3 else 0 end) as c, max(case col2 when 'd' then col3 else 0 end) as d, max(case col2 when 'e' then col3 else 0 end) as e from row2col group by col1; -
多行转单列(重要)
数据表 row2col_1: col1 col2 col3 a b 1 a b 2 a b 3 c d 4 c d 5 c d 6 将其转化为: col1 col2 col3 a b 1,2,3 c d 4,5,6 此时需要两个内置的函数: a)concat_ws(参数1,参数2),用于进行字符的拼接 参数1—指定分隔符 参数2—拼接的内容 b)collect_set(col3),它的主要作用是将某字段的值进行去重汇总,产生array类型字段 如果不想去重可用collect_list()创建表: create table row2col_1(col1 string,col2 string,col3 int) row format delimited fields terminated by ','; 加载数据: load data local inpath '/root/hivedata/row2col_1.txt' into table row2col_1; a,b,1 a,b,2 a,b,3 c,d,4 c,d,5 c,d,6select col1, col2, concat_ws('|', collect_set(cast(col3 as string))) as col3 from row2col_1 group by col1, col2;
列转行
-
多列转多行
数据表 col2row: col1 c d e a 1 2 3 b 4 5 6 现要将其转化为: col1 col2 col3 a c 1 a d 2 a e 3 b c 4 b d 5 b e 6 这里需要使用union进行拼接。 union 可以结合多个select语句 返回共同的结果集 保证每个select语句返回的数据类型个数是一致的。创建表: create table col2row(col1 string,c int,d int,e int) row format delimited fields terminated by ','; 加载数据: load data local inpath '/root/hivedata/col2row.txt' into table col2row; a,1,2,3 b,4,5,6select col1, 'c' as col2, c as col3 from col2row UNION select col1, 'd' as col2, d as col3 from col2row UNION select col1, 'e' as col2, e as col3 from col2row order by col1, col2; -
单列转多行(重要)
数据表 col2row_2: col1 col2 col3 a b 1,2,3 c d 4,5,6 现要将其转化为: col1 col2 col3 a b 1 a b 2 a b 3 c d 4 c d 5 c d 6 这里需要使用UDTF(表生成函数)explode(),该函数接受array类型的参数,其作用恰好与collect_set相反,实现将array类型数据行转列。explode配合lateral view实现将某列数据拆分成多行。创建表: create table col2row_2(col1 string,col2 string,col3 Array<string>) row format delimited fields terminated by '\t' collection items terminated by ','; create table col2row_2(col1 string,col2 string,col3 string) row format delimited fields terminated by '\t';
加载数据:
load data local inpath ‘/root/hivedata/col2row_2.txt’ into table col2row_2;
a b 1,2,3
c d 4,5,6
select col1, col2, lv.col3 as col3
from col2row_2
lateral view explode(split(col3, ‘,’)) lv as col3;
8587

被折叠的 条评论
为什么被折叠?



