Hive SQL：一列按分隔符切分求每个类型占比

最新推荐文章于 2024-09-04 21:02:05 发布

原创最新推荐文章于 2024-09-04 21:02:05 发布 · 1.3k 阅读

2 ·

CC 4.0 BY-SA版权

文章标签：

#hive #sql #大数据

大数据应用专栏收录该内容

14 篇文章

订阅专栏

本文介绍了如何在Hadoop 3.2.1和Hive 3.1.3环境中，通过Hive SQL操作对数据进行字段切割并计算各类别的占比。作者详细展示了如何使用explode函数、计数和比例计算，最后给出了完整的代码示例。

数据：A,B,C,D,E
将数据按照 ‘,’ 分割，求每个类型的占比
测试环境：Hadoop 3.2.1，Hive 3.1.3

一、数据准备

创建一张表，并添加数据

# 创建表
create table if not exists test(field1 varchar(100));
# 插入数据
insert into table test values('Action,Adventure,Fantasy,Science Fiction'),('Adventure,Fantasy,Action'),('Action,Adventure,Crime');

test.field1
Action,Adventure,Fantasy,Science Fiction
Adventure,Fantasy,Action
Action,Adventure,Crime

二、切割成一列

select explode(split(field1, ',')) as f1 from test;

在这里插入图片描述

从结果来看，已经切出来了，下面做占比统计

三、占比统计

1. 添加常数列

这里我用的方法是添加一列常数列，然后做 sum 统计每个类型出现的次数

with temp as (
	select explode(split(field1, ',')) as f1 from test
)
select f1, 1 as num from temp;

在这里插入图片描述

2. 统计出现次数

用 sum 来统计出现的次数，然后使用一个子查询获取总条数，给后面统计占比用

with temp as (
	select explode(split(field1, ',')) as f1 from test
)
select f1, sum(num) as total, (select count(*) from temp) as cnt
from (
	select f1, 1 as num from temp
) t1 
group by f1;

在这里插入图片描述

3. 统计占比

直接相除得到结果

with temp as (
	select explode(split(field1, ',')) as f1 from test
)
select f1, round((total/cnt) * 100, 2) as per
from (
	select f1, sum(num) as total, (select count(*) from temp) as cnt
	from (
		select f1, 1 as num from temp
	) t1 
	group by f1
) t2;

在这里插入图片描述

四、最终代码

加个 ’ % ’

with temp as (
	select explode(split(field1, ',')) as f1 from test
)
select f1, concat(round((total/cnt) * 100, 2), '%') as per
from (
	select f1, sum(num) as total, (select count(*) from temp) as cnt
	from (
		select f1, 1 as num from temp
	) t1 
	group by f1
) t2;