Hive UDF Development and Permanent UDF Registration

License: CC 4.0 BY-SA

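This article describes how to develop a custom UDF (user-defined function) in Hive: creating a new project, writing a HelloUDF class that implements an evaluate method, and packaging and uploading the jar file. It demonstrates the UDF by creating a temporary function named sayHello; note that a temporary function is valid only in the current session. Registering a UDF permanently requires placing the jar in a designated directory and reloading the metadata.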

explode: turns an array of values into multiple rows, one row per element

Create a text file:

[hadoop@ruozehadoop000 data]$ vi hive-wc.txt

hello,world,welcome
hello,welcome

Create a table and load the file into it:

create table hive_wc(sentence string);
load data local inpath '/home/hadoop/data/hive-wc.txt' into table hive_wc;

hello,world,welcome
hello,welcome

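Count the occurrences of each word:
1) Split each sentence into an array of words with split(sentence,","):
["hello","world","welcome"]
["hello","welcome"]
2) Flatten the arrays into one row per word with explode:
"hello"
"world"
"welcome"
"hello"
"welcome"
3) Aggregate with group by and count: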

select word, count(1) as c
from (select explode(split(sentence,",")) as word from hive_wc) t
group by word ;
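
To make the steps of the query above concrete, here is a minimal Python sketch that mimics what it computes on the two sample lines. It is a plain-Python model of split, explode, and group by, not Hive code, and the variable names are illustrative assumptions.

```python
from collections import Counter

# The two rows of hive_wc.sentence from the sample file.
sentences = ["hello,world,welcome", "hello,welcome"]

# split(sentence, ","): one array of words per input row.
arrays = [s.split(",") for s in sentences]

# explode(...): one output row per array element.
words = [w for arr in arrays for w in arr]

# group by word with count(1).
counts = Counter(words)

for word, c in sorted(counts.items()):
    print(word, c)
```

On the sample data this gives hello 2, welcome 2, and world 1. The sketch sorts the words only to make the printout stable; the Hive query itself has no order by.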

JSON: common in day-to-day work
Create a table rating_json, load the data, and look at the first ten rows:
create table rating_json(json string);
load data local inpath '/home/hadoop/data/rating.json' into table rating_json;
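Process the JSON data:

json_tuple is a UDTF that was introduced in Hive 0.7. It extracts several top-level keys from a JSON string in a single call: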
select json_tuple(json,"movie","rate","time","userid") as (movie,rate,time,userid)
from rating_json limit 10;
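
As a rough model of what json_tuple does for each row, the Python sketch below parses one JSON string and returns the requested top-level keys as a tuple, with None (standing in for Hive's NULL) for a missing key or unparsable input. The helper name json_tuple_row is hypothetical, and the sample record is made up for illustration; it is not taken from rating.json.

```python
import json

def json_tuple_row(raw, *keys):
    """Return the values of the given top-level keys, None where absent."""
    try:
        obj = json.loads(raw)
    except (TypeError, ValueError):
        # Unparsable input: every requested column is None.
        return tuple(None for _ in keys)
    if not isinstance(obj, dict):
        return tuple(None for _ in keys)
    return tuple(obj.get(k) for k in keys)

# Made-up record shaped like the columns the query extracts.
sample = '{"movie": "100", "rate": "4", "time": "1000000000", "userid": "7"}'

movie, rate, time_, userid = json_tuple_row(sample, "movie", "rate", "time", "userid")
print(movie, rate, time_, userid)
```

In Hive every column produced by json_tuple is a string, so the sample values are written as strings to match.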

Exercise:

Prepare a data file:

[hadoop@ruozehadoop000 data]$ more hive_row_number.txt

1,18,ruoze,M
2,19,jepson,M
3,22,wangwu,F
4,16,zhaoliu,F
5,30,tianqi,M
6,26,wangba,F

Create the table and load the data:

create table hive_rownumber(id int,age int, name string, sex string)
row format delimited fields terminated by ',';

load data local inpath '/home/hadoop/data/hive_row_number.txt' into table hive_rownumber;

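Requirement: for each sex, return the 2 rows with the greatest age.

order by sorts the whole result set globally, so it cannot sort within groups.
Sorting within each group requires a window function (also called an analytic function).

Using an analytic function: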
select id,age,name,sex
from
(select id,age,name,sex,
row_number() over(partition by sex order by age desc) as rn
from hive_rownumber) t
where rn<=2;
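
The same partition-then-number logic can be sketched in plain Python on the six sample rows. This only models row_number() over (partition by sex order by age desc) followed by the filter that keeps the first 2 rows per partition; it is not Hive code, and the variable names are illustrative assumptions.

```python
from collections import defaultdict

# (id, age, name, sex) rows from hive_row_number.txt.
rows = [
    (1, 18, "ruoze", "M"),
    (2, 19, "jepson", "M"),
    (3, 22, "wangwu", "F"),
    (4, 16, "zhaoliu", "F"),
    (5, 30, "tianqi", "M"),
    (6, 26, "wangba", "F"),
]

# partition by sex
partitions = defaultdict(list)
for row in rows:
    partitions[row[3]].append(row)

top2 = []
for sex in sorted(partitions):
    # order by age desc within the partition
    ordered = sorted(partitions[sex], key=lambda r: r[1], reverse=True)
    # row_number() starts at 1; keep row numbers 1 and 2
    for rn, row in enumerate(ordered, start=1):
        if rn <= 2:
            top2.append(row)

for row in top2:
    print(row)
```

For the sample data this keeps wangba (26) and wangwu (22) for F, and tianqi (30) and jepson (19) for M.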

Query output: