explode: (把一串数据转换成多行的数据)
创建一个文本:
[hadoop@ruozehadoop000 data]$ vi hive-wc.txt
hello,world,welcome
hello,welcome
创建一个表,并导入文本内容
创建一个表,并导入文本内容
create table hive_wc(sentence string);
load data local inpath '/home/hadoop/data/hive-wc.txt' into table hive_wc;
load data local inpath '/home/hadoop/data/hive-wc.txt' into table hive_wc;
hello,world,welcome
hello,welcome
求每个单词出现的个数
1) 获取每个单词 split(sentence,",")
["hello","world","welcome"]
["hello","welcome"]
结果输出:
"hello"
"world"
"welcome"
"hello"
"welcome"
通过聚合语法进行计算:
hello,welcome
求每个单词出现的个数
1) 获取每个单词 split(sentence,",")
["hello","world","welcome"]
["hello","welcome"]
结果输出:
"hello"
"world"
"welcome"
"hello"
"welcome"
通过聚合语法进行计算:
select word, count(1) as c
from (select explode(split(sentence,",")) as word from hive_wc) t
group by word ;
from (select explode(split(sentence,",")) as word from hive_wc) t
group by word ;
json:工作中常用
创建一张表 rating_json,上传数据,并查看前十行数据信息
create table rating_json(json string);
load data local inpath '/home/hadoop/data/rating.json' into table rating_json;
对json的数据进行处理
jason_tuple 是一个UDTF是 Hive0.7版本引进的
select json_tuple(json,"movie","rate","time","userid") as (movie,rate,time,userid)
from rating_json limit 10;
作业:
准备一份数据:
row format delimited fields terminated by ',';
load data local inpath '/home/hadoop/data/hive_row_number.txt' into table hive_rownumber;
[hadoop@ruozehadoop000 data]$ more hive_row_number.txt
1,18,ruoze,M
2,19,jepson,M
3,22,wangwu,F
4,16,zhaoliu,F
5,30,tianqi,M
6,26,wangba,F
创建表并导入数据:
create table hive_rownumber(id int,age int, name string, sex string)
创建表并导入数据:
row format delimited fields terminated by ',';
load data local inpath '/home/hadoop/data/hive_row_number.txt' into table hive_rownumber;
要求:查询出每种性别中年龄最大的2条数据
order by是全局的排序,做不到分组内的排序
组内进行排序,就要用到窗口函数or分析函数
组内进行排序,就要用到窗口函数or分析函数
分析函数
select id,age,name,sex
from
(select id,age,name,sex,
row_number() over(partition by sex order by age desc) as rank
from hive_rownumber) t
where rank<=2;
select id,age,name,sex
from
(select id,age,name,sex,
row_number() over(partition by sex order by age desc) as rank
from hive_rownumber) t
where rank<=2;
输出结果:
User-Defined Functions : UDF
UDF: 一进一出 upper lower substring
UDAF:Aggregation 多进一出 count max min sum ...
UDTF: Table-Generation 一进多出

User-Defined Functions : UDF
UDF: 一进一出 upper lower substring
UDAF:Aggregation 多进一出 count max min sum ...
UDTF: Table-Generation 一进多出
工具:IDEA + Maven
定义函数功能:输入xxx,输出Hello xxx
自定义UDF函数的步骤:
以下部分摘自学长博客,略作修改。
以下部分摘自学长博客,略作修改。
创建一个新的项目
填写相关信息:
修改参数路径:
填写项目名称和保存路径:
更新完后删除: