Functions
Built-in functions
cast(value AS TYPE)
cast('100' AS INT)
binary type ==> string
binary ==> string ==> int (binary cannot be cast to int directly; go through string first)
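The two-step rule can be illustrated with a plain Java analogue (this is not Hive's cast implementation, just the same idea: raw bytes carry no numeric meaning until they are decoded to text):

```java
import java.nio.charset.StandardCharsets;

public class CastSketch {

    // binary ==> string ==> int, as two explicit steps.
    static int binaryToInt(byte[] raw) {
        String s = new String(raw, StandardCharsets.UTF_8); // binary ==> string
        return Integer.parseInt(s);                         // string ==> int
    }

    public static void main(String[] args) {
        byte[] raw = "100".getBytes(StandardCharsets.UTF_8); // a "binary" value
        System.out.println(binaryToInt(raw)); // prints 100
    }
}
```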
User-Defined Functions (UDFs)
When the built-in functions cannot meet our business needs,
we write custom functions,
e.g. when migrating from DB2/Oracle to the cloud (Hadoop).
UDF: one-to-one, e.g. concat/lower (covers ~70% of cases)
UDAF: aggregation, many-to-one, e.g. count/max ...
UDTF: T = table, one-to-many, e.g. lateral view + explode
Developing a UDF
1) Extend the UDF class
2) Override the evaluate method
note:
1) evaluate must have a return value
2) use Hadoop types (e.g. Text, IntWritable) for the parameters
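The two steps above can be sketched as follows. In a real project the class would live in package com.ruozedata.udf, extend org.apache.hadoop.hive.ql.exec.UDF, and take/return Hadoop's Text type; plain String is used here so the sketch compiles without Hive on the classpath, and the "Hello:" output format is an assumption, not the course's actual implementation.

```java
// Minimal one-to-one UDF sketch (assumes the old-style Hive 1.x UDF API).
public class HelloUDF /* extends org.apache.hadoop.hive.ql.exec.UDF */ {

    // In Hive the method MUST be named evaluate and MUST return a value;
    // Hive locates it by reflection at query time.
    public String evaluate(String name) {
        if (name == null) {       // a NULL column arrives as null
            return null;
        }
        return "Hello:" + name;   // hypothetical output format
    }

    public static void main(String[] args) {
        HelloUDF udf = new HelloUDF();
        System.out.println(udf.evaluate("ruoze")); // prints "Hello:ruoze"
    }
}
```

Once packaged into a jar, the class is registered under a function name with the statements below.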
-- temporary function: visible only in the current session
create temporary function say_hello
as 'com.ruozedata.udf.HelloUDF';
-- permanent function: recorded in the metastore, jar loaded from HDFS
CREATE FUNCTION say_hello2
AS 'com.ruozedata.udf.HelloUDF'
USING JAR 'hdfs://hadoop000:8020/lib/hive-1.0.jar';
HiveServer2/jdbc
JOIN
inner join
left
right
full
...
CREATE TABLE ruoze_a (
id int,
name string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t";
LOAD DATA LOCAL INPATH "/home/hadoop/data/join_a.txt" OVERWRITE INTO TABLE ruoze_a;
CREATE TABLE ruoze_b (
id int,
age int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t";
LOAD DATA LOCAL INPATH "/home/hadoop/data/join_b.txt" OVERWRITE INTO TABLE ruoze_b;
CREATE TABLE ruoze_dept(
deptno int,
dname string,
loc string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t";
LOAD DATA LOCAL INPATH "/home/hadoop/data/dept.txt" OVERWRITE INTO TABLE ruoze_dept;
select
e.empno, e.ename, e.deptno, d.dname
from emp e join ruoze_dept d
on e.deptno = d.deptno ;
https://timgsa.baidu.com/timg?image&quality=80&size=b9999_10000&sec=1515321645252&di=99e33e03866ee26ca089db113d733ad6&imgtype=jpg&src=http%3A%2F%2Fimg4.imgtn.bdimg.com%2Fit%2Fu%3D1151729734%2C2550729243%26fm%3D214%26gp%3D0.jpg
Map join: cache the small table (Hadoop distributed cache), then read the big table on the map side. This diagram is the key point!!!
maximum memory = 477626368 (small-table memory limit printed by Hive's local map-join task)
Shuffle: rows with the same deptno are sent to the same reducer
emp: <deptno, (e.empno, e.ename)>
ruoze_dept: <deptno, (d.dname)>
==> (deptno, e.empno, e.ename, d.dname)
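The shuffle join above can be simulated in plain Java: every record from both tables is keyed by deptno (the shuffle), records sharing a key land in the same bucket (the reducer's input), and each bucket is crossed to produce the joined rows. The sample rows are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReduceJoinSketch {

    // emp rows are {deptno, empno, ename}; dept rows are {deptno, dname}.
    static List<String> join(String[][] emp, String[][] dept) {
        // "Shuffle": key both inputs by deptno, so records with the same
        // deptno end up in the same bucket ("reducer").
        Map<String, List<String[]>> empSide = new HashMap<>();
        Map<String, List<String[]>> deptSide = new HashMap<>();
        for (String[] e : emp) {
            empSide.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e);  // <deptno, (empno, ename)>
        }
        for (String[] d : dept) {
            deptSide.computeIfAbsent(d[0], k -> new ArrayList<>()).add(d); // <deptno, (dname)>
        }

        // "Reduce": within each bucket, cross emp records with dept records.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> bucket : empSide.entrySet()) {
            for (String[] e : bucket.getValue()) {
                for (String[] d : deptSide.getOrDefault(bucket.getKey(), List.of())) {
                    // (deptno, e.empno, e.ename, d.dname)
                    out.add(e[0] + "\t" + e[1] + "\t" + e[2] + "\t" + d[1]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] emp  = { {"10", "7839", "KING"}, {"20", "7566", "JONES"}, {"20", "7902", "FORD"} };
        String[][] dept = { {"10", "ACCOUNTING"}, {"20", "RESEARCH"}, {"30", "SALES"} };
        join(emp, dept).forEach(System.out::println);
    }
}
```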
Hive joins:
common join / reduce join / shuffle join (three names for the same reduce-side join)
map join
set hive.auto.convert.join = true;  -- automatically converts to a map join, so no reduce stage is used
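By contrast, a map join can be sketched like this: only the small table is loaded into an in-memory hash map (in Hive it is shipped to every mapper via the distributed cache), and the big table is streamed through in a single pass, with no shuffle and no reduce stage. The sample rows are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapJoinSketch {

    // emp rows are {deptno, empno, ename}; dept rows are {deptno, dname}.
    static List<String> join(String[][] emp, String[][] dept) {
        // Small table (dept) goes into memory: deptno -> dname.
        Map<String, String> dnameByDeptno = new HashMap<>();
        for (String[] d : dept) {
            dnameByDeptno.put(d[0], d[1]);
        }

        List<String> out = new ArrayList<>();
        for (String[] e : emp) {                     // one pass over the big table
            String dname = dnameByDeptno.get(e[0]);  // hash lookup instead of a shuffle
            if (dname != null) {                     // inner-join semantics
                out.add(e[0] + "\t" + e[1] + "\t" + e[2] + "\t" + dname);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] emp  = { {"10", "7839", "KING"}, {"20", "7566", "JONES"} };
        String[][] dept = { {"10", "ACCOUNTING"}, {"20", "RESEARCH"} };
        join(emp, dept).forEach(System.out::println);
        // prints:
        // 10	7839	KING	ACCOUNTING
        // 20	7566	JONES	RESEARCH
    }
}
```

This is why the small table must fit within the memory limit noted above: the whole table is materialized as a hash map on every mapper.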
Homework
1) Date Functions
2) Integrate HelloUDF into the Hive source code, then recompile and redeploy;
start hive and check with show functions whether the sayhello
function you registered shows up
3) Following the JDBC examples at https://cwiki.apache.org/confluence/display/Hive/HiveClient#HiveClient-JDBC
test the methods other than select on your own