UDF + PYTHON
Problem 1: the job fails with: Cannot run program "python": error=2, No such file or directory
Fix: add the Python environment variables to hadoop-env.sh, yarn-env.sh, and hive-env.sh:
export PYTHON_HOME=/opt/anaconda3
export PATH=$PATH:$PYTHON_HOME/bin
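The error above means the process that launches the script cannot find a `python` executable on its PATH. A minimal sketch for checking this on a worker node is shown here; the helper names `find_python` and `with_python_home` are hypothetical and only mirror the export lines above.

```python
import os
import shutil


def find_python(path=None):
    """Return the absolute path of the first `python` on PATH, or None if absent."""
    return shutil.which("python", path=path)


def with_python_home(path, python_home):
    """Append <python_home>/bin to a PATH-style string, like the export above."""
    return path + os.pathsep + os.path.join(python_home, "bin")


# Report what the current environment resolves `python` to.
print("python resolves to:", find_python())
# Show the PATH value the export would produce for an assumed starting PATH.
print(with_python_home("/usr/bin", "/opt/anaconda3"))
```

If the first line prints None under the account that runs YARN containers, the environment files have probably not been picked up yet (the services usually need a restart after editing them).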
Create the table:
CREATE TABLE test_rule
(
    dczc    string,
    bizhong string,
    zq_type string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'hdfs://192.168.0.1:9000/user/hive/warehouse/qqbb.db/test_rule';
Load the data into the Hive table:
load data local inpath '/home/hue/hive_udf/data.dat' into table qqbb.test_rule;
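Function definition (python_udf.py):
# -*- coding: utf-8 -*-
import sys

# Hive's TRANSFORM sends each row to stdin as tab-separated columns.
for line in sys.stdin:
    detail = line.strip().split("\t")
    if len(detail) != 3:
        # Skip rows that do not have exactly three columns.
        continue
    dczc, bizhong, zq_type = detail
    # '存管账户活期存款' is the data value for "custodial account demand deposit".
    if dczc == '存管账户活期存款' and bizhong == 'CNY':
        print('\t'.join([dczc, bizhong, zq_type, '01010101']))
    else:
        # Emits a single column containing the literal string 'null'.
        print('null')
    # Commented-out branches, apparently left over from an ID-card example:
    # elif len(idcard) == 18:
    #     if int(idcard[-2]) % 2 == 0:
    #         print("\t".join([name, idcard, "female"]))
    #     else:
    #         print("\t".join([name, idcard, "male"]))
    # else:
    #     print("\t".join([name, idcard, "invalid ID information!"]))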
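The script above can also be written as a pure function, which makes the rule easier to test without a Hive session. The following is a sketch of one possible variant, not the original author's code: the names `apply_rule`, `transform`, and `NULL` are hypothetical, and instead of printing the string 'null' it keeps the three input columns and writes `\N` in the result column, which Hive's default TRANSFORM text format reads back as NULL.

```python
# -*- coding: utf-8 -*-
NULL = "\\N"  # the two characters backslash + N, Hive's default NULL marker


def apply_rule(line):
    """Map one tab-separated input row to a four-column output row.

    Returns None for rows that do not have exactly three columns.
    """
    detail = line.rstrip("\n").split("\t")
    if len(detail) != 3:
        return None
    dczc, bizhong, zq_type = detail
    if dczc == "存管账户活期存款" and bizhong == "CNY":
        res = "01010101"
    else:
        res = NULL  # no rule matched: leave the result column NULL
    return "\t".join([dczc, bizhong, zq_type, res])


def transform(lines):
    """Yield output rows for every well-formed input row."""
    for line in lines:
        out = apply_rule(line)
        if out is not None:
            yield out


# Hypothetical sample rows, in the tab-separated form Hive would send.
sample = ["存管账户活期存款\tCNY\tA\n", "other\tUSD\tB\n", "bad row\n"]
for row in transform(sample):
    print(row)
```

To use this as the actual script, the last three lines would be replaced by a loop over `transform(sys.stdin)`. Keeping the input columns in unmatched rows means every output row has the four columns named in the AS clause.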
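Test the function locally (data.dat is comma-delimited to match the table definition, while the script expects the tab-separated rows that Hive sends, so convert the delimiter first):
cat data.dat | tr ',' '\t' | python python_udf.py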
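One thing to watch in a local test: the table is declared with ',' as the field delimiter, but TRANSFORM hands rows to the script separated by tabs, so raw lines from data.dat will not split into three columns. A small sketch of the conversion follows; `to_hive_input` and the sample rows are hypothetical, and it assumes fields contain no embedded commas.

```python
def to_hive_input(csv_line, field_delim=","):
    """Convert one delimited row from data.dat into the tab-separated
    form the script receives from Hive."""
    fields = csv_line.rstrip("\n").split(field_delim)
    return "\t".join(fields) + "\n"


# Hypothetical rows shaped like the three-column table.
sample_rows = ["存管账户活期存款,CNY,A\n", "other,USD,B\n"]
hive_rows = [to_hive_input(r) for r in sample_rows]
for r in hive_rows:
    print(repr(r))
```

Feeding `hive_rows` to the script's loop instead of the raw file reproduces locally what the script sees on the cluster.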
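Use the function (the script must be visible to Hive, for example after running ADD FILE on python_udf.py in the same session):
select transform(dczc, bizhong, zq_type) using 'python python_udf.py' as (dczc, bizhong, zq_type, res) from test_rule;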