Hive UDF Custom Functions: Error Analysis


License: CC 4.0 BY-SA


I. The following error is reported

-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"bike"},"value":{"_col0":3,"_col1":10.23}}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265)

1. In the Hive shell, load the custom Python script into Hive:

add file hdfs:///user/hive/lib/udf1.py;

2. Check that the using 'udf1.py' clause is correct; the script name must match the file added in step 1:

select transform(id,vtype,price) using 'udf1.py' as (vtype string,mean float,var float) from (select * from test cluster by vtype) as temp_table;

II. The following error is reported

Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: [Error 20003]: An error occurred when trying to close the Operator running your custom script.
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)

1. This error almost always means the script itself has a problem, so debug the script.

For example:

#!/usr/bin/python
# _*_ coding:utf-8 _*_
import sys
from itertools import groupby
from operator import itemgetter
import pandas as pd

sep = '\t'

def read_input(input_data):
    # Yield each non-empty input line as a list of tab-separated fields.
    for line in input_data:
        line = line.strip()
        if line == "":
            continue
        yield line.split(sep)

def main():
    data = read_input(sys.stdin)
    # Rows arrive clustered by vtype, so groupby sees each vtype as one run.
    for vtype, group in groupby(data, itemgetter(1)):
        # Convert price to float here. Without the conversion,
        # df['price'].mean() fails and Hive raises the error above.
        group = [(int(rowid), vtype, float(price)) for rowid, vtype, price in group]
        df = pd.DataFrame(group, columns=('id', 'vtype', 'price'))
        output = [vtype, df['price'].sum(), df['price'].mean()]
        # print(len(group))
        print(sep.join(str(o) for o in output))

if __name__ == "__main__":
    main()
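
One way to debug a script like this outside Hive is to run its grouping logic on a few hand-written rows before submitting the query. The sketch below is not the original script: it assumes Python 3, drops pandas in favor of the standard library, and uses hypothetical helper names (parse_rows, transform, run) and made-up sample rows. It also writes any traceback to stderr, which should make the real cause easier to find than the generic Error 20003 message.

```python
import sys
import traceback
from itertools import groupby
from operator import itemgetter

SEP = "\t"


def parse_rows(lines):
    """Yield (rowid, vtype, price) tuples, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rowid, vtype, price = line.split(SEP)
        # Explicit type conversion: a bad value fails here with a clear error.
        yield int(rowid), vtype, float(price)


def transform(lines):
    """Return one 'vtype<TAB>sum<TAB>mean' line per consecutive vtype group."""
    out = []
    for vtype, group in groupby(parse_rows(lines), key=itemgetter(1)):
        prices = [price for _, _, price in group]
        total = sum(prices)
        out.append(SEP.join([vtype, str(total), str(total / len(prices))]))
    return out


def run(lines, stdout=sys.stdout, stderr=sys.stderr):
    """Run the transform; on failure, write the traceback to stderr and return 1."""
    try:
        for row in transform(lines):
            stdout.write(row + "\n")
        return 0
    except Exception:
        traceback.print_exc(file=stderr)
        return 1


if __name__ == "__main__":
    # Hypothetical sample rows, already ordered by vtype as "cluster by" would do.
    sample = ["1\tbike\t10.0", "2\tbike\t20.0", "3\tcar\t5.5"]
    run(sample)
```

Replacing the sample list with sys.stdin would let the same file be piped real rows from a shell, for instance the output of a small select on the test table, so that bad rows surface before the job reaches the reducers.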