Although the abstract class GenericUDTF in Hive 2.x and Hive 3.x only requires a UDTF implementation to provide the process() and close() methods, actually running the UDTF from Hive or Spark fails with an error like the following:
>>> [2025-02-17 21:21:16][INFO ][SparkServerlessJobLauncher]: JobRun state changed, To Failed
Process Output>>> Exception in thread "main" org.apache.spark.sql.AnalysisException: No handler for UDF/UDAF/UDTF 'com.alibaba.dt.hive.udtf.StringSplitAnn': java.lang.IllegalStateException: Should not be called directly; line 15 pos 0 at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.initialize(GenericUDTF.java:72)
Process Output>>> at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.initialize(GenericUDTF.java:56)
Process Output>>> at org.apache.spark.sql.hive.HiveGenericUDTF.outputInspector$lzycompute(hiveUDFs.scala:235)
Process Output>>> at org.apache.spark.sql.hive.HiveGenericUDTF.outputInspector(hiveUDFs.scala:235)
Process Output>>> at org.apache.spark.sql.hive.HiveGenericUDTF.elementSchema$lzycompute(hiveUDFs.scala:243)
Process Output>>> at org.apache.spark.sql.hive.HiveGenericUDTF.elementSchema(hiveUDFs.scala:243)
Process Output>>> at org.apache.spark.sql.hive.HiveUDFExpressionBuilder$.makeHiveFunctionExpression(HiveSessionStateBuilder.scala:195)
Process Output>>> at org.apache.spark.sql.hive.HiveUDFExpressionBuilder$.$anonfun$makeExpression$1(HiveSessionStateBuilder.scala:161)
Process Output>>> at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:240)
Process Output>>> at org.apache.spark.sql.hive.HiveUDFExpressionBuilder$.makeExpression(HiveSessionStateBuilder.scala:155)
Process Output>>> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.$anonfun$makeFunctionBuilder$1(SessionCatalog.scala:1506)
Process Output>>> at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistryBase.lookupFunction(FunctionRegistry.scala:233)
Process Output>>> at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistryBase.lookupFunction$(FunctionRegistry.scala:227)
Process Output>>> at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:299)
So the stack trace shows that initialize() does get called at runtime, and, rather amusingly, the call lands in a @Deprecated default implementation of initialize() that just throws "Should not be called directly".
Fundamentally, initialize() serves two purposes: (1) it validates the input arguments, and (2) it declares the type and number of the output columns. So we have to implement it ourselves after all 🤣
// initialize() must be implemented to declare the output schema
// (the abstract class does not force you to override it, but Hive/Spark call it at runtime)
@Override
public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
    // The UDTF takes exactly two arguments; check the argument count first
    if (argOIs.length != 2) {
        throw new UDFArgumentException("ExplodeStringUDTF takes exactly two arguments.");
    }
    // Approach 1: check each argument's primitive category, e.g. STRING
    PrimitiveObjectInspector.PrimitiveCategory arg1 = ((PrimitiveObjectInspector) argOIs[0]).getPrimitiveCategory();
    PrimitiveObjectInspector.PrimitiveCategory arg2 = ((PrimitiveObjectInspector) argOIs[1]).getPrimitiveCategory();
    if (!arg1.equals(PrimitiveObjectInspector.PrimitiveCategory.STRING)) {
        throw new UDFArgumentTypeException(0, "\"string\" expected at function ExplodeStringUDTF, but \""
                + arg1.name() + "\" is found\n" + argOIs[0].getClass().getSimpleName());
    }
    if (!arg2.equals(PrimitiveObjectInspector.PrimitiveCategory.STRING)) {
        throw new UDFArgumentTypeException(1, "\"string\" expected at function ExplodeStringUDTF, but \""
                + arg2.name() + "\" is found\n" + argOIs[1].getClass().getSimpleName());
    }
    // Define the output schema: a single string column named "info"
    ArrayList<String> fieldNames = new ArrayList<String>();
    ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
    fieldNames.add("info");
    fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
    return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
}
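
For completeness, here is a minimal sketch of the other two required methods. It assumes a UDTF that splits its first argument on the separator given by the second argument and emits one row per token; the class name ExplodeStringUDTF, the field layout, and the splitting logic are illustrative, not the original StringSplitAnn implementation.

import java.util.ArrayList;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class ExplodeStringUDTF extends GenericUDTF {

    // Reusable one-column buffer for the output row
    private final Object[] forwardObj = new Object[1];

    // initialize() as implemented above ...

    @Override
    public void process(Object[] args) throws HiveException {
        // Skip null inputs in this sketch; a production version would decide
        // whether to emit nothing or forward a null row
        if (args[0] == null || args[1] == null) {
            return;
        }
        // toString() is a simplification that works for the standard Java string
        // inspector; a fully general version would extract the values through the
        // ObjectInspectors received in initialize()
        String input = args[0].toString();
        String separator = args[1].toString(); // treated as a regex by String.split
        // Emit one output row per token
        for (String token : input.split(separator)) {
            forwardObj[0] = token;
            forward(forwardObj);
        }
    }

    @Override
    public void close() throws HiveException {
        // Nothing buffered to flush and no resources to release in this example
    }
}

After adding the jar, register the function with CREATE TEMPORARY FUNCTION and call it through LATERAL VIEW like any built-in UDTF; with initialize() implemented, both Hive and Spark can resolve the output schema instead of falling into the deprecated default.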