UDF
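UDF: a user-defined function. It takes one or more columns from a single row as its arguments and returns a single value. Built-in examples of this kind of function are round() and floor().
Example: check whether two comma-separated strings contain the same values.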
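The "one row in, one value out" contract can be illustrated without Hive at all. The sketch below is a stand-in, not Hive code: `floorColumn` is a hypothetical scalar function in the spirit of floor(), and `applyPerRow` only simulates how a query applies such a function once per row. The result has exactly one output value for each input row.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in illustration with no Hive dependency: a UDF maps the columns of ONE row to ONE value.
class RowWiseSketch {
    // Hypothetical scalar function modeled on floor(): one column value in, one value out.
    static long floorColumn(double x) {
        return (long) Math.floor(x);
    }

    // Simulates a query calling the scalar function once per row: N rows in, N values out.
    static List<Long> applyPerRow(double[] column) {
        List<Long> out = new ArrayList<>();
        for (double x : column) {
            out.add(floorColumn(x));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(applyPerRow(new double[] {1.7, -1.2, 3.0})); // prints [1, -2, 3]
    }
}
```

Aggregate functions such as sum() or count() behave differently: they collapse many rows into one value. That case is handled by UDAFs, not by UDF.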
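The Java class must extend the UDF class and implement an evaluate() method. During a query, the class is instantiated once for every place in the query where the function is used, and evaluate() is called for every input row. You can also overload evaluate().
Enough talk, here is the code: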
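Overloading evaluate() lets a single function name accept different argument lists. Hive chooses which overload to call based on the argument types at the call site, using its method resolver. The sketch below is self-contained so it compiles without Hive on the classpath. A real UDF would extend `org.apache.hadoop.hive.ql.exec.UDF`. The class name `FieldCompareSketch` and its comma default in the two-argument overload are hypothetical choices made only for this illustration. The loop in `main` simply imitates "one instance, one evaluate() call per row".

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Stand-in sketch: a real Hive UDF would extend org.apache.hadoop.hive.ql.exec.UDF.
// The base class is omitted here so the example compiles without Hive.
class FieldCompareSketch {
    // Two-argument overload: assumes a comma delimiter (hypothetical default for illustration).
    public int evaluate(String a, String b) {
        return evaluate(a, b, ",");
    }

    // Three-argument overload: the caller supplies the delimiter.
    public int evaluate(String a, String b, String split) {
        if (a == null || b == null || split == null) {
            return 0; // treat SQL NULL input as "not equal"
        }
        // Pattern.quote makes the delimiter literal, so "|" is not read as a regex.
        String[] x = a.split(Pattern.quote(split));
        String[] y = b.split(Pattern.quote(split));
        // Arrays.equals compares lengths and then each element with equals().
        return Arrays.equals(x, y) ? 1 : 0;
    }

    public static void main(String[] args) {
        // Imitates one instance per call site, with evaluate() invoked once per row.
        FieldCompareSketch udf = new FieldCompareSketch();
        String[][] rows = { {"1,2,3", "1,2,3"}, {"1,2", "1,3"} };
        for (String[] row : rows) {
            System.out.println(udf.evaluate(row[0], row[1])); // prints 1, then 0
        }
        System.out.println(udf.evaluate("a|b", "a|b", "|")); // prints 1
    }
}
```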
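import java.util.regex.Pattern;
import org.apache.hadoop.hive.ql.exec.UDF;
public class UDFTest extends UDF {
// Split value on the delimiter taken literally (Pattern.quote stops "|" or "." from acting as regex metacharacters)
private String[] splitFields(String value, String split) {
return value.split(Pattern.quote(split));
}
/**
* Checks whether two fields, each split on the given delimiter, hold the same values in the same order.
*
* @param aids the first field
* @param bids the second field
* @param split the delimiter
* @return 1 if the two values are the same, 0 if they differ
*/
public int evaluate(String aids, String bids, String split) {
// Hive passes SQL NULL as a Java null; treat it as "different" instead of throwing a NullPointerException
if (aids == null || bids == null || split == null) {
return 0;
}
String[] values = splitFields(aids, split);
String[] values1 = splitFields(bids, split);
if (values.length != values1.length) {
return 0;
}
for (int i = 0; i < values.length; i++) {
// Compare string contents with equals(); == only compares object references
if (!values[i].equals(values1[i])) {
return 0;
}
}
return 1;
}
}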
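A common pitfall in this kind of comparison is using `==` on Java strings. `==` checks whether two references point to the same object. `equals()` checks whether the two strings contain the same characters. Strings built at runtime, for example the pieces returned by split(), are generally distinct objects even when their text matches, so `==` can report "different" for identical values. The minimal sketch below demonstrates the difference using `new String(...)`, which always creates a new object. The helper names are illustrative only.

```java
// Illustrative helpers contrasting reference comparison with content comparison.
class StringCompareSketch {
    // True only if both variables refer to the very same String object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // True if both strings contain the same sequence of characters.
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String literal = "abc";
        String built = new String("abc"); // "new" always creates a distinct object
        System.out.println(sameReference(literal, built)); // prints false
        System.out.println(sameContent(literal, built));   // prints true
    }
}
```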
Source code of the UDF class:
package org.apache.hadoop.hive.ql.exec;
import org.apache.hadoop.hive.ql.udf.UDFType;
/**
* A User-defined function (UDF) for use with Hive.
* <p>
* New UDF classes need to inherit from this UDF class (or from {@link
* org.apache.hadoop.hive.ql.udf.generic.GenericUDF GenericUDF} which provides more flexibility at
* the cost of more complexity).
* <p>
* Requirements for all classes extending this UDF are:
* <ul>
* <li>Implement one or more methods named {@code evaluate} which will be called by Hive (the exact
* way in which Hive resolves the method to call can be configured by setting a custom {@link
* UDFMethodResolver}). The following are some examples:
* <ul>
* <li>{@code public int evaluate();}</li>
* <li>{@code public int evaluate(int a);}</li>
* <li>{@code public double evaluate(int a, double b);}</li>
* <li>{@code public String evaluate(String a, int b, Text c);}</li>
* <li>{@code public Text evaluate(String a);}</li>
* <li>{@code public String evaluate(List<Integer> a);} (Note that Hive Arrays are represented as
* {@link java.util.List Lists} in Hive.
* So an {@code ARRAY<int>} column would be passed in as a {@code List<Integer>}.)</li>
* </ul>
* </li>
* <li>{@code evaluate} should never be a void method. However it can return {@code null} if
* needed.
* <li>Return types as well as method arguments can be either Java primitives or the corresponding
* {@link org.apache.hadoop.io.Writable Writable} class.</li&