Newer Impala releases support UDFs, and a 1.2.3 cluster was deployed in the test environment.
Running a test UDF produced the following error:
java.lang.IllegalArgumentException (indicates that a method was passed an illegal or inappropriate argument)
This was confirmed to be a known bug:
https://issues.cloudera.org/browse/IMPALA-791
Per the JIRA: Impala 1.2.3 currently doesn't support String as the input and return types; you'll instead have to use Text or BytesWritable.
In other words, in Impala 1.2.3 a UDF's parameters and return value cannot be String -- use org.apache.hadoop.io.Text in its place.
The Text API documentation:
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/Text.html
The important parts:
Constructor:
Text(String string) -- construct from a string.
Methods:
String toString() -- convert the Text back to a String.
void set(String string) -- set to contain the contents of a string.
void set(Text other) -- copy another Text.
void clear() -- clear the contents.
Testing the Text class in Eclipse:
package com.hive.myudf;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.io.Text;

public class TextTest {
    private static Text schemal = new Text("http://");
    private static Text t = new Text("GET /vips-mobile/router.do?api_key=04e0dd9c76902b1bfc5c7b3bb4b1db92&app_version=1.8.7 HTTP/1.0");

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(.+?) +(.+?) (.+)");
        Matcher m = p.matcher(t.toString());
        if (m.matches()) {
            // string concatenation calls Text.toString() implicitly
            String tt = schemal + "test.test.com" + m.group(2);
            System.out.println(tt);
        } else {
            System.out.println("not match");
        }
        schemal.clear();
        t.clear();
    }
}
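The regex used above splits an nginx request line ("METHOD URI PROTOCOL") into its three fields; group(2) is the URI the UDF later prepends the scheme and host to. A standalone pure-JDK sketch of that step (class name and helper are illustrative, not part of the original code):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RequestLineDemo {
    // hypothetical helper mirroring the parsing step of the UDF
    static String uriOf(String requestLine) {
        Pattern p = Pattern.compile("(.+?) +(.+?) (.+)");
        Matcher m = p.matcher(requestLine);
        // group(1) = method, group(2) = URI, group(3) = protocol
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String line = "GET /vips-mobile/router.do?api_key=04e0dd9c76902b1bfc5c7b3bb4b1db92&app_version=1.8.7 HTTP/1.0";
        // prints the URI portion, i.e. everything between the first and last space
        System.out.println(uriOf(line));
    }
}
```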
Testing the UDF:
package com.hive.myudf;

import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;
import org.apache.log4j.Logger;

public class UDFNginxParseUrl extends UDF {
    private static final Logger LOG = Logger.getLogger(UDFNginxParseUrl.class);
    private static final Pattern P1 = Pattern.compile("(.+?) +(.+?) (.+)");
    private final Text schemal = new Text("http://");

    public Text evaluate(Text host1, Text urlStr, Text partToExtract) {
        LOG.debug("3args|args1:" + host1 + ",args2:" + urlStr + ",args3:" + partToExtract);
        if (host1 == null || urlStr == null || partToExtract == null) {
            return null;
        }
        Matcher m1 = P1.matcher(urlStr.toString());
        if (!m1.matches()) {
            return null;
        }
        String realUrl = schemal.toString() + host1.toString() + m1.group(2);
        LOG.debug("realurl:" + realUrl);
        URL url;
        try {
            url = new URL(realUrl);
        } catch (Exception e) {
            LOG.debug("malformed url:" + realUrl);
            return null;
        }
        // Text.equals(String) is always false -- compare the String values instead
        if ("HOST".equals(partToExtract.toString())) {
            String rt = url.getHost();
            LOG.debug("get host:" + rt);
            return new Text(rt);
        }
        return null;
    }
}
A few things to note:
1. Functions are associated with a specific database.
2. The jar file is stored in HDFS.
3. Function metadata is cached by the catalog service.
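The notes above translate directly into the deployment steps. A minimal sketch (database name, HDFS path, and jar name are all illustrative): first upload the jar, e.g. hdfs dfs -put udf.jar /user/impala/udfs/, then register the function in the database you query from:

```sql
-- Functions are tied to a database, so create it in the one you will use.
-- The jar must already be in HDFS; names below are illustrative.
USE mydb;
CREATE FUNCTION nginx_parse_url(string, string, string) RETURNS string
LOCATION '/user/impala/udfs/udf.jar'
SYMBOL='com.hive.myudf.UDFNginxParseUrl';
```

Because the catalog service caches function metadata, an updated jar generally requires dropping and re-creating the function (and, for changes made outside Impala, running INVALIDATE METADATA) before it takes effect.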
Reposted from: https://blog.51cto.com/caiguangguang/1359312