Example from the official docs:
Example 1: a custom function that converts uppercase to lowercase
Step 1:
Create a Maven project in IDEA and add the dependencies to the pom:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>ramos</groupId>
    <artifactId>hive-udf-test</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>
    <name>hive-udf-test</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <hadoop.version>2.6.0</hadoop.version>
        <hive.version>1.1.0</hive.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>${hive.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>${hive.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.7</source>
                    <target>1.7</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
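Note: hive-exec and hadoop-common are already on the cluster's classpath at runtime, so an optional tweak (not required for this example) is to mark them as provided, which keeps the packaged jar small, e.g.:

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>${hive.version}</version>
    <scope>provided</scope>
</dependency>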
Code:
package ramos.hive_udf_test;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * LowerUDF
 * evaluate() must not return void; returning null is allowed.
 */
public class LowerUDF extends UDF {

    public Text evaluate(Text str) {
        // validate: check the reference itself, since calling toString()
        // on a null Text would throw a NullPointerException
        if (str == null) {
            return null;
        }
        // lower-case the input
        return new Text(str.toString().toLowerCase());
    }

    public static void main(String[] args) {
        // quick local smoke test: should print "hive"
        System.out.println(new LowerUDF().evaluate(new Text("HIVE")));
    }
}
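Since the pom already declares junit 3.8.1 in test scope, the evaluate() logic can be exercised before packaging. A minimal JUnit 3-style sketch (the class name and its location under the standard src/test/java layout are assumptions):

package ramos.hive_udf_test;

import junit.framework.TestCase;
import org.apache.hadoop.io.Text;

public class LowerUDFTest extends TestCase {

    // "HIVE" should come back lower-cased
    public void testLowerCasesInput() {
        assertEquals(new Text("hive"), new LowerUDF().evaluate(new Text("HIVE")));
    }

    // a null input should yield null rather than an exception
    public void testNullInputReturnsNull() {
        assertNull(new LowerUDF().evaluate(null));
    }
}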
Package the jar:
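With the standard Maven layout, building from the project root produces the jar under target/ (the exact command is assumed; any equivalent Maven invocation works):

mvn clean package
# -> target/hive-udf-test-0.0.1-SNAPSHOT.jar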
Upload the jar to the Linux host:
[root@sparkproject1 hiveTestJar]# ll
total 4
-rw-r--r-- 1 root root 3092 Jun 9 2019 hive-udf-test-0.0.1-SNAPSHOT.jar
[root@sparkproject1 hiveTestJar]# pwd
/usr/local/hive/hiveTestJar
[root@sparkproject1 hiveTestJar]#
Add the jar in Hive and create a temporary function:
add jar /usr/local/hive/hiveTestJar/hive-udf-test-0.0.1-SNAPSHOT.jar;
create temporary function my_lower as "ramos.hive_udf_test.LowerUDF";
hive> add jar /usr/local/hive/hiveTestJar/hive-udf-test-0.0.1-SNAPSHOT.jar;
Usage: add [FILE|JAR|ARCHIVE] <value> [<value>]*
Query returned non-zero code: 1, cause: null
hive>
As the session above shows, ADD JAR failed for me when given the local Linux path; after uploading the jar to HDFS and using the HDFS path instead, it worked.
Reference: https://blog.youkuaiyun.com/u011495642/article/details/84327256
hive> dfs -put /usr/local/hive/hiveTestJar/hive-udf-test-0.0.1-SNAPSHOT.jar /user/hive/warehouse/hive-udf-test-0.0.1-SNAPSHOT.jar;
hive>
Upload the jar file to HDFS (the local path kept failing for me; make sure the target directory is correct!!):
dfs -put /usr/local/hive/hiveTestJar/hive-udf-test-0.0.1-SNAPSHOT.jar /user/hive/warehouse/hive-udf-test-0.0.1-SNAPSHOT.jar;
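To double-check that the jar really landed in the right place, you can list the directory from the Hive CLI (path taken from the dfs -put above):

hive> dfs -ls /user/hive/warehouse/;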
Create and register the function (permanently, from the HDFS jar):
Example 1:
create function my_lower as 'ramos.hive_udf_test.LowerUDF' using jar 'hdfs:///user/hive/warehouse/hive-udf-test-0.0.1-SNAPSHOT.jar';
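After registration, a quick sanity check is to run the function on a literal (expected output follows from the LowerUDF logic; SELECT without FROM is supported on Hive 0.13 and later):

hive> select my_lower('HIVE');
OK
hive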
Example 2:
hive>
>
> create function hive2kafka as 'ramos.hive_udf_test.Hive2Kafka'