UDF example

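This post walks through a practical example of a user-defined function (UDF) in Hive: a Java UDF class that filters and cleans film news records, followed by the Hive query that registers the function and calls it.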


UDF class

package com.test.film;

import org.apache.hadoop.hive.ql.exec.UDF;

/*
 * Purpose: build a film news record
 */
public class GetFilmNews extends UDF {

	public GetFilmNews() {

	}

	public String evaluate(String id, String name, String title, String author,
			String publish_time_string, String release_date,
			String release_status, String url, String summary, String body) {
		
		StringBuilder line_result_sb = new StringBuilder();
		
		// url
		if(null == url || url.trim().equals("") || url.equals("-")){
			return "";
		}
		
		// title
		if(null == title || title.trim().equals("") || title.equals("-")){
			return "";
		}
		
		title = title.trim();
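		// Decode HTML entities in the title (entity names are assumed; the listing showed them already decoded)
		title = title.replaceAll("&middot;", "·");
		title = title.replaceAll("&quot;", "\"");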
		
		// summary
		if(null == summary || summary.trim().equals("")){
			summary = "";
		}
		summary = summary.trim();
		// Remove line breaks and decode the middle-dot entity (entity name assumed)
		summary = summary.replaceAll("\n|\r", "").replaceAll("&middot;", "·").trim();
		
		// body
		if(null == body || body.trim().equals("")){
			body = "";
		}
		body = body.trim();
		// Remove line breaks and decode the middle-dot entity (entity name assumed)
		body = body.replaceAll("\n|\r", "").replaceAll("&middot;", "·").trim();
	
		if(summary.equals("") && body.equals("")){
			return "";
		}
		
		// If body is empty but summary is not, use summary as body
		if(body.equals("") && !summary.equals("") ){
			body = summary;
		}
		
		if(body.equals("") || body.equals("-")){
			return "";
		}
		
		line_result_sb.append(id);
		line_result_sb.append("\t");
		line_result_sb.append(name);
		line_result_sb.append("\t");
		line_result_sb.append(title);
		line_result_sb.append("\t");
		line_result_sb.append(body);
		
		return line_result_sb.toString();
	}
}
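
The filtering rules in evaluate can be checked locally without a Hive installation. The following is a minimal sketch rather than the author's code: the class FilmNewsRules and its methods are hypothetical, it takes only the six arguments that evaluate actually uses, and it assumes the replacement patterns are the HTML entities &middot; and &quot;.

```java
// Hypothetical helper, not part of the original post: it mirrors the
// filtering rules of GetFilmNews.evaluate without depending on Hive.
class FilmNewsRules {

    // True when a field is null, blank, or the "-" placeholder.
    static boolean isMissing(String s) {
        return s == null || s.trim().isEmpty() || s.equals("-");
    }

    // Drops line breaks, decodes &middot; (assumed entity), and trims.
    static String clean(String s) {
        if (s == null) {
            return "";
        }
        return s.replaceAll("\n|\r", "").replace("&middot;", "·").trim();
    }

    // Returns "id\tname\ttitle\tbody", or "" when the row is rejected.
    static String buildLine(String id, String name, String title,
                            String url, String summary, String body) {
        if (isMissing(url) || isMissing(title)) {
            return "";
        }
        String cleanTitle = title.trim()
                .replace("&middot;", "·")
                .replace("&quot;", "\"");
        String cleanSummary = clean(summary);
        String cleanBody = clean(body);
        // Fall back to the summary when the body is empty.
        if (cleanBody.isEmpty()) {
            cleanBody = cleanSummary;
        }
        if (cleanBody.isEmpty() || cleanBody.equals("-")) {
            return "";
        }
        return String.join("\t", id, name, cleanTitle, cleanBody);
    }

    public static void main(String[] args) {
        // Body is empty, so the cleaned summary is used in its place.
        System.out.println(buildLine("1", "Film", "A &quot;B&quot;",
                "http://example.com/a", "line1\nline2", ""));
        // The "-" placeholder URL rejects the row; prints true.
        System.out.println(buildLine("2", "Film", "Title",
                "-", "summary", "body").isEmpty());
    }
}
```

Rejected rows come back as an empty string rather than null, so they still appear in the query output as blank lines.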

Hive SQL:

CLASSIFIER_JAR="/bigdata/Jar_GetDataFromHive/getdatafromhive-0.0.1-SNAPSHOT.jar"
hive -e "add jars $CLASSIFIER_JAR;
create temporary function getFilmNews as 'com.test.film.GetFilmNews';
set mapred.reduce.tasks=100;
set hive.map.aggr=true;
set mapred.job.priority=NORMAL;
use dmm;
select getFilmNews(t.id, t.name, t.title, t.author, t.publish_time_string, t.release_date, t.release_status, t.url, t.summary, t.body) 
from 
(select e.id as id, b.name as name, e.title as title, e.author as author, e.publish_time_string as publish_time_string, b.release_date as release_date, b.release_status as release_status, e.url as url, e.summary as summary, e.body as body 
from 
dmm.web_data e join dmm.movie_info b on (e.id=b.id) where e.media_type !=-1 and b.release_status != 2) t;
">./filmnews.txt

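The query writes one line per joined row to filmnews.txt, and the UDF returns an empty string for rejected rows, so the file can be expected to contain blank lines between records. Below is a rough sketch of reading it back. The class FilmNewsLineParser is hypothetical, and it assumes that id, name, and title contain no tab characters, so any extra tabs belong to the body.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reader for the query output; not part of the original post.
class FilmNewsLineParser {

    // Splits one line into {id, name, title, body}.
    // Returns null for blank lines (rejected rows) and malformed lines.
    static String[] parse(String line) {
        if (line == null || line.trim().isEmpty()) {
            return null;
        }
        // Limit of 4 keeps any tabs inside the body intact.
        String[] fields = line.split("\t", 4);
        return fields.length == 4 ? fields : null;
    }

    // Parses every line and keeps only the well-formed records.
    static List<String[]> parseAll(List<String> lines) {
        List<String[]> records = new ArrayList<>();
        for (String line : lines) {
            String[] fields = parse(line);
            if (fields != null) {
                records.add(fields);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Read the file given as the first argument, else use sample lines.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("1\tFilm\tTitle\tBody text", "", "2\tFilm\tTitle only");
        for (String[] r : parseAll(lines)) {
            System.out.println(r[0] + " | " + r[1] + " | " + r[2] + " | " + r[3]);
        }
    }
}
```

An alternative is to filter the empty results inside the query itself, which keeps the output file free of blank lines.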
