The meaning of set hive.fetch.task.conversion

This article describes how to speed up queries in Hive by setting the hive.fetch.task.conversion property to minimal or more. Both modes let Hive skip launching an unnecessary MapReduce job for simple queries, which makes them noticeably faster.

When we run Hive statements, even a simple query is usually compiled into a MapReduce (MR) job that executes in the background. But sometimes we only want to pull out a small slice of data. Does merely reading data really need to go through MR? That wastes both time and memory, so Hive provides the following configuration property:
  
      This property supports two conversion modes, minimal and more.
      In minimal mode, only three kinds of operations are converted: select *, filters on partition columns, and limit. In these cases Hive fetches the data directly from the table's files instead of running an MR job, so the query returns faster.
      In more mode, any select (including column projections), filters, and limit are all executed as direct data fetches without an MR application, so more generally covers more cases than minimal.
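The modes above can be switched at the session level with set. A quick sketch of the three allowed values (the default was minimal in older Hive releases; note that newer releases default to more, so check your own installation):

```sql
-- Show the current value of the property
set hive.fetch.task.conversion;

-- none:    always compile queries into MapReduce jobs (no fetch conversion)
set hive.fetch.task.conversion=none;

-- minimal: fetch only for SELECT *, filters on partition columns, and LIMIT
set hive.fetch.task.conversion=minimal;

-- more:    fetch for any SELECT (column projection), filter, and LIMIT
set hive.fetch.task.conversion=more;
```

Fetch conversion only applies to single-source queries; anything with aggregations, joins, or subqueries still compiles to MR regardless of this setting.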

      Let's look at a concrete demonstration:
      
      set hive.fetch.task.conversion=minimal;   -- the default in this setup
      select * from emp;                        -- SELECT *: data is fetched directly, no MR job

      select empno from emp;                    -- column projection: launches an MR application

      set hive.fetch.task.conversion=more;
      select empno from emp;                    -- now fetched directly, no MR job


     As you can see, in this case the select query is executed as a data fetch instead of an MR application.
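Keep in mind that even in more mode, only projection, filtering, and limit qualify for fetch conversion; anything that requires a shuffle, such as group by, distinct, or a join, still launches an MR job. A small sketch against the emp table used above (the deptno column is an assumption borrowed from the classic emp schema):

```sql
set hive.fetch.task.conversion=more;

-- Projection + filter + limit: executed as a fetch task, no MR job launched
select empno from emp where deptno = 10 limit 5;

-- The aggregation requires a shuffle, so this still launches an MR job
select deptno, count(*) from emp group by deptno;
```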