FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask

This post describes a MapJoin out-of-memory error that occurred while running a Hive SQL statement and includes the full error log. The analysis points to two likely causes, inaccurate memory estimation for the map join and a bug in certain Hadoop versions, and lists three workarounds.


FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask

Solution

If you are hitting return code 3:

hive (default)> set hive.auto.convert.join=false;
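
Setting hive.auto.convert.join=false stops Hive from rewriting common joins into local map joins, so the failing local hash-table build is skipped and the join runs as an ordinary reduce-side join: slower, but no longer limited by the local task's heap. A minimal session-level sketch (the echoed default value is only illustrative; your cluster may differ):

hive (default)> set hive.auto.convert.join;
hive.auto.convert.join=true
hive (default)> set hive.auto.convert.join=false;

Running set <property>; without a value prints the current setting, which is a quick way to verify the change. Once the query has finished, you can set the property back to true so that other queries keep benefiting from map joins.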

Error details

create table tmp.tmp_company_min as
select t1.companyid,
       nvl(t2.dtime, t3.dtime) dtime,
       nvl(max(t2.date), max(t3.date)) date,
       nvl(max(t2.hour), max(t3.hour)) hour,
       nvl(max(t2.minute), max(t3.minute)) minute,
       avg(t3.rad_s) rad_s,
       sum(t2.acp) acp,
       sum(t2.shd_p) shd_p,
       avg(t2.acp_dr) acp_dr
from dim.dim_power_station t1
left outer join tmp.tmp_station_min t2
  on t1.id = t2.stid
left outer join (
  select date, dtime, stid, rad_s, hour, minute
  from gdm.gdm_weather_minute t2
  where date >= '20201122' and date <= '20201122'
) t3
  on t1.id = t3.stid and t2.dtime = t3.dtime
group by t1.companyid, nvl(t2.dtime, t3.dtime);
Automatically selecting local only mode for query
Query ID = root_20201123155736_8e0abf49-858d-45af-9f7e-bd00ed7eeda8
Total jobs = 2
Stage-4 is selected by condition resolver.
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2020-11-23 15:57:39,801 Stage-4 map = 0%,  reduce = 0%
2020-11-23 15:57:43,835 Stage-4 map = 100%,  reduce = 0%
2020-11-23 15:57:48,848 Stage-4 map = 100%,  reduce = 100%
Ended Job = job_local2034268875_0001
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/spark/lib/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Execution log at: /tmp/root/root_20201123155736_8e0abf49-858d-45af-9f7e-bd00ed7eeda8.log
2020-11-23 15:57:52	Starting to launch local task to process map join;	maximum memory = 477626368
2020-11-23 15:57:53	Processing rows:	200000	Hashtable size:	199999	Memory usage:	120281336	percentage:	0.252
2020-11-23 15:57:54	Processing rows:	300000	Hashtable size:	299999	Memory usage:	144693416	percentage:	0.303
2020-11-23 15:57:54	Processing rows:	400000	Hashtable size:	399999	Memory usage:	196798792	percentage:	0.412
2020-11-23 15:57:55	Processing rows:	500000	Hashtable size:	499999	Memory usage:	242188200	percentage:	0.507
2020-11-23 15:57:57	Processing rows:	600000	Hashtable size:	599999	Memory usage:	284774512	percentage:	0.596
Execution failed with exit status: 3
Obtaining error information

Task failed!
Task ID:
  Stage-9

Logs:

/var/log/hive/hive.log
FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
MapReduce Jobs Launched: 
Stage-Stage-4:  HDFS Read: 210722870 HDFS Write: 74300946 SUCCESS
Total MapReduce CPU Time Spent: 0 msec

Error explanation

Official Hive explanation:

Hive converted a join into a locally running and faster 'mapjoin', but ran out of memory while doing so. There are two bugs responsible for this.

Bug 1)

Hive's metric for converting joins miscalculates the required amount of memory. This is especially true for compressed files and ORC files, because Hive uses the file size on disk as the metric, but compressed tables require considerably more memory in their uncompressed in-memory representation.

You can simply decrease 'hive.smalltable.filesize' to tune the metric, or increase 'hive.mapred.local.mem' to allow more memory to be allocated to the local task.

The latter option may run into bug number two if you happen to have an affected Hadoop version.
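
If you would rather keep the map-join optimization, both knobs can be adjusted per session before running the query. A hedged sketch: the values below are purely illustrative and need to match your table sizes and the heap actually available to the local task; also note that newer Hive releases expose the threshold as hive.mapjoin.smalltable.filesize, while 'hive.smalltable.filesize' is the older name.

-- illustrative values, adjust to your cluster
set hive.mapjoin.smalltable.filesize=1000000;   -- ~1 MB threshold in bytes; older Hive: hive.smalltable.filesize
set hive.mapred.local.mem=1024;                 -- heap for the local task, assumed to be in MB; may be ignored, see Bug 2 below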

Bug 2)

Hive/Hadoop ignores 'hive.mapred.local.mem'! (More precisely: a bug in Hadoop 2.2 where hadoop-env.cmd sets the -Xmx parameter multiple times, effectively overriding the user-set hive.mapred.local.mem value; see https://issues.apache.org/jira/browse/HADOOP-10245.)

There are 3 workarounds for this bug:
	1) Assign more memory to the local Hadoop JVM client (note: this is not mapred.map.memory), because the map-join child JVM inherits the parent JVM's settings.
In the Cloudera Manager home page, click on the "hive" service,
then on the Hive service page click on "Configuration":
Gateway base group --(expand)--> Resource Management -> Client Java Heap Size in Bytes -> 1 GB
	2) Reduce "hive.smalltable.filesize" to ~1 MB or below (the right value depends on your cluster's settings for the local JVM).
	3) Turn off "hive.auto.convert.join" to prevent Hive from converting the joins into a map join.
	2) & 3) can be set in Big-Bench/engines/hive/conf/hiveSettings.sql, for example as sketched below.