FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
Solution
If you are hitting return code 3, turn off automatic map-join conversion:
hive (default)> set hive.auto.convert.join=false;
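The same fix can also be applied non-interactively; a minimal sketch, assuming the failing statement from "Error details" below has been saved to a file named query.sql (hypothetical name):
# pass the property for this run only; query.sql is a hypothetical file holding the CREATE TABLE ... AS SELECT
hive --hiveconf hive.auto.convert.join=false -f query.sql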
Error details
create table tmp.tmp_company_min as
select
    t1.companyid,
    nvl(t2.dtime, t3.dtime) dtime,
    nvl(max(t2.date), max(t3.date)) date,
    nvl(max(t2.hour), max(t3.hour)) hour,
    nvl(max(t2.minute), max(t3.minute)) minute,
    avg(t3.rad_s) rad_s,
    sum(t2.acp) acp,
    sum(t2.shd_p) shd_p,
    avg(t2.acp_dr) acp_dr
from dim.dim_power_station t1
left outer join tmp.tmp_station_min t2
    on t1.id = t2.stid
left outer join (
    select date, dtime, stid, rad_s, hour, minute
    from gdm.gdm_weather_minute t2
    where date >= '20201122' and date <= '20201122'
) t3
    on t1.id = t3.stid and t2.dtime = t3.dtime
group by t1.companyid, nvl(t2.dtime, t3.dtime);
Automatically selecting local only mode for query
Query ID = root_20201123155736_8e0abf49-858d-45af-9f7e-bd00ed7eeda8
Total jobs = 2
Stage-4 is selected by condition resolver.
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2020-11-23 15:57:39,801 Stage-4 map = 0%, reduce = 0%
2020-11-23 15:57:43,835 Stage-4 map = 100%, reduce = 0%
2020-11-23 15:57:48,848 Stage-4 map = 100%, reduce = 100%
Ended Job = job_local2034268875_0001
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/spark/lib/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Execution log at: /tmp/root/root_20201123155736_8e0abf49-858d-45af-9f7e-bd00ed7eeda8.log
2020-11-23 15:57:52 Starting to launch local task to process map join; maximum memory = 477626368
2020-11-23 15:57:53 Processing rows: 200000 Hashtable size: 199999 Memory usage: 120281336 percentage: 0.252
2020-11-23 15:57:54 Processing rows: 300000 Hashtable size: 299999 Memory usage: 144693416 percentage: 0.303
2020-11-23 15:57:54 Processing rows: 400000 Hashtable size: 399999 Memory usage: 196798792 percentage: 0.412
2020-11-23 15:57:55 Processing rows: 500000 Hashtable size: 499999 Memory usage: 242188200 percentage: 0.507
2020-11-23 15:57:57 Processing rows: 600000 Hashtable size: 599999 Memory usage: 284774512 percentage: 0.596
Execution failed with exit status: 3
Obtaining error information
Task failed!
Task ID:
Stage-9
Logs:
/var/log/hive/hive.log
FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
MapReduce Jobs Launched:
Stage-Stage-4: HDFS Read: 210722870 HDFS Write: 74300946 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
Error explanation
Hive's official explanation:
Hive converted a join into a locally running, faster 'mapjoin', but ran out of memory while doing so. Two bugs are responsible for this.
Bug 1)
Hive's metric for deciding when to convert a join miscalculates the required amount of memory. This is especially true for compressed and ORC files, because Hive uses the on-disk file size as the metric, while compressed tables need considerably more memory for their uncompressed, in-memory representation.
You can decrease 'hive.smalltable.filesize' to tune the metric, or increase 'hive.mapred.local.mem' to allow the local map-join task to allocate more memory.
The latter option may run into bug number two if you happen to be on an affected Hadoop version.
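As a rough sketch of the two knobs just mentioned (the values below are illustrative placeholders, not recommendations; on newer Hive releases the threshold is also exposed as hive.mapjoin.smalltable.filesize):
-- shrink the small-table threshold (in bytes) so fewer joins are auto-converted to map joins
set hive.smalltable.filesize=1000000;
-- give the local map-join task more memory (value assumed here to be in MB; may be ignored on affected Hadoop versions, see Bug 2 below)
set hive.mapred.local.mem=1024;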
Bug 2)
Hive/Hadoop ignores 'hive.mapred.local.mem'! (More precisely: a bug in Hadoop 2.2 where hadoop-env.cmd sets the -Xmx parameter multiple times, effectively overriding the user-set hive.mapred.local.mem setting. See https://issues.apache.org/jira/browse/HADOOP-10245)
There are 3 workarounds for this bug:
1) Assign more memory to the local Hadoop JVM client (note: this is NOT mapred.map.memory), because the map-join child JVM inherits the parent JVM's settings.
In the Cloudera Manager home page, click the "Hive" service,
then on the Hive service page click "Configuration":
Gateway Base Group --(expand)--> Resource Management -> Client Java Heap Size in Bytes -> 1 GB
2) Reduce "hive.smalltable.filesize" to ~1 MB or below (the exact value depends on your cluster's settings for the local JVM).
3) Turn off "hive.auto.convert.join" to prevent Hive from converting joins to map joins at all.
Workarounds 2) and 3) can be set in Big-Bench/engines/hive/conf/hiveSettings.sql, as sketched below.
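A minimal sketch of what workarounds 2) and 3) might look like in that file (illustrative values only; the real hiveSettings.sql shipped with Big-Bench contains many other settings):
-- Big-Bench/engines/hive/conf/hiveSettings.sql (illustrative excerpt)
-- workaround 2): keep the small-table threshold at ~1 MB
set hive.smalltable.filesize=1000000;
-- workaround 3): disable automatic conversion of joins to map joins
set hive.auto.convert.join=false;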