error in shuffle in fetcher 分析及方案

This article digs into the MapReduce ShuffleError, specifically the "error in shuffle in fetcher" problem. The root cause is an OOM triggered by insufficient reduce-side memory during the shuffle phase; the fix is to tune mapreduce.reduce.shuffle.memory.limit.percent so that the shuffle uses memory more conservatively.


ShuffleError message:

Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)
at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:305)
at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:295)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:514)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)

Cause: once a certain fraction of the map tasks have completed, the reduce task starts several fetch threads that pull map outputs into the reducer's memory and onto local disk, and then merges them. When the data volume is large, the outputs pulled into memory can trigger an OOM. The fix is to lower the fraction of memory a single fetch may occupy, so that oversized map outputs are shuffled straight to disk instead.
Relevant parameter: mapreduce.reduce.shuffle.memory.limit.percent
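To make the parameter concrete, here is a back-of-the-envelope sketch in plain Java (no Hadoop dependency). The formula mirrors how the reduce-side merge manager derives the per-fetch cap: the reducer heap times mapreduce.reduce.shuffle.input.buffer.percent (stock default 0.70 — an assumption here, check your cluster's value) gives the total shuffle buffer, and mapreduce.reduce.shuffle.memory.limit.percent caps a single fetched map output within that buffer.

```java
// Sketch of how the reduce-side shuffle memory limits are derived.
public class ShuffleLimitSketch {
    // maxHeap: reducer JVM heap in bytes; the two percentages come from job config.
    static long maxSingleShuffleLimit(long maxHeap,
                                      double inputBufferPercent,   // mapreduce.reduce.shuffle.input.buffer.percent
                                      double memoryLimitPercent) { // mapreduce.reduce.shuffle.memory.limit.percent
        long memoryLimit = (long) (maxHeap * inputBufferPercent);  // total buffer for in-flight map outputs
        return (long) (memoryLimit * memoryLimitPercent);          // cap for one fetched map output
    }

    public static void main(String[] args) {
        long heap = 1024L * 1024 * 1024; // 1 GiB reducer heap (illustrative)
        // With defaults 0.70 and 0.25:
        System.out.println(maxSingleShuffleLimit(heap, 0.70, 0.25)); // ~179 MiB
        // After lowering memory.limit.percent to 0.15:
        System.out.println(maxSingleShuffleLimit(heap, 0.70, 0.15)); // ~107 MiB
    }
}
```

Lowering the percentage does not shrink the total shuffle buffer; it only lowers the size threshold above which a map output bypasses the heap.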

Default configuration:

<property>
  <name>mapreduce.reduce.shuffle.memory.limit.percent</name>
  <value>0.25</value>
  <description>Expert: Maximum percentage of the in-memory limit that a
  single shuffle can consume</description>
</property>

OR

## hive
hive>set mapreduce.reduce.shuffle.memory.limit.percent;
mapreduce.reduce.shuffle.memory.limit.percent=0.15

Solution: limit the reduce task's shuffle memory usage.

For a Hive SQL job, add the following statement before running the query:
set mapreduce.reduce.shuffle.memory.limit.percent=0.15;
For a MapReduce program, set it on the job configuration:
job.getConfiguration().setStrings("mapreduce.reduce.shuffle.memory.limit.percent", "0.15");
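To see why a lower percentage helps, here is a minimal sketch of the reserve decision (plain Java, no Hadoop classes; this simplifies the real MergeManagerImpl logic, which also tracks memory already in use by other fetches): a map output larger than the single-shuffle limit is written straight to disk instead of the heap.

```java
public class ShuffleReserveSketch {
    // Simplified reserve decision: map outputs above the single-shuffle
    // limit are shuffled to disk rather than held in reducer memory.
    static String reserve(long requestedSize, long maxSingleShuffleLimit) {
        return requestedSize > maxSingleShuffleLimit ? "DISK" : "MEMORY";
    }

    public static void main(String[] args) {
        long limit = 112742891L; // e.g. 1 GiB heap * 0.70 * 0.15
        System.out.println(reserve(50L * 1024 * 1024, limit));  // 50 MiB output  -> MEMORY
        System.out.println(reserve(300L * 1024 * 1024, limit)); // 300 MiB output -> DISK
    }
}
```

With the stricter limit, large map outputs no longer compete for heap space during the copy phase, which is exactly what avoids the OOM in the fetcher.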

Reference: http://www.sqlparty.com/yarn%E5%9C%A8shuffle%E9%98%B6%E6%AE%B5%E5%86%85%E5%AD%98%E4%B8%8D%E8%B6%B3%E9%97%AE%E9%A2%98error-in-shuffle-in-fetcher/

Reposted from: https://www.cnblogs.com/myblog1900/p/10031873.html
