Hadoop Streaming: Problems and Troubleshooting

Runtime Errors and Troubleshooting

hdfs  dfs -rm -r /output1
mapred streaming \
  -input /idis_demo_dir \
  -output /output1 \
  -mapper  wc_map.py \
  -reducer wc_reducer.py \
  -file wc_map.py \
  -file wc_reducer.py

Error:

Caused by: java.io.IOException: Cannot run program "/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1598584262917_0007/container_1598584262917_0007_01_000007/./wc_map.py": error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
	at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
	... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
	at java.lang.ProcessImpl.start(ProcessImpl.java:134)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
	... 24 more
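Here error=2 does not mean the script was not shipped: it usually means the OS could not exec ./wc_map.py directly, typically because the script lacks a shebang line (or has CRLF line endings that corrupt it). For reference, a minimal mapper of the kind used here, written so it can be exec'd directly (a sketch; the original wc_map.py is not shown):

```python
#!/usr/bin/env python
# wc_map.py -- minimal word-count mapper sketch (assumed; original not shown)
import sys


def emit(stream, out=sys.stdout):
    # Streaming expects one key<TAB>value pair per output line
    for line in stream:
        for word in line.split():
            out.write("%s\t%d\n" % (word, 1))


if __name__ == "__main__":
    emit(sys.stdin)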

Modified run: upload the scripts to HDFS and reference them by absolute path

hdfs dfs -put . /

hdfs  dfs -rm -r /output1
mapred streaming \
  -input /idis_demo_dir \
  -output /output1 \
  -mapper  /root/wc_map.py \
  -reducer /root/wc_reducer.py 
hdfs dfs -chmod  -R 777 /root

Error:

....
Caused by: java.io.IOException: error=13, Permission denied
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
	at java.lang.ProcessImpl.start(ProcessImpl.java:134)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
	... 24 more
hdfs  dfs -rm -r /output1
mapred streaming \
  -input /idis_demo_dir \
  -output /output1 \
  -mapper  "python /root/wc_map.py" \
  -reducer "python /root/wc_reducer.py" 
hdfs  dfs -rm -r /output1
mapred streaming \
  -input /idis_demo_dir \
  -output /output1 \
  -mapper  "python  wc_map.py" \
  -reducer "python  wc_reducer.py" \
  -file wc_map.py \
  -file wc_reducer.py

Error:

Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
	at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:326)
	at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:539)
	at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
	at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	...

This is caused by an error in the Python script itself. How do we see the actual error?

Solution

Replace mapred streaming with hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar:

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar \
-D stream.non.zero.exit.is.failure=false \
-input /idis_demo_dir \
-output /output1 \
-mapper  "python  /root/wc_map.py" \
-reducer "python  /root/wc_reducer.py" \
-file wc_map.py \
-file wc_reducer.py

PipeMapRed.waitOutputThreads(): subprocess failed with code 1

The script itself is buggy and failed at runtime.

PipeMapRed.waitOutputThreads(): subprocess failed with code 2

The script has whitespace problems: spaces vs. tabs, line endings, and similar characters.

PipeMapRed.waitOutputThreads(): subprocess failed with code 127

No usable Python interpreter was found. Add #!/usr/bin/env python as the first line of the .py file; in this environment #!/usr/bin/python did not work either.
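These numbers are simply the mapper subprocess's exit status, which PipeMapRed reports verbatim. The two most common cases can be reproduced locally, outside Hadoop (a sketch; the shell and interpreter paths assume a typical Linux node):

```python
# Reproduce the exit statuses PipeMapRed reports, outside Hadoop.
import subprocess
import sys

# code 1: the script raised an uncaught exception (a bug in the code)
r1 = subprocess.run([sys.executable, "-c", "raise ValueError('bug')"])
print(r1.returncode)  # 1

# code 127: the shell could not find the interpreter/command at all
r2 = subprocess.run(["/bin/sh", "-c", "no_such_interpreter_xyz"])
print(r2.returncode)  # 127
```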

Output Analysis

data.txt

As far as I am concerned, idis demo  is the most useful tools in the world.

Run without the reduce phase:

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar \
-D mapreduce.job.reduces=0 \
-input /data.txt \
-output /output1 \
-mapper  "python wc_map.py" \
-reducer "python wc_reducer.py" \
-file wc_map.py \
-file wc_reducer.py

Result:

As	1
far	1
as	1
I	1
am	1
concerned,	1
idis	1
demo	1
is	1
the	1
most	1
useful	1
tools	1
in	1
the	1
world.	1

Input file:

1,2,1,1,1
1,2,2,1,1
1,3,1,1,1
1,3,2,1,1
1,3,3,1,1
1,2,3,1,1
1,3,1,1,1
1,3,2,1,1
1,3,3,1,1

Script:

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar \
-D mapreduce.job.reduces=0 \
-input /data.txt \
-output /output1 \
-mapper  "python wc_map.py" \
-reducer "python wc_reducer.py" \
-file wc_map.py \
-file wc_reducer.py

Output:

1,2,3,1,1	1
1,3,1,1,1	1
1,3,2,1,1	1
1,3,3,1,1	1
1,2,1,1,1	1
1,2,2,1,1	1
1,3,1,1,1	1
1,3,2,1,1	1
1,3,3,1,1	1

How to Debug

Inspect the map output

Specify -D mapred.reduce.tasks=0 to inspect the raw map output. (In Hadoop 3.x this key is deprecated in favor of mapreduce.job.reduces, but the old name still works.)

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar \
    -D mapred.map.tasks=1 \
    -D mapred.reduce.tasks=0 \
......

Debug the data

Use print("{}".format(field)) to print intermediate values for inspection:

# reducer.py
import sys

for line in sys.stdin:
    if line.strip() == "":
        continue
    fields = line.rstrip("\n").split("\t")
    sno = fields[0]
    print("{}".format(sno))

Then inspect the job output:

hdfs dfs -cat /output1/*

Output:

01
01
01
02
02
02
03
04
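When the cluster is not handy, the whole cat | map | sort | reduce pipeline can also be simulated in-process before submitting a job. A self-contained sketch with stand-in mapper/reducer functions (not the original scripts):

```python
# Simulate the streaming pipeline locally: map, then sort by key, then reduce.

def mapper(lines):
    # Stand-in for wc_map.py: emit "word\t1" for every word
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reducer(pairs):
    # Stand-in for wc_reducer.py: sum counts of key-sorted pairs
    current, total = None, 0
    for pair in pairs:
        word, count = pair.split("\t")
        if word != current:
            if current is not None:
                yield "%s\t%d" % (current, total)
            current, total = word, 0
        total += int(count)
    if current is not None:
        yield "%s\t%d" % (current, total)


data = ["the quick fox", "the lazy dog"]
map_out = sorted(mapper(data))  # Hadoop sorts between map and reduce
for line in reducer(map_out):
    print(line)
```

This mirrors the shuffle step that Hadoop performs between the phases, which is why a streaming reducer may assume its input is grouped by key.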

Debug output with -files (distributed cache)

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar \
    -files  mapper.py,hdfs://localhost:9000/cache.txt#cache1File \
    -D mapred.reduce.tasks=0 \
    -input /data.txt \
    -verbose \
    -output "/output2" \
    -mapper "python mapper.py cache1File"
mapper.py:

import os
import sys

try:
    # -files creates a symlink named cache1File in the task's working
    # directory; its name is passed as the first argument, sys.argv[1]
    cacheFileName = sys.argv[1]
    print(os.path.isfile(cacheFileName))
except Exception as e:
    print(str(e))