Runtime errors and troubleshooting
hdfs dfs -rm -r /output1
mapred streaming \
-input /idis_demo_dir \
-output /output1 \
-mapper wc_map.py \
-reducer wc_reducer.py \
-file wc_map.py \
-file wc_reducer.py
Error:
Caused by: java.io.IOException: Cannot run program "/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1598584262917_0007/container_1598584262917_0007_01_000007/./wc_map.py": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 24 more
Modified run: upload the scripts and reference them by absolute path.
hdfs dfs -put . /
hdfs dfs -rm -r /output1
mapred streaming \
-input /idis_demo_dir \
-output /output1 \
-mapper /root/wc_map.py \
-reducer /root/wc_reducer.py
hdfs dfs -chmod -R 777 /root
Error:
....
Caused by: java.io.IOException: error=13, Permission denied
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 24 more
hdfs dfs -rm -r /output1
mapred streaming \
-input /idis_demo_dir \
-output /output1 \
-mapper "python /root/wc_map.py" \
-reducer "python /root/wc_reducer.py"
hdfs dfs -rm -r /output1
mapred streaming \
-input /idis_demo_dir \
-output /output1 \
-mapper "python wc_map.py" \
-reducer "python wc_reducer.py" \
-file wc_map.py \
-file wc_reducer.py
Error:
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:326)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:539)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
...
This is caused by a Python syntax error. But how do we actually see the error?
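The fastest way is outside Hadoop entirely: pipe sample data through the scripts locally (cat data.txt | python wc_map.py | sort | python wc_reducer.py) and read the traceback directly. The sketch below simulates what PipeMapRed does, using a deliberately broken stand-in script (the script body here is illustrative, not the author's wc_map.py):

```python
import os
import subprocess
import sys
import tempfile

# A deliberately broken stand-in for wc_map.py (NOT the author's script):
# missing colon and missing "import sys".
bad = tempfile.NamedTemporaryFile("w", suffix=".py", delete=False)
bad.write("for line in sys.stdin\n    pass\n")
bad.close()

# Run it the way PipeMapRed does: feed stdin, collect exit code and stderr.
result = subprocess.run([sys.executable, bad.name],
                        input=b"hello world\n", capture_output=True)
print(result.returncode)       # non-zero -> the "subprocess failed with code 1" case
print(result.stderr.decode())  # the actual Python traceback that Hadoop swallows
os.remove(bad.name)
```

Hadoop only reports the exit code; the traceback printed here is exactly what gets lost inside the streaming job.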
Solution
Replace mapred streaming with hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar, and set stream.non.zero.exit.is.failure=false so a non-zero exit code from the script does not fail the task:
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar \
-D stream.non.zero.exit.is.failure=false \
-input /idis_demo_dir \
-output /output1 \
-mapper "python /root/wc_map.py" \
-reducer "python /root/wc_reducer.py" \
-file wc_map.py \
-file wc_reducer.py
PipeMapRed.waitOutputThreads(): subprocess failed with code 1
The script itself has a bug and fails while running.
PipeMapRed.waitOutputThreads(): subprocess failed with code 2
The file has problems with whitespace characters such as spaces, tabs, or line endings.
PipeMapRed.waitOutputThreads(): subprocess failed with code 127
No executable Python interpreter was found. Add #!/usr/bin/env python at the top of the .py file; #!/usr/bin/python may still fail to run.
Output analysis
data.txt
As far as I am concerned, idis demo is the most useful tools in the world.
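The wc_map.py and wc_reducer.py scripts are never listed in full in this post; a minimal pair consistent with the outputs below would look roughly like this, written as testable functions. This is a sketch of the usual word<TAB>count streaming convention, not the author's exact scripts:

```python
# Sketch of a minimal wc_map.py / wc_reducer.py pair (assumed, not the
# author's exact code).

def mapper(stream):
    # wc_map.py: emit "word<TAB>1" for every whitespace-separated token
    for line in stream:
        for word in line.split():
            yield "{}\t1".format(word)

def reducer(stream):
    # wc_reducer.py: sum counts of consecutive identical keys
    # (the shuffle phase delivers the map output sorted by key)
    current, total = None, 0
    for line in stream:
        word, count = line.split("\t")
        if word == current:
            total += int(count)
        else:
            if current is not None:
                yield "{}\t{}".format(current, total)
            current, total = word, int(count)
    if current is not None:
        yield "{}\t{}".format(current, total)
```

On the cluster each function would be a standalone script that loops over sys.stdin and prints each emitted line.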
Without the reduce step
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar \
-D mapreduce.job.reduces=0 \
-input /data.txt \
-output /output1 \
-mapper "python wc_map.py" \
-reducer "python wc_reducer.py" \
-file wc_map.py \
-file wc_reducer.py
Result
As 1
far 1
as 1
I 1
am 1
concerned, 1
idis 1
demo 1
is 1
the 1
most 1
useful 1
tools 1
in 1
the 1
world. 1
Input file
1,2,1,1,1
1,2,2,1,1
1,3,1,1,1
1,3,2,1,1
1,3,3,1,1
1,2,3,1,1
1,3,1,1,1
1,3,2,1,1
1,3,3,1,1
Script
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar \
-D mapreduce.job.reduces=0 \
-input /data.txt \
-output /output1 \
-mapper "python wc_map.py" \
-reducer "python wc_reducer.py" \
-file wc_map.py \
-file wc_reducer.py
Output
1,2,3,1,1 1
1,3,1,1,1 1
1,3,2,1,1 1
1,3,3,1,1 1
1,2,1,1,1 1
1,2,2,1,1 1
1,3,1,1,1 1
1,3,2,1,1 1
1,3,3,1,1 1
How to debug
Viewing the map output
Specify -D mapred.reduce.tasks=0 to see the raw output of the map phase.
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar \
-D mapred.map.tasks=1 \
-D mapred.reduce.tasks=0 \
......
Debugging data
Use print("{}".format(field)) to print intermediate values and inspect them.
# reducer.py
import sys

for line in sys.stdin:
    if line.strip() == "":
        continue
    fields = line[:-1].split("\t")  # drop the trailing newline, split on tab
    sno = fields[0]
    print("{}".format(sno))
    continue  # skip the rest of the reducer body while debugging
hdfs dfs -cat /output1/*
Output
01
01
01
02
02
02
03
04
Debugging with a distributed cache file
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar \
-files mapper.py,hdfs://localhost:9000/cache.txt#cache1File \
-D mapred.reduce.tasks=0 \
-input /data.txt \
-verbose \
-output "/output2" \
-mapper "python mapper.py cache1File"
# mapper.py
import os
import sys

try:
    # The symlink name ("cache1File") arrives as the first command-line argument
    cacheFileName = sys.argv[1]
    print(os.path.isfile(cacheFileName))
except Exception as e:
    print(str(e))
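The same check can be simulated locally by creating the symlink target by hand. This is illustrative only; on the cluster, the #cache1File fragment of the -files option is what makes Hadoop place a symlink named cache1File in the task's working directory:

```python
import os

# Local stand-in for the symlink Hadoop creates from
# -files hdfs://localhost:9000/cache.txt#cache1File
with open("cache1File", "w") as f:
    f.write("cached contents\n")

print(os.path.isfile("cache1File"))  # True -> the mapper's check passes
os.remove("cache1File")              # clean up the stand-in
```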