1. Writing a Hadoop Streaming script in Python
Suppose the script does not consume standard input, as in the following code:
# encoding=utf8
# import sys
# for each in sys.stdin:  # ---- commented out: the stream is never consumed
#     pass
try:
    import logs
    print 'have logs, coooool'
except ImportError:
    print 'no logs, too bad'
a. When the input is small (around 1-2 MB), the program runs fine.
b. When the input is large (over 30 MB), the following error occurs:
2017-01-23 21:47:45,999 FATAL [IPC Server handler 3 on 44950] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1483536528209_0147_m_000001_0 - exited : java.io.IOException: Broken pipe
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.streaming.io.TextInputWriter.writeUTF8(TextInputWriter.java:72)
at org.apache.hadoop.streaming.io.TextInputWriter.writeValue(TextInputWriter.java:51)
at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:106)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
So the script must consume its input.
Cause analysis:
Hadoop keeps writing input to the Python process's stdin for longer than the Python process stays alive. With a small input, everything fits into the pipe buffer before the script exits; with a large input, the script exits while the Java side is still writing, and that write fails with a broken pipe.
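The fix follows directly: even a mapper that ignores its input must drain stdin before exiting. A minimal sketch, using Python 3 syntax rather than the Python 2 of the snippet above; the `mapper` helper and its tab-separated output line are illustrative, not part of the original script:

```python
import sys

def mapper(stream):
    """Consume every line of the input stream so the upstream
    writer never hits a broken pipe; here we just count lines."""
    count = 0
    for line in stream:
        count += 1  # real per-line parsing/emitting would go here
    return count

if __name__ == '__main__':
    # Draining sys.stdin to EOF is what prevents the IOException
    # on the Hadoop side, even if the data itself is unused.
    n = mapper(sys.stdin)
    print('lines\t%d' % n)
```

The key point is the loop to EOF: Hadoop Streaming treats the script as a pipe endpoint, so the script must keep reading until the framework closes the stream.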
2. To be continued...