Error:
Error from python worker:
/usr/bin/python: No module named pyspark
PYTHONPATH was:
/home/hadoop/tmp/nm-local-dir/usercache/chenfangfang/filecache/43/spark-assembly-1.3.0-cdh5.4.1-hadoop2.6.0-cdh5.4.1.jar
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
Root cause:
I am using JDK 1.7.0_45. PySpark on YARN has a known issue with a Spark assembly built with JDK 7: the jar tool may emit the Zip64 format for large archives, which Python's zipimport cannot read, so the worker cannot import `pyspark` from the assembly jar on its PYTHONPATH.
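To see why the jar format matters at all, here is a minimal sketch of the mechanism PySpark relies on: Python can import modules straight from a zip archive (a jar is just a zip) placed on `sys.path`. The file and module names below are made up for illustration.

```python
import os
import sys
import tempfile
import zipfile

# Python's zipimport lets an archive on sys.path act as a package source.
# PySpark on YARN depends on this: PYTHONPATH points at the spark-assembly
# jar, and the worker imports pyspark out of it. If the archive uses a
# format zipimport cannot read (e.g. Zip64 from a JDK 7 build), the import
# fails with "No module named pyspark".

tmp = tempfile.mkdtemp()
archive = os.path.join(tmp, "demo.jar")  # a jar is just a zip file

# Put a tiny module inside the archive.
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_mod.py", "VALUE = 42\n")

# Make the archive importable, then import from it.
sys.path.insert(0, archive)
import demo_mod

print(demo_mod.VALUE)
```

A readable archive imports cleanly; an unreadable one surfaces as the `No module named ...` error seen above.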
Fixing the Spark 'No module named pyspark' error

This article shows how to fix the 'No module named pyspark' startup error by repackaging Spark. The steps: extract the spark-assembly jar with `unzip`, then rebuild the jar in that directory using `$JAVA6_HOME/bin/jar`, making sure the command ends with a dot (the current directory).
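The steps above can be sketched as the following shell commands. The working directory, output jar name, and `JAVA6_HOME` location are assumptions; substitute your own paths and the assembly jar from your cluster.

```shell
# Repack the Spark assembly with the JDK 6 jar tool so the archive
# avoids Zip64 and stays readable by Python's zipimport.
mkdir /tmp/spark-repack && cd /tmp/spark-repack

# Extract the original assembly (path is an example -- use yours).
unzip /path/to/spark-assembly-1.3.0-cdh5.4.1-hadoop2.6.0-cdh5.4.1.jar

# Rebuild with JDK 6, reusing the extracted manifest.
# The trailing dot means "package everything in the current directory".
$JAVA6_HOME/bin/jar cvmf META-INF/MANIFEST.MF \
    /tmp/spark-assembly-repacked.jar .
```

Point your Spark configuration at the repacked jar (replacing the original assembly) and restart the job.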