Error:
Error from python worker:
/usr/bin/python: No module named pyspark
PYTHONPATH was:
/home/hadoop/tmp/nm-local-dir/usercache/chenfangfang/filecache/43/spark-assembly-1.3.0-cdh5.4.1-hadoop2.6.0-cdh5.4.1.jar
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
Root cause:
I am using JDK 1.7.0_45. PySpark on YARN has a known issue that prevents it from working with a Spark assembly built with JDK 7: when the assembly jar contains more than 65536 entries, JDK 7's jar tool switches to the ZIP64 format, which Python's zipimport cannot read, so the pyspark module inside the jar on PYTHONPATH can never be imported.
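To illustrate the mechanism, here is a minimal sketch (with hypothetical file names, not the real CDH assembly) of how the YARN worker loads pyspark: the jar itself is placed on PYTHONPATH and Python imports the package directly from inside the zip archive via zipimport. This import path is exactly what breaks when the archive is in ZIP64 format.

```python
# Sketch: imitate how a YARN python worker imports pyspark from the
# assembly jar on PYTHONPATH. File names here are stand-ins, not the
# real spark-assembly jar from the log above.
import os
import sys
import tempfile
import zipfile

# Build a small stand-in "assembly jar": a zip archive containing a
# pyspark/__init__.py at its top level.
jar_path = os.path.join(tempfile.mkdtemp(), "spark-assembly-demo.jar")
with zipfile.ZipFile(jar_path, "w") as zf:
    zf.writestr("pyspark/__init__.py", "demo_version = 'sketch'\n")

# Equivalent of putting the jar on PYTHONPATH, as the YARN worker does;
# Python's zipimport machinery then serves the package from the archive.
sys.path.insert(0, jar_path)
import pyspark  # loaded from inside the zip, not from site-packages

print(pyspark.demo_version)
```

With a normal zip archive this works; with a ZIP64 archive produced by JDK 7 for very large assemblies, the same import fails with "No module named pyspark", matching the worker error above.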