Configuring Spark is a genuinely painful process; you tweak settings until you question your life choices. I'm writing down most of the problems I ran into, as a record of the pitfalls along the way.
Table of Contents
1.WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2.A JNI error has occurred, please check your installation and try again
3.Remove Java 11 and install Java 8 on Ubuntu 18
4.YARN fails to start: java.io.IOException: Failed to send RPC 8277242275361198650 to datanode-055: java.nio.channels.ClosedChannelException
5.Py4JError("Answer from Java side is empty")
6.spark-submit in the terminal launches Jupyter Notebook
WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Standalone mode: jps showed the Worker process alive on every node, yet master:8080 reported 0 workers. Stopping and restarting Spark solved the problem.
P.S. In general, "not accepted any resources" means the workers are gone. Internet-café trick: restart Spark and it goes away.
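The restart itself is just the standard standalone stop/start scripts, run on the master (a sketch assuming SPARK_HOME points at your install and passwordless SSH to the workers is configured):

```shell
# Stop and restart the whole standalone cluster (master + workers)
"$SPARK_HOME/sbin/stop-all.sh"
"$SPARK_HOME/sbin/start-all.sh"

# Verify: jps should show Master/Worker processes,
# and master:8080 should list the workers again
jps
```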
A JNI error has occurred, please check your installation and try again
Java version problem: default-java pointed to Java 11; switching to Java 8 solved it.
Remove Java 11 and install Java 8 on Ubuntu 18
Spark 2.4 under Java 11 is nothing but trouble, which is a real headache.
First, sudo update-java-alternatives didn't help; some dependencies were probably still pulling in Java 11, so I decided to remove it completely.
default-jdk's dependencies are:
- Depends: default-jre
- Depends: default-jdk-headless
- Depends: openjdk-11-jdk
sudo apt remove openjdk-11-jre-headless openjdk-11-jre openjdk-11-jdk-headless openjdk-11-jdk
Then just install Java 8:
sudo apt-get install openjdk-8-jdk openjdk-8-jre
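Afterwards it's worth double-checking which Java the system now resolves to (a generic sanity check, not specific to Spark):

```shell
# Confirm the active Java is now 1.8
java -version 2>&1 | head -n 1

# If multiple JDKs remain installed, pick Java 8 explicitly
sudo update-alternatives --config java
```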
YARN fails to start: java.io.IOException: Failed to send RPC 8277242275361198650 to datanode-055: java.nio.channels.ClosedChannelException
This one was the nastiest: YARN simply wouldn't start and nothing I tried worked. After endless digging I finally solved it; note that the right fix can vary a lot with the environment.
Environment: Ubuntu 18.04
Spark: 2.4.2
Hadoop: 3.1.2
Fix: Step 1, downgrade to Java 8. Step 2, add the following to yarn-site.xml (this must be changed on every node):
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
With this the problem is solved in my environment. Baidu turned up another approach, though it was for Spark 2.0.2:
https://blog.youkuaiyun.com/qq_38038143/article/details/88430151
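The edited yarn-site.xml only takes effect once YARN is restarted; a sketch assuming HADOOP_HOME points at the Hadoop install on each node:

```shell
# Copy the edited yarn-site.xml to the other nodes first, e.g.
# scp "$HADOOP_HOME/etc/hadoop/yarn-site.xml" datanode-055:"$HADOOP_HOME/etc/hadoop/"

# Then restart YARN from the ResourceManager node
"$HADOOP_HOME/sbin/stop-yarn.sh"
"$HADOOP_HOME/sbin/start-yarn.sh"
```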
Py4JError("Answer from Java side is empty")
Set spark.executor.memory and spark.driver.memory according to your machine. With my 16 GB of RAM, executor-memory=1g, num-executors=5, and driver-memory=512m is basically the limit; anything larger and it errors out.
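On the command line those limits translate into the standard spark-submit memory flags (my_job.py is a hypothetical placeholder for your own application):

```shell
# Memory settings matching the limits above on a 16 GB machine;
# my_job.py is a placeholder, replace with your own script
spark-submit \
  --master yarn \
  --driver-memory 512m \
  --executor-memory 1g \
  --num-executors 5 \
  my_job.py
```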
spark-submit in the terminal launches Jupyter Notebook
I had set the following in my environment variables:
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
After that spark-submit no longer worked, and commenting the two lines out and re-running source ~/.bashrc didn't help either.
Fix: in the terminal, run
unset PYSPARK_DRIVER_PYTHON
to clear the variable. After that, launching Jupyter requires setting the driver manually.
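A safer pattern than exporting the driver settings in ~/.bashrc is to clear them from the shell and scope them to the one command that needs them (a sketch; assumes pyspark is on your PATH):

```shell
# Clear any leftover Jupyter driver settings so spark-submit behaves normally
unset PYSPARK_DRIVER_PYTHON
unset PYSPARK_DRIVER_PYTHON_OPTS

# When you actually want the notebook, set the variables for one invocation only:
#   PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS='notebook' pyspark
```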