If the following error appears:
Traceback (most recent call last):
  File "D:\spark-3.0.2-bin-hadoop3.2\python\pyspark\shell.py", line 31, in <module>
    from pyspark import SparkConf
  File "D:\spark-3.0.2-bin-hadoop3.2\python\pyspark\__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "D:\spark-3.0.2-bin-hadoop3.2\python\pyspark\context.py", line 30, in <module>
    from pyspark import accumulators
  File "D:\spark-3.0.2-bin-hadoop3.2\python\pyspark\accumulators.py", line 97, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "D:\spark-3.0.2-bin-hadoop3.2\python\pyspark\serializers.py", line 71, in <module>
    from pyspark import cloudpickle
  File "D:\spark-3.0.2-bin-hadoop3.2\python\pyspark\cloudpickle.py", line 209, in <module>
    _cell_set_template_code = _make_cell_set_template_code()
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\spark-3.0.2-bin-hadoop3.2\python\pyspark\cloudpickle.py", line 172, in _make_cell_set_template_code
    return types.CodeType(
and the error ends with:
py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils... does not exist in the JVM
Method 1:
First install findspark:
pip install findspark
Then add the following two lines at the top of your Python program, before any pyspark import:
import findspark
findspark.init()
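A minimal end-to-end sketch of Method 1 (the Spark path and the app name "findspark-check" are only examples; adjust them to your own setup):

import findspark
findspark.init()                                  # must run before any pyspark import; finds Spark via SPARK_HOME (or common install locations) and adds its python/ dir to sys.path
# findspark.init("D:\\spark-3.0.2-bin-hadoop3.2") # or point it at your Spark directory explicitly
from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("local[*]").setAppName("findspark-check")
sc = SparkContext.getOrCreate(conf)
print(sc.version)                                 # should print your Spark version, e.g. 3.0.2
sc.stop()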
Method 2:
Check whether the installed PySpark version matches your Spark version.
1. Check the Spark version: run spark-shell; the version is printed in the startup banner.
2. Check the PySpark version. Note that PySpark can live in several places, one copy per Python environment.
1. In the Python project that raises the error, run the following (it prints the PySpark version that program is actually using):
from pyspark import __version__ as pyspark_version
print("Current PySpark version:", pyspark_version)
2. If it does not match the Spark version, uninstall PySpark and reinstall the matching version:
1. pip uninstall pyspark
2. pip install pyspark==<your Spark version> -i https://pypi.tuna.tsinghua.edu.cn/simple
3. If it still fails after rerunning, PySpark was installed into the wrong environment:
1. If you use Anaconda Navigator,
2. click Environments (the second item in the left sidebar),
3. select the environment that your Python project uses (mine is the second one),
4. search for pyspark in the package list on the right and check whether its version matches Spark. If it does not:
5. click the green button next to the selected environment,
6. choose Open Terminal,
7. in the terminal that opens, uninstall pyspark and install the version that matches Spark.
For the exact commands, see step 2.2 of Method 2 above; a concrete example follows below.
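For example, with the Spark install shown in the traceback above (spark-3.0.2-bin-hadoop3.2), the commands in that terminal would be as follows; 3.0.2 is just the version from that example, so substitute whatever spark-shell reported for you:

pip uninstall pyspark
pip install pyspark==3.0.2 -i https://pypi.tuna.tsinghua.edu.cn/simple

After the install finishes, rerun the version check from step 2.1 inside that same environment to confirm the two versions now match.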