Installing Hadoop
https://www.cnblogs.com/chevin/p/9090683.html
Installing Spark
https://www.cnblogs.com/chevin/p/11064854.html
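There is a pitfall here. My machine originally had Python 3.8 installed, and starting pyspark from the command line kept failing with the error below: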

Spark and SparkContext could not be initialized correctly
Traceback (most recent call last):
  File "D:\spark-2.4.5-bin-hadoop2.7\python\pyspark\shell.py", line 31, in <module>
    from pyspark import SparkConf
  File "D:\spark-2.4.5-bin-hadoop2.7\python\pyspark\__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "D:\spark-2.4.5-bin-hadoop2.7\python\pyspark\context.py", line 31, in <module>
    from pyspark import accumulators
  File "D:\spark-2.4.5-bin-hadoop2.7\python\pyspark\accumulators.py", line 97, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "D:\spark-2.4.5-bin-ha
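
The failure depends only on the interpreter version, so a quick check before launching pyspark can save some debugging. The helper below is a minimal sketch and not part of Spark. It assumes that Spark 2.4.x supports Python 3 only up to 3.7 (Python 3.8 support is generally reported to have arrived with Spark 3.0), and the names `is_python_too_new` and `NEWEST_SUPPORTED` are hypothetical.

```python
import sys

# Assumption: Spark 2.4.x (such as the spark-2.4.5-bin-hadoop2.7 build in the
# traceback above) supports Python 3 only up to 3.7; Python 3.8 needs Spark 3.0+.
NEWEST_SUPPORTED = (3, 7)


def is_python_too_new(version_info=sys.version_info, newest=NEWEST_SUPPORTED):
    """Return True if the (major, minor) part of version_info is newer than `newest`."""
    return tuple(version_info[:2]) > tuple(newest)


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if is_python_too_new():
        print(f"Python {major}.{minor} is assumed too new for Spark 2.4.x; "
              "use Python 3.6/3.7 instead.")
    else:
        print(f"Python {major}.{minor} should be OK for Spark 2.4.x.")
```

Run it with the same `python` that pyspark picks up; on the Python 3.8 setup described here it would report the interpreter as too new.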

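This post describes an error that occurs on Windows when pyspark is started under Python 3.8; the error type is TypeError. Downgrading Python to 3.6 and reconfiguring the environment variables resolved the pyspark startup error.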
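Reconfiguring the environment variables can mean putting Python 3.6 first on `PATH`. Another option, which leaves the system-wide default Python untouched, is Spark's `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` variables. These tell pyspark which interpreter to use for the workers and the driver. The sketch below is an illustration, not necessarily what the author did. It assumes a Python 3.6 interpreter at the hypothetical path `C:\Python36\python.exe`, and the helper `pyspark_env` is also hypothetical.

```python
import os
import subprocess


def pyspark_env(python_exe, base_env=None):
    """Return a copy of base_env (default: os.environ) in which PySpark is told
    to use python_exe for both the driver and the worker processes."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYSPARK_PYTHON"] = python_exe         # interpreter for workers
    env["PYSPARK_DRIVER_PYTHON"] = python_exe  # interpreter for the driver/shell
    return env


if __name__ == "__main__":
    # Hypothetical install locations; adjust them to your machine.
    python36 = r"C:\Python36\python.exe"
    pyspark_cmd = r"D:\spark-2.4.5-bin-hadoop2.7\bin\pyspark.cmd"
    if os.path.exists(python36) and os.path.exists(pyspark_cmd):
        # Launch the pyspark shell with the overridden interpreter settings.
        subprocess.run([pyspark_cmd], env=pyspark_env(python36))
    else:
        print("Adjust python36 / pyspark_cmd to match your installation.")
```

In a plain cmd.exe session, the equivalent is running `set PYSPARK_PYTHON=C:\Python36\python.exe` (again with a hypothetical path) before typing `pyspark`.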