Install Hadoop
https://www.cnblogs.com/chevin/p/9090683.html
Install Spark
https://www.cnblogs.com/chevin/p/11064854.html
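After working through both guides, a quick sanity check is to confirm the environment variables point at the unpacked directories. This is only a minimal sketch; it assumes the guides left HADOOP_HOME and SPARK_HOME set, as Windows installs typically require:

    # Minimal sanity check for a Windows Hadoop/Spark install.
    # Assumption: the guides above configured HADOOP_HOME and SPARK_HOME.
    import os

    for var in ("HADOOP_HOME", "SPARK_HOME"):
        path = os.environ.get(var)
        if path and os.path.isdir(path):
            print(f"{var} = {path}  (OK)")
        else:
            print(f"{var} is unset or not a directory: {path!r}")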
There is a pitfall here: my machine originally had Python 3.8, and launching pyspark from the command line kept failing with the error shown below. Spark and the SparkContext could not be initialized properly:
Traceback (most recent call last):
  File "D:\spark-2.4.5-bin-hadoop2.7\python\pyspark\shell.py", line 31, in <module>
    from pyspark import SparkConf
  File "D:\spark-2.4.5-bin-hadoop2.7\python\pyspark\__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "D:\spark-2.4.5-bin-hadoop2.7\python\pyspark\context.py", line 31, in <module>
    from pyspark import accumulators
  File "D:\spark-2.4.5-bin-hadoop2.7\python\pyspark\accumulators.py", line 97, in <module>
    from pyspark.serializers import read_int, PickleSerializer
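The root cause is that Spark 2.4.x does not support Python 3.8: Python 3.8 changed types.CodeType, which breaks the cloudpickle version bundled with Spark 2.4, and 3.8 support only arrived in Spark 3.0. The fix is to run pyspark against Python 3.7 or earlier. A guard like the following (a minimal sketch) fails fast with a readable message instead of the traceback above:

    # Fail fast if the interpreter is too new for Spark 2.4.x.
    # Spark 2.4 supports Python up to 3.7; 3.8 support came in Spark 3.0.
    import sys

    if sys.version_info >= (3, 8):
        raise RuntimeError(
            "Spark 2.4.x requires Python <= 3.7, but this interpreter is "
            + sys.version.split()[0]
        )

    from pyspark import SparkConf, SparkContext

    # With a compatible interpreter, a local SparkContext starts cleanly.
    conf = SparkConf().setMaster("local[*]").setAppName("smoke-test")
    sc = SparkContext(conf=conf)
    print("Spark", sc.version, "started OK")
    sc.stop()

In practice the simplest route is to install a separate Python 3.7 environment and launch pyspark from it; the check above just makes the failure explicit instead of the confusing import-time traceback.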