Prerequisites
This guide builds on an already-working Hadoop platform; see the earlier article:
Building a Hadoop Big Data Platform Cluster on Virtual Machines (previous post on this blog)
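Before continuing, confirm the Hadoop daemons are actually running. On a layout like the one from that article, jps on the master should list NameNode and ResourceManager, among others:
#jps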
Setting up the Spark big data platform
Master node
Unpack the Spark installation archive:
#tar -zxvf /root/spark-3.1.3-bin-hadoop3.2.tgz -C /usr/local/
Edit the configuration files
①spark-env.sh
#cd /usr/local/spark-3.1.3-bin-hadoop3.2/conf
#cp spark-env.sh.template spark-env.sh
#vi spark-env.sh
Append the following nine lines at the end:
export JAVA_HOME=/usr/local/jdk64/jdk1.8.0_321
export HADOOP_CONF_DIR=/usr/local/hadoop-3.2.2/etc/hadoop/
export SPARK_MASTER_IP=master
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_MEMORY=512m
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_EXECUTOR_MEMORY=512m
export SPARK_EXECUTOR_CORES=1
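These paths match the layout from the earlier Hadoop article; if your JDK or Hadoop lives elsewhere, adjust JAVA_HOME and HADOOP_CONF_DIR to match. (Spark 3 prefers SPARK_MASTER_HOST over the deprecated SPARK_MASTER_IP, though the old name still works.) A quick sanity check that the paths above exist:
#ls /usr/local/jdk64/jdk1.8.0_321/bin/java
#ls /usr/local/hadoop-3.2.2/etc/hadoop/core-site.xml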
②workers
#cp workers.template workers
#vi workers
Change localhost to slave1:
slave1
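The workers file takes one hostname per line, so a growing cluster just appends hosts; a hypothetical two-worker layout (slave2 does not exist in this cluster) would look like:
slave1
slave2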
③spark-defaults.conf
#cp spark-defaults.conf.template spark-defaults.conf
#vi spark-defaults.conf
Append the following at the end:
spark.master spark://master:7077
spark.eventLog.enabled true
spark.eventLog.dir hdfs://master:8020/spark-logs
spark.history.fs.logDirectory hdfs://master:8020/spark-logs
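The hdfs://master:8020 prefix must match fs.defaultFS in Hadoop's core-site.xml; if your NameNode listens on another port (9000 is also common), use that instead. You can check with:
#hdfs getconf -confKey fs.defaultFS
Note that the last two lines only enable event logging; browsing those logs requires the History Server, which is started separately with sbin/start-history-server.sh.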
Copy the Spark installation from the master node to slave1:
#scp -r /usr/local/spark-3.1.3-bin-hadoop3.2/ slave1:/usr/local/
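A quick check that the copy landed on slave1 (this assumes the same passwordless SSH that the scp above relies on):
#ssh slave1 ls /usr/local/spark-3.1.3-bin-hadoop3.2/bin/spark-shell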
Create the log directory on HDFS:
#hdfs dfs -mkdir /spark-logs
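Confirm the directory exists before starting the cluster:
#hdfs dfs -ls /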
Start the Spark cluster:
#/usr/local/spark-3.1.3-bin-hadoop3.2/sbin/start-all.sh
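If the cluster came up cleanly, jps should now show a Master process on this node and a Worker on slave1; the standalone master also serves a web UI, by default at http://master:8080.
#jps
#ssh slave1 jps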
Verification:
Start the Spark shell:
#cd /usr/local/spark-3.1.3-bin-hadoop3.2/bin
#./spark-shell
2022-05-06 00:02:03,688 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://master:4040
Spark context available as 'sc' (master = spark://master:7077, app id = app-20220506000215-0001).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.3
      /_/
Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_321)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
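As a final smoke test, run a small job from the shell to exercise the whole cluster; summing the integers 1 to 100 should return 5050:
scala> sc.parallelize(1 to 100).reduce(_ + _)
res0: Int = 5050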