Spark is an in-memory distributed computing framework.
This article builds on the cluster set up in <<Hadoop 2.7.7 HA Fully Distributed Cluster Setup>>.
1. Download the installation packages
cd /usr/local
# Download the Scala package
wget https://downloads.lightbend.com/scala/2.13.1/scala-2.13.1.tgz
# Download the Spark package
wget https://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
# Extract both archives
tar -zxvf scala-2.13.1.tgz
tar -zxvf spark-2.4.4-bin-hadoop2.7.tgz
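Optionally, verify the Spark tarball against the official SHA-512 checksum. The URL below is an assumption based on Apache's usual archive layout, and the published file may use a spaced digest format, so compare it by eye rather than with sha512sum -c:
sha512sum spark-2.4.4-bin-hadoop2.7.tgz
# published digest (assumed location): https://archive.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz.sha512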
2. Configure the environment variables
vim /etc/profile
export SCALA_HOME=/usr/local/scala-2.13.1
export PATH=$PATH:$SCALA_HOME/bin
export SPARK_HOME=/usr/local/spark-2.4.4-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
# Setting LD_LIBRARY_PATH avoids the warning: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native/:$LD_LIBRARY_PATH
# Make the changes take effect
source /etc/profile
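A quick sanity check that the variables resolved (this just echoes what was configured above):
echo $SCALA_HOME $SPARK_HOME
which spark-submit  # should resolve to $SPARK_HOME/bin/spark-submit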
3. Modify the Spark configuration
cd /usr/local/spark-2.4.4-bin-hadoop2.7/conf
cp spark-env.sh.template spark-env.sh
# Then edit spark-env.sh as follows
# Set JAVA_HOME
export JAVA_HOME=/usr/local/java
export SCALA_HOME=/usr/local/scala-2.13.1
# Hostname of the Master node (SPARK_MASTER_IP is deprecated in Spark 2.x in favor of SPARK_MASTER_HOST, but still works)
export SPARK_MASTER_IP=weyes01
# Maximum memory each Worker can use
export SPARK_WORKER_MEMORY=1024m
# Maximum number of CPU cores each Worker can use
export SPARK_WORKER_CORES=3
# Port for submitting applications (default 7077)
export SPARK_MASTER_PORT=7077
export HADOOP_CONF_DIR=/usr/local/hadoop-2.7.7/etc/hadoop
# Use Python 3 for PySpark
export PYSPARK_PYTHON=/usr/bin/python3
# Configure the worker (slave) nodes
cp slaves.template slaves
# Only one worker is configured here; add more as needed (see the sketch below)
vim slaves
weyes02
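With more workers the slaves file simply lists one hostname per line; for example (hypothetical hostnames):
weyes02
weyes03
weyes04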
# Change the Python version used by the pyspark launcher script
cd /usr/local/spark-2.4.4-bin-hadoop2.7/bin
vim pyspark
if [[ -z "$PYSPARK_PYTHON" ]]; then
  if [[ $PYSPARK_DRIVER_PYTHON == *ipython* && ! $WORKS_WITH_IPYTHON ]]; then
    echo "IPython requires Python 2.7+; please install python2.7 or set PYSPARK_PYTHON" 1>&2
    exit 1
  else
    PYSPARK_PYTHON=python3 # change the default from python to python3
  fi
fi
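A quick way to confirm the change (a sketch; exact banner text varies by version) is to launch pyspark in local mode and check the Python version it reports:
pyspark --master local[1]
# the welcome banner should report Python 3.x, e.g. "Using Python version 3.x ..."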
4. Copy the installation directories to weyes02
scp -r spark-2.4.4-bin-hadoop2.7 weyes02:`pwd`
scp -r scala-2.13.1 weyes02:`pwd`
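weyes02 also needs the same environment variables from step 2. Assuming an identical layout on both machines and no local customizations in that file, the simplest approach is to copy /etc/profile over as well (a sketch):
scp /etc/profile weyes02:/etc/profile
# then log in to weyes02 and run: source /etc/profile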
5. Start the cluster and verify
# on the master node
/usr/local/spark-2.4.4-bin-hadoop2.7/sbin/start-all.sh
# Check that Scala is installed correctly
scala -version
Scala code runner version 2.13.1 -- Copyright 2002-2019, LAMP/EPFL and Lightbend, Inc.
# Check that Spark started successfully by inspecting the logs
cd /usr/local/spark-2.4.4-bin-hadoop2.7/logs
# master node log
19/12/10 20:20:25 INFO master.Master: I have been elected leader! New state: ALIVE
19/12/10 20:20:55 INFO master.Master: Registering worker 192.168.18.125:37457 with 3 cores, 1024.0 MB RAM
# worker (slave) node log
19/12/10 20:20:54 INFO worker.Worker: Connecting to master weyes01:7077...
19/12/10 20:20:55 INFO worker.Worker: Successfully registered with master spark://weyes01:7077
Finally, verify with jps: the master should show a Master process and weyes02 a Worker process.
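As an end-to-end check, you can submit the bundled SparkPi example to the new master (a sketch; the examples jar name below assumes the stock spark-2.4.4-bin-hadoop2.7 layout built against Scala 2.11):
/usr/local/spark-2.4.4-bin-hadoop2.7/bin/spark-submit \
  --master spark://weyes01:7077 \
  --class org.apache.spark.examples.SparkPi \
  /usr/local/spark-2.4.4-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.4.jar 10
# the driver output should contain a line like "Pi is roughly 3.14..."
The master's web UI at http://weyes01:8080 should also list the weyes02 worker and the completed application.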

This article walked through setting up a Spark cluster on a Hadoop 2.7.7 HA fully distributed cluster: downloading and extracting the packages, configuring environment variables, editing the Spark configuration files, copying the installation to the worker node, and starting and verifying the cluster.