1. Set up the Scala environment
Download page: https://www.scala-lang.org/download/
Scroll to the bottom of the page. The official site currently offers 2.13.3; this walkthrough uses 2.11.6: https://download.youkuaiyun.com/download/qq_41622603/12919481
(1) Upload the archive to the server and extract it; here it is uploaded to /opt/software
Change to the directory: cd /opt/software
Extract: tar -zxvf scala-2.11.6.tgz
(2) Configure the environment variables
vi /etc/profile
Add a SCALA_HOME line and append its bin directory to PATH (the original post showed this step in a screenshot); see the sketch below.
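A minimal sketch of what the two /etc/profile additions typically look like, assuming Scala was extracted to the path used in this walkthrough:
export SCALA_HOME=/opt/software/scala-2.11.6
export PATH=$PATH:$SCALA_HOME/bin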
(3) Copy Scala and /etc/profile to the slave nodes with scp:
scp -r /opt/software/scala-2.11.6 root@node2:/opt/software/
scp -r /opt/software/scala-2.11.6 root@node3:/opt/software/
scp /etc/profile root@node2:/etc/
scp /etc/profile root@node3:/etc/
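The new variables only take effect in a new shell. A quick way to apply and verify them on each node (assuming the profile lines above were added as sketched):
source /etc/profile
scala -version    # should report Scala 2.11.6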
2. Install Spark
(1) Download: https://download.youkuaiyun.com/download/qq_41622603/12919493
Official download page: http://spark.apache.org/downloads.html
This walkthrough uses version 2.3.1, the build pre-built for Hadoop 2.7 and later
Upload it to the /opt/software directory as well
Then change to the directory: cd /opt/software
Extract: tar -zxvf spark-2.3.1-bin-hadoop2.7.tgz
(2) Add the environment variables
vi /etc/profile
Add a SPARK_HOME line and append its bin directory to PATH (again shown as a screenshot in the original post); see the sketch below.
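A minimal sketch of the likely /etc/profile additions, assuming Spark was extracted to the directory used above:
export SPARK_HOME=/opt/software/spark-2.3.1-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin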
(3) Edit the spark-env.sh file
Change to Spark's conf directory: cd /opt/software/spark-2.3.1-bin-hadoop2.7/conf/
The directory only contains spark-env.sh.template, so copy it and rename the copy to spark-env.sh: cp spark-env.sh.template spark-env.sh
vi spark-env.sh
Add the lines below, adjusting the paths to match your own installation:
# Let Spark pick up the cluster's Hadoop jars and configuration
export SPARK_DIST_CLASSPATH=$(/opt/software/hadoop-3.1.4/bin/hadoop classpath)
export JAVA_HOME=/usr/java/jdk1.8.0_261-amd64
export HADOOP_HOME=/opt/software/hadoop-3.1.4
export HADOOP_CONF_DIR=/opt/software/hadoop-3.1.4/etc/hadoop
export SCALA_HOME=/opt/software/scala-2.11.6
# IP of the host running the Master (SPARK_MASTER_HOST is the preferred name in Spark 2.x, but SPARK_MASTER_IP still works)
export SPARK_MASTER_IP=192.168.77.10
# Resources each Worker offers, and how many Worker instances to start per node
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_CORES=2
export SPARK_WORKER_INSTANCES=2
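For start-all.sh to bring up Workers on node2 and node3, Spark itself needs to be present on those nodes and they need to be listed as workers. A sketch of how that is commonly done (the conf/slaves file name is the Spark 2.x convention; the host names node2/node3 are assumed from the Scala step above):
cp slaves.template slaves    # still in the conf/ directory
vi slaves                    # replace the default localhost entry with the worker hostnames: node2, node3
scp -r /opt/software/spark-2.3.1-bin-hadoop2.7 root@node2:/opt/software/
scp -r /opt/software/spark-2.3.1-bin-hadoop2.7 root@node3:/opt/software/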
3. Start Spark
HDFS and YARN must be running before starting Spark
Change to the Spark directory: cd /opt/software/spark-2.3.1-bin-hadoop2.7
Start the cluster: sbin/start-all.sh
Run jps to check the processes: the master node should show a Master process, and each slave node should show two Worker processes (two because SPARK_WORKER_INSTANCES=2)
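A quick way to confirm the cluster is healthy, assuming the standalone defaults (web UI on port 8080, master RPC on port 7077) and the master IP configured above:
# Open http://192.168.77.10:8080 in a browser; the Workers should be listed there
# Optionally run the bundled Pi example against the standalone master
bin/run-example --master spark://192.168.77.10:7077 SparkPi 10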