Installing and configuring Spark

1. Preparation

(1) It is best to set up the Hadoop trio (HDFS, MapReduce, YARN) first. For details, see my earlier article (it did not pass review, so only the author can see it):

黑马程序员 Hadoop trio (HDFS, MapReduce, YARN) installation and configuration, plus Hive installation and configuration - CSDN Blog

(2) Download spark-3.4.4-bin-hadoop3.tgz ahead of time (I used NDM to download it).

Download link: apache-spark-spark-3.4.4 package download - open-source mirror site - Alibaba Cloud

(3) How to get the NDM downloader: "60 MB/s downloads, saturates your bandwidth! Latest NDM Chinese portable edition, with a detailed install and usage tutorial, a free alternative to IDM" - bilibili

The video explains it in detail.

2. Configuring Spark

(1) Open FinalShell and upload the spark-3.4.4-bin-hadoop3.tgz package to the /export/server directory on node1. (At this point you should be logged in as the hadoop user on node1; node2 and node3 can stay on the root account throughout the whole setup.)

su - hadoop
cd /export/server
rz

(2) Extract the spark-3.4.4-bin-hadoop3.tgz archive:

tar -zxf spark-3.4.4-bin-hadoop3.tgz -C /export/server/
Steps (1) and (2) look like this:

[root@node1 ~]# su - hadoop
上一次登录:三 11月 20 09:13:46 CST 2024pts/0 上
[hadoop@node1 ~]$ cd /export/server
[hadoop@node1 server]$ rz
[hadoop@node1 server]$ tar -zxf spark-3.4.4-bin-hadoop3.tgz -C /export/server/
[hadoop@node1 server]$ ll
total 380148
drwxrwxr-x  11 hadoop hadoop       196 Nov  3 17:00 apache-hive-3.1.3-bin
-r--------   1 hadoop hadoop        84 Jan 18  2018 dept.txt
drwxr-xr-x   5 root   root          48 Nov 14 09:40 dockerkafka
-r--------   1 hadoop hadoop       579 Jan 18  2018 emp.txt
-rw-r--r--   1 root   root         695 Nov 19 21:29 flink.yml
-rw-rw-r--   1 hadoop hadoop       251 Nov 14 16:12 goods
lrwxrwxrwx   1 hadoop hadoop        27 Nov  3 15:32 hadoop -> /export/server/hadoop-3.3.4
drwxrwxr-x  11 hadoop hadoop       227 Nov  3 15:48 hadoop-3.3.4
drwxrwxr-x   8 hadoop hadoop       176 Nov  9 21:45 hbase-2.5.10
lrwxrwxrwx   1 hadoop hadoop        36 Nov  3 16:52 hive -> /export/server/apache-hive-3.1.3-bin
lrwxrwxrwx.  1 hadoop hadoop        27 Oct 21 17:46 jdk -> /export/server/jdk1.8.0_212
drwxr-xr-x.  7 hadoop hadoop       245 Apr  2  2019 jdk1.8.0_212
-rw-r--r--   1 root   root        2925 Nov 14 09:41 kafka.yml
drwxrwxr-x   2 hadoop hadoop        43 Nov 14 15:55 out
-rw-r--r--   1 root   root          51 Nov 14 15:19 p1.txt
drwxr-xr-x   8 root   root          78 Nov 23 14:34 redis-cluster
-r--------   1 hadoop hadoop        64 Jan 18  2018 salgrade.txt
drwxr-xr-x  13 hadoop hadoop       211 Oct 21 10:29 spark-3.4.4-bin-hadoop3
-r--------   1 hadoop hadoop 388988563 Nov 24 21:55 spark-3.4.4-bin-hadoop3.tgz
-rw-r--r--   1 root   root        2140 Nov 23 14:48 start_redis.yml
-r--------   1 hadoop hadoop        48 Nov  4 11:04 test.txt
-r--------   1 hadoop hadoop     60222 Nov  3  2020 train.csv
-r--------   1 hadoop hadoop       279 Nov  4 11:19 wordcount.hql
-r--------   1 root   root       88190 Jul 20  2017 XX.txt
-r--------   1 root   root       88380 Jul 20  2017 YY.txt
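As the listing shows, hadoop and hive already have short symlinks. If you want the same convenience for Spark, an optional sketch (the rest of this article keeps using the full spark-3.4.4-bin-hadoop3 path, so this is purely for convenience):

ln -s /export/server/spark-3.4.4-bin-hadoop3 /export/server/spark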

(3) Go to the /export/server/spark-3.4.4-bin-hadoop3/conf directory:

cd /export/server/spark-3.4.4-bin-hadoop3/conf

(4) In the conf directory, copy workers.template: cp workers.template workers
Edit workers: delete the existing localhost entry, then add:
node2
node3

cp workers.template workers
vi workers
node2
node3
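To double-check the edit, you can print the file; it should now contain only the two worker hostnames:

cat /export/server/spark-3.4.4-bin-hadoop3/conf/workers
node2
node3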

(5) In the conf directory, copy spark-defaults.conf.template: cp spark-defaults.conf.template spark-defaults.conf
Edit spark-defaults.conf and append:
spark.master                     spark://node1:7077
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://node1:8020/spark-logs
spark.history.fs.logDirectory    hdfs://node1:8020/spark-logs

cp spark-defaults.conf.template spark-defaults.conf
vi spark-defaults.conf
spark.master                     spark://node1:7077
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://node1:8020/spark-logs
spark.history.fs.logDirectory    hdfs://node1:8020/spark-logs
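The spark.eventLog.* lines turn on event logging to HDFS, and spark.history.fs.logDirectory tells the history server where to read those logs. Starting the history server is not part of this tutorial, but if you want to browse finished applications later, a sketch (run on node1 after HDFS is up and /spark-logs exists; the history UI listens on port 18080 by default):

/export/server/spark-3.4.4-bin-hadoop3/sbin/start-history-server.sh
Then open http://node1:18080 in a browser.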

(6) In the conf directory, copy spark-env.sh.template: cp spark-env.sh.template spark-env.sh
Edit spark-env.sh and append:
JAVA_HOME=/export/server/jdk1.8.0_212
HADOOP_CONF_DIR=/export/server/hadoop-3.3.4/etc/hadoop
SPARK_MASTER_IP=node1
SPARK_MASTER_PORT=7077
SPARK_WORKER_MEMORY=512m
SPARK_WORKER_CORES=1
SPARK_EXECUTOR_MEMORY=512m
SPARK_EXECUTOR_CORES=1
SPARK_WORKER_INSTANCES=1

cp spark-env.sh.template spark-env.sh
vi spark-env.sh
JAVA_HOME=/export/server/jdk1.8.0_212
HADOOP_CONF_DIR=/export/server/hadoop-3.3.4/etc/hadoop
SPARK_MASTER_IP=node1
SPARK_MASTER_PORT=7077
SPARK_WORKER_MEMORY=512m
SPARK_WORKER_CORES=1
SPARK_EXECUTOR_MEMORY=512m
SPARK_EXECUTOR_CORES=1
SPARK_WORKER_INSTANCES=1
Steps (3)~(6) look like this:
[hadoop@node1 server]$ cd /export/server/spark-3.4.4-bin-hadoop3/conf
[hadoop@node1 conf]$ cp workers.template workers
[hadoop@node1 conf]$ vi workers
[hadoop@node1 conf]$ cp spark-defaults.conf.template spark-defaults.conf
[hadoop@node1 conf]$ vi spark-defaults.conf
[hadoop@node1 conf]$ cp spark-env.sh.template spark-env.sh
[hadoop@node1 conf]$ vi spark-env.sh
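One note on spark-env.sh: SPARK_MASTER_IP is the older name for this setting, and some Spark releases prefer SPARK_MASTER_HOST instead. If the master does not bind to node1 as expected, try the newer variable name (an assumption worth testing rather than a guaranteed fix):

SPARK_MASTER_HOST=node1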

(7) Create the logs directory and change the owner and group of the spark-3.4.4-bin-hadoop3 directory:
mkdir -p /export/server/spark-3.4.4-bin-hadoop3/logs
chown -R hadoop:hadoop /export/server/spark-3.4.4-bin-hadoop3

mkdir -p /export/server/spark-3.4.4-bin-hadoop3/logs
chown -R hadoop:hadoop /export/server/spark-3.4.4-bin-hadoop3
Step (7) looks like this:

[hadoop@node1 conf]$ mkdir -p /export/server/spark-3.4.4-bin-hadoop3/logs
[hadoop@node1 conf]$ chown -R hadoop:hadoop /export/server/spark-3.4.4-bin-hadoop3
[hadoop@node1 conf]$ cd ..
[hadoop@node1 spark-3.4.4-bin-hadoop3]$ ll
total 124
drwxr-xr-x 2 hadoop hadoop  4096 Oct 21 10:29 bin
drwxr-xr-x 2 hadoop hadoop   260 Nov 25 17:00 conf
drwxr-xr-x 5 hadoop hadoop    50 Oct 21 10:29 data
drwxr-xr-x 4 hadoop hadoop    29 Oct 21 10:29 examples
drwxr-xr-x 2 hadoop hadoop 12288 Oct 21 10:29 jars
drwxr-xr-x 4 hadoop hadoop    38 Oct 21 10:29 kubernetes
-rw-r--r-- 1 hadoop hadoop 22982 Oct 21 10:29 LICENSE
drwxr-xr-x 2 hadoop hadoop  4096 Oct 21 10:29 licenses
drwxrwxr-x 2 hadoop hadoop     6 Nov 25 17:00 logs
-rw-r--r-- 1 hadoop hadoop 57842 Oct 21 10:29 NOTICE
drwxr-xr-x 9 hadoop hadoop   311 Oct 21 10:29 python
drwxr-xr-x 3 hadoop hadoop    17 Oct 21 10:29 R
-rw-r--r-- 1 hadoop hadoop  4605 Oct 21 10:29 README.md
-rw-r--r-- 1 hadoop hadoop   166 Oct 21 10:29 RELEASE
drwxr-xr-x 2 hadoop hadoop  4096 Oct 21 10:29 sbin
drwxr-xr-x 2 hadoop hadoop    42 Oct 21 10:29 yarn

(8) Distribute the Spark installation to the other nodes (make sure you are still the hadoop user on node1).


scp -r /export/server/spark-3.4.4-bin-hadoop3/ node2:/export/server/

scp -r /export/server/spark-3.4.4-bin-hadoop3/ node3:/export/server/

scp -r /export/server/spark-3.4.4-bin-hadoop3/ node2:/export/server/
scp -r /export/server/spark-3.4.4-bin-hadoop3/ node3:/export/server/
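An optional check that the copy landed on both nodes (assumes passwordless ssh from node1 to node2 and node3, which the Hadoop and Spark start scripts already require):

ssh node2 ls /export/server/spark-3.4.4-bin-hadoop3
ssh node3 ls /export/server/spark-3.4.4-bin-hadoop3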

(9) Configure the Spark environment variables on all nodes (node1, node2, node3). (Remember to switch node1 back to the root user.)

Switch node1 to the root user first, then:
vi /etc/profile
Append at the end of the file:
export SPARK_HOME=/export/server/spark-3.4.4-bin-hadoop3
export PATH=$PATH:$SPARK_HOME/bin

source /etc/profile
Run source /etc/profile to make the settings take effect.

vi /etc/profile
export SPARK_HOME=/export/server/spark-3.4.4-bin-hadoop3
export PATH=$PATH:$SPARK_HOME/bin
source /etc/profile
Step (9) results: screenshots of the updated /etc/profile on node1, node2, and node3 (omitted).
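Without the screenshots, you can still verify the variables on each node after running source /etc/profile:

echo $SPARK_HOME
spark-submit --version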

(10) On node1, switch back to the hadoop user, start the Hadoop trio first, and then start Spark.

Start the Hadoop trio and create the /spark-logs directory on HDFS:
su - hadoop
start-dfs.sh
start-yarn.sh
hdfs dfs -mkdir /spark-logs
Start Spark:
cd /export/server/spark-3.4.4-bin-hadoop3/sbin
./start-all.sh
Step (10) looks like this:
Starting the Hadoop trio and creating the /spark-logs directory on HDFS:
[root@node1 ~]# su - hadoop
Last login: Mon Nov 25 16:55:57 CST 2024 on pts/0
[hadoop@node1 ~]$ start-dfs.sh
Starting namenodes on [node1]
Starting datanodes
Starting secondary namenodes [node1]
[hadoop@node1 ~]$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers
[hadoop@node1 ~]$ hdfs dfs -mkdir /spark-logs
[hadoop@node1 ~]$ hdfs dfs -ls /
Found 10 items
drwxrwx---   - hadoop supergroup          0 2024-11-03 16:09 /data
drwxr-xr-x   - hadoop supergroup          0 2024-11-14 15:49 /export
drwxr-xr-x   - hadoop supergroup          0 2024-11-17 19:50 /hbase
drwxr-xr-x   - hadoop supergroup          0 2024-11-13 14:39 /hdfs_api2
drwxr-xr-x   - hadoop supergroup          0 2024-11-14 15:56 /myhive2
drwxrwxrwx   - hadoop supergroup          0 2024-11-20 09:11 /output
drwxr-xr-x   - hadoop supergroup          0 2024-11-25 17:06 /spark-logs
-rw-r--r--   3 hadoop supergroup         92 2024-11-18 23:27 /test.txt
drwx-wx-wx   - hadoop supergroup          0 2024-11-03 17:07 /tmp
drwxr-xr-x   - hadoop supergroup          0 2024-11-03 17:05 /user
After starting Spark:
On node1:
[hadoop@node1 ~]$ cd /export/server/spark-3.4.4-bin-hadoop3/sbin
[hadoop@node1 sbin]$ ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /export/server/spark-3.4.4-bin-hadoop3/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-node1.out
node3: starting org.apache.spark.deploy.worker.Worker, logging to /export/server/spark-3.4.4-bin-hadoop3/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node3.out
node2: starting org.apache.spark.deploy.worker.Worker, logging to /export/server/spark-3.4.4-bin-hadoop3/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node2.out
[hadoop@node1 sbin]$ jps
13184 DataNode
14624 NodeManager
12981 NameNode
13637 SecondaryNameNode
17029 Jps
14422 ResourceManager
15099 WebAppProxyServer
16717 Master
[hadoop@node1 sbin]$ 
On node2:
[root@node2 ~]# jps
13142 DataNode
16856 Jps
16541 Worker
14287 NodeManager
[root@node2 ~]# 
On node3:
[root@node3 ~]# jps
17138 Jps
14435 NodeManager
13309 DataNode
16686 Worker
[root@node3 ~]# 
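As an optional smoke test, you can submit the bundled SparkPi example from node1 to confirm the standalone cluster accepts jobs (a sketch: the examples jar ships with the Spark distribution, the wildcard avoids hard-coding the Scala version, and --executor-memory matches the 512m worker limit set in spark-env.sh):

spark-submit --master spark://node1:7077 --executor-memory 512m --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_*.jar 10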

(11) Check the web UI (you can also simply type node1:8080 into your browser):
 

http://node1:8080
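If the page does not load, a quick reachability check from the shell (assumes curl is installed on node1):

curl -I http://node1:8080

While an application is running, its own UI is typically also available at http://node1:4040.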

 

(12) Stop Spark and Hadoop

cd /export/server/spark-3.4.4-bin-hadoop3/sbin
./stop-all.sh
cd
stop-yarn.sh
stop-dfs.sh
jps
This looks like:
[hadoop@node1 sbin]$ cd /export/server/spark-3.4.4-bin-hadoop3/sbin
[hadoop@node1 sbin]$ ./stop-all.sh
node2: stopping org.apache.spark.deploy.worker.Worker
node3: stopping org.apache.spark.deploy.worker.Worker
stopping org.apache.spark.deploy.master.Master
[hadoop@node1 sbin]$ cd
[hadoop@node1 ~]$ stop-yarn.sh
Stopping nodemanagers
Stopping resourcemanager
Stopping proxy server [node1]
[hadoop@node1 ~]$ stop-dfs.sh
Stopping namenodes on [node1]
Stopping datanodes
Stopping secondary namenodes [node1]
[hadoop@node1 ~]$ jps
45472 Jps
[hadoop@node1 ~]$ 

3. Summary

If you run into errors, they are usually permission problems with the /export/server/spark-3.4.4-bin-hadoop3/sbin directory (which is what the chown in step (7) is for).

For everyday use, to start Hadoop and Spark just copy the following commands and run them on node1.

su - hadoop
start-dfs.sh
start-yarn.sh
cd /export/server/spark-3.4.4-bin-hadoop3/sbin
./start-all.sh
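If you prefer a single command, you can wrap these lines in a small startup script (an optional sketch; the script name and location are just examples, and it assumes the Hadoop start scripts are already on the hadoop user's PATH, as they are if you followed the earlier Hadoop article):

vi /home/hadoop/start-bigdata.sh
#!/bin/bash
# start HDFS and YARN, then the Spark standalone cluster
start-dfs.sh
start-yarn.sh
/export/server/spark-3.4.4-bin-hadoop3/sbin/start-all.sh

chmod +x /home/hadoop/start-bigdata.sh
/home/hadoop/start-bigdata.sh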

To stop Hadoop and Spark, copy the following commands and run them on node1 (also shown in step (12) above).

cd /export/server/spark-3.4.4-bin-hadoop3/sbin
./stop-all.sh
cd
stop-yarn.sh
stop-dfs.sh
jps
