Articles in this series
Setting up a fully distributed Hadoop 3.2.0 cluster on CentOS 7
Setting up a fully distributed Spark 2.4.3 cluster on CentOS 7
Setting up a fully distributed HBase 2.1.5 cluster on CentOS 7
Setting up a fully distributed Storm 2.0.0 cluster on CentOS 7
Note: the Spark cluster is built on top of an existing Hadoop cluster, so set up the Hadoop cluster first. For details, see
Setting up a fully distributed Hadoop 3.2.0 cluster on CentOS 7
I. Cluster plan
| host | master | worker |
| --- | --- | --- |
| centos48 (10.0.0.48) | Y | Y |
| centos49 (10.0.0.49) | | Y |
| centos50 (10.0.0.50) | | Y |
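The hostnames above must resolve on every node. This has presumably already been done during the Hadoop setup; each machine's /etc/hosts would contain entries like:
10.0.0.48 centos48
10.0.0.49 centos49
10.0.0.50 centos50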
II. Download and deploy Spark, then configure it
1. Download Spark and configure spark-env.sh
cd /usr/local
wget http://mirror.bit.edu.cn/apache/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz
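# note: mirror links go stale over time; the same tarball is kept in the
# Apache release archive at
#   https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz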
tar -zxvf spark-2.4.3-bin-hadoop2.7.tgz
cd /usr/local/spark-2.4.3-bin-hadoop2.7/conf
cp spark-env.sh.template spark-env.sh
vi spark-env.sh
# append the following at the end of the file
export JAVA_HOME=/usr/java/jdk1.8.0_131                    # JDK install path
export HADOOP_CONF_DIR=/usr/local/hadoop-3.2.0/etc/hadoop  # Hadoop config dir; required for running on YARN
export SPARK_MASTER_HOST=centos48                          # host that runs the Spark master
export SPARK_MASTER_PORT=7077                              # master RPC port (7077 is the default)
export SPARK_WORKER_CORES=1                                # CPU cores each worker may use
export SPARK_WORKER_MEMORY=1g                              # memory each worker may use
export SPARK_MASTER_WEBUI_PORT=8088                        # master web UI port (Spark's default is 8080)
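A quick sanity check (using the paths assumed above) that the directories referenced in spark-env.sh actually exist:
ls /usr/java/jdk1.8.0_131/bin/java
ls /usr/local/hadoop-3.2.0/etc/hadoop/core-site.xml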
2. Configure slaves
cp slaves.template slaves
vi slaves
# append the following at the end of the file (one worker hostname per line)
centos48
centos49
centos50
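The start scripts reach each host listed in slaves over SSH, so the passwordless SSH already set up for the Hadoop cluster should be in place; a quick check from centos48:
ssh centos49 hostname
ssh centos50 hostname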
3. Rename start-all.sh and stop-all.sh
Hadoop's sbin directory also ships start-all.sh and stop-all.sh. To avoid a name clash, rename the copies that come with Spark:
cd /usr/local/spark-2.4.3-bin-hadoop2.7/sbin
mv start-all.sh start-spark-all.sh
mv stop-all.sh stop-spark-all.sh
4. Copy Spark to the other two machines
cd /usr/local
scp -r ./spark-2.4.3-bin-hadoop2.7 root@centos49:/usr/local
scp -r ./spark-2.4.3-bin-hadoop2.7 root@centos50:/usr/local
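Optionally confirm the copies landed on the other two machines:
ssh centos49 ls /usr/local/spark-2.4.3-bin-hadoop2.7/sbin/start-spark-all.sh
ssh centos50 ls /usr/local/spark-2.4.3-bin-hadoop2.7/sbin/start-spark-all.sh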
III. Start Spark and verify
1. Update the environment variables (on all three machines)
vi /etc/profile
# append the following at the end of the file
export SPARK_HOME=/usr/local/spark-2.4.3-bin-hadoop2.7
export PATH=$SPARK_HOME/sbin:$PATH
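Reload the profile so the new variables take effect in the current shell:
source /etc/profile
echo $SPARK_HOME    # should print /usr/local/spark-2.4.3-bin-hadoop2.7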
2. Start Spark
Run the renamed start script on the master node (centos48):
start-spark-all.sh
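If everything is wired up, the script prints one line per daemon, roughly like the following (log file names will vary):
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark-2.4.3-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master-1-centos48.out
centos48: starting org.apache.spark.deploy.worker.Worker, logging to ...
centos49: starting org.apache.spark.deploy.worker.Worker, logging to ...
centos50: starting org.apache.spark.deploy.worker.Worker, logging to ...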
3. Check the processes
Verify the daemons on centos48, centos49, and centos50 with jps, as sketched below.
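A minimal check (PIDs are illustrative; the Hadoop daemons from the earlier article will also appear if they are running):
# on centos48 (master + worker)
jps
# 2321 Master
# 2475 Worker

# on centos49 and centos50 (worker only)
jps
# 1980 Worker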
4. Check the web UI
Open http://10.0.0.48:8088 in a browser (the SPARK_MASTER_WEBUI_PORT set in spark-env.sh); the master page should show all three workers as ALIVE.
5. Run the example program and view it on Hadoop YARN
1. Run the example program
cd /usr/local/spark-2.4.3-bin-hadoop2.7/bin
./spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode client \
  --executor-memory 1G \
  --num-executors 10 \
  /usr/local/spark-2.4.3-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.3.jar \
  100
The tail of the console output:
2019-08-21 16:24:41,455 INFO scheduler.TaskSetManager: Finished task 96.0 in stage 0.0 (TID 96) in 113 ms on centos48 (executor 1) (99/100)
2019-08-21 16:24:41,594 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on centos49:17162 (size: 1256.0 B, free: 366.3 MB)
2019-08-21 16:24:41,842 INFO scheduler.TaskSetManager: Finished task 13.0 in stage 0.0 (TID 13) in 2204 ms on centos49 (executor 10) (100/100)
2019-08-21 16:24:41,846 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 4.676 s
2019-08-21 16:24:41,844 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
2019-08-21 16:24:41,860 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 5.008590 s
Pi is roughly 3.1414335141433516
2019-08-21 16:24:41,895 INFO server.AbstractConnector: Stopped Spark@22ee2d0{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2019-08-21 16:24:41,917 INFO ui.SparkUI: Stopped Spark web UI at http://centos48:4040
2019-08-21 16:24:42,113 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
2019-08-21 16:24:42,148 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
2019-08-21 16:24:42,149 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
2019-08-21 16:24:42,178 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
2019-08-21 16:24:42,178 INFO cluster.YarnClientSchedulerBackend: Stopped
2019-08-21 16:24:42,189 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
2019-08-21 16:24:42,299 INFO memory.MemoryStore: MemoryStore cleared
2019-08-21 16:24:42,300 INFO storage.BlockManager: BlockManager stopped
2019-08-21 16:24:42,321 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
2019-08-21 16:24:42,324 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
2019-08-21 16:24:42,360 INFO spark.SparkContext: Successfully stopped SparkContext
2019-08-21 16:24:42,511 INFO util.ShutdownHookManager: Shutdown hook called
2019-08-21 16:24:42,511 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-0a93cd55-de2a-466b-a398-ec8da81f6ebc
2019-08-21 16:24:42,517 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-cfe65620-2d79-4bdd-9e6b-fb34e069ad45
2. The job shows up in the Hadoop YARN web UI.
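The finished run can also be listed from the command line (using the Hadoop install path from the earlier article):
/usr/local/hadoop-3.2.0/bin/yarn application -list -appStates FINISHED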