Installing Spark in standalone mode


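This article walks through installing and configuring a Spark standalone cluster: installing the JDK and Scala, building Spark from source, setting environment variables, starting the master and slave nodes, and monitoring the cluster through the web UI.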


References:

http://spark.incubator.apache.org/docs/latest/

http://spark.incubator.apache.org/docs/latest/spark-standalone.html

http://www.yanjiuyanjiu.com/blog/20130617/

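1. Install the JDK

2. Install Scala 2.9.3

Spark 0.7.3, the version built in step 3, depends on Scala 2.9.3, so exactly that Scala version must be installed.

Download scala-2.9.3.tgz and save it to your home directory (it is already on host sg206).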
$ tar -zxf scala-2.9.3.tgz
$ sudo mv scala-2.9.3 /usr/lib
$ sudo vim /etc/profile
# add the following lines at the end
export SCALA_HOME=/usr/lib/scala-2.9.3
export PATH=$PATH:$SCALA_HOME/bin
# save and exit vim
# make the new profile settings take effect immediately
$ source /etc/profile
# test
$ scala -version
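
Because the Scala version has to match exactly, the final test can be turned into an explicit check. The following is a minimal sketch, not part of the original procedure: the helper names scala_version_from_banner and require_scala_version are hypothetical, and it assumes the banner has the form "Scala code runner version X.Y.Z -- ...".

```shell
# Extract the version number from a banner line such as
# "Scala code runner version 2.9.3 -- Copyright ...".
scala_version_from_banner() {
    printf '%s\n' "$1" | sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p'
}

# Hypothetical helper: succeed only if the banner reports the required version.
# Arguments: required version, banner text.
require_scala_version() {
    required="$1"
    found="$(scala_version_from_banner "$2")"
    [ "$found" = "$required" ]
}

# Usage (some Scala releases print the banner on stderr, hence 2>&1):
# require_scala_version 2.9.3 "$(scala -version 2>&1)" || echo "wrong Scala version" >&2
```

Passing the banner in as an argument keeps the check usable even on a machine where Scala is not yet on the PATH.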

3. Build Spark

$ cd /home
$ tar -zxf spark-0.7.3-sources.gz
$ cd spark-0.7.3
# the build needs git; if it is missing, install it first: yum install git
$ sbt/sbt package
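
Since the build stops when git is not installed, it can help to check the prerequisites before starting it. This is a small sketch under assumptions: require_tools is a hypothetical helper, and the tool list (java, git) is taken from the steps above rather than from any official requirements list.

```shell
# Report every command from the argument list that is not on PATH.
# Returns 0 only when all of them are available.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage, run from the Spark source directory:
# require_tools java git && sbt/sbt package
```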

4. Configuration files

conf/spark-env.sh

############

export SCALA_HOME=/usr/lib/scala-2.9.3
export SPARK_MASTER_IP=172.16.48.202
export SPARK_WORKER_MEMORY=10G

#############
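
When the same settings have to be placed on several machines, the file can be generated instead of edited by hand. The sketch below is one possible way to do that: write_spark_env is a hypothetical helper, and the values in the usage line are the ones from this article, which will differ on other clusters.

```shell
# Write a minimal spark-env.sh into the given conf directory.
# Arguments: conf directory, Scala home, master IP, worker memory.
write_spark_env() {
    conf_dir="$1"
    mkdir -p "$conf_dir"
    cat > "$conf_dir/spark-env.sh" <<EOF
#!/usr/bin/env bash
export SCALA_HOME=$2
export SPARK_MASTER_IP=$3
export SPARK_WORKER_MEMORY=$4
EOF
}

# Usage, run from the Spark source directory:
# write_spark_env conf /usr/lib/scala-2.9.3 172.16.48.202 10G
```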

conf/slaves

Add the IP address of each slave node to the conf/slaves file, one per line.
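
As an illustration of the file format, the sketch below writes one address per line. The helper name write_slaves is hypothetical, and the two addresses in the usage line are placeholders, not nodes from the original setup.

```shell
# Write the given slave addresses to a file, one per line,
# replacing any previous content.
# Arguments: path of the slaves file, then one or more addresses.
write_slaves() {
    slaves_file="$1"
    shift
    : > "$slaves_file"
    for host in "$@"; do
        printf '%s\n' "$host" >> "$slaves_file"
    done
}

# Usage, run from the Spark source directory (placeholder addresses):
# write_slaves conf/slaves 172.16.48.203 172.16.48.204
```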

5. Starting and stopping

bin/start-master.sh - Starts a master instance on the machine the script is executed on.
bin/start-slaves.sh - Starts a slave instance on each machine specified in the conf/slaves file.
bin/start-all.sh - Starts both a master and a number of slaves as described above.
bin/stop-master.sh - Stops the master that was started via the bin/start-master.sh script.
bin/stop-slaves.sh - Stops the slave instances that were started via bin/start-slaves.sh.
bin/stop-all.sh - Stops both the master and the slaves as described above.
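
The six scripts follow one naming pattern, action-target.sh, so a thin wrapper can pick the right one. This is only a sketch: spark_cluster is a hypothetical helper, and it assumes the scripts sit in the bin directory of the Spark tree exactly as listed above.

```shell
# Run one of the launch scripts from a Spark source tree.
# Arguments: Spark directory, action (start|stop), target (master|slaves|all).
spark_cluster() {
    spark_dir="$1"
    action="$2"
    target="$3"
    case "$action-$target" in
        start-master|start-slaves|start-all|stop-master|stop-slaves|stop-all) ;;
        *)
            echo "usage: spark_cluster DIR start|stop master|slaves|all" >&2
            return 2
            ;;
    esac
    script="$spark_dir/bin/$action-$target.sh"
    if [ ! -x "$script" ]; then
        echo "not found or not executable: $script" >&2
        return 1
    fi
    "$script"
}

# Usage:
# spark_cluster /home/spark-0.7.3 start all
# spark_cluster /home/spark-0.7.3 stop all
```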

6. Open the master's web UI (http://localhost:8080 by default). You should now see all the worker nodes listed, along with their CPU core counts, memory, and other information.
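
On a machine without a browser, the same check can be made from the command line. The sketch below assumes curl is installed; master_ui_url and master_ui_up are hypothetical helper names, and the check only confirms that the page answers, not that every worker has registered.

```shell
# Build the master web UI URL; the port defaults to 8080.
# Arguments: host, optional port.
master_ui_url() {
    printf 'http://%s:%s\n' "$1" "${2:-8080}"
}

# Succeed if the UI answers an HTTP request within five seconds.
master_ui_up() {
    curl -fsS -o /dev/null --max-time 5 "$(master_ui_url "$@")"
}

# Usage:
# master_ui_up localhost && echo "master UI is reachable"
```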