Spark Standalone Mode Configuration

This article describes how to set up a Spark 2.0.2 standalone-mode cluster from scratch on multiple Linux Debian machines: installing prerequisites such as Java and Scala, downloading a prebuilt Spark release and configuring environment variables, setting up passwordless SSH access, and starting the Spark cluster.

For Spark, a currently popular distributed computing framework, this post introduces the steps to configure standalone mode across several machines.

It is easy to configure from scratch. The instructions I note down below use spark-2.0.2-bin-hadoop2.7 as the example, on Linux Debian machines, targeting Scala programming.

Assume you have two machines with IPs 192.168.0.51 and 192.168.0.52.

1. Preinstall Java, Scala, and sbt

check: https://www.scala-lang.org/download/install.html
       http://www.scala-sbt.org/0.13/docs/Installing-sbt-on-Linux.html
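
A quick sanity check that the prerequisites are installed (assuming they are already on your PATH; note that the prebuilt Spark 2.0.x packages are built against Scala 2.11 and require Java 7 or later):

$ java -version
$ scala -version
$ sbt sbtVersion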

 

2. Download a prebuilt Spark release with Hadoop, or compile it on your own

The download page: https://spark.apache.org/downloads.html
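
For example, the 2.0.2 prebuilt package can be fetched from the Apache archive (URL shown as an example; any mirror listed on the download page works too):

$ wget https://archive.apache.org/dist/spark/spark-2.0.2/spark-2.0.2-bin-hadoop2.7.tgz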

 

3. Unpack the archive and create a symlink for easier access later

e.g.   execute: ln -s /usr/local/spark-2.0.2-bin-hadoop2.7 /usr/local/spark
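
Since later steps refer to $SPARK_HOME, it is convenient to export it as well. A sketch, assuming the tarball sits in the current directory and /usr/local is the install target:

$ tar -xzf spark-2.0.2-bin-hadoop2.7.tgz -C /usr/local
$ echo 'export SPARK_HOME=/usr/local/spark' >> ~/.bashrc
$ source ~/.bashrc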
 
4. Configure the Spark environment:
 (1) Configure the slaves file: /usr/local/spark-2.0.2-bin-hadoop2.7/conf/slaves
# A Spark Worker will be started on each of the machines listed below.
192.168.0.51
192.168.0.52
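
If conf/slaves does not exist yet, create it from the template shipped with Spark:

$ cp $SPARK_HOME/conf/slaves.template $SPARK_HOME/conf/slaves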
 
(2) Configure spark-env.sh, e.g.:
#spark-env.sh
export SCALA_HOME=/usr/local/scala
export JAVA_HOME=/home/local/jdk
#export SPARK_LOCAL_IP=localhost
export SPARK_EXECUTOR_MEMORY=6g
export SPARK_EXECUTOR_CORES=6
export SPARK_MASTER_IP=192.168.0.51
export SPARK_MASTER_PORT=8070
export SPARK_MASTER_WEBUI_PORT=8080
#export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_PORT=8092
#export SPARK_WORKER_MEMORY=4g
#export SPARK_WORKER_CORES=4
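
spark-env.sh likewise starts from a template, and both the Spark installation and the conf directory must be present on every machine in the cluster. A sketch, assuming the same install path on both hosts:

$ cp $SPARK_HOME/conf/spark-env.sh.template $SPARK_HOME/conf/spark-env.sh
$ scp -r $SPARK_HOME/conf/ 192.168.0.52:$SPARK_HOME/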
 
5. Set up passwordless SSH access
  (1) Generate an SSH key with an empty passphrase
$ ssh-keygen -t rsa -P ""

(2) Append id_rsa.pub to authorized_keys

$  cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

(3) Test ssh localhost if you only want to run Spark standalone on a single local machine

 $ ssh localhost
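
For the two-machine cluster, the master also needs passwordless access to each worker. One way, assuming OpenSSH's ssh-copy-id is available:

$ ssh-copy-id 192.168.0.52
$ ssh 192.168.0.52     # should log in without a password prompt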
 
6. Start Spark
$SPARK_HOME/sbin/start-all.sh
 execute jps to check that the Master and Worker processes are up
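
On the master node, the jps output should look roughly like this (PIDs will differ; with the slaves file above, 192.168.0.51 runs both a Master and a Worker):

$ jps
21792 Master
21963 Worker
22120 Jps

The master web UI should then be reachable at http://192.168.0.51:8080 (the SPARK_MASTER_WEBUI_PORT set in step 4).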
 
7. Write and run your application
 
execute:  sbt package 
 
execute:  $SPARK_HOME/bin/spark-submit \
      --class "main.scala.MainAppTest" \
      --master local[4] \
      xxxxxxxx.jar 
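
The post does not show the application itself, so here is a minimal Scala sketch matching the --class above (the job body and input path are illustrative assumptions, not from the original post):

// src/main/scala/MainAppTest.scala
package main.scala

import org.apache.spark.{SparkConf, SparkContext}

object MainAppTest {
  def main(args: Array[String]): Unit = {
    // The master is supplied via spark-submit's --master flag,
    // so it is not hard-coded here.
    val conf = new SparkConf().setAppName("MainAppTest")
    val sc = new SparkContext(conf)

    // Trivial job: count the lines of a text file and print a sample.
    val lines = sc.textFile("/usr/local/spark/README.md") // example input
    println(s"Line count: ${lines.count()}")
    lines.take(5).foreach(println)

    sc.stop()
  }
}

and a matching build.sbt for sbt package (versions follow the Spark 2.0.2 / Scala 2.11 setup used in this post):

name := "main-app-test"
version := "0.1"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.2" % "provided"

To run on the standalone cluster instead of locally, point --master at the master URL from step 4, e.g. --master spark://192.168.0.51:8070.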
 
 
 
 
 

Reposted from: https://www.cnblogs.com/anxin6699/p/7150048.html
