Spark 3.1.1 Fully Distributed Cluster Setup

This article walks through installing Spark on Linux: uploading and extracting the package, setting the environment variables, editing the configuration files, and starting spark-shell, then configuring a fully distributed cluster. It also covers the extra steps needed for Spark to connect to Hive and for distributing the installation to the other nodes.


I. Installing Spark

1. Use Xshell to upload spark-3.1.1-bin-hadoop3.2.tgz to the /opt/software directory.


2. In /opt/software, extract spark-3.1.1-bin-hadoop3.2.tgz into /opt/module with tar:

tar -zxvf spark-3.1.1-bin-hadoop3.2.tgz -C /opt/module

3. In /opt/module, rename the extracted directory:

mv spark-3.1.1-bin-hadoop3.2/ spark

4. Open the environment variable file:

vim /etc/profile

Append the following at the bottom:

#spark
export SPARK_HOME=/opt/module/spark
export PATH=$PATH:$SPARK_HOME/sbin:$SPARK_HOME/bin

5. Reload the profile so the changes take effect:

source /etc/profile
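
A quick way to confirm the variables took effect (an optional check, not one of the original steps):

echo $SPARK_HOME          # should print /opt/module/spark
spark-submit --version    # should print the Spark 3.1.1 version banner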

6. Copy spark-env.sh.template and save it as spark-env.sh:

cp /opt/module/spark/conf/spark-env.sh.template /opt/module/spark/conf/spark-env.sh

7. Edit the spark-env.sh configuration file:

vim /opt/module/spark/conf/spark-env.sh

Add the following configuration:

export JAVA_HOME=/opt/module/jdk                        # JDK installation directory
export SPARK_MASTER_IP=bigdata01                        # hostname of the master node
export SPARK_LOCAL_DIRS=/opt/module/spark               # local scratch directory for shuffle/spill data
export HADOOP_CONF_DIR=/opt/module/hadoop/etc/hadoop    # lets Spark find the Hadoop/YARN configuration
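
Optionally, spark-env.sh also accepts settings such as the standalone master port and web UI port. These are not part of the original setup; the values shown are simply the Spark defaults:

export SPARK_MASTER_PORT=7077          # standalone master RPC port (default)
export SPARK_MASTER_WEBUI_PORT=8080    # master web UI port (default)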

8. Start spark-shell (run this from /opt/module/spark):

./bin/spark-shell --master local[2]

If the shell starts and leaves you at the scala> prompt, Spark is installed successfully.
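
As a further local smoke test (optional, not in the original article), you can run the bundled SparkPi example; it uses the same example jar that is later submitted to YARN in Part II:

cd /opt/module/spark
./bin/run-example SparkPi 10     # should finish with a line like "Pi is roughly 3.14..."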

II. Configuring the Fully Distributed Cluster

1. Copy workers.template and save it as workers:

cp /opt/module/spark/conf/workers.template /opt/module/spark/conf/workers

2. Edit the workers configuration file:

vim /opt/module/spark/conf/workers

Replace the default localhost entry with the hostnames of the worker nodes, one per line:

bigdata01
bigdata02
bigdata03
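
The workers file is only read by Spark's standalone launch scripts; this article submits jobs to YARN in step 5 below. As an aside (not a step from the original article), bringing up the standalone cluster would look like this, assuming passwordless SSH to the worker hosts:

/opt/module/spark/sbin/start-all.sh    # starts a Master on this node and a Worker on each host in conf/workers
/opt/module/spark/sbin/stop-all.sh     # stops them again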

3. Use Xshell to upload mysql-connector-java-5.1.27-bin.jar (the MySQL JDBC driver, needed when Spark talks to a Hive metastore backed by MySQL) to /opt/module/spark/jars.


4. Distribute the files to the other nodes:

scp -r /opt/module/spark bigdata02:/opt/module
scp -r /opt/module/spark bigdata03:/opt/module
scp /etc/profile bigdata02:/etc/
scp /etc/profile bigdata03:/etc/

Afterwards, run source /etc/profile on bigdata02 and bigdata03 so the new PATH takes effect there as well.
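
If you prefer a loop, the following is equivalent to the scp commands above (just a convenience sketch, assuming the same hostnames and passwordless SSH):

for host in bigdata02 bigdata03; do
  scp -r /opt/module/spark "${host}:/opt/module"
  scp /etc/profile "${host}:/etc/"
done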

5. Verify the cluster by submitting the bundled SparkPi example to YARN:

spark-submit --master yarn \
  --class org.apache.spark.examples.SparkPi \
  /opt/module/spark/examples/jars/spark-examples_2.12-3.1.1.jar

If the job completes and the console prints a line like "Pi is roughly 3.14...", the cluster is working.
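
You can also check the run from the YARN side, either in the ResourceManager web UI or on the command line (an optional check; this assumes the ResourceManager runs on bigdata01 with its default web port 8088):

yarn application -list -appStates FINISHED    # the SparkPi application should appear in the list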

Note: for Spark to connect to Hive, you also need to copy the following configuration files into Spark's conf directory:

cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf
cp $HADOOP_HOME/etc/hadoop/core-site.xml $SPARK_HOME/conf 
cp $HADOOP_HOME/etc/hadoop/hdfs-site.xml $SPARK_HOME/conf
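
A simple way to verify the Hive connection afterwards (optional; assumes the Hive metastore is reachable and the MySQL driver from step II.3 is in place):

spark-sql -e "show databases;"    # should list the databases known to the Hive metastore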

 

 
