Environment Setup
Hadoop + Spark
Prerequisites
- Configure passwordless SSH login
Generate a key pair: ssh-keygen -t rsa
Append the public key to the authorized keys: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Use scp to copy the public key to the other nodes: scp <file> user@hostname:<path>
e.g. scp ~/.ssh/id_rsa.pub root@slave01:~ (then append it to ~/.ssh/authorized_keys on that node)
- Configure hosts: vim /etc/hosts
- Install JDK 1.8; offline install command: rpm -ivh java-1.8.0-openjdk-devel-1.8.0.161-2.b14.el7.x86_64.rpm
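A quick way to confirm the prerequisites took effect is sketched below; the master IP matches the one used later in this guide, while the slave01/slave02 addresses are placeholders to replace with your own.
# example /etc/hosts entries (slave IPs are placeholders)
192.168.1.104 master
192.168.1.105 slave01
192.168.1.106 slave02
# passwordless SSH should no longer prompt for a password
ssh slave01 hostname
ssh slave02 hostname
# verify the JDK is installed
java -version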
Installation Steps
Hadoop
- Download the Hadoop package and upload it to the master server
- On the server, extract the package to /usr/local/
Command: tar -zxf /root/Download/hadoop.tar.gz -C /usr/local
- Rename the directory
Command: mv ./hadoop2.7/ ./hadoop
- Edit ~/.bashrc and add:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Run source ~/.bashrc to make the configuration take effect
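As a quick sanity check that the variables above are in effect:
hadoop version   # should print the Hadoop version without errors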
- Go into /usr/local/hadoop/etc/hadoop and edit the configuration files below
Note: master in the configuration below is the hostname of the master node
slaves: list the hostnames of the worker nodes, one per line
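For example, with the two worker nodes used later in this guide (slave01 and slave02), the slaves file would contain:
slave01
slave02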
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
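Note: in a fresh Hadoop 2.7 installation this file usually does not exist yet; it is typically created from the bundled template before adding the property above:
cp mapred-site.xml.template mapred-site.xml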
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
</configuration>
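Depending on the environment, hadoop-env.sh in the same directory may also need JAVA_HOME set explicitly; the path below is only an example for the OpenJDK RPM installed earlier and should be checked on your system:
# find the real JDK path
readlink -f $(which java)
# then, in /usr/local/hadoop/etc/hadoop/hadoop-env.sh (example path, adjust to yours):
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-2.b14.el7.x86_64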
- Send the configured Hadoop directory to the worker nodes
cd /usr/local/
rm -rf ./hadoop/tmp # remove temporary files
rm -rf ./hadoop/logs/* # remove log files
tar -zcf ~/hadoop.master.tar.gz ./hadoop
cd ~
scp ./hadoop.master.tar.gz slave01:/home/hadoop
scp ./hadoop.master.tar.gz slave02:/home/hadoop
- On each worker node, run:
sudo rm -rf /usr/local/hadoop/
sudo tar -zxf ~/hadoop.master.tar.gz -C /usr/local
- Start the Hadoop cluster (on the master node)
cd /usr/local/hadoop
bin/hdfs namenode -format   # format HDFS on the first start only
sbin/start-all.sh
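If the cluster started correctly, jps should show the expected daemons (exact lists can vary with configuration):
# on the master node
jps    # expect NameNode, SecondaryNameNode, ResourceManager
# on each worker node
jps    # expect DataNode, NodeManager
# live datanodes as seen by HDFS
hdfs dfsadmin -report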
Spark
- Download the Spark package
- Extract Spark to /usr/local/
- Rename the extracted directory to spark
- Configure environment variables in ~/.bashrc:
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH
export PYSPARK_PYTHON=python3
source ~/.bashrc
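Assuming the downloaded package is a Spark build pre-built with Hadoop support, a quick check that the variables above resolve:
spark-submit --version   # prints the Spark version banner
pyspark                  # should open a Python 3 Spark shell; exit() to quit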
- Modify the configuration under /usr/local/spark/conf
slaves
cd /usr/local/spark/
cp ./conf/slaves.template ./conf/slaves
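After copying the template, replace its default localhost entry with the worker hostnames; with the nodes used in this guide ./conf/slaves would contain:
slave01
slave02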
spark-env.sh (create it from the template first: cp ./conf/spark-env.sh.template ./conf/spark-env.sh), then append:
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SPARK_MASTER_IP=192.168.1.104   # replace with your master node's IP address
- Copy the Spark directory to the worker nodes
cd /usr/local/
tar -zcf ~/spark.master.tar.gz ./spark
cd ~
scp ./spark.master.tar.gz slave01:/home/hadoop
scp ./spark.master.tar.gz slave02:/home/hadoop
- On each worker node, run:
sudo rm -rf /usr/local/spark/
sudo tar -zxf ~/spark.master.tar.gz -C /usr/local
- Start the Spark cluster
Start Hadoop first, then the Spark master, then the Spark workers.
On the master node:
cd /usr/local/spark/
sbin/start-master.sh
On each worker node:
cd /usr/local/spark/
sbin/start-slave.sh spark://master:7077
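To confirm the standalone cluster is up (default ports assumed), check the daemons and the master web UI, and optionally run the bundled SparkPi example; the examples jar name depends on your Spark version:
jps   # master should list Master, workers should list Worker
# master web UI: http://master:8080 -- registered workers appear here
spark-submit --master spark://master:7077 \
  --class org.apache.spark.examples.SparkPi \
  /usr/local/spark/examples/jars/spark-examples_*.jar 10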