Setting Up a Hadoop Cluster on Virtual Machines

Create Three Virtual Machines

  1. Use VMware to create three CentOS virtual machines: centos1, centos2, and centos3.
  2. Give each VM a static IP by editing /etc/sysconfig/network-scripts/ifcfg-ens32 (shown below for centos2; adjust IPADDR per host):
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=static
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens32
UUID=98710bbd-6435-43b8-97ae-73532d6fe3b8
DEVICE=ens32
ONBOOT=yes
IPADDR=192.168.8.4
PREFIX=24
GATEWAY=192.168.8.2
DNS1=192.168.8.2
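
After saving the file, restart networking so the static address takes effect (this assumes CentOS 7's network service, which matches the ifcfg layout above):

systemctl restart network
ip addr show ens32    # verify the static IP is assigned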
  3. Set each machine's hostname in /etc/hostname.
  4. Update /etc/hosts on every machine so all nodes can resolve each other:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.8.1 rufeng
192.168.8.3 centos1
192.168.8.4 centos2
192.168.8.5 centos3
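
Once all three VMs are up, a quick loop confirms that name resolution and connectivity work (plain ping, run from any node):

for h in centos1 centos2 centos3
do
	ping -c 1 $h
done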

Cluster Utility Scripts

cluster_all.sh

#!/bin/bash

# /opt/cluster_bin/cluster_all.sh
# Run the same command on every cluster node over SSH.

if [ -z "$1" ]
then
	echo "usage: cluster_all.sh <command> [args...]"
	exit 1
fi

# All arguments together form the remote command line.
command="$*"

# /opt/cluster_nodes lists the nodes; the hostname is the first field.
hosts=($(cut -d " " -f 1 /opt/cluster_nodes))

for host in "${hosts[@]}"
do
	echo "========== $host ============"
	ssh "$host" "$command"
done

The /opt/cluster_nodes file stores the hostnames of all cluster nodes.
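
For example, /opt/cluster_nodes might look like this (one node per line; only the first space-separated field is read, so anything after the hostname is ignored):

centos1
centos2
centos3

With that in place, cluster_all.sh jps runs jps on every node in turn.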

File sync script: xsync.sh

#!/bin/bash

# /opt/cluster_bin/xsync.sh
# Sync the given files/directories to every cluster node with rsync.

if [ $# -lt 1 ]
then
	echo "usage: xsync.sh <file> [file...]"
	exit 1
fi

hosts=($(cut -d " " -f 1 /opt/cluster_nodes))

for host in "${hosts[@]}"
do
	echo "=========== $host ============"
	for file in "$@"
	do
		if [ ! -e "$file" ]
		then
			echo "file does not exist: $file"
			continue
		fi

		# Resolve the absolute parent directory (-P follows symlinks)
		# and base name, so source and target paths match on every host.
		if [ -d "$file" ]
		then
			cur_dir=$(cd -P "$file"; pwd)
			pdir=$(cd -P "$(dirname "$cur_dir")"; pwd)
			fname=$(basename "$cur_dir")
		else
			pdir=$(cd -P "$(dirname "$file")"; pwd)
			fname=$(basename "$file")
		fi

		# Make sure the parent directory exists on the remote host.
		if [ "$pdir" != "/" ]
		then
			ssh "$host" mkdir -p "$pdir"
		fi

		rsync --delete -av "$pdir/$fname" "$host:$pdir"
	done
done
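
Typical usage is a single call, e.g. to replicate the script directory itself to every node:

xsync.sh /opt/cluster_bin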

Configure Passwordless SSH Between the VMs

  1. Configure passwordless SSH across the cluster for the root user.
  2. Use cluster_all.sh and xsync.sh to create a bigdata user on every node, then configure passwordless SSH for bigdata as well.

Reference: cluster SSH trust configuration.
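
A minimal sketch with stock OpenSSH tools (the RSA key type and empty passphrase are assumptions; repeat for each user that needs trust, i.e. root and bigdata):

# Generate a key pair (empty passphrase) if one does not exist yet
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Push the public key to every node, including the local one
for host in centos1 centos2 centos3
do
	ssh-copy-id $host
done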

Install the JDK and Hadoop, and Configure Environment Variables

# /etc/profile.d/bigdata.sh

export JAVA_HOME=/opt/jdk8
export SPARK_HOME=/opt/spark3.0
export HADOOP_HOME=/opt/hadoop3.2
export PATH=$PATH:/opt/cluster_bin:$JAVA_HOME/bin
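
Assuming the JDK, Hadoop (and optionally Spark) tarballs have already been unpacked to the /opt paths above on every node, the two helper scripts make distribution and a sanity check one-liners:

xsync.sh /etc/profile.d/bigdata.sh
cluster_all.sh "ls /opt/jdk8/bin/java /opt/hadoop3.2/bin/hadoop"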

Cluster Plan

       centos1               centos2                         centos3
HDFS   NameNode, DataNode    DataNode                        SecondaryNameNode, DataNode
YARN   NodeManager           ResourceManager, NodeManager    NodeManager

Configuration Files

All of the files below live in $HADOOP_HOME/etc/hadoop.

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- NameNode address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://centos1:8020</value>
    </property>
    <!-- Hadoop data storage directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop3.2/data</value>
    </property>
    <!-- Static user for the HDFS web UI: bigdata -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>bigdata</value>
    </property>
</configuration>

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- NameNode web UI address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>centos1:9870</value>
    </property>
    <!-- SecondaryNameNode web UI address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>centos3:9868</value>
    </property>
</configuration>

yarn-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Use the shuffle auxiliary service for MapReduce -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- ResourceManager address -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>centos2</value>
    </property>
    <!-- Environment variables inherited by containers -->
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
    <property>
        <name>yarn.application.classpath</name>
        <value>/opt/hadoop3.2/etc/hadoop:/opt/hadoop3.2/share/hadoop/common/lib/*:/opt/hadoop3.2/share/hadoop/common/*:/opt/hadoop3.2/share/hadoop/hdfs:/opt/hadoop3.2/share/hadoop/hdfs/lib/*:/opt/hadoop3.2/share/hadoop/hdfs/*:/opt/hadoop3.2/share/hadoop/mapreduce/lib/*:/opt/hadoop3.2/share/hadoop/mapreduce/*:/opt/hadoop3.2/share/hadoop/yarn:/opt/hadoop3.2/share/hadoop/yarn/lib/*:/opt/hadoop3.2/share/hadoop/yarn/*</value>
    </property>
    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Log server URL -->
    <property>
        <name>yarn.log.server.url</name>
        <value>http://centos1:19888/jobhistory/logs</value>
    </property>
    <!-- Retain aggregated logs for 7 days -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>

The value of yarn.application.classpath above is simply the output of the hadoop classpath command.
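
To regenerate the value on your own installation, run:

$HADOOP_HOME/bin/hadoop classpath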

mapred-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Run MapReduce programs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- Job history server IPC address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>centos1:10020</value>
    </property>
    <!-- Job history server web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>centos1:19888</value>
    </property>
</configuration>

workers

centos1
centos2
centos3

Start and Test the Cluster

  1. Distribute the configuration files to all nodes (see the combined sketch after this list).
  2. Format the NameNode on centos1 (only needed before the first startup): $HADOOP_HOME/bin/hdfs namenode -format
  3. Start HDFS on centos1: $HADOOP_HOME/sbin/start-dfs.sh
  4. Start YARN on centos2: $HADOOP_HOME/sbin/start-yarn.sh
  5. Start the job history server on centos1: $HADOOP_HOME/bin/mapred --daemon start historyserver
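
Put together, the first boot looks like this (run as bigdata, using the scripts from earlier in this post):

# On centos1: distribute configs, format, start HDFS and the history server
xsync.sh $HADOOP_HOME/etc/hadoop
$HADOOP_HOME/bin/hdfs namenode -format
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/bin/mapred --daemon start historyserver

# Start YARN on centos2 (here via ssh from centos1; both hosts share the same paths)
ssh centos2 "$HADOOP_HOME/sbin/start-yarn.sh"

# Every node should now show the daemons from the cluster plan table
cluster_all.sh jps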

Note: formatting the NameNode generates a new cluster ID. If it no longer matches the DataNodes' cluster ID, the cluster cannot find its existing data. If the cluster breaks at runtime and you must reformat the NameNode, first stop the namenode and datanode processes, delete the data and logs directories on every machine, and only then reformat.

Test the cluster: $HADOOP_HOME/bin/hadoop jar <jar file> <main class> <input path> <output path>
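
For a quick end-to-end check, the bundled wordcount example works well (the examples jar name varies with the exact Hadoop release, hence the wildcard; /input and /output are arbitrary HDFS paths, and /output must not exist yet):

$HADOOP_HOME/bin/hadoop fs -mkdir /input
$HADOOP_HOME/bin/hadoop fs -put $HADOOP_HOME/README.txt /input
$HADOOP_HOME/bin/hadoop jar \
    $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount /input /output
$HADOOP_HOME/bin/hadoop fs -cat /output/part-r-00000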

Cluster Start/Stop Script

#!/bin/bash

# /opt/cluster_bin/myhadoop.sh

if [ ! $# -eq 1 ]
then
	echo "arg error!"
	exit 1
fi

if [ ! -e "$HADOOP_HOME" ]
then
	echo "HADOOP_HOME not set!"
	exit 1
fi

namenode="centos1"
resourcenode="centos2"

case $1 in
"start")
	echo "start hadoop cluster"
	
	echo "start hdfs on $namenode"
	ssh $namenode "${HADOOP_HOME}/sbin/start-dfs.sh"
	echo "start yarn on $resourcenode"
	ssh $resourcenode "${HADOOP_HOME}/sbin/start-yarn.sh"
	echo "start historyserver"
	ssh $namenode "${HADOOP_HOME}/bin/mapred --daemon start historyserver"
	;;
"stop")

	echo "stop hadoop cluster"
	
	echo "stop hdfs on $namenode"
	ssh $namenode "${HADOOP_HOME}/sbin/stop-dfs.sh"
	echo "stop yarn on $resourcenode"
	ssh $resourcenode "${HADOOP_HOME}/sbin/stop-yarn.sh"
	echo "stop historyserver"
	ssh $namenode "${HADOOP_HOME}/bin/mapred --daemon stop historyserver"
	;;
*)
	echo "usage: myhadoop.sh start|stop"
	exit 1
	;;
esac
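
Since /opt/cluster_bin is on the PATH (set in bigdata.sh earlier), day-to-day operation reduces to:

myhadoop.sh start
cluster_all.sh jps    # daemons should match the cluster plan table
myhadoop.sh stop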

Common Port Numbers

Name                           Hadoop 2.x   Hadoop 3.x
NameNode internal RPC port     8020/9000    8020/9000/9820
NameNode HTTP UI               50070        9870
YARN web UI (MapReduce jobs)   8088         8088
Job history server web UI      19888        19888