Hadoop HA: A Journey Through the Pitfalls of a Distributed High-Availability Cluster

This article walks through configuring high availability (HA) in a Hadoop cluster: the HDFS and YARN HA configuration steps, the common problems hit along the way, and the startup and verification experience after a successful setup.

I had seen plenty of appealing introductions to Hadoop HA but kept putting it off. I finally got around to it: on top of my existing cluster, I spent over five hours following the official documentation to get Hadoop HA configured. Here is the summary:

Configuration
Startup
Verification

HDFS HA configuration

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<!-- Logical nameservice name; the IDs below must match it -->
<property>
  <name>dfs.nameservices</name>
  <value>mztt</value>
</property>
<!-- NameNode IDs under this nameservice; Hadoop 3.x reportedly allows more than two -->
<property>
  <name>dfs.ha.namenodes.mztt</name>
  <value>nn1,nn2</value>
</property>
<!-- RPC addresses -->
<property>
  <name>dfs.namenode.rpc-address.mztt.nn1</name>
  <value>mztt1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mztt.nn2</name>
  <value>mztt2:8020</value>
</property>
<!-- Web UI addresses -->
<property>
  <name>dfs.namenode.http-address.mztt.nn1</name>
  <value>mztt1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mztt.nn2</name>
  <value>mztt2:50070</value>
</property>
<!-- Shared edits location: the JournalNode quorum -->
<property>
  	<name>dfs.namenode.shared.edits.dir</name>
  	<value>qjournal://mztt1:8485;mztt2:8485;mztt3:8485/mztt</value>
</property>

<!-- Proxy class HDFS clients use to find the active NameNode -->
<property>
 	<name>dfs.client.failover.proxy.provider.mztt</name>
 	<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing method -->
<property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
</property>

<!-- Private key used by sshfence -->
<property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hadoop/.ssh/id_rsa</value>
</property>
<!-- SSH connect timeout for fencing (ms) -->
<property>
      <name>dfs.ha.fencing.ssh.connect-timeout</name>
      <value>30000</value>
</property>
<!-- Local directory where each JournalNode stores edits -->
<property>
       <name>dfs.journalnode.edits.dir</name>
       <value>/opt/data/journal</value>
</property>
<!-- Enable automatic failover -->
<property>
  	 <name>dfs.ha.automatic-failover.enabled</name>
  	 <value>true</value>
</property>

</configuration>
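
After syncing this file to all nodes, it is worth a quick sanity check that the nameservice resolves as intended; a minimal check, assuming the values above:

hdfs getconf -confKey dfs.nameservices   # should print: mztt
hdfs getconf -namenodes                  # should list mztt1 and mztt2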

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<!-- Default filesystem: the nameservice, not a single host -->
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://mztt</value>
	</property>

<!-- ZooKeeper quorum used for automatic failover -->
	<property>
		<name>ha.zookeeper.quorum</name>
		<value>mztt1:2181,mztt2:2181,mztt3:2181</value>
	</property>
<!-- IO buffer size -->
	<property>
		<name>io.file.buffer.size</name>
		<value>8192</value>
	</property>

<!-- Temp directory -->
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/opt/data/tmp</value>
	</property>


</configuration>
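
With fs.defaultFS set to the nameservice, clients no longer name a specific NameNode host; once the cluster is up, for example:

hdfs dfs -ls /             # resolved to the active NameNode by the failover proxy provider
hdfs dfs -ls hdfs://mztt/  # the same listing, fully qualified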

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<!-- Run MapReduce on YARN -->
<property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
</property>

<!-- JobHistory server RPC and web UI addresses -->
<property>
        <name>mapreduce.jobhistory.address</name>
        <value>mztt1:10020</value>
</property>

<property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>mztt1:19888</value>
</property>

</configuration>
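
The two JobHistory addresses above only take effect once the history server is actually running; on mztt1 it can be started with the stock daemon script (Hadoop 2.x script name; adjust if your layout differs):

mr-jobhistory-daemon.sh start historyserver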

YARN HA configuration

yarn-site.xml

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->
	<property>
 		 <name>yarn.nodemanager.aux-services</name>
 	 	<value>mapreduce_shuffle</value>
	</property>
<!--
<property>
	<name>yarn.log-aggregation-enable</name>
	<value>true</value>
</property>

<property>
	<name>yarn.log-aggregation.retain-seconds</name>
	<value>106800</value>
</property>
-->
<!-- Enable ResourceManager HA -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>


<!-- Logical cluster ID -->
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>cluster1</value>
</property>
<!-- ResourceManager IDs -->
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<!-- Hosts each ResourceManager runs on -->
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>mztt1</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>mztt2</value>
</property>
<!-- Web UI addresses -->
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>mztt1:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>mztt2:8088</value>
</property>
<!-- ZooKeeper quorum -->
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>mztt1:2181,mztt2:2181,mztt3:2181</value>
</property>

</configuration>
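
Once both ResourceManagers are up, their HA state can be queried with yarn rmadmin (rm1 and rm2 are the IDs configured above; one should report active, the other standby):

yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2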

Startup

Startup is where the pitfalls live; the first time through, it is best to follow exactly this order.

  1. Start ZooKeeper:
# zkServer.sh is on the PATH via an environment variable
./zkServer.sh start
  2. Start the JournalNodes:
hadoop-daemon.sh start journalnode
  3. Format one NameNode first (either one will do):
hdfs namenode -format
  4. Start that NameNode:
hadoop-daemon.sh start namenode
  5. On the other NameNode, sync the metadata from the formatted one, then start it the same way:
hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode
  6. Format the ZKFC znode in ZooKeeper:
hdfs zkfc -formatZK
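
Those six steps only bring up the NameNodes. A minimal sketch of starting the remaining daemons (ZKFC, DataNodes, YARN), assuming the stock Hadoop 2.x scripts and a populated slaves file:

# on each NameNode host: the failover controller that watches ZooKeeper
hadoop-daemon.sh start zkfc

# DataNodes; start-dfs.sh leaves already-running daemons alone
start-dfs.sh

# YARN; the standby ResourceManager must be started by hand on mztt2
start-yarn.sh
yarn-daemon.sh start resourcemanager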

Verification

Running jps shows that all the daemons are up:

8240 DataNode
14032 Jps
8065 JournalNode
8003 QuorumPeerMain
9238 NodeManager
8166 NameNode
8427 DFSZKFailoverController
8669 ResourceManager

Then I checked the web UIs: both NameNodes were reachable, one standby and one active. I killed the active one to see whether it would fail over automatically, as advertised.
Five minutes later the standby still had not taken over. At first I assumed the switch just needed time, or that I had misconfigured something, and I even commented out the old cluster's history-server settings — still nothing.
In the end, some searching turned up the cause: my OS is a minimal install of CentOS 7.6, which is missing fuser. The sshfence method logs into the failed NameNode over SSH and uses fuser to kill the stale process, so without it fencing fails silently and the standby never promotes itself. Time to install it:


yum search fuser


# Loaded plugins: fastestmirror
Determining fastest mirrors
 * base: mirrors.huaweicloud.com
 * extras: mirrors.huaweicloud.com
 * updates: mirrors.cn99.com
============================================== Matched: fuser ==============================================
psmisc.x86_64 : Utilities for managing processes on your system

# fuser turns out to live in the psmisc package
yum install psmisc

After installing it and restarting the cluster, failover worked.
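
To confirm it end to end, the HA state can be checked from the command line before and after killing the active NameNode (nn1 and nn2 are the IDs from hdfs-site.xml):

hdfs haadmin -getServiceState nn1   # e.g. active
hdfs haadmin -getServiceState nn2   # e.g. standby

# kill the active NameNode, then re-check: the standby should report active
kill -9 $(jps | awk '/ NameNode/{print $1}')
hdfs haadmin -getServiceState nn2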
