CRS-4402 (Doc ID 1212703.1)

This document discusses problems caused by multicast not being enabled for the private network in Oracle Grid Infrastructure 11.2.0.2 and later releases, including installation failures and failures to start Cluster Synchronization Services, and provides steps to verify the symptoms.




In this Document

Symptoms

Changes

Cause

Solution

  Scalability RAC Community

References


APPLIES TO:


Oracle Database - Enterprise Edition - Version 11.2.0.2 to 12.1.0.1 [Release 11.2 to 12.1]

Information in this document applies to any platform.

This issue impacts environments that do not have multicast enabled for the private network in the following situations:


New installations of Oracle Grid Infrastructure 11.2.0.2 where multicast is not enabled on 230.0.1.0

Upgrades to Oracle Grid Infrastructure 11.2.0.2 from a pre-11.2.0.2 release where multicast is not enabled on 230.0.1.0 or 224.0.0.251

Installation of GI PSU 11.2.0.3.5, 11.2.0.3.6, 11.2.0.3.7 where multicast is not enabled on 230.0.1.0 or 224.0.0.251

Installation or upgrade to 12.1.0.1.0 where multicast is not enabled on 230.0.1.0 or 224.0.0.251



SYMPTOMS


If multicast-based communication is not enabled as required, either on the nodes of the cluster or on the network switches used for the private interconnect, then root.sh (run as part of a fresh installation of Oracle Grid Infrastructure 11.2.0.2) or rootupgrade.sh (run as part of an upgrade to Oracle Grid Infrastructure 11.2.0.2) will succeed only on the first node of the cluster and will fail on subsequent nodes with the symptoms shown below:


CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node node1, number 1, and is terminating

An active cluster was found during exclusive startup, restarting to join the cluster

Failed to start Oracle Clusterware stack

Failed to start Cluster Synchorinisation Service in clustered mode at /u01/app/crs/11.2.0.2/crs/install/crsconfig_lib.pm line 1016.

/u01/app/crs/11.2.0.2/perl/bin/perl -I/u01/app/crs/11.2.0.2/perl/lib -I/u01/app/crs/11.2.0.2/crs/install /u01/app/crs/11.2.0.2/crs/install/rootcrs.pl execution failed

 


Note:  The symptoms, as well as the required diagnostics, are the same whether an upgrade or a fresh installation of Oracle Grid Infrastructure 11.2.0.2 is performed. This issue is also documented in the Oracle Database Readme 11g Release 2, Section 2.39 "Open Bugs", under BUG: 9974223.

 


Note: This issue also impacts the following 11.2.0.3 PSUs: 11.2.0.3.5, 11.2.0.3.6, and 11.2.0.3.7, as well as 12.1.0.1 installations, if multicast is not enabled on the 230.0.1.0 or 224.0.0.251 multicast addresses (one of the two must be enabled and functional). With 11.2.0.3, GI was enhanced to use broadcast or multicast to bootstrap. However, the 11.2.0.3.5 GI PSU introduced a new issue that effectively disables the broadcast functionality (Bug 16547309).



Symptom verification 

To verify that the Oracle CSS daemon fails to start in clustered mode due to a multicasting issue on the network, review the ocssd.log file (located under $GI_HOME/log/<nodename>/cssd/ocssd.log). If joining the cluster fails because of such an issue, the following can be observed:


1. When CSS starts in clustered mode to join an existing cluster, we will see an entry in the CSSD log indicating that CSS will attempt to establish communication with a peer in the cluster. For this analysis, we see in the CSSD log for node2 that communication is attempted with node1, which looks similar to:  


 


2010-09-16 23:13:14.862: [GIPCHGEN][1107937600] gipchaNodeCreate: adding new node 0x2aaab408d4a0 { host 'node1', haName 'CSS_ttoprf10cluster', srcLuid 54d7bb0e-ef4a0c7e, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [0 : 0], createTime 9563084, flags 0x0 }

2.  Shortly after the above log entry we will see an attempt to establish communication to node1 from node2 via multicast address 230.0.1.0, port 42424 on the private interconnect (here: 192.168.1.2):


2010-09-16 23:13:14.862: [GIPCHTHR][1106360640] gipchaWorkerUpdateInterface: created remote interface for node 'node1', haName 'CSS_mycluster', inf 'mcast://230.0.1.0:42424/192.168.1.2'


3.  If the communication can be established successfully, we will see a log entry on node2 containing "gipchaLowerProcessAcks: ESTABLISH finished" for the peer node (node1). If the communication cannot be established, we will not see this log entry. Instead, we will see an entry indicating that the network communication cannot be established. This entry will look similar to the one shown below:


2010-09-16 23:13:15.839: [ CSSD][1087465792]clssnmvDHBValidateNCopy: node 1, node1, has a disk HB, but no network HB, DHB has rcfg 180134562, wrtcnt, 8627, LATS 9564064, lastSeqNo 8624, uniqueness 1284701023, timestamp 1284703995/10564774


The above log entry indicates that CSSD is unable to establish network communication on the interface used for the private interconnect. In this particular case, the issue was that multicast communication on the 230.0.1.0 IP was blocked on the network used as the private interconnect.
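
As a quick, illustrative way to locate these entries (assuming the default log location and the message strings shown above), the ocssd.log of the joining node can be searched with grep:

grep "gipchaWorkerUpdateInterface" $GI_HOME/log/<nodename>/cssd/ocssd.log
grep "gipchaLowerProcessAcks: ESTABLISH finished" $GI_HOME/log/<nodename>/cssd/ocssd.log
grep "has a disk HB, but no network HB" $GI_HOME/log/<nodename>/cssd/ocssd.log

If the second search returns nothing for the peer node while the third one keeps returning entries like the one above, the node is seeing disk heartbeats but no network heartbeats, which is consistent with the multicast problem described here.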


 


 


CHANGES


New installations of Oracle Grid Infrastructure 11.2.0.2

Upgrades of a previous release to Oracle Grid Infrastructure 11.2.0.2

Installation of the 11.2.0.3.5, 11.2.0.3.6, 11.2.0.3.7 GI PSUs where multicast is not enabled on the 230.0.1.0 or 224.0.0.251 multicast addresses

New installations of Oracle Grid Infrastructure 12.1.0.1 where multicast is not enabled on the 230.0.1.0 or 224.0.0.251 multicast addresses

Upgrades of a previous release to Oracle Grid Infrastructure 12.1.0.1 where multicast is not enabled on the 230.0.1.0 or 224.0.0.251 multicast addresses

Note: 11.2.0.4 Grid Infrastructure is not impacted by this issue.

 


CAUSE


Assuming that the Cluster Verification Utility (cluvfy) network checks have succeeded on all nodes of the cluster, or that the symptoms described above are observed as part of an upgrade to Oracle Grid Infrastructure 11.2.0.2 (which means the release currently in place does not encounter such communication issues), this issue is most likely caused by multicast not being enabled on the network used as the private interconnect.
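
As an illustrative re-check (the exact syntax can vary by release, and depending on the CVU version multicast itself may not be tested), node connectivity over the private interconnect can be verified again with the Cluster Verification Utility, for example:

cluvfy comp nodecon -n node1,node2 -verbose

A passing nodecon check does not by itself rule out the multicast issue described in this note, since unicast connectivity between the nodes may be fine while multicast is blocked.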


Background information for 11.2.0.2

Oracle Grid Infrastructure 11.2.0.2 introduces a new feature called "Redundant Interconnect Usage", which provides an Oracle-internal mechanism to make use of physically redundant network interfaces for the Oracle (private) interconnect. As part of this feature, multicast-based communication on the private interconnect is used to establish communication with peers in the cluster on each startup of the stack on a node. Once the connection with the peers in the cluster has been established, the communication switches back to unicast. By default, the 230.0.1.0 address (port 42424) on the private interconnect network is used for multicasting. Another IP can be enabled using the patch mentioned below, if it is determined that using the 230.0.1.0 IP causes the multicast communication to fail. Multicasting on either of these IPs and the respective port must, however, be enabled and functioning across the network and on each node meant to be part of the cluster. If multicasting is not enabled as required, nodes will fail to join the cluster with the symptoms discussed above.
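
As a rough, illustrative check on a node (eth1 below stands in for whatever interface actually carries the private interconnect), the multicast group memberships of the private interface can be inspected with standard operating system tools while the stack is starting, for example on Linux:

ip maddr show eth1
netstat -g

Seeing the expected group joined on the private interface only confirms the local join; whether multicast traffic on 230.0.1.0 (or 224.0.0.251) and port 42424 is actually delivered between the nodes across the switches still needs to be verified end to end, for example with a simple multicast send/receive test between the cluster nodes.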


Background information for 11.2.0.3.5, 11.2.0.3.6, 11.2.0.3.7 GI PSUs and 12.1.0.1

With 11.2.0.3, GI was enhanced to use broadcast or multicast (on the 230.0.1.0 or 224.0.0.251 addresses) to bootstrap. However, the 11.2.0.3.5 GI PSU introduces a new issue that effectively disables the broadcast functionality (Bug 16547309). Note that most networks support multicast on the 224.0.0.251 multicast address without any special configuration, so the odds of this being an issue for 11.2.0.3.5 - 11.2.0.3.7 and 12.1.0.1 are greatly reduced.
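
To check whether a given Grid Infrastructure home is at one of the affected PSU levels, the installed patches can be listed with OPatch and the output reviewed for the 11.2.0.3.5, 11.2.0.3.6 or 11.2.0.3.7 GI PSU (the path below is only illustrative and should be replaced with the actual Grid home):

$GI_HOME/OPatch/opatch lsinventory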


Note:  The Oracle CSS daemon may fail to establish network communication with peer nodes for reasons other than multicast not working as required on the private interconnect, which is the scenario discussed in this note. If you determine that multicasting is not the root cause of such issues on your system, refer to Note 1054902.1 for general network communication troubleshooting.

 


Reposted from: https://my.oschina.net/1272149624/blog/620653
