Symptoms:
One cluster member reports a problem, and in this case it was always the standby member. On closer inspection, some of that member's cluster interfaces show as down or only partially up, even though the physical interfaces are up and properly connected.
Log in to the command line on the primary member:
[[email protected]]# cphaprob stat
Cluster Mode: New High Availability (Active Up)
with IGMP Membership
Number Unique Address Assigned Load State
1 (local) 1.1.1.1 100% Active
2 1.1.1.2 0% Down
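When this check has to be repeated across several members, the member rows of "cphaprob stat" can be parsed instead of read by eye. The following is a minimal sketch that assumes the row layout shown above; the function names, the regular expression and the list of healthy states are illustrative assumptions and not part of any Check Point tool, and other versions may print a different layout.

```python
import re

# Matches rows such as "1 (local) 1.1.1.1 100% Active" or "2 1.1.1.2 0% Down".
# The row layout is an assumption taken from the output shown in this article.
MEMBER_ROW = re.compile(
    r"^\s*(?P<number>\d+)\s+(?P<local>\(local\)\s+)?"
    r"(?P<address>\d+(?:\.\d+){3})\s+(?P<load>\d+)%\s+(?P<state>.+?)\s*$"
)


def parse_cphaprob_stat(text):
    """Return one dict per member row found in `cphaprob stat` output."""
    members = []
    for line in text.splitlines():
        match = MEMBER_ROW.match(line)
        if match:
            members.append({
                "number": int(match.group("number")),
                "local": match.group("local") is not None,
                "address": match.group("address"),
                "load": int(match.group("load")),
                "state": match.group("state"),
            })
    return members


def unhealthy_members(members, healthy_states=("Active", "Standby")):
    """Return members whose state is outside the (assumed) healthy set."""
    return [m for m in members if m["state"] not in healthy_states]


# Sample copied from the primary member's output above.
SAMPLE = """\
Cluster Mode: New High Availability (Active Up)
with IGMP Membership
Number Unique Address Assigned Load State
1 (local) 1.1.1.1 100% Active
2 1.1.1.2 0% Down
"""

if __name__ == "__main__":
    for member in unhealthy_members(parse_cphaprob_stat(SAMPLE)):
        print(member["number"], member["address"], member["state"])
```

Treating only Active and Standby as healthy fits the High Availability mode in this article; other cluster modes may report additional states that need to be added to the list.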
On the standby Check Point member:
[[email protected]]# cphaprob stat
Cluster Mode: New High Availability (Active Up)
with IGMP Membership
Number Unique Address Assigned Load State
1 1.1.1.1 100% Active
2 (local) 1.1.1.2 0% Down
[[email protected]]# cphaprob -i list
Built-in Devices:
Device Name: Interface Active Check
Current state: problem
Device Name: HA Initialization
Current state: OK
Registered Devices:
Device Name: Synchronization
Registration number: 0
Timeout: none
Current state: OK
Time since last report: 93466.5 sec
Device Name: Filter
Registration number: 1
Timeout: none
Current state: OK
Time since last report: 93439.2 sec
Device Name: cphad
Registration number: 2
Timeout: 2 sec
Current state: OK
Time since last report: 0.2 sec
Device Name: fwd
Registration number: 3
Timeout: 2 sec
Current state: OK
Time since last report: 0.2 sec
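The device list above is long, and the one entry that matters ("Interface Active Check" in state "problem") is easy to miss. As a rough sketch, assuming the "Device Name:" / "Current state:" line pairs shown above, the failing devices can be picked out like this. The function names are hypothetical and the sample is an abbreviated copy of the output above.

```python
def parse_device_states(text):
    """Map each device name to the value of its 'Current state' line."""
    states = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Device Name:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("Current state:") and current is not None:
            states[current] = line.split(":", 1)[1].strip()
    return states


def failing_devices(states):
    """Return the names of devices whose state is anything other than OK."""
    return sorted(name for name, state in states.items() if state.upper() != "OK")


# Abbreviated sample from the `cphaprob -i list` output above.
SAMPLE = """\
Built-in Devices:
Device Name: Interface Active Check
Current state: problem
Device Name: HA Initialization
Current state: OK
Registered Devices:
Device Name: Synchronization
Registration number: 0
Timeout: none
Current state: OK
Time since last report: 93466.5 sec
"""

if __name__ == "__main__":
    print(failing_devices(parse_device_states(SAMPLE)))
```

A failing "Interface Active Check" points at the interface monitoring itself, which is why the next step looks at the per-interface view.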
[[email protected]]# cphaprob -a if
Required interfaces: 4
Required secured interfaces: 1
DMZ UP non sync(non secured), multicast
Internal Inbound: DOWN (10.9 secs) Outbound: DOWN (88822.4 secs) non sync(non secured), multicast
Lan1 UP sync(secured), multicast
External Inbound: DOWN (88822.4 secs) Outbound: DOWN (89001.8 secs) non sync(non secured), multicast
Virtual cluster interfaces: 3
DMZ 100.9.2.30
Internal 100.9.40.1
External 100.9.38.20
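The interface listing above mixes plain "UP" rows with rows that report separate Inbound and Outbound states. The sketch below shows one way to reduce it to a list of interfaces that are not fully up, together with the CCP mode each row reports. It assumes the flattened layout shown in this article; the function name and the parsing rules are illustrative assumptions, not a Check Point interface.

```python
import re

# "Inbound: DOWN (10.9 secs)" -> ("Inbound", "DOWN", "10.9"); the timer is optional.
DIRECTION = re.compile(r"(Inbound|Outbound):\s+(\w+)(?:\s+\(([\d.]+)\s+secs\))?")


def parse_interfaces(text):
    """Return one dict per monitored interface in `cphaprob -a if` output."""
    interfaces = []
    for raw in text.splitlines():
        parts = raw.split(None, 1)
        # Interface rows carry a "sync(...)" marker; counters and the
        # virtual-interface address list do not, so they are skipped.
        if len(parts) != 2 or "sync(" not in parts[1]:
            continue
        name, rest = parts
        status, sep, mode = rest.rpartition(",")
        if not sep:  # no trailing ", <mode>" field on this row
            status, mode = rest, ""
        directions = {d.lower(): s for d, s, _secs in DIRECTION.findall(status)}
        if directions:
            up = all(state == "UP" for state in directions.values())
        else:
            up = status.split()[0] == "UP"
        interfaces.append({
            "name": name,
            "up": up,
            "mode": mode.strip(),
            "directions": directions,
        })
    return interfaces


# Sample copied from the standby member's output above.
SAMPLE = """\
Required interfaces: 4
Required secured interfaces: 1
DMZ UP non sync(non secured), multicast
Internal Inbound: DOWN (10.9 secs) Outbound: DOWN (88822.4 secs) non sync(non secured), multicast
Lan1 UP sync(secured), multicast
External Inbound: DOWN (88822.4 secs) Outbound: DOWN (89001.8 secs) non sync(non secured), multicast
Virtual cluster interfaces: 3
DMZ 100.9.2.30
"""

if __name__ == "__main__":
    for interface in parse_interfaces(SAMPLE):
        if not interface["up"]:
            print(interface["name"], interface["directions"], interface["mode"])
```

The same function can be run against the output captured after the fix to confirm that every row is up and reports "broadcast" in the mode field.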
Solution:
Change the Cluster Control Protocol (CCP) mode from multicast to broadcast. From the command line, run "cphaconf set_ccp broadcast" on each cluster member. The change does not require a system reboot or cpstop/cpstart, and it persists across reboots.
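If the change has to be applied on several members, the command sequence used in this article can be wrapped in a small script. This is only a sketch: the helper names are hypothetical, the commands are the ones shown in this article, and treating a non-zero exit code as a failure is an assumption, since the exit codes of these tools are not covered here. The default entry point is a dry run that prints the commands without executing anything.

```python
import subprocess

VALID_MODES = ("broadcast", "multicast")


def build_ccp_plan(mode="broadcast"):
    """Return the commands from this article, in the order they were run."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported CCP mode: {mode!r}")
    return [
        ["cphaconf", "set_ccp", mode],  # switch the CCP mode
        ["cphaprob", "-a", "if"],       # check that interfaces report UP
        ["cphaprob", "stat"],           # check that the member left the Down state
    ]


def run_plan(plan, runner=subprocess.run):
    """Run each command in turn and stop at the first failure.

    `runner` is injectable so the plan can be exercised without a gateway.
    A non-zero return code is treated as failure (an assumption).
    """
    for command in plan:
        result = runner(command, capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError(
                f"{' '.join(command)} failed: {result.stderr.strip()}"
            )
        print(result.stdout)


if __name__ == "__main__":
    # Dry run: print the plan instead of executing it.
    for command in build_ccp_plan("broadcast"):
        print(" ".join(command))
```

On a real member, calling run_plan(build_ccp_plan("broadcast")) would execute the three commands in order; keeping the runner injectable makes it possible to review the sequence on a workstation first.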
[[email protected]]# cphaconf set_ccp broadcast
[[email protected]]# cphaprob -a if
Required interfaces: 4
Required secured interfaces: 1
DMZ UP non sync(non secured), broadcast
Internal UP non sync(non secured), broadcast
Lan1 UP sync(secured), broadcast
External UP non sync(non secured), broadcast
Virtual cluster interfaces: 3
DMZ 10.19.2.30
Internal 10.19.140.1
External 10.19.138.20
[[email protected]]# cphaprob stat
Cluster Mode: New High Availability (Active Up)
with IGMP Membership
Number Unique Address Assigned Load State
1 (local) 1.1.1.1 100% Active
2 1.1.1.2 0% Standby
[[email protected]]# cphaconf set_ccp broadcast
[[email protected]]# cphaprob stat
Cluster Mode: New High Availability (Active Up)
with IGMP Membership
Number Unique Address Assigned Load State
1 1.1.1.1 100% Active
2 (local) 1.1.1.2 0% Standby
[[email protected]]# cphaprob -a if
Required interfaces: 4
Required secured interfaces: 1
DMZ UP non sync(non secured), broadcast
Internal UP non sync(non secured), broadcast
Lan1 UP sync(secured), broadcast
External UP non sync(non secured), broadcast
Virtual cluster interfaces: 3
DMZ 100.9.2.30
Internal 100.9.40.1
External 100.9.38.20
Note: To gracefully switch the cluster state between cluster members, use the command "clusterXL_admin <up|down> [-p]".
[[email protected]]# clusterXL_admin up
Setting member to normal operation …
Member current state is Standby
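This article describes a problem observed in a cluster, the troubleshooting process, and the solution. The main issue was that one cluster member showed abnormal interface states, which disrupted cluster communication. Changing the cluster mode to broadcast resolved the problem.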




