Resolving split-brain in a two-node cluster

Licensed under CC 4.0 BY-SA

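This article discusses the split-brain phenomenon in distributed systems and ways to resolve it, including redundant heartbeats, evicting nodes from the cluster, and IO fencing, with the aim of preserving system stability and data consistency.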

I. What does "split-brain" mean?

"Split brain" is a condition whereby two or more computers or groups of computers lose contact with one another but still act as if the cluster were intact. This is like having two governments trying to rule the same country. If multiple computers are allowed to write to the same file system without knowledge of what the other nodes are doing, it will quickly lead to data corruption and other serious problems.

II. Solutions

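Redundant heartbeat links deserve a brief mention, but they treat the symptom rather than the cause: they only reduce the probability of split-brain and cannot rule it out.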
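To make the limitation concrete, here is a minimal sketch of a redundant-heartbeat check. The function name `peer_reachable`, the link names, and the timeout value are all hypothetical, chosen only for illustration; real cluster software uses its own configuration and timers.

```python
from typing import Dict


def peer_reachable(last_seen: Dict[str, float], now: float, timeout: float) -> bool:
    """Treat the peer as alive if ANY heartbeat link heard from it recently.

    last_seen maps a (hypothetical) link name to the timestamp of the last
    heartbeat received over that link.
    """
    return any(now - stamp <= timeout for stamp in last_seen.values())


# One link is stale, the other is fresh: the peer is still considered alive.
print(peer_reachable({"eth0": 100.0, "serial": 108.5}, now=110.0, timeout=3.0))

# Every link is stale: the node cannot tell "peer crashed" apart from
# "all links failed", which is why redundancy only lowers the odds of split-brain.
print(peer_reachable({"eth0": 100.0, "serial": 101.0}, now=110.0, timeout=3.0))
```

Adding more links makes the second case rarer, but when it does happen both nodes can still believe the other is dead.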

1. Evicting a node from the cluster

(1) Quorum Algorithm

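Quorum Algorithm (a voting algorithm): the nodes in a cluster learn about each other's health through heartbeats, and each heartbeat a node receives counts as one vote. Take a cluster of 3 nodes (A, B, C): node A holds B's vote plus its own, node B holds its own plus A's, and node C holds only its own. A and B each have 2 of the 3 votes, a majority, while C has 1, so node C is evicted from the cluster.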
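The vote counting in this example can be sketched as follows. This is a simplified illustration, not the implementation used by any particular cluster manager; `has_quorum` and `evicted_nodes` are hypothetical names, and the sketch assumes a strict-majority rule with one vote per node.

```python
from typing import Dict, Set


def has_quorum(votes: int, cluster_size: int) -> bool:
    """A node keeps running only if it holds a strict majority of all votes."""
    return votes > cluster_size // 2


def evicted_nodes(heartbeats: Dict[str, Set[str]]) -> Set[str]:
    """Return the nodes that fail to reach quorum.

    heartbeats maps each node to the set of peers whose heartbeat it received.
    """
    cluster_size = len(heartbeats)
    evicted = set()
    for node, peers in heartbeats.items():
        votes = 1 + len(peers - {node})  # own vote plus one per peer heartbeat
        if not has_quorum(votes, cluster_size):
            evicted.add(node)
    return evicted


# The A/B/C example from the text: C hears nobody and is evicted.
print(sorted(evicted_nodes({"A": {"B"}, "B": {"A"}, "C": set()})))  # ['C']

# With only two nodes, a partition leaves each side with 1 of 2 votes,
# so neither side reaches a majority.
print(sorted(evicted_nodes({"A": set(), "B": set()})))  # ['A', 'B']
```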

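(2) Quorum Device

The Quorum Algorithm has a flaw: with only 2 nodes it breaks down, because after a partition each node holds exactly one vote and neither has a majority. A third device is needed to break the tie, and this is where the Quorum Device enters the stage.

Quorum Device (Quorum Disk): this device also carries one vote, and that vote goes to whichever node claims it first. The winner then holds 2 of the 3 votes, so the other node can be evicted cleanly.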
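A rough sketch of the first-come-first-served tie-break, under the assumption that the device can hand out its vote atomically. The `QuorumDevice` class and `survives_partition` function are hypothetical; a real quorum disk enforces this through storage-level mechanisms rather than an in-process lock.

```python
import threading
from typing import Optional


class QuorumDevice:
    """Toy stand-in for a quorum disk: its single vote goes to the first claimer."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._owner: Optional[str] = None

    def claim(self, node: str) -> bool:
        """Return True if this node owns the device's vote."""
        with self._lock:
            if self._owner is None:
                self._owner = node
            return self._owner == node


def survives_partition(node: str, device: QuorumDevice, node_count: int = 2) -> bool:
    """After losing contact with its peer, a node counts its own vote plus the device's."""
    total_votes = node_count + 1  # every node plus the quorum device
    votes = 1 + (1 if device.claim(node) else 0)
    return votes > total_votes // 2


device = QuorumDevice()
print(survives_partition("A", device))  # True: A reached the device first
print(survives_partition("B", device))  # False: B is left with 1 of 3 votes
```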

2. IO Fencing

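After the steps above, many people sit back and relax, but the real trouble is still lurking below the surface. An evicted node may still be active, in which case it can still operate on the shared file resources, and the danger has not gone away. The remedy is IO Fencing, which keeps the wolf out of the sheep village by cutting the evicted node off from shared resources.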
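The gap between "evicted" and "fenced" can be illustrated with a toy model. This is only a conceptual sketch: the `SharedStorage` class and its methods are hypothetical, and real fencing is enforced outside the evicted node, not by an in-memory allow-list.

```python
from typing import Dict, Set


class SharedStorage:
    """Toy shared storage that refuses writes from fenced nodes."""

    def __init__(self, members: Set[str]) -> None:
        self._allowed = set(members)
        self.blocks: Dict[str, str] = {}

    def fence(self, node: str) -> None:
        """Revoke a node's access to the storage."""
        self._allowed.discard(node)

    def write(self, node: str, key: str, value: str) -> None:
        if node not in self._allowed:
            raise PermissionError(f"{node} is fenced off from shared storage")
        self.blocks[key] = value


storage = SharedStorage({"A", "B", "C"})

# C has been voted out of the cluster but is still running, so without
# fencing its write goes through.
storage.write("C", "block-1", "written by evicted node")

# Once C is fenced, the storage itself rejects the write.
storage.fence("C")
try:
    storage.write("C", "block-1", "stale write")
except PermissionError as err:
    print(err)
```

The point of the sketch is that the rejection happens on the storage side, so it works even when the evicted node does not know it has been evicted.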

IO Fencing comes in two main forms:

(1) Hardware-based