How RabbitMQ handles network partitions

This article explains how RabbitMQ deals with network partitions, covering its three strategies: ignore, autoheal, and pause_minority. It focuses on how the autoheal mechanism works, walks through its internal implementation, notes where each strategy sits in CAP terms, discusses what a partition does to a RabbitMQ cluster, and shows how to enable a strategy in the configuration file.

RabbitMQ clusters do not tolerate network partitions well, so for brokers spread across a WAN, federation or the Shovel plugin is recommended instead of clustering. Even a cluster running entirely inside a LAN cannot completely avoid network partitions: a misbehaving router or switch, or a network interface going down, can all cause one.

So what does a network partition do to a RabbitMQ cluster? When a partition occurs, the nodes in each partition consider the nodes in the other partition down; operations on exchanges, queues, and bindings only take effect within the local partition; the metadata stored in Mnesia (exchange attributes, queue attributes, and so on) is no longer synchronized across the cluster; and for mirrored queues, each partition ends up with its own master process handling that queue's operations. Worse still, all of this persists even after network connectivity is restored!
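
This state is easy to observe: from an Erlang shell on each broker node (for example via rabbitmqctl eval), compare which database nodes each side still believes is running. A minimal illustration (Mnesia is what RabbitMQ uses for metadata, so it is already running on a broker node):

%% each side of a partition only lists the nodes it can still reach;
%% comparing the output across nodes reveals the split
mnesia:system_info(running_db_nodes).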

Starting with version 3.1.0, RabbitMQ ships with built-in partition handling, configured via the cluster_partition_handling key. (The Erlang-term example below belongs in the classic config file, rabbitmq.config, rather than the newer ini-style rabbitmq.conf.)

[
 {rabbit,
  [{tcp_listeners,[5672]},
   {cluster_partition_handling, ignore}]
 }
].

RabbitMQ supports three handling strategies: ignore, autoheal, and pause_minority. The default is ignore, which does nothing.
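
To pick a different strategy, change the value of cluster_partition_handling in the same file; for example, to enable automatic healing (pause_minority is set the same way):

[
 {rabbit,
  [{tcp_listeners,[5672]},
   {cluster_partition_handling, autoheal}]
 }
].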

The autoheal strategy: in short, once the network partition is repaired, the partitions negotiate with one another; the partition with the most client connections wins, and every node outside the winning partition restarts its rabbit application, which brings the cluster back into a consistent state.

The rough internal flow:

(1) On startup, RabbitMQ creates and registers a process named rabbit_node_monitor, which subscribes to node up/down notifications and to Mnesia's system events:

init([]) ->
    process_flag(trap_exit, true),
    %% subscribe to {nodeup, Node} / {nodedown, Node} messages
    net_kernel:monitor_nodes(true),
    %% subscribe to {mnesia_system_event, ...} messages
    {ok, _} = mnesia:subscribe(system),
    {ok, #state{monitors    = pmon:new(),
                subscribers = pmon:new(),
                partitions  = [],
                autoheal    = rabbit_autoheal:init()}}.
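
To make these two subscriptions concrete, here is a minimal standalone sketch (the module name demo_monitor is mine, not RabbitMQ's; Mnesia must already be started on the node). After a partition heals, Mnesia delivers the {inconsistent_database, running_partitioned_network, Node} system event that drives step (2):

-module(demo_monitor).
-export([start/0]).

start() ->
    net_kernel:monitor_nodes(true),
    {ok, _} = mnesia:subscribe(system),
    loop().

loop() ->
    receive
        {nodeup, Node}   -> io:format("node up: ~p~n", [Node]), loop();
        {nodedown, Node} -> io:format("node down: ~p~n", [Node]), loop();
        {mnesia_system_event, Event} ->
            io:format("mnesia event: ~p~n", [Event]), loop()
    end.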

(2) When this process receives the {inconsistent_database, running_partitioned_network, Node} event, it picks one node of the cluster (the first in the sorted list of all cluster nodes, acting as the "leader") and sends {autoheal_msg, {request_start, node()}} to that node's rabbit_node_monitor process.

rabbit_node_monitor.erl

handle_info({mnesia_system_event,
             {inconsistent_database, running_partitioned_network, Node}},
            State = #state{partitions = Partitions,
                           monitors   = Monitors,
                           autoheal   = AState}) ->
    %% make sure the partitioned node's rabbit application is monitored
    State1 = case pmon:is_monitored({rabbit, Node}, Monitors) of
                 true  -> State;
                 false -> State#state{
                            monitors = pmon:monitor({rabbit, Node}, Monitors)}
             end,
    ok = handle_live_rabbit(Node),
    %% record the node in the deduplicated partitions list, then try to
    %% kick off autoheal
    Partitions1 = ordsets:to_list(
                    ordsets:add_element(Node, ordsets:from_list(Partitions))),
    {noreply, State1#state{partitions = Partitions1,
                           autoheal   = rabbit_autoheal:maybe_start(AState)}};

rabbit_autoheal.erl

maybe_start(not_healing) ->
    case enabled() of
        true  -> %% the "leader" is just the first node in the sorted
                 %% list of all cluster nodes
                 [Leader | _] = lists:usort(rabbit_mnesia:cluster_nodes(all)),
                 send(Leader, {request_start, node()}),
                 rabbit_log:info("Autoheal request sent to ~p~n", [Leader]),
                 not_healing;
        false -> not_healing
    end;
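
send/2 is not shown above; in rabbit_autoheal.erl it is essentially the following (?SERVER is rabbit_node_monitor), which is why the message arrives at the remote rabbit_node_monitor wrapped in an autoheal_msg tuple in step (3):

%% deliver Msg to the rabbit_node_monitor process on Node
send(Node, Msg) -> {?SERVER, Node} ! {autoheal_msg, Msg}.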

(3) When the leader's rabbit_node_monitor receives {autoheal_msg, {request_start, Node}}, it determines which partition has the most client connections, picks the first node of that partition as the winner and tells it which nodes will be restarting, and notifies every node in the losing partitions to restart.

rabbit_node_monitor.erl

handle_info({autoheal_msg, Msg}, State = #state{autoheal   = AState,
                                                partitions = Partitions}) ->
    AState1 = rabbit_autoheal:handle_msg(Msg, AState, Partitions),
    {noreply, State#state{autoheal = AState1}};


rabbit_autoheal.erl

handle_msg({request_start, Node},
           not_healing, Partitions) ->
    rabbit_log:info("Autoheal request received from ~p~n", [Node]),
    case rabbit_node_monitor:all_rabbit_nodes_up() of
        false -> not_healing;
        true  -> AllPartitions = all_partitions(Partitions),
                 {Winner, Losers} = make_decision(AllPartitions),
                 rabbit_log:info("Autoheal decision~n"
                                 "  * Partitions: ~p~n"
                                 "  * Winner:     ~p~n"
                                 "  * Losers:     ~p~n",
                                 [AllPartitions, Winner, Losers]),
                 send(Winner, {become_winner, Losers}),
                 [send(L, {winner_is, Winner}) || L <- Losers],
                 not_healing
    end;
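
make_decision/1 and all_partitions/1 are not shown here. A sketch of the decision logic, close to (but not guaranteed to match verbatim) the RabbitMQ source: each partition is ranked by {client connection count, node count}, and the first node of the top-ranked partition becomes the winner:

make_decision(AllPartitions) ->
    %% sort partitions ascending by value, then reverse for descending
    Sorted = lists:sort([{partition_value(P), P} || P <- AllPartitions]),
    [[Winner | _] | Rest] = lists:reverse([P || {_, P} <- Sorted]),
    {Winner, lists:append(Rest)}.

partition_value(Partition) ->
    %% count client connections across the partition's nodes;
    %% node count breaks ties
    Connections = [Res || Node <- Partition,
                          Res <- [rpc:call(Node, rabbit_networking,
                                           connections_local, [])],
                          is_list(Res)],
    {length(lists:append(Connections)), length(Partition)}.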

(4) On receiving the become_winner message, the winner waits for every loser node to stop its rabbit application (and the applications rabbit depends on). Once all losers have stopped, it notifies them to start rabbit again.

rabbit_autoheal.erl

handle_msg({become_winner, Losers},
           not_healing, _Partitions) ->
    rabbit_log:info("Autoheal: I am the winner, waiting for ~p to stop~n",
                    [Losers]),
    %% track the losers still to stop, plus the full list to notify later
    {winner_waiting, Losers, Losers};

handle_msg({winner_is, Winner},
           not_healing, _Partitions) ->
    rabbit_log:warning(
      "Autoheal: we were selected to restart; winner is ~p~n", [Winner]),
    rabbit_node_monitor:run_outside_applications(
      fun () ->
              %% monitor the winner so a dead winner cannot leave us
              %% stopped forever
              MRef = erlang:monitor(process, {?SERVER, Winner}),
              rabbit:stop(),
              send(Winner, {node_stopped, node()}),
              receive
                  {'DOWN', MRef, process, {?SERVER, Winner}, _Reason} -> ok;
                  autoheal_safe_to_start                              -> ok
              end,
              erlang:demonitor(MRef, [flush]),
              rabbit:start()
      end),
    restarting;

handle_msg({node_stopped, Node},
           {winner_waiting, [Node], Notify}, _Partitions) ->
    %% the last loser has stopped: tell all of them to start again
    rabbit_log:info("Autoheal: final node has stopped, starting...~n",[]),
    [{rabbit_outside_app_process, N} ! autoheal_safe_to_start || N <- Notify],
    not_healing;

handle_msg({node_stopped, Node},
           {winner_waiting, WaitFor, Notify}, _Partitions) ->
    %% still waiting for the other losers to stop
    {winner_waiting, WaitFor -- [Node], Notify};
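
run_outside_applications/1 (from rabbit_node_monitor; the sketch below is simplified from memory, not verbatim) explains the {rabbit_outside_app_process, N} addressing above: the fun runs in a freshly spawned process registered under that name, detached from the rabbit application so it survives rabbit:stop():

run_outside_applications(Fun) ->
    spawn(fun () ->
                  %% detach from the caller's group leader so stopping
                  %% the rabbit application does not kill this process
                  group_leader(whereis(init), self()),
                  register(rabbit_outside_app_process, self()),
                  Fun()
          end).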

At this point RabbitMQ has finished handling the network partition. Note that this strategy can lose data: whatever the losing partitions accepted during the split is discarded when their nodes restart and resynchronize from the winner. In CAP terms, autoheal favors availability over consistency (AP).

The pause_minority strategy: when a node detects that other cluster nodes have gone down, it checks whether it sits in the majority or the minority, i.e. whether the nodes it can still cluster with make up more than half of the full cluster. A majority node keeps working as normal; a minority node stops its rabbit application and keeps re-checking until it finds itself back in a majority, at which point it starts rabbit again. Note that this strategy is intended for clusters with an odd number of nodes, so that any split always leaves one side with a strict majority. In CAP terms, pause_minority favors consistency over availability (CP).
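
The majority test itself is just an arithmetic comparison; a minimal sketch (the function name is mine, not RabbitMQ's):

%% a node is in the majority iff strictly more than half of all cluster
%% nodes are reachable from it; with an even-sized cluster, an n:n split
%% leaves NO majority and both sides pause, hence the recommendation to
%% use an odd number of nodes
in_majority(AliveNodes, AllNodes) ->
    length(AliveNodes) * 2 > length(AllNodes).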

Translated from: http://www.kankanews.com/ICkengine/archives/71918.shtml
