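I. Overview
Running RabbitMQ on a single machine as a single node cannot meet the needs of real applications: the node may crash when it runs out of memory, lose power, and so on. Building a RabbitMQ cluster is the key to solving this problem.
A RabbitMQ cluster lets consumers and producers keep running when a single RabbitMQ node crashes, and it can scale messaging throughput linearly by adding more nodes.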
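Every node in a RabbitMQ cluster keeps a copy of all metadata, namely:
- Queue metadata: queue names and properties
- Exchange metadata: exchange names and properties
- Binding metadata: bindings between exchanges and queues, or between exchanges and other exchanges
- vhost metadata: the namespace and security attributes that a vhost provides for the queues, exchanges, and bindings inside it
Messages, however, are not replicated (mirrored queues can solve this).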
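As a rough illustration of the mirrored-queue remedy mentioned above, the sketch below builds the request that would create a mirroring policy through the management HTTP API (`PUT /api/policies/{vhost}/{name}`). The policy name `ha-all` and the pattern `.*` are assumptions chosen for the example, and the code only constructs the request without sending it. Classic mirrored queues apply to the 3.8 series used in this article; later releases favor quorum queues instead.

```python
import json
from urllib.parse import quote


def mirror_policy_request(vhost="/", name="ha-all", pattern=".*"):
    """Build the path and JSON body for a policy that mirrors matching queues.

    `name` and `pattern` are illustrative defaults, not values from the article.
    """
    # The vhost "/" must be percent-encoded as %2F in the URL path.
    path = "/api/policies/{}/{}".format(quote(vhost, safe=""), quote(name, safe=""))
    body = {
        "pattern": pattern,
        "definition": {"ha-mode": "all", "ha-sync-mode": "automatic"},
        "apply-to": "queues",
    }
    return path, json.dumps(body, sort_keys=True)


path, body = mirror_policy_request()
print("PUT", path)
print(body)
```

Mirroring every queue to every node costs throughput, so a narrower pattern is usually a better fit than `.*`.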
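II. Setting up a RabbitMQ cluster
If all nodes in the cluster have been shut down, make sure that the last node to be shut down is the first one started. If the first node started is not the last one that was shut down, it waits for that node to come up. The wait is 30 seconds (recent versions repeat this 30-second wait several times, 10 by default, before giving up); if the last node still has not appeared, the node that was started first fails to start as well.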
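The restart rule above can be sketched as a small helper. This is only an illustration: the function name is made up, and the strict requirement is just that the last node stopped comes first, so reversing the whole shutdown order is one valid choice among several.

```python
def startup_order(shutdown_order):
    """Return nodes in a safe start order: the last node stopped goes first."""
    return list(reversed(shutdown_order))


# Nodes listed in the order they were shut down (first stopped first).
stopped = ["rabbit@rabbit-node03", "rabbit@rabbit-node02", "rabbit@rabbit-node01"]
order = startup_order(stopped)
print(order)
```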
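III. Cluster setup and operations
Each RabbitMQ node is either a disk node or a RAM node. A RAM node keeps the metadata definitions of all queues, exchanges, bindings, users, permissions, and vhosts in memory, whereas a disk node stores them on disk. A single-node cluster must consist of a disk node; otherwise all system configuration would be lost when RabbitMQ restarts. In a multi-node cluster, however, some nodes can be configured as RAM nodes to improve performance.
To make a node a RAM node, pass the --ram option when it joins the cluster. Without --ram, the node joins as a disk node, which is the default.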
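A minimal sketch of the join sequence, assuming the usual `stop_app` / `join_cluster` / `start_app` order. The helper only builds the command lines (it does not run them), and the function name is hypothetical.

```python
def join_cluster_commands(seed_node, ram=False):
    """Build the rabbitmqctl commands for joining a cluster via `seed_node`."""
    join = ["rabbitmqctl", "join_cluster"]
    if ram:
        # --ram makes the joining node a RAM node; omit it for a disk node.
        join.append("--ram")
    join.append(seed_node)
    return [
        ["rabbitmqctl", "stop_app"],
        join,
        ["rabbitmqctl", "start_app"],
    ]


for command in join_cluster_commands("rabbit@rabbit-node01", ram=True):
    print(" ".join(command))
```

The commands are meant to be run on the node that is joining, not on the seed node.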
1. Viewing node types
root@rabbit-node01:/# rabbitmqctl cluster_status
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Cluster status of node rabbit@rabbit-node01 ...
Basics
Cluster name: rabbit@rabbit-node01
Disk Nodes
rabbit@rabbit-node01
RAM Nodes
rabbit@rabbit-node02
rabbit@rabbit-node03
Running Nodes
rabbit@rabbit-node01
rabbit@rabbit-node02
rabbit@rabbit-node03
Versions
rabbit@rabbit-node01: RabbitMQ 3.8.14 on Erlang 23.3.1
rabbit@rabbit-node02: on Erlang
rabbit@rabbit-node03: on Erlang
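For scripting, the plain-text output above can be split into node lists. The parser below is a sketch that assumes the section titles shown in this 3.8.14 output ("Disk Nodes", "RAM Nodes", "Running Nodes"); other versions may print different headings.

```python
def parse_cluster_status(text):
    """Extract disk, RAM, and running node names from cluster_status output."""
    wanted = {"Disk Nodes": "disk", "RAM Nodes": "ram", "Running Nodes": "running"}
    result = {key: [] for key in wanted.values()}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in wanted:
            current = wanted[line]
        elif current and "@" in line and " " not in line:
            # A bare name@host line inside a section of interest is a node.
            result[current].append(line)
        else:
            # Any other line (another heading, a version line) ends the section.
            current = None
    return result


sample = """Cluster name: rabbit@rabbit-node01
Disk Nodes
rabbit@rabbit-node01
RAM Nodes
rabbit@rabbit-node02
rabbit@rabbit-node03
Running Nodes
rabbit@rabbit-node01
rabbit@rabbit-node02
rabbit@rabbit-node03
Versions
rabbit@rabbit-node01: RabbitMQ 3.8.14 on Erlang 23.3.1
"""
status = parse_cluster_status(sample)
print(status)
```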
2. Changing the node type
[root@localhost ~]# docker exec -it rabbit-node02 bash
root@rabbit-node02:/#
root@rabbit-node02:/# rabbitmqctl stop_app
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Stopping rabbit application on node rabbit@rabbit-node02 ...
root@rabbit-node02:/# rabbitmqctl change_cluster_node_type ram
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Turning rabbit@rabbit-node02 into a ram node
root@rabbit-node02:/# rabbitmqctl start_app
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Starting node rabbit@rabbit-node02 ...
root@rabbit-node02:/#
Check that the node type has changed:
root@rabbit-node01:/# rabbitmqctl cluster_status
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Cluster status of node rabbit@rabbit-node01 ...
Basics
Cluster name: rabbit@rabbit-node01
Disk Nodes
rabbit@rabbit-node01
RAM Nodes
rabbit@rabbit-node02
rabbit@rabbit-node03
Running Nodes
rabbit@rabbit-node01
rabbit@rabbit-node02
rabbit@rabbit-node03
Versions
rabbit@rabbit-node01: RabbitMQ 3.8.14 on Erlang 23.3.1
rabbit@rabbit-node02: RabbitMQ 3.8.14 on Erlang 23.3.1
rabbit@rabbit-node03: RabbitMQ 3.8.14 on Erlang 23.3.1
Maintenance status
Node: rabbit@rabbit-node01, status: not under maintenance
Node: rabbit@rabbit-node02, status: not under maintenance
Node: rabbit@rabbit-node03, status: not under maintenance
Alarms
(none)
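A RabbitMQ cluster only requires at least one disk node; all other nodes can be RAM nodes. When a node joins or leaves the cluster, it must notify at least one disk node of the change. If there is only one disk node and it happens to crash, the cluster can still send and receive messages, but it can no longer create queues, exchanges, bindings, or users, change permissions, or add or remove cluster nodes. In other words, if the only disk node crashes, the cluster keeps running, but nothing can be changed until that node is restored to the cluster. A cluster should therefore be built with two or more disk nodes.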
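The recommendation above can be turned into a simple health check. The sketch below is illustrative only: the function name and the warning texts are made up, and the inputs are expected to be node lists such as those reported by `rabbitmqctl cluster_status`.

```python
def disk_node_warnings(disk_nodes, running_nodes):
    """Warn when fewer than two disk nodes are currently running."""
    running_disk = [node for node in disk_nodes if node in running_nodes]
    warnings = []
    if not running_disk:
        warnings.append("no disk node is running: metadata changes are blocked")
    elif len(running_disk) == 1:
        warnings.append("only one disk node is running: " + running_disk[0])
    return warnings


running = ["rabbit@rabbit-node01", "rabbit@rabbit-node02", "rabbit@rabbit-node03"]
warnings = disk_node_warnings(["rabbit@rabbit-node01"], running)
print(warnings)
```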
3. Removing a single node from the cluster
Method 1:
- On the node to be removed, run:
[root@localhost ~]# docker exec -it rabbit-node03 bash
root@rabbit-node03:/#
root@rabbit-node03:/#
root@rabbit-node03:/# rabbitmqctl stop
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Stopping and halting node rabbit@rabbit-node03 ...
root@rabbit-node03:/# [root@localhost ~]#
- On another node, run:
[root@localhost ~]# docker exec -it rabbit-node01 bash
root@rabbit-node01:/#
root@rabbit-node01:/#
root@rabbit-node01:/#
root@rabbit-node01:/# rabbitmqctl forget_cluster_node rabbit@rabbit-node03
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Removing node rabbit@rabbit-node03 from the cluster
root@rabbit-node01:/#
root@rabbit-node01:/# rabbitmqctl cluster_status
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Cluster status of node rabbit@rabbit-node01 ...
Basics
Cluster name: rabbit@rabbit-node01
Disk Nodes
rabbit@rabbit-node01
RAM Nodes
rabbit@rabbit-node02
Running Nodes
rabbit@rabbit-node01
Versions
rabbit@rabbit-node01: RabbitMQ 3.8.14 on Erlang 23.3.1
Method 2: on the node to be removed, stop the application and reset the node:
root@rabbit-node02:/# rabbitmqctl stop_app
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Stopping rabbit application on node rabbit@rabbit-node02 ...
root@rabbit-node02:/# rabbitmqctl reset
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Resetting node rabbit@rabbit-node02 ...
root@rabbit-node02:/#
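The two removal methods can be summarized as (where to run, command) pairs. This is a sketch with a hypothetical helper name; the commands are the ones shown in the transcripts above and are only listed, not executed.

```python
def removal_plan(node, method):
    """Return (where, command) steps for removing `node` from the cluster."""
    if method == 1:
        # Stop the node itself, then make a surviving node forget it.
        return [
            (node, "rabbitmqctl stop"),
            ("any other node", "rabbitmqctl forget_cluster_node " + node),
        ]
    if method == 2:
        # Everything runs on the node that is leaving.
        return [
            (node, "rabbitmqctl stop_app"),
            (node, "rabbitmqctl reset"),
        ]
    raise ValueError("method must be 1 or 2")


for where, command in removal_plan("rabbit@rabbit-node03", 1):
    print(where, "->", command)
```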
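4. Upgrading cluster nodes
When the cluster consists of multiple nodes:
- Stop the service on all nodes with rabbitmqctl stop
- Save each node's Mnesia data
- Unpack the new RabbitMQ version into the target directory
- Point the new version's Mnesia path at the Mnesia data saved in the second step
- Start the new version, beginning with the node that was shut down last under the old version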
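The step of saving the Mnesia data could look like the sketch below, which copies a node's Mnesia directory to a timestamped backup folder. The paths are assumptions: `/var/lib/rabbitmq/mnesia` is a common default on Linux packages, but the actual location depends on the installation. The demonstration runs on a throwaway directory with a placeholder file, so nothing real is touched.

```python
import shutil
import tempfile
import time
from pathlib import Path


def backup_mnesia(mnesia_dir, backup_root):
    """Copy `mnesia_dir` to a new timestamped directory under `backup_root`."""
    source = Path(mnesia_dir)
    if not source.is_dir():
        raise FileNotFoundError("Mnesia directory not found: {}".format(source))
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / "{}-{}".format(source.name, stamp)
    shutil.copytree(str(source), str(target))
    return target


# Demonstration on a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as workdir:
    fake_mnesia = Path(workdir) / "mnesia"
    (fake_mnesia / "rabbit@node01").mkdir(parents=True)
    (fake_mnesia / "rabbit@node01" / "dummy.dat").write_text("placeholder")
    backup = backup_mnesia(fake_mnesia, Path(workdir) / "backups")
    copied = sorted(p.name for p in (backup / "rabbit@node01").iterdir())
print(copied)
```

On a real node this would be called as something like `backup_mnesia("/var/lib/rabbitmq/mnesia", "/backup")` after the service has been stopped.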
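5. Multiple nodes on a single machine
Running multiple nodes on a single machine is mostly useful for experiments. For a real production environment it is better to build a cluster of nodes on separate machines. A cluster not only adds capacity but also provides effective disaster tolerance.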
IV. Service log operations
By default, RabbitMQ logs are stored in:
[root@node01 rabbitmq]# pwd
/var/log/rabbitmq
To view the RabbitMQ service log, look at rabbit@node01.log:
[root@node01 rabbitmq]# ls -al
total 9356
drwxr-xr-x. 3 rabbitmq rabbitmq 75 Apr 3 06:35 .
drwxr-xr-x. 10 root root 4096 Apr 4 03:50 ..
drwxr-x---. 2 rabbitmq rabbitmq 42 Apr 4 07:05 log
-rw-r-----. 1 rabbitmq rabbitmq 9485067 Apr 4 20:46 rabbit@node01.log
-rw-r-----. 1 rabbitmq rabbitmq 204 Apr 4 09:02 rabbit@node01_upgrade.log
The details differ somewhat between versions.
1. Examples
- Enable a plugin and watch the log change
[root@node01 ~]# rabbitmq-plugin enable rabbitmq_top
-bash: rabbitmq-plugin: command not found
[root@node01 ~]# rabbitmq-plugins enable rabbitmq_top
Enabling plugins on node rabbit@node01:
rabbitmq_top
The following plugins have been configured:
rabbitmq_federation
rabbitmq_management
rabbitmq_management_agent
rabbitmq_top
rabbitmq_tracing
rabbitmq_web_dispatch
Applying plugin configuration to rabbit@node01...
The following plugins have been enabled:
rabbitmq_top
started 1 plugins.
[root@node01 ~]#
[root@node01 ~]# tail -f /var/log/rabbitmq/rabbit\@node01.log
2021-04-04 21:12:20.598 [info] <0.44.0> Application rabbitmq_top started on node rabbit@node01
2021-04-04 21:12:20.604 [warning] <0.651.0> HTTP listener registry could not find context rabbitmq_management_tls
2021-04-04 21:12:20.656 [info] <0.21288.3> Plugins changed; enabled [rabbitmq_top]
2021-04-04 21:12:24.862 [info] <0.21400.3> accepting AMQP connection <0.21400.3> ([::1]:56192 -> [::1]:5672)
2021-04-04 21:12:24.862 [error] <0.21400.3> closing AMQP connection <0.21400.3> ([::1]:56192 -> [::1]:5672):
{bad_header,<<"GET /api">>}
2021-04-04 21:12:24.866 [info] <0.21405.3> accepting AMQP connection <0.21405.3> ([::1]:56194 -> [::1]:5672)
2021-04-04 21:12:24.867 [info] <0.21402.3> Closing all channels from connection '[::1]:56192 -> [::1]:5672' because it has been closed
2021-04-04 21:12:24.868 [error] <0.21405.3> closing AMQP connection <0.21405.3> ([::1]:56194 -> [::1]:5672):
{bad_header,<<"GET /api">>}
2021-04-04 21:12:24.872 [info] <0.21410.3> accepting AMQP connection <0.21410.3> ([::1]:56196 -> [::1]:5672)
2021-04-04 21:12:24.873 [error] <0.21410.3> closing AMQP connection <0.21410.3> ([::1]:56196 -> [::1]:5672):
{bad_header,<<"GET /api">>}
2021-04-04 21:12:24.873 [info] <0.21407.3> Closing all channels from connection '[::1]:56194 -> [::1]:5672' because it has been closed
2021-04-04 21:12:24.878 [info] <0.21415.3> accepting AMQP connection <0.21415.3> ([::1]:56198 -> [::1]:5672)
2021-04-04 21:12:24.878 [error] <0.21415.3> closing AMQP connection <0.21415.3> ([::1]:56198 -> [::1]:5672):
{bad_header,<<"GET /api">>}
2021-04-04 21:12:24.881 [info] <0.21412.3> Closing all channels from connection '[::1]:56196 -> [::1]:5672' because it has been closed
2021-04-04 21:12:24.882 [info] <0.21417.3> Closing all channels from connection '[::1]:56198 -> [::1]:5672' because it has been closed
^C
[root@node01 ~]#
- Stop the application on a node
[root@node03 ~]# rabbitmqctl stop_app
Stopping rabbit application on node rabbit@node03 ...
[root@node03 ~]#
[root@node01 ~]# tail -f /var/log/rabbitmq/rabbit\@node01.log
2021-04-04 21:17:06.133 [info] <0.528.0> rabbit on node rabbit@node03 down
2021-04-04 21:17:06.177 [info] <0.528.0> Keeping rabbit@node03 listeners: the node is already back
^C
[root@node01 ~]#
[root@node01 ~]# tail -f /var/log/rabbitmq/rabbit\@node01.log
2021-04-04 21:19:51.390 [info] <0.528.0> rabbit on node rabbit@node03 up
2021-04-04 21:19:54.861 [info] <0.22511.3> accepting AMQP connection <0.22511.3> ([::1]:56432 -> [::1]:5672)
2021-04-04 21:19:54.861 [error] <0.22511.3> closing AMQP connection <0.22511.3> ([::1]:56432 -> [::1]:5672):
{bad_header,<<"GET /api">>}
2021-04-04 21:19:54.863 [info] <0.22516.3> accepting AMQP connection <0.22516.3> ([::1]:56434 -> [::1]:5672)
2021-04-04 21:19:54.863 [error] <0.22516.3> closing AMQP connection <0.22516.3> ([::1]:56434 -> [::1]:5672):
{bad_header,<<"GET /api">>}
2021-04-04 21:19:54.865 [info] <0.22513.3> Closing all channels from connection '[::1]:56432 -> [::1]:5672' because it has been closed
2021-04-04 21:19:54.865 [info] <0.22521.3> accepting AMQP connection <0.22521.3> ([::1]:56436 -> [::1]:5672)
2021-04-04 21:19:54.866 [error] <0.22521.3> closing AMQP connection <0.22521.3> ([::1]:56436 -> [::1]:5672):
{bad_header,<<"GET /api">>}
^C
[root@node01 ~]#
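When the log is long, filtering by level is often easier than reading `tail` output. The sketch below parses lines in the format shown above (timestamp, level in brackets, Erlang process id, message) and attaches continuation lines such as `{bad_header,...}` to the preceding entry. The regular expression is an assumption based on these 3.8 samples; other versions may format lines differently.

```python
import re

LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[(\w+)\] (<[\d.]+>) (.*)$"
)


def parse_log(lines):
    """Turn raw log lines into dicts with time, level, pid, and message."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = LOG_LINE.match(line)
        if match:
            stamp, level, pid, message = match.groups()
            entries.append(
                {"time": stamp, "level": level, "pid": pid, "message": message}
            )
        elif entries:
            # Lines without a timestamp continue the previous entry.
            entries[-1]["message"] += "\n" + line
    return entries


def by_level(entries, level):
    """Keep only the entries logged at the given level."""
    return [entry for entry in entries if entry["level"] == level]


sample = [
    "2021-04-04 21:12:24.862 [info] <0.21400.3> accepting AMQP connection <0.21400.3> ([::1]:56192 -> [::1]:5672)",
    "2021-04-04 21:12:24.862 [error] <0.21400.3> closing AMQP connection <0.21400.3> ([::1]:56192 -> [::1]:5672):",
    '{bad_header,<<"GET /api">>}',
]
errors = by_level(parse_log(sample), "error")
print(errors[0]["message"])
```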
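2. Log rotation
When RabbitMQ runs for a long time, its log keeps growing, and a large file is tedious to read through when troubleshooting. It is therefore useful to rotate the log according to some rule so that it is easier to manage later.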
Usage
rabbitmqctl [--node <node>] [--longnames] [--quiet] rotate_logs
examples
[root@node01 ~]# cd /var/log/rabbitmq/
[root@node01 rabbitmq]# ls -al
total 9932
drwxr-xr-x. 3 rabbitmq rabbitmq 75 Apr 3 06:35 .
drwxr-xr-x. 10 root root 4096 Apr 4 03:50 ..
drwxr-x---. 2 rabbitmq rabbitmq 42 Apr 4 07:05 log
-rw-r-----. 1 rabbitmq rabbitmq 9724874 Apr 4 21:25 rabbit@node01.log
-rw-r-----. 1 rabbitmq rabbitmq 204 Apr 4 09:02 rabbit@node01_upgrade.log
[root@node01 rabbitmq]# rabbitmqctl rotate_logs
Rotating logs for node rabbit@node01 ...
[root@node01 rabbitmq]# ls -al
total 9956
drwxr-xr-x. 3 rabbitmq rabbitmq 137 Apr 4 21:28 .
drwxr-xr-x. 10 root root 4096 Apr 4 03:50 ..
drwxr-x---. 2 rabbitmq rabbitmq 42 Apr 4 07:05 log
-rw-r--r--. 1 rabbitmq rabbitmq 18372 Apr 4 21:31 rabbit@node01.log
-rw-r-----. 1 rabbitmq rabbitmq 9741706 Apr 4 21:28 rabbit@node01.log.0
-rw-r--r--. 1 rabbitmq rabbitmq 84 Apr 4 21:28 rabbit@node01_upgrade.log
-rw-r-----. 1 rabbitmq rabbitmq 272 Apr 4 21:28 rabbit@node01_upgrade.log.0
[root@node01 rabbitmq]#
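One possible rule is to rotate once the log exceeds a size limit. The sketch below decides whether rotation is due and, if so, returns the `rabbitmqctl rotate_logs` command line following the usage shown above. The helper name and the thresholds are assumptions, and the command is only built, not executed; the demonstration uses a temporary file rather than a real log.

```python
import os
import tempfile


def rotation_command(log_path, max_bytes, node=None):
    """Return the rotate_logs command if the log reached `max_bytes`, else None."""
    if os.path.getsize(log_path) < max_bytes:
        return None
    command = ["rabbitmqctl"]
    if node:
        command += ["--node", node]
    command.append("rotate_logs")
    return command


# Demonstration with a 2 KiB temporary file standing in for the log.
with tempfile.NamedTemporaryFile() as handle:
    handle.write(b"x" * 2048)
    handle.flush()
    due = rotation_command(handle.name, max_bytes=1024, node="rabbit@node01")
    not_due = rotation_command(handle.name, max_bytes=10 * 1024 * 1024)
print(due)
print(not_due)
```

A scheduler such as cron could call a check like this periodically and run the returned command.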
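V. Recovering from a single-node failure
Single-node failures include machine hardware failure, power loss, network problems, and failure of the service process.
In that case, run the following on one of the other nodes, giving the full name of the failed node:
[root@node01 ~]# rabbitmqctl forget_cluster_node rabbit@<failed-node>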
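VI. Cluster migration
Metadata rebuilding means creating, in the new cluster, the queues, exchanges, bindings, vhosts, users, permissions, parameters, and other data of the original cluster.
- Rebuild through the web management UI
- If the original cluster fails suddenly, or the machine running the node with the RabbitMQ Management plugin enabled is damaged beyond repair, the metadata of the original cluster cannot be retrieved and rebuilding becomes impossible. To guard against this, run a general backup task that copies the latest metadata to another safe location whenever the metadata changes or at a fixed interval. The latest metadata is then available when a migration is needed.
- If the old and new clusters run different RabbitMQ versions, problems can arise because the metadata differs between the two versions (encryption algorithm, structure of the metadata JSON file)
The migration itself covers queues, exchanges, and bindings: parsing the JSON, creating the metadata, creating the queues, and so on.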
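The parse-and-recreate step can be sketched as follows: read an exported definitions JSON document and turn its exchanges, queues, and bindings into management HTTP API requests for the new cluster. The sample data (`orders`, `orders.new`) is made up, the field names assume the layout of a 3.8 definitions export, and the requests are only listed, not sent.

```python
import json
from urllib.parse import quote


def _seg(value):
    """Percent-encode one URL path segment (the vhost "/" becomes %2F)."""
    return quote(value, safe="")


def rebuild_requests(definitions):
    """Return (method, path, body) tuples that recreate the topology."""
    requests = []
    for ex in definitions.get("exchanges", []):
        path = "/api/exchanges/{}/{}".format(_seg(ex["vhost"]), _seg(ex["name"]))
        keys = ("type", "durable", "auto_delete", "internal", "arguments")
        requests.append(("PUT", path, {k: ex[k] for k in keys if k in ex}))
    for queue in definitions.get("queues", []):
        path = "/api/queues/{}/{}".format(_seg(queue["vhost"]), _seg(queue["name"]))
        keys = ("durable", "auto_delete", "arguments")
        requests.append(("PUT", path, {k: queue[k] for k in keys if k in queue}))
    # Bindings come last because their source and destination must exist.
    for b in definitions.get("bindings", []):
        kind = "q" if b["destination_type"] == "queue" else "e"
        path = "/api/bindings/{}/e/{}/{}/{}".format(
            _seg(b["vhost"]), _seg(b["source"]), kind, _seg(b["destination"])
        )
        body = {
            "routing_key": b.get("routing_key", ""),
            "arguments": b.get("arguments", {}),
        }
        requests.append(("POST", path, body))
    return requests


exported = """{
  "exchanges": [{"name": "orders", "vhost": "/", "type": "direct",
                 "durable": true, "auto_delete": false, "internal": false,
                 "arguments": {}}],
  "queues": [{"name": "orders.new", "vhost": "/", "durable": true,
              "auto_delete": false, "arguments": {}}],
  "bindings": [{"source": "orders", "vhost": "/", "destination": "orders.new",
                "destination_type": "queue", "routing_key": "new",
                "arguments": {}}]
}"""
plan = rebuild_requests(json.loads(exported))
for method, path, _body in plan:
    print(method, path)
```

When both clusters run compatible versions, importing the whole file in one request through the definitions endpoint of the management API may be simpler; the per-object approach is mainly useful when entries have to be filtered or adjusted first.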
This part needs further study; it is fairly difficult for now.
Automated migration
ZooKeeper: to be added later
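VII. Cluster monitoring
1. Monitoring data provided through the HTTP API
2. Monitoring data provided through the client
Monitoring involves a lot of code; this section will be filled in after further study.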
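As a starting point for the HTTP API approach, the sketch below prepares an authenticated request for the management plugin's `/api/nodes` endpoint. The host name, the default port 15672, and the `guest` credentials are assumptions for illustration (by default the `guest` user can only connect from localhost). The request is built but not sent, so the example runs without a broker.

```python
import base64
import urllib.request


def monitoring_request(host, path="/api/nodes", user="guest", password="guest",
                       port=15672):
    """Build a GET request with HTTP basic auth for the management API."""
    credentials = "{}:{}".format(user, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    url = "http://{}:{}{}".format(host, port, path)
    return urllib.request.Request(url, headers={"Authorization": "Basic " + token})


request = monitoring_request("node01")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` and decoding the JSON response would yield per-node figures such as memory use and file descriptors, which can then be fed into a monitoring system.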