Building a RocketMQ Monitoring Dashboard

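This article walks through the official RocketMQ monitoring metrics, including TPS, message size, and latency for producers, consumers, and brokers. It also covers the monitoring configuration and alerting strategy specific to Alibaba Cloud, and is meant as a starting point for building a customized Prometheus monitoring setup.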


1. Official resources

https://github.com/apache/rocketmq-exporter

Official template

Grafana Dashboard ID: 10477, name: RocketMQ Exporter Overview. For details of the dashboard please see RocketMQ Exporter Overview.
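
Before building panels it helps to check which metrics the exporter actually exposes. The following is a minimal sketch, not part of the exporter project. It assumes the exporter serves the Prometheus text format at http://localhost:5557/metrics (adjust the address to your deployment); the sample text and its label names are made up for illustration.

```python
import re
import urllib.request
from typing import List

# Assumption: the exporter is reachable here; change it for your environment.
EXPORTER_URL = "http://localhost:5557/metrics"

def fetch_metrics(url: str = EXPORTER_URL, timeout: float = 5.0) -> str:
    """Download the raw Prometheus text exposition from the exporter."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

# A metric name is the leading identifier on every sample line.
_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")

def metric_names(text: str, prefix: str = "rocketmq_") -> List[str]:
    """Return the sorted, de-duplicated metric names that start with prefix."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE lines
            continue
        match = _NAME.match(line)
        if match and match.group(1).startswith(prefix):
            names.add(match.group(1))
    return sorted(names)

# Illustrative sample only; real label names and values will differ.
SAMPLE = """\
# HELP rocketmq_broker_tps illustrative help text
# TYPE rocketmq_broker_tps gauge
rocketmq_broker_tps{cluster="demo",broker="broker-a"} 12.5
rocketmq_producer_tps{cluster="demo",topic="orders"} 3.0
jvm_threads_live 42
"""

print(metric_names(SAMPLE))
```

Against a live exporter, `metric_names(fetch_metrics())` gives the list to compare with the tables in this article.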

2. Common metrics

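| Type | Metric | Description |
|---|---|---|
| Broker | rocketmq_broker_tps | Messages produced per second on a single broker |
| Broker | rocketmq_broker_qps | QPS of a single broker (requests processed per second) |
| Producer | rocketmq_producer_tps | Messages produced per second for a single topic (producer TPS) |
| Producer | rocketmq_producer_message_size | Total size of the messages produced per second for a single topic |
| Producer | rocketmq_producer_offset | Production offset of a single topic |
| Consumer Groups | rocketmq_consumer_tps | Messages consumed per second by a single consumer group (consumer TPS) |
| Consumer Groups | rocketmq_consumer_message_size | Total size of the messages consumed per second by a single consumer group |
| Consumer Groups | rocketmq_consumer_offset | Consumption offset of a single consumer group |
| Consumer Groups | rocketmq_group_get_latency_by_storetime | Consumption delay time of a single consumer group |
| Consumer Groups | rocketmq_group_get_latency | Consumer latency on a topic for a single queue |
| Consumer Groups | rocketmq_message_accumulation | Number of messages a single consumer group is lagging behind (backlog) |
| Consumer | rocketmq_client_consume_fail_msg_count | Messages the consumer failed to consume in the last hour |
| Consumer | rocketmq_client_consume_fail_msg_tps | Messages the consumer failed to consume per second |
| Consumer | rocketmq_client_consume_ok_msg_tps | Messages the consumer consumed successfully per second |
| Consumer | rocketmq_client_consume_rt | Average time to consume one message |
| Consumer | rocketmq_client_consumer_pull_rt | Average time to pull one message |
| Consumer | rocketmq_client_consumer_pull_tps | Messages pulled per second by the client |
| Container | container_cpu_usage_seconds_total | Container CPU usage (a cumulative counter of CPU seconds; take its rate to get utilization) |
| Container | container_memory_usage_bytes | Memory currently in use |
| Container | container_fs_usage_bytes | Container disk space in use |
| Container | container_fs_writes_bytes_total | Disk writes (a cumulative byte counter; take its rate to get write speed) |
| Container | container_fs_reads_bytes_total | Disk reads (a cumulative byte counter; take its rate to get read speed) |
| rocketmq_brokeruntime | rocketmq_brokeruntime_commitlog_disk_ratio | Broker commit log disk ratio |
| rocketmq_brokeruntime | rocketmq_brokeruntime_consumequeue_disk_ratio | Broker consume queue disk ratio |
| rocketmq_brokeruntime | rocketmq_brokeruntime_commitlogdir_capacity_free | Free capacity of the broker commit log directory |
| rocketmq_brokeruntime | rocketmq_brokeruntime_commitlogdir_capacity_total | Total capacity of the broker commit log directory |
| rocketmq_brokeruntime | rocketmq_brokeruntime_commitlog_maxoffset | Broker commit log max offset |
| rocketmq_brokeruntime | rocketmq_brokeruntime_commitlog_minoffset | Broker commit log min offset |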
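
Several of the container metrics above are cumulative counters, so a panel has to turn them into a per-second rate, and the commit log directory capacity pair is easier to read as a used fraction. The sketch below shows the arithmetic only, with made-up sample values. The helper names `counter_rate` and `used_fraction` are hypothetical, and in practice Prometheus would compute the rate with its `rate()` function rather than client code.

```python
from typing import Optional

def counter_rate(prev: float, curr: float, seconds: float) -> Optional[float]:
    """Per-second increase between two samples of a cumulative counter.

    Returns None when the interval is not positive or the counter went
    backwards (for example after a container restart), because a plain
    subtraction would be meaningless in those cases.
    """
    if seconds <= 0 or curr < prev:
        return None
    return (curr - prev) / seconds

def used_fraction(free: float, total: float) -> float:
    """Fraction of capacity in use, from a free/total pair in the same unit."""
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return 1.0 - free / total

# Made-up samples for illustration.
# container_cpu_usage_seconds_total sampled 30 s apart: 0.5 CPU cores busy.
cpu_cores = counter_rate(prev=1200.0, curr=1215.0, seconds=30.0)
# container_fs_writes_bytes_total sampled 60 s apart: bytes written per second.
write_bps = counter_rate(prev=4_000_000, curr=10_000_000, seconds=60)
# rocketmq_brokeruntime_commitlogdir_capacity_free / _total.
disk_used = used_fraction(free=25.0, total=100.0)

print(cpu_cores, write_bps, disk_used)
```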

The following list comes built in with the official Alibaba Cloud configuration and is specific to Alibaba Cloud.

| Metric | Category | Description |
|---|---|---|
| rocketmq_broker_tps | Broker | Number of messages the broker produces per second |
| rocketmq_broker_qps | Broker | Number of messages the broker consumes per second |
| rocketmq_producer_tps | Producer | Number of messages produced per second per topic |
| rocketmq_producer_message_size | Producer | Size of the messages produced per second for a topic (in bytes) |
| rocketmq_producer_offset | Producer | Production progress (offset) of a topic |
| rocketmq_consumer_tps | Consumer Groups | Number of messages consumed per second by a consumer group |
| rocketmq_consumer_message_size | Consumer Groups | Size of the messages consumed per second by a consumer group (in bytes) |
| rocketmq_consumer_offset | Consumer Groups | Consumption progress (offset) of a consumer group |
| rocketmq_group_get_latency | Consumer Groups | Consumer latency on a topic for one queue |
| rocketmq_group_get_latency_by_storetime | Consumer Groups | Consumption delay time of a consumer group |
| rocketmq_message_accumulation | Consumer Groups | How far the consumer offset lags behind |
| rocketmq_client_consume_fail_msg_count | Consumer | Number of messages that failed to be consumed in one hour |
| rocketmq_client_consume_fail_msg_tps | Consumer | Number of messages that failed to be consumed per second |
| rocketmq_client_consume_ok_msg_tps | Consumer | Number of messages consumed successfully per second |
| rocketmq_client_consume_rt | Consumer | Average time to consume one message |
| rocketmq_client_consumer_pull_rt | Consumer | Average time to pull one message |
| rocketmq_client_consumer_pull_tps | Consumer | Number of messages pulled by the client per second |
| rocketmq_brokeruntime_pmdt_0to10ms | Broker | Number of put-message requests the broker responded to within 0 to 10 ms |
| rocketmq_brokeruntime_pmdt_10to50ms | Broker | Number of put-message requests the broker responded to within 10 to 50 ms |
| rocketmq_brokeruntime_pmdt_50to100ms | Broker | Number of put-message requests the broker responded to within 50 to 100 ms |
| rocketmq_brokeruntime_pmdt_100to200ms | Broker | Number of put-message requests the broker responded to within 100 to 200 ms |
| rocketmq_brokeruntime_pmdt_200to500ms | Broker | Number of put-message requests the broker responded to within 200 to 500 ms |
| rocketmq_brokeruntime_pmdt_500to1s | Broker | Number of put-message requests the broker responded to within 500 ms to 1 s |
| rocketmq_brokeruntime_pmdt_1to2s | Broker | Number of put-message requests the broker responded to within 1 to 2 s |
| rocketmq_brokeruntime_pmdt_2to3s | Broker | Number of put-message requests the broker responded to within 2 to 3 s |
| rocketmq_brokeruntime_pmdt_3to4s | Broker | Number of put-message requests the broker responded to within 3 to 4 s |
| rocketmq_brokeruntime_pmdt_4to5s | Broker | Number of put-message requests the broker responded to within 4 to 5 s |
| rocketmq_brokeruntime_pmdt_5to10s | Broker | Number of put-message requests the broker responded to within 5 to 10 s |
| rocketmq_brokeruntime_pmdt_10stomore | Broker | Number of put-message requests the broker took 10 s or more to respond to |
| rocketmq_brokeruntime_query_threadpoolqueue_headwait_timemills | Broker | Wait time of the head element in the query thread pool queue |
| rocketmq_brokeruntime_pull_threadpoolqueue_headwait_timemill | Broker | Wait time of the head element in the pull thread pool queue |
| rocketmq_brokeruntime_send_threadpoolqueue_headwait_timemills | Broker | Wait time of the head element in the send thread pool queue |
| rocketmq_brokeruntime_commitlog_disk_ratio | Broker | Broker commit log disk ratio |
| rocketmq_brokeruntime_consumequeue_disk_ratio | Broker | Broker consume queue disk ratio |
| rocketmq_brokeruntime_commitlogdir_capacity_free | Broker | Free capacity of the broker commit log directory |
| rocketmq_brokeruntime_commitlogdir_capacity_total | Broker | Total capacity of the broker commit log directory |
| rocketmq_brokeruntime_commitlog_maxoffset | Broker | Broker commit log max offset |
| rocketmq_brokeruntime_msg_put_total_today_now | Broker | Total messages put on the broker today, up to now |
| rocketmq_brokeruntime_msg_gettotal_today_now | Broker | Total messages fetched from the broker today, up to now |
| rocketmq_brokeruntime_dispatch_behind_bytes | Broker | Bytes by which broker dispatch is behind |
| rocketmq_brokeruntime_put_message_size_total | Broker | Total size of the messages put on the broker |
| rocketmq_brokeruntime_put_message_average_size | Broker | Average size of the messages put on the broker |
| rocketmq_brokeruntime_msg_gettotal_yesterdaymorning | Broker | Broker msg get total yesterday morning |
| rocketmq_brokeruntime_msg_gettotal_todaymorning | Broker | Broker msg get total today morning |
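
The twelve `rocketmq_brokeruntime_pmdt_*` metrics together form a latency distribution for put-message requests, which is more useful on a dashboard as a single "share of slow writes" number. The sketch below is one way to derive it from a snapshot of the bucket values. The function name `slow_put_share`, the choice of 500 ms as the "slow" boundary, and the sample counts are all assumptions for illustration.

```python
from typing import Dict

# Bucket suffixes exactly as they appear in the metric names, fastest first.
PMDT_BUCKETS = [
    "0to10ms", "10to50ms", "50to100ms", "100to200ms", "200to500ms",
    "500to1s", "1to2s", "2to3s", "3to4s", "4to5s", "5to10s", "10stomore",
]

def slow_put_share(counts: Dict[str, float],
                   first_slow_bucket: str = "500to1s") -> float:
    """Fraction of put-message responses in first_slow_bucket or any slower one.

    counts maps a bucket suffix to its value; missing buckets count as zero.
    Returns 0.0 when there were no responses at all.
    """
    start = PMDT_BUCKETS.index(first_slow_bucket)
    total = sum(counts.get(b, 0.0) for b in PMDT_BUCKETS)
    if total == 0:
        return 0.0
    slow = sum(counts.get(b, 0.0) for b in PMDT_BUCKETS[start:])
    return slow / total

# Made-up snapshot: 1000 responses, 20 of them took 500 ms or longer.
sample = {"0to10ms": 900, "10to50ms": 80, "500to1s": 15, "1to2s": 5}
share = slow_put_share(sample)
print(f"{share:.2%} of put-message responses took 500 ms or longer")
```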

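For more metrics, see the dashboard JSON in the official repository:

https://github.com/apache/rocketmq-exporter/blob/master/rocketmq_exporter_overview.json

Someone has also published a more practical dashboard; you only need to download the JSON:

"A ready-to-use set of RocketMQ monitoring dashboards and alert rules", blog post by 不识君的荒漠 on CSDN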

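The alert rules mainly cover the following cases:

  • A broker node is down
  • Disk space is running low
  • Broker busy
  • Message backlog
  • The broker takes too long to write messages (the broker may be under heavy I/O pressure or short of memory)
  • The cluster's send TPS spikes (someone may have started a load test against the cluster or written a large batch of messages at once)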
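
To make the threshold-style rules above concrete, here is a minimal sketch of how three of them could be evaluated against one snapshot of metric values. It is not the rule set from the linked blog post. The `Rule` class, the `firing` helper, and every threshold are hypothetical placeholders to be tuned per cluster, and the sketch assumes the disk ratio is reported as a fraction between 0 and 1. In a real setup these conditions would normally live in Prometheus alerting rules rather than application code.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Rule:
    name: str         # human-readable alert name
    metric: str       # metric name taken from the tables in this article
    threshold: float  # fires when the sampled value is strictly greater

# Placeholder thresholds, chosen only for illustration.
RULES = [
    Rule("commit log disk nearly full",
         "rocketmq_brokeruntime_commitlog_disk_ratio", 0.80),
    Rule("broker busy (send queue head wait)",
         "rocketmq_brokeruntime_send_threadpoolqueue_headwait_timemills", 200.0),
    Rule("message backlog",
         "rocketmq_message_accumulation", 10000.0),
]

def firing(rules: List[Rule], snapshot: Dict[str, float]) -> List[str]:
    """Return the alerts triggered by a snapshot of metric name -> value.

    A metric missing from the snapshot is reported as "no data", since a
    missing sample may mean the broker or the exporter is down.
    """
    alerts = []
    for rule in rules:
        if rule.metric not in snapshot:
            alerts.append(f"{rule.name}: no data")
        elif snapshot[rule.metric] > rule.threshold:
            alerts.append(rule.name)
    return alerts

# Made-up snapshot: disk is 91% full, the send queue is healthy,
# and the backlog metric is absent.
snapshot = {
    "rocketmq_brokeruntime_commitlog_disk_ratio": 0.91,
    "rocketmq_brokeruntime_send_threadpoolqueue_headwait_timemills": 12.0,
}
print(firing(RULES, snapshot))
```

The write-latency and TPS-spike rules need a comparison over time rather than a single threshold, so they are left out of this sketch.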

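Below are some practical notes for Alibaba Cloud.

On Alibaba Cloud, the RocketMQ monitoring component has to be added in Prometheus monitoring.

If the deployment is an Alibaba Cloud private cloud or similar, check whether the component's image has been pushed.

Once configured, a monitoring dashboard is included by default. It is essentially the official template, only wired to Alibaba Cloud's environment variables.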
