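To monitor Redis with Prometheus, the recommended exporter is the community-maintained redis_exporter:
https://github.com/oliver006/redis_exporter
Our Redis deployment currently relies on master-replica replication combined with Sentinel for high availability.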
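Deployment:
Deploy the exporter on a virtual machine. After unpacking the release archive, run:
redis_exporter -web.listen-address=:9121 -redis.addr redis://xxxx:26379 -redis.password password
Note that 26379 is the default Sentinel port. Replication metrics such as redis_master_link_up are reported by the Redis instances themselves, so an exporter pointed at the data port (6379 by default) is needed for the replication alert further down to have data.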
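Once the exporter is running, its /metrics endpoint can be checked by hand before Prometheus scrapes it. The following is a minimal sketch of such a check in Python, assuming the exporter is reachable on the listen address used above. The helper names (parse_sample, redis_is_up, fetch_metrics) and the localhost URL in the closing comment are illustrative and not part of redis_exporter.

```python
import urllib.request


def parse_sample(line):
    """Parse one sample line of the Prometheus text format.

    Returns (metric_name, value) or None if the line has no value.
    """
    if "{" in line:
        # Labels may contain spaces, so cut at the last closing brace.
        name, _, tail = line.partition("{")
        _, _, rest = tail.rpartition("}")
    else:
        name, _, rest = line.partition(" ")
    fields = rest.split()
    if not fields:
        return None
    return name.strip(), float(fields[0])


def redis_is_up(metrics_text):
    """Return True if the scrape output contains redis_up with value 1."""
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        sample = parse_sample(line)
        if sample and sample[0] == "redis_up":
            return sample[1] == 1.0
    return False


def fetch_metrics(url, timeout=5.0):
    """Download the raw metrics text from an exporter endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example (assumed address, adjust to the real exporter host):
# print(redis_is_up(fetch_metrics("http://localhost:9121/metrics")))
```

A result of False means either that the exporter could not reach Redis or that the redis_up sample is missing from the output.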
redis_exporter also ships a Grafana dashboard for visualizing the collected metrics.
Prometheus server configuration:
global:
  scrape_interval: 10s
  scrape_timeout: 10s
  evaluation_interval: 1m
alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets:
            - "xxxx:9093"
rule_files:
  - /usr/local/prometheus/rules/*.rules
scrape_configs:
  - job_name: 'file-ds'
    file_sd_configs:
      - refresh_interval: 1m
        files:
          - ./conf.d/targets*.json
  - job_name: 'redis'
    file_sd_configs:
      - refresh_interval: 1m
        files:
          - ./conf.d/redis*.json
[xxx@prometheus01 prometheus]# cat conf.d/redis-discovery.json
[
  {
    "targets": ["xxxx:9121"],
    "labels": {
      "instance": "preproduct-redis-xxx:9121"
    }
  }
]
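When there are many Redis hosts, the file_sd target file can be generated rather than edited by hand. The sketch below shows one possible approach in Python. It assumes one exporter per Redis instance and the same "targets" plus "instance" label layout as the file above. The function names and the addresses in the closing comment are hypothetical.

```python
import json
import os
import tempfile


def build_file_sd(targets):
    """Turn {exporter_address: instance_label} into file_sd target groups."""
    return [
        {"targets": [addr], "labels": {"instance": label}}
        for addr, label in targets.items()
    ]


def render_file_sd(targets):
    """Serialize the target groups as indented JSON."""
    return json.dumps(build_file_sd(targets), indent=2)


def write_file_sd(path, targets):
    """Write the target file via a temp file and rename.

    The rename keeps Prometheus from reading a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(render_file_sd(targets))
    # mkstemp creates the file as 0600; make it readable by the
    # Prometheus user, which is often a different account.
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, path)


# Example (hypothetical addresses and labels):
# write_file_sd("conf.d/redis-discovery.json",
#               {"10.0.0.1:9121": "redis-a:9121"})
```

The temporary file name does not match the redis*.json glob, so Prometheus only ever picks up the finished file.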
Add alerting rules:
[xxx@prometheus01 prometheus]# cat /usr/local/prometheus/rules/redis_alert.rules
groups:
  - name: RedisStatsAlert
    rules:
      - alert: Redis is down
        expr: redis_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} Redis is down"
          description: "Redis database is down. This requires immediate action!"
      # Disabled. A duration of 0 does not by itself mean the last bgsave failed;
      # the summary refers to rdb_last_bgsave_status, which is likely the intended signal.
      # - alert: Last RDB file creation failed
      #   expr: redis_rdb_last_bgsave_duration_sec == 0
      #   for: 1m
      #   labels:
      #     severity: warning
      #   annotations:
      #     summary: "Instance {{ $labels.instance }} rdb_last_bgsave_status"
      #     description: "The most recent attempt to create an RDB file failed"
      - alert: master link status (current state of the replication link)
        expr: redis_master_link_up == 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }} replication link is currently down"
          description: "redis_master_link_up is 0: the replication link to the master is down"
      # Disabled. As with the RDB rule, a rewrite duration of 0 does not by itself indicate failure.
      # - alert: Last AOF file creation failed
      #   expr: redis_aof_last_rewrite_duration_sec == 0
      #   for: 1m
      #   labels:
      #     severity: warning
      #   annotations:
      #     summary: "Instance {{ $labels.instance }} redis aof last rewrite duration sec"
      #     description: "The most recent attempt to create an AOF file failed"
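The rules above use for: 1m together with evaluation_interval: 1m, so an alert is first pending and only fires once the expression has held across evaluations for the full duration. The Python sketch below is a simplified model of that behavior for a single series, not Prometheus code. The function name and the timestamps in the usage comment are made up for illustration.

```python
def alert_states(evaluations, for_seconds):
    """Model the pending/firing lifecycle of one alert series.

    evaluations: list of (timestamp_seconds, condition_is_true) tuples,
                 one per rule evaluation, in time order.
    for_seconds: the rule's `for` duration in seconds.
    Returns the state after each evaluation.
    """
    states = []
    active_since = None  # time the condition first became true
    for timestamp, condition in evaluations:
        if not condition:
            # The expression returned nothing, so the alert resets.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        held_for = timestamp - active_since
        states.append("firing" if held_for >= for_seconds else "pending")
    return states


# Example with one evaluation per minute and for: 1m
# alert_states([(0, True), (60, True), (120, False), (180, True)], 60)
```

In this model a Redis outage detected at one evaluation only fires at the next one, so with these settings the alert takes about one to two minutes to fire after redis_up drops to 0.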