Prometheus & Alertmanager: configuration options for scraping, rule evaluation, and alerting

Prometheus

global configuration

  • scrape_interval
    How often metrics are scraped from targets by default
# How frequently to scrape targets by default.
[ scrape_interval: <duration> | default = 1m ]
  • scrape_timeout
# How long until a scrape request times out.
[ scrape_timeout: <duration> | default = 10s ]
  • evaluation_interval
    How often rules are evaluated
# How frequently to evaluate rules.
[ evaluation_interval: <duration> | default = 1m ]
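
As a rough illustration of how the three global settings above fit together, here is a minimal sketch of the global block in prometheus.yml. The 30s values are arbitrary choices for the example, not recommendations from the original text.

```yaml
# prometheus.yml (fragment); values are illustrative assumptions
global:
  scrape_interval: 30s      # default 1m: scrape every target every 30s unless a job overrides it
  scrape_timeout: 10s       # default 10s: must not be greater than scrape_interval
  evaluation_interval: 30s  # default 1m: evaluate recording and alerting rules every 30s
```

Any of the three keys can be omitted, in which case the documented default applies.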

scrape_config settings under scrape_configs

  • scrape_interval
# How frequently to scrape targets from this job.
[ scrape_interval: <duration> | default = <global_config.scrape_interval> ]
  • scrape_timeout
# Per-scrape timeout when scraping this job.
[ scrape_timeout: <duration> | default = <global_config.scrape_timeout> ]
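
The per-job overrides above might look like the following sketch. The job names and target addresses are hypothetical and only serve to show one job that overrides the global values and one that inherits them.

```yaml
# prometheus.yml (fragment); job names and targets are hypothetical
scrape_configs:
  - job_name: node
    scrape_interval: 15s    # overrides global.scrape_interval for this job only
    scrape_timeout: 5s      # overrides global.scrape_timeout; kept below the job's scrape_interval
    static_configs:
      - targets: ["localhost:9100"]

  - job_name: slow-exporter
    # no scrape_interval or scrape_timeout here, so the global values apply
    static_configs:
      - targets: ["localhost:9200"]
```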

alertmanager_config settings under alerting.alertmanagers

  • timeout
# Per-target Alertmanager timeout when pushing alerts.
[ timeout: <duration> | default = 10s ]
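
A minimal sketch of where this timeout sits in prometheus.yml, assuming a single Alertmanager instance at a hypothetical local address:

```yaml
# prometheus.yml (fragment); the target address is a hypothetical example
alerting:
  alertmanagers:
    - timeout: 5s           # default 10s: give up pushing alerts to this Alertmanager after 5s
      static_configs:
        - targets: ["localhost:9093"]
```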

Alertmanager

global configuration

  • resolve_timeout

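resolve_timeout is the default Alertmanager applies to alerts that do not include EndsAt: if such an alert has not been updated by the time this duration has passed, Alertmanager declares it resolved.
This setting has no effect on alerts from Prometheus, because they always include EndsAt.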

 # ResolveTimeout is the default value used by alertmanager if the alert does
 # not include EndsAt, after this time passes it can declare the alert as resolved if it has not been updated.
 # This has no impact on alerts from Prometheus, as they always include EndsAt.
 [ resolve_timeout: <duration> | default = 5m ]
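
For orientation, this is roughly where the setting lives in alertmanager.yml. The value shown simply restates the documented default.

```yaml
# alertmanager.yml (fragment)
global:
  resolve_timeout: 5m   # only used for alerts that arrive without EndsAt; alerts from Prometheus always set it
```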

route configuration

  • group_wait
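    How long to wait before sending the first notification for a group of alerts. The wait gives an inhibiting alert time to arrive, or lets the group collect more initial alerts so they are sent together. (Usually 0s to a few minutes.)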
# How long to initially wait to send a notification for a group
# of alerts. Allows to wait for an inhibiting alert to arrive or collect
# more initial alerts for the same group. (Usually ~0s to few minutes.)
# If omitted, child routes inherit the group_wait of the parent route.
[ group_wait: <duration> | default = 30s ]
  • group_interval

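How long to wait before sending another notification when new alerts join a group whose initial notification has already been sent. (Usually 5 minutes or more.)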

# How long to wait before sending a notification about new alerts that
# are added to a group of alerts for which an initial notification has
# already been sent. (Usually ~5m or more.) If omitted, child routes
# inherit the group_interval of the parent route.
[ group_interval: <duration> | default = 5m ]
  • repeat_interval

How long to wait before sending a notification again for an alert that has already been sent successfully. (Usually 3 hours or more.)

# How long to wait before sending a notification again if it has already
# been sent successfully for an alert. (Usually ~3h or more). If omitted,
# child routes inherit the repeat_interval of the parent route.
# Note that this parameter is implicitly bound by Alertmanager's
# `--data.retention` configuration flag. Notifications will be resent after either
# repeat_interval or the data retention period have passed, whichever
# occurs first. `repeat_interval` should be a multiple of `group_interval`.
[ repeat_interval: <duration> | default = 4h ]

Note: each child route under route.routes can also set group_wait, group_interval, and repeat_interval. Values set there override the same three settings inherited from the parent route.
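
The override behavior for child routes can be sketched as follows. The receiver names, the group_by label, and the severity matcher are assumptions made up for this example. The child route overrides two of the three timing settings and inherits the third.

```yaml
# alertmanager.yml (fragment); receiver names and the matcher are hypothetical
route:
  receiver: default-receiver
  group_by: ["alertname"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers:
        - severity="critical"
      receiver: pager
      group_wait: 10s       # overrides the parent's 30s
      repeat_interval: 1h   # overrides the parent's 4h
      # group_interval is omitted, so the parent's 5m is inherited

receivers:
  - name: default-receiver
  - name: pager
```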

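group_wait: when Alertmanager receives a new alert (one appearing for the first time), how long it waits before sending that alert to the receiver.

group_interval: for an alert that has already appeared, Alertmanager checks it once every group_interval.

repeat_interval: for an alert that has already appeared, Alertmanager resends it to the receiver every repeat_interval.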

https://kkgithub.com/prometheus/alertmanager/issues/2647
https://blog.youkuaiyun.com/acdsxdas/article/details/143477501
https://blog.youkuaiyun.com/qq_37843943/article/details/120665690

Analysis

1. Prometheus: however long the for (duration) of an alerting rule is, once the first alert has been produced the rule is re-checked only once per evaluation_interval, which defaults to 1m.

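2. Alertmanager: after receiving a new alert, Alertmanager waits for group_wait and uses that time to group, update, and silence the alert. Once group_wait has passed for the first alert, Alertmanager checks the alert every group_interval to decide whether anything needs to be done with it. Only after n such checks, when n * group_interval first exceeds repeat_interval, does Alertmanager send the alert to its receiver again.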

group_wait: 10s
group_interval: 20s
repeat_interval: 30s

Time from the threshold first being exceeded to ops-alert:

60s (evaluation_interval) + 2 * 20s (group_interval) = 1m40s

Time from Alertmanager to ops-alert:

2 * 20s (group_interval) = 40s
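
To make the inputs to the calculation above explicit, here is a sketch of the two configuration fragments it relies on. It assumes that ops-alert is the name of an Alertmanager receiver, and the webhook URL is a hypothetical placeholder. Neither detail is confirmed by the original text.

```yaml
# prometheus.yml (fragment)
global:
  evaluation_interval: 1m    # the 60s term in the calculation above
---
# alertmanager.yml (fragment); receiver type and URL are assumptions
route:
  receiver: ops-alert
  group_wait: 10s
  group_interval: 20s
  repeat_interval: 30s       # not a multiple of group_interval: the first check past 30s is 2 * 20s = 40s

receivers:
  - name: ops-alert
    webhook_configs:
      - url: "http://localhost:5001/"   # hypothetical endpoint
```

Because repeat_interval here is not a multiple of group_interval, the resend falls on the next group_interval check rather than at exactly 30s, which is where the 2 * 20s term comes from.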