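Prometheus defines alerting rule expressions in its configuration files. When the scraped metrics, after aggregation, satisfy an alerting expression, an alert fires and is sent to the alerting service, Alertmanager. This article therefore focuses on the notifier manager (notifierManager), which talks to Alertmanager, but it first walks through part of the rule manager (ruleManager), because the final decision on whether an alerting rule fires is made by the rule manager.
1: The rule manager (ruleManager) sends alerts to the notifier manager (notifierManager)
(1) The rule manager (ruleManager) is instantiated by calling NewManager, which takes a ManagerOptions struct as its argument. The field to note in that struct is NotifyFunc, which is set to the function returned by sendAlerts.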
prometheus/cmd/prometheus/main.go
ruleManager = rules.NewManager(&rules.ManagerOptions{
    Appendable:      fanoutStorage,
    TSDB:            localStorage,
    QueryFunc:       rules.EngineQueryFunc(queryEngine, fanoutStorage),
    // When an alerting rule fires, the alerts are sent through sendAlerts.
    NotifyFunc:      sendAlerts(notifierManager, cfg.web.ExternalURL.String()),
    Context:         ctxRule,
    ExternalURL:     cfg.web.ExternalURL,
    Registerer:      prometheus.DefaultRegisterer,
    Logger:          log.With(logger, "component", "rule manager"),
    OutageTolerance: time.Duration(cfg.outageTolerance),
    ForGracePeriod:  time.Duration(cfg.forGracePeriod),
    ResendDelay:     time.Duration(cfg.resendDelay),
})
prometheus/rules/manager.go
type ManagerOptions struct {
    ExternalURL     *url.URL
    QueryFunc       QueryFunc
    NotifyFunc      NotifyFunc
    Context         context.Context
    Appendable      Appendable
    TSDB            storage.Storage
    Logger          log.Logger
    Registerer      prometheus.Registerer
    OutageTolerance time.Duration
    ForGracePeriod  time.Duration
    ResendDelay     time.Duration
    Metrics         *Metrics
}
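The wiring in step (1) is a plain function-injection pattern: the manager only knows about a function-typed field and calls it when alerts are ready. The following is a minimal sketch of that idea, not the Prometheus implementation. Alert, Options, Manager, fire, recorder, newNotify and demoNotify are hypothetical names invented for illustration; only the shape of NotifyFunc follows the sendAlerts listing.

```go
package main

import (
	"context"
	"fmt"
)

// Alert is a simplified, hypothetical stand-in for rules.Alert.
type Alert struct {
	Name string
}

// NotifyFunc has the same shape as the closure returned by sendAlerts.
type NotifyFunc func(ctx context.Context, expr string, alerts ...*Alert)

// Options is a cut-down, hypothetical version of ManagerOptions.
type Options struct {
	NotifyFunc NotifyFunc
}

// Manager is a hypothetical rule manager that only stores its options.
type Manager struct {
	opts *Options
}

// NewManager stores the options, including the injected notify function.
func NewManager(o *Options) *Manager {
	return &Manager{opts: o}
}

// fire simulates a rule evaluation that produced alerts and forwards
// them through the injected NotifyFunc.
func (m *Manager) fire(ctx context.Context, expr string, alerts ...*Alert) {
	m.opts.NotifyFunc(ctx, expr, alerts...)
}

// recorder is a hypothetical sink that plays the role of the notifier.
type recorder struct {
	got []string
}

// newNotify returns a closure bound to the recorder, in the same way
// sendAlerts returns a closure bound to the notifier manager.
func newNotify(r *recorder) NotifyFunc {
	return func(_ context.Context, expr string, alerts ...*Alert) {
		for _, a := range alerts {
			r.got = append(r.got, expr+":"+a.Name)
		}
	}
}

// demoNotify wires everything together and returns what the sink saw.
func demoNotify() []string {
	r := &recorder{}
	m := NewManager(&Options{NotifyFunc: newNotify(r)})
	m.fire(context.Background(), "up == 0", &Alert{Name: "InstanceDown"})
	return r.got
}

func main() {
	fmt.Println(demoNotify())
}
```

The manager never imports the notifier package in this sketch; the dependency points the other way, through the closure.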
(2) sendAlerts mainly converts the alerts produced by the rule manager (ruleManager) into the notifier.Alert type and passes them on through Send.
prometheus/cmd/prometheus/main.go
// Iterate over the alerts, build a notifier.Alert for each one, and send them if there is at least one.
// sendAlerts implements the rules.NotifyFunc for a Notifier.
func sendAlerts(s sender, externalURL string) rules.NotifyFunc {
    return func(ctx context.Context, expr string, alerts ...*rules.Alert) {
        var res []*notifier.Alert
        // Iterate over the alerts.
        for _, alert := range alerts {
            a := &notifier.Alert{
                StartsAt:     alert.FiredAt,
                Labels:       alert.Labels,
                Annotations:  alert.Annotations,
                GeneratorURL: externalURL + strutil.TableLinkForExpression(expr),
            }
            // If the alert has been resolved, use ResolvedAt as its end time.
            if !alert.ResolvedAt.IsZero() {
                a.EndsAt = alert.ResolvedAt
            } else {
                // If the alert is still active, use ValidUntil as its end time.
                a.EndsAt = alert.ValidUntil
            }
            res = append(res, a)
        }
        // Send only if there is at least one alert.
        if len(alerts) > 0 {
            s.Send(res...)
        }
    }
}
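The choice of EndsAt in sendAlerts is easy to check in isolation. The sketch below reproduces only that branch with simplified, hypothetical types (ruleAlert, outAlert, convert, demoEndsAt); the timestamps are made up for the example and the real structs carry more fields.

```go
package main

import (
	"fmt"
	"time"
)

// ruleAlert is a hypothetical subset of rules.Alert.
type ruleAlert struct {
	FiredAt    time.Time
	ResolvedAt time.Time // zero while the alert is still active
	ValidUntil time.Time
}

// outAlert is a hypothetical subset of notifier.Alert.
type outAlert struct {
	StartsAt time.Time
	EndsAt   time.Time
}

// convert mirrors the EndsAt branch from the sendAlerts listing:
// a resolved alert ends at ResolvedAt, an active one at ValidUntil.
func convert(a ruleAlert) outAlert {
	out := outAlert{StartsAt: a.FiredAt}
	if !a.ResolvedAt.IsZero() {
		out.EndsAt = a.ResolvedAt
	} else {
		out.EndsAt = a.ValidUntil
	}
	return out
}

// demoEndsAt returns EndsAt minus StartsAt for a resolved alert and for
// an active alert, using made-up timestamps.
func demoEndsAt() (resolved, active time.Duration) {
	fired := time.Date(2020, 1, 1, 0, 0, 0, 0, time.UTC)

	r := convert(ruleAlert{
		FiredAt:    fired,
		ResolvedAt: fired.Add(5 * time.Minute),
		ValidUntil: fired.Add(9 * time.Minute),
	})
	a := convert(ruleAlert{
		FiredAt:    fired,
		ValidUntil: fired.Add(4 * time.Minute),
	})
	return r.EndsAt.Sub(r.StartsAt), a.EndsAt.Sub(a.StartsAt)
}

func main() {
	resolved, active := demoEndsAt()
	fmt.Println("resolved alert lasts:", resolved)
	fmt.Println("active alert is valid for:", active)
}
```

For an active alert, EndsAt works as an expiry time rather than a real end time: it is simply whatever ValidUntil the rule manager attached to the alert.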
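(3) Send then appends the notifier.Alert values to the alert queue n.queue. The queue has a size limit: when it is full, the oldest alerts are dropped first (first in, first out), so the most recent alerts are kept. The alerts then wait in the queue until the notifier manager (notifierManager) takes them out.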
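Before reading the real listing, the queue behaviour described in step (3) can be sketched on its own. This is a simplified model under stated assumptions, not the notifier code: boundedQueue, push and demoQueue are hypothetical names, alerts are represented as plain strings, and locking is left out.

```go
package main

import "fmt"

// boundedQueue is a hypothetical fixed-capacity queue that keeps the
// newest items and drops the oldest ones when it overflows.
type boundedQueue struct {
	capacity int
	items    []string
	dropped  int // number of items discarded so far
}

// push appends a batch, discarding the oldest entries as needed.
func (q *boundedQueue) push(batch ...string) {
	// If the batch alone exceeds the capacity, keep only its newest tail.
	if d := len(batch) - q.capacity; d > 0 {
		batch = batch[d:]
		q.dropped += d
	}
	// If queue plus batch exceeds the capacity, drop from the front
	// of the queue, which holds the oldest entries.
	if d := len(q.items) + len(batch) - q.capacity; d > 0 {
		q.items = q.items[d:]
		q.dropped += d
	}
	q.items = append(q.items, batch...)
}

// demoQueue pushes three batches into a queue of capacity 3 and returns
// the final contents together with the number of dropped items.
func demoQueue() ([]string, int) {
	q := &boundedQueue{capacity: 3}
	q.push("a", "b")           // queue: a b
	q.push("c", "d")           // "a" is dropped, queue: b c d
	q.push("e", "f", "g", "h") // batch is trimmed to f g h, b c d are dropped
	return q.items, q.dropped
}

func main() {
	items, dropped := demoQueue()
	fmt.Println(items, dropped)
}
```

Dropping from the front trades completeness for freshness: if Alertmanager is slow or unreachable, stale alerts are discarded rather than blocking new ones.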
prometheus/notifier/notifier.go
// Send queues the given notification requests for processing.
// Panics if called on a handler that is not running.
func (n *Manager) Send(alerts ...*Alert) {
    // Locking is not covered in detail here.
    n.mtx.Lock()
    defer n.mtx.Unlock()
    // Attach external labels before relabelling and sending.
    // Relabel each alert's labels according to the relabel_config entries under alert_relabel_configs in prometheus.yml.
    for _, a := rang