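Preface
This is one installment in a DevOps series that I hope will be useful to anyone who enjoys learning and exploring.
The articles record concise, efficient deployment and operations commands so that a system can be built quickly. Like an operations manual, they are meant to make it easy to rebuild, modify, and optimize a system. Each article stands alone and can be used by itself to deploy and operate one component.
This chapter is DevOps Series (7): Monitoring and Alerting with Grafana + Prometheus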
Outline
DevOps Series (6): Setting Up Grafana + Prometheus
DevOps Series (7): Monitoring and Alerting with Grafana + Prometheus
DevOps Series (8): Log Monitoring and Alerting with EFK + Prometheus + Grafana
HTTP endpoint monitoring and alerting with prometheus + blackbox-exporter + alertmanager
Install blackbox-exporter
docker run --restart=always -d --name blackbox-exporter -p 9115:9115 prom/blackbox-exporter
Once the container is running, open http://localhost:9115 to confirm that the exporter is up.
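Besides opening the page in a browser, the exporter can be exercised from the command line. The following is a minimal sketch, assuming the container from the docker run command above is listening on localhost:9115 and still uses the image's default configuration, which ships an http_2xx module. The target https://example.com is only a placeholder; substitute a URL you actually want to check.

```shell
#!/bin/sh
# Address of the exporter started by the docker run command above.
EXPORTER="${EXPORTER:-http://localhost:9115}"
# Placeholder URL to probe (an assumption, not from the article).
TARGET="${TARGET:-https://example.com}"
# Module defined in the exporter's default blackbox.yml.
MODULE="http_2xx"

# The exporter runs one probe per request to /probe.
PROBE_URL="${EXPORTER}/probe?target=${TARGET}&module=${MODULE}"
echo "$PROBE_URL"

# probe_success 1 means the probe passed, 0 means it failed.
# "|| true" keeps the script from aborting when the exporter is unreachable.
curl -s --max-time 10 "$PROBE_URL" | grep '^probe_success' || true
```

This is the same request Prometheus sends once the scrape job is configured, so it is a quick way to separate exporter problems from Prometheus configuration problems.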
Edit prometheus.yml
rule_files:
  - "blackbox_rules.yml"
scrape_configs:
  - job_name: 'blackbox-http'
    metrics_path: /probe
    params:
      module: [http_2xx]
    file_sd_configs:
      - files: ['http_check.yml']
        refresh_interval: 10s
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.20.2:9115  # host and port where blackbox-exporter is running
Add blackbox_rules.yml under rule_files.
Add the job under scrape_configs.
Add the endpoints to check in http_check.yml.
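As an illustration of what http_check.yml might contain, here is a minimal sketch in the file-based service discovery format. The target URL and all label values are hypothetical. The label names ip, domain, env, and service are chosen only because the alert descriptions in blackbox_rules.yml reference them. The sketch assumes the file sits next to prometheus.yml so that the relative path in file_sd_configs resolves.

```shell
# Write a sample target file; every value below is a placeholder.
cat > http_check.yml <<'EOF'
- targets:
    - https://example.com/health
  labels:
    ip: "192.0.2.10"
    domain: example.com
    env: prod
    service: demo-api
EOF

# Show the result for a quick visual check.
cat http_check.yml
```

Each entry in targets becomes the target parameter of a probe request through the relabel rules above, and the labels are attached to the resulting series, so they are available in alert annotations.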
vi blackbox_rules.yml
groups:
  - name: Service probes
    rules:
      - alert: BlackboxProbeFailed
        expr: probe_success == 0
        for: 0m
        labels:
          severity: critical
          team: node
        annotations:
          summary: Blackbox probe failed (instance {{ $labels.instance }})
          description: "Service availability check failed\nCurrent value = {{ $value }}\nIP = {{ $labels.ip }}\nDomain = {{ $labels.domain }}\nEnv = {{ $labels.env }}\nService = {{ $labels.service }}"
      - alert: BlackboxProbeHttpFailure
        expr: probe_http_status_code <= 199 OR probe_http_status_code >= 400
        for: 0m
        labels:
          severity: critical
          team: node
        annotations:
          summary: Blackbox probe HTTP failure (instance {{ $labels.instance }})
          description: "HTTP status code is outside 200-399\nCurrent value = {{ $value }}\nIP = {{ $labels.ip }}\nDomain = {{ $labels.domain }}\nEnv = {{ $labels.env }}\nService = {{ $labels.service }}"
      - alert: BlackboxSslCertificateWillExpireSoon
        expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Blackbox SSL certificate will expire soon (instance {{ $labels.instance }})
          description: "SSL