Prometheus + Grafana + node_exporter + Alertmanager

东京的雪铺满巴黎的道

Tags: linux, server operations

Article link: https://blog.youkuaiyun.com/xiaoxiaoxiao_Ming/article/details/120673719

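This article explains how to install and configure node_exporter to expose host metrics, Prometheus to collect them, and Grafana to visualize them, and how to send alert notifications through Alertmanager and DingTalk. The steps cover download, deployment, alert rule setup, and notification configuration.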

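I. Install node_exporter (exposes the metrics endpoint)
1. Download the package: https://prometheus.io/download/
2. Extract: tar xf node_exporter-0.15.0.linux-amd64.tar.gz
3. Start (from the extracted directory): ./node_exporter &
4. Verify: curl localhost:9100/metrics (node_exporter listens on port 9100 and serves its metrics at /metrics)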
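The verification step can be made a little stricter than eyeballing the curl output. The following is a minimal sketch, assuming node_exporter runs locally on its default port; the helper name has_metric is illustrative and not part of node_exporter.

```shell
#!/bin/sh
# has_metric NAME
# Reads Prometheus text-format metrics on stdin and succeeds if at least one
# sample line for NAME exists, either "NAME value" or "NAME{labels} value".
# Comment lines (# HELP / # TYPE) do not count as samples.
has_metric() {
  grep -Eq "^$1(\{| )"
}

# Example usage (assumes node_exporter is listening on localhost:9100):
#   curl -s localhost:9100/metrics | has_metric node_load1 && echo "node_exporter is up"
```

node_load1 is used here only because it is a simple gauge that the Linux build exposes by default; any other metric name would work the same way.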

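II. Install and deploy Prometheus
1. Download the package: https://prometheus.io/download/
2. Extract: tar xf prometheus-2.0.0-rc.2.linux-amd64.tar.gz
3. Start: ./prometheus --config.file=prometheus.yml &
4. Open http://<server IP>:9090 to confirm that Prometheus is running
5. Add a scrape job for node_exporter to prometheus.yml and restart Prometheus so that system metrics can be queried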
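Step 5 does not spell out what to add. As a minimal sketch, assuming node_exporter runs on the same host at localhost:9100, an entry for the scrape_configs section could be generated as below. The helper scrape_job and the job name are illustrative choices, not something the original prescribes.

```shell
#!/bin/sh
# scrape_job JOB TARGET
# Prints one scrape_configs entry (YAML) for a single static target,
# indented so it can sit directly under the "scrape_configs:" key.
scrape_job() {
  printf '  - job_name: "%s"\n' "$1"
  printf '    static_configs:\n'
  printf '      - targets: ["%s"]\n' "$2"
}

# Example usage:
#   scrape_job node_exporter localhost:9100
#   ./promtool check config prometheus.yml   # promtool ships in the Prometheus tarball
```

Appending the output with >> is only safe when scrape_configs is the last section of prometheus.yml; otherwise paste the lines under that key by hand, then validate the file with promtool before restarting.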

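III. Install Grafana
1. Download and install: yum localinstall https://dl.grafana.com/oss/release/grafana-7.2.0-1.x86_64.rpm
Default install path: /usr/share/grafana; plugin directory: /usr/share/grafana/public/app/plugins/
2. Start: service grafana-server start (stop and restart: service grafana-server stop; service grafana-server restart). If you installed from the standalone tarball instead, start it with: ./bin/grafana-server web &
3. Access: open http://<server IP>:3000 (if the page is unreachable, open port 3000 in the firewall and in any security group rules)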

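4. Configure the data source

Open Data Sources.

Select Prometheus and enter the URL of the Prometheus server (for example http://<server IP>:9090).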
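The same data source can also be declared in a provisioning file instead of through the UI. The sketch below assumes an RPM install, where Grafana reads provisioning files from /etc/grafana/provisioning/datasources/, and a Prometheus server at http://localhost:9090. The helper write_datasource and the file name are hypothetical.

```shell
#!/bin/sh
# write_datasource FILE URL
# Writes a Grafana data source provisioning file that registers a
# Prometheus data source pointing at URL.
write_datasource() {
  cat > "$1" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $2
    isDefault: true
EOF
}

# Example usage (paths are assumptions for an RPM install):
#   write_datasource /etc/grafana/provisioning/datasources/prometheus.yaml http://localhost:9090
#   service grafana-server restart
```

Grafana loads provisioning files at startup, so a restart is needed before the data source shows up.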

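5. Set up dashboards
Download the dashboard you need from https://grafana.com/grafana/dashboards (ID 12633 is recommended). Check that the dashboard is compatible with your Grafana version; an incompatible dashboard can leave some plugins unusable.

Import the downloaded dashboard JSON file and select your Prometheus data source.

When the import finishes, open the dashboard to view the collected metrics.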

IV. Install Alertmanager and send alerts to DingTalk

1. Deploy prometheus-webhook-dingtalk

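=============================Package deployment===========================
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v0.3.0/prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz
tar xf prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz

# Start the service from the extracted directory.
# "webhook1" is the profile name; the service then accepts alerts at /dingtalk/webhook1/send
./prometheus-webhook-dingtalk --ding.profile="webhook1=https://oapi.dingtalk.com/robot/send?access_token={replace with your own DingTalk token}"

==============================Docker deployment=========================
docker pull timonwong/prometheus-webhook-dingtalk

# Start the container (the image name must match the one pulled above).
# --ding.profile is the v0.3.0 syntax; newer image versions may expect a config file instead.
docker run -d -p 8060:8060 --name webhook timonwong/prometheus-webhook-dingtalk --ding.profile="webhook1=https://oapi.dingtalk.com/robot/send?access_token={replace with your own DingTalk token}"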
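Before wiring the webhook into Alertmanager, it can help to confirm that the DingTalk robot token itself works. The following is a minimal sketch that builds the JSON body of a DingTalk robot text message; the helper name dingtalk_text_payload is illustrative, and the message is inserted verbatim, so it must not contain double quotes or backslashes.

```shell
#!/bin/sh
# dingtalk_text_payload MESSAGE
# Prints the JSON body for a DingTalk robot "text" message.
# MESSAGE is not escaped: avoid double quotes and backslashes in it.
dingtalk_text_payload() {
  printf '{"msgtype":"text","text":{"content":"%s"}}\n' "$1"
}

# Example usage (replace <token> with your own robot token):
#   curl -s -H 'Content-Type: application/json' \
#        -d "$(dingtalk_text_payload 'alert test')" \
#        "https://oapi.dingtalk.com/robot/send?access_token=<token>"
```

If the robot is configured with a keyword filter, the message probably has to contain that keyword for DingTalk to accept it.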

2. Deploy Alertmanager

Download the package: https://prometheus.io/download/

# Extract
tar zxf alertmanager-0.23.0.linux-amd64.tar.gz
cd alertmanager-0.23.0.linux-amd64

Edit alertmanager.yml and set url to the prometheus-webhook-dingtalk address. The path segment after /dingtalk/ must be the profile name passed to --ding.profile above (webhook1):

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://localhost:8060/dingtalk/webhook1/send'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

Start the service: ./alertmanager --config.file=alertmanager.yml &
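Once Alertmanager is running, the route to DingTalk can be exercised without waiting for a real alert. The sketch below builds a minimal alert list for Alertmanager's v2 API, assuming the default listen address localhost:9093; the helper name test_alert_payload and the alert name TestAlert are made up for illustration.

```shell
#!/bin/sh
# test_alert_payload ALERTNAME SEVERITY
# Prints a JSON array with one alert carrying only the two labels given.
# Values are not escaped: avoid double quotes and backslashes in them.
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"%s"}}]\n' "$1" "$2"
}

# Example usage:
#   ./amtool check-config alertmanager.yml   # amtool ships in the Alertmanager tarball
#   curl -s -XPOST -H 'Content-Type: application/json' \
#        -d "$(test_alert_payload TestAlert warning)" \
#        http://localhost:9093/api/v2/alerts
```

With group_wait set to 30s as in the configuration above, the DingTalk message should arrive roughly half a minute after the POST.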

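3. Create the alert rule files (any location works; keeping them in the same directory as prometheus.yml is recommended). Note that the metric names used below (node_cpu_seconds_total, node_memory_*_bytes) exist only in node_exporter 0.16.0 and later; with an older release such as 0.15.0 these expressions return no data.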

cpu_rule.yml

groups:
- name: CPU alert rules
  rules:
  - alert: HighCPUUsage
    expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[1m]))) * 100 > 90
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "Server {{ $labels.instance }}: CPU usage above 90%! (current value: {{ $value }}%)"

memory_rule.yml

groups:
- name: Memory alert rules
  rules:
  - alert: HighMemoryUsage
    expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 80
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "Server {{ $labels.instance }}: memory usage above 80%! (current value: {{ $value }}%)"
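To sanity-check the memory threshold against a real host, the same arithmetic can be applied directly to /proc/meminfo, which is where node_exporter reads these values on Linux. This is a minimal sketch; the helper name mem_used_percent is illustrative.

```shell
#!/bin/sh
# mem_used_percent
# Reads /proc/meminfo-formatted text on stdin and prints
#   (MemTotal - (MemFree + Buffers + Cached)) / MemTotal * 100
# with one decimal place, mirroring the memory rule's expression.
mem_used_percent() {
  awk '
    $1 == "MemTotal:" { total   = $2 }
    $1 == "MemFree:"  { free    = $2 }
    $1 == "Buffers:"  { buffers = $2 }
    $1 == "Cached:"   { cached  = $2 }
    END {
      if (total > 0)
        printf "%.1f\n", (total - (free + buffers + cached)) / total * 100
    }
  '
}

# Example usage on the monitored host:
#   mem_used_percent < /proc/meminfo
```

If the printed value sits far from what the Grafana dashboard shows, the dashboard is probably using a different definition of used memory (for example one based on MemAvailable).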

4. Edit prometheus.yml, mainly the alerting and rule_files sections, as follows:

global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/app/middleware/prometheus-2.31.1/cpu_rule.yml"
  - "/app/middleware/prometheus-2.31.1/memory_rule.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]


  - job_name: node_exporter 
    static_configs:
      - targets: ['localhost:9100']
        labels:
          instance: node_exporter
          group: node_exporter

Restart Prometheus and open http://<server IP>:9090/; the configured alert rules appear on the Alerts page.
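If the Alerts page shows fewer rules than expected, one quick cross-check is to count the rules defined on disk. The sketch below assumes rule files laid out like cpu_rule.yml and memory_rule.yml above; the helper name count_alert_rules is illustrative.

```shell
#!/bin/sh
# count_alert_rules FILE...
# Prints the number of "- alert:" entries across the given rule files.
count_alert_rules() {
  cat "$@" | grep -c '^[[:space:]]*-[[:space:]]*alert:'
}

# Example usage (run in the directory holding the rule files):
#   count_alert_rules cpu_rule.yml memory_rule.yml
#   ./promtool check rules cpu_rule.yml memory_rule.yml   # syntax check before restarting
```

A mismatch between this count and the Alerts page usually points to a wrong path under rule_files or a rule file that failed to parse.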