Prometheus + Grafana + node_exporter + Alertmanager

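This article describes how to install and configure node_exporter to expose host metrics, Prometheus to collect them, Grafana to visualize them, and Alertmanager with DingTalk to deliver alert notifications. The steps cover downloading, deployment, alert rule setup, and notification configuration.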


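I. Installing node_exporter (exposes the metrics endpoint)
   1. Download the package: https://prometheus.io/download/
   2. Extract: tar xf node_exporter-0.15.0.linux-amd64.tar.gz
      Note: the alert rules later in this article use metric names such as node_cpu_seconds_total and node_memory_MemTotal_bytes, which were introduced in node_exporter 0.16.0. With 0.15.0 the equivalent metrics are named node_cpu and node_memory_MemTotal, so use a newer release or adjust the rules.
   3. Start (from the extracted directory): ./node_exporter &
   4. Verify: curl localhost:9100/metrics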
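
A bare curl only shows that the port answers. As a slightly stricter check, the output can be filtered for one metric. The following is a minimal sketch: metric_value is a hypothetical helper name, and node_load1 is assumed to be among the metrics your node_exporter exposes.

```shell
# metric_value NAME: read Prometheus text-format metrics on stdin and print
# the value of the unlabelled sample called NAME (hypothetical helper).
# Comment lines (# HELP, # TYPE) never match because their first field is "#".
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

# Example usage, assuming node_exporter listens on localhost:9100:
#   curl -s localhost:9100/metrics | metric_value node_load1
```

An empty result means the endpoint answered but the metric was not found, which usually points to a disabled collector or a different metric name in your version.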

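II. Installing and deploying Prometheus
   1. Download the package: https://prometheus.io/download/
   2. Extract: tar xf prometheus-2.0.0-rc.2.linux-amd64.tar.gz
   3. Start: ./prometheus  --config.file=prometheus.yml &
   4. Open http://<server IP>:9090 to confirm that Prometheus is running
   5. Add the node_exporter scrape job to prometheus.yml and restart so that system metrics can be queried (the complete prometheus.yml is shown in the alerting section, step 4)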


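III. Installing Grafana
   1. Download and install: yum localinstall https://dl.grafana.com/oss/release/grafana-7.2.0-1.x86_64.rpm
      Default install path: /usr/share/grafana; plugin directory: /usr/share/grafana/public/app/plugins/
   2. Start: service grafana-server start (stop and restart: service grafana-server stop; service grafana-server restart). If you installed from the standalone tarball instead, start it with: ./bin/grafana-server web &
   3. Access Grafana at http://<server IP>:3000 (note: if the page is unreachable, open port 3000 in the firewall and security policy)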
  
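  4. Configure the data source

Select Data Sources.

 Select Prometheus and enter the URL of your Prometheus server (http://<server IP>:9090 in this setup).

 5. Set up dashboards
Download the dashboard you need from https://grafana.com/grafana/dashboards (12633 is recommended). Check that the dashboard is compatible with your Grafana version; otherwise some plugins will not work.

Import the downloaded dashboard JSON file and select your Prometheus data source.

When the import finishes, open the dashboard to view the collected metrics.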

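IV. Installing Alertmanager for DingTalk alerts

1. Deploying prometheus-webhook-dingtalk

============================= Deploy from release package ===========================
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v0.3.0/prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz

# Extract and enter the directory
tar xf prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz
cd prometheus-webhook-dingtalk-0.3.0.linux-amd64

# Start the service. The value has the form <profile name>=<robot URL>;
# the profile name (webhook1 here) becomes part of the webhook URL used by Alertmanager.
./prometheus-webhook-dingtalk --ding.profile="webhook1=https://oapi.dingtalk.com/robot/send?access_token={replace with your own DingTalk token}"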



============================== Deploy with Docker =========================
docker pull timonwong/prometheus-webhook-dingtalk

# Start the container
docker run -d -p 8060:8060 --name webhook timonwong/prometheus-webhook-dingtalk --ding.profile="webhook1=https://oapi.dingtalk.com/robot/send?access_token={replace with your own DingTalk token}"
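
Before wiring up Alertmanager, it can help to confirm that the robot token itself works. The sketch below assumes a DingTalk custom robot that accepts plain text messages. dingtalk_text_payload is a hypothetical helper name, and if the robot is protected by keyword security the message must contain one of its keywords.

```shell
# dingtalk_text_payload MESSAGE: print the JSON body of a DingTalk text message.
# Hypothetical helper; MESSAGE must not contain double quotes or backslashes,
# because no JSON escaping is done here.
dingtalk_text_payload() {
  printf '{"msgtype":"text","text":{"content":"%s"}}\n' "$1"
}

# Example usage (replace the token placeholder with your own):
#   curl -s -H 'Content-Type: application/json' \
#     -d "$(dingtalk_text_payload 'Prometheus test message')" \
#     'https://oapi.dingtalk.com/robot/send?access_token={your token}'
```

If the message does not show up in the group chat, the problem lies with the token or the robot's security settings rather than with Prometheus or Alertmanager.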

2. Deploying Alertmanager

Download page: https://prometheus.io/download/

# Extract
tar zxf alertmanager-0.23.0.linux-amd64.tar.gz
cd alertmanager-0.23.0.linux-amd64

Edit alertmanager.yml and point url at the prometheus-webhook-dingtalk address. The path segment after /dingtalk/ must match the profile name passed to --ding.profile above (webhook1 in the Docker command):

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://localhost:8060/dingtalk/webhook1/send'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

Start the service: ./alertmanager --config.file=alertmanager.yml &
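
The Alertmanager-to-DingTalk path can be exercised without waiting for a real rule to fire by posting a hand-made alert. This is a sketch under assumptions: Alertmanager is listening on its default port 9093, and test_alert_payload is a hypothetical helper that builds a one-element array in the shape accepted by the v2 alerts endpoint.

```shell
# test_alert_payload NAME SEVERITY: print a JSON array holding a single alert.
# Hypothetical helper; arguments are inserted without JSON escaping, so keep
# them to plain words.
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"description":"manual test alert"}}]\n' "$1" "$2"
}

# Example usage, assuming Alertmanager runs on localhost:9093:
#   curl -s -H 'Content-Type: application/json' \
#     -d "$(test_alert_payload ManualTest warning)" \
#     http://localhost:9093/api/v2/alerts
```

With group_wait set to 30s as in the configuration above, expect the notification roughly half a minute after posting.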

3. Create the alert rule files (the location does not matter; keeping them in the same directory as prometheus.yml is recommended)

cpu_rule.yml

groups:
- name: CPU alert rules
  rules:
  - alert: CPU usage alert
    expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[1m]))) * 100 > 90
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "Server {{ $labels.instance }}: CPU usage above 90% (current value: {{ $value }}%)"

memory_rule.yml

groups:
- name: Memory alert rules
  rules:
  - alert: Memory usage alert
    expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 80
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "Server {{ $labels.instance }}: memory usage above 80% (current value: {{ $value }}%)"
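
To make the two rule expressions concrete, the arithmetic can be reproduced by hand. The sketch below is only an illustration with made-up sample values; cpu_used_percent and mem_used_percent are hypothetical helper names, not part of Prometheus.

```shell
# cpu_used_percent IDLE_RATE...: average the per-CPU idle rates (each 0..1)
# and print 100 - avg * 100, mirroring the CPU rule expression.
cpu_used_percent() {
  printf '%s\n' "$@" | awk '{ sum += $1; n++ } END { printf "%.1f\n", 100 - (sum / n) * 100 }'
}

# mem_used_percent TOTAL FREE BUFFERS CACHED: all values in bytes,
# mirroring the memory rule expression.
mem_used_percent() {
  awk -v t="$1" -v f="$2" -v b="$3" -v c="$4" \
    'BEGIN { printf "%.1f\n", (t - (f + b + c)) / t * 100 }'
}

# Two cores idle 5% and 3% of the time: average idle 4%, so 96.0% used.
cpu_used_percent 0.05 0.03
# 8 GB total, 0.8 GB free, 0.2 GB buffers, 0.4 GB cached: 82.5% used.
mem_used_percent 8000000000 800000000 200000000 400000000
```

Both sample results exceed the thresholds (90 and 80), so the corresponding alerts would become pending and fire once the condition has held for the 1m set in for.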

4. Edit prometheus.yml. The main changes are in the alerting and rule_files sections, as follows:

global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/app/middleware/prometheus-2.31.1/cpu_rule.yml"
  - "/app/middleware/prometheus-2.31.1/memory_rule.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]


  - job_name: node_exporter 
    static_configs:
      - targets: ['localhost:9100']
        labels:
          instance: node_exporter
          group: node_exporter
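
A syntax error in prometheus.yml or in a rule file prevents Prometheus from starting, so it is worth validating the files first. The sketch below assumes the promtool binary that ships in the Prometheus release tarball sits next to prometheus.yml; validate_prometheus is a hypothetical wrapper name.

```shell
# validate_prometheus DIR: run "promtool check config" on DIR/prometheus.yml.
# promtool also checks the rule files listed under rule_files.
# Returns 2 when promtool is not found in DIR (hypothetical helper).
validate_prometheus() {
  dir="$1"
  if [ ! -x "$dir/promtool" ]; then
    echo "promtool not found in $dir" >&2
    return 2
  fi
  "$dir/promtool" check config "$dir/prometheus.yml"
}

# Example usage, with the directory used in rule_files above:
#   validate_prometheus /app/middleware/prometheus-2.31.1 && echo "config OK"
```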

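Restart Prometheus and open http://<server IP>:9090/; the configured alert rules are listed on the Alerts page.

To test the pipeline, adjust a rule so that it fires (for example, lower its threshold); the alert should then arrive in DingTalk.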
