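I. Installing node_exporter (exposes host metrics over HTTP)
1. Download the package: https://prometheus.io/download/
2. Extract: tar xf node_exporter-0.15.0.linux-amd64.tar.gz (note: the alert rules later in this guide use metric names such as node_cpu_seconds_total, which node_exporter only exposes from 0.16.0 onward, so use a newer release if you plan to follow them)
3. Start, from the extracted directory: ./node_exporter &
4. Verify: curl localhost:9100/metrics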
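As a slightly more thorough check than looking at the raw curl output, the sketch below counts the node_ samples in the metrics text. The helper name count_node_samples is hypothetical; it only reads text from standard input, so it can be fed straight from curl.

```shell
#!/bin/sh
# count_node_samples: hypothetical helper. Reads Prometheus text-format
# metrics on stdin and prints how many sample lines start with "node_".
# Comment lines (# HELP / # TYPE) are not counted.
count_node_samples() {
    # grep -c exits non-zero when the count is 0; "|| true" hides that.
    grep -c '^node_' || true
}

# Example (requires node_exporter running on the default port 9100):
#   curl -s http://localhost:9100/metrics | count_node_samples
```

A result of 0 suggests that the endpoint answered but is not a node_exporter, or that the wrong port was queried.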
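II. Installing and deploying Prometheus
1. Download the package: https://prometheus.io/download/
2. Extract: tar xf prometheus-2.0.0-rc.2.linux-amd64.tar.gz
3. Start, from the extracted directory: ./prometheus --config.file=prometheus.yml &
4. Open http://<server IP address>:9090 to confirm that Prometheus is installed and running
5. To query system metrics, add a scrape job for node_exporter to prometheus.yml and restart Prometheus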
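The configuration added in step 5 is a scrape job that points at node_exporter. A minimal sketch of that fragment follows; the helper name node_scrape_job is hypothetical, and the job name and default target are assumptions based on the default port 9100 used above.

```shell
#!/bin/sh
# node_scrape_job: hypothetical helper that prints a scrape_configs entry
# for one node_exporter target, in prometheus.yml syntax.
#   $1 = host:port of node_exporter (defaults to localhost:9100)
node_scrape_job() {
    target="${1:-localhost:9100}"
    cat <<EOF
  - job_name: "node_exporter"
    static_configs:
      - targets: ["${target}"]
EOF
}

# Example: print the fragment, paste it under the existing
# scrape_configs: key in prometheus.yml, then restart Prometheus.
#   node_scrape_job localhost:9100
```

The promtool binary shipped in the Prometheus tarball can validate the result before the restart: ./promtool check config prometheus.yml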
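III. Installing Grafana
1. Download and install: yum localinstall https://dl.grafana.com/oss/release/grafana-7.2.0-1.x86_64.rpm
Default install path: /usr/share/grafana; built-in plugin directory: /usr/share/grafana/public/app/plugins/
2. Start: service grafana-server start (stop and restart: service grafana-server stop; service grafana-server restart). If you installed from the standalone tarball instead, start it with: ./bin/grafana-server web &
3. Open Grafana at http://<server IP>:3000 (note: if the page is unreachable, open port 3000 in the firewall and in any security policy; on a host that uses firewalld, for example: firewall-cmd --permanent --add-port=3000/tcp && firewall-cmd --reload)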
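4. Configure the data source
Select Data Sources
Select Prometheus and enter its URL (for example http://<server IP>:9090)
5. Set up dashboards
Download the dashboard you need from https://grafana.com/grafana/dashboards (12633 is recommended). Check that the dashboard is compatible with your Grafana version; an incompatible dashboard leaves some plugins unusable
Import the downloaded dashboard JSON file as shown in the figure above, and select your own Prometheus data source:
When finished, open the dashboard you configured; it looks like this: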
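IV. Installing Alertmanager for DingTalk alerts
1. Deploying prometheus-webhook-dingtalk
============================= Binary package deployment ===========================
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v0.3.0/prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz
tar zxf prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz
# Start the service from the extracted directory; the flag takes the form <profile name>=<DingTalk robot URL>, here with the profile name webhook1
./prometheus-webhook-dingtalk --ding.profile="webhook1=https://oapi.dingtalk.com/robot/send?access_token={replace with your own DingTalk token}"
============================== Docker deployment =========================
docker pull timonwong/prometheus-webhook-dingtalk
# Start the container (the --ding.profile flag is the one used by the v0.3.0 release above; newer image versions may expect a config file instead)
docker run -d -p 8060:8060 --name webhook timonwong/prometheus-webhook-dingtalk --ding.profile="webhook1=https://oapi.dingtalk.com/robot/send?access_token={replace with your own DingTalk token}"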
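2. Deploying Alertmanager
Download page: https://prometheus.io/download/
# Extract
tar zxf alertmanager-0.23.0.linux-amd64.tar.gz
cd alertmanager-0.23.0.linux-amd64
Edit alertmanager.yml and set url to the prometheus-webhook-dingtalk address. The path segment after /dingtalk/ is the profile name passed to --ding.profile above (webhook1 in the commands shown)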
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  # webhook1 is the profile name given to --ding.profile
  - url: 'http://localhost:8060/dingtalk/webhook1/send'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
Start the service: ./alertmanager --config.file=alertmanager.yml &
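The webhook URL in alertmanager.yml has to match the profile name that prometheus-webhook-dingtalk was started with, which is easy to get wrong. The sketch below spells out that mapping; the helper name dingtalk_webhook_url is hypothetical, and the default address assumes the port 8060 used above.

```shell
#!/bin/sh
# dingtalk_webhook_url: hypothetical helper that prints the URL Alertmanager
# should call for a given prometheus-webhook-dingtalk profile.
#   $1 = profile name (the part before "=" in --ding.profile)
#   $2 = host:port of prometheus-webhook-dingtalk (defaults to localhost:8060)
dingtalk_webhook_url() {
    profile="$1"
    addr="${2:-localhost:8060}"
    printf 'http://%s/dingtalk/%s/send\n' "$addr" "$profile"
}

# Example: a service started with --ding.profile="webhook1=..." is reached at
#   dingtalk_webhook_url webhook1
```

The amtool binary shipped in the Alertmanager tarball can validate the edited file: ./amtool check-config alertmanager.yml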
3. Create the alert rule files (the location does not matter; keeping them in the same directory as prometheus.yml is recommended)
cpu_rule.yml
groups:
- name: CPU alert rules
  rules:
  - alert: CPU usage alert
    expr: 100 - (avg by (instance)(irate(node_cpu_seconds_total{mode="idle"}[1m]))) * 100 > 90
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "Server: CPU usage is above 90%! (current value: {{ $value }}%)"
memory_rule.yml
groups:
- name: Memory alert rules
  rules:
  - alert: Memory usage alert
    expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 80
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "Server: memory usage is above 80%! (current value: {{ $value }}%)"
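To make the two rule expressions concrete, the sketch below redoes their arithmetic with awk on made-up numbers. The helper names cpu_usage_pct and mem_usage_pct are hypothetical and the inputs are illustrative, not measurements. An idle fraction of 0.07 gives 93% CPU usage, which is above the 90% threshold; 8000 total with 1000 free, 200 buffers and 800 cached gives 75% memory usage, which stays below the 80% threshold.

```shell
#!/bin/sh
# cpu_usage_pct: CPU usage percentage from the average idle fraction (0..1),
# mirroring "100 - idle * 100" in cpu_rule.yml.
cpu_usage_pct() {
    awk -v idle="$1" 'BEGIN { printf "%.1f\n", 100 - idle * 100 }'
}

# mem_usage_pct: memory usage percentage from total, free, buffers and cached
# (all in the same unit), mirroring the expression in memory_rule.yml.
mem_usage_pct() {
    awk -v t="$1" -v f="$2" -v b="$3" -v c="$4" \
        'BEGIN { printf "%.1f\n", (t - (f + b + c)) / t * 100 }'
}

# Examples (made-up inputs):
#   cpu_usage_pct 0.07               # idle 7% of the time
#   mem_usage_pct 8000 1000 200 800  # total, free, buffers, cached
```

The rule files themselves can be syntax-checked with the promtool binary from the Prometheus tarball: ./promtool check rules cpu_rule.yml memory_rule.yml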
4. Edit prometheus.yml; the main changes are in the alerting and rule_files sections, as follows:
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
          # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/app/middleware/prometheus-2.31.1/cpu_rule.yml"
  - "/app/middleware/prometheus-2.31.1/memory_rule.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: node_exporter
    static_configs:
      - targets: ['localhost:9100']
        labels:
          instance: node_exporter
          group: node_exporter
Restart Prometheus and open http://ip:9090/ ; the Alerts page now lists the configured alert rules.
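The same check can be done from the command line through the Prometheus HTTP API endpoint /api/v1/rules. The sketch below is a rough one: the helper name count_alerting_rules is hypothetical, and it assumes the response is compact JSON in which each alerting rule carries "type":"alerting", matching on that text instead of using a real JSON parser.

```shell
#!/bin/sh
# count_alerting_rules: hypothetical helper. Reads the JSON body of
# GET /api/v1/rules on stdin and prints the number of alerting rules.
# It matches the compact "type":"alerting" field as plain text, so
# treat the result as a rough check only.
count_alerting_rules() {
    grep -o '"type":"alerting"' | wc -l | tr -d ' '
}

# Example (requires Prometheus on the default port 9090):
#   curl -s http://localhost:9090/api/v1/rules | count_alerting_rules
```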
Adjust an alert rule so that it fires (for example, lower its threshold); the alert then shows up in DingTalk
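An alternative to editing thresholds is to push a hand-made alert straight into Alertmanager and watch whether it reaches DingTalk. The sketch below builds a payload for Alertmanager's POST /api/v2/alerts endpoint; the helper name test_alert_payload and all label and annotation values are hypothetical.

```shell
#!/bin/sh
# test_alert_payload: hypothetical helper that prints a JSON array with one
# alert, in the shape accepted by Alertmanager's POST /api/v2/alerts.
#   $1 = alert name (default: TestAlert); keep it free of quotes and
#        backslashes, since no JSON escaping is done here.
test_alert_payload() {
    name="${1:-TestAlert}"
    cat <<EOF
[{"labels":{"alertname":"${name}","severity":"warning","instance":"manual-test"},
  "annotations":{"description":"Manual test alert sent with curl"}}]
EOF
}

# Example (requires Alertmanager on the default port 9093):
#   curl -s -X POST -H 'Content-Type: application/json' \
#        -d "$(test_alert_payload)" http://localhost:9093/api/v2/alerts
```

With group_wait set to 30s as in the alertmanager.yml above, expect the notification to arrive after roughly half a minute rather than immediately.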