Prometheus

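Prometheus is an open-source systems monitoring and alerting toolkit. Key features:

A data model built on time series

PromQL, a powerful and flexible query language

No dependency on external storage; it ships with a built-in storage engine

Very widely used together with Grafana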

I. Installation and deployment

crontab -e

#1. sync time xiaop
*/2 * * * * /sbin/ntpdate ntp1.aliyun.com >/dev/null 2>&1

# If the server clock is inaccurate, enable NTP time synchronization with chrony
yum install -y chrony
systemctl enable chronyd --now
chronyc sources
chronyc tracking

mkdir -p /app/    ## create the install directory

tar xf prometheus-2.33.3.linux-amd64.tar.gz -C /app/  # tarball downloaded in advance

ln -s /app/prometheus-2.33.3.linux-amd64/ /app/prometheus  ## symlink

prometheus          ## the server binary


prometheus.yml      ## the configuration file

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

## scrape_interval: how often targets are scraped
## scrape_configs: defines the scrape targets (jobs)

II. Starting Prometheus

# Start in the foreground
cd /app/prometheus
./prometheus
 
[root@prome ~]# ps -ef |grep prome
root       1699   1533  2 22:19 pts/0    00:00:00 ./prometheus
root       1728   1710  0 22:19 pts/1    00:00:00 grep --color=auto prome

[root@prome ~]# ss -lntup |grep prome
tcp    LISTEN     0      128      :::9090                 :::*                   users:(("prometheus",pid=1699,fd=7))

# Start in the background
nohup /app/prometheus/prometheus --config.file=/app/prometheus/prometheus.yml &>>/var/log/prometheus.log &

## Access the web UI
http://192.168.200.151:9090

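III. Managing Prometheus with systemctl

# The unit below runs as user "prometheus"; create it and hand over the install directory first
useradd -r -s /sbin/nologin prometheus
chown -R prometheus:prometheus /app/prometheus/

vim /usr/lib/systemd/system/prometheus.service

[Unit]
Description=xiaop Prometheus Server
After=network.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/app/prometheus/prometheus --config.file=/app/prometheus/prometheus.yml --storage.tsdb.path=/app/prometheus/data
Restart=on-failure
RestartSec=5
KillMode=process

[Install]
WantedBy=multi-user.target


# Reload the systemd unit files, then enable and start the service
systemctl daemon-reload
systemctl enable --now prometheus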

IV. Prometheus exporters

1. Common exporters

Exporter	Purpose	Default port
Node Exporter	Server resource monitoring	9100
Blackbox Exporter	Network probing	9115
MySQL Exporter	Database monitoring	9104
Nginx Exporter	Nginx connection counts, etc.	9113
cAdvisor	Container monitoring	8080
kube-state-metrics	Kubernetes state monitoring	8080

2. Deploying node_exporter

## Extract to /app/tools/
mkdir -p /app/tools/
tar xf node_exporter-1.3.1.linux-amd64.tar.gz -C /app/tools/

## Create symlinks
ln -s /app/tools/node_exporter-1.3.1.linux-amd64/ /app/tools/node_exporter

ln -s /app/tools/node_exporter/node_exporter /bin/

## Check the version
node_exporter --version

Configure the systemd unit

vim /usr/lib/systemd/system/nodeexporter.service

[Unit]
Description=prometheus node_exporter
After=network.target
[Service]
Type=simple
ExecStart=/bin/node_exporter
KillMode=process
[Install]
WantedBy=multi-user.target

systemctl daemon-reload
systemctl enable --now nodeexporter
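
Once the service is running, node_exporter serves its metrics as plain text on port 9100 under /metrics. A quick way to confirm it works is to pull a single value out of that text. The helper below is a minimal sketch: the name metric_value is hypothetical, it only handles samples without labels, and the sample value in the offline demo is made up.

```shell
#!/bin/sh
# metric_value NAME
# Reads Prometheus text exposition format on stdin and prints the value
# of the unlabelled sample whose metric name is exactly NAME.
metric_value() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Against a live exporter (assumes it listens on localhost:9100):
#   curl -s http://localhost:9100/metrics | metric_value node_load1

# Offline demo with a made-up sample in exposition format:
printf '# HELP node_load1 1m load average.\n# TYPE node_load1 gauge\nnode_load1 0.42\n' |
    metric_value node_load1
```

Comment lines start with `#`, so they never match the metric name and are skipped.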

3. Configuring the Prometheus server

[root@prome prometheus]# cat prometheus.yml 
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

  - job_name: 'linux-server1'
    static_configs:
      - targets: ['192.168.200.181:9100']

  - job_name: 'linux-server2'
    static_configs:
      - targets: ['192.168.200.183:9100']

[root@prome prometheus]#

4. Checking the result

5. Filtering data

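V. Dynamic configuration files for Prometheus

Write each exporter's hostname and port into a separate target file; Prometheus re-reads that file periodically and adds the targets it lists.
This avoids editing prometheus.yml by hand when large numbers of hosts are added or removed.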

cat prometheus.yml

rule_files:
scrape_configs:
  - job_name: "xiaop_prometheus_server"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "xiaop_basic_info_node_exporter_discovery"
    file_sd_configs:
      - files:
          - /app/tools/prometheus/discovery_node_exporter.json
        refresh_interval: 5s
     
     
# Format of the dynamically loaded target file
cat discovery_node_exporter.json
[
  {
    "targets": [
      "prom.linux.cn:9100",
      "gra.linux.cn:9100"
    ]
  }
]



## Notes
file_sd_configs: reads and loads target files dynamically.
files: the target files to load.
refresh_interval: how often the files are re-read.
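
Because the target file is plain JSON, it can be generated from a host list instead of being edited by hand. The sketch below shows one way to do that in shell. The function name gen_file_sd and its argument order are assumptions made for this example; the hosts and port are the ones from the sample file above.

```shell
#!/bin/sh
# gen_file_sd PORT HOST [HOST...]
# Prints a file_sd target list (JSON) with one "host:port" entry per host.
gen_file_sd() {
    port=$1
    shift
    printf '[\n  {\n    "targets": [\n'
    first=1
    for host in "$@"; do
        # Separate entries with a comma, but do not emit one before the first.
        if [ "$first" -eq 1 ]; then first=0; else printf ',\n'; fi
        printf '      "%s:%s"' "$host" "$port"
    done
    printf '\n    ]\n  }\n]\n'
}

# Demo: print the same target list as the sample file.
gen_file_sd 9100 prom.linux.cn gra.linux.cn
```

When updating a live file, redirecting the output to a temporary file in the same directory and then renaming it over discovery_node_exporter.json with mv is a reasonable precaution, so that Prometheus is unlikely to read a half-written file.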

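VI. PromQL query syntax

Enter a metric name (key) directly.
For example, to get the free memory of every host monitored through node_exporter, enter node_memory_MemFree_bytes:
node_memory_MemFree_bytes

# With an arithmetic operator (bytes converted to MiB)
node_memory_MemFree_bytes/1024^2

# 1-minute load average:
node_load1

# View all metric names (the HTTP API lists them at /api/v1/label/__name__/values)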

1. Filtering with label matchers

Select the series of a single host:
node_cpu_seconds_total{instance="192.168.200:9100"}
Select only the iowait CPU series of a single host:
node_cpu_seconds_total{instance="web.com:9100",mode="iowait"}

2. Aggregation

avg(rate(node_cpu_seconds_total{mode="idle"}[1m])) by (instance)

3. Calculating host CPU usage

1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance)
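
To see what this expression computes, the arithmetic can be reproduced by hand for a single CPU core. node_cpu_seconds_total{mode="idle"} counts the seconds a core has spent idle, so its increase divided by the length of the window is the idle fraction, and one minus that is the busy fraction. The sketch below uses made-up counter values and ignores the extrapolation and counter-reset handling that rate() performs; cpu_usage is a hypothetical helper, not a Prometheus command.

```shell
#!/bin/sh
# cpu_usage IDLE_START IDLE_END WINDOW_SECONDS
# Prints 1 - (idle seconds gained / window length), i.e. the busy
# fraction of one core over the window, rounded to two decimals.
cpu_usage() {
    awk -v a="$1" -v b="$2" -v w="$3" \
        'BEGIN { printf "%.2f\n", 1 - (b - a) / w }'
}

# Illustrative values: the idle counter grew from 1000 s to 1045 s
# during a 60 s window, so the core was idle for 45 of 60 seconds.
cpu_usage 1000 1045 60    # prints 0.25
```

In the real query, avg(...) by (instance) then averages the per-core idle rates of each host before the subtraction, which yields one usage value per instance.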

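VII. Visualization with Grafana

VIII. Alerting configuration (Alertmanager)

## Workflow:

Deploy Alertmanager (on the Prometheus server)

Edit the Alertmanager configuration

Configure alert rules and update the Prometheus server configuration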

1. Deploying Alertmanager

# Download
wget -P /server/tools/ https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-amd64.tar.gz


tar xf /server/tools/alertmanager-0.24.0.linux-amd64.tar.gz -C /app/prometheus

ln -s /app/prometheus/alertmanager-0.24.0.linux-amd64/alertmanager /bin/

ln -s /app/prometheus/alertmanager-0.24.0.linux-amd64/ /app/prometheus/alertmanager

# Startup flags (default values shown)
--config.file="alertmanager.yml"
--web.listen-address=":9093"

## Test by starting in the foreground
alertmanager --config.file=/app/prometheus/alertmanager/alertmanager.yml



## Method 2 (tarball prepared in advance)
#1. Extract
tar xf alertmanager-0.24.0.linux-amd64.tar.gz -C /app/tools/
#2. Create symlinks
ln -s /app/tools/alertmanager-0.24.0.linux-amd64/ /app/tools/alertmanager
ln -s /app/tools/alertmanager/alertmanager /bin/
#3. Verify
alertmanager --version
#4. Start
alertmanager --config.file=/app/tools/alertmanager/alertmanager.yml

2. Configuring alert rules in Prometheus

rule_files:
  - "alert.rules.yml"

Example alert.rules.yml:

groups:
  - name: node-alerts
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
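
The CPU usage expression from the PromQL section can be turned into an alert rule in the same way. The sketch below writes such a rule to a temporary file and, if promtool (shipped in the Prometheus tarball) is on PATH, validates it with promtool check rules. The alert name HighCpuUsage, the 0.9 threshold, the 5m hold time and the file path are all assumptions chosen for illustration.

```shell
#!/bin/sh
rules_file="${TMPDIR:-/tmp}/cpu_alert.rules.yml"   # hypothetical path for this sketch

# Quoted heredoc delimiter: the shell leaves {{ $labels.instance }} untouched.
cat > "$rules_file" <<'EOF'
groups:
  - name: node-cpu-alerts
    rules:
      - alert: HighCpuUsage            # hypothetical alert name
        expr: 1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage on {{ $labels.instance }} above 90%"
EOF

# Validate the file before pointing rule_files at it.
if command -v promtool >/dev/null 2>&1; then
    promtool check rules "$rules_file"
else
    echo "promtool not found in PATH; skipping validation of $rules_file"
fi
```

Checking a rule file this way before reloading Prometheus catches YAML and expression errors early, since a broken rule file prevents the new configuration from loading.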

Point Prometheus at Alertmanager

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093  # Alertmanager address
rule_files:
  - "/app/prometheus/prometheus_alert_rules.yml"
scrape_configs:
  - job_name: "prometheus-server"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "prometheus-all-node_exporter-two"
    file_sd_configs:
      - files:
          - /app/prometheus/discovery_node_exporter.yml
        refresh_interval: 5s
  - job_name: "pushgateway"
    static_configs:
      - targets:
          - "gra.linux.cn:9091"

##### To be completed