prometheus

Background

Prometheus is used here to monitor containers and the hosts they run on.

Installing prometheus

Download the desired version from the official site: https://prometheus.io/download/

wget https://github.com/prometheus/prometheus/releases/download/v2.24.0/prometheus-2.24.0.linux-amd64.tar.gz
tar xf prometheus-2.24.0.linux-amd64.tar.gz
cp prometheus-2.24.0.linux-amd64/prometheus /usr/local/bin/
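A quick sanity check that the binary runs (optional):

prometheus --version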

Configure the systemd unit

vim /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Server
After=network.target
Documentation=https://prometheus.io/docs/introduction/overview/
 
[Service]
Type=simple
WorkingDirectory=/home/data/prometheus/
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --web.read-timeout=5m \
  --web.max-connections=512 \
  --storage.tsdb.retention=15d \
  --storage.tsdb.path=/home/data/prometheus \
  --query.timeout=2m
 
Restart=on-failure
 
[Install]
WantedBy=multi-user.target

Create the config file

mkdir /etc/prometheus
cp prometheus-2.24.0.linux-amd64/prometheus.yml /etc/prometheus

The bundled default config is enough to start with.
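Optionally, you can validate the config with promtool, which ships in the same tarball:

cp prometheus-2.24.0.linux-amd64/promtool /usr/local/bin/
promtool check config /etc/prometheus/prometheus.yml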

Start

systemctl daemon-reload
systemctl start prometheus
systemctl enable prometheus
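Note the unit file expects /home/data/prometheus to exist (it is both the WorkingDirectory and the tsdb path), so create it beforehand if needed. Once up, the built-in health endpoint should answer:

mkdir -p /home/data/prometheus   # only if it doesn't exist yet
curl -s http://localhost:9090/-/healthy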

But I didn't start it this way in the end; I used docker instead.
And for convenience I also downloaded node_exporter right away.

node_exporter

Download from: https://prometheus.io/download/

wget https://github.com/prometheus/node_exporter/releases/download/v1.2.0/node_exporter-1.2.0.linux-amd64.tar.gz
tar xf node_exporter-1.2.0.linux-amd64.tar.gz
cp node_exporter-1.2.0.linux-amd64/node_exporter /usr/local/bin/
scp node_exporter-1.2.0.linux-amd64/node_exporter k8s2:/usr/local/bin/
scp node_exporter-1.2.0.linux-amd64/node_exporter k8s3:/usr/local/bin/

Configure the service file

vim /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
After=network.target
Documentation=https://prometheus.io/docs/guides/node-exporter/
 
[Service]
Type=simple
WorkingDirectory=/tmp/
ExecStart=/usr/local/bin/node_exporter 
 
Restart=on-failure
 
[Install]
WantedBy=multi-user.target

Start it directly on each machine; I placed the same unit file on k8s2 and k8s3 as well, and the following commands need to be run on all three machines.

systemctl daemon-reload
systemctl start node_exporter
systemctl enable node_exporter
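To confirm an exporter is up and serving metrics on its default port 9100:

curl -s http://localhost:9100/metrics | head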

Once the exporters are running, add their targets to the prometheus config file as follows:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'node_exporter'
    static_configs:
      - targets:
        - k8s1:9100
        - k8s2:9100
        - k8s3:9100

Starting prometheus with docker

docker run --name prometheus -d -p 0.0.0.0:9090:9090 -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml -v /etc/hosts:/etc/hosts quay.io/prometheus/prometheus


grafana

  wget https://dl.grafana.com/oss/release/grafana-8.0.6-1.x86_64.rpm
  sudo yum install grafana-8.0.6-1.x86_64.rpm
  systemctl enable grafana-server
  systemctl start grafana-server

The default port is 3000.
Add a data source and point it at prometheus (port 9090 by default).

Import a dashboard

To begin, import a dashboard by entering its ID. For more official dashboards, see:
https://grafana.com/grafana/dashboards?orderBy=name&direction=asc

View metrics


Common node_exporter queries

Inbound bandwidth (Kbit/s; receive counters measure traffic into the node):

sum by(instance)(irate(node_network_receive_bytes_total{instance="k8s1:9100",device!~"bond.*?|lo"}[5m])/128)

Outbound bandwidth (Kbit/s):

sum by(instance)(irate(node_network_transmit_bytes_total{instance="k8s1:9100",device!~"bond.*?|lo"}[5m])/128)

NIC outgoing packet rate:

sum by(instance)(rate(node_network_transmit_packets_total{instance="k8s1:9100",device!~"lo"}[5m]))

NIC incoming packet rate:

sum by(instance)(rate(node_network_receive_packets_total{instance="k8s1:9100",device!~"lo"}[5m]))

15-minute load average:

node_load15{instance="k8s1:9100"}

Free memory (MB):

node_memory_MemFree_bytes/1024/1024

Free disk space (MB) per filesystem:

node_filesystem_free_bytes{instance="k8s1:9100"}/1024/1024

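Any of these expressions can also be run against the prometheus HTTP API instead of the web UI, for example:

curl -s 'http://localhost:9090/api/v1/query' --data-urlencode 'query=node_load15{instance="k8s1:9100"}'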
Metric names can differ between node_exporter versions, so adjust these queries accordingly. Corrections are welcome, thanks.

pushgateway

Add the pushgateway scrape config to the prometheus config file:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'node_exporter'
    static_configs:
      - targets:
        - k8s1:9100
        - k8s2:9100
        - k8s3:9100
  - job_name: 'pushgateway'
    honor_labels: true # keep the job/instance labels pushed to pushgateway rather than overwriting them with this scrape job's labels
    static_configs:
      - targets: ['192.168.0.5:9091']
        labels:
          instance: pushgateway

I didn't hot-reload the config here; I simply restarted the prometheus docker container, as sketched below.
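For reference, the restart is just the following; the hot-reload variant only works if prometheus was started with --web.enable-lifecycle, which the docker command above does not pass:

docker restart prometheus

# hot-reload alternative (requires --web.enable-lifecycle at startup):
curl -X POST http://localhost:9090/-/reload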
Then start the pushgateway container on its machine.

docker run --name pushgateway -d -p 0.0.0.0:9091:9091 -v /etc/hosts:/etc/hosts prom/pushgateway:latest

If the image isn't present locally, just pull it first.
Then open the prometheus UI and you should see the pushgateway target.
Normally we push data to pushgateway with a client SDK, but it can also be managed through its HTTP API, for example:

echo "some_metric 3.14" | curl --data-binary @- http://localhost:9091/metrics/job/some_job


--data-binary sends the request body verbatim; note that curl sends it as a POST request.

cat <<EOF | curl --data-binary @- http://localhost:9091/metrics/job/some_job/instance/some_instance
# TYPE some_metric counter
some_metric{label="val1"} 42
# TYPE another_metric gauge
# HELP another_metric Just an example.
another_metric 2398.283
EOF

Note: the payload must follow the Prometheus exposition format exactly.

Delete all metrics for a specific instance within a group:

curl -X DELETE http://localhost:9091/metrics/job/some_job/instance/some_instance

Delete all metrics for a group:

curl -X DELETE http://localhost:9091/metrics/job/some_job

As you can see, data in pushgateway is grouped by job and instance, so these two parameters are essential.

When prometheus scrapes pushgateway, the scrape config also assigns job and instance labels, but those only identify the pushgateway instance itself; they say nothing about where the pushed data came from. That's why the pushgateway scrape job needs honor_labels: true, so the job and instance labels carried by the pushed metrics are not overwritten.

Note: to avoid losing data when pushgateway restarts or crashes, you can persist its state with the --persistence.file and --persistence.interval flags.
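A sketch of the docker invocation with persistence enabled; the /data mount path here is my own choice, adjust to taste:

docker run --name pushgateway -d -p 0.0.0.0:9091:9091 \
  -v /home/data/pushgateway:/data \
  prom/pushgateway:latest \
  --persistence.file=/data/pushgateway.data \
  --persistence.interval=5m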

DingTalk alerting

First, prepare the config files:

[root@emr-header-1 prometheus]# cat rules/rules.yml 
groups:
- name: pushgateway
  rules:
  - alert: server_status
    expr: up{job="pushgateway"} == 0
    for: 10s
    labels: 
      severity: page
    annotations:
      summary: "机器{{$labels.instance}}挂了"
      description: "注释信息:{{$labels.instance}}挂了"
[root@emr-header-1 prometheus]# cat prometheus.yml 
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - "rules/*"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'node_exporter'
    static_configs:
      - targets:
        - k8s1:9100
        - k8s2:9100
        - k8s3:9100
  - job_name: 'pushgateway'
    static_configs:
      - targets: ['192.168.0.5:9091']
        labels:
          instance: pushgateway

[root@emr-header-1 prometheus]# cat /etc/alertmanager/alertmanager.yml 
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1m
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  - send_resolved: true
    url: 'http://localhost:8060/dingtalk/webhook1/send'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

Here we've prepared the prometheus config, which points at the rules directory, and the rule file defines a custom alert on the pushgateway target. In the alertmanager config, the webhook URL points at port 8060, which is where the dingtalk plugin listens, so we also need that plugin.
You can find it by searching GitHub; I chose to pull the image straight from docker.
So everything here is again started via docker.

# start prometheus
docker run --name prometheus -d --net="host" -p 0.0.0.0:9090:9090 -v /etc/prometheus/:/etc/prometheus/ -v /etc/hosts:/etc/hosts quay.io/prometheus/prometheus

# start alertmanager
docker run --name alertmanager -d --net="host" -p 9093:9093 -v /etc/alertmanager/:/etc/alertmanager/ -v /etc/hosts:/etc/hosts prom/alertmanager:latest --config.file=/etc/alertmanager/alertmanager.yml

# start the dingtalk plugin
docker run -d --name dingtalk --net="host" --restart always -p 8060:8060 timonwong/prometheus-webhook-dingtalk:master --ding.profile="webhook1=https://oapi.dingtalk.com/robot/send?access_token=6dd77b2f7a27fb59fec031e4e03ad1917cc06f0c95760217a0exxxx"

Note that the DingTalk robot's keyword setting must match text contained in your alerts, otherwise delivery will fail.
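To exercise the whole chain without waiting for a real failure, you can post a synthetic alert straight to alertmanager's v2 API (a sketch; the label values here are arbitrary):

curl -X POST http://localhost:9093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[{"labels":{"alertname":"TestAlert","severity":"page"},"annotations":{"summary":"test alert"}}]'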

Silencing alerts

In the alertmanager UI, the silence matchers must correspond to the labels written in the rules above, and you can pick the time window for the silence. Expiring the silence turns notifications back on.

blackbox_exporter

blackbox_exporter can be used to monitor ports, web pages, and the like. There are far more thorough write-ups elsewhere, so let's go straight to the setup.

wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.14.0/blackbox_exporter-0.14.0.linux-amd64.tar.gz

After extracting the tarball:

mv blackbox_exporter-0.14.0.linux-amd64 /usr/local/blackbox_exporter
vim /lib/systemd/system/blackbox_exporter.service
[Unit]
Description=blackbox_exporter
After=network.target

[Service]
User=root
Type=simple
ExecStart=/usr/local/blackbox_exporter/blackbox_exporter --config.file=/usr/local/blackbox_exporter/blackbox.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target

Start

systemctl daemon-reload
systemctl start blackbox_exporter
[root@emr-header-1 ~]# netstat -lntup|grep 9115
tcp        0      0 0.0.0.0:9115            0.0.0.0:*               LISTEN      26214/blackbox_expo 
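A probe can be tested directly against the exporter, for example with the http_2xx module from the default blackbox.yml (the target is just an example):

curl -s 'http://localhost:9115/probe?module=http_2xx&target=http://www.baidu.com' | tail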

prometheus configuration

[root@emr-header-1 rules]# cat ../prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - "rules/*"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'node_exporter'
    static_configs:
      - targets:
        - k8s1:9100
        - k8s2:9100
        - k8s3:9100
  - job_name: 'pushgateway'
    static_configs:
      - targets: ['192.168.0.5:9091']
        labels:
          instance: pushgateway

  - job_name: 'http_status'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets: ['http://www.baidu.com']
        labels:
          instance: http_status
          group: web
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: 192.168.0.1:9115

  - job_name: 'ping_status'
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets: 
        - 192.168.0.1
        - 192.168.0.2
        - 192.168.0.3
        labels:
          instance: 'ping_status'
          group: 'icmp'
    relabel_configs:
      - source_labels: [__address__]
        regex: (.*)(:80)?
        target_label: __param_target
        replacement: ${1}
      - source_labels: [__param_target]
        regex: (.*)
        target_label: ping
        replacement: ${1}
      - source_labels: []
        regex: .*
        target_label: __address__
        replacement: 192.168.0.1:9115

  - job_name: 'nodemanager'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets: 
        - 192.168.0.2:8042
        - 192.168.0.3:8042
        labels:
          instance: 'nodemanager_status'
          group: 'port'
    relabel_configs:
      - source_labels: [__address__]
        regex: (.*)(:80)?
        target_label: __param_target
        replacement: ${1}
      - source_labels: [__param_target]
        regex: (.*)
        target_label: nodemanager
        replacement: ${1}
      - source_labels: []
        regex: .*
        target_label: __address__
        replacement: 192.168.0.1:9115

  - job_name: 'datanode'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
        - 192.168.0.2:50010
        - 192.168.0.3:50010
        labels:
          instance: 'datanode_status'
          group: 'port'
    relabel_configs:
      - source_labels: [__address__]
        regex: (.*)(:80)?
        target_label: __param_target
        replacement: ${1}
      - source_labels: [__param_target]
        regex: (.*)
        target_label: datanode
        replacement: ${1}
      - source_labels: []
        regex: .*
        target_label: __address__
        replacement: 192.168.0.1:9115

Rules

[root@emr-header-1 rules]# cat cpu_over.yml
groups:
- name: CPU alert rules
  rules:
  - alert: CPU usage alert
    expr: 100 - (avg by (instance)(irate(node_cpu_seconds_total{mode="idle"}[1m]) )) * 100 > 50
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "CPU使用率正在飙升。"
      description: "CPU使用率超过50%(当前值:{{ $value }}%)"
[root@emr-header-1 rules]# cat disk_over.yml
groups:
- name: Disk usage alert rules
  rules:
  - alert: Disk usage alert
    expr: 100 - node_filesystem_free_bytes{fstype=~"xfs|ext4"} / node_filesystem_size_bytes{fstype=~"xfs|ext4"} * 100 > 80
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "硬盘分区使用率过高"
      description: "分区使用大于80%(当前值:{{ $value }}%)"

[root@emr-header-1 rules]# cat memory_over.yml
groups:
- name: Memory alert rules
  rules:
  - alert: Memory usage alert
    expr: (1 - (node_memory_MemAvailable_bytes / (node_memory_MemTotal_bytes))) * 100 > 60
    for: 30m
    labels:
      severity: warning
    annotations:
      summary: "服务器可用内存不足"
      description: "机器{{$labels.instance}}内存使用率已经超过60%(当前值{{$value}}%)"
[root@emr-header-1 rules]# cat node_alived.yml
groups:
- name: Instance liveness alert rules
  rules:
  - alert: Instance down alert
    expr: up == 0
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      summary: "主机宕机 !!!"
      description: "主机{{$labels.instance}}已经宕机超过一分钟了。"
[root@emr-header-1 rules]# cat pushgateway_down.yml
groups:
- name: pushgateway alerts
  rules:
  - alert: pushgateway alert
    expr: up{job="pushgateway"} == 0
    for: 10s
    labels: 
      severity: page
    annotations:
      summary: "机器{{$labels.instance}}挂了"
      description: "注释信息:{{$labels.instance}}挂了"
[root@emr-header-1 rules]# cat nodemanager_down.yml 
groups:
- name: nodemanager alerts
  rules:
  - alert: nodemanager is down
    expr: probe_success{group="port",instance="nodemanager_status",job="nodemanager"} == 0
    for: 10s
    labels:
      severity: nodemanager
    annotations:
      summary: "nodemanager unavaliable"
      description: "{{$labels.nodemanager}}服务挂了"
[root@emr-header-1 rules]# cat datanode_down.yml 
groups:
- name: datanode alerts
  rules:
  - alert: datanode is down
    expr: probe_success{group="port",instance="datanode_status",job="datanode"} == 0
    for: 10s
    labels:
      severity: datanode
    annotations:
      summary: "datanode unavaliable"
      description: "{{$labels.datanode}}服务挂了"

alertmanager configuration

[root@emr-header-1 alertmanager]# cat alertmanager.yml 
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1m
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  - send_resolved: true
    url: 'http://localhost:8060/dingtalk/webhook1/send'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

Start

Start prometheus
docker run --name prometheus -d --net="host" -p 0.0.0.0:9090:9090 -v /etc/prometheus/:/etc/prometheus/ -v /etc/hosts:/etc/hosts quay.io/prometheus/prometheus


Start alertmanager
docker run --name alertmanager -d --net="host" -p 9093:9093 -v /etc/alertmanager/:/etc/alertmanager/ -v /etc/hosts:/etc/hosts prom/alertmanager:latest --config.file=/etc/alertmanager/alertmanager.yml



Start the dingtalk plugin
docker run -d --name dingtalk --net="host" --restart always -p 8060:8060 timonwong/prometheus-webhook-dingtalk:master --ding.profile="webhook1=https://oapi.dingtalk.com/robot/send?access_token=6dd77b2f7a27fb59fec031e4e03ad1917cc06f0cxxx217a0edf01fxxxx"


Again, make sure the DingTalk robot's keyword matches the alert text, otherwise notifications will not be delivered.

mysql_exporter

wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.12.1/mysqld_exporter-0.12.1.linux-amd64.tar.gz
tar xf mysqld_exporter-0.12.1.linux-amd64.tar.gz 
mv mysqld_exporter-0.12.1.linux-amd64 /usr/local/mysqld_exporter
cd /usr/local/mysqld_exporter/
# set the access credentials
vi .my.cnf
[client]
user=exporter
password=1qaz@WSX
# log in to mysql, create the user, and grant privileges
CREATE USER 'exporter'@'localhost' IDENTIFIED BY '1qaz@WSX';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';

Start the service; the default port is 9104.

nohup ./mysqld_exporter --config.my-cnf=.my.cnf &
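To check that the exporter can actually reach mysql, look for the mysql_up metric (1 means the connection succeeded):

curl -s http://localhost:9104/metrics | grep '^mysql_up'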

Configure prometheus

  - job_name: 'mysql-exporter'
    static_configs:
      - targets: ['k8s1:9104']
        labels:
          instance: mysql

Then restart prometheus or hot-reload the config, and it's done.

Grafana dashboard ID: 11323

cadvisor

Used for monitoring docker containers.

docker pull google/cadvisor:latest
docker run --privileged=true --volume=/:/rootfs:ro --volume=/var/run:/var/run:rw --volume=/sys:/sys:ro --volume=/var/lib/docker/:/var/lib/docker:ro --publish=38080:8080 --detach=true --name=cadvisor google/cadvisor:latest
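cAdvisor's metrics should then be reachable on the published port:

curl -s http://localhost:38080/metrics | head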

Configure prometheus

  - job_name: docker
    static_configs:
      - targets: ['192.168.0.1:38080']
        labels:
          instance: cadvisor

Restart prometheus (kill -HUP <pid>) or hot-reload the config.

consul

Consul provides service auto-discovery; I'll skip the background and go straight to the config.

docker pull consul
docker run --name consul -d -p 8500:8500 consul
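Once the container is up, the agent should answer; the service list will be empty until something registers:

curl -s http://localhost:8500/v1/agent/services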
[root@emr-header-1 prometheus]# cat prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - "rules/*"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['k8s1:9090']
  - job_name: 'node_exporter'
    static_configs:
      - targets:
        - k8s1:9100
        - k8s2:9100
        - k8s3:9100

#  - job_name: 'consul-prometheus'
#    consul_sd_configs:
#    - server: 'k8s1:8500'
#      services: []
#    relabel_configs:
#      - source_labels: [__meta_consul_tags]
#        regex: .*test.*
#        action: keep
#      - regex: _meta_consul_service_metadata_(.+)
#        action: labelmap

  - job_name: 'consul-node-exporter'
    consul_sd_configs:
    - server: 'k8s1:8500'
      services: []
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: .*node-exporter.*
        action: keep
      - regex: __meta_consul_service_metadata_(.+)
        action: labelmap

  - job_name: 'consul-cadvisor-exporter'
    consul_sd_configs:
    - server: 'k8s1:8500'
      services: []
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: .*cadvisor-exporter.*
        action: keep
      - regex: __meta_consul_service_metadata_(.+)
        action: labelmap

  - job_name: 'pushgateway'
    honor_labels: true
    static_configs:
      - targets: ['192.168.0.5:9091']
        labels:
          instance: pushgateway

  - job_name: 'mysql-exporter'
    static_configs:
      - targets: ['k8s1:9104']
        labels:
          instance: mysql

  - job_name: docker
    static_configs:
      - targets: ['192.168.0.1:38080']
        labels:
          instance: cadvisor

  - job_name: 'http_status'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets: ['http://www.baidu.com']
        labels:
          instance: http_status
          group: web
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: 192.168.0.1:9115

  - job_name: 'ping_status'
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets: 
        - 192.168.0.1
        - 192.168.0.2
        - 192.168.0.3
        labels:
          instance: 'ping_status'
          group: 'icmp'
    relabel_configs:
      - source_labels: [__address__]
        regex: (.*)(:80)?
        target_label: __param_target
        replacement: ${1}
      - source_labels: [__param_target]
        regex: (.*)
        target_label: ping
        replacement: ${1}
      - source_labels: []
        regex: .*
        target_label: __address__
        replacement: 192.168.0.1:9115

  - job_name: 'nodemanager'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets: 
        - 192.168.0.2:8042
        - 192.168.0.3:8042
        labels:
          instance: 'nodemanager_status'
          group: 'port'
    relabel_configs:
      - source_labels: [__address__]
        regex: (.*)(:80)?
        target_label: __param_target
        replacement: ${1}
      - source_labels: [__param_target]
        regex: (.*)
        target_label: nodemanager
        replacement: ${1}
      - source_labels: []
        regex: .*
        target_label: __address__
        replacement: 192.168.0.1:9115

  - job_name: 'datanode'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
        - 192.168.0.2:50010
        - 192.168.0.3:50010
        labels:
          instance: 'datanode_status'
          group: 'port'
    relabel_configs:
      - source_labels: [__address__]
        regex: (.*)(:80)?
        target_label: __param_target
        replacement: ${1}
      - source_labels: [__param_target]
        regex: (.*)
        target_label: datanode
        replacement: ${1}
      - source_labels: []
        regex: .*
        target_label: __address__
        replacement: 192.168.0.1:9115

Register:
curl -X PUT -d '{"id": "node-exporter","name": "node-exporter-k8s1","address": "192.168.0.1","port": 9100,"tags": ["test-k8s1"],"checks": [{"http": "http://192.168.0.1:9100/metrics", "interval": "5s"}]}'  http://192.168.0.1:8500/v1/agent/service/register
Deregister:
curl -X PUT http://k8s1:8500/v1/agent/service/deregister/node-exporter 
The registrations below are done with curl and JSON payload files:
[root@emr-header-1 ~]# cat consul-0.json 
{
  "ID": "node-exporter",
  "Name": "node-exporter-k8s1",
  "Tags": [
    "test"
  ],
  "Address": "192.168.0.1",
  "Port": 9100,
  "Meta": {
    "app": "spring-boot",
    "team": "appgroup",
    "project": "bigdata"
  },
  "EnableTagOverride": false,
  "Check": {
    "HTTP": "http://192.168.0.1:9100/metrics",
    "Interval": "10s"
  },
  "Weights": {
    "Passing": 10,
    "Warning": 1
  }
}
curl --request PUT --data @consul-0.json http://k8s1:8500/v1/agent/service/register?replace-existing-checks=1

[root@emr-header-1 ~]# cat consul-1.json 
{
  "ID": "node-exporter",
  "Name": "node-exporter-k8s1",
  "Tags": [
    "node-exporter"
  ],
  "Address": "192.168.0.1",
  "Port": 9100,
  "Meta": {
    "app": "spring-boot",
    "team": "appgroup",
    "project": "bigdata"
  },
  "EnableTagOverride": false,
  "Check": {
    "HTTP": "http://192.168.0.1:9100/metrics",
    "Interval": "10s"
  },
  "Weights": {
    "Passing": 10,
    "Warning": 1
  }
}
curl --request PUT --data @consul-1.json http://k8s1:8500/v1/agent/service/register?replace-existing-checks=1

[root@emr-header-1 ~]# cat consul-2.json
{
  "ID": "cadvisor-exporter",
  "Name": "cadvisor-exporter-k8s1",
  "Tags": [
    "cadvisor-exporter"
  ],
  "Address": "192.168.0.1",
  "Port": 38080,
  "Meta": {
    "app": "docker",
    "team": "cloudgroup",
    "project": "docker-service"
  },
  "EnableTagOverride": false,
  "Check": {
    "HTTP": "http://192.168.0.1:38080/metrics",
    "Interval": "10s"
  },
  "Weights": {
    "Passing": 10,
    "Warning": 1
  }
}
curl --request PUT --data @consul-2.json http://k8s1:8500/v1/agent/service/register?replace-existing-checks=1
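After registering, list the services to confirm; the tags are what the relabel rules above match on:

curl -s http://k8s1:8500/v1/agent/services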


