k8s Monitoring - Prometheus

Table of Contents

1 Introduction to Prometheus

1.1 Prometheus Architecture

1.1.1 Component Functions

2 Deploying Prometheus on k8s

2.1 Download the Resources Needed to Deploy Prometheus

2.2 Deployment Steps

2.3 Log in to Grafana

2.4 Import Dashboards

2.5 Access the Prometheus Server

3 Monitoring Usage Example

3.1 Set Up a Monitored Project

3.2 Tuning the Monitoring


1 Introduction to Prometheus

Prometheus is an open-source service monitoring system and time-series database.

It provides a general data model along with interfaces for fast data collection, storage, and querying.

Its core component, the Prometheus server, periodically pulls data from statically configured monitoring targets or from targets configured automatically via service discovery.

When newly pulled data exceeds the configured in-memory buffer, the data is persisted to the storage device.
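The pull model described above can be sketched as a minimal prometheus.yml; the job name and the target address below are placeholders, not values from this deployment:

```yaml
global:
  scrape_interval: 15s          # how often the server pulls from targets

scrape_configs:
  - job_name: node              # placeholder job name
    static_configs:
      - targets: ['192.168.0.10:9100']   # placeholder node_exporter endpoint
```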

1.1 Prometheus Architecture

1.1.1 Component Functions:

  • Monitoring agents such as node_exporter: collect host metrics across many dimensions, e.g. load average, CPU, memory, disk, and network.

  • kubelet (cAdvisor): collects container metrics; this is also the core metrics collection of K8s. Per-container metrics include CPU usage and limits, filesystem read/write limits, memory usage and limits, network packet send/receive/drop rates, and so on.

  • API Server: collects API Server performance metrics, including control-queue performance, request rate, and latency.

  • etcd: collects metrics from the etcd storage cluster.

  • kube-state-metrics: derives various k8s-related metrics, mainly counters and metadata for resource types, including object counts per type, resource quotas, container status, and Pod resource label series.

  • Each monitored host can expose its metrics through a dedicated exporter and wait for the Prometheus server to scrape the data periodically.

  • If alerting rules exist, scraped data is evaluated against them; when an alert condition is met, an alert is generated and sent to Alertmanager for aggregation and routing.

  • When a monitored target needs to push data actively, the Pushgateway component can receive and temporarily store the data until the Prometheus server scrapes it.

  • Any monitored target must first be registered with the monitoring system before time-series collection, storage, alerting, and display can take place.

  • Monitoring targets can be specified statically in the configuration, or managed dynamically by Prometheus through service discovery.
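In the Operator-based stack installed below, dynamic targets are declared as ServiceMonitor objects rather than edited into prometheus.yml. A minimal hypothetical sketch (the name and selector labels here are invented for illustration):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: demo-app                       # hypothetical name
  namespace: kube-prometheus-stack
  labels:
    release: kube-prometheus-stack     # label the stack's Prometheus selects on
spec:
  selector:
    matchLabels:
      app: demo-app                    # must match the target Service's labels
  endpoints:
    - port: metrics                    # named port on the Service
      interval: 30s
```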

2 Deploying Prometheus on k8s

2.1 Download the Resources Needed to Deploy Prometheus

# Add the Prometheus repository to helm (only works with a good network connection)
[root@k8s-master helm]# helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
"prometheus-community" has been added to your repositories

# Pull the Prometheus chart
[root@k8s-master helm]# helm pull prometheus-community/kube-prometheus-stack
[root@k8s-master helm]# ls
kube-prometheus-stack-62.6.0.tgz

================================================================

2.2 Deployment Steps

Download the container images specified by the image paths in each chart's values.yaml and push them to the harbor registry.

[root@k8s-master ~]# mkdir prometheus
[root@k8s-master ~]# cd prometheus/

[root@k8s-master prometheus]# ls
grafana-11.2.0.tar                kube-state-metrics-2.13.0.tar  nginx-exporter-1.3.0-debian-12-r2.tar  prometheus-62.6.0.tar
kube-prometheus-stack-62.6.0.tgz  nginx-18.1.11.tgz              node-exporter-1.8.2.tar

-----------------------------------------------------------------------------------
[root@k8s-master prometheus]# tar zxf kube-prometheus-stack-62.6.0.tgz 
[root@k8s-master prometheus]# cd kube-prometheus-stack/
[root@k8s-master kube-prometheus-stack]# ls
Chart.lock  charts  Chart.yaml  CONTRIBUTING.md  README.md  templates  values.yaml

# Point the image registry at the local harbor
[root@k8s-master kube-prometheus-stack]# vim values.yaml
 227   imageRegistry: "reg.exam.com"
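The same edit can be done non-interactively with sed instead of vim. A sketch; to keep it self-contained, a stand-in values.yaml with the chart's default (`imageRegistry: ""`) is created first — on the real chart, skip that step and test on a copy first:

```shell
# Create a stand-in values.yaml with the chart's assumed default
printf 'global:\n  imageRegistry: ""\n' > /tmp/values-demo.yaml

# Rewrite the empty registry with the local harbor host
sed -i 's/imageRegistry: ""/imageRegistry: "reg.exam.com"/' /tmp/values-demo.yaml

grep imageRegistry /tmp/values-demo.yaml
```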

----------------------------------------------------------------------------------
# Load the images
[root@k8s-master prometheus]# docker load -i prometheus-62.6.0.tar 
Loaded image: quay.io/prometheus/prometheus:v2.54.1
Loaded image: quay.io/thanos/thanos:v0.36.1
Loaded image: quay.io/prometheus/alertmanager:v0.27.0
Loaded image: quay.io/prometheus-operator/admission-webhook:v0.76.1
Loaded image: registry.k8s.io/ingress-nginx/kube-webhook-certgen:v20221220-controller-v1.5.1-58-g787ea74b6
Loaded image: quay.io/prometheus-operator/prometheus-operator:v0.76.1
Loaded image: quay.io/prometheus-operator/prometheus-config-reloader:v0.76.1

# Tag and push the images
[root@k8s-master prometheus]# docker tag quay.io/prometheus/prometheus:v2.54.1 reg.exam.com/prometheus/prometheus:v2.54.1
[root@k8s-master prometheus]# docker push reg.exam.com/prometheus/prometheus:v2.54.1

[root@k8s-master prometheus]# docker tag quay.io/thanos/thanos:v0.36.1 reg.exam.com/thanos/thanos:v0.36.1
[root@k8s-master prometheus]# docker push reg.exam.com/thanos/thanos:v0.36.1

[root@k8s-master prometheus]# docker tag quay.io/prometheus/alertmanager:v0.27.0 reg.exam.com/prometheus/alertmanager:v0.27.0
[root@k8s-master prometheus]# docker push reg.exam.com/prometheus/alertmanager:v0.27.0

[root@k8s-master prometheus]# docker tag quay.io/prometheus-operator/admission-webhook:v0.76.1 reg.exam.com/prometheus-operator/admission-webhook:v0.76.1
[root@k8s-master prometheus]# docker push reg.exam.com/prometheus-operator/admission-webhook:v0.76.1

[root@k8s-master prometheus]# docker tag registry.k8s.io/ingress-nginx/kube-webhook-certgen:v20221220-controller-v1.5.1-58-g787ea74b6 reg.exam.com/ingress-nginx/kube-webhook-certgen:v20221220-controller-v1.5.1-58-g787ea74b6
[root@k8s-master prometheus]# docker push reg.exam.com/ingress-nginx/kube-webhook-certgen:v20221220-controller-v1.5.1-58-g787ea74b6

[root@k8s-master prometheus]# docker tag quay.io/prometheus-operator/prometheus-operator:v0.76.1 reg.exam.com/prometheus-operator/prometheus-operator:v0.76.1
[root@k8s-master prometheus]# docker push reg.exam.com/prometheus-operator/prometheus-operator:v0.76.1

[root@k8s-master prometheus]# docker tag quay.io/prometheus-operator/prometheus-config-reloader:v0.76.1 reg.exam.com/prometheus-operator/prometheus-config-reloader:v0.76.1
[root@k8s-master prometheus]# docker push reg.exam.com/prometheus-operator/prometheus-config-reloader:v0.76.1
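The repetitive tag-and-push commands above can be scripted. A sketch that only prints the commands (a dry run) rather than running them; the registry and image list are the ones used above:

```shell
# Map an upstream image reference onto the local harbor registry by
# replacing the registry host while keeping project/name:tag unchanged.
REGISTRY=reg.exam.com

retarget() {
  echo "$REGISTRY/${1#*/}"    # strip everything up to the first '/'
}

images=(
  quay.io/prometheus/prometheus:v2.54.1
  quay.io/thanos/thanos:v0.36.1
  quay.io/prometheus/alertmanager:v0.27.0
  quay.io/prometheus-operator/prometheus-operator:v0.76.1
)

for img in "${images[@]}"; do
  tgt=$(retarget "$img")
  # Dry run: print the commands; remove the echo to execute them.
  echo "docker tag $img $tgt"
  echo "docker push $tgt"
done
```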

# Change the registry address
[root@k8s-master prometheus]# cd kube-prometheus-stack/charts/grafana/

[root@k8s-master grafana]# pwd
/root/prometheus/kube-prometheus-stack/charts/grafana

[root@k8s-master grafana]# vim values.yaml 
3   	imageRegistry: "reg.exam.com"
418     tag: "latest"

-----------------------------------------------------------------------------

# Load the grafana image bundle
[root@k8s-master prometheus]# docker load -i grafana-11.2.0.tar 
Loaded image: grafana/grafana:11.2.0
Loaded image: quay.io/kiwigrid/k8s-sidecar:1.27.4
Loaded image: grafana/grafana-image-renderer:latest
Loaded image: bats/bats:v1.4.1

# Tag and push to the harbor registry
[root@k8s-master prometheus]# docker tag grafana/grafana:11.2.0 reg.exam.com/grafana/grafana:11.2.0
[root@k8s-master prometheus]# docker push reg.exam.com/grafana/grafana:11.2.0

[root@k8s-master prometheus]# docker tag quay.io/kiwigrid/k8s-sidecar:1.27.4 reg.exam.com/kiwigrid/k8s-sidecar:1.27.4
[root@k8s-master prometheus]# docker push reg.exam.com/kiwigrid/k8s-sidecar:1.27.4

[root@k8s-master prometheus]# docker tag grafana/grafana-image-renderer:latest reg.exam.com/grafana/grafana-image-renderer:latest
[root@k8s-master prometheus]# docker push reg.exam.com/grafana/grafana-image-renderer:latest

[root@k8s-master prometheus]# docker tag bats/bats:v1.4.1 reg.exam.com/bats/bats:v1.4.1
[root@k8s-master prometheus]# docker push reg.exam.com/bats/bats:v1.4.1 

# Change the registry address in the config file
[root@k8s-master kube-state-metrics]# pwd
/root/prometheus/kube-prometheus-stack/charts/kube-state-metrics
[root@k8s-master kube-state-metrics]# ls
Chart.yaml  README.md  templates  values.yaml
[root@k8s-master kube-state-metrics]# vim values.yaml 
  4   registry: reg.exam.com
 29   imageRegistry: "reg.exam.com"

-------------------------------------------------------------------------------------

# Load the images
[root@k8s-master prometheus]# docker load -i kube-state-metrics-2.13.0.tar 
Loaded image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0
Loaded image: quay.io/brancz/kube-rbac-proxy:v0.18.0

# Tag and push
[root@k8s-master prometheus]# docker tag registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0 reg.exam.com/kube-state-metrics/kube-state-metrics:v2.13.0
[root@k8s-master prometheus]# docker push reg.exam.com/kube-state-metrics/kube-state-metrics:v2.13.0

[root@k8s-master prometheus]# docker tag quay.io/brancz/kube-rbac-proxy:v0.18.0 reg.exam.com/brancz/kube-rbac-proxy:v0.18.0
[root@k8s-master prometheus]# docker push reg.exam.com/brancz/kube-rbac-proxy:v0.18.0

# Change the registry address in the node-exporter config file
[root@k8s-master prometheus-node-exporter]# pwd
/root/prometheus/kube-prometheus-stack/charts/prometheus-node-exporter
[root@k8s-master prometheus-node-exporter]# ls
Chart.yaml  ci  README.md  templates  values.yaml

[root@k8s-master prometheus-node-exporter]# vim values.yaml 
  5   registry: reg.exam.com
 36   imageRegistry: "reg.exam.com"
 
------------------------------------------------------------------------------

# Load the node-exporter images
[root@k8s-master prometheus]# docker load -i node-exporter-1.8.2.tar 
Loaded image: quay.io/prometheus/node-exporter:v1.8.2
Loaded image: quay.io/brancz/kube-rbac-proxy:v0.18.0	# already pushed

# Tag and push
[root@k8s-master prometheus]# docker tag quay.io/prometheus/node-exporter:v1.8.2 reg.exam.com/prometheus/node-exporter:v1.8.2
[root@k8s-master prometheus]# docker push reg.exam.com/prometheus/node-exporter:v1.8.2

[root@k8s-master prometheus]# docker tag quay.io/brancz/kube-rbac-proxy:v0.18.0 reg.exam.com/brancz/kube-rbac-proxy:v0.18.0 
[root@k8s-master prometheus]# docker push reg.exam.com/brancz/kube-rbac-proxy:v0.18.0

================================================================================

Create the namespace

[root@k8s-master prometheus]# kubectl create namespace kube-prometheus-stack
namespace/kube-prometheus-stack created

[root@k8s-master prometheus]# kubectl get namespaces 
NAME                    STATUS   AGE
default                 Active   154m
kube-node-lease         Active   154m
kube-prometheus-stack   Active   8s
kube-public             Active   154m
kube-system             Active   154m
metallb-system          Active   54m

Install Prometheus with helm. Warning: do NOT press Ctrl+C during the installation!

[root@k8s-master prometheus]# cd kube-prometheus-stack/
[root@k8s-master kube-prometheus-stack]# 

# . refers to the current directory /root/prometheus/kube-prometheus-stack
[root@k8s-master kube-prometheus-stack]# helm -n kube-prometheus-stack install kube-prometheus-stack .
NAME: kube-prometheus-stack
LAST DEPLOYED: Thu Sep 12 20:54:37 2024
NAMESPACE: kube-prometheus-stack
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace kube-prometheus-stack get pods -l "release=kube-prometheus-stack"

Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.


Check that all pods are running
[root@k8s-master kube-prometheus-stack]# kubectl --namespace kube-prometheus-stack get pods
NAME                                                        READY   STATUS    RESTARTS   AGE
alertmanager-kube-prometheus-stack-alertmanager-0           2/2     Running   0          23m
kube-prometheus-stack-grafana-548c8fb6c4-29qdc              3/3     Running   0          23m
kube-prometheus-stack-kube-state-metrics-6688476957-n26gn   1/1     Running   0          23m
kube-prometheus-stack-operator-587f4b669b-8ztmk             1/1     Running   0          23m
kube-prometheus-stack-prometheus-node-exporter-j6j4t        1/1     Running   0          23m
kube-prometheus-stack-prometheus-node-exporter-pccpc        1/1     Running   0          23m
kube-prometheus-stack-prometheus-node-exporter-t77b8        1/1     Running   0          23m
prometheus-kube-prometheus-stack-prometheus-0               2/2     Running   0          23m

Check the services
[root@k8s-master kube-prometheus-stack]# kubectl -n kube-prometheus-stack get svc
NAME                                             TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
alertmanager-operated                            ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   23m
kube-prometheus-stack-alertmanager               ClusterIP   10.104.96.90     <none>        9093/TCP,8080/TCP            23m
kube-prometheus-stack-grafana                    ClusterIP   10.103.122.224   <none>        80/TCP                       23m
kube-prometheus-stack-kube-state-metrics         ClusterIP   10.104.185.222   <none>        8080/TCP                     23m
kube-prometheus-stack-operator                   ClusterIP   10.98.25.116     <none>        443/TCP                      23m
kube-prometheus-stack-prometheus                 ClusterIP   10.102.144.68    <none>        9090/TCP,8080/TCP            23m
kube-prometheus-stack-prometheus-node-exporter   ClusterIP   10.98.117.125    <none>        9100/TCP                     23m
prometheus-operated                              ClusterIP   None             <none>        9090/TCP                     23m

Change the service exposure type
[root@k8s-master kube-prometheus-stack]# kubectl -n kube-prometheus-stack edit svc kube-prometheus-stack-grafana 
39   type: LoadBalancer

The purpose of each svc:
alertmanager-operated 			alert management
kube-prometheus-stack-grafana 	 displays the metrics collected by prometheus
kube-prometheus-stack-prometheus-node-exporter collects node-level metrics
kube-prometheus-stack-prometheus the main Prometheus server


2.3 Log in to Grafana

View the grafana password
[root@k8s-master helm]# kubectl -n kube-prometheus-stack get secrets kube-prometheus-stack-grafana -o yaml
apiVersion: v1
data:
  admin-password: cHJvbS1vcGVyYXRvcg==
  admin-user: YWRtaW4=
  ldap-toml: ""
kind: Secret
metadata:
  annotations:
    meta.helm.sh/release-name: kube-prometheus-stack
    meta.helm.sh/release-namespace: kube-prometheus-stack
  creationTimestamp: "2024-09-12T12:54:47Z"
  labels:
    app.kubernetes.io/instance: kube-prometheus-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: grafana
    app.kubernetes.io/version: 11.2.0
    helm.sh/chart: grafana-8.5.1
  name: kube-prometheus-stack-grafana
  namespace: kube-prometheus-stack
  resourceVersion: "16943"
  uid: d19640ae-4b79-4013-ba03-e039fc98b493
type: Opaque

Decode the credentials
[root@k8s-master helm]# echo "cHJvbS1vcGVyYXRvcg==" | base64 -d
prom-operator		# password

[root@k8s-master helm]# echo "YWRtaW4=" | base64 -d
admin				# user
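The values can also be pulled straight from the Secret in one step instead of copying the base64 strings by hand; a sketch (the `kubectl` path is the Secret shown above, and the base64 literals are its values, decoded offline here):

```shell
# In a live cluster, the base64 value would come from:
#   kubectl -n kube-prometheus-stack get secret kube-prometheus-stack-grafana \
#     -o jsonpath='{.data.admin-password}' | base64 -d
# Offline, decoding the values from the Secret dump above:
admin_user=$(echo -n "YWRtaW4=" | base64 -d)
admin_password=$(echo -n "cHJvbS1vcGVyYXRvcg==" | base64 -d)
echo "user=$admin_user password=$admin_password"
```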

[root@k8s-master helm]# kubectl -n kube-prometheus-stack get svc
NAME                                             TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE
alertmanager-operated                            ClusterIP      None             <none>          9093/TCP,9094/TCP,9094/UDP   29m
kube-prometheus-stack-alertmanager               ClusterIP      10.104.96.90     <none>          9093/TCP,8080/TCP            29m
kube-prometheus-stack-grafana                    LoadBalancer   10.103.122.224   172.25.250.50   80:31471/TCP                 29m
kube-prometheus-stack-kube-state-metrics         ClusterIP      10.104.185.222   <none>          8080/TCP                     29m
kube-prometheus-stack-operator                   ClusterIP      10.98.25.116     <none>          443/TCP                      29m
kube-prometheus-stack-prometheus                 ClusterIP      10.102.144.68    <none>          9090/TCP,8080/TCP            29m
kube-prometheus-stack-prometheus-node-exporter   ClusterIP      10.98.117.125    <none>          9100/TCP                     29m
prometheus-operated                              ClusterIP      None             <none>          9090/TCP                     29m

# Browse to the assigned external IP

2.4 Import Dashboards

Official dashboard templates: Grafana dashboards | Grafana Labs

If a dashboard doesn't work well, try another one.

2.5 Access the Prometheus Server

[root@k8s-master helm]# kubectl -n kube-prometheus-stack edit svc kube-prometheus-stack-prometheus
48   type: LoadBalancer

[root@k8s-master helm]# kubectl -n kube-prometheus-stack get svc kube-prometheus-stack-prometheus
NAME                               TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                         AGE
kube-prometheus-stack-prometheus   LoadBalancer   10.102.144.68   172.25.250.51   9090:30607/TCP,8080:32132/TCP   43m

Open 172.25.250.51:9090 in a browser
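Once the UI is up, a quick query in the expression browser confirms that targets are being scraped; for example:

```
up         # one series per target, value 1 = healthy
up == 0    # shows only targets that are down
```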

3 Monitoring Usage Example

3.1 Set Up a Monitored Project

[root@k8s-master ~]# mkdir test
[root@k8s-master ~]# cd test/
[root@k8s-master test]# ls
nginx-18.1.11.tgz  nginx-exporter-1.3.0-debian-12-r2.tar

[root@k8s-master test]# tar zxf nginx-18.1.11.tgz 
[root@k8s-master test]# cd nginx/

Edit the chart to enable monitoring
[root@k8s-master nginx]# vim values.yaml 
 925 metrics:
 926   ## @param metrics.enabled Start a Prometheus exporter sidecar container
 927   ##
 928   enabled: true		# set to true
...
1015   serviceMonitor:
1016     ## @param metrics.serviceMonitor.enabled Creates a Prometheus Operator ServiceMonitor (also requires `metrics.enabled` to be `true`)
1017     ##
1018     enabled: true			# set to true
1019     ## @param metrics.serviceMonitor.namespace Namespace in which Prometheus is running
1020     ##
1021     namespace: "kube-prometheus-stack"		# change the namespace
1022     ## @param metrics.serviceMonitor.jobLabel The name of the label on the target service to use as the job name in prometheus.
1023     ##
...
1046     labels:
1047       release: kube-prometheus-stack		# add the label the stack's Prometheus selects on

# Check the labels
[root@k8s-master nginx]# kubectl -n kube-prometheus-stack get servicemonitors.monitoring.coreos.com --show-labels

Install the chart. Before installing, be sure to push the images to the registry.

[root@k8s-master nginx]# ls
nginx-1.27.1-debian-12-r2.tar

# First image: nginx
[root@k8s-master nginx]# docker load -i nginx-1.27.1-debian-12-r2.tar
30f5b1069b7f: Loading layer [==================================================>]  190.1MB/190.1MB
Loaded image: bitnami/nginx:1.27.1-debian-12-r2

[root@k8s-master nginx]# docker tag bitnami/nginx:1.27.1-debian-12-r2 reg.exam.com/bitnami/nginx:1.27.1-debian-12-r2
[root@k8s-master nginx]# docker push reg.exam.com/bitnami/nginx:1.27.1-debian-12-r2

# Second image: nginx-exporter
[root@k8s-master nginx]# docker load -i nginx-exporter-1.3.0-debian-12-r2.tar 
016ff07f0ae3: Loading layer [==================================================>]  149.3MB/149.3MB
Loaded image: bitnami/nginx-exporter:1.3.0-debian-12-r2

[root@k8s-master nginx]# docker tag bitnami/nginx-exporter:1.3.0-debian-12-r2 reg.exam.com/bitnami/nginx-exporter:1.3.0-debian-12-r2
[root@k8s-master nginx]# docker push reg.exam.com/bitnami/nginx-exporter:1.3.0-debian-12-r2


# Install the chart
[root@k8s-master nginx]# helm install howe .
NAME: howe
LAST DEPLOYED: Thu Sep 12 21:52:15 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
CHART NAME: nginx
CHART VERSION: 18.1.11
APP VERSION: 1.27.1

[root@k8s-master nginx]# kubectl get pods
NAME                          READY   STATUS    RESTARTS   AGE
howe-nginx-54c97cb888-x5hhh   2/2     Running   0          21s

[root@k8s-master nginx]# kubectl get svc
NAME         TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                                     AGE
kubernetes   ClusterIP      10.96.0.1       <none>          443/TCP                                     4h56m
howe-nginx   LoadBalancer   10.102.161.61   172.25.250.52   80:30614/TCP,443:31390/TCP,9113:32254/TCP   30s

[root@k8s-master nginx]# curl 172.25.250.52
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>


Load test:
[root@k8s-master nginx]# ab -c 5 -n 100 http://172.25.250.52/index.html
This is ApacheBench, Version 2.3 <$Revision: 1903618 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 172.25.250.52 (be patient).....done


Server Software:        nginx
Server Hostname:        172.25.250.52
Server Port:            80

Document Path:          /index.html
Document Length:        615 bytes

Concurrency Level:      5
Time taken for tests:   0.033 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      87000 bytes
HTML transferred:       61500 bytes
Requests per second:    2991.15 [#/sec] (mean)
Time per request:       1.672 [ms] (mean)
Time per request:       0.334 [ms] (mean, across all concurrent requests)
Transfer rate:          2541.31 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   0.1      0       1
Processing:     0    1   0.9      1       6
Waiting:        0    1   0.8      1       6
Total:          1    1   0.9      1       6
ERROR: The median and mean for the initial connection time are more than twice the standard
       deviation apart. These results are NOT reliable.

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      2
  90%      2
  95%      3
  98%      6
  99%      6
 100%      6 (longest request)
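As a sanity check, the two "Time per request" figures follow directly from the reported requests-per-second and the concurrency level:

```shell
# mean time per request = concurrency / rps; across all clients = 1 / rps
awk 'BEGIN {
  c = 5; rps = 2991.15                           # values from the ab output above
  printf "per request (mean): %.3f ms\n", c * 1000 / rps
  printf "across all clients: %.3f ms\n", 1000 / rps
}'
```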

3.2 Tuning the Monitoring

Steps for configuring Kubernetes monitoring rules in Prometheus:

#### 1. Deploy Prometheus into the Kubernetes cluster

Recommended: use a Helm chart or the Prometheus Operator for a quick deployment.

```bash
# Install the Prometheus stack (includes Prometheus, Alertmanager, Grafana)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack
```

Manual deployment through `Deployment` and `Service` resources is also possible, but requires RBAC permissions and storage volumes to be configured.

#### 2. Configure service discovery

Prometheus discovers K8s resources automatically through `kubernetes_sd_configs`. Add the following to `prometheus.yml`:

```yaml
scrape_configs:
  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
      - role: node                 # monitor node resources
    relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'   # rewrite the port to the exposed metrics port
        target_label: __address__
```

Label rewriting with `relabel_configs` can also filter out irrelevant targets, for example keeping only production pods:

```yaml
- source_labels: [__meta_kubernetes_namespace]
  action: keep
  regex: production
```

#### 3. Define core monitoring rules

Node CPU usage:

```promql
(1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance)) * 100
```

Node memory usage:

```promql
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100
```

Pod health status:

```promql
kube_pod_status_phase{phase=~"Pending|Unknown"} > 0
```

Deployment replica availability:

```promql
kube_deployment_status_replicas_available / kube_deployment_spec_replicas < 0.8
```

#### 4. Alerting rules (alert.rules.yml)

Mount the rules file into the Prometheus container, or define the rules through the Prometheus Operator's `PrometheusRule` CRD:

```yaml
groups:
  - name: Kubernetes-Alerts
    rules:
      - alert: NodeCPUHigh
        expr: (1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance)) * 100 > 80
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "High CPU usage on node {{ $labels.instance }}"
```

#### 5. Integrate Alertmanager

Routing configuration (alertmanager.yml):

```yaml
route:
  group_by: [alertname, cluster]
  receiver: 'slack-notifications'
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/XXX'
```

#### 6. Visualization (Grafana)

Import an official dashboard, e.g. ID `3119`, "Kubernetes Cluster Monitoring", or write custom queries, for example the 95th-percentile request latency per Service:

```promql
histogram_quantile(0.95, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket[5m])) by (le, service))
```

#### Best practices

1. Label management: tag monitored objects with labels such as `env=prod` or `team=backend` to make filtering and aggregation easier.
2. Resource limits: set a memory limit for the Prometheus pod (e.g. 8 GiB or more) to avoid OOM kills.
3. Review rules regularly: revisit alert thresholds each quarter and remove redundant metrics.
4. Back up configuration: keep `prometheus.yml` and the alert rules under version control (e.g. Git).
5. Harden security: enable HTTPS and RBAC, and restrict access to the `/metrics` endpoint.