From 0 to 1: A Hands-On Guide to the Kubernetes Prometheus Adapter

Introduction: Solving the Custom-Metrics Problem in K8s

Have you ever been stuck because a Kubernetes HPA could not read Prometheus metrics, or frustrated by "metrics not found" errors? Kubernetes, the de facto standard for container orchestration, ships powerful autoscaling, but natively it only supports CPU and memory metrics. When a workload needs to scale on custom metrics such as request rate or queue length, the Prometheus Adapter becomes the key bridge between monitoring data and business elasticity.

This article walks through the full workflow: deploying and configuring the Prometheus Adapter, converting metrics, and integrating with the HPA. Across ten hands-on sections and 23 code examples, you will build the complete skill chain from environment preparation to advanced configuration, ending with Kubernetes autoscaling driven by arbitrary Prometheus metrics.

Overview: What the Prometheus Adapter Brings

The Prometheus Adapter implements the Kubernetes custom metrics APIs: it converts Prometheus time-series data into the metric format the Kubernetes API understands, so the HPA controller can make decisions based on business metrics. Its core functions:

  • Metric conversion: turns Prometheus query results into the Kubernetes Quantity type
  • API adaptation: implements the custom.metrics.k8s.io and external.metrics.k8s.io API groups
  • Dynamic discovery: automatically discovers Prometheus series that match the configured rules
  • Flexible configuration: rules define how metrics map onto Kubernetes resources

Unlike metrics-server, which only serves node and Pod resource metrics, the Prometheus Adapter can expose any metric Prometheus scrapes: application metrics, middleware metrics, even external-system metrics, greatly widening the dimensions available to Kubernetes autoscaling decisions.
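
Once the adapter is deployed (covered below), you can confirm these API groups are registered with the aggregator; the output columns shown here are illustrative:

kubectl get apiservice | grep metrics.k8s.io
# v1beta1.custom.metrics.k8s.io     monitoring/prometheus-adapter   True
# v1beta1.external.metrics.k8s.io   monitoring/prometheus-adapter   True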

Environment Preparation: Pre-Deployment Checks

Before deploying the adapter, make sure the cluster meets the following requirements:

| Check | Minimum | Recommended |
|---|---|---|
| Kubernetes version | 1.16+ | 1.22+ |
| Aggregation layer | enabled | --enable-aggregator-routing=true |
| Prometheus | 2.0+ | 2.30+, with persistent storage |
| RBAC permissions | cluster-admin | dedicated Service Account |
| Network connectivity | Prometheus reachable | network policy allows port 6443 |

Verify the cluster version and aggregation-layer status with:

kubectl version
# Check whether the API aggregation layer is serving metrics APIs
kubectl api-versions | grep metrics
# metrics.k8s.io appears when metrics-server is installed;
# custom.metrics.k8s.io/v1beta1 will only appear once the adapter is deployed

For clusters without the aggregation layer enabled (kubeadm-based clusters enable it by default), add the following API server flags:

--enable-aggregator-routing=true \
--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt \
--requestheader-allowed-names=front-proxy-client \
--requestheader-extra-headers-prefix=X-Remote-Extra- \
--requestheader-group-headers=X-Remote-Group \
--requestheader-username-headers=X-Remote-User

Installation: Two Deployment Options Compared

Helm Quick Install (Recommended)

Helm is the easiest way to install the Prometheus Adapter; the community-maintained chart ships a complete deployment configuration:

# Add the Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install the latest stable chart (production)
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --create-namespace \
  --set prometheus.url=http://prometheus-server.monitoring.svc:80 \
  --set tls.enabled=true

# Install a pinned chart version (test environments)
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --create-namespace \
  --version 4.1.0 \
  --set image.repository=registry.k8s.io/prometheus-adapter/prometheus-adapter \
  --set image.tag=v0.11.2
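
Either way, sanity-check that the pods are running and the APIService is registered (the pod label below is the chart's default):

kubectl -n monitoring get pods -l app.kubernetes.io/name=prometheus-adapter
kubectl get apiservice v1beta1.custom.metrics.k8s.io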

Manual Deployment (Custom Scenarios)

For scenarios requiring deep customization, deploy the manifests by hand. The core components are listed below (a role binding and the adapter's Service, which these snippets reference, are sketched after the list):

  1. Service Account and RBAC
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-adapter
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-adapter
rules:
- apiGroups: [""]
  resources: ["pods", "nodes", "namespaces"]
  verbs: ["get", "list", "watch"]
  2. Deployment configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-adapter
  namespace: monitoring
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: prometheus-adapter
  template:
    metadata:
      labels:
        app.kubernetes.io/name: prometheus-adapter
    spec:
      serviceAccountName: prometheus-adapter
      containers:
      - name: prometheus-adapter
        image: registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.11.2
        args:
        - --secure-port=6443
        - --config=/etc/adapter/config.yaml
        - --prometheus-url=http://prometheus-server.monitoring.svc:80
        - --metrics-relist-interval=1m
        volumeMounts:
        - name: config
          mountPath: /etc/adapter
      volumes:
      - name: config
        configMap:
          name: adapter-config
  3. APIService registration
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  service:
    name: prometheus-adapter
    namespace: monitoring
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
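
Two objects the snippets above reference but do not define: the binding that grants the reader ClusterRole to the Service Account, and the Service the APIService routes to. A minimal sketch follows (port 443 forwards to the container's --secure-port=6443; a production setup additionally needs the system:auth-delegator ClusterRole and the extension-apiserver-authentication-reader Role for delegated authentication):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-adapter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-adapter
subjects:
- kind: ServiceAccount
  name: prometheus-adapter
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-adapter
  namespace: monitoring
spec:
  selector:
    app.kubernetes.io/name: prometheus-adapter
  ports:
  - port: 443
    targetPort: 6443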

Apply everything with:

kubectl apply -f deploy/manifests/
kubectl get pods -n monitoring -l app.kubernetes.io/name=prometheus-adapter

Core Configuration: Metric Rules in Depth

The adapter's power comes from its flexible configuration: a rule set, defined in a ConfigMap, controls how metrics are discovered, associated with Kubernetes resources, and converted. A complete configuration file has this structure:

rules:
- seriesQuery: '<PromQL series filter>'
  resources:
    overrides:
      <label>: {group: "<API group>", resource: "<resource>"}
  name:
    matches: '<regex>'
    as: '<renamed metric>'
  metricsQuery: '<PromQL template>'
externalRules:
- seriesQuery: '<external series filter>'
  resources:
    namespaced: <bool>
  metricsQuery: '<external query template>'

Key Settings Explained

  1. seriesQuery: the metric-discovery rule, written in PromQL selector syntax, for example:

seriesQuery: '{__name__=~"^http_requests_.*",namespace!="",pod!=""}'

This matches every series whose name starts with http_requests_ and that carries non-empty namespace and pod labels.

  2. resources: maps Prometheus labels onto Kubernetes resources, in either of two ways:

    • overrides: explicitly map specific labels to resources
    resources:
      overrides:
        microservice: {group: "apps", resource: "deployment"}
        module: {resource: "pod"}

    • template: derive the mapping from a naming template
    resources:
      template: "kube_<<.Group>>_<<.Resource>>"
    
  3. name: metric renaming rules; extract and rename with a regular expression:

name:
  matches: "^http_requests_(.*)_total$"
  as: "http_requests_$1_per_second"

This turns http_requests_get_total into http_requests_get_per_second.

  4. metricsQuery: the query template that determines how metric values are fetched and computed from Prometheus:

metricsQuery: |
  sum(rate(<<.Series>>{<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>)

Template variables:

  • <<.Series>>: the matched Prometheus series name
  • <<.LabelMatchers>>: label matchers scoping the query to the requested Kubernetes resources
  • <<.GroupBy>>: the label(s) to group results by
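
As a concrete illustration of how the template expands (the pod names here are hypothetical): when the HPA asks for this metric for pods in the default namespace, the adapter renders roughly:

sum(rate(http_requests_total{namespace="default",pod=~"http-demo-abc|http-demo-def"}[5m])) by (pod)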

Built-in Variables

The configuration templates support the following built-in variables:

| Variable | Description | Example output |
|---|---|---|
| .Series | matched Prometheus series name | http_requests_total |
| .LabelMatchers | resource label matchers | namespace="default",pod=~"app-.*" |
| .GroupBy | group-by label list | pod,deployment |
| .Group | resource API group | apps |
| .Resource | resource kind | deployment |

Typical Rule Examples

1. Basic counter conversion

- seriesQuery: '{__name__=~".*_total",namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  metricsQuery: "sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)"

2. External metrics rule

externalRules:
- seriesQuery: '{__name__="queue_depth",topic!=""}'
  resources:
    namespaced: false
  metricsQuery: sum(queue_depth{<<.LabelMatchers>>}) by (topic)

Hands-On: Building a Custom-Metrics HPA from Scratch

This case study walks the full path from application instrumentation to HPA configuration, ending with autoscaling driven by HTTP request rate.

Step 1: Deploy a sample application

Deploy a test application that exposes an http_requests_total metric:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: http-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: http-demo
  template:
    metadata:
      labels:
        app: http-demo
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
    spec:
      containers:
      - name: http-demo
        image: luxas/autoscale-demo:v0.1.2
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: http-demo
  labels:
    app: http-demo   # the ServiceMonitor below selects on this label
spec:
  selector:
    app: http-demo
  ports:
  - name: http       # named port referenced by the ServiceMonitor endpoint
    port: 80
    targetPort: 8080

After deployment, verify the metrics endpoint:

kubectl port-forward svc/http-demo 8080:80
curl http://localhost:8080/metrics | grep http_requests_total
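
A counter line similar to the following should appear (the exact value and any labels will vary):

# TYPE http_requests_total counter
http_requests_total 57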

Step 2: Configure Prometheus scraping

Create a ServiceMonitor so that (Operator-managed) Prometheus scrapes the application's metrics:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: http-demo
  namespace: monitoring
spec:
  namespaceSelector:      # the Service lives in default, not in monitoring
    matchNames:
    - default
  selector:
    matchLabels:
      app: http-demo
  endpoints:
  - port: http
    interval: 15s

In the Prometheus UI, confirm the metric is being scraped:

http_requests_total{job="http-demo"}

Step 3: Configure the adapter rule

Create a conversion rule for the HTTP request metric:

apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: monitoring
data:
  config.yaml: |-
    rules:
    - seriesQuery: '{__name__="http_requests_total",namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^http_requests_total$"
        as: "http_requests"
      metricsQuery: |
        sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)
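
Before reloading the adapter, it is worth confirming that the equivalent query returns data when run directly in Prometheus:

sum(rate(http_requests_total{namespace="default",pod!=""}[2m])) by (pod)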

Apply the config and restart the adapter so it reloads:

kubectl apply -f adapter-config.yaml
kubectl rollout restart deployment prometheus-adapter -n monitoring

Step 4: Verify the custom metrics API

Check through the Kubernetes API that the metric is available:

# List all available custom metrics
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .

# Fetch the value of a specific metric (quote the path so the shell does not expand *)
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests" | jq .

The output should contain metric data similar to:

{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/http_requests"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "default",
        "name": "http-demo-7f9658b8c4-2xqzv",
        "apiVersion": "/v1"
      },
      "metricName": "http_requests",
      "timestamp": "2023-05-15T08:30:45Z",
      "value": "12m",
      "selector": null
    }
  ]
}
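
Note that the value "12m" uses Kubernetes Quantity notation: the milli suffix means 0.012 requests per second.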

Step 5: Create an HPA driven by the custom metric

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: http-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: http-demo
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests
      target:
        type: AverageValue
        averageValue: 500m

Apply the HPA and watch its behavior (averageValue: 500m is Quantity notation for a target of 0.5 requests per second per Pod):

kubectl apply -f hpa.yaml
kubectl get hpa http-demo -w

Step 6: Generate load and test scaling

Generate test traffic with the hey tool:

kubectl run -it --rm load-generator --image=williamyeh/hey -- -z 5m -q 10 -c 2 http://http-demo.default.svc/metrics

Meanwhile, watch the HPA state change:

kubectl get hpa http-demo -o wide

As request volume rises, the HPA gradually adds replicas; once traffic subsides, it scales back down to the minimum replica count.

Advanced Configuration: Multi-Dimensional Metrics

Aggregating by label

Modify metricsQuery to aggregate metrics along an extra label dimension, for example request count by HTTP method:

metricsQuery: |
  sum by (<<.GroupBy>>, method) (
    rate(<<.Series>>{<<.LabelMatchers>>}[2m])
  )

The per-method breakdown is then exposed through the metric's labels, which an HPA can filter with a metric selector:

metrics:
- type: Pods
  pods:
    metric:
      name: http_requests
      selector:
        matchLabels:
          method: "GET"
    target:
      type: AverageValue
      averageValue: 300m

Ratio metrics

Rules can compute derived ratio metrics, for example request success rate:

rules:
- seriesQuery: '{__name__=~"^http_(requests|successes)_total$",namespace!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      deployment: {group: "apps", resource: "deployment"}
  name:
    matches: "^http_(.*)_total$"
    as: "http_$1_rate"
  metricsQuery: |
    sum(rate(<<.Series>>{<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>)
- seriesQuery: '{__name__=~"^http_successes_total$",namespace!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      deployment: {group: "apps", resource: "deployment"}
  name:
    as: "http_success_rate"
  metricsQuery: |
    sum(rate(http_successes_total{<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>) 
    / 
    sum(rate(http_requests_total{<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>)
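
For completeness, a sketch of consuming the ratio from an HPA as an Object metric on the Deployment (the name http-demo and the 950m target, i.e. 0.95, are illustrative). Note that an HPA scales out when the value exceeds the target, so a success rate is usually better suited to alerting than to scaling:

metrics:
- type: Object
  object:
    describedObject:
      apiVersion: apps/v1
      kind: Deployment
      name: http-demo
    metric:
      name: http_success_rate
    target:
      type: Value
      value: 950m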

External metrics

For metrics of external systems unrelated to any Kubernetes resource (such as message-queue depth), use externalRules:

externalRules:
- seriesQuery: '{__name__="rabbitmq_queue_messages",vhost!="",queue!=""}'
  resources:
    namespaced: false
  metricsQuery: sum(rabbitmq_queue_messages{<<.LabelMatchers>>}) by (queue)

Reference the external metric in an HPA:

metrics:
- type: External
  external:
    metric:
      name: rabbitmq_queue_messages
      selector:
        matchLabels:
          queue: "order-processing"
    target:
      type: Value
      value: 1000
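
You can verify that the external metrics API serves the series before relying on it (external metrics are always requested through a namespaced path):

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/rabbitmq_queue_messages?labelSelector=queue%3Dorder-processing" | jq .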

Troubleshooting: Common Problems and Solutions

Diagnostic flow when metrics do not appear

When custom metrics do not show up in the API, work through the usual suspects in order: confirm the APIService reports Available (kubectl get apiservice v1beta1.custom.metrics.k8s.io), check the adapter logs for rule or connection errors (kubectl logs -n monitoring deploy/prometheus-adapter), verify that the seriesQuery actually returns series when run in Prometheus, and confirm the --prometheus-url flag points at a reachable Prometheus endpoint.
