From Zero to One: A Hands-On Guide to the Kubernetes Prometheus Adapter
Introduction: Solving the K8s Custom Metrics Problem
Have you ever been stuck because a Kubernetes HPA could not read Prometheus metrics, or been frustrated by a "metrics not found" error? As the de facto standard for container orchestration, Kubernetes ships powerful autoscaling, but natively it only supports CPU and memory metrics. When a workload needs to scale on custom metrics such as request rate or queue length, the Prometheus Adapter becomes the key bridge between monitoring data and workload elasticity.
This article covers the full workflow: deploying and configuring the Prometheus Adapter, converting metrics, and integrating with the HPA. Through hands-on chapters and worked code examples you will build the complete skill chain from environment preparation to advanced configuration, and finally implement intelligent Kubernetes scaling based on any Prometheus metric.
Overview: The Core Value of the Prometheus Adapter
The Prometheus Adapter implements the Kubernetes custom metrics APIs: it converts Prometheus time-series data into the metric format the Kubernetes API understands, so that the HPA controller can make scaling decisions from business metrics. Its core capabilities are:
- Metric conversion: turns Prometheus query results into Kubernetes Quantity values
- API adaptation: implements the custom.metrics.k8s.io and external.metrics.k8s.io API groups
- Dynamic discovery: automatically discovers Prometheus series that match the configured rules
- Flexible configuration: rules define how metrics are associated with Kubernetes resources
Unlike metrics-server, which only provides node and Pod resource metrics, the Prometheus Adapter can expose any metric Prometheus scrapes, including application custom metrics, middleware metrics and even metrics from external systems, greatly widening the signals available to Kubernetes autoscaling.
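Once the adapter is running, these extra API groups show up as aggregated API services. A quick way to confirm this (assuming the adapter has already been deployed as described in the installation section below):
kubectl get apiservices | grep metrics.k8s.io
# v1beta1.custom.metrics.k8s.io (and v1beta1.external.metrics.k8s.io, if external
# rules are configured) should be listed and backed by the adapter's Service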
Environment Preparation: Pre-Deployment Checks
Before deploying the adapter, make sure the Kubernetes cluster meets the following requirements:
| Check item | Minimum requirement | Recommended |
|---|---|---|
| Kubernetes version | 1.16+ | 1.22+ |
| Aggregation layer | Enabled | --enable-aggregator-routing=true |
| Prometheus | 2.0+ | 2.30+ with persistent storage |
| RBAC permissions | cluster-admin (for installation) | Dedicated ServiceAccount for the adapter |
| Network connectivity | Prometheus reachable from the adapter | Network policies allow the adapter's port 6443 |
Verify the cluster version and the state of the aggregation layer with:
kubectl version        # on older kubectl you can add --short for condensed output
# Check whether the API aggregation layer is serving metrics APIs
kubectl api-versions | grep metrics
# With metrics-server installed this prints metrics.k8s.io/v1beta1;
# custom.metrics.k8s.io/v1beta1 only appears after the adapter is deployed
For clusters where the aggregation layer is not enabled, add the following flags to the kube-apiserver (the certificate paths shown are the kubeadm defaults):
--enable-aggregator-routing=true \
--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt \
--requestheader-allowed-names=front-proxy-client \
--requestheader-extra-headers-prefix=X-Remote-Extra- \
--requestheader-group-headers=X-Remote-Group \
--requestheader-username-headers=X-Remote-User \
--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt \
--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
Installation: Two Deployment Approaches Compared
Quick Deployment with Helm (Recommended)
Helm is the easiest way to install the Prometheus Adapter; the community-maintained chart ships a complete set of deployment resources:
# Add the Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install the stable release (production); note that the chart takes the Prometheus
# URL and port as separate values
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --create-namespace \
  --set prometheus.url=http://prometheus-server.monitoring.svc \
  --set prometheus.port=80 \
  --set tls.enabled=true
# Install a pinned chart and image version (test environments)
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --create-namespace \
  --version 4.1.0 \
  --set image.repository=registry.k8s.io/prometheus-adapter/prometheus-adapter \
  --set image.tag=v0.11.2
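After the release is installed, confirm that the adapter Pods are running and that the custom metrics API group has been registered:
kubectl get pods -n monitoring -l app.kubernetes.io/name=prometheus-adapter
kubectl api-versions | grep custom.metrics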
Manual Deployment (Custom Scenarios)
For scenarios that need deep customization, the adapter can be deployed manually from manifests. The core components are:
- ServiceAccount and permissions:
apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus-adapter
namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: prometheus-adapter
rules:
- apiGroups: [""]
resources: ["pods", "nodes", "namespaces"]
verbs: ["get", "list", "watch"]
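The ClusterRole above still has to be bound to the adapter's ServiceAccount, and an aggregated API server also needs the standard auth-delegator binding; the original manifests omit both, so the following is a minimal sketch:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-adapter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-adapter
subjects:
- kind: ServiceAccount
  name: prometheus-adapter
  namespace: monitoring
---
# Lets the adapter delegate authentication and authorization to the kube-apiserver
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-adapter:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: prometheus-adapter
  namespace: monitoring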
- Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
name: prometheus-adapter
namespace: monitoring
spec:
replicas: 2
selector:
matchLabels:
app.kubernetes.io/name: prometheus-adapter
template:
metadata:
labels:
app.kubernetes.io/name: prometheus-adapter
spec:
serviceAccountName: prometheus-adapter
containers:
- name: prometheus-adapter
image: registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.11.2
args:
- --secure-port=6443
- --config=/etc/adapter/config.yaml
- --prometheus-url=http://prometheus-server.monitoring.svc:80
- --metrics-relist-interval=1m
volumeMounts:
- name: config
mountPath: /etc/adapter
volumes:
- name: config
configMap:
name: adapter-config
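- Service (the backend that the APIService below points to; it is omitted from the original manifests, so this is a minimal sketch):
apiVersion: v1
kind: Service
metadata:
  name: prometheus-adapter
  namespace: monitoring
spec:
  selector:
    app.kubernetes.io/name: prometheus-adapter
  ports:
  - port: 443
    targetPort: 6443   # matches the container's --secure-port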
- APIService registration:
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
name: v1beta1.custom.metrics.k8s.io
spec:
service:
name: prometheus-adapter
namespace: monitoring
group: custom.metrics.k8s.io
version: v1beta1
insecureSkipTLSVerify: true
groupPriorityMinimum: 100
versionPriority: 100
Apply the manifests to complete the deployment:
kubectl apply -f deploy/manifests/
kubectl get pods -n monitoring -l app.kubernetes.io/name=prometheus-adapter
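Then confirm that the APIService has been registered and reports as available:
kubectl get apiservice v1beta1.custom.metrics.k8s.io
# The AVAILABLE column should show True once the adapter is serving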
Core Configuration: A Deep Dive into Metric Rules
The adapter's power comes from its flexible configuration: a rule set, defined in a ConfigMap, controls how metrics are discovered, associated with Kubernetes resources, and converted. A complete configuration file has the following structure:
rules:
- seriesQuery: '<PromQL selector expression>'
  resources:
    overrides:
      <label name>: {group: "<API group>", resource: "<resource name>"}
  name:
    matches: '<regular expression>'
    as: '<renamed metric>'
  metricsQuery: '<PromQL template>'
externalRules:
- seriesQuery: '<external metric selector expression>'
  resources:
    namespaced: <boolean>
  metricsQuery: '<external metric query template>'
Key Configuration Fields Explained
- seriesQuery: the Prometheus series discovery rule, written with PromQL selector syntax, for example:
seriesQuery: '{__name__=~"^http_requests_.*",namespace!="",pod!=""}'
This expression matches every series whose name starts with http_requests_ and that carries both a namespace and a pod label.
- resources: defines how Prometheus labels map onto Kubernetes resources, in one of two ways:
  - overrides: explicitly maps specific labels to resources
    resources:
      overrides:
        microservice: {group: "apps", resource: "deployment"}
        module: {resource: "pod"}
  - template: derives the mapping automatically from a template
    resources:
      template: "kube_<<.Group>>_<<.Resource>>"
- name: the metric renaming rule; a regular expression extracts and renames the metric:
name:
  matches: "^http_requests_(.*)_total$"
  as: "http_requests_$1_per_second"
This turns http_requests_get_total into http_requests_get_per_second.
- metricsQuery: the query template that determines how the metric value is fetched and computed from Prometheus:
metricsQuery: |
sum(rate(<<.Series>>{<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>)
Template variables:
- <<.Series>>: the matched Prometheus metric name
- <<.LabelMatchers>>: label matchers for the Kubernetes resources being queried
- <<.GroupBy>>: the labels to group results by
Built-in Template Variables
The configuration templates support the following built-in variables:
| Variable | Description | Example output |
|---|---|---|
| .Series | The matched Prometheus metric name | http_requests_total |
| .LabelMatchers | Label matchers for the requested resources | namespace="default",pod=~"app-.*" |
| .GroupBy | The labels to group by | pod,deployment |
| .Group | The resource's API group | apps |
| .Resource | The resource kind | deployment |
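To make the substitution concrete, here is roughly how the adapter expands the rate template shown earlier when the HPA asks for pod metrics in the default namespace (the pod names are illustrative):
# Template:
#   sum(rate(<<.Series>>{<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>)
# Query actually sent to Prometheus (approximately):
sum(rate(http_requests_total{namespace="default",pod=~"http-demo-7f9658b8c4-2xqzv|http-demo-7f9658b8c4-9k4tp"}[5m])) by (pod)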
Typical Configuration Examples
1. Basic counter conversion:
- seriesQuery: '{__name__=~".*_total",namespace!="",pod!=""}'
resources:
overrides:
namespace: {resource: "namespace"}
pod: {resource: "pod"}
name:
matches: "^(.*)_total$"
as: "${1}_per_second"
metricsQuery: "sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)"
2. External metric configuration:
externalRules:
- seriesQuery: '{__name__="queue_depth",topic!=""}'
resources:
namespaced: false
metricsQuery: sum(queue_depth{<<.LabelMatchers>>}) by (topic)
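Once an external rule is in place, the resulting metrics can be listed through the external metrics API in the same way as custom metrics (this assumes the v1beta1.external.metrics.k8s.io APIService has been registered for the adapter):
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1 | jq .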
Hands-On Case Study: Building a Custom-Metric HPA from Scratch
This case study walks through the whole process, from application instrumentation to HPA configuration, ending with autoscaling driven by HTTP request volume.
Step 1: Deploy the sample application
Deploy a test application that exposes an http_requests_total metric:
apiVersion: apps/v1
kind: Deployment
metadata:
name: http-demo
spec:
replicas: 1
selector:
matchLabels:
app: http-demo
template:
metadata:
labels:
app: http-demo
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
spec:
containers:
- name: http-demo
image: luxas/autoscale-demo:v0.1.2
ports:
- containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: http-demo
  labels:
    app: http-demo       # matched by the ServiceMonitor selector in step 2
spec:
  selector:
    app: http-demo
  ports:
  - name: http           # referenced as the endpoint port in the ServiceMonitor
    port: 80
    targetPort: 8080
After deployment, verify that the metric is exposed:
kubectl port-forward svc/http-demo 8080:80
curl http://localhost:8080/metrics | grep http_requests_total
Step 2: Configure Prometheus scraping
Create a ServiceMonitor so Prometheus scrapes the application (this assumes Prometheus is managed by the Prometheus Operator; with an annotation-based scrape config, the prometheus.io/* annotations from step 1 are sufficient):
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: http-demo
  namespace: monitoring
  # If your Prometheus CR uses a serviceMonitorSelector, add the matching labels here
spec:
  namespaceSelector:
    matchNames:
    - default            # namespace where the http-demo Service lives
  selector:
    matchLabels:
      app: http-demo
  endpoints:
  - port: http
    interval: 15s
Verify in the Prometheus UI that the metric is being scraped:
http_requests_total{job="http-demo"}
Step 3: Configure the adapter rules
Create a conversion rule for the HTTP request metric:
apiVersion: v1
kind: ConfigMap
metadata:
name: adapter-config
namespace: monitoring
data:
config.yaml: |-
rules:
- seriesQuery: '{__name__="http_requests_total",namespace!="",pod!=""}'
resources:
overrides:
namespace: {resource: "namespace"}
pod: {resource: "pod"}
name:
matches: "^http_requests_total$"
as: "http_requests"
metricsQuery: |
sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)
Apply the configuration and restart the adapter so it picks up the new rules (the ConfigMap name adapter-config matches the manual deployment above; with the Helm install, rules are set through the chart's values instead):
kubectl apply -f adapter-config.yaml
kubectl rollout restart deployment prometheus-adapter -n monitoring
Step 4: Verify the custom metrics API
Check through the Kubernetes API that the metric is now available:
# List all available custom metrics
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .
# Fetch the value of a specific metric (quote the URL so the shell does not expand *)
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests" | jq .
The output should contain metric data similar to the following:
{
"kind": "MetricValueList",
"apiVersion": "custom.metrics.k8s.io/v1beta1",
"metadata": {
"selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/http_requests"
},
"items": [
{
"describedObject": {
"kind": "Pod",
"namespace": "default",
"name": "http-demo-7f9658b8c4-2xqzv",
"apiVersion": "/v1"
},
"metricName": "http_requests",
"timestamp": "2023-05-15T08:30:45Z",
"value": "12m",
"selector": null
}
]
}
Step 5: Create an HPA based on the custom metric
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: http-demo
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: http-demo
minReplicas: 1
maxReplicas: 10
metrics:
- type: Pods
pods:
metric:
name: http_requests
target:
type: AverageValue
averageValue: 500m
Apply the HPA and watch its behaviour. Note that the target averageValue of 500m uses the same milli-unit Quantity notation as the API output above, i.e. an average of 0.5 requests per second per Pod:
kubectl apply -f hpa.yaml
kubectl get hpa http-demo -w
Step 6: Generate load to test scaling
Use the hey tool to generate test traffic:
kubectl run -it --rm --restart=Never load-generator --image=williamyeh/hey -- -z 5m -q 10 -c 2 http://http-demo.default.svc/metrics
Meanwhile, watch the HPA status:
kubectl get hpa http-demo -o wide
As request volume rises, the HPA gradually increases the replica count; once traffic drops off, it scales back down towards the minimum replica count.
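The scaling decisions and the observed metric values can also be inspected through the HPA's events:
kubectl describe hpa http-demo
# The Events section records each scale-up and scale-down together with the metric value that triggered it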
Advanced Configuration: Multi-Dimensional Metrics
Aggregating by Label
Modify metricsQuery to aggregate by an additional label, for example request rate broken down by HTTP method:
metricsQuery: |
sum by (<<.GroupBy>>, method) (
rate(<<.Series>>{<<.LabelMatchers>>}[2m])
)
The metric name itself does not change, but the extra label becomes a selectable dimension, which an HPA can filter on through a metric selector:
metrics:
- type: Pods
pods:
metric:
name: http_requests
selector:
matchLabels:
method: "GET"
target:
type: AverageValue
averageValue: 300m
Ratio Metrics
Rules can also compute ratio metrics, for example the request success rate:
rules:
- seriesQuery: '{__name__=~"^http_(requests|successes)_total$",namespace!=""}'
resources:
overrides:
namespace: {resource: "namespace"}
deployment: {group: "apps", resource: "deployment"}
name:
matches: "^http_(.*)_total$"
as: "http_$1_rate"
metricsQuery: |
sum(rate(<<.Series>>{<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>)
- seriesQuery: '{__name__=~"^http_successes_total$",namespace!=""}'
resources:
overrides:
namespace: {resource: "namespace"}
deployment: {group: "apps", resource: "deployment"}
name:
as: "http_success_rate"
metricsQuery: |
sum(rate(http_successes_total{<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>)
/
sum(rate(http_requests_total{<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>)
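Because the second rule maps the metric onto Deployment objects, an HPA would consume it as an Object metric. A minimal sketch of the metrics block (the target value is illustrative; remember that the HPA adds replicas when the observed value exceeds the target, so think carefully about whether a success-rate metric gives you the scaling direction you actually want):
metrics:
- type: Object
  object:
    metric:
      name: http_success_rate
    describedObject:
      apiVersion: apps/v1
      kind: Deployment
      name: http-demo
    target:
      type: Value
      value: 950m        # 0.95 expressed as a Kubernetes Quantity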
External Metrics
For metrics from external systems that are not tied to any Kubernetes resource (such as message queue depth), use externalRules:
externalRules:
- seriesQuery: '{__name__="rabbitmq_queue_messages",vhost!="",queue!=""}'
resources:
namespaced: false
metricsQuery: sum(rabbitmq_queue_messages{<<.LabelMatchers>>}) by (queue)
Reference the external metric in an HPA:
metrics:
- type: External
external:
metric:
name: rabbitmq_queue_messages
selector:
matchLabels:
queue: "order-processing"
target:
type: Value
value: 1000
Troubleshooting: Common Problems and Solutions
Metrics not showing up: a triage flow
When a custom metric does not show up through the API, work outwards from Prometheus: confirm the series exists in Prometheus, then that an adapter rule matches it, and finally that the aggregated API is registered and serving.
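A minimal set of checks, assuming the adapter runs in the monitoring namespace as configured above:
# 1. Is the series in Prometheus at all? Query it in the Prometheus UI:
#    http_requests_total{namespace!="",pod!=""}
# 2. Is the aggregated API registered and available?
kubectl get apiservice v1beta1.custom.metrics.k8s.io
# 3. Does the adapter list the metric after the rules were applied?
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq '.resources[].name'
# 4. Check the adapter logs for rule or query errors
kubectl logs -n monitoring deployment/prometheus-adapter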