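ingress-nginx Custom Metrics: HPA Autoscaling
Introduction
In modern cloud-native architectures, autoscaling is a key technique for keeping applications highly available while using resources efficiently. The Kubernetes Horizontal Pod Autoscaler (HPA) usually makes scaling decisions from CPU and memory utilization, but for a network entry point such as the ingress-nginx controller, scaling on request volume and latency metrics is often more accurate.
This article shows how to use the Prometheus metrics exposed by ingress-nginx as custom metrics for HPA, so that scaling follows actual traffic patterns and your cloud-native infrastructure becomes more elastic and efficient.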
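The ingress-nginx metrics
ingress-nginx exposes a rich set of Prometheus metrics on port 10254. They fall into the following groups:
Request metrics
# Request processing time histogram (seconds)
nginx_ingress_controller_request_duration_seconds
# Time spent receiving the response from the upstream server (seconds)
nginx_ingress_controller_response_duration_seconds
# Time spent receiving the first header from the upstream server (seconds)
nginx_ingress_controller_header_duration_seconds
# Time spent establishing a connection to the upstream server (seconds)
nginx_ingress_controller_connect_duration_seconds
# Response size histogram (bytes)
nginx_ingress_controller_response_size
# Request size histogram (bytes)
nginx_ingress_controller_request_size
# Total number of requests (counter)
nginx_ingress_controller_requests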
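Nginx process metrics
# Current number of client connections, by state (active, reading, writing, waiting)
nginx_ingress_controller_nginx_process_connections
# Total number of connections, by state (accepted, handled)
nginx_ingress_controller_nginx_process_connections_total
# CPU time consumed (seconds)
nginx_ingress_controller_nginx_process_cpu_seconds_total
# Resident memory in use (bytes)
nginx_ingress_controller_nginx_process_resident_memory_bytes
# Virtual memory in use (bytes)
nginx_ingress_controller_nginx_process_virtual_memory_bytes
# Total number of client requests
nginx_ingress_controller_nginx_process_requests_total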
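Controller metrics
# Hash of the currently running configuration
nginx_ingress_controller_config_hash
# Whether the last configuration reload succeeded
nginx_ingress_controller_config_last_reload_successful
# SSL certificate information
nginx_ingress_controller_ssl_certificate_info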
Environment preparation and configuration
Enabling metrics export in ingress-nginx
When installing or upgrading ingress-nginx with Helm, enable metrics export:
helm upgrade ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx \
  --set controller.metrics.enabled=true \
  --set-string controller.podAnnotations."prometheus\.io/scrape"="true" \
  --set-string controller.podAnnotations."prometheus\.io/port"="10254"
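Configuring Prometheus monitoring
Create a ServiceMonitor resource (a Prometheus Operator CRD) so that Prometheus can discover and scrape the ingress-nginx metrics automatically:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
  labels:
    # Must match the serviceMonitorSelector of your Prometheus instance
    release: prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
      app.kubernetes.io/instance: ingress-nginx
      app.kubernetes.io/component: controller
  endpoints:
  # Name of the port on the controller's metrics Service;
  # the Helm chart names it "metrics" by default
  - port: metrics
    interval: 30s
    path: /metrics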
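HPA configuration based on custom metrics
Option 1: HPA based on request rate
nginx_ingress_controller_requests is a cumulative counter, so HPA should not target it directly. Instead, scale on the per-second request rate that Prometheus Adapter derives from it, exposed under the name nginx_requests_per_second in the adapter rules later in this article:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ingress-nginx-hpa
  namespace: ingress-nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ingress-nginx-controller
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: nginx_requests_per_second
      target:
        type: AverageValue
        # Target average of 1000 requests per second per Pod
        averageValue: 1000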
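To make the effect of an AverageValue target concrete, the following Python sketch reproduces the core of the documented HPA calculation, desiredReplicas = ceil(currentReplicas × currentValue / targetValue). It is a simplified illustration that assumes every Pod is ready and reports a value; the function name, the sample numbers, and the 10% tolerance (the Kubernetes default) are assumptions, not part of the configuration above.

```python
import math
from typing import Sequence


def desired_replicas(
    current_replicas: int,
    per_pod_values: Sequence[float],
    target_average: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA calculation for a Pods metric with an AverageValue target."""
    average = sum(per_pod_values) / len(per_pod_values)
    ratio = average / target_average
    # Within the tolerance band HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # The result is always clamped to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# Two Pods serving 1500 and 1700 rps against a 1000 rps target.
print(desired_replicas(2, [1500, 1700], 1000, 2, 10))
```

With these sample values the average is 1600 rps, the ratio is 1.6, and the sketch asks for ceil(2 × 1.6) = 4 replicas.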
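Option 2: HPA based on response time
Scale on request latency so that response times stay within an acceptable range. The metric nginx_request_duration_seconds is the P95 latency that the Prometheus Adapter rules in this article derive from the nginx_ingress_controller_request_duration_seconds histogram:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ingress-nginx-hpa-latency
  namespace: ingress-nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ingress-nginx-controller
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      metric:
        name: nginx_request_duration_seconds
      describedObject:
        # The adapter must be able to associate the metric with this Service
        apiVersion: v1
        kind: Service
        name: ingress-nginx-controller
      target:
        type: Value
        # 500m = 0.5 seconds
        value: 500m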
Option 3: HPA combining multiple metrics
Combine request rate and latency for scaling decisions. When several metrics are listed, HPA computes a replica count for each one and uses the largest:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ingress-nginx-hpa-combined
  namespace: ingress-nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ingress-nginx-controller
  minReplicas: 2
  maxReplicas: 15
  metrics:
  - type: Pods
    pods:
      metric:
        name: nginx_requests_per_second
      target:
        type: AverageValue
        averageValue: 800
  - type: Object
    object:
      metric:
        name: nginx_request_duration_seconds
      describedObject:
        apiVersion: v1
        kind: Service
        name: ingress-nginx-controller
      target:
        type: Value
        # 300m = 0.3 seconds
        value: 300m
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      # Add at most 2 Pods per 60 seconds
      - type: Pods
        value: 2
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      # Remove at most 1 Pod per 60 seconds
      - type: Pods
        value: 1
        periodSeconds: 60
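As a rough illustration of how the two metrics and the behavior policies above interact, the Python sketch below takes the largest per-metric proposal and then limits the change to the Pods-type policy values (2 up, 1 down per period). It deliberately ignores stabilization windows, the tolerance band, and how HPA tracks changes within a period, so treat it as a simplified model; all function names and sample values are hypothetical.

```python
import math
from typing import Sequence


def proposal(current_replicas: int, current_value: float, target_value: float) -> int:
    """Replica count one metric asks for: ceil(current * value / target)."""
    return math.ceil(current_replicas * current_value / target_value)


def combined_desired(
    current_replicas: int,
    proposals: Sequence[int],
    min_replicas: int = 2,
    max_replicas: int = 15,
    max_added: int = 2,
    max_removed: int = 1,
) -> int:
    """Pick the largest proposal, then apply per-period limits and bounds."""
    desired = max(proposals)
    # scaleUp policy: add at most `max_added` Pods in one period.
    desired = min(desired, current_replicas + max_added)
    # scaleDown policy: remove at most `max_removed` Pods in one period.
    desired = max(desired, current_replicas - max_removed)
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas, 1200 rps per Pod (target 800) and 0.6 s P95 latency (target 0.3 s).
by_rate = proposal(4, 1200, 800)
by_latency = proposal(4, 0.6, 0.3)
print(by_rate, by_latency, combined_desired(4, [by_rate, by_latency]))
```

In this sample the latency metric asks for more replicas than the request rate does, but the scale-up policy allows only two additional Pods in the current period.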
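Prometheus Adapter configuration
HPA reads custom metrics through the custom metrics API, so Prometheus Adapter must be deployed and given rules that turn the ingress-nginx series into named metrics. The rules below belong in the rules section of the adapter's configuration file (config.yaml). The label-to-resource mappings are assumptions about how the series are labeled in your Prometheus: controller_namespace and controller_pod are emitted by ingress-nginx itself, while the second rule assumes that the namespace and service labels identify the scraped controller Service. Adjust them to match your scrape configuration.
rules:
# Requests per second per controller Pod (used by the Pods-type HPA metric)
- seriesQuery: 'nginx_ingress_controller_requests{controller_namespace!="",controller_pod!=""}'
  resources:
    overrides:
      controller_namespace: {resource: "namespace"}
      controller_pod: {resource: "pod"}
  name:
    matches: "^nginx_ingress_controller_requests$"
    as: "nginx_requests_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
# P95 request latency (used by the Object-type HPA metric on the Service)
- seriesQuery: 'nginx_ingress_controller_request_duration_seconds_bucket{namespace!="",service!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      service: {resource: "service"}
  name:
    matches: "^nginx_ingress_controller_request_duration_seconds_bucket$"
    as: "nginx_request_duration_seconds"
  metricsQuery: 'histogram_quantile(0.95, sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (le, <<.GroupBy>>))'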
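The histogram_quantile(0.95, ...) call in the latency rule above estimates a percentile from cumulative bucket counts. The Python sketch below shows the linear interpolation idea behind that estimate. It is a simplified model for illustration, not Prometheus's implementation, and the function name and bucket counts are made up.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate quantile q from (upper_bound, cumulative_count) pairs.

    Buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank and count > prev_count:
            if math.isinf(bound):
                # The quantile falls into the +Inf bucket: report the
                # highest finite bound, since no upper limit is known.
                return buckets[-2][0]
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return buckets[-2][0]


# Hypothetical request_duration_seconds buckets for 100 requests.
sample = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (math.inf, 100)]
print(histogram_quantile(0.95, sample))
```

Because the result is interpolated inside a bucket, its precision depends on the bucket boundaries; a P95 that lands between 0.25 s and 0.5 s can only be resolved as finely as that bucket allows.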
Advanced Helm configuration
The HPA can also be defined directly in the chart's values.yaml. Custom metrics go under controller.autoscalingTemplate, which sits next to controller.autoscaling rather than inside it:
controller:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilizationPercentage: 50
  autoscalingTemplate:
  - type: Pods
    pods:
      metric:
        name: nginx_requests_per_second
      target:
        type: AverageValue
        averageValue: 1000
  - type: Object
    object:
      metric:
        name: nginx_request_duration_seconds
      describedObject:
        apiVersion: v1
        kind: Service
        name: ingress-nginx-controller
      target:
        type: Value
        value: 500m
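Monitoring and alerting
Grafana dashboard
Create a dedicated HPA dashboard to track scaling state in real time. Because the HPA target is an average per Pod, the second panel plots the per-Pod request rate rather than the cluster-wide total:
{
  "panels": [
    {
      "title": "HPA Replica Count",
      "targets": [{
        "expr": "kube_horizontalpodautoscaler_status_current_replicas{horizontalpodautoscaler='ingress-nginx-hpa'}",
        "legendFormat": "Current Replicas"
      }]
    },
    {
      "title": "Request Rate per Pod vs HPA Target",
      "targets": [
        {
          "expr": "avg(sum by (controller_pod) (rate(nginx_ingress_controller_requests[2m])))",
          "legendFormat": "Average Request Rate per Pod"
        },
        {
          "expr": "1000",
          "legendFormat": "HPA Target (1000 rps per Pod)"
        }
      ]
    }
  ]
}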
Prometheus alerting rules
Set up alerts to catch scaling problems early:
groups:
- name: ingress-nginx-hpa-alerts
  rules:
  - alert: IngressNginxHPAScalingLimited
    expr: kube_horizontalpodautoscaler_status_condition{condition="ScalingLimited", status="true"} == 1
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "HPA scaling is limited for {{ $labels.horizontalpodautoscaler }}"
      description: "HPA {{ $labels.horizontalpodautoscaler }} wants a replica count outside its minReplicas/maxReplicas range"
  - alert: IngressNginxHighRequestLatency
    expr: histogram_quantile(0.95, sum by (le) (rate(nginx_ingress_controller_request_duration_seconds_bucket[5m]))) > 1
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "High request latency in ingress-nginx"
      description: "95th percentile request latency exceeds 1 second"
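Best practices and optimization tips
1. Capacity planning and threshold settings
2. Multi-dimensional monitoring strategy
| Dimension | Key metric | Alert threshold | Response |
|---|---|---|---|
| Request volume | nginx_ingress_controller_requests | > 80% of capacity | Scale out ahead of time |
| Response time | nginx_ingress_controller_request_duration_seconds | P95 > 500 ms | Scale out immediately |
| Error rate | Share of 5xx responses | > 5% | Check the backend services |
| Connections | nginx_ingress_controller_nginx_process_connections | > 80% of the limit | Adjust the configuration |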
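The error-rate row above depends on turning cumulative request counters into a ratio over a time window. The Python sketch below shows one way to compute the share of 5xx responses from two snapshots of per-status counters. It is an illustrative simplification of what a PromQL ratio of rate() expressions does; the function name and the sample counts are hypothetical.

```python
from typing import Dict


def error_ratio(start: Dict[str, float], end: Dict[str, float]) -> float:
    """Share of 5xx responses between two snapshots.

    Each snapshot maps an HTTP status code to the cumulative request count.
    """
    total = 0.0
    errors = 0.0
    for status, end_count in end.items():
        delta = end_count - start.get(status, 0.0)
        if delta < 0:
            # The counter went backwards (for example after a controller
            # restart), so count from zero instead.
            delta = end_count
        total += delta
        if status.startswith("5"):
            errors += delta
    return errors / total if total else 0.0


before = {"200": 1000, "404": 5, "500": 10}
after = {"200": 1900, "404": 25, "500": 70, "502": 20}
ratio = error_ratio(before, after)
print(f"5xx share: {ratio:.1%}, above 5% threshold: {ratio > 0.05}")
```

In the sample window 80 of 1000 new requests returned a 5xx status, which is above the 5% threshold from the table.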
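3. Performance tuning
# NGINX configuration tuning (ingress-nginx ConfigMap options)
controller:
  config:
    # Number of worker processes ("auto" uses the number of available CPU cores)
    worker-processes: "auto"
    # Maximum number of simultaneous connections per worker process
    max-worker-connections: "10240"
    # Keep-alive timeout for client connections (seconds)
    keep-alive: "75"
    # Size and number of the buffers used to read upstream responses
    proxy-buffer-size: "16k"
    proxy-buffers-number: "4"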
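Troubleshooting and diagnostics
Common problems and solutions
- HPA has no effect
  - Check the Prometheus Adapter logs
  - Verify that the custom metrics API is serving the metrics:
    kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1
  - Check the HPA events:
    kubectl describe hpa ingress-nginx-hpa -n ingress-nginx
- Metric data is missing
  - Verify the Prometheus scrape configuration
  - Check the ingress-nginx metrics port from inside the cluster:
    curl http://ingress-nginx-controller-metrics.ingress-nginx.svc:10254/metrics
  - Check that the Pod annotations are configured correctly
- Scaling happens too frequently
  - Adjust stabilizationWindowSeconds
  - Widen the time window of the metric queries
  - Set sensible scaling policies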
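To show why a longer stabilizationWindowSeconds damps frequent scaling, the Python sketch below models the scale-down side: instead of acting on the newest recommendation, it applies the highest recommendation seen within the window. This is a simplified model that leaves out scale-up handling and rate policies; the function name and the sample series are hypothetical.

```python
from typing import List, Tuple


def stabilize_scale_down(
    recommendations: List[Tuple[int, int]], window_seconds: int
) -> List[int]:
    """Apply a scale-down stabilization window.

    `recommendations` holds (timestamp_seconds, desired_replicas) pairs
    sorted by time. The result is the replica count applied at each
    timestamp: the maximum recommendation within the trailing window.
    """
    applied = []
    for i, (now, _) in enumerate(recommendations):
        in_window = [
            replicas
            for timestamp, replicas in recommendations[: i + 1]
            if now - timestamp <= window_seconds
        ]
        applied.append(max(in_window))
    return applied


# One recommendation per minute, flapping between 6 and 3 replicas at first.
series = [(0, 6), (60, 3), (120, 6), (180, 3), (240, 3),
          (300, 3), (360, 3), (420, 3), (480, 3)]
print(stabilize_scale_down(series, 0))
print(stabilize_scale_down(series, 300))
```

Without a window the replica count follows every dip in the recommendations; with a 300-second window it stays at 6 until the last recommendation of 6 has aged out.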
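Summary
Driving HPA with ingress-nginx custom metrics brings the following benefits to your Kubernetes cluster:
- Accurate scaling: decisions follow actual traffic patterns rather than resource utilization alone
- Cost optimization: avoids over-provisioning and allocates compute capacity on demand
- Performance assurance: keeps service response times within an acceptable range
- High availability: responds automatically to traffic peaks and sudden load
The configurations and best practices in this article can help you build a more elastic ingress layer and a stable foundation for your workloads.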