
ingress-nginx Custom Metrics: HPA Autoscaling

Project: ingress-nginx, the Ingress-NGINX Controller for Kubernetes. Project address: https://gitcode.com/GitHub_Trending/in/ingress-nginx

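Introduction

In modern cloud-native architectures, autoscaling is a key technique for keeping applications highly available while using resources efficiently. The Kubernetes Horizontal Pod Autoscaler (HPA) typically bases its scaling decisions on CPU and memory utilization, but for a network entry point such as ingress-nginx, scaling on request volume and latency is often more accurate.

This article explains how to use the Prometheus metrics exposed by ingress-nginx as custom metrics, so that the HPA scales on actual traffic patterns.

ingress-nginx Metrics Overview

ingress-nginx exposes Prometheus metrics on port 10254. The main ones fall into the following categories: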

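Request Metrics

# Histogram of total request processing time (seconds)
nginx_ingress_controller_request_duration_seconds

# Histogram of time spent receiving the response from the upstream server (seconds)
nginx_ingress_controller_response_duration_seconds

# Histogram of time spent receiving the first header from the upstream server (seconds)
nginx_ingress_controller_header_duration_seconds

# Histogram of time spent establishing a connection to the upstream server (seconds)
nginx_ingress_controller_connect_duration_seconds

# Histogram of response size (bytes)
nginx_ingress_controller_response_size

# Histogram of request size (bytes)
nginx_ingress_controller_request_size

# Counter of total client requests
nginx_ingress_controller_requests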

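Nginx Process Metrics

# Current number of client connections (gauge, broken down by state)
nginx_ingress_controller_nginx_process_connections

# Total number of connections (counter)
nginx_ingress_controller_nginx_process_connections_total

# Total CPU time consumed (seconds)
nginx_ingress_controller_nginx_process_cpu_seconds_total

# Resident memory in use (bytes)
nginx_ingress_controller_nginx_process_resident_memory_bytes

# Virtual memory in use (bytes)
nginx_ingress_controller_nginx_process_virtual_memory_bytes

# Total number of client requests
nginx_ingress_controller_nginx_process_requests_total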

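Controller Metrics

# Hash of the currently running configuration
nginx_ingress_controller_config_hash

# Whether the last configuration reload attempt succeeded
nginx_ingress_controller_config_last_reload_successful

# SSL certificate information
nginx_ingress_controller_ssl_certificate_info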
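
The request series above are cumulative counters and histograms, so they are usually converted into rates and quantiles before being used for scaling or alerting. A minimal sketch of Prometheus recording rules that does this is shown below. The group name and the recorded series names are hypothetical; controller_namespace and controller_pod are labels that ingress-nginx attaches to its request metrics.

```yaml
groups:
- name: ingress-nginx-derived   # hypothetical group name
  rules:
  # Requests per second handled by each controller Pod
  - record: ingress_nginx:requests:rate2m   # hypothetical series name
    expr: sum by (controller_namespace, controller_pod) (rate(nginx_ingress_controller_requests[2m]))
  # 95th percentile request latency in seconds, across all controller Pods
  - record: ingress_nginx:request_duration_seconds:p95_5m   # hypothetical series name
    expr: histogram_quantile(0.95, sum by (le) (rate(nginx_ingress_controller_request_duration_seconds_bucket[5m])))
```

Recording the expressions once keeps dashboards, alerts, and adapter queries consistent with each other, at the cost of a few extra series in Prometheus.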

Environment Preparation and Configuration

Enabling ingress-nginx Metrics Export

When installing or upgrading ingress-nginx with Helm, enable metrics export:

helm upgrade ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx \
  --set controller.metrics.enabled=true \
  --set-string controller.podAnnotations."prometheus\.io/scrape"="true" \
  --set-string controller.podAnnotations."prometheus\.io/port"="10254"

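Configuring Prometheus Monitoring

Create a ServiceMonitor resource so that a Prometheus instance managed by the Prometheus Operator can discover and scrape the ingress-nginx metrics automatically: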

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
      app.kubernetes.io/instance: ingress-nginx
      app.kubernetes.io/component: controller
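  endpoints:
  # Name of the metrics port on the Service. "metrics" is the Helm chart default
  # (controller.metrics.portName); manifest-based installs may name it "prometheus".
  - port: metrics
    interval: 30s
    path: /metrics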
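
If ingress-nginx is installed with Helm, the chart can also create the ServiceMonitor itself. The values below are a sketch of that alternative. The release: prometheus label is an assumption carried over from the manifest above and must match whatever label selector your Prometheus instance uses.

```yaml
controller:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      # Must match the serviceMonitorSelector of your Prometheus (assumption)
      additionalLabels:
        release: prometheus
      scrapeInterval: 30s
```

Letting the chart manage the ServiceMonitor keeps its selector and port name in step with the metrics Service the chart creates.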

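Custom-Metric HPA Configuration

Option 1: Scaling on Request Rate

This option scales on requests per second. nginx_ingress_controller_requests is a cumulative counter, so the HPA cannot target it directly. It references nginx_requests_per_second instead, the per-second rate that the Prometheus Adapter rule later in this article derives from the counter:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ingress-nginx-hpa
  namespace: ingress-nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ingress-nginx-controller
  minReplicas: 2
  maxReplicas: 10
  metrics:
  # Average requests per second per controller Pod, served by the Prometheus Adapter
  - type: Pods
    pods:
      metric:
        name: nginx_requests_per_second
      target:
        type: AverageValue
        averageValue: 1000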
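
To see what an average-value target of 1000 means in practice: for a Pods metric, the HPA compares the metric averaged across Pods with the target and scales the replica count proportionally. The input numbers below (4 replicas, 1500 requests per second per Pod) are made up for illustration.

```latex
\text{desiredReplicas}
  = \left\lceil \text{currentReplicas} \times \frac{\text{currentAverage}}{\text{targetAverage}} \right\rceil
  = \left\lceil 4 \times \frac{1500}{1000} \right\rceil
  = 6
```

The result is then clamped to the range between minReplicas and maxReplicas.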

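Option 2: Scaling on Response Time

This option keeps latency within an acceptable range. nginx_ingress_controller_request_duration_seconds is a histogram, so the HPA references nginx_request_duration_seconds, the P95 latency that the Prometheus Adapter derives from the histogram buckets. The metric is attached to the controller Pods, so it is consumed as a Pods metric; an Object metric on the Service would need a label that identifies that Service. The target 500m means 0.5 seconds:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ingress-nginx-hpa-latency
  namespace: ingress-nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ingress-nginx-controller
  minReplicas: 2
  maxReplicas: 10
  metrics:
  # P95 request latency per controller Pod, served by the Prometheus Adapter
  - type: Pods
    pods:
      metric:
        name: nginx_request_duration_seconds
      target:
        type: AverageValue
        averageValue: 500m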

Option 3: Combining Multiple Metrics

This option combines request rate and latency. The HPA computes a desired replica count for each metric and uses the largest one:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ingress-nginx-hpa-combined
  namespace: ingress-nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ingress-nginx-controller
  minReplicas: 2
  maxReplicas: 15
  metrics:
  # Average requests per second per controller Pod
  - type: Pods
    pods:
      metric:
        name: nginx_requests_per_second
      target:
        type: AverageValue
        averageValue: 800
  # P95 request latency per controller Pod (300m = 0.3 seconds)
  - type: Pods
    pods:
      metric:
        name: nginx_request_duration_seconds
      target:
        type: AverageValue
        averageValue: 300m
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60
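
The scaleUp policy above adds at most 2 Pods per 60-second period, which bounds how quickly the Deployment can react to a spike. As a rough worked example, going from minReplicas (2) to maxReplicas (15) takes at least:

```latex
\text{periods} = \left\lceil \frac{15 - 2}{2} \right\rceil = 7
\qquad
\text{minimum time} \approx 7 \times 60\,\text{s} = 420\,\text{s}
```

If seven minutes is too slow for your traffic profile, raise the policy value or shorten periodSeconds; the scaleDown policy (1 Pod per 60 seconds after a 300-second stabilization window) is deliberately more conservative.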

Prometheus Adapter Configuration

For the HPA to see these custom metrics, deploy the Prometheus Adapter and give it rules that turn the raw series into per-Pod metrics. The rules below belong in the adapter's configuration file (config.yaml, usually mounted from a ConfigMap):

rules:
# Requests per second for each controller Pod
- seriesQuery: 'nginx_ingress_controller_requests{controller_namespace!="",controller_pod!=""}'
  resources:
    overrides:
      # Labels set by ingress-nginx that identify the controller Pod
      controller_namespace: {resource: "namespace"}
      controller_pod: {resource: "pod"}
  name:
    matches: "^nginx_ingress_controller_requests$"
    as: "nginx_requests_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'

# P95 request latency (seconds) for each controller Pod
- seriesQuery: 'nginx_ingress_controller_request_duration_seconds_bucket{controller_namespace!="",controller_pod!=""}'
  resources:
    overrides:
      controller_namespace: {resource: "namespace"}
      controller_pod: {resource: "pod"}
  name:
    matches: "^nginx_ingress_controller_request_duration_seconds_bucket$"
    as: "nginx_request_duration_seconds"
  metricsQuery: 'histogram_quantile(0.95, sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (le, <<.GroupBy>>))'
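
One common way to deploy the adapter is the prometheus-community/prometheus-adapter Helm chart, which accepts adapter rules under rules.custom. The values below are a sketch under that assumption: the Prometheus URL is a placeholder for your own service address, and only a request-rate rule is shown. The rule uses the adapter's <<.Series>>, <<.LabelMatchers>>, and <<.GroupBy>> query templates, and the controller_namespace and controller_pod labels set by ingress-nginx, to attach the metric to Pods.

```yaml
prometheus:
  url: http://prometheus-operated.monitoring.svc   # placeholder: your Prometheus service
  port: 9090
rules:
  default: false   # skip the chart's built-in rules
  custom:
  - seriesQuery: 'nginx_ingress_controller_requests{controller_namespace!="",controller_pod!=""}'
    resources:
      overrides:
        controller_namespace: {resource: "namespace"}
        controller_pod: {resource: "pod"}
    name:
      matches: "^nginx_ingress_controller_requests$"
      as: "nginx_requests_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

After the adapter restarts, the metric name given in "as" is the one an HPA has to reference.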

Advanced Helm Configuration

The HPA can also be configured directly in values.yaml. The entries under autoscalingTemplate are added to the metrics list of the HPA that the chart generates:

controller:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilizationPercentage: 50
  autoscalingTemplate:
  # Average requests per second per controller Pod
  - type: Pods
    pods:
      metric:
        name: nginx_requests_per_second
      target:
        type: AverageValue
        averageValue: 1000
  # P95 request latency per controller Pod (500m = 0.5 seconds)
  - type: Pods
    pods:
      metric:
        name: nginx_request_duration_seconds
      target:
        type: AverageValue
        averageValue: 500m

Monitoring and Alerting

Grafana Dashboard

Create a dedicated HPA dashboard to track scaling status in real time:

{
  "panels": [
    {
      "title": "HPA Replica Count",
      "targets": [{
        "expr": "kube_horizontalpodautoscaler_status_current_replicas{horizontalpodautoscaler='ingress-nginx-hpa'}",
        "legendFormat": "Current Replicas"
      }]
    },
    {
      "title": "Per-Pod Request Rate vs HPA Target",
      "targets": [
        {
          "expr": "avg(sum by (controller_pod) (rate(nginx_ingress_controller_requests[2m])))",
          "legendFormat": "Average Request Rate per Pod"
        },
        {
          "expr": "1000",
          "legendFormat": "HPA Target (1000 rps per Pod)"
        }
      ]
    }
  ]
}

Prometheus Alert Rules

Set up alerts to detect scaling anomalies promptly:

groups:
- name: ingress-nginx-hpa-alerts
  rules:
  # Fires when the desired replica count is capped by minReplicas or maxReplicas
  - alert: HPAScalingLimited
    expr: kube_horizontalpodautoscaler_status_condition{condition="ScalingLimited", status="true"} == 1
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "HPA scaling is limited for {{ $labels.horizontalpodautoscaler }}"
      description: "HPA {{ $labels.horizontalpodautoscaler }} wants a replica count outside its minReplicas/maxReplicas bounds"

  # Buckets are summed by le so the quantile covers all controller Pods
  - alert: HighRequestLatency
    expr: histogram_quantile(0.95, sum by (le) (rate(nginx_ingress_controller_request_duration_seconds_bucket[5m]))) > 1
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "High request latency in ingress-nginx"
      description: "95th percentile request latency exceeds 1 second"

Best Practices and Optimization

1. Capacity Planning and Threshold Setting
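
One way to derive an HPA target is to load-test a single controller Pod and leave headroom below its measured limit. As a purely illustrative example, assume one Pod sustains 1250 requests per second before latency degrades, the target utilization is 80%, and the expected peak is 8000 requests per second:

```latex
\text{target per Pod} = 0.8 \times 1250 = 1000 \ \text{requests/s}
\qquad
\text{replicas at peak} = \left\lceil \frac{8000}{1000} \right\rceil = 8
```

With these assumed numbers, a maxReplicas of 10 leaves room for two additional Pods beyond the expected peak.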


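2. Multi-Dimensional Monitoring Strategy

Dimension	Key metric	Alert threshold	Response
Request volume	nginx_ingress_controller_requests	> 80% of capacity	Scale out ahead of time
Response time	nginx_ingress_controller_request_duration_seconds	P95 > 500 ms	Scale out immediately
Error rate	Share of 5xx responses	> 5%	Check backend services
Connections	nginx_ingress_controller_nginx_process_connections	> 80% of the limit	Adjust configuration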
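
The error-rate row has no ready-made metric; it has to be computed from the status label of nginx_ingress_controller_requests. A sketch of a matching Prometheus alert rule follows. The group name, the alert name, and the 5m durations are illustrative choices, while the 5% threshold comes from the table.

```yaml
groups:
- name: ingress-nginx-error-rate   # hypothetical group name
  rules:
  - alert: IngressNginxHigh5xxRatio   # hypothetical alert name
    # Share of responses with a 5xx status code over the last 5 minutes
    expr: |
      sum(rate(nginx_ingress_controller_requests{status=~"5.."}[5m]))
      /
      sum(rate(nginx_ingress_controller_requests[5m]))
      > 0.05
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "More than 5% of ingress-nginx responses are 5xx"
```

A high 5xx share usually points at the backends rather than at the controller, which is why the table pairs it with checking backend services instead of scaling out.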

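3. Performance Tuning

# nginx tuning through the controller ConfigMap
controller:
  config:
    # Number of worker processes ("auto" = one per CPU core)
    worker-processes: "auto"
    # Maximum simultaneous connections per worker process
    max-worker-connections: "10240"
    # Keep-alive timeout for client connections (seconds)
    keep-alive: "75"
    # Size and number of buffers used for reading upstream responses
    proxy-buffer-size: "16k"
    proxy-buffers-number: "4"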

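Troubleshooting and Diagnostics

Common Problems and Solutions

HPA has no effect
- Check the Prometheus Adapter logs
- Confirm the metrics are registered in the custom metrics API: kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1
- Check the HPA events: kubectl describe hpa ingress-nginx-hpa -n ingress-nginx

Metric data is missing
- Verify the Prometheus scrape configuration
- Check the metrics endpoint: run kubectl -n ingress-nginx port-forward deploy/ingress-nginx-controller 10254:10254, then curl http://localhost:10254/metrics
- Check that the Pod annotations are set correctly

Scaling happens too often
- Increase stabilizationWindowSeconds
- Widen the time window of the metrics query
- Set sensible scaling policies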

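Summary

Driving the HPA with ingress-nginx custom metrics brings the following benefits to a Kubernetes cluster:

Accurate scaling: decisions follow actual traffic patterns rather than resource utilization alone
Cost optimization: capacity is allocated on demand instead of being over-provisioned
Performance assurance: response times stay within an acceptable range
High availability: traffic peaks and sudden bursts are handled automatically

The configurations and best practices in this article provide a starting point for building a more elastic ingress layer.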


Disclosure: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.