# ingress-nginx Automated Operations: A Hands-On Guide with Ansible and Terraform

## Introduction: Why Automate?

In today's cloud-native landscape, the Ingress controller plays a critical role in a Kubernetes cluster. ingress-nginx, one of the most popular Ingress controllers, handles cluster ingress traffic, SSL termination, load balancing, and other key functions. Deploying and managing ingress-nginx instances by hand across multiple environments, however, runs into the following challenges:

- Environment consistency is hard to guarantee: configuration drift between development, staging, and production leads to subtle problems
- Deployments are slow: repetitive manual steps consume significant time and effort
- Configuration management is complex: version-controlling configuration across many environments and clusters is difficult
- Scalability is limited: manual operations cannot keep pace with large-scale cluster rollouts

This article walks through using Ansible and Terraform to automate the deployment and management of ingress-nginx, helping you build an efficient, reliable Ingress controller operations workflow.
## Architecture Overview

## Infrastructure Automation with Terraform

### 1. Preparing the Base Environment

First, use Terraform to create the Kubernetes cluster and the supporting network resources:
```hcl
# main.tf - base infrastructure configuration
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

# Create the VPC and network resources
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 3.0"

  name = "ingress-nginx-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-west-2a", "us-west-2b", "us-west-2c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway = true
  single_nat_gateway = true
}
```
### 2. Creating the EKS Cluster
```hcl
# eks.tf - EKS cluster configuration
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0" # v19 exposes the cluster_name output referenced in helm.tf

  cluster_name    = "ingress-nginx-cluster"
  cluster_version = "1.27"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  eks_managed_node_groups = {
    ingress_nodes = {
      min_size     = 2
      max_size     = 5
      desired_size = 2

      instance_types = ["t3.medium"]
      capacity_type  = "SPOT"

      labels = {
        role = "ingress"
      }

      taints = {
        dedicated = {
          key    = "role"
          value  = "ingress"
          effect = "NO_SCHEDULE"
        }
      }
    }
  }
}
```
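Note that the node group above taints its nodes with `role=ingress:NO_SCHEDULE`, so the controller pods must carry a matching toleration (and ideally a node selector) or they will never be scheduled onto those dedicated nodes. A minimal Helm values fragment for this, assuming the label and taint keys from the Terraform above:

```yaml
controller:
  # Pin the controller to the dedicated ingress nodes
  nodeSelector:
    role: ingress
  # Tolerate the taint applied by the node group
  tolerations:
    - key: "role"
      operator: "Equal"
      value: "ingress"
      effect: "NoSchedule"
```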
### 3. Configuring the Helm Provider
```hcl
# helm.tf - Helm provider configuration
provider "helm" {
  kubernetes {
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
    token                  = data.aws_eks_cluster_auth.this.token
  }
}

data "aws_eks_cluster_auth" "this" {
  name = module.eks.cluster_name
}
```
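To hand the cluster details over to Ansible or other downstream tooling, it helps to export them as Terraform outputs. A sketch (the output names are illustrative, not fixed by the modules):

```hcl
# outputs.tf - values consumed by downstream tooling
output "cluster_name" {
  value = module.eks.cluster_name
}

output "cluster_endpoint" {
  value = module.eks.cluster_endpoint
}

output "vpc_id" {
  value = module.vpc.vpc_id
}
```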
## Configuration Management with Ansible

### 1. Directory Layout
```
ingress-nginx-automation/
├── ansible/
│   ├── inventory/
│   │   ├── production
│   │   ├── staging
│   │   └── development
│   ├── group_vars/
│   │   ├── all.yml
│   │   ├── production.yml
│   │   └── staging.yml
│   ├── roles/
│   │   └── ingress-nginx/
│   │       ├── tasks/
│   │       ├── templates/
│   │       ├── vars/
│   │       └── handlers/
│   └── playbooks/
│       ├── deploy-ingress-nginx.yml
│       ├── upgrade-ingress-nginx.yml
│       └── validate-ingress-nginx.yml
└── terraform/
    ├── main.tf
    ├── variables.tf
    ├── outputs.tf
    └── modules/
```
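A matching inventory file for one of the environments might look like the following (hostnames and addresses are placeholders):

```ini
# inventory/production
[kubernetes_masters]
prod-master-1 ansible_host=10.0.1.10
prod-master-2 ansible_host=10.0.2.10

[kubernetes_masters:vars]
ansible_user=ubuntu
ansible_python_interpreter=/usr/bin/python3
```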
### 2. Ansible Playbook Implementation
```yaml
# deploy-ingress-nginx.yml
---
- name: Deploy ingress-nginx controller
  hosts: kubernetes_masters
  become: yes
  vars_files:
    # "environment" is a reserved keyword in Ansible, so use a custom variable name
    - "../group_vars/{{ deploy_environment }}.yml"

  tasks:
    - name: Check Kubernetes cluster status
      command: kubectl cluster-info
      register: cluster_info
      changed_when: false

    - name: Create ingress-nginx namespace
      kubernetes.core.k8s:
        api_version: v1
        kind: Namespace
        name: "{{ ingress_nginx_namespace }}"
        state: present

    - name: Add ingress-nginx Helm repository
      kubernetes.core.helm_repository:
        name: ingress-nginx
        repo_url: "https://kubernetes.github.io/ingress-nginx"

    - name: Deploy ingress-nginx using Helm
      kubernetes.core.helm:
        name: ingress-nginx
        chart_ref: ingress-nginx/ingress-nginx
        release_namespace: "{{ ingress_nginx_namespace }}"
        create_namespace: yes
        values: "{{ ingress_nginx_values }}"
        wait: yes
        timeout: "5m0s"

    - name: Validate ingress-nginx deployment
      command: >
        kubectl -n {{ ingress_nginx_namespace }} wait
        --for=condition=ready pod
        --selector=app.kubernetes.io/component=controller
        --timeout=300s
      register: validation_result
      changed_when: false
      failed_when: validation_result.rc != 0

    - name: Configure monitoring and logging
      include_role:
        name: ingress-nginx
        tasks_from: monitoring.yml
      when: enable_monitoring

  handlers:
    - name: restart ingress-nginx
      # Re-applying an unchanged chart does not restart pods; trigger a rollout instead
      command: >-
        kubectl -n {{ ingress_nginx_namespace }} rollout restart
        deployment/ingress-nginx-controller
```
### 3. Managing Configuration Variables
```yaml
# group_vars/production.yml
---
# ingress-nginx settings
ingress_nginx_namespace: ingress-nginx
# Chart 4.13.2 ships controller v1.13.2
ingress_nginx_version: "v1.13.2"
ingress_nginx_chart_version: "4.13.2"

# Helm values
ingress_nginx_values:
  controller:
    replicaCount: 3
    minAvailable: 2
    metrics:
      enabled: true
      serviceMonitor:
        enabled: true
    service:
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
        service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
        service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
      externalTrafficPolicy: "Local"
    config:
      use-forwarded-headers: "true"
      compute-full-forwarded-for: "true"
      use-proxy-protocol: "false"
    resources:
      requests:
        cpu: "200m"
        memory: "256Mi"
      limits:
        cpu: "1000m"
        memory: "512Mi"

# Monitoring settings
enable_monitoring: true
prometheus_namespace: monitoring
grafana_namespace: monitoring

# Networking settings
load_balancer_timeout: 3600
enable_ssl_termination: true
```
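For contrast, the development environment can stay much leaner; a sketch reusing the same variable names:

```yaml
# group_vars/development.yml
---
ingress_nginx_namespace: ingress-nginx
ingress_nginx_chart_version: "4.13.2"

ingress_nginx_values:
  controller:
    replicaCount: 1
    service:
      annotations:
        # internal load balancer for the dev environment
        service.beta.kubernetes.io/aws-load-balancer-scheme: "internal"
    resources:
      requests:
        cpu: "100m"
        memory: "128Mi"

enable_monitoring: false
```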
## Multi-Environment Deployment Strategy

### Environment Configuration Matrix
| Setting | Development | Staging | Production |
|---|---|---|---|
| Replicas | 1 | 2 | 3 |
| Resource limits | CPU: 100m, Memory: 128Mi | CPU: 200m, Memory: 256Mi | CPU: 500m, Memory: 512Mi |
| HPA | Disabled | Enabled (2-5) | Enabled (3-10) |
| Monitoring | Basic | Full | Full + alerting |
| SSL certificates | Self-signed | Let's Encrypt staging | Let's Encrypt production |
| Load balancer | Internal | External (test) | External (production) |
### Environment-Specific Terraform Configuration
```hcl
# variables.tf - environment variable definitions
variable "environment" {
  description = "Deployment environment (dev, staging, prod)"
  type        = string
  default     = "dev"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be one of: dev, staging, prod."
  }
}

variable "ingress_nginx_config" {
  description = "ingress-nginx configuration based on environment"
  type = map(object({
    replica_count = number
    resource_limits = object({
      cpu    = string
      memory = string
    })
    autoscaling = object({
      enabled = bool
      min     = number
      max     = number
    })
  }))

  default = {
    dev = {
      replica_count = 1
      resource_limits = {
        cpu    = "100m"
        memory = "128Mi"
      }
      autoscaling = {
        enabled = false
        min     = 1
        max     = 1
      }
    }
    staging = {
      replica_count = 2
      resource_limits = {
        cpu    = "200m"
        memory = "256Mi"
      }
      autoscaling = {
        enabled = true
        min     = 2
        max     = 5
      }
    }
    prod = {
      replica_count = 3
      resource_limits = {
        cpu    = "500m"
        memory = "512Mi"
      }
      autoscaling = {
        enabled = true
        min     = 3
        max     = 10
      }
    }
  }
}
```
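This map can then drive a single helm_release resource keyed by `var.environment`; a sketch (the resource and local names are illustrative):

```hcl
locals {
  # Select the configuration block for the active environment
  env_config = var.ingress_nginx_config[var.environment]
}

resource "helm_release" "ingress_nginx" {
  name             = "ingress-nginx"
  repository       = "https://kubernetes.github.io/ingress-nginx"
  chart            = "ingress-nginx"
  namespace        = "ingress-nginx"
  create_namespace = true

  set {
    name  = "controller.replicaCount"
    value = local.env_config.replica_count
  }

  set {
    name  = "controller.autoscaling.enabled"
    value = local.env_config.autoscaling.enabled
  }

  set {
    name  = "controller.autoscaling.minReplicas"
    value = local.env_config.autoscaling.min
  }

  set {
    name  = "controller.autoscaling.maxReplicas"
    value = local.env_config.autoscaling.max
  }
}
```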
## Advanced Configuration and Tuning

### 1. Custom NGINX Configuration
```yaml
# templates/nginx-custom-config.yml.j2
controller:
  config:
    # Connection timeouts (ConfigMap values are in seconds, without a unit suffix)
    proxy-connect-timeout: "30"
    proxy-send-timeout: "30"
    proxy-read-timeout: "30"
    # Buffer settings
    proxy-buffer-size: "16k"
    proxy-buffers-number: "4"
    # Upstream keepalive settings
    upstream-keepalive-connections: "200"
    upstream-keepalive-timeout: "60"
    upstream-keepalive-requests: "1000"
    # Access log format
    log-format-upstream: '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_time $upstream_response_time'
    # Security headers
    server-tokens: "false"
    hide-headers: "Server,X-Powered-By,X-AspNet-Version"
```
### 2. Automated Certificate Management
```hcl
# cert-manager.tf - automated SSL certificate management
resource "helm_release" "cert_manager" {
  name             = "cert-manager"
  repository       = "https://charts.jetstack.io"
  chart            = "cert-manager"
  version          = "v1.11.0"
  namespace        = "cert-manager"
  create_namespace = true

  set {
    name  = "installCRDs"
    value = "true"
  }

  set {
    name  = "prometheus.enabled"
    value = "false"
  }
}

resource "kubernetes_manifest" "cluster_issuer" {
  manifest = {
    apiVersion = "cert-manager.io/v1"
    kind       = "ClusterIssuer"
    metadata = {
      name = "letsencrypt-${var.environment}"
    }
    spec = {
      acme = {
        server = var.environment == "prod" ? "https://acme-v02.api.letsencrypt.org/directory" : "https://acme-staging-v02.api.letsencrypt.org/directory"
        email  = var.acme_email
        # cert-manager's API expects camelCase field names
        privateKeySecretRef = {
          name = "letsencrypt-${var.environment}"
        }
        solvers = [
          {
            http01 = {
              ingress = {
                class = "nginx"
              }
            }
          }
        ]
      }
    }
  }

  depends_on = [helm_release.cert_manager]
}
```
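With the ClusterIssuer in place, an application Ingress only needs one annotation to have its certificate issued and renewed automatically. A sketch referencing the prod issuer name from above (the host, service, and secret names are placeholders):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-app
  annotations:
    # Matches the ClusterIssuer created for the prod environment
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - app.example.com
      secretName: demo-app-tls  # cert-manager stores the issued certificate here
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: demo-app
                port:
                  number: 80
```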
## Monitoring and Alerting

### 1. Prometheus Monitoring Configuration
```yaml
# templates/prometheus-monitoring.yml.j2
controller:
  metrics:
    enabled: true
    service:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "10254"
    serviceMonitor:
      enabled: true
      additionalLabels:
        release: prometheus
      namespace: "{{ prometheus_namespace }}"
      interval: 30s
      scrapeTimeout: 10s
  # Additional controller configuration
  config:
    enable-opentracing: "false"
    enable-ocsp: "false"
    error-log-level: "notice"
  # Controller arguments for metrics collection
  extraArgs:
    enable-ssl-chain-completion: "false"
    http-port: "8080"
    https-port: "8443"
    default-ssl-certificate: "{{ ingress_nginx_namespace }}/tls-secret"
```
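With the ServiceMonitor scraping controller metrics, alerting rules can be layered on top. A minimal PrometheusRule sketch (the rule name and 5% threshold are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ingress-nginx-alerts
  namespace: monitoring
spec:
  groups:
    - name: ingress-nginx
      rules:
        - alert: IngressHighErrorRate
          # Ratio of 5xx responses to all requests over the last 5 minutes
          expr: |
            sum(rate(nginx_ingress_controller_requests{status=~"5.."}[5m]))
              / sum(rate(nginx_ingress_controller_requests[5m])) > 0.05
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "More than 5% of ingress requests are returning 5xx"
```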
### 2. Grafana Dashboard Configuration
```yaml
# templates/grafana-dashboard.yml.j2
# {% raw %} keeps Jinja2 from consuming Grafana's own {{ingress}} placeholders
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: ingress-nginx-dashboard
  namespace: "{{ grafana_namespace }}"
  labels:
    app: grafana
spec:
  json: |
    {
      "title": "Ingress NGINX Controller",
      "tags": ["nginx", "ingress", "kubernetes"],
      "timezone": "browser",
      "panels": [
        {
          "title": "Requests per Second",
          "type": "graph",
          "targets": [
            {
              "expr": "sum(rate(nginx_ingress_controller_requests[1m])) by (ingress)",
              "legendFormat": "{% raw %}{{ingress}}{% endraw %}"
            }
          ]
        },
        {
          "title": "Upstream Response Time",
          "type": "graph",
          "targets": [
            {
              "expr": "histogram_quantile(0.95, sum(rate(nginx_ingress_controller_response_duration_seconds_bucket[5m])) by (le, ingress))",
              "legendFormat": "P95 - {% raw %}{{ingress}}{% endraw %}"
            }
          ]
        }
      ]
    }
```
## Automated Operations Workflow

## Best Practices and Troubleshooting

### 1. Performance Tuning Recommendations
| Setting | Recommended value | Notes |
|---|---|---|
| Worker processes | `worker-processes: auto` | Scales automatically with the number of CPU cores |
| Connection timeout | `proxy-connect-timeout: 30` | Extends the connect timeout where upstreams are slow |
| Buffer size | `proxy-buffer-size: 16k` | Tune to the average request/response size |
| Upstream keepalive | `upstream-keepalive-connections: 200` | Reduces TCP connection setup overhead |
| Log level | `error-log-level: warn` | Cuts log noise in production |
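Collected into a single Helm values fragment, these tuning settings would look like the following (the `error-log-level` value is a suggestion, not a chart default):

```yaml
controller:
  config:
    worker-processes: "auto"
    proxy-connect-timeout: "30"
    proxy-buffer-size: "16k"
    upstream-keepalive-connections: "200"
    error-log-level: "warn"
```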
Disclosure: parts of this article were produced with AI assistance (AIGC) and are provided for reference only.



