Installing a Kubernetes Cluster with Kubespray in Production: A Detailed Guide

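This guide covers installing a Kubernetes cluster with Kubespray for production use. It walks through a highly available architecture, security hardening, performance tuning, and disaster recovery, aimed at enterprise requirements.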


I. Production Architecture Design

Recommended topology
External load balancer -> Master 1 / Master 2 / Master 3
Masters -> etcd cluster
Masters -> Worker Pool
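Key components
Component | Production-grade configuration | Notes
Master nodes | 3 nodes (across racks / availability zones) | Avoids a single point of failure
etcd cluster | 3 dedicated nodes (SSD disks, low-latency network) | Separated from the masters to avoid resource contention
Load balancer | HAProxy + Keepalived (active-standby) | Provides a virtual IP (VIP)
Worker nodes | Pooled by workload (CPU-intensive / GPU / memory-optimized) | Resource isolation
Network plugin | Calico in IPIP/BGP mode | Supports NetworkPolicy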
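The choice of three etcd members follows from majority quorum: writes need more than half of the members, so the number of failures a cluster survives only grows at odd sizes. A minimal sketch of that arithmetic (the function names are illustrative, not part of etcd or Kubespray):

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"members={n} quorum={quorum(n)} tolerated_failures={fault_tolerance(n)}")
```

Four members tolerate no more failures than three, which is why odd cluster sizes are the usual recommendation.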

II. Production Deployment Procedure

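1. Node preparation (all nodes)
# Disable swap now and keep it disabled after reboot
sudo swapoff -a
sudo sed -i '/^[^#].*swap/s/^/#/' /etc/fstab

# Load the kernel modules that the net.bridge.* sysctls below depend on
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter

# Set sysctl parameters
cat <<EOF | sudo tee /etc/sysctl.d/99-k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
vm.swappiness = 0
vm.max_map_count = 262144
EOF
sudo sysctl -p /etc/sysctl.d/99-k8s.conf

# Install base tools
sudo apt-get update && sudo apt-get install -y \
    apt-transport-https ca-certificates curl \
    ipvsadm ipset conntrack ntp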
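Before running Kubespray it can help to confirm that every node really carries these kernel settings. The following is a small sketch, not part of Kubespray, that parses a sysctl file and reports values that differ from the ones listed above; `EXPECTED`, `parse_sysctl_conf`, `find_drift`, and `read_live_value` are names made up for this example.

```python
EXPECTED = {
    "net.ipv4.ip_forward": "1",
    "net.bridge.bridge-nf-call-iptables": "1",
    "net.bridge.bridge-nf-call-ip6tables": "1",
    "vm.swappiness": "0",
    "vm.max_map_count": "262144",
}


def parse_sysctl_conf(text):
    """Parse 'key = value' lines, skipping blanks and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def find_drift(expected, actual):
    """Map each mismatching key to (expected_value, actual_value_or_None)."""
    return {k: (v, actual.get(k)) for k, v in expected.items() if actual.get(k) != v}


def read_live_value(key):
    """Read the running kernel's value from /proc/sys, or None if unavailable.

    Assumes the key contains no dots inside a path component, which holds
    for the keys in EXPECTED.
    """
    path = "/proc/sys/" + key.replace(".", "/")
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None
```

Calling `find_drift(EXPECTED, {k: read_live_value(k) for k in EXPECTED})` on a node lists the settings that still need attention; a `None` for the `net.bridge.*` keys usually means `br_netfilter` is not loaded.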
2. Load balancer configuration (HAProxy + Keepalived)

/etc/haproxy/haproxy.cfg

frontend k8s-api
    bind *:6443
    mode tcp
    default_backend k8s-masters

backend k8s-masters
    balance roundrobin
    mode tcp
    server master1 192.168.1.10:6443 check
    server master2 192.168.1.11:6443 check
    server master3 192.168.1.12:6443 check

/etc/keepalived/keepalived.conf

vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2
}

vrrp_instance VI_1 {
    interface eth0
    state MASTER   # set to BACKUP on the standby node
    virtual_router_id 51
    priority 100   # use a lower value on the standby node
    virtual_ipaddress {
        192.168.1.100/24
    }
    track_script {
        chk_haproxy
    }
}
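The failover behaviour of this Keepalived setup can be modelled in a few lines. With a `track_script` that has no weight, a node whose HAProxy check fails drops out of the election, and the healthy node with the highest priority holds the VIP. The sketch below is a simplified model of that rule; `LbNode` and `vip_holder` are illustrative names, and VRRP tie-breaking and timing are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LbNode:
    name: str
    priority: int        # the `priority` value from keepalived.conf
    haproxy_alive: bool  # result of the chk_haproxy script


def vip_holder(nodes):
    """Return the node that should hold the VIP, or None if none is healthy."""
    candidates = [node for node in nodes if node.haproxy_alive]
    if not candidates:
        return None
    return max(candidates, key=lambda node: node.priority)
```

For example, with `lb1` at priority 100 and `lb2` at priority 90, the VIP stays on `lb1` until its HAProxy process dies, then moves to `lb2`.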

III. Production-Grade Kubespray Configuration

1. Key parameter tuning (inventory/mycluster/group_vars)

all.yml

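# Container runtime
container_manager: containerd

# Network plugin (Calico, recommended for production)
kube_network_plugin: calico
calico_network_backend: bird      # IPIP needs the BIRD backend; recent Kubespray releases default to vxlan
calico_ipip_mode: "CrossSubnet"  # IPIP encapsulation only across subnets; traffic within a subnet is routed unencapsulated
calico_vxlan_mode: "Never"

# Image registry (private registry with authentication)
gcr_image_repo: "registry.example.com/google_containers"
docker_registry_auths:
  "registry.example.com":
    username: "user"
    password: "pass"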

k8s-cluster.yml

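# High availability
kubernetes_ha_cluster: true
apiserver_loadbalancer_domain_name: "k8s-api.example.com"
loadbalancer_apiserver:
  address: 192.168.1.100  # VIP address
  port: 6443

# Resource limits
kubelet_max_pods: 250
kubelet_pods_per_core: 10

# Security hardening
kube_encrypt_secret_data: true  # encrypt Secrets at rest
# PodSecurityPolicy was removed in Kubernetes v1.25, so for the v1.28 target
# used in this guide rely on Pod Security Admission instead
# (check the variable names against your Kubespray release)
kube_pod_security_use_default: true
kube_pod_security_default_enforce: baseline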
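The two kubelet limits interact: when a pods-per-core value is set, the kubelet schedules at most the smaller of the fixed maximum and the per-core product. A short sketch of that rule, assuming the two variables above map to the kubelet's max-pods and pods-per-core settings (the function name is illustrative):

```python
def effective_max_pods(max_pods: int, pods_per_core: int, cpu_cores: int) -> int:
    """Pod capacity a kubelet advertises for a node.

    A pods_per_core of 0 disables the per-core limit, so only max_pods applies.
    """
    if pods_per_core <= 0:
        return max_pods
    return min(max_pods, pods_per_core * cpu_cores)
```

With the values above, an 8-core worker is capped at 80 Pods, and only nodes with 25 or more cores reach the 250-Pod ceiling.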

etcd.yml

# Dedicated etcd cluster
etcd_deployment_type: host
etcd_data_dir: "/var/lib/etcd"
etcd_disk_priority: high
etcd_compaction_retention: "2"  # in hours

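IV. Security Hardening

1. Certificate management
# Custom root CA (the CN below is a placeholder; -subj avoids the interactive prompt)
openssl genrsa -out ca.key 4096
openssl req -x509 -new -nodes -sha256 -key ca.key -days 3650 \
  -subj "/CN=kubernetes-ca" -out ca.crt

# Kubespray configuration (group_vars, YAML)
kube_certificates_custom_ca: true
kube_certificates_ca_crt: "{{ lookup('file', '/path/to/ca.crt') }}"
kube_certificates_ca_key: "{{ lookup('file', '/path/to/ca.key') }}"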
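A custom CA also means tracking its expiry yourself. As a rough sketch, the helper below turns the output of `openssl x509 -noout -enddate -in ca.crt` into a number of remaining days; `days_until_expiry` is a name invented for this example, and it relies only on the standard-library `ssl.cert_time_to_seconds`.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(enddate_line, now=None):
    """Whole days left before the certificate expires.

    `enddate_line` is a line such as 'notAfter=Jan  1 00:00:00 2030 GMT',
    as printed by `openssl x509 -noout -enddate`.
    """
    prefix = "notAfter="
    if not enddate_line.startswith(prefix):
        raise ValueError("expected a line starting with 'notAfter='")
    expires = ssl.cert_time_to_seconds(enddate_line[len(prefix):].strip())
    now = now or datetime.now(timezone.utc)
    return int((expires - now.timestamp()) // 86400)
```

A cron job could call this for each certificate and alert when the result drops below a threshold of your choosing.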
2. RBAC and policies
# Enable OPA Gatekeeper
gatekeeper_enabled: true
gatekeeper_version: v3.12.0

# NetworkPolicy example: deny all ingress and egress for every Pod in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
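To see why this policy blocks everything until further policies are added, it helps to recall that NetworkPolicies are additive: a Pod selected by any ingress policy accepts only traffic that some selecting policy explicitly allows. The sketch below is a deliberately simplified model of that evaluation (label equality only, one namespace, no ports or egress); the dictionary layout and function names are assumptions for illustration, not the Kubernetes API.

```python
def selects(selector, labels):
    """An empty selector matches every Pod, like `podSelector: {}`."""
    return all(labels.get(key) == value for key, value in selector.items())


def ingress_allowed(dst_labels, src_labels, policies):
    """Decide whether traffic from src to dst passes the ingress policies."""
    applicable = [
        policy for policy in policies
        if "Ingress" in policy["policy_types"]
        and selects(policy["pod_selector"], dst_labels)
    ]
    if not applicable:
        return True  # no policy selects the Pod, so it is not isolated
    return any(
        selects(peer, src_labels)
        for policy in applicable
        for peer in policy.get("ingress_from", [])
    )


DEFAULT_DENY = {"pod_selector": {}, "policy_types": ["Ingress", "Egress"]}
ALLOW_FRONTEND = {
    "pod_selector": {"app": "web"},
    "policy_types": ["Ingress"],
    "ingress_from": [{"app": "frontend"}],
}
```

With only `DEFAULT_DENY` in place no Pod accepts traffic; adding `ALLOW_FRONTEND` opens exactly one path, from `app=frontend` to `app=web`.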

V. Monitoring and Logging

1. Monitoring stack
# Enable Prometheus + Alertmanager
prometheus_enabled: true
alertmanager_enabled: true
grafana_enabled: true

# Key alert rules
prometheus_alert_rules:
  - name: KubeAPIHighLatency
    expr: histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket[5m])) by (le)) > 1
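The KubeAPIHighLatency rule fires when the estimated 99th-percentile API request latency exceeds one second. Prometheus estimates that percentile from cumulative histogram buckets by linear interpolation inside the bucket that contains the target rank. The following is a simplified re-implementation of that idea for illustration only; it is not Prometheus code and skips its edge-case handling.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound, ending with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("buckets must end with the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                # The rank falls beyond the last finite bucket.
                return prev_bound
            share = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * share
        prev_bound, prev_count = bound, count
```

Because the result is interpolated, its accuracy depends on how finely the bucket boundaries are spaced around the alert threshold.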
2. Log collection
# EFK logging stack
efk_enabled: true
elasticsearch_data_storage_class: "ssd"
fluentd_logrotate_enabled: true
kibana_ingress_enabled: true

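VI. Disaster Recovery and Upgrades

1. Cluster backup
# etcd snapshot backup
ETCDCTL_API=3 etcdctl snapshot save snapshot.db \
  --endpoints=https://etcd1:2379 \
  --cacert=/etc/ssl/etcd/ca.crt \
  --cert=/etc/ssl/etcd/server.crt \
  --key=/etc/ssl/etcd/server.key

# Velero cloud-native backup
# (set VELERO_AWS_PLUGIN_VERSION and AWS_REGION for your environment first)
velero install \
  --provider aws \
  --plugins "velero/velero-plugin-for-aws:${VELERO_AWS_PLUGIN_VERSION}" \
  --bucket k8s-backup \
  --backup-location-config "region=${AWS_REGION}" \
  --secret-file ./credentials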
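2. Rolling upgrade strategy
# Staged upgrade: control plane and etcd first, then the workers
# (kubelets must not run a newer version than the API server).
# Adjust the inventory path to match your setup.
ansible-playbook -b -i inventory/mycluster/hosts.yaml upgrade-cluster.yml \
  --limit "kube_control_plane:etcd" \
  -e kube_version=v1.28.3

ansible-playbook -b -i inventory/mycluster/hosts.yaml upgrade-cluster.yml \
  --limit "kube_node" \
  -e kube_version=v1.28.3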
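Kubernetes upgrades should move one minor version at a time, so a jump across several minors needs one playbook run per step. A small sketch that lists the intermediate minor releases for a planned upgrade (the helper names are illustrative, and the patch version for each step still has to be chosen by hand):

```python
def parse_minor(version):
    """'v1.28.3' -> (1, 28)."""
    major, minor, *_ = version.lstrip("v").split(".")
    return int(major), int(minor)


def minor_steps(current, target):
    """Minor releases to pass through, in order, to get from current to target."""
    cur_major, cur_minor = parse_minor(current)
    tgt_major, tgt_minor = parse_minor(target)
    if cur_major != tgt_major:
        raise ValueError("major version changes are out of scope for this sketch")
    if tgt_minor < cur_minor:
        raise ValueError("downgrades are not supported")
    return [f"v{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]
```

For instance, going from v1.26.5 to v1.28.3 yields two steps, v1.27 and then v1.28, each with its own `kube_version`.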

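VII. Production Verification Checklist

  1. High-availability test
    • Simulate a master failure: shut down one master (for example, sudo shutdown -h now); systemctl stop kubelet alone leaves the kube-apiserver static Pod running
    • Verify API continuity: curl -k https://VIP:6443/healthz
  2. Failure recovery
    # Restore an etcd member from a snapshot
    ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
      --data-dir /var/lib/etcd-new
    
  3. Performance stress test
    kubectl apply -f https://k8s.io/examples/application/deployment.yaml
    kubectl run stress --rm -i --restart=Never --image=loadimpact/k6 -- run - < script.js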
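For the high-availability test in item 1, a single curl only shows the state at one moment. A probe loop makes the failover gap visible. The sketch below polls the health endpoint and reports the longest run of consecutive failures; all function names are made up for this example, and TLS verification is disabled to mirror `curl -k`.

```python
import http.client
import ssl
import time
import urllib.request


def probe_healthz(url, timeout=2.0):
    """True if a GET on `url` returns HTTP 200 (certificate not verified)."""
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        return False


def watch(probe, attempts, interval=1.0, sleep=time.sleep):
    """Call `probe` `attempts` times, pausing `interval` seconds after each call."""
    results = []
    for _ in range(attempts):
        results.append(bool(probe()))
        sleep(interval)
    return results


def longest_outage(results):
    """Length of the longest run of failed probes."""
    longest = current = 0
    for ok in results:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest
```

A run such as `longest_outage(watch(lambda: probe_healthz("https://192.168.1.100:6443/healthz"), attempts=60))`, started just before the master is shut down, gives the failover gap in probe intervals.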
    

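VIII. Key Production Recommendations

  1. Network isolation
    • Use Calico NetworkSet resources together with network policies to isolate sensitive Pods
    • Enable egressGateway to control outbound traffic
  2. Resource quotas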
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: prod-quota
    spec:
      hard:
        requests.cpu: "100"
        requests.memory: 200Gi
        limits.cpu: "200"
        limits.memory: 400Gi
    
  3. Audit logging
    # Enable Kubernetes audit logging
    kubernetes_audit: true
    audit_log_maxbackups: 10
    audit_policy_file: "/etc/kubernetes/audit-policy.yaml"
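The ResourceQuota in item 2 rejects a new Pod when its requests or limits would push the namespace total past a hard value. The sketch below mimics that check under simplifying assumptions: it understands only plain or `m`-suffixed CPU values and binary memory suffixes (Ki, Mi, Gi, Ti), and every name in it is illustrative rather than taken from a Kubernetes client library.

```python
_MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_cpu(quantity):
    """CPU quantity in millicores: '500m' -> 500, '2' -> 2000."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def parse_memory(quantity):
    """Memory quantity in bytes: '200Gi' -> 200 * 2**30."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)


_PARSERS = {
    "requests.cpu": parse_cpu,
    "limits.cpu": parse_cpu,
    "requests.memory": parse_memory,
    "limits.memory": parse_memory,
}


def fits_quota(hard, used, request):
    """True if `used` plus `request` stays within every limit listed in `hard`."""
    for key, parse in _PARSERS.items():
        if key not in hard:
            continue
        total = parse(used.get(key, "0")) + parse(request.get(key, "0"))
        if total > parse(hard[key]):
            return False
    return True
```

This kind of check is handy in a CI step that validates manifests against the namespace quota before they are applied.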
    

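Note: after the production deployment, run kubespray-check regularly to scan cluster health, and set up a continuous integration pipeline to manage cluster configuration changes.

With the configuration above you can build a Kubernetes production environment that targets financial-grade requirements and supports compliance work for standards such as MLPS 2.0 (等保2.0) and PCI-DSS.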
