1. How it works
- Set the minimum number of Pods the HPA adds per scale-up step to the number of availability zones, so that Pods scale up in all zones simultaneously
- Set a TopologySpreadConstraints zone-spread rule with maxSkew of 1, so that Pods are distributed across zones as evenly as possible (both settings are sketched below)
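A minimal sketch of the two settings side by side (fragments of the full manifests deployed in sections 2.3 and 2.4 below; the value 2 is used because the test cluster has two availability zones):
# HPA behavior: every scale-up step adds at least as many Pods as there are zones
behavior:
  scaleUp:
    policies:
    - type: Pods
      value: 2          # = number of availability zones
      periodSeconds: 15
---
# Pod template: keep the Pod count difference between any two zones at most 1
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway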
2. Experimental verification
2.1. Prepare the Kind cluster
Prepare the following configuration file and save it as kind-cluster.yaml:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:v1.24.0@sha256:0866296e693efe1fed79d5e6c7af8df71fc73ae45e3679af05342239cdc5bc8e
- role: worker
  image: kindest/node:v1.24.0@sha256:0866296e693efe1fed79d5e6c7af8df71fc73ae45e3679af05342239cdc5bc8e
  labels:
    topology.kubernetes.io/zone: "us-east-1a"
- role: worker
  image: kindest/node:v1.24.0@sha256:0866296e693efe1fed79d5e6c7af8df71fc73ae45e3679af05342239cdc5bc8e
  labels:
    topology.kubernetes.io/zone: "us-east-1c"
The configuration above defines two worker nodes and labels each with a different availability zone.
Run the following command to create the Kubernetes cluster:
$ kind create cluster --config kind-cluster.yaml
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.24.0) 🖼
✓ Preparing nodes 📦 📦 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜
Set kubectl context to "kind-kind"
You can now use your cluster with:
kubectl cluster-info --context kind-kind
Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community 🙂
Verify that the cluster is healthy:
$ kubectl get node --show-labels
NAME STATUS ROLES AGE VERSION LABELS
kind-control-plane Ready control-plane 161m v1.24.0 beta.kubernetes.io/arch=arm64,beta.kubernetes.io/os=linux,kubernetes.io/arch=arm64,kubernetes.io/hostname=kind-control-plane,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node.kubernetes.io/exclude-from-external-load-balancers=
kind-worker Ready <none> 160m v1.24.0 beta.kubernetes.io/arch=arm64,beta.kubernetes.io/os=linux,kubernetes.io/arch=arm64,kubernetes.io/hostname=kind-worker,kubernetes.io/os=linux,topology.kubernetes.io/zone=us-east-1a
kind-worker2 Ready <none> 160m v1.24.0 beta.kubernetes.io/arch=arm64,beta.kubernetes.io/os=linux,kubernetes.io/arch=arm64,kubernetes.io/hostname=kind-worker2,kubernetes.io/os=linux,topology.kubernetes.io/zone=us-east-1c
2.2. Install metrics-server
The HPA relies on metrics-server for resource metrics; install it with the following command:
$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Tip: if your network cannot pull the registry.k8s.io/metrics-server/metrics-server:v0.6.4 image directly (common on networks in mainland China), replace it with the equivalent mirror shidaqiu/metrics-server:v0.6.4. You also need to disable TLS verification against the kubelet, as shown below:
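One way to make both changes (a sketch only; it assumes the default Deployment name metrics-server in the kube-system namespace created by components.yaml, and you can equally edit components.yaml before applying it):
$ kubectl -n kube-system patch deployment metrics-server --type=json -p='[
  {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/image", "value": "shidaqiu/metrics-server:v0.6.4"}
]'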
Check that metrics-server is working after deployment:
$ kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
kind-control-plane 238m 5% 667Mi 8%
kind-worker 76m 1% 207Mi 2%
kind-worker2 41m 1% 110Mi 1%
2.3. Deploy the test service
Prepare the following YAML and save it as hpa-php-demo.yaml.
Note: the Deployment's topologySpreadConstraints is configured to spread Pods across availability zones!
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-web-demo
spec:
  selector:
    matchLabels:
      run: php-web-demo
  replicas: 1
  template:
    metadata:
      labels:
        run: php-web-demo
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            run: php-web-demo
      containers:
      - name: php-web-demo
        image: shidaqiu/hpademo:latest
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-web-demo
  labels:
    run: php-web-demo
spec:
  ports:
  - port: 80
  selector:
    run: php-web-demo
Deploy the service:
$ kubectl apply -f hpa-php-demo.yaml
2.4. Deploy the HPA configuration
Prepare the HPA configuration file and save it as hpa-demo.yaml:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-web-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-web-demo
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 15
      - type: Pods
        value: 2
        periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 2
        periodSeconds: 15
      selectPolicy: Max
Deploy the HPA:
$ kubectl apply -f hpa-demo.yaml
The HPA above defines the scaling behavior via scaleUp and scaleDown: each scale-up step either doubles the replicas or adds 2 Pods, whichever is larger; each scale-down step removes half of the replicas or 2 Pods, whichever is larger.
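As a concrete (hypothetical) scale-up step, assume the current replica count is 2 and the CPU metric calls for more capacity:
- the Percent policy allows adding 100% of 2 = 2 Pods within a 15-second period;
- the Pods policy also allows adding 2 Pods within that period;
- selectPolicy: Max picks the policy permitting the larger change, so 2 Pods are added, and with the maxSkew: 1 spread constraint the scheduler places one new Pod in each zone, which is exactly the "minimum scale-up equals the number of zones" idea from section 1.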
2.5. Verify scale-up
Before applying load, observe that the two Pods are running in different zones:
$ kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
php-web-demo-d6d66c8d5-22tn6 1/1 Running 0 6m57s 10.244.2.3 kind-worker2 <none> <none>
php-web-demo-d6d66c8d5-tz8m9 1/1 Running 0 76s 10.244.1.3 kind-worker <none> <none>
Apply load to the service:
$ kubectl run -it --rm load-generator --image=busybox /bin/sh
Once inside the container, run the following loop:
while true; do wget -q -O- http://php-web-demo; done
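In another terminal, you can watch the HPA decisions and the Pod distribution while the load runs (standard kubectl commands; the output will vary):
$ kubectl get hpa php-web-demo -w
$ kubectl get pod -owide -w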
You can observe that when the Pods scale up, new Pods are created in both availability zones at the same time, i.e. the zones scale up in sync.
After stopping the load, you can observe that the Pods scale down while remaining spread across the availability zones.
3. How to keep Pods evenly spread across availability zones after scale-down?
Consider using the descheduler's rebalancing capability; see https://github.com/kubernetes-sigs/descheduler?tab=readme-ov-file#removepodsviolatingtopologyspreadconstraint
Deploy the descheduler:
$ git clone https://github.com/kubernetes-sigs/descheduler.git
$ cd descheduler/charts/descheduler
$ helm upgrade --install descheduler .
Before installing, you can edit the values.yaml file to disable the plugins you do not need:
kind: Deployment # run the descheduler as a Deployment
...
replicas: 2 # two replicas
...
leaderElection:
  enabled: true # enable leader election
  leaseDuration: 15s
  renewDeadline: 10s
  retryPeriod: 2s
  resourceLock: "leases"
  resourceName: "descheduler"
  resourceNamescape: "kube-system"
...
deschedulerPolicy:
  strategies:
    RemoveDuplicates:
      enabled: false
    RemovePodsHavingTooManyRestarts:
      enabled: false
      params:
        podsHavingTooManyRestarts:
          podRestartThreshold: 100
          includingInitContainers: true
    RemovePodsViolatingNodeTaints:
      enabled: false
    RemovePodsViolatingNodeAffinity:
      enabled: false
      params:
        nodeAffinityType:
        - requiredDuringSchedulingIgnoredDuringExecution
    RemovePodsViolatingInterPodAntiAffinity:
      enabled: false
    RemovePodsViolatingTopologySpreadConstraint:
      enabled: true # only enable the zone-rebalancing plugin we need
      params:
        includeSoftConstraints: true # also act on soft (ScheduleAnyway) constraints
    LowNodeUtilization:
      enabled: false
      params:
        nodeResourceUtilizationThresholds:
          thresholds:
            cpu: 20
            memory: 20
            pods: 20
          targetThresholds:
            cpu: 50
            memory: 50
            pods: 50
...
Scale the service up and down a few times and observe the descheduler automatically rebalance the Pods whenever the zones become skewed.
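To see what the descheduler evicted and why, you can inspect its logs (the Deployment name below assumes the Helm release name descheduler installed into the current namespace as above; adjust it to your setup):
$ kubectl logs deployment/descheduler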
Tip: since the smallest value maxSkew accepts is 1, a difference of one Pod between zones is still considered balanced. For example, 3 Pods spread over 2 zones as 2 + 1 has a skew of exactly 1, so the descheduler will not move anything.