Kubernetes Cluster Autoscaling
Cluster autoscaling here means automatically adjusting the number of nodes in a cluster according to its resource usage, with the goal of using cluster resources efficiently and saving cost. It mainly applies to two scenarios (a quick way to observe both is shown after this list):
- The cluster runs out of resources and pods fail to schedule, so new nodes are provisioned automatically
- Some nodes stay underutilized for a long time and their pods can fit on other existing nodes, so those nodes are removed
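Both behaviors can be watched at runtime. As a quick check, cluster-autoscaler summarizes its per-node-group scale state in a status ConfigMap when started with --write-status-configmap=true, which is the case in the deployment shown later in this walkthrough; a sketch:

```bash
# Print the autoscaler's health/scale summary (assumes it runs in
# kube-system with --write-status-configmap=true, as in this walkthrough).
kubectl -n kube-system get configmap cluster-autoscaler-status \
  -o jsonpath='{.data.status}'
```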
The mainstream approach today is cluster-autoscaler, which implements cluster autoscaling for cloud providers including:
AliCloud
Azure
AWS
BaiduCloud
CloudStack
HuaweiCloud
Packet
IonosCloud
OVHcloud
Below is a configuration example for Azure. cluster-autoscaler can be deployed into the cluster as an addon; the following is the cluster-autoscaler section of an aks-engine deployment template.
- aks-engine template configuration
"addons": [
{
"name": "cluster-autoscaler",
"enabled": true,
"pools": [
{
"name": "prdconapl",
"config": {
"min-nodes": "3",
"max-nodes": "500"
}
},
{
"name": "prdconeny",
"config": {
"min-nodes": "3",
"max-nodes": "3"
}
}
],
"config": {
"scan-interval": "1m"
}
}
The cluster is configured with two agent pools: prdconapl has a minimum of 3 nodes and a maximum of 500, while prdconeny has both minimum and maximum set to 1, which means prdconeny will never be scaled.
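Each pool entry surfaces as one `--nodes=<min>:<max>:<node-group>` argument on the cluster-autoscaler binary; the pod spec shown later in this walkthrough carries exactly these flags:

```bash
--nodes=3:500:k8s-prdconapl-81692357-vmss   # from pool "prdconapl"
--nodes=1:1:k8s-prdconeny-81692357-vmss     # from pool "prdconeny"
```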
- Deploy the cluster
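The deploy step itself is not shown here; a minimal sketch with aks-engine, where the api-model path, location, and credentials are placeholders rather than values from this cluster:

```bash
# All values below are placeholders; substitute your own api-model and
# service-principal credentials. --azure-env matches the AzureChinaCloud
# environment this cluster reports in its pod spec later on.
aks-engine deploy \
  --azure-env AzureChinaCloud \
  --api-model ./kubernetes.json \
  --location chinanorth \
  --subscription-id "$ARM_SUBSCRIPTION_ID" \
  --client-id "$ARM_CLIENT_ID" \
  --client-secret "$ARM_CLIENT_SECRET"
```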
- Check the cluster-autoscaler
```
vmadmin@reh-connectivity-jumpbox:~$ kubectl get po -n kube-system | grep cluster-autoscaler
cluster-autoscaler-86744d8775-d7n8x 1/1 Running 0 17h
vmadmin@reh-connectivity-jumpbox:~$ kubectl describe po cluster-autoscaler -n kube-system
Name: cluster-autoscaler-86744d8775-d7n8x
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: k8s-master-81692357-0/172.16.2.235
Start Time: Tue, 05 Jan 2021 09:41:34 +0000
Labels: app=cluster-autoscaler
pod-template-hash=86744d8775
Annotations: kubernetes.io/psp: privileged
Status: Running
IP: 172.16.2.235
IPs:
IP: 172.16.2.235
Controlled By: ReplicaSet/cluster-autoscaler-86744d8775
Containers:
cluster-autoscaler:
Container ID: docker://fa45e88574d42655ac84002437b84044502e31a19a47a266649e64db0a53a952
Image: mcr.microsoft.com/oss/kubernetes/autoscaler/cluster-autoscaler:v1.17.3
Image ID: docker-pullable://mcr.microsoft.com/oss/kubernetes/autoscaler/cluster-autoscaler@sha256:288952aa6e7eba7b9a4f2bdac6fd0e96c0b58051b3539f1062444b8b8283b1c3
Port: <none>
Host Port: <none>
Command:
./cluster-autoscaler
--logtostderr=true
--cloud-provider=azure
--skip-nodes-with-local-storage=false
--scan-interval=1m
--expendable-pods-priority-cutoff=-10
--ignore-daemonsets-utilization=false
--ignore-mirror-pods-utilization=false
--max-autoprovisioned-node-group-count=15
--max-empty-bulk-delete=10
--max-failing-time=15m0s
--max-graceful-termination-sec=600
--max-inactivity=10m0s
--max-node-provision-time=15m0s
--max-nodes-total=0
--max-total-unready-percentage=45
--memory-total=0:6400000
--min-replica-count=0
--namespace=kube-system
--new-pod-scale-up-delay=0s
--node-autoprovisioning-enabled=false
--ok-total-unready-count=3
--scale-down-candidates-pool-min-count=50
--scale-down-candidates-pool-ratio=0.1
--scale-down-delay-after-add=10m0s
--scale-down-delay-after-delete=1m
--scale-down-delay-after-failure=3m0s
--scale-down-enabled=true
--scale-down-non-empty-candidates-count=30
--scale-down-unneeded-time=10m0s
--scale-down-unready-time=20m0s
--scale-down-utilization-threshold=0.5
--skip-nodes-with-local-storage=false
--skip-nodes-with-system-pods=true
--stderrthreshold=2
--unremovable-node-recheck-timeout=5m0s
--v=3
--write-status-configmap=true
--balance-similar-node-groups=true
--nodes=3:500:k8s-prdconapl-81692357-vmss
--nodes=1:1:k8s-prdconeny-81692357-vmss
State: Running
Started: Tue, 05 Jan 2021 09:41:54 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 300Mi
Requests:
cpu: 100m
memory: 300Mi
Environment:
ARM_CLOUD: AzureChinaCloud
ARM_SUBSCRIPTION_ID: <set to the key 'SubscriptionID' in secret 'cluster-autoscaler-azure'> Optional: false
ARM_RESOURCE_GROUP: <set to the key 'ResourceGroup' in secret 'cluster-autoscaler-azure'> Optional: false
ARM_TENANT_ID: <set to the key 'TenantID' in secret 'cluster-autoscaler-azure'> Optional: false
ARM_CLIENT_ID: <set to the key 'ClientID' in secret 'cluster-autoscaler-azure'> Optional: false
ARM_CLIENT_SECRET: <set to the key 'ClientSecret' in secret 'cluster-autoscaler-azure'> Optional: false
ARM_VM_TYPE: <set to the key 'VMType' in secret 'cluster-autoscaler-azure'> Optional: false
ARM_USE_MANAGED_IDENTITY_EXTENSION: true
Mounts:
/etc/ssl/certs/ca-certificates.crt from ssl-certs (ro)
/var/lib/waagent/ from waagent (ro)
/var/run/secrets/kubernetes.io/serviceaccount from cluster-autoscaler-token-vbv8b (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
ssl-certs:
Type: HostPath (bare host directory volume)
Path: /etc/ssl/certs/ca-certificates.crt
HostPathType:
waagent:
Type: HostPath (bare host directory volume)
Path: /var/lib/waagent/
HostPathType:
cluster-autoscaler-token-vbv8b:
Type: Secret (a volume populated by a Secret)
SecretName: cluster-autoscaler-token-vbv8b
Optional: false
QoS Class: Guaranteed
Node-Selectors: kubernetes.azure.com/role=master
kubernetes.io/os=linux
Tolerations: node-role.kubernetes.io/master=true:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
```
As the describe output shows, the following two parameters define the node-count range for each VMSS:

```bash
--nodes=3:500:k8s-prdconapl-81692357-vmss
--nodes=1:1:k8s-prdconeny-81692357-vmss
```
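To confirm the ranges on a running cluster, one option is to grep the deployment spec (deployment name and namespace as used in this walkthrough):

```bash
kubectl -n kube-system get deployment cluster-autoscaler -o yaml | grep -- '--nodes'
```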
Scaling test
- Check the current cluster
```
vmadmin@reh-connectivity-jumpbox:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master-81692357-0 Ready master 17h v1.17.11
k8s-master-81692357-1 Ready master 17h v1.17.11
k8s-master-81692357-2 Ready master 17h v1.17.11
k8s-prdconapl-81692357-vmss000000 Ready agent 17h v1.17.11
k8s-prdconapl-81692357-vmss000001 Ready agent 17h v1.17.11
k8s-prdconapl-81692357-vmss000002 Ready agent 17h v1.17.11
k8s-prdconeny-81692357-vmss000000 Ready agent 17h v1.17.11
```
- Deploy a test application, e.g. nginx, and set its resource requests and limits (a declarative equivalent follows the command output)
```
vmadmin@reh-connectivity-jumpbox:~$ kubectl create deployment nginx --image nginx
deployment.apps/nginx created
vmadmin@reh-connectivity-jumpbox:~$ kubectl set resources deployment nginx --limits=cpu=3000m,memory=5120Mi
deployment.apps/nginx resource requirements updated
vmadmin@reh-connectivity-jumpbox:~$ kubectl set resources deployment nginx --requests=cpu=2000m,memory=4096Mi
deployment.apps/nginx resource requirements updated
vmadmin@reh-connectivity-jumpbox:~$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-5f6bdd864f-sg8wl 1/1 Running 0 82s
```
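For reference, a declarative sketch equivalent to the three imperative commands above, using the same image and resource values:

```bash
# Create the same nginx deployment in a single apply.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: 2000m
            memory: 4096Mi
          limits:
            cpu: 3000m
            memory: 5120Mi
EOF
```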
- Check the current cluster state
```
vmadmin@reh-connectivity-jumpbox:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master-81692357-0 Ready master 18h v1.17.11
k8s-master-81692357-1 Ready master 18h v1.17.11
k8s-master-81692357-2 Ready master 18h v1.17.11
k8s-prdconapl-81692357-vmss000000 Ready agent 18h v1.17.11
k8s-prdconapl-81692357-vmss000001 Ready agent 18h v1.17.11
k8s-prdconapl-81692357-vmss000002 Ready agent 18h v1.17.11
k8s-prdconeny-81692357-vmss000000 Ready agent 18h v1.17.11
```
- Scale the nginx deployment to 50 replicas
```
vmadmin@reh-connectivity-jumpbox:~$ kubectl scale deployment/nginx --replicas=50
deployment.apps/nginx scaled
vmadmin@reh-connectivity-jumpbox:~$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-5f6bdd864f-25pl6 0/1 Pending 0 32s
nginx-5f6bdd864f-2w2fl 0/1 ContainerCreating 0 33s
nginx-5f6bdd864f-4kw5l 1/1 Running 0 33s
nginx-5f6bdd864f-54qc2 0/1 Pending 0 32s
nginx-5f6bdd864f-5gqh8 0/1 Pending 0 32s
nginx-5f6bdd864f-62zz7 1/1 Running 0 33s
nginx-5f6bdd864f-6d9f2 1/1 Running 0 33s
nginx-5f6bdd864f-6wwp9 1/1 Running 0 33s
nginx-5f6bdd864f-72zzf 1/1 Running 0 33s
nginx-5f6bdd864f-78ztw 1/1 Running 0 33s
nginx-5f6bdd864f-7mnlk 1/1 Running 0 33s
nginx-5f6bdd864f-7qq7c 1/1 Running 0 33s
nginx-5f6bdd864f-9f86r 0/1 ContainerCreating 0 33s
nginx-5f6bdd864f-9smrz 1/1 Running 0 33s
nginx-5f6bdd864f-fwqkc 0/1 Pending 0 32s
nginx-5f6bdd864f-fzph7 1/1 Running 0 33s
nginx-5f6bdd864f-gcwpw 1/1 Running 0 33s
nginx-5f6bdd864f-gp78l 0/1 Pending 0 32s
vmadmin@reh-connectivity-jumpbox:~$ kubectl describe po nginx-5f6bdd864f-54qc2
Name: nginx-5f6bdd864f-54qc2
Namespace: default
Priority: 0
Node: <none>
Labels: app=nginx
pod-template-hash=5f6bdd864f
Annotations: kubernetes.io/psp: privileged
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/nginx-5f6bdd864f
Containers:
nginx:
Image: nginx
Port: <none>
Host Port: <none>
Limits:
cpu: 3
memory: 5Gi
Requests:
cpu: 2
memory: 4Gi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-mdkhc (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
default-token-mdkhc:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-mdkhc
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal TriggeredScaleUp 73s cluster-autoscaler pod triggered scale-up: [{k8s-prdconapl-81692357-vmss 3->7 (max: 500)}]
Warning FailedScheduling 5s (x3 over 76s) default-scheduler 0/7 nodes are available: 3 node(s) had taints that the pod didn't tolerate, 4 Insufficient cpu.
```
As shown, some pods are Pending because of insufficient resources, and this has triggered a node scale-up: prdconapl is being expanded from 3 to 7 VMs.
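The same scale-up decision can be observed without describing an individual pod, for example:

```bash
# Scale-up events in the current namespace:
kubectl get events --field-selector reason=TriggeredScaleUp
# Or follow the autoscaler's own log (the addon labels its pod app=cluster-autoscaler):
kubectl -n kube-system logs -l app=cluster-autoscaler --tail=20
```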
Check the nodes again:
```
vmadmin@reh-connectivity-jumpbox:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master-81692357-0 Ready master 18h v1.17.11
k8s-master-81692357-1 Ready master 18h v1.17.11
k8s-master-81692357-2 Ready master 18h v1.17.11
k8s-prdconapl-81692357-vmss000000 Ready agent 18h v1.17.11
k8s-prdconapl-81692357-vmss000001 Ready agent 18h v1.17.11
k8s-prdconapl-81692357-vmss000002 Ready agent 18h v1.17.11
k8s-prdconapl-81692357-vmss000004 Ready <none> 12s v1.17.11
k8s-prdconapl-81692357-vmss000005 NotReady <none> 3s v1.17.11
k8s-prdconapl-81692357-vmss000006 Ready <none> 33s v1.17.11
k8s-prdconeny-81692357-vmss000000 Ready agent 18h v1.17.11
```
The prdconapl pool is adding nodes. Once the scale-up completes, check the application again:
```
vmadmin@reh-connectivity-jumpbox:~$ kubectl get pods | grep Pending
```
No pods are left in the Pending state.
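A field selector performs the same check without grep:

```bash
# Lists nothing once every replica has been scheduled.
kubectl get pods --field-selector=status.phase=Pending
```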
- Scale nginx back to 1 replica
```
vmadmin@reh-connectivity-jumpbox:~$ kubectl scale deployment/nginx --replicas=1
deployment.apps/nginx scaled
vmadmin@reh-connectivity-jumpbox:~$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-5f6bdd864f-2w2fl 0/1 Terminating 0 6m20s
nginx-5f6bdd864f-5gqh8 1/1 Running 0 6m19s
nginx-5f6bdd864f-h7fg5 0/1 Terminating 0 6m20s
nginx-5f6bdd864f-hz4k4 0/1 Terminating 0 6m20s
nginx-5f6bdd864f-jv7rt 0/1 Terminating 0 6m20s
nginx-5f6bdd864f-sgvr7 0/1 Terminating 0 6m20s
nginx-5f6bdd864f-t292t 0/1 Terminating 0 6m20s
nginx-5f6bdd864f-xnz5j 0/1 Terminating 0 6m19s
```
- With the default configuration, scale-down of unneeded nodes triggers after 10 minutes (--scale-down-unneeded-time=10m0s above), as the autoscaler log below shows
```
I0106 03:58:48.247318 1 scale_down.go:431] Scale-down calculation: ignoring 2 nodes unremovable in the last 5m0s
I0106 03:58:48.247482 1 cluster.go:93] Fast evaluation: k8s-prdconapl-81692357-vmss000000 for removal
I0106 03:58:48.247541 1 cluster.go:124] Fast evaluation: node k8s-prdconapl-81692357-vmss000000 may be removed
I0106 03:58:48.247548 1 cluster.go:93] Fast evaluation: k8s-prdconapl-81692357-vmss000003 for removal
I0106 03:58:48.247572 1 cluster.go:124] Fast evaluation: node k8s-prdconapl-81692357-vmss000003 may be removed
I0106 03:58:48.247692 1 scale_down.go:716] k8s-prdconapl-81692357-vmss000000 was unneeded for 11m2.097579977s
I0106 03:58:48.247740 1 scale_down.go:716] k8s-prdconapl-81692357-vmss000003 was unneeded for 9m1.6866918s
I0106 03:58:48.247873 1 cluster.go:93] Detailed evaluation: k8s-prdconapl-81692357-vmss000000 for removal
I0106 03:58:48.247933 1 cluster.go:124] Detailed evaluation: node k8s-prdconapl-81692357-vmss000000 may be removed
I0106 03:58:48.247953 1 scale_down.go:827] Scale-down: removing node k8s-prdconapl-81692357-vmss000000, utilization: {0.04375 0.010228299454857455 0 cpu 0.04375}, pods to reschedule: kubernetes-dashboard/dashboard-metrics-scraper-95856bb87-lrxkp
```
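These messages come from the autoscaler's scale-down loop (scale_down.go and cluster.go); they can be followed live with, for example:

```bash
# Stream scale-down evaluation from the running autoscaler.
kubectl -n kube-system logs -f deployment/cluster-autoscaler | grep -E 'scale_down|cluster\.go'
```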
```
vmadmin@reh-connectivity-jumpbox:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master-81692357-0 Ready master 18h v1.17.11
k8s-master-81692357-1 Ready master 18h v1.17.11
k8s-master-81692357-2 Ready master 18h v1.17.11
k8s-prdconapl-81692357-vmss000000 NotReady agent 18h v1.17.11
k8s-prdconapl-81692357-vmss000001 Ready agent 18h v1.17.11
k8s-prdconapl-81692357-vmss000002 Ready agent 18h v1.17.11
k8s-prdconapl-81692357-vmss000003 Ready agent 17m v1.17.11
k8s-prdconeny-81692357-vmss000000 Ready agent 18h v1.17.11
```
The prdconapl pool is scaling back down from 7 nodes to 3 (the NotReady node above is in the process of being removed).
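A related knob worth knowing: cluster-autoscaler skips nodes that carry the scale-down-disabled annotation, which is useful when a particular node must survive scale-down (the node name below is just one from this cluster):

```bash
# Exclude a node from scale-down consideration.
kubectl annotate node k8s-prdconapl-81692357-vmss000001 \
  cluster-autoscaler.kubernetes.io/scale-down-disabled=true
```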