Introduction
By default, Kubernetes automatically restarts a container that has crashed, whatever the cause. On top of that, you can configure health checks, namely liveness probes and readiness probes, to detect and react to application-level failures. For a detailed explanation of how they work, see the official Kubernetes health-check documentation.
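For a quick reference to the available probe fields, you can print the API schema directly with kubectl:
kubectl explain pod.spec.containers.livenessProbe
kubectl explain pod.spec.containers.readinessProbe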
Liveness probes
1. Create the project directory
mkdir -p ~/environment/healthchecks
Create the YAML file:
cd ~/environment/healthchecks
cat <<EOF > liveness-app.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-app
spec:
  containers:
  - name: liveness
    image: brentley/ecsdemo-nodejs
    livenessProbe:
      httpGet:
        path: /health
        port: 3000
      initialDelaySeconds: 5
      periodSeconds: 5
EOF
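Optionally, validate the manifest before applying it. The --dry-run=client flag performs a client-side check only and creates nothing on the cluster:
kubectl apply --dry-run=client -f liveness-app.yaml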
2. Deploy and confirm the pod becomes Ready
cd ~/environment/healthchecks/
kubectl apply -f liveness-app.yaml
kubectl get pod liveness-app
The output should be similar to the following:
NAME READY STATUS RESTARTS AGE
liveness-app 0/1 ContainerCreating 0 1s
Then check the pod's event history:
kubectl describe pod liveness-app
The output should be similar to the following:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 36s default-scheduler Successfully assigned default/liveness-app to ip-172-31-34-171.eu-west-1.compute.internal
Normal Pulling 35s kubelet, ip-172-31-34-171.eu-west-1.compute.internal Pulling image "brentley/ecsdemo-nodejs"
Normal Pulled 34s kubelet, ip-172-31-34-171.eu-west-1.compute.internal Successfully pulled image "brentley/ecsdemo-nodejs" in 877.182203ms
Normal Created 34s kubelet, ip-172-31-34-171.eu-west-1.compute.internal Created container liveness
Normal Started 34s kubelet, ip-172-31-34-171.eu-west-1.compute.internal Started container liveness
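If the pod is still in ContainerCreating, you can watch it until it reports 1/1 READY and Running (press Ctrl+C to stop watching):
kubectl get pod liveness-app --watch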
3. Simulate a health-check failure
Sending SIGUSR1 to the Node.js process (PID 1) switches it into debug mode:
kubectl get pod liveness-app
kubectl exec -it liveness-app -- /bin/kill -s SIGUSR1 1
kubectl get pod liveness-app
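Once the probe has failed three times in a row (the default failureThreshold), the kubelet restarts the container and the RESTARTS column increments. You can also read the counter directly:
kubectl get pod liveness-app -o jsonpath='{.status.containerStatuses[0].restartCount}'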
4. Trace the logs
After running the kill command in the previous step, the Node.js application enters debug mode and no longer responds to health-check requests. The liveness probe therefore fails and the kubelet restarts the container. You can trace the whole sequence in the logs:
kubectl logs liveness-app
kubectl logs liveness-app --previous
The --previous flag shows the log of the terminated container instance. There will be a lot of output, including a section similar to the following, where the health-check requests stop once the debugger takes over:
::ffff:172.31.34.171 - - [21/May/2021:06:49:06 +0000] "GET /health HTTP/1.1" 200 17 "-" "kube-probe/1.20+"
::ffff:172.31.34.171 - - [21/May/2021:06:49:11 +0000] "GET /health HTTP/1.1" 200 17 "-" "kube-probe/1.20+"
::ffff:172.31.34.171 - - [21/May/2021:06:49:16 +0000] "GET /health HTTP/1.1" 200 17 "-" "kube-probe/1.20+"
::ffff:172.31.34.171 - - [21/May/2021:06:49:21 +0000] "GET /health HTTP/1.1" 200 17 "-" "kube-probe/1.20+"
::ffff:172.31.34.171 - - [21/May/2021:06:49:26 +0000] "GET /health HTTP/1.1" 200 17 "-" "kube-probe/1.20+"
::ffff:172.31.34.171 - - [21/May/2021:06:49:31 +0000] "GET /health HTTP/1.1" 200 17 "-" "kube-probe/1.20+"
::ffff:172.31.34.171 - - [21/May/2021:06:49:36 +0000] "GET /health HTTP/1.1" 200 18 "-" "kube-probe/1.20+"
::ffff:172.31.34.171 - - [21/May/2021:06:49:41 +0000] "GET /health HTTP/1.1" 200 18 "-" "kube-probe/1.20+"
Starting debugger agent.
Debugger listening on [::]:5858
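To confirm why the previous container instance was terminated, inspect its last recorded state; it should show the termination details for the instance that stopped answering the probe:
kubectl get pod liveness-app -o jsonpath='{.status.containerStatuses[0].lastState}'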
Readiness probes
1. Create the deployment
cd ~/environment/healthchecks/
cat <<EOF > readiness-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: readiness-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: readiness-deployment
  template:
    metadata:
      labels:
        app: readiness-deployment
    spec:
      containers:
      - name: readiness-deployment
        image: alpine
        command: ["sh", "-c", "touch /tmp/healthy && sleep 86400"]
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 5
          periodSeconds: 3
EOF
2. Deploy and verify the deployment
cd ~/environment/healthchecks/
kubectl apply -f readiness-deployment.yaml
kubectl get pods -l app=readiness-deployment
kubectl describe deployment readiness-deployment | grep Replicas:
The replica status should look like this:
Replicas: 3 desired | 3 updated | 3 total | 3 available | 0 unavailable
3. Simulate a health-check failure
Deliberately delete the /tmp/healthy file that the readiness probe reads; this makes the health check fail. Substitute the name of one of your own pods in the second command:
# kubectl exec -it <your-readiness-pod-name> -- rm /tmp/healthy
kubectl exec -it readiness-deployment-644f56898d-4mcdk -- rm /tmp/healthy
kubectl get pods -l app=readiness-deployment
Now check the replica status again:
kubectl describe deployment readiness-deployment | grep Replicas:
You will find that one replica is unavailable:
Replicas: 3 desired | 3 updated | 3 total | 2 available | 1 unavailable
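Note that, unlike a liveness failure, a failed readiness probe does not restart the container: the pod merely stops being Ready, and the RESTARTS column stays at 0. To see exactly which pod is the unready one, you can print each pod's Ready condition:
kubectl get pods -l app=readiness-deployment -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'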
4. Recover from the failure
To bring the pod back to Ready, simply exec into it and recreate the file:
kubectl exec -it readiness-deployment-644f56898d-4mcdk -- touch /tmp/healthy
kubectl get pods -l app=readiness-deployment
kubectl describe deployment readiness-deployment | grep Replicas:
After recovery, all 3 replicas become available again:
Replicas: 3 desired | 3 updated | 3 total | 3 available | 0 unavailable
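Readiness matters most when pods sit behind a Service: a pod that is not Ready is removed from the Service's endpoints, so it receives no traffic until the probe passes again. This walkthrough does not create a Service, but if you exposed the deployment through one (a hypothetical Service named readiness-deployment here), you could watch the endpoints shrink and grow with:
# assumes a Service named readiness-deployment, which this walkthrough does not create
kubectl get endpoints readiness-deployment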
Cleanup
When you no longer need this environment, delete it as follows:
cd ~/environment/healthchecks/
kubectl delete -f liveness-app.yaml
kubectl delete -f readiness-deployment.yaml
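To confirm that nothing is left behind, both of the following should now return a NotFound error:
kubectl get pod liveness-app
kubectl get deployment readiness-deployment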