Introduction
By default, Kubernetes automatically restarts a container that has crashed, whatever the cause. On top of that, you can configure health checks, namely liveness probes and readiness probes, to detect and react to application-level failures. For a detailed explanation of how they work, see the official Kubernetes health-check documentation.
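For a quick reference to the available probe fields, you can print the API schema directly with kubectl:
kubectl explain pod.spec.containers.livenessProbe
kubectl explain pod.spec.containers.readinessProbe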
Liveness probes
1. Create the project directory
mkdir -p ~/environment/healthchecks
Create the YAML file:
cd ~/environment/healthchecks
cat <<EOF > liveness-app.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-app
spec:
  containers:
  - name: liveness
    image: brentley/ecsdemo-nodejs
    livenessProbe:
      httpGet:
        path: /health
        port: 3000
      initialDelaySeconds: 5
      periodSeconds: 5
EOF
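Optionally, validate the manifest before applying it. The --dry-run=client flag performs a client-side check only and creates nothing on the cluster:
kubectl apply --dry-run=client -f liveness-app.yaml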
2. Deploy and confirm the pod becomes Ready
cd ~/environment/healthchecks/
kubectl apply -f liveness-app.yaml
kubectl get pod liveness-app
The output should be similar to the following:
NAME READY STATUS RESTARTS AGE
liveness-app 0/1 ContainerCreating 0 1s
Then check the pod's event history:
kubectl describe pod liveness-app
The output should be similar to the following:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 36s default-scheduler Successfully assigned default/liveness-app to ip-172-31-34-171.eu-west-1.compute.internal
Normal Pulling 35s kubelet, ip-172-31-34-171.eu-west-1.compute.internal Pulling image "brentley/ecsdemo-nodejs"
Normal Pulled 34s kubelet, ip-172-31-34-171.eu-west-1.compute.internal Successfully pulled image "brentley/ecsdemo-nodejs" in 877.182203ms
Normal Created 34s kubelet, ip-172-31-34-171.eu-west-1.compute.internal Created container liveness
Normal Started 34s kubelet, ip-172-31-34-171.eu-west-1.compute.internal Started container liveness
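If the pod is still in ContainerCreating, you can watch it until it reports 1/1 READY and Running (press Ctrl+C to stop watching):
kubectl get pod liveness-app --watch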
3. Simulate a health-check failure
Sending SIGUSR1 to the Node.js process (PID 1) switches it into debug mode:
kubectl get pod liveness-app
kubectl exec -it liveness-app -- /bin/kill -s SIGUSR1 1
kubectl get pod liveness-app
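Once the probe has failed three times in a row (the default failureThreshold), the kubelet restarts the container and the RESTARTS column increments. You can also read the counter directly:
kubectl get pod liveness-app -o jsonpath='{.status.containerStatuses[0].restartCount}'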
4. Trace the logs
After running the kill command in the previous step, the Node.js application enters debug mode and no longer responds to health-check requests. The liveness probe therefore fails and the kubelet restarts the container. You can trace the whole sequence in the logs:
kubectl logs liveness-app
kubectl logs liveness-app --previous
The --previous flag shows the log of the terminated container instance. There will be a lot of output, including a section similar to the following, where the health-check requests stop once the debugger takes over:
::ffff:172.31.34.171 - - [21/May/2021:06:49:06 +0000] "GET /health HTTP/1.1" 200 17 "-" "kube-probe/1.20+"
::ffff:172.31.34.171 - - [21/May/2021:06:49:11 +0000] "GET /health HTTP/1.1" 200 17 "-" "kube-probe/1.20+"
::ffff:172.31.34.171 - - [21/May/2021:06:49:16 +0000] "GET /health HTTP/1.1" 200 17 "-" "kube-probe/1.20+"
::ffff:172.31.34.171 - - [21/May/2021:06:49:21 +0000] "GET /health HTTP/1.1" 200 17 "-" "kube-probe/1.20+"
::ffff:172.31.34.171 - - [21/May/2021:06:49:26 +0000] "GET /health HTTP/1.1" 200 17 "-" "kube-probe/1.20+"
::ffff:172.31.34.171 - - [21/May/2021:06:49:31 +0000] "GET /health HTTP/1.1" 200 17 "-" "kube-probe/1.20+"
::ffff:172.31.34.171 - - [21/May/2021:06:49:36 +0000] "GET /health HTTP/1.1" 200 18 "-" "kube-probe/1.20+"
::ffff:172.31.34.171 - - [21/May/2021:06:49:41 +0000] "GET /health HTTP/1.1" 200 18 "-" "kube-probe/1.20+"
Starting debugger agent.
Debugger listening on [::]:5858
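To confirm why the previous container instance was terminated, inspect its last recorded state; it should show the termination details for the instance that stopped answering the probe:
kubectl get pod liveness-app -o jsonpath='{.status.containerStatuses[0].lastState}'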
Readiness probes
1. Create the deployment
cd ~/environment/healthchecks/
cat <<EOF > readiness-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: readiness-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: readiness-deployment
  template:
    metadata:
      labels:
        app: readiness-deployment
    spec:
      containers:
      - name: readiness-deployment
        image: alpine
        command: ["sh", "-c", "touch /tmp/healthy && sleep 86400"]
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 5
          periodSeconds: 3
EOF
2. Deploy and verify the deployment
cd ~/environment/healthchecks/
kubectl apply -f readiness-deployment.yaml
kubectl get pods -l app=readiness-deployment
kubectl describe deployment readiness-deployment | grep Replicas:
The replica status should look like this:
Replicas: 3 desired | 3 updated | 3 total | 3 available | 0 unavailable
3. Simulate a health-check failure
Deliberately delete the /tmp/healthy file that the readiness probe reads; this makes the health check fail. Substitute the name of one of your own pods in the second command:
# kubectl exec -it <your-readiness-pod-name> -- rm /tmp/healthy
kubectl exec -it readiness-deployment-644f56898d-4mcdk -- rm /tmp/healthy
kubectl get pods -l app=readiness-deployment
Now check the replica status again:
kubectl describe deployment readiness-deployment | grep Replicas:
You will find that one replica is unavailable:
Replicas: 3 desired | 3 updated | 3 total | 2 available | 1 unavailable
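Note that, unlike a liveness failure, a failed readiness probe does not restart the container: the pod merely stops being Ready, and the RESTARTS column stays at 0. To see exactly which pod is the unready one, you can print each pod's Ready condition:
kubectl get pods -l app=readiness-deployment -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'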
4. Recover from the failure
To bring the pod back to Ready, simply exec into it and recreate the file:
kubectl exec -it readiness-deployment-644f56898d-4mcdk -- touch /tmp/healthy
kubectl get pods -l app=readiness-deployment
kubectl describe deployment readiness-deployment | grep Replicas:
After recovery, all 3 replicas become available again:
Replicas: 3 desired | 3 updated | 3 total | 3 available | 0 unavailable
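Readiness matters most when pods sit behind a Service: a pod that is not Ready is removed from the Service's endpoints, so it receives no traffic until the probe passes again. This walkthrough does not create a Service, but if you exposed the deployment through one (a hypothetical Service named readiness-deployment here), you could watch the endpoints shrink and grow with:
# assumes a Service named readiness-deployment, which this walkthrough does not create
kubectl get endpoints readiness-deployment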
Cleanup
When you no longer need this environment, delete it as follows:
cd ~/environment/healthchecks/
kubectl delete -f liveness-app.yaml
kubectl delete -f readiness-deployment.yaml
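To confirm that nothing is left behind, both of the following should now return a NotFound error:
kubectl get pod liveness-app
kubectl get deployment readiness-deployment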