Troubleshooting a kubelet service failure

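While trying out k8s, I found that the kubelet service kept failing. kubeadm reported that the kubelet was not running or not healthy: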

I1104 17:41:12.808190  800569 local.go:65] [etcd] wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
I1104 17:41:12.808210  800569 waitcontrolplane.go:83] [wait-control-plane] Waiting for the API server to be healthy
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
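For scripted checks, the probe kubeadm runs above (curl against the kubelet's healthz endpoint) can be reproduced with Python's standard library. This is a minimal sketch: the function name kubelet_healthy and the 2-second timeout are my own choices, and it assumes the kubelet still uses the default healthz address shown in the log (port 10248).

```python
import http.client
import urllib.request

# Default kubelet healthz endpoint, as probed by kubeadm in the log above.
# Assumption: healthzPort/healthzBindAddress were not changed in the kubelet config.
KUBELET_HEALTHZ = "http://localhost:10248/healthz"


def kubelet_healthy(url: str = KUBELET_HEALTHZ, timeout: float = 2.0) -> bool:
    """Rough equivalent of `curl -sSL <url>`: True only if the endpoint answers HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses, so "connection refused"
        # (as in the log), HTTP error codes and timeouts all end up here.
        return False


if __name__ == "__main__":
    print("kubelet is healthy" if kubelet_healthy() else "kubelet is NOT healthy")
```

Returning a plain boolean keeps it easy to poll in a loop while waiting for the kubelet to come up.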

Running systemctl status kubelet showed that the service kept restarting:

[root@localhost simple-kbs]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since Mon 2024-11-04 17:41:34 +08; 4s ago
     Docs: https://kubernetes.io/docs/
  Process: 801942 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBEL>
 Main PID: 801942 (code=exited, status=1/FAILURE)
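The same state can be read in machine-readable form with systemctl show, which prints Key=Value lines: ActiveState=activating together with SubState=auto-restart is what systemctl status renders as "activating (auto-restart)", and NRestarts counts automatic restarts on reasonably recent systemd versions. A minimal sketch; the helper names parse_systemctl_show, is_restart_looping and unit_state are hypothetical.

```python
import subprocess


def parse_systemctl_show(text: str) -> dict:
    """Parse the Key=Value lines printed by `systemctl show`."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def is_restart_looping(props: dict) -> bool:
    """True for the state `systemctl status` shows as 'activating (auto-restart)'."""
    return props.get("ActiveState") == "activating" and props.get("SubState") == "auto-restart"


def unit_state(unit: str = "kubelet") -> dict:
    """Ask systemd for the unit's state and restart counter."""
    out = subprocess.run(
        ["systemctl", "show", unit, "-p", "ActiveState", "-p", "SubState", "-p", "NRestarts"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_systemctl_show(out)


if __name__ == "__main__":
    state = unit_state()
    if is_restart_looping(state):
        print(f"kubelet is restart-looping (NRestarts={state.get('NRestarts', '?')}); "
              "check journalctl -u kubelet -b")
    else:
        print(f"kubelet state: {state}")
```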

Checking the kubelet log with journalctl -u kubelet -b turned up the following error:

 localhost.localdomain kubelet[338368]: W1104 13:54:06.029002 338368 watcher.go:93] 
Error while processing event ("/sys/fs/cgroup/perf_event/kubepods-burstable-podb4e21f7e82a88d94f369f898a975087b.slice:cri-containerd:872396bd94fdce18688a0468b0545a71cf882471bd662b192e17355c3453c4f0": 0x40000100 == IN_CREATE|IN_ISDIR): 
inotify_add_watch /sys/fs/cgroup/perf_event/kubepods-burstable-podb4e21f7e82a88d94f369f898a975087b.slice:cri-containerd:872396bd94fdce18688a0468b0545a71cf882471bd662b192e17355c3453c4f0:
 no space left on device

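Error message

  • The log shows that the inotify_add_watch call failed with "no space left on device" (errno ENOSPC).
  1. Possible causes

    • inotify watch limit: the kernel limits how many inotify watches each user may hold (fs.inotify.max_user_watches). Once that limit is reached, no new watch can be created and inotify_add_watch fails with ENOSPC, which is printed as "no space left on device". The kubelet watches cgroup directories such as the /sys/fs/cgroup/perf_event/kubepods-... path in the log, so a busy node can run into the default limit.
    • Insufficient disk space: the wording suggests a full file system, but for inotify_add_watch, ENOSPC means the watch limit was reached (or the kernel could not allocate a needed resource), not that the disk is full. This cause is therefore unlikely, and df -h rules it out quickly.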
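To confirm the first cause before changing anything, compare the number of watches in use with the limit. A minimal sketch, assuming the Linux /proc layout in which every watch held by an inotify file descriptor appears as one "inotify wd:" line in /proc/<pid>/fdinfo/<fd>. The limit is per user, while this sketch sums every process it can read, so run it as root; on a node where root owns most watches the sum is a close approximation. The function names are hypothetical.

```python
import glob
import os
from collections import Counter


def read_watch_limit(path: str = "/proc/sys/fs/inotify/max_user_watches") -> int:
    """Read the per-user inotify watch limit."""
    with open(path) as f:
        return int(f.read().strip())


def count_inotify_watches(proc_root: str = "/proc") -> Counter:
    """Count inotify watches per PID by scanning <proc_root>/<pid>/fdinfo/<fd>.

    Each watch shows up as one line beginning with 'inotify wd:'.
    """
    per_pid = Counter()
    for fdinfo in glob.glob(os.path.join(proc_root, "[0-9]*", "fdinfo", "*")):
        pid = fdinfo.split(os.sep)[-3]
        try:
            with open(fdinfo) as f:
                watches = sum(1 for line in f if line.startswith("inotify wd:"))
        except OSError:
            # The process exited meanwhile, or we lack permission (run as root).
            continue
        if watches:
            per_pid[pid] += watches
    return per_pid


if __name__ == "__main__":
    usage = count_inotify_watches()
    print(f"inotify watches in use: {sum(usage.values())}, per-user limit: {read_watch_limit()}")
    for pid, n in usage.most_common(5):
        print(f"pid {pid}: {n} watches")
```

If the total is close to the limit, raising fs.inotify.max_user_watches is the right fix; if it is far below, look elsewhere.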

Solution

  • Increase the inotify watch limit

  • # Check the current inotify watch limit
    cat /proc/sys/fs/inotify/max_user_watches
    
    # Raise the limit temporarily (lost on reboot)
    sudo sysctl fs.inotify.max_user_watches=524288
    
    # Raise the limit permanently: append it to /etc/sysctl.conf and reload
    echo "fs.inotify.max_user_watches=524288" | sudo tee -a /etc/sysctl.conf
    sudo sysctl -p

    Restart the kubelet service:

  • sudo systemctl restart kubelet

    Check the service status again:

  • ● kubelet.service - kubelet: The Kubernetes Node Agent
       Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disable>
      Drop-In: /usr/lib/systemd/system/kubelet.service.d
               └─10-kubeadm.conf
       Active: active (running) since Mon 2024-11-04 17:50:41 +08; 12s ago
         Docs: https://kubernetes.io/docs/
     Main PID: 824792 (kubelet)
        Tasks: 38 (limit: 797483)
       Memory: 52.4M
       CGroup: /system.slice/kubelet.service

    The kubelet is now active (running); problem solved.

Problem 2

[root@localhost scripts]# kubectl get nodes
E1106 21:42:36.951355   98750 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E1106 21:42:36.951992   98750 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E1106 21:42:36.953561   98750 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E1106 21:42:36.955135   98750 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E1106 21:42:36.956640   98750 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?

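Solution

kubectl found no kubeconfig file, so it fell back to its default server, localhost:8080, where nothing is listening. Giving it a kubeconfig fixes this:

mkdir -p ~/.kube
cp /etc/kubernetes/kubelet.conf ~/.kube/config

Note that kubelet.conf carries the node's own, restricted credentials. The commands printed at the end of kubeadm init copy /etc/kubernetes/admin.conf instead, which holds cluster-admin credentials:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config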
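The reason copying the file helps is kubectl's kubeconfig lookup order: the --kubeconfig flag if given, otherwise the files listed in $KUBECONFIG (merged), otherwise ~/.kube/config. A minimal sketch of that precedence, useful for checking which file a shell will pick up; the function names are hypothetical, and it only models the file lookup, not in-cluster configuration.

```python
import os
from typing import List, Mapping, Optional


def kubeconfig_candidates(flag: Optional[str] = None,
                          env: Optional[Mapping[str, str]] = None,
                          home: Optional[str] = None) -> List[str]:
    """Kubeconfig file(s) kubectl would consult, in its documented precedence."""
    env = os.environ if env is None else env
    if flag:
        # --kubeconfig: only this file is used, no merging.
        return [flag]
    if env.get("KUBECONFIG"):
        # KUBECONFIG may list several files separated by ':' on Linux; kubectl merges them.
        return [p for p in env["KUBECONFIG"].split(os.pathsep) if p]
    home = home or os.path.expanduser("~")
    return [os.path.join(home, ".kube", "config")]


def has_kubeconfig(flag=None, env=None, home=None) -> bool:
    """False means kubectl has nothing to load and falls back to localhost:8080."""
    return any(os.path.isfile(p) for p in kubeconfig_candidates(flag, env, home))


if __name__ == "__main__":
    print("kubectl would read:", ", ".join(kubeconfig_candidates()))
    print("found" if has_kubeconfig() else "none found: expect 'localhost:8080 was refused'")
```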

The following is unrelated to the errors above and can be skipped.

Docker proxy configuration:

sudo vi /etc/docker/daemon.json

    {
      "experimental": true,
      "registry-mirrors": ["https://registry.cn-hangzhou.aliyuncs.com"],
      "iptables": false,
      "proxies": {
        "http-proxy": "http://127.0.0.1:7890",
        "https-proxy": "http://127.0.0.1:7890",
        "no-proxy": "127.0.0.0/8"
      }
    }

The "proxies" key requires Docker Engine 23.0 or later. Then reload systemd and restart Docker:

    sudo systemctl daemon-reload
    sudo systemctl restart docker
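A malformed daemon.json stops dockerd from starting, so it is worth checking the file before the restart. A minimal sketch: check_daemon_json is a hypothetical helper, not a Docker tool, and it only checks JSON syntax plus the shape of the keys used in this post.

```python
import json
import sys

# Proxy keys used in the daemon.json above.
PROXY_KEYS = {"http-proxy", "https-proxy", "no-proxy"}


def check_daemon_json(text: str) -> list:
    """Return a list of problems found in daemon.json content; empty means OK."""
    try:
        cfg = json.loads(text)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    if not isinstance(cfg, dict):
        return ["top level must be a JSON object"]
    problems = []
    proxies = cfg.get("proxies", {})
    if not isinstance(proxies, dict):
        problems.append('"proxies" must be an object')
    else:
        for key in proxies:
            if key not in PROXY_KEYS:
                problems.append(f'unknown key under "proxies": {key}')
    if not isinstance(cfg.get("registry-mirrors", []), list):
        problems.append('"registry-mirrors" must be a list')
    return problems


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/docker/daemon.json"
    with open(path) as f:
        issues = check_daemon_json(f.read())
    print("\n".join(issues) or "OK")
```

A typo such as http_proxy instead of http-proxy is reported by name instead of surfacing later as a daemon that silently ignores the proxy or fails to start.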
