kubectl reports: Unable to connect to the server: dial tcp 192.168.0.106:6443: i/o timeout

This post describes how to fix a kubectl connection timeout that appears after a machine reboot: recreate the kubeconfig by copying admin.conf into $HOME/.kube, fix its ownership, and restart the dashboard proxy. It also records a fuller reset-and-rejoin session on a worker node.

After the machine reboots, every kubectl command fails with: Unable to connect to the server: dial tcp 192.168.0.106:6443: i/o timeout

After some trial and error, I found that re-running the following steps fixes it:

  1. $ mkdir -p $HOME/.kube
  2. $ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  3. $ sudo chown $(id -u):$(id -g) $HOME/.kube/config
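If the timeout persists after recreating the config, it is worth checking that the server address inside the kubeconfig still matches the control plane's current IP: a reboot can reassign the address (e.g. via DHCP), which produces exactly this i/o timeout. A minimal sketch, using a hypothetical sample file in place of $HOME/.kube/config:

```shell
# Hypothetical sample kubeconfig, standing in for $HOME/.kube/config
cat > /tmp/sample-kubeconfig <<'EOF'
apiVersion: v1
kind: Config
clusters:
- cluster:
    server: https://192.168.0.106:6443
  name: kubernetes
EOF

# The "server:" line must point at the control plane's current IP;
# a stale address here is a common cause of the dial tcp ... i/o timeout.
sed -n 's/^[[:space:]]*server:[[:space:]]*//p' /tmp/sample-kubeconfig
```

Compare the printed address against the control plane's actual IP (e.g. from ip addr on that machine) before digging deeper.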

Then restart the dashboard proxy: kubectl proxy --address='0.0.0.0' --accept-hosts='^*$' &
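A proxy started this way dies with the shell session and has to be relaunched after every reboot. One way to make it survive (a sketch, assuming systemd, a kubectl binary at /usr/bin/kubectl, and a unit name and user of my choosing) is a service file:

```ini
# /etc/systemd/system/kubectl-proxy.service  (hypothetical unit name)
[Unit]
Description=kubectl proxy for the Kubernetes dashboard
Wants=network-online.target
After=network-online.target

[Service]
# Run as the user that owns the kubeconfig, and point kubectl at it explicitly
User=guyue
Environment="KUBECONFIG=/home/guyue/.kube/config"
ExecStart=/usr/bin/kubectl proxy --address=0.0.0.0 --accept-hosts=^*$
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Enable it with sudo systemctl daemon-reload && sudo systemctl enable --now kubectl-proxy.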

A fuller reset and rejoin was attempted on the worker node k8s-worker01. First, stop everything and wipe the old state:

guyue@k8s-worker01:~$
# Stop all services
sudo systemctl stop kubelet
sudo systemctl stop cri-docker.service
sudo systemctl stop containerd
sudo systemctl stop docker
# Completely clean out the kubeadm configuration
sudo kubeadm reset --force --cri-socket unix:///var/run/cri-dockerd.sock
# Manually delete all leftover configuration
sudo rm -rf /etc/kubernetes/*
sudo rm -rf /var/lib/kubelet/*
sudo rm -rf /var/lib/etcd/*
sudo rm -rf /etc/cni/net.d/*
# Flush iptables rules
sudo iptables -F
sudo iptables -t nat -F
sudo iptables -t mangle -F
sudo iptables -t raw -F
sudo systemctl daemon-reload
default | xargs -r sudo ip netns delete

Stopping 'docker.service', but its triggering units are still active: docker.socket
[preflight] Running pre-flight checks
W1202 13:26:55.625312 40333 removeetcdmember.go:106] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] Deleted contents of the etcd data directory: /var/lib/etcd
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
W1202 13:26:55.768797 40333 cleanupnode.go:99] [reset] Failed to remove containers: output: E1202 13:26:55.763484 40339 remote_runtime.go:277] "ListPodSandbox with filter from runtime service failed" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/cri-dockerd.sock: connect: connection refused\"" filter="&PodSandboxFilter{Id:,State:nil,LabelSelector:map[string]string{},}"
time="2025-12-02T13:26:55+08:00" level=fatal msg="listing pod sandboxes: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/cri-dockerd.sock: connect: connection refused\"" , error: exit status 1
[reset] Deleting contents of directories: [/etc/kubernetes/manifests /var/lib/kubelet /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
The reset process does not reset or clean up iptables rules or IPVS tables. If you wish to reset iptables, you must do so manually by using the "iptables" command. If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar) to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually. Please, check the contents of the $HOME/.kube/config file.

Next, recreate the kubelet drop-in configuration:

guyue@k8s-worker01:~$
# Create the dedicated kubelet configuration directory
sudo mkdir -p /etc/systemd/system/kubelet.service.d/
# Create the correct configuration file (note the .conf suffix)
sudo tee /etc/systemd/system/kubelet.service.d/10-kubeadm.conf <<EOF
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
Environment="KUBELET_EXTRA_ARGS=--container-runtime-endpoint=unix:///var/run/cri-dockerd.sock --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9 --v=2"
EOF
# Reload the systemd configuration
sudo systemctl daemon-reload
# Verify the configuration
sudo systemctl cat kubelet

[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
Environment="KUBELET_EXTRA_ARGS=--container-runtime-endpoint=unix:///var/run/cri-dockerd.sock --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9 --v=2"
# /usr/lib/systemd/system/kubelet.service
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/home/
Wants=network-online.target
After=network-online.target
[Service]
ExecStart=/usr/bin/kubelet
Restart=always
StartLimitInterval=0
RestartSec=10
[Install]
WantedBy=multi-user.target
# /etc/systemd/system/kubelet.service.d/0-containerd.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--container-runtime-endpoint=unix:///var/run/cri>
lines 1-20
[4]+ Stopped    sudo systemctl cat kubelet

Then point kubelet at cri-dockerd and verify the CRI connection:

guyue@k8s-worker01:~$
# Create or overwrite /etc/default/kubelet (highest-priority configuration)
echo 'KUBELET_EXTRA_ARGS="--container-runtime-endpoint=unix:///var/run/cri-dockerd.sock --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9"' | sudo tee /etc/default/kubelet
# Start cri-dockerd
sudo systemctl start cri-docker.service
sudo systemctl enable cri-docker.service
# Verify the CRI connection
sudo crictl --runtime-endpoint unix:///var/run/cri-dockerd.sock version
sudo crictl --runtime-endpoint unix:///var/run/cri-dockerd.sock info

KUBELET_EXTRA_ARGS="--container-runtime-endpoint=unix:///var/run/cri-dockerd.sock --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9"
Version: 0.1.0
RuntimeName: docker
RuntimeVersion: 29.0.0
RuntimeApiVersion: v1
{
  "status": {
    "conditions": [
      { "type": "RuntimeReady", "status": true, "reason": "", "message": "" },
      { "type": "NetworkReady", "status": false, "reason": "NetworkPluginNotReady", "message": "docker: network plugin is not ready: cni config uninitialized" }
    ]
  },
  "config": {
    "sandboxImage": "registry.aliyuncs.com/google_containers/pause:3.10.1"
  }
}

Finally, start kubelet and watch its logs:

guyue@k8s-worker01:~$
# Start kubelet
sudo systemctl start kubelet
sudo systemctl enable kubelet
# Watch the logs (run in a new terminal)
sudo journalctl -u kubelet -f --since "now"

Dec 02 13:27:38 k8s-worker01 kubelet[40971]: I1202 13:27:38.069109 40971 server.go:467] "Kubelet version" kubeletVersion="v1.28.2"
Dec 02 13:27:38 k8s-worker01 kubelet[40971]: I1202 13:27:38.069243 40971 server.go:469] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
Dec 02 13:27:38 k8s-worker01 kubelet[40971]: I1202 13:27:38.069532 40971 server.go:630] "Standalone mode, no API client"
Dec 02 13:27:38 k8s-worker01 kubelet[40971]: E1202 13:27:38.073889 40971 run.go:74] "command failed" err="failed to run Kubelet: validate service connection: validate CRI v1 runtime API for endpoint \"unix:///run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
Dec 02 13:27:38 k8s-worker01 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Dec 02 13:27:38 k8s-worker01 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Dec 02 13:27:38 k8s-worker01 systemd[1]: kubelet.service: Consumed 1.915s CPU time, 7.0M memory peak, 0B memory swap peak.
Dec 02 13:27:48 k8s-worker01 systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 42.
Dec 02 13:27:48 k8s-worker01 systemd[1]: Started kubelet.service - kubelet: The Kubernetes Node Agent.
Dec 02 13:27:48 k8s-worker01 kubelet[41091]: I1202 13:27:48.478732 41091 server.go:467] "Kubelet version" kubeletVersion="v1.28.2"
Dec 02 13:27:48 k8s-worker01 kubelet[41091]: I1202 13:27:48.478879 41091 server.go:469] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
Dec 02 13:27:48 k8s-worker01 kubelet[41091]: I1202 13:27:48.479651 41091 server.go:630] "Standalone mode, no API client"
Dec 02 13:27:48 k8s-worker01 kubelet[41091]: E1202 13:27:48.485767 41091 run.go:74] "command failed" err="failed to run Kubelet: validate service connection: validate CRI v1 runtime API for endpoint \"unix:///run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
Dec 02 13:27:48 k8s-worker01 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Dec 02 13:27:48 k8s-worker01 systemd[1]: kubelet.service: Failed with result 'exit-code'.
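The repeated failures all cite the same root cause: kubelet is validating unix:///run/containerd/containerd.sock rather than the cri-dockerd socket the drop-in was supposed to set. A quick sanity check is to extract the endpoint kubelet actually used from the error line (a sketch; the sample line below is copied from the journal output above, and in practice you would pipe journalctl -u kubelet into the same grep):

```shell
# Sample error line captured from the kubelet journal
line='err="failed to run Kubelet: validate service connection: validate CRI v1 runtime API for endpoint \"unix:///run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"'

# Pull out the CRI socket kubelet actually tried to use
echo "$line" | grep -oE 'unix://[^"\\]+'
```

If this still prints the containerd socket, inspect the merged unit environment with sudo systemctl show kubelet -p Environment, which shows the final result after every drop-in has been applied.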
^Z
[5]+ Stopped    sudo journalctl -u kubelet -f --since "now"

guyue@k8s-worker01:~$
# After confirming kubelet is running normally (logs like "Starting to listen on 0.0.0.0:10250")
sudo kubeadm join 192.168.130.129:6443 \
  --token h0epjb.94zz1xhekvf7nfzx \
  --discovery-token-ca-cert-hash sha256:a7760aa4d05b6bf0af6be91267c86417999c998318bfe4a8afcb62b757b8f808 \
  --cri-socket unix:///var/run/cri-dockerd.sock \
  --ignore-preflight-errors=all

[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
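If the join keeps failing, one thing worth ruling out is a stale token or hash: bootstrap tokens expire (24 hours by default), and --discovery-token-ca-cert-hash must match the cluster CA. The hash can be recomputed from ca.crt using the standard kubeadm procedure, sketched here against a throwaway self-signed certificate; in practice you would point it at /etc/kubernetes/pki/ca.crt on the control plane:

```shell
# Generate a throwaway self-signed cert to stand in for the cluster CA
# (in practice, use /etc/kubernetes/pki/ca.crt on the control plane)
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/ca.key \
  -out /tmp/ca.crt -subj "/CN=kubernetes" -days 1 2>/dev/null

# Standard kubeadm recipe: sha256 over the DER-encoded public key
openssl x509 -pubkey -in /tmp/ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //'
```

On the control plane, kubeadm token create --print-join-command regenerates a complete, current join command with a fresh token and the correct hash.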
Published 12-03