Error Message
【Error details】
When deleting a pod on a node, its status stays Terminating and the pod cannot be deleted normally; it can only be removed forcibly with --force.
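For reference, the forced deletion mentioned above typically looks like the following (the pod name nginx and namespace default are taken from the logs below; substitute your own):
~]# kubectl delete pod nginx -n default --grace-period=0 --force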
【Troubleshooting】
// Check the kubelet status on the node where the pod is stuck
# systemctl status kubelet
Feb 08 15:57:09 mas02 kubelet[26122]: E0208 15:57:09.012020 26122 pod_workers.go:765] "Error syncing pod, skipping" err="failed to \"KillPodSandbox\" for \"af342936-44b5-4cd7-abd9-fd251cf4d16f\" with KillPodSandboxError: \"rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \\\"nginx_default\\\" network: invalid version \\\"\\\": the version is empty\"" pod="default/nginx" podUID=af342936-44b5-4cd7-abd9-fd251cf4d16f
Feb 08 15:57:16 mas02 kubelet[26122]: I0208 15:57:16.006565 26122 cni.go:333] "CNI failed to retrieve network namespace path" err="cannot find network namespace for the terminated container \"e41b81153f770a1053f70233687b1c6d235b21af7b9f805fe17888bb61f88267\""
Feb 08 15:57:16 mas02 kubelet[26122]: E0208 15:57:16.010762 26122 cni.go:380] "Error deleting pod from network" err="invalid version \"\": the version is empty" pod="mas-mi/reco-rc-5db9d9cd54-h4bzm" podSandboxID={Type:docker ID:e41b81153f770a1053f70233687b1c6d235b21af7b9f805fe17888bb61f88267} podNetnsPath="" networkType="flannel" networkName="cbr0"
Feb 08 15:57:16 mas02 kubelet[26122]: E0208 15:57:16.011756 26122 remote_runtime.go:144] "StopPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \"reco-rc-5db9d9cd54-h4bzm_mas-mi\" network: invalid version \"\": the version is empty" podSandboxID="e41b81153f770a1053f70233687b1c6d235b21af7b9f805fe17888bb61f88267"
Feb 08 15:57:16 mas02 kubelet[26122]: E0208 15:57:16.011815 26122 kuberuntime_manager.go:989] "Failed to stop sandbox" podSandboxID={Type:docker ID:e41b81153f770a1053f70233687b1c6d235b21af7b9f805fe17888bb61f88267}
Feb 08 15:57:16 mas02 kubelet[26122]: E0208 15:57:16.011884 26122 kubelet.go:1789] failed to "KillPodSandbox" for "6ab8b258-e520-4bac-8f22-cb4b64aa8163" with KillPodSandboxError: "rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \"reco-rc-5db9d9cd54-h4bzm_mas-mi\" network: invalid version \"\": the version is empty"
As the logs show, the problem lies in the CNI network configuration. Now we can fix it.
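Before applying the fix below, you can optionally inspect the CNI configuration on the node to confirm it is missing or malformed (/etc/cni/net.d is the standard CNI config directory; the 10-flannel.conflist file name is only an example and may differ in your environment):
~]# ls /etc/cni/net.d/
~]# cat /etc/cni/net.d/10-flannel.conflist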
Solution
Reset the Kubernetes services on the node and reset the network.
1. First drain the pods on the node (mas02 below is the node name). Drain as needed; if there are no important pods on the node, this step can be skipped. A quick way to confirm the result is shown after the command.
~]# kubectl drain mas02 --delete-local-data --ignore-daemonsets --force
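Note that on newer kubectl versions the --delete-local-data flag has been renamed to --delete-emptydir-data. To confirm the drain worked, list the pods still scheduled on the node (only DaemonSet pods should remain):
~]# kubectl get pods -A -o wide --field-selector spec.nodeName=mas02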
2. Reset the node 【Note: the commands in this step must be run on the corresponding node, not on the master】
~]# kubeadm reset
~]# systemctl stop kubelet
~]# systemctl stop docker
~]# rm -rf /var/lib/cni/
~]# rm -rf /var/lib/kubelet/*
~]# rm -rf /etc/cni/
~]# ifconfig cni0 down
~]# ifconfig flannel.1 down
~]# ifconfig docker0 down
~]# ip link delete cni0
~]# ip link delete flannel.1
Adjust according to your own interface names; any interface that does not exist can simply be ignored. A quick sanity check is shown below.
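To verify that the old virtual interfaces are gone before rejoining (this is only a sanity check; no output means they were removed):
~]# ip link show | grep -E 'cni0|flannel.1|docker0'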
3. Rejoin the cluster
// Run on the node that was just reset
~]# systemctl start docker
~]# systemctl start kubelet
// Get the join token; run on the master node
~]# kubeadm token create --print-join-command
// Copy the generated join command and run it on the node to add it back to the cluster
~]# kubeadm join 192.168.86.23:6443 --token 1eu6i4.gjj6yl75baflwj82 --discovery-token-ca-cert-hash sha256:cf73314a511424366a1339d4f48277a38f8d472b9e92fadbd332f27625d03240
4. Check whether the node is running normally; wait a minute or two before checking.
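For example, run the following on the master (node and pod names will differ in your cluster; the node should show Ready and the flannel pod on it should be Running):
~]# kubectl get nodes -o wide
~]# kubectl get pods -n kube-system -o wide | grep flannel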
Other problems that may occur
Q1
Unable to update cni config: no networks found in /etc/cni/net.d
The network plugin is not installed.
Download kube-flannel.yml or calico.yaml, then run kubectl apply -f kube-flannel.yml or kubectl apply -f calico.yaml (either plugin will do; pick one).
Wait about a minute, then run kubectl get nodes and the node status should become Ready. An example of applying the manifests directly from their upstream URLs is shown below.
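For example, the manifests can be applied straight from their project URLs (these are commonly cited upstream locations and may change over time; check the projects' documentation for the current URLs, or download the file first and apply it locally):
~]# kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
// or
~]# kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml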