kube-proxy: Failed to load kernel module ip_vs with modprobe.

This post describes a problem where kube-proxy in ipvs mode fails because the required kernel modules are not loaded automatically, and walks through loading them by hand. Running modprobe for ip_vs, ip_vs_rr, ip_vs_wrr, and ip_vs_sh on the host and then restarting kube-proxy resolves the issue, leaving kube-proxy running in ipvs proxy mode.

I ran into a case where kube-proxy would not work in ipvs mode, logging the following warnings:

W0301 09:14:39.492670       1 proxier.go:498] Failed to load kernel module ip_vs with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W0301 09:14:39.493401       1 proxier.go:498] Failed to load kernel module ip_vs_rr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W0301 09:14:39.494103       1 proxier.go:498] Failed to load kernel module ip_vs_wrr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W0301 09:14:39.494841       1 proxier.go:498] Failed to load kernel module ip_vs_sh with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W0301 09:14:39.500439       1 proxier.go:498] Failed to load kernel module ip_vs with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W0301 09:14:39.501168       1 proxier.go:498] Failed to load kernel module ip_vs_rr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W0301 09:14:39.501883       1 proxier.go:498] Failed to load kernel module ip_vs_wrr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W0301 09:14:39.502612       1 proxier.go:498] Failed to load kernel module ip_vs_sh with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules

This happens when the kube-proxy container cannot load the host's kernel modules automatically.
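
As the log itself notes, the warning can be ignored when kube-proxy runs in a container without /lib/modules mounted; otherwise it is worth checking whether the mount is present. For a kubeadm-style deployment, you can inspect the kube-proxy DaemonSet (the kube-system namespace and kube-proxy name below are the kubeadm defaults; adjust for your setup):

[root@machine ~]# kubectl -n kube-system get ds kube-proxy -o yaml | grep -A 2 '/lib/modules'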

You can load the required kernel modules manually on the host:

[root@machine ~]# modprobe ip_vs
[root@machine ~]# modprobe ip_vs_rr
[root@machine ~]# modprobe ip_vs_wrr
[root@machine ~]# modprobe ip_vs_sh
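
You can confirm the modules are now loaded with lsmod and, on a systemd-based distribution, make them load on every boot by listing them in a modules-load.d file (the file name ipvs.conf below is arbitrary):

[root@machine ~]# lsmod | grep ip_vs
[root@machine ~]# cat <<EOF > /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
EOF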

Then restart kube-proxy. In a kubeadm-deployed cluster, one way is to delete the kube-proxy pods and let the DaemonSet recreate them (assuming the default k8s-app=kube-proxy label):
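
[root@machine ~]# kubectl -n kube-system delete pod -l k8s-app=kube-proxy

After the new pods start, the original warnings are gone and kube-proxy reports that it is using the ipvs Proxier: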

I0301 09:19:09.567473       1 server_others.go:176] Using ipvs Proxier.
W0301 09:19:09.567860       1 proxier.go:380] clusterCIDR not specified, unable to distinguish between internal and external traffic
W0301 09:19:09.567872       1 proxier.go:386] IPVS scheduler not specified, use rr by default
I0301 09:19:09.568520       1 server.go:562] Version: v1.14.2
I0301 09:19:09.580637       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0301 09:19:09.582465       1 config.go:102] Starting endpoints config controller
I0301 09:19:09.582487       1 controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
I0301 09:19:09.582549       1 config.go:202] Starting service config controller
I0301 09:19:09.582557       1 controller_utils.go:1027] Waiting for caches to sync for service config controller
I0301 09:19:09.682604       1 controller_utils.go:1034] Caches are synced for service config controller
I0301 09:19:09.682604       1 controller_utils.go:1034] Caches are synced for endpoints config controller
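
As a final check, you can list the IPVS virtual servers on the host to confirm that kube-proxy is actually programming ipvs rules (this needs the ipvsadm package, e.g. yum install -y ipvsadm):

[root@machine ~]# ipvsadm -Ln

Each Service ClusterIP should show up as a virtual server with the Service's endpoints as real servers, scheduled with rr, matching the "use rr by default" line in the log above.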