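After restarting Docker on one node of a K8S cluster, I checked the pod status and found that most pods were in the Evicted state.

Cause
Eviction means that when a node becomes abnormal, Kubernetes has a mechanism to evict the Pods running on that node.
It is most often triggered when the node runs short of resources.
Error
Checking the kubelet log showed the error: Pod The node has condition: [DiskPressure]
The space for the Docker directory and the root filesystem was not planned properly up front, which is what led to this. The only option left was to clean up Docker's cache by hand.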
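Before cleaning anything, it can help to confirm which filesystem is actually full. The following is a minimal sketch, assuming a POSIX `df`; the function name `over_threshold` and the 85% default are illustrative assumptions, not the kubelet's own eviction threshold.

```shell
#!/usr/bin/env bash
# Print mount points whose usage is at or above a percentage threshold.
# Reads `df -P` style output on stdin. The 85% default is an assumption
# chosen for illustration, not a value taken from the kubelet.
over_threshold() {
  local limit="${1:-85}"
  awk -v limit="$limit" 'NR > 1 {
    use = $5; sub(/%/, "", use)          # strip the trailing percent sign
    if (use + 0 >= limit) print $6 " " $5
  }'
}

# Usage on the affected node:
#   df -P / /var/lib/docker | over_threshold 85
#   kubectl describe node <node-name> | grep -i pressure
```

Reading from stdin keeps the function easy to try on saved `df` output from another node.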

Solution:
Usually the space is taken by large log files under /var/lib/docker/; go into that path and delete them.
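One way to locate those large files is sketched below. It assumes the default json-file logging driver, which keeps per-container `*-json.log` files under /var/lib/docker/containers; the function names and the 100M default are hypothetical. The sketch empties a log in place rather than deleting it, since a running container may still hold the file open.

```shell
#!/usr/bin/env bash
# List files under a directory that are larger than a given size,
# biggest first (size in KiB, then path). Directory and size are
# parameters so the function can be tried on any path.
large_files() {
  local dir="$1" min_size="${2:-100M}"
  find "$dir" -type f -size "+$min_size" -exec du -k {} + 2>/dev/null | sort -rn
}

# Empty a log file in place instead of deleting it, so a process that
# still has the file open keeps writing to the same file.
empty_log() {
  : > "$1"
}

# Usage (as root on the node):
#   large_files /var/lib/docker/containers 100M
#   empty_log /var/lib/docker/containers/<id>/<id>-json.log
```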
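Do not use docker system prune -a in test or production environments; the damage can be hard to estimate.
docker system prune -a is a manual way to clean up Docker's cache, but it removes more than the local build cache: it also removes every stopped container and every image that no container is using. Do not use it in production. In a test environment, back up all data first, otherwise services may become unavailable once things are deleted.
Run this command with caution.
# This is my local environment and I wanted to save trouble, so I ran the command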
[root@admin01 ~]# docker system prune -a
WARNING! This will remove:
- all stopped containers
- all networks not used by at least one container
- all images without at least one container associated to them
- all build cache
Are you sure you want to continue? [y/N] y
Deleted Containers:
fc063e94a5a92a252fb62619019f586a88441b6e4ce7ddb62968a8d501f881db
f2b974d984301d0fba101b82092182bec766637749b40d2c78cf4535f04488c5
6b01f13418bd1c2d1b8264409c09607009beed02f4a678eaddab48f3dc8bbbef
cecd93f8ffbdd8f026d6a63d0bf715fa98872f0c5d1166b047022fb86e1df9b9
381af286f088d8d9a4be9c069d00176f6abb8979658f5764d3dcd3c5193055ed
de7fe6a3a66d4678517faa806b7a5283216d0c42a66355ca6e73d40e37af3dd5
c5f17619509b7437777a5861716d588dedd3e69dbd1ee731db68496f7ffded23
ec83a4466f5bd452a1fdb051790e8fa33cd7a8ca08080e30edfa6ac1e2672047
8e963668647af18d6cbd5060f7304b22557d0d230f078f73e3fce093144fc59d
8f53318dc0f227fa00b0ba00fd40592da5016ac8976003bd765c0328fb40b32f
fce977ddbff5a16f147243f7ae5d4a0a630feb71438871a53a0dfe4bc184c580
3d2c267283655ed17c008c3cc18d33f2251e4ff42448546c353b71b92ddff7ca
25c15d9d465838f9fee0ee6e871be22219bcec649307f35fadbc66e888b8ec7d
6f57640d07a4b0906958eca8b016671eb346f01acce9025e26acb5b1ba717382
16d48725f2a5ef47105e9d64e951a5377c56bb5f42085267727f069ee73f52ba
3e0bbff72b9698e7b86cc2709a94b0c3adf100ef1c523a8e192dfc2fbccd864e
ef40e38aaae95d8570787a6ccc8843d47ee4b3d2242760d4044a4094a6def3b4
64e58cd0a24e16b96ccde87fc2b8ebc16c260a7f38bba974adcc5f77bdc12827
ed8aedd688d0fd25d4b075e8da8e47bbce64d2e92c89236681e0cdefa3f6b46e
Deleted Images:
untagged: rancher/calico-node:v3.16.5
untagged: rancher/calico-node@sha256:d4043f9fbff07c7e9203d99850e810accf631c801ac2ac1a5832a629b31b057e
deleted: sha256:c1fa37765208c82be03a2265e6aa3fec1dfbf22b52aea28278c2e49988b447de
deleted: sha256:08c2839e07c35ed7c86b8b2a361b0db87fbd3744c0d97bff48032199c11eac36
deleted: sha256:fab1a444aadca5f2b7b581f057b2bb1bc74a084bedee508beafb1187cc0da675
untagged: rancher/nginx-ingress-controller:nginx-0.35.0-rancher2
untagged: rancher/nginx-ingress-controller@sha256:aa1d8132020ce42023e9f3bb673d2cc4b4231db2736a40b3df93bb18e76985f7
deleted: sha256:1f0ca6d9911039e05d7f27d52078f72bb806917837f595071148d508be2c18e8
deleted: sha256:0ea076f4bc656f5ccc4a5fb4ec9efce8f8be165e32bb1248fcf1dc4fd6075ee3
deleted: sha256:782f9b9fa0c90b099a8a8b467c8e3d761d3d361ac851ee3de449662ab2753b43
deleted: sha256:d6a2e93f4f83557b4c06479b0dcdb87a69a0122a4e698552bd800c3b66b4cf9b
deleted: sha256:2174519046a12fe68a7a0e92dac6f99d6dbad4aaa3b2ca5c89a1758bf6f2c9e3
deleted: sha256:94c8edcd7fe9045af1bd6aded12c3ce5bc5cc4a4a40e6d39d652677c5305ac38
deleted: sha256:81758fbd91522cc2a34fb6a0118ecca15ca732ca689eb674f1c5a0457ee0779a
deleted: sha256:dde2bb53d1b4561d87f2ac31c2047b599bc3800d45df30d3a120949e21da5898
deleted: sha256:78db571e84b1e8923c123b08cb49c10ee58be5efe0d273e5fff5496468ff334c
deleted: sha256:755b7d17d3e3bb3c07617d5f86878240301b09fd2be299c332477dd7e64e9743
deleted: sha256:aa4abd16a196e7b2753d5cd581c869ea54c65bc073bcd15ae6eceb37b1ae8ae3
deleted: sha256:488efc67d702e9f9e56007c1fe64053eb0fddb13c496e9a32ab06f4f115338b7
deleted: sha256:3486d0b790e9dc61732835f87615186aad5c2cafdd245ef4718bc61d517faa40
deleted: sha256:661bcdb1a7ba15cbc6892c6549f618061a6c01c667c8da33548a4d43a9ccbfbc
deleted: sha256:6792cc8a31ff4cdabb616a78998d7a2294dd10a80f9b13028ee7b1c57dba6c0b
Total reclaimed space: 7.299GB
Once the disk space has been freed, delete all the abnormal pods:
[mcloud@admin01 rocky]$ kubectl get pods | grep Evicted | awk '{print $1}' | xargs kubectl delete pod
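The one-liner above only covers the current namespace. A sketch of a variant that covers every namespace follows; the function names `list_evicted` and `delete_evicted` are hypothetical, and it assumes the default column layout of `kubectl get pods -A` (NAMESPACE, NAME, READY, STATUS, ...).

```shell
#!/usr/bin/env bash
# Read `kubectl get pods -A --no-headers` output on stdin and print
# "namespace name" for every pod whose STATUS column is Evicted.
list_evicted() {
  awk '$4 == "Evicted" { print $1, $2 }'
}

# Delete each evicted pod in its own namespace.
delete_evicted() {
  kubectl get pods -A --no-headers | list_evicted |
    while read -r ns name; do
      kubectl delete pod -n "$ns" "$name"
    done
}

# Usage:
#   kubectl get pods -A --no-headers | list_evicted   # preview only
#   delete_evicted                                    # actually delete
```

Running the preview first shows exactly which pods would be deleted.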
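docker system prune accepts additional options, and there are narrower prune commands as well:
docker system prune -a: also removes all unused images, not only dangling ones.
docker system prune -f: forces the cleanup without prompting for confirmation.
docker image prune: removes dangling images.
docker container prune: removes stopped containers.
--By default, docker container prune removes all containers in the stopped state.
--If deleting everything is too drastic, the --filter flag narrows down which containers are removed. Example: remove all stopped containers except those created within the last 24 hours:
--$ docker container prune --filter "until=24h"
docker volume prune: removes unused volumes.
docker network prune: removes unused networks.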
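The narrower commands above can be combined into a gentler alternative to docker system prune -a. This is only a sketch: the function name `gentle_prune` and the 24h default cutoff are assumptions, and whether this frees enough space depends on what is filling the disk.

```shell
#!/usr/bin/env bash
# Staged cleanup: only stopped containers older than a cutoff and dangling
# images are removed; tagged images, volumes and networks are left alone.
# The 24h default cutoff is an illustrative assumption.
gentle_prune() {
  local age="${1:-24h}"
  docker system df                                  # usage before cleanup
  docker container prune -f --filter "until=$age"   # old stopped containers
  docker image prune -f                             # dangling images only
  docker system df                                  # usage after cleanup
}

# Usage:
#   gentle_prune        # cutoff of 24h
#   gentle_prune 72h    # keep stopped containers from the last 3 days
```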
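Manual cleanup
Dangling and unused images can also be removed individually by hand:
1. Remove all dangling images, keeping unused images:
docker rmi $(docker images -f "dangling=true" -q)
2. Remove all unused images and dangling images (docker rmi refuses to remove an image that a container still uses, so those images report an error and are kept):
docker rmi $(docker images -q)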
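3. Clean up volumes
If volumes take up too much space, remove the ones that are not in use, including volumes that no container references (in the -v verbose output, LINKS = 0 means the volume is unreferenced):
Remove all volumes not referenced by any container:
docker volume rm $(docker volume ls -qf dangling=true)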
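4. Clean up containers
If containers turn out to take up too much space, some can be removed by hand:
Remove all exited containers:
docker rm -v $(docker ps -aq -f status=exited)
Remove all containers in the dead state:
docker rm -v $(docker ps -aq -f status=dead)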
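When no container matches the filter, the `$(...)` substitution is empty and docker rm fails because it requires at least one argument. A small wrapper can skip the call in that case; the function name `remove_containers_by_status` is hypothetical.

```shell
#!/usr/bin/env bash
# Remove all containers in a given state, doing nothing when none match.
remove_containers_by_status() {
  local status="$1" ids
  ids=$(docker ps -aq -f "status=$status")
  if [ -z "$ids" ]; then
    echo "no containers with status=$status"
    return 0
  fi
  # Intentionally unquoted: each ID must become a separate argument.
  docker rm -v $ids
}

# Usage:
#   remove_containers_by_status exited
#   remove_containers_by_status dead
```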
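This article described how to deal with Pods going into the Evicted state in a Kubernetes cluster: clean up Docker's cache and remove unused images and containers to free resources on the node so that it runs normally again.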



