K8S集群中PLEG问题排查

文章讨论了在Kubernetes集群中遇到的PLEG不健康问题,通过源码分析发现是由于容器运行时(如Docker或containerd)超时和接受文件过多导致的。文章详细解释了kubelet的工作原理,以及如何定位和修复此问题,包括检查CRI服务端的OpenFilesLimit设置。

一、背景

k8s集群排障真的很麻烦

今天集群有同事找我,节点报 PLEG is not healthy 集群中有的节点出现了NotReady,这是什么原因呢?

二、kubernetes源码分析

PLEG is not healthy 也是一个经常出现的问题

POD 生命周期事件生成器

先说下PLEG 这部分代码在kubelet 里,我们看一下在kubelet中的注释:

// GenericPLEG is an extremely simple generic PLEG that relies solely on
// periodic listing to discover container changes. It should be used
// as temporary replacement for container runtimes do not support a proper
// event generator yet.
//
// Note that GenericPLEG assumes that a container would not be created,
// terminated, and garbage collected within one relist period. If such an
// incident happens, GenenricPLEG would miss all events regarding this
// container. In the case of relisting failure, the window may become longer.
// Note that this assumption is not unique -- many kubelet internal components
// rely on terminated containers as tombstones for bookkeeping purposes. The
// garbage collector is implemented to work with such situations. However, to
// guarantee that kubelet can handle missing container events, it is
// recommended to set the relist period short and have an auxiliary, longer
// periodic sync in kubelet as the safety net.
type GenericPLEG struct {
	// The period for relisting.
	relistPeriod time.Duration
	// The container runtime.
	runtime kubecontainer.Runtime
	// The channel from which the subscriber listens events.
	eventChannel chan *PodLifecycleEvent
	// The internal cache for pod/container information.
	podRecords podRecords
	// Time of the last relisting.
	relistTime atomic.Value
	// Cache for storing the runtime states required for syncing pods.
	cache kubecontainer.Cache
	// For testability.
	clock clock.Clock
	// Pods that failed to have their status retrieved during a relist. These pods will be
	// retried during the next relisting.
	podsToReinspect map[types.UID]*kubecontainer.Pod
}

也就是说kubelet 会定时把 拉取pod 的列表,然后记录下结果。

运行代码后会执行一个定时任务,定时调用relist函数

// Start spawns a goroutine to relist periodically.
func (g *GenericPLEG) Start() {
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值