Kubelet v1.25.x源码——StatusManager

本文深入剖析了Kubernetes中kubelet组件的StatusManager模块,详细介绍了其工作原理、核心功能及其实现细节。文章重点解读了StatusManager如何与apiserver同步Pod状态,包括Start方法的运作流程、syncPod方法的具体实现等内容。

1. 环境说明

Kubernetes源码版本:remotes/origin/release-1.25
Kubernetes编译出来的Kubelet版本:Kubernetes v1.24.0-beta.0.2463+ee7799bab469d7
Kubernetes集群实验环境:使用Kubernetes v1.25.4二进制的方式搭建了一个单节点集群

K8S 单节点单节点搭建可以参考:Kubernetes v1.25 搭建单节点集群用于Debug K8S源码

Golang版本:go1.19.3 linux/amd64
IDEA版本:2022.2.3
Delve版本:1.9.1

[root@k8s-master1 kubernetes]#
[root@k8s-master1 kubernetes]# dlv version
Delve Debugger
Version: 1.9.1
Build: $Id: d81b9fd12bfa603f3cf7a4bc842398bd61c42940 $
[root@k8s-master1 kubernetes]#
[root@k8s-master1 kubernetes]# go version
go version go1.19.3 linux/amd64
[root@k8s-master1 kubernetes]#
[root@k8s-master1 kubernetes]# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.4", GitCommit:"872a965c6c6526caa949f0c6ac028ef7aff3fb78", GitTreeState:"clean", BuildDate:"2022-11-09T13:36:36Z", GoVersion:"go1.19.3", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.4", GitCommit:"872a965c6c6526caa949f0c6ac028ef7aff3fb78", GitTreeState:"clean", BuildDate:"2022-11-09T13:29:58Z", GoVersion:"go1.19.3", Compiler:"gc", Platform:"linux/amd64"}
[root@k8s-master1 kubernetes]#
[root@k8s-master1 kubernetes]#
[root@k8s-master1 kubernetes]# kubectl get nodes -owide
NAME          STATUS   ROLES    AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
k8s-master1   Ready    <none>   31h   v1.25.4   192.168.11.71   <none>        CentOS Linux 7 (Core)   3.10.0-1160.80.1.el7.x86_64   containerd://1.6.10
[root@k8s-master1 kubernetes]#
[root@k8s-master1 kubernetes]#
[root@k8s-master1 kubernetes]# kubectl get componentstatus
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE                         ERROR
etcd-0               Healthy   {"health":"true","reason":""}
controller-manager   Healthy   ok
scheduler            Healthy   ok
[root@k8s-master1 kubernetes]#

Kubelet启动参数配置如下:

[root@k8s-master1 kubernetes]# ps -ef|grep "/usr/local/bin/kubelet"
root       7972      1  6 07:06 ?        00:00:06 /usr/local/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.kubeconfig --kubeconfig=/etc/kubernetes/kubelet.kubeconfig --config=/etc/kubernetes/kubelet-conf.yml --container-runtime-endpoint=unix:///run/containerd/containerd.sock --node-labels=node.kubernetes.io/node= --v=8
root       9549   6424  0 07:07 pts/0    00:00:00 grep --color=auto /usr/local/bin/kubelet
[root@k8s-master1 kubernetes]#

Kubelet参数配置如下:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
address: 0.0.0.0
port: 10250
readOnlyPort: 10255
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 2m0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.pem
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 5m0s
    cacheUnauthorizedTTL: 30s
cgroupDriver: systemd
cgroupsPerQOS: true
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
containerLogMaxFiles: 5
containerLogMaxSize: 10Mi
contentType: application/vnd.kubernetes.protobuf
cpuCFSQuota: true
cpuManagerPolicy: none
cpuManagerReconcilePeriod: 10s
enableControllerAttachDetach: true
enableDebuggingHandlers: true
enforceNodeAllocatable:
- pods
eventBurst: 10
eventRecordQPS: 5
evictionHard:
  imagefs.available: 15%
  memory.available: 100Mi
  nodefs.available: 10%
  nodefs.inodesFree: 5%
evictionPressureTransitionPeriod: 5m0s
failSwapOn: true
fileCheckFrequency: 20s
hairpinMode: promiscuous-bridge
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 20s
imageGCHighThresholdPercent: 85
imageGCLowThresholdPercent: 80
imageMinimumGCAge: 2m0s
iptablesDropBit: 15
iptablesMasqueradeBit: 14
kubeAPIBurst: 10
kubeAPIQPS: 5
makeIPTablesUtilChains: true
maxOpenFiles: 1000000
maxPods: 110
nodeStatusUpdateFrequency: 10s
oomScoreAdj: -999
podPidsLimit: -1
registryBurst: 10
registryPullQPS: 5
resolvConf: /etc/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 2m0s
serializeImagePulls: true
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 4h0m0s
syncFrequency: 1m0s
volumeStatsAggPeriod: 1m0s

2. 组件概览

StatusManager主要职责如下:

  • 1、StatusManager中维护了Pod的状态,该状态会和最新的v1.PodStatus保持同步
  • 2、于此同时,StatusManager会更新apiserver Pod的状态
    StatusManager组件

在这里插入图片描述

3. 源码分析

Manager

我们先来看看StatusManager的接口定义,如下:

  • Manager.Start:用于和apiserver同步状态
  • Manager.SetPodStatus:更新StatusManager缓存中该Pod的状态
  • Manager.SetContainerReadiness:更新StatusManager缓存中的Container状态,并且会触发状态更新
  • Manager.SetContainerStartup:更新StatusManager缓存中的Container的启动状态,并且会触发状态更新
  • Manager.TerminatePod:重置StatusManager缓存中的Container状态为Terminated,并且触发状态更新
  • Manager.RemoveOrphanedStatuses:扫描StatusManager.cache,并移除已经删除的Pod
  • PodStatusProvider.GetPodStatus:查询容器状态
// pkg/kubelet/status/status_manager.go

type Manager interface {
	PodStatusProvider
	Start()
	SetPodStatus(pod *v1.Pod, status v1.PodStatus)
	SetContainerReadiness(podUID types.UID, containerID kubecontainer.ContainerID, ready bool)
	SetContainerStartup(podUID types.UID, containerID kubecontainer.ContainerID, started bool)
	TerminatePod(pod *v1.Pod)
	RemoveOrphanedStatuses(podUIDs map[types.UID]bool)
}

type PodStatusProvider interface {
	GetPodStatus(uid types.UID) (v1.PodStatus, bool)
}

我们继续来看看StatusManager的具体实现,如下:

// pkg/kubelet/status/status_manager.go

type manager struct {
    // apiserver的客户端
	kubeClient clientset.Interface
	// PodManager用于维护StaticPod以及MirrorPod之间的映射
	podManager kubepod.Manager
	// Pod的状态缓存
	podStatuses      map[types.UID]versionedPodStatus
	// 读写锁,StatusManager的所有的方法都是线程安全的
	podStatusesLock  sync.RWMutex
	// Pod状态发生改变的消费方
	podStatusChannel chan podStatusSyncRequest
	// mirrorPod的版本缓存
	apiStatusVersions map[kubetypes.MirrorPodUID]uint64
	// 用于保证Pod可以被安全的删除
	podDeletionSafety PodDeletionSafetyProvider
}

我们继续看StatusManager的其余方法的实现

3.1. Start

Start

具体逻辑如下:

  • 1、首先判断是否已经初始化好了apiserver的客户端,如果没有初始化,直接退出。因为后续所有的的操作都需要和apiserver交互,所以这里直接退出也没啥
  • 2、只要从podStatusChannel拿到了同步请求,就调用syncPod同步
  • 3、每隔10s进行一次批量同步
// pkg/kubelet/status/status_manager.go

func (m *manager) Start() {
    // StatusManager本来就是用于更新Apiserver Pod状态的组件,现在如果apiserver的客户端没有初始化,那肯定需要直接退出
	if m.kubeClient == nil {
		klog.InfoS("Kubernetes client is nil, not starting status manager")
		return
	}

	klog.InfoS("Starting to sync pod status with apiserver")
    // 用于定期批量同步
	syncTicker := time.NewTicker(syncPeriod).C

    // 可以看到,StatusManager这辈子基本上是交代在这里了,就干两件事,其一是当有Pod更新的时候同步这个Pod, 其二就是每隔10秒钟批量同步
	go wait.Forever(func() {
		for {
			select {
			case syncRequest := <-m.podStatusChannel:
				klog.V(5).InfoS("Status Manager: syncing pod with status from podStatusChannel",
					"podUID", syncRequest.podUID,
					"statusVersion", syncRequest.status.version,
					"status", syncRequest.status.status)
				m.syncPod(syncRequest.podUID, syncRequest.status)
			case <-syncTicker:
				klog.V(5).InfoS("Status Manager: syncing batch")
				for i := len(m.podStatusChannel); i > 0; i-- {
					<-m.podStatusChannel
				}
				m.syncBatch()
			}
		}
	}, 0)
}

3.1.1. syncPod

syncPod

我们来看看SyncPod是如何把Pod的状态同步给apiserver的,主要逻辑如下:

  • 1、判断当前的Pod是否真的需要更新,如果不需要更新就直接返回
  • 2、根据当前PodUID,从apiserver中获取当前Pod
  • 3、如果从PodManager中获取的UID和当前的Pod UID不相等,说明StatusManager中的装状态不是最新的,直接删除老Pod的状态
  • 4、合并新老PodCondition给新的Pod
  • 5、更新apiserver中的Pod状态,更新StatusManager中的Pod状态
  • 6、如果当前Pod已经被删除,那么通过CRI接口删除底层Pod,然后调用apiserver的接口删除Pod
// pkg/kubelet/status/status_manager.go

func (m *manager) syncPod(uid types.UID, status versionedPodStatus) {
    // 先判断当前Pod是否需要更新,如果当前Pod的状态已经是最新的,那么直接退出,估计是通过版本来比较
	if !m.needsUpdate(uid, status) {
		klog.V(1).InfoS("Status for pod is up-to-date; skipping", "podUID", uid)
		return
	}

    // 从apiserver中获取当前Pod
	pod, err := m.kubeClient.CoreV1().Pods(status.podNamespace).Get(context.TODO(), status.podName, metav1.GetOptions{})
	if errors.IsNotFound(err) {// 如果当前Pod在apiserver中没有找到,说明这个Pod已经被删除,这里直接忽略更新
		return
	}
	if err != nil {
		return
	}

	translatedUID := m.podManager.TranslatePodUID(pod.UID)
	// 从这里可以看出,apiserver中的Pod一定是最新的
	if len(translatedUID) > 0 && translatedUID != kubetypes.ResolvedPodUID(uid) {
		klog.V(2).InfoS("Pod was deleted and then recreated, skipping status update",
			"pod", klog.KObj(pod),
			"oldPodUID", uid,
			"podUID", translatedUID)
		// 删除StatusManager中维护的老Pod的状态,相当于是更新StatusManager的Pod状态
		m.deletePodStatus(uid)
		return
	}

    // 合并新老Pod的Condition
	mergedStatus := mergePodStatus(pod.Status, status.status, m.podDeletionSafety.PodCouldHaveRunningContainers(pod))

    // 更新apiserver中该Pod的状态
	newPod, patchBytes, unchanged, err := statusutil.PatchPodStatus(context.TODO(), m.kubeClient, pod.Namespace, pod.Name, pod.UID, pod.Status, mergedStatus)
	klog.V(3).InfoS("Patch status for pod", "pod", klog.KObj(pod), "patch", string(patchBytes))

	if err != nil {
		klog.InfoS("Failed to update status for pod", "pod", klog.KObj(pod), "err", err)
		return
	}
	if unchanged {
		klog.V(3).InfoS("Status for pod is up-to-date", "pod", klog.KObj(pod), "statusVersion", status.version)
	} else {
		klog.V(3).InfoS("Status for pod updated successfully", "pod", klog.KObj(pod), "statusVersion", status.version, "status", mergedStatus)
		pod = newPod
	}

	m.apiStatusVersions[kubetypes.MirrorPodUID(pod.UID)] = status.version

    // 如果当前Pod需要被删除,那么就通知底层的CRI删除Pod
	if m.canBeDeleted(pod, status.status) {
		deleteOptions := metav1.DeleteOptions{
			GracePeriodSeconds: new(int64),
			Preconditions: metav1.NewUIDPreconditions(string(pod.UID)),
		}
		// 如果底层CRI删除Pod成功,就删除apiserver中保存的Pod
		err = m.kubeClient.CoreV1().Pods(pod.Namespace).Delete(context.TODO(), pod.Name, deleteOptions)
		if err != nil {
			klog.InfoS("Failed to delete status for pod", "pod", klog.KObj(pod), "err", err)
			return
		}
		klog.V(3).InfoS("Pod fully terminated and removed from etcd", "pod", klog.KObj(pod))
		m.deletePodStatus(uid)
	}
}
3.1.1.1. needsUpdate
needsUpdate

needUpdate方法主要是用于判断当前需要更新的Pod是否需要更新

func (m *manager) needsUpdate(uid types.UID, status versionedPodStatus) bool {
    // 猜的没错,果然是比较Pod的版本
	latest, ok := m.apiStatusVersions[kubetypes.MirrorPodUID(uid)]
	// 如果StatusManager的缓存中没找到当前Pod或者版本较小,那么肯定需要更新
	if !ok || latest < status.version {
		return true
	}

    // 从PodManager中获取当前Pod
	pod, ok := m.podManager.GetPodByUID(uid)
	if !ok { // 如果PodManager中没有找到这个Pod,则直接返回`False`,为啥呀? 没看懂
		return false
	}
	// 如果在PodManager中找到了当前Pod,
	return m.canBeDeleted(pod, status.status)
}

func (m *manager) canBeDeleted(pod *v1.Pod, status v1.PodStatus) bool {
    // 如果当前Pod被删除了,或者是MirrorPod,则直接返回false
	if pod.DeletionTimestamp == nil || kubetypes.IsMirrorPod(pod) {
		return false
	}
	return m.podDeletionSafety.PodResourcesAreReclaimed(pod, status)
}
3.1.1.2. mergePodStatus
mergePodStatus

该方法主要是用于合并新老PodCondition

func mergePodStatus(oldPodStatus, newPodStatus v1.PodStatus, couldHaveRunningContainers bool) v1.PodStatus {
    // 给定切片的长度,防止动态扩容造成性能损失
	podConditions := make([]v1.PodCondition, 0, len(oldPodStatus.Conditions)+len(newPodStatus.Conditions))

	for _, c := range oldPodStatus.Conditions {
		if !kubetypes.PodConditionByKubelet(c.Type) {
			podConditions = append(podConditions, c)
		}
	}
	for _, c := range newPodStatus.Conditions {
		if kubetypes.PodConditionByKubelet(c.Type) {
			podConditions = append(podConditions, c)
		}
	}
	newPodStatus.Conditions = podConditions


	if !podutil.IsPodPhaseTerminal(oldPodStatus.Phase) && podutil.IsPodPhaseTerminal(newPodStatus.Phase) {
		if couldHaveRunningContainers {
			newPodStatus.Phase = oldPodStatus.Phase
			newPodStatus.Reason = oldPodStatus.Reason
			newPodStatus.Message = oldPodStatus.Message
		}
	}

	if podutil.IsPodPhaseTerminal(newPodStatus.Phase) {
		if podutil.IsPodReadyConditionTrue(newPodStatus) || podutil.IsContainersReadyConditionTrue(newPodStatus) {
			containersReadyCondition := generateContainersReadyConditionForTerminalPhase(newPodStatus.Phase)
			podutil.UpdatePodCondition(&newPodStatus, &containersReadyCondition)

			podReadyCondition := generatePodReadyConditionForTerminalPhase(newPodStatus.Phase)
			podutil.UpdatePodCondition(&newPodStatus, &podReadyCondition)
		}
	}

	return newPodStatus
}

3.1.2. syncBatch

syncBatch

继续看看syncBatch究竟干了啥:

// pkg/kubelet/status/status_manager.go

func (m *manager) syncBatch() {
	var updatedStatuses []podStatusSyncRequest
	podToMirror, mirrorToPod := m.podManager.GetUIDTranslations()
	func() { // Critical section
		m.podStatusesLock.RLock()
		defer m.podStatusesLock.RUnlock()

        // 如果当前Pod在StatusManager中没找到,并且也不是MirrorPod,说明当前Pod早已经被删除了,这里更新Map,是否内存空间
		for uid := range m.apiStatusVersions {
			_, hasPod := m.podStatuses[types.UID(uid)]
			_, hasMirror := mirrorToPod[uid]
			if !hasPod && !hasMirror {
				delete(m.apiStatusVersions, uid)
			}
		}

        // 遍历所有的Pod,看看他们是否需要更新
		for uid, status := range m.podStatuses {
			syncedUID := kubetypes.MirrorPodUID(uid)
			if mirrorUID, ok := podToMirror[kubetypes.ResolvedPodUID(uid)]; ok {
				if mirrorUID == "" {
					klog.V(5).InfoS("Static pod does not have a corresponding mirror pod; skipping",
						"podUID", uid,
						"pod", klog.KRef(status.podNamespace, status.podName))
					continue
				}
				syncedUID = mirrorUID
			}
			if m.needsUpdate(types.UID(syncedUID), status) {
				updatedStatuses = append(updatedStatuses, podStatusSyncRequest{uid, status})
			} else if m.needsReconcile(uid, status.status) {
				delete(m.apiStatusVersions, syncedUID)
				updatedStatuses = append(updatedStatuses, podStatusSyncRequest{uid, status})
			}
		}
	}()

	for _, update := range updatedStatuses {
		klog.V(5).InfoS("Status Manager: syncPod in syncbatch", "podUID", update.podUID)
		m.syncPod(update.podUID, update.status)
	}
}

3.2. SetPodStatus

TODO

3.3. SetContainerReadiness

TODO

3.4. SetContainerStartup

TODO

3.5. TerminatePod

TODO

3.6. RemoveOrphanedStatuses

TODO

4. Reference

Kubernetes源码分析——kubelet

评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值