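Introduction to the deployment controller
The deployment controller is one of the many controllers in the kube-controller-manager component and is the controller for the deployment resource object. It watches three kinds of resources: deployment, replicaset, and pod. When any of them changes, the deployment controller is triggered to reconcile the corresponding deployment. Reconciliation covers scaling the deployment up and down, pausing and resuming it, updating it, rolling it back, updating its status, and cleaning up the old replicasets it owns.
Deployment controller architecture diagram
The architecture diagram shows the rough composition and processing flow of the deployment controller. The controller registers event handlers for pod, replicaset, and deployment objects. When an event is observed, the handler puts the key of the corresponding deployment object into the queue. The syncDeployment method holds the core logic for reconciling a deployment: it takes a deployment off the queue and reconciles it.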
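The handler, queue, and sync flow described above can be sketched in a few lines of Go. This is only an illustration under stated assumptions: keyQueue, drain, and syncFn are hypothetical names, and the toy queue stands in for the real rate-limited work queue. What it tries to show is that several events for the same deployment collapse into a single pending key before a worker picks it up.

```go
package main

import (
	"fmt"
	"sync"
)

// keyQueue is a toy stand-in for the controller's work queue: it stores
// "namespace/name" keys and drops duplicates that are still pending.
type keyQueue struct {
	mu      sync.Mutex
	pending map[string]bool
	order   []string
}

func newKeyQueue() *keyQueue {
	return &keyQueue{pending: map[string]bool{}}
}

// Add enqueues a key unless the same key is already waiting.
func (q *keyQueue) Add(key string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Get pops the oldest key; ok is false when the queue is empty.
func (q *keyQueue) Get() (key string, ok bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.order) == 0 {
		return "", false
	}
	key = q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

// drain plays the role of a worker: it takes keys off the queue, hands each
// one to syncFn, and returns the keys in the order they were processed.
func drain(q *keyQueue, syncFn func(key string)) []string {
	var done []string
	for {
		key, ok := q.Get()
		if !ok {
			return done
		}
		syncFn(key)
		done = append(done, key)
	}
}

func main() {
	q := newKeyQueue()
	// Three events that map to the same deployment, plus one for another.
	q.Add("default/web")
	q.Add("default/web")
	q.Add("default/web")
	q.Add("default/api")

	processed := drain(q, func(key string) { fmt.Println("sync", key) })
	fmt.Println("keys processed:", len(processed))
}
```

Queuing keys rather than whole objects means the worker always reads the latest state of the deployment when it finally runs.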
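The analysis of the deployment controller is split into two parts:
(1) initialization and startup of the deployment controller;
(2) processing logic of the deployment controller.
1. Deployment controller initialization and startup analysis
Based on tag v1.17.4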
https://github.com/kubernetes/kubernetes/releases/tag/v1.17.4
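We go straight to the startDeploymentController function, which is the entry point for the initialization and startup analysis.
startDeploymentController
Main logic of startDeploymentController:
(1) call deployment.NewDeploymentController to create and initialize a DeploymentController;
(2) start a goroutine that runs the DeploymentController's Run method.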
// cmd/kube-controller-manager/app/apps.go
func startDeploymentController(ctx ControllerContext) (http.Handler, bool, error) {
	if !ctx.AvailableResources[schema.GroupVersionResource{Group: "apps", Version: "v1", Resource: "deployments"}] {
		return nil, false, nil
	}
	dc, err := deployment.NewDeploymentController(
		ctx.InformerFactory.Apps().V1().Deployments(),
		ctx.InformerFactory.Apps().V1().ReplicaSets(),
		ctx.InformerFactory.Core().V1().Pods(),
		ctx.ClientBuilder.ClientOrDie("deployment-controller"),
	)
	if err != nil {
		return nil, true, fmt.Errorf("error creating Deployment controller: %v", err)
	}
	go dc.Run(int(ctx.ComponentConfig.DeploymentController.ConcurrentDeploymentSyncs), ctx.Stop)
	return nil, true, nil
}
1.1 deployment.NewDeploymentController
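As the code of deployment.NewDeploymentController shows, the deployment controller registers EventHandlers for deployment, replicaset, and pod objects (for pods, only a delete handler). In other words, it watches events on these objects and puts the affected deployment into the queue for processing. It also assigns the dc.syncDeployment method to dc.syncHandler, which registers it as the core processing method. The dc.Run method ends up calling this core processing method to reconcile deployment objects (the core processing method is analyzed in detail later).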
// pkg/controller/deployment/deployment_controller.go
// NewDeploymentController creates a new DeploymentController.
func NewDeploymentController(dInformer appsinformers.DeploymentInformer, rsInformer appsinformers.ReplicaSetInformer, podInformer coreinformers.PodInformer, client clientset.Interface) (*DeploymentController, error) {
	eventBroadcaster := record.NewBroadcaster()
	eventBroadcaster.StartLogging(klog.Infof)
	eventBroadcaster.StartRecordingToSink(&v1core.EventSinkImpl{Interface: client.CoreV1().Events("")})
	if client != nil && client.CoreV1().RESTClient().GetRateLimiter() != nil {
		if err := ratelimiter.RegisterMetricAndTrackRateLimiterUsage("deployment_controller", client.CoreV1().RESTClient().GetRateLimiter()); err != nil {
			return nil, err
		}
	}
	dc := &DeploymentController{
		client:        client,
		eventRecorder: eventBroadcaster.NewRecorder(scheme.Scheme, v1.EventSource{Component: "deployment-controller"}),
		queue:         workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "deployment"),
	}
	dc.rsControl = controller.RealRSControl{
		KubeClient: client,
		Recorder:   dc.eventRecorder,
	}
	dInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    dc.addDeployment,
		UpdateFunc: dc.updateDeployment,
		// This will enter the sync loop and no-op, because the deployment has been deleted from the store.
		DeleteFunc: dc.deleteDeployment,
	})
	rsInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    dc.addReplicaSet,
		UpdateFunc: dc.updateReplicaSet,
		DeleteFunc: dc.deleteReplicaSet,
	})
	podInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		DeleteFunc: dc.deletePod,
	})
	dc.syncHandler = dc.syncDeployment
	dc.enqueueDeployment = dc.enqueue
	dc.dLister = dInformer.Lister()
	dc.rsLister = rsInformer.Lister()
	dc.podLister = podInformer.Lister()
	dc.dListerSynced = dInformer.Informer().HasSynced
	dc.rsListerSynced = rsInformer.Informer().HasSynced
	dc.podListerSynced = podInformer.Informer().HasSynced
	return dc, nil
}
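The bodies of the replicaset and pod handlers are not listed here, but whatever they receive, the queue ends up holding deployments. A minimal sketch of that mapping follows, assuming the handler finds the owning deployment through the controller entry in ownerReferences. The types ownerRef and object and the function deploymentKeyFor are hypothetical simplifications for illustration, not the controller's own code.

```go
package main

import "fmt"

// ownerRef is a simplified, hypothetical stand-in for an ownerReferences entry.
type ownerRef struct {
	Kind       string
	Name       string
	Controller bool
}

// object is a simplified, hypothetical stand-in for a replicaset's metadata.
type object struct {
	Namespace string
	Name      string
	Owners    []ownerRef
}

// deploymentKeyFor returns the "namespace/name" key of the deployment that
// controls obj, or false when no controlling deployment is recorded.
func deploymentKeyFor(obj object) (string, bool) {
	for _, ref := range obj.Owners {
		if ref.Controller && ref.Kind == "Deployment" {
			return obj.Namespace + "/" + ref.Name, true
		}
	}
	return "", false
}

func main() {
	rs := object{
		Namespace: "default",
		Name:      "web-5d4f8c7b9",
		Owners:    []ownerRef{{Kind: "Deployment", Name: "web", Controller: true}},
	}
	if key, ok := deploymentKeyFor(rs); ok {
		fmt.Println("enqueue", key)
	}
}
```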
1.2 dc.Run
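The part to look at is the for loop. Based on the value of workers (taken from the kcm startup flag concurrent-deployment-syncs), it starts that many goroutines running the dc.worker method, which mainly calls the core processing method dc.syncDeployment described above.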
// pkg/controller/deployment/deployment_controller.go
func (dc *DeploymentController) Run(workers int, stopCh <-chan struct{}) {
	defer utilruntime.HandleCrash()
	defer dc.queue.ShutDown()
	klog.Infof("Starting deployment controller")
	defer klog.Infof("Shutting down deployment controller")
	if !cache.WaitForNamedCacheSync("deployment", stopCh, dc.dListerSynced, dc.rsListerSynced, dc.podListerSynced) {
		return
	}
	for i := 0; i < workers; i++ {
		go wait.Until(dc.worker, time.Second, stopCh)
	}
	<-stopCh
}
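The fan-out in the for loop can be tried in isolation with the standard library only. The sketch below is a simplification, not the controller's code: runWorkers and countSyncs are hypothetical names, and the workers stop when the key channel is closed, whereas dc.Run blocks on stopCh.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers starts the given number of goroutines. Each one pulls keys from
// the channel and calls syncFn. The function returns once the channel is
// closed and every worker has finished.
func runWorkers(workers int, keys <-chan string, syncFn func(key string)) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for key := range keys {
				syncFn(key)
			}
		}()
	}
	wg.Wait()
}

// countSyncs feeds keys to runWorkers and reports how many syncs ran.
func countSyncs(workers int, keys []string) int {
	ch := make(chan string, len(keys))
	for _, k := range keys {
		ch <- k
	}
	close(ch)

	var mu sync.Mutex
	count := 0
	runWorkers(workers, ch, func(string) {
		mu.Lock()
		defer mu.Unlock()
		count++
	})
	return count
}

func main() {
	n := countSyncs(2, []string{"default/web", "default/api", "kube-system/dns"})
	fmt.Println("synced", n, "deployments")
}
```

Raising the worker count lets more deployments be reconciled in parallel; each key is still handled by exactly one worker at a time.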
1.2.1 dc.worker
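dc.worker takes an event key off the queue and calls dc.syncHandler, that is dc.syncDeployment, to reconcile it. As described earlier, the keys in the queue come from the EventHandlers that the deployment controller registered for deployment, replicaset, and pod objects: changes to these objects are observed and then put into the queue.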
// pkg/controller/deployment/deployment_controller.go
func (dc *DeploymentController) worker() {
	for dc.processNextWorkItem() {
	}
}

func (dc *DeploymentController) processNextWorkItem() bool {
	key, quit := dc.queue.Get()
	if quit {
		return false
	}
	defer dc.queue.Done(key)
	err := dc.syncHandler(key.(string))
	dc.handleErr(err, key)
	return true
}
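processNextWorkItem passes the sync result to dc.handleErr, whose body is not listed in this section. A common pattern for such a handler is: forget the key on success, requeue it with backoff on failure, and give up after a bounded number of retries. The sketch below illustrates only that pattern. The retryTracker type, the handle method, and the retry limit of 2 are assumptions made for the example and are not taken from the controller source.

```go
package main

import (
	"errors"
	"fmt"
)

// errSync is a sample error used by the demo below.
var errSync = errors.New("sync failed")

// retryTracker counts consecutive failures per key (hypothetical helper).
type retryTracker struct {
	maxRetries int
	failures   map[string]int
}

// handle decides what happens to key after a sync attempt returned err.
// It returns "forget" (success), "requeue" (retry later) or "drop" (give up).
func (r *retryTracker) handle(key string, err error) string {
	if err == nil {
		delete(r.failures, key)
		return "forget"
	}
	if r.failures[key] < r.maxRetries {
		r.failures[key]++
		return "requeue"
	}
	delete(r.failures, key)
	return "drop"
}

func main() {
	r := &retryTracker{maxRetries: 2, failures: map[string]int{}}
	for i := 0; i < 3; i++ {
		fmt.Println(r.handle("default/web", errSync))
	}
	fmt.Println(r.handle("default/web", nil))
}
```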
2. Deployment controller core processing logic analysis
Before analyzing the core processing logic, let's look at a few key concepts.
Key concepts
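(1) The newest replicaset object
Which replicaset counts as the newest? A replicaset whose pod template matches the deployment's pod template is the newest one.
(2) Old replicaset objects
Which replicasets count as old? Every replicaset other than the newest one is an old replicaset.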
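The split between the newest and the old replicasets can be sketched as a template comparison. The types podTemplate and replicaSet below are hypothetical simplifications. The sketch also assumes that the pod-template-hash label, which is set on a replicaset's template but is not part of the deployment's own template, is ignored during the comparison.

```go
package main

import (
	"fmt"
	"reflect"
)

// podTemplate and replicaSet are simplified, hypothetical stand-ins.
type podTemplate struct {
	Labels map[string]string
	Image  string
}

type replicaSet struct {
	Name     string
	Template podTemplate
}

// hashLabel is assumed to be excluded from the comparison.
const hashLabel = "pod-template-hash"

// withoutHash returns a copy of t with the hash label removed.
func withoutHash(t podTemplate) podTemplate {
	labels := map[string]string{}
	for k, v := range t.Labels {
		if k != hashLabel {
			labels[k] = v
		}
	}
	return podTemplate{Labels: labels, Image: t.Image}
}

// splitReplicaSets returns the first replicaset whose template matches the
// deployment's template (the newest one) and all remaining ones (the old ones).
func splitReplicaSets(deployTemplate podTemplate, all []replicaSet) (newest *replicaSet, old []replicaSet) {
	want := withoutHash(deployTemplate)
	for i := range all {
		if newest == nil && reflect.DeepEqual(withoutHash(all[i].Template), want) {
			newest = &all[i]
			continue
		}
		old = append(old, all[i])
	}
	return newest, old
}

func main() {
	deploy := podTemplate{Labels: map[string]string{"app": "web"}, Image: "nginx:1.19"}
	all := []replicaSet{
		{Name: "web-aaa", Template: podTemplate{Labels: map[string]string{"app": "web", hashLabel: "aaa"}, Image: "nginx:1.18"}},
		{Name: "web-bbb", Template: podTemplate{Labels: map[string]string{"app": "web", hashLabel: "bbb"}, Image: "nginx:1.19"}},
	}
	newest, old := splitReplicaSets(deploy, all)
	fmt.Println("newest:", newest.Name, "old count:", len(old))
}
```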
(3) Pod in the ready state
A pod is in the ready state when, in its .status.conditions, the condition whose type is Ready has a status of True.
apiVersion: v1
kind: Pod
...
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-08-04T08:47:03Z"
    status: "True"
    type: Ready
  ...
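The status of the condition whose type is Ready is set to True once every container in the pod is ready.
When does a container in a pod become ready? The kubelet probes the container according to its configured readiness probe and, once the probe succeeds, updates the pod's status to mark that container as ready. A YAML example follows.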
apiVersion: v1
kind: Pod
...
status:
  ...
  containerStatuses:
  - containerID: xxx
    image: xxx
    imageID: xxx
    lastState: {}
    name: test
    ready: true
  ...
(4) Pod in the available state
A pod is in the available state once it has been ready for longer than minReadySeconds.
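The ready and available definitions translate into two small predicates. The sketch below uses hypothetical local types (condition, pod) rather than the Kubernetes API types, and it assumes the ready duration is measured from the Ready condition's lastTransitionTime. demoAvailable is a helper that exists only for the example.

```go
package main

import (
	"fmt"
	"time"
)

// condition and pod are simplified, hypothetical stand-ins for the API types.
type condition struct {
	Type               string
	Status             string
	LastTransitionTime time.Time
}

type pod struct {
	Conditions []condition
}

// readyCondition returns the condition of type Ready, if the pod has one.
func readyCondition(p pod) (condition, bool) {
	for _, c := range p.Conditions {
		if c.Type == "Ready" {
			return c, true
		}
	}
	return condition{}, false
}

// isReady reports whether the Ready condition has status True.
func isReady(p pod) bool {
	c, ok := readyCondition(p)
	return ok && c.Status == "True"
}

// isAvailable reports whether the pod has been ready for longer than
// minReadySeconds as of now. With minReadySeconds == 0, ready means available.
func isAvailable(p pod, minReadySeconds int, now time.Time) bool {
	if !isReady(p) {
		return false
	}
	if minReadySeconds == 0 {
		return true
	}
	c, _ := readyCondition(p)
	if c.LastTransitionTime.IsZero() {
		return false
	}
	readyLongEnough := c.LastTransitionTime.Add(time.Duration(minReadySeconds) * time.Second)
	return readyLongEnough.Before(now)
}

// demoAvailable builds a pod that became ready readyForSeconds ago and checks it.
func demoAvailable(readyForSeconds, minReadySeconds int) bool {
	now := time.Date(2021, 8, 4, 8, 47, 3, 0, time.UTC)
	p := pod{Conditions: []condition{{
		Type:               "Ready",
		Status:             "True",
		LastTransitionTime: now.Add(-time.Duration(readyForSeconds) * time.Second),
	}}}
	return isAvailable(p, minReadySeconds, now)
}

func main() {
	fmt.Println(demoAvailable(30, 10)) // ready for 30s, needs 10s
	fmt.Println(demoAvailable(5, 10))  // ready for 5s, needs 10s
}
```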
syncDeployment
We go straight to syncDeployment, the core processing method of the deployment controller.
Main logic:
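(1) record the current time when the method starts and define a defer function that computes the total execution time, that is, how long it takes to reconcile one deployment;
(2) get the deployment object by its namespace and name;
(3) call dc.getReplicaSetsForDeployment: process all replicaset objects in the same namespace as the deployment. A replicaset that matches but is not yet associated with a deployment is associated with it by setting the ownerReferences field. A replicaset that is associated but no longer matches has the corresponding ownerReferences entry removed. Finally, the call returns the list of ReplicaSet objects that are associated with and match the Deployment;
(4) call dc.getPodMapForDeployment: use the deployment's selector to get the pods associated with the deployment, group the pods by the UID of the replicaset they belong to, and return them as a map[types.UID][]*v1.Pod;
(5) if the deployment's DeletionTimestamp is not empty, call dc.syncStatusOnly, which recomputes the deployment's status field from the replicasets it owns and updates it, then return directly without going any further;
(6) call dc.checkPausedConditions: check whether the deployment is in the pause state and, if so, update the deployment's status field by adding a pause-related condition;
(7) check the deployment's .Spec.Paused value; when it is true, call dc.sync and then return directly;
(8) call getRollbackTo to check whether the deployment's annotations contain the key deprecated.deployment.rollback.to; if the key is present and its value is not empty, call dc.rollback to perform the rollback;
(9) call dc.isScalingEvent: check whether the deployment is in the scaling state and, if so, call dc.sync to handle the scale up or down, then return directly;
(10) check the deployment's update strategy: when it is Recreate, call dc.rolloutRecreate to perform a recreate update of the deployment; when it is RollingUpdate, call dc.rolloutRolling to perform a rolling update of the deployment.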
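Steps (5) through (10) form a priority-ordered dispatch: the first condition that holds decides which method handles the deployment. The sketch below restates that order as a pure function so it can be exercised on its own. The deploymentState struct and nextAction are hypothetical; the returned strings are the method names from the list above, and the empty-string fallback for an unknown strategy is an assumption of the sketch.

```go
package main

import "fmt"

// deploymentState is a hypothetical summary of the checks made in steps (5)-(10).
type deploymentState struct {
	Deleting    bool   // DeletionTimestamp is set
	Paused      bool   // .Spec.Paused is true
	RollbackSet bool   // rollback annotation present and non-empty
	Scaling     bool   // a scaling event was detected
	Strategy    string // "Recreate" or "RollingUpdate"
}

// nextAction returns the name of the method that would handle the deployment,
// checking the conditions in the same order as the step list.
func nextAction(s deploymentState) string {
	switch {
	case s.Deleting:
		return "syncStatusOnly"
	case s.Paused:
		return "sync"
	case s.RollbackSet:
		return "rollback"
	case s.Scaling:
		return "sync"
	case s.Strategy == "Recreate":
		return "rolloutRecreate"
	case s.Strategy == "RollingUpdate":
		return "rolloutRolling"
	}
	return "" // unknown strategy (fallback assumed for this sketch)
}

func main() {
	fmt.Println(nextAction(deploymentState{Strategy: "RollingUpdate"}))
	fmt.Println(nextAction(deploymentState{Paused: true, Scaling: true, Strategy: "Recreate"}))
	fmt.Println(nextAction(deploymentState{Deleting: true, Paused: true}))
}
```

The order matters: a deployment that is being deleted only gets a status update, and a paused deployment is never rolled out even when its template has changed.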
// pkg/controller/deployment/deployment_controller.go
// syncDeployment will sync the deployment with the given key.
// This function is not meant to be invoked concurrently with the same key.
func (dc *DeploymentController) syncDeployment(key string) error {
	startTime := time.Now()
	klog.V(4).Infof("Started syncing deployment %q (%v)", key, startTime)
	defer func() {
		klog.V(4).Infof("Finished syncing deployment %q (%v)", key, time.Since(startTime))
	}()
	namespace, name, err := cache.SplitMetaNamespaceKey(key)
	if err != nil {
		return err
	}
	deployment, err := dc.dLister.Deployments(namespace).Get(name)
	if errors.IsNotFound(err) {
		klog.V(2).Infof("Deployment %v has been deleted", key)
		return nil
	}
	if err != nil {
		return err
	}
	// Deep-copy otherwise we are mutating our cache.
	// TODO: Deep-copy only when needed.
	d := deployment.DeepCopy()
	everything := metav1.LabelSelector{}
	if reflect.DeepEqual(d.Spec.Selector, &everything) {
		dc.eventRecorder.Eventf(d, v1.EventTypeWarning, "SelectingAll", "This deployment is selecting all pods. A non-empty selector is required.")
		if d.Status.ObservedGeneration < d.Generation {
			d.Status.ObservedGeneration = d.Generation
			dc.client.AppsV1().Deployments(d.Namespace).UpdateStatus(d)
		}
		return nil