Kube-Proxy IPVS Mode Source Code Analysis


kube-proxy currently supports three load-balancing implementations: userspace, iptables, and IPVS. The first two hit performance bottlenecks as the number of Services grows, which is unacceptable in production, so this article focuses on a source-code analysis of the IPVS mode.

This article is reposted from "360云计算" (360 Cloud Computing).

Overall Logical Structure of kube-proxy

[Figure: sequence diagram of kube-proxy's overall logical structure]

The sequence diagram above describes the overall logical structure of kube-proxy. Like the other kube-* components, kube-proxy builds its command-line application with the pflag and cobra libraries, so let's first briefly look at the basic usage of these packages:

package main

import (
  "fmt"
  "strings"

  "github.com/spf13/cobra"
)

func main() {
  command := &cobra.Command{
    Use:   "echo [string to echo]",
    Short: "Echo anything to the screen",
    Long:  `echo is for echoing anything back. Echo works a lot like print, except it has a child command.`,
    Args:  cobra.MinimumNArgs(1),
    Run: func(cmd *cobra.Command, args []string) {
      fmt.Println("Print: " + strings.Join(args, " "))
    },
  }

  command.Execute()
}

The snippet above is the simplest possible use of the cobra package: first initialize a cobra.Command, whose Run field holds the real logic to execute, and then start the application with command.Execute().
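kube-proxy's own entry point follows the same pattern. Below is a simplified sketch based on cmd/kube-proxy in the Kubernetes source tree; details such as flag normalization and log initialization are omitted, so treat it as an outline rather than the exact upstream file:

package main

import (
  "os"

  "k8s.io/kubernetes/cmd/kube-proxy/app"
)

func main() {
  // NewProxyCommand builds the cobra.Command whose Run eventually
  // calls Options.Run, shown below.
  command := app.NewProxyCommand()
  if err := command.Execute(); err != nil {
    os.Exit(1)
  }
}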

The Command's Run ultimately calls Options.Run, which mainly does a few things; see the code below:

// Run runs the specified ProxyServer.
func (o *Options) Run() error {
  defer close(o.errCh)

  //....
  proxyServer, err := NewProxyServer(o)
  if err != nil {
    return err
  }

  if o.CleanupAndExit {
    return proxyServer.CleanupAndExit()
  }

  o.proxyServer = proxyServer
  return o.runLoop()
}
  1. Initialize the ProxyServer instance.

  2. If kube-proxy was started with the CleanupAndExit flag set to true, clear all rules previously installed by any of the three modes (userspace, iptables, ipvs) and exit immediately.

  3. If CleanupAndExit is false, call runLoop to start the ProxyServer, as sketched below.
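The essence of runLoop is to run the ProxyServer in a goroutine and block until an error arrives on errCh. Here is a minimal, self-contained sketch of that pattern; the names mirror kube-proxy's, but the types are toy stand-ins, not the real implementation:

package main

import (
  "errors"
  "fmt"
)

// proxyServer is a toy stand-in for the real ProxyServer.
type proxyServer struct{}

// Run would normally block while serving proxy rules; here it fails
// immediately so the demo terminates.
func (s *proxyServer) Run() error { return errors.New("demo: proxy exited") }

type options struct {
  proxyServer *proxyServer
  errCh       chan error
}

// runLoop starts the proxy server in a goroutine and blocks until an
// error arrives on errCh, the same shape as Options.runLoop.
func (o *options) runLoop() error {
  go func() {
    o.errCh <- o.proxyServer.Run()
  }()
  for {
    if err := <-o.errCh; err != nil {
      return err
    }
  }
}

func main() {
  o := &options{proxyServer: &proxyServer{}, errCh: make(chan error)}
  fmt.Println(o.runLoop())
}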

First, let's look at the definition of the ProxyServer struct:

type ProxyServer struct {
  Client                 clientset.Interface
  EventClient            v1core.EventsGetter
  IptInterface           utiliptables.Interface
  IpvsInterface          utilipvs.Interface
  IpsetInterface         utilipset.Interface
  execer                 exec.Interface
  Proxier                proxy.ProxyProvider
  Broadcaster            record.EventBroadcaster
  Recorder               record.EventRecorder
  ConntrackConfiguration kubeproxyconfig.KubeProxyConntrackConfiguration
  Conntracker            Conntracker // if nil, ignored
  ProxyMode              string
  NodeRef                *v1.ObjectReference
  CleanupIPVS            bool
  MetricsBindAddress     string
  EnableProfiling        bool
  OOMScoreAdj            *int32
  ConfigSyncPeriod       time.Duration
  HealthzServer          *healthcheck.HealthzServer
}
The ProxyServer struct contains the Client for talking to kube-apiserver, the IptInterface for manipulating iptables, the IpvsInterface for manipulating IPVS, the IpsetInterface for manipulating ipset, and the Proxier, which is selected from the userspace, iptables, and ipvs implementations according to the ProxyMode parameter.
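As a rough illustration of how ProxyMode selects an implementation, here is a self-contained toy sketch; the interface and types below are stand-ins, not the real k8s.io/kubernetes/pkg/proxy types:

package main

import "fmt"

// ProxyProvider mirrors the minimal surface kube-proxy expects of a
// proxier. These are toy stand-ins for illustration only.
type ProxyProvider interface {
  SyncLoop()
}

type ipvsProxier struct{}

func (p *ipvsProxier) SyncLoop() { fmt.Println("ipvs: syncing rules") }

type iptablesProxier struct{}

func (p *iptablesProxier) SyncLoop() { fmt.Println("iptables: syncing rules") }

// newProxier picks an implementation by mode, analogous to the switch
// on ProxyMode inside NewProxyServer.
func newProxier(mode string) (ProxyProvider, error) {
  switch mode {
  case "ipvs":
    return &ipvsProxier{}, nil
  case "iptables":
    return &iptablesProxier{}, nil
  default:
    return nil, fmt.Errorf("unsupported proxy mode %q", mode)
  }
}

func main() {
  p, err := newProxier("ipvs")
  if err != nil {
    panic(err)
  }
  p.SyncLoop()
}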
Next we focus on the Proxier implemented for ipvs mode. In ipvs mode the Proxier struct is defined as:

type Proxier struct {
  endpointsChanges *proxy.EndpointChangeTracker
  serviceChanges   *proxy.ServiceChangeTracker

  //...
  serviceMap   proxy.ServiceMap
  endpointsMap proxy.EndpointsMap
  portsMap     map[utilproxy.LocalPort]utilproxy.Closeable

  //...
  syncRunner *async.BoundedFrequencyRunner // governs calls to syncProxyRules

  //...
  iptables utiliptables.Interface
  ipvs     utilipvs.Interface
  ipset    utilipset.Interface
  exec     utilexec.Interface

  //...
  ipvsScheduler string
}
Of the fields in the Proxier struct, let's first introduce async.BoundedFrequencyRunner; the others will be covered when we get to ProxyServer.Run.

BoundedFrequencyRunner is defined as follows:

type BoundedFrequencyRunner struct {
  name        string        // the name of this instance
  minInterval time.Duration // the min time between runs, modulo bursts
  maxInterval time.Duration // the max time between runs

  run chan struct{} // try an async run

  mu      sync.Mutex  // guards runs of fn and all mutations
  fn      func()      // function to run
  lastRun time.Time   // time of last run
  timer   timer       // timer for deferred runs
  limiter rateLimiter // rate limiter for on-demand runs
}

The run channel in the BoundedFrequencyRunner struct triggers asynchronous, periodic executions of the task fn, for example periodically running proxier.syncProxyRules to create or update VirtualServers and RealServers and to bind each VirtualServer's VIP to the dummy interface (kube-ipvs0).

Below is how the BoundedFrequencyRunner object is initialized in the NewProxier method:

proxier.syncRunner = async.NewBoundedFrequencyRunner(
    "sync-runner", proxier.syncProxyRules, minSyncPeriod, syncPeriod, burstSyncs)

where:

minSyncPeriod: the minimum interval between two rule syncs

syncPeriod: the maximum interval between two rule syncs

proxier.syncProxyRules: the function that actually syncs the rules (and the core of kube-proxy's IPVS-based rule synchronization); a sketch of these interval semantics follows below
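To make the minInterval/maxInterval semantics concrete, here is a much-simplified, self-contained toy sketch of a BoundedFrequencyRunner-style loop. It is illustrative only: the real k8s.io/kubernetes/pkg/util/async implementation additionally uses a token-bucket rate limiter for on-demand bursts (the burstSyncs parameter):

package main

import (
  "fmt"
  "time"
)

// toyRunner is a simplified stand-in for async.BoundedFrequencyRunner:
// Run() requests are coalesced through a 1-buffered channel, fn never
// runs more often than minInterval, and a timer guarantees it runs at
// least once every maxInterval.
type toyRunner struct {
  fn          func()
  minInterval time.Duration
  maxInterval time.Duration
  run         chan struct{}
}

func newToyRunner(fn func(), min, max time.Duration) *toyRunner {
  return &toyRunner{fn: fn, minInterval: min, maxInterval: max, run: make(chan struct{}, 1)}
}

// Run requests an asynchronous execution of fn (non-blocking; multiple
// pending requests coalesce into one).
func (r *toyRunner) Run() {
  select {
  case r.run <- struct{}{}:
  default: // a run is already pending
  }
}

// Loop services on-demand and periodic runs until stop is closed.
func (r *toyRunner) Loop(stop <-chan struct{}) {
  var lastRun time.Time
  maxTimer := time.NewTimer(r.maxInterval)
  defer maxTimer.Stop()
  for {
    select {
    case <-stop:
      return
    case <-r.run: // on-demand run, throttled to minInterval
      if wait := r.minInterval - time.Since(lastRun); wait > 0 {
        time.Sleep(wait)
      }
    case <-maxTimer.C: // periodic resync
    }
    r.fn()
    lastRun = time.Now()
    if !maxTimer.Stop() {
      select { // drain the channel if the timer already fired
      case <-maxTimer.C:
      default:
      }
    }
    maxTimer.Reset(r.maxInterval)
  }
}

func main() {
  runner := newToyRunner(
    func() { fmt.Println("sync rules at", time.Now().Format("15:04:05.000")) },
    200*time.Millisecond, // minSyncPeriod
    time.Second,          // syncPeriod
  )
  stop := make(chan struct{})
  go runner.Loop(stop)
  runner.Run() // a burst of requests is coalesced into one throttled run
  runner.Run()
  time.Sleep(3 * time.Second)
  close(stop)
}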

ProxyServer Startup Flow

This part walks through the logic of ProxyServer.Run. The startup flow is shown in the figure below:

[Figure: ProxyServer startup flow]

During startup, it mainly does the following things:

1. Start the health-check service HealthzServer.

Next, steps [4-7] of the flow are described in detail.

ServiceConfig is defined as follows:

type ServiceConfig struct {
  listerSynced  cache.InformerSynced
  eventHandlers []ServiceHandler
}

ServiceHandler is defined as follows:

type ServiceHandler interface {
  // OnServiceAdd is called whenever creation of new service object
  // is observed.
  OnServiceAdd(service *v1.Service)
  // OnServiceUpdate is called whenever modification of an existing
  // service object is observed.
  OnServiceUpdate(oldService, service *v1.Service)
  // OnServiceDelete is called whenever deletion of an existing service
  // object is observed.
  OnServiceDelete(service *v1.Service)
  // OnServiceSynced is called once all the initial event handlers were
  // called and the state is fully propagated to local cache.
  OnServiceSynced()
}
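Note that the ipvs-mode Proxier itself implements this ServiceHandler interface (its OnServiceAdd and OnServiceUpdate methods are shown later in this article); registering the Proxier into ServiceConfig's eventHandlers is what ties informer events to rule syncing.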

The creation of a ServiceConfig instance is implemented as follows:

func NewServiceConfig(serviceInformer coreinformers.ServiceInformer, resyncPeriod time.Duration) *ServiceConfig {
  result := &ServiceConfig{
    listerSynced: serviceInformer.Informer().HasSynced,
  }

  serviceInformer.Informer().AddEventHandlerWithResyncPeriod(
    cache.ResourceEventHandlerFuncs{
      AddFunc:    result.handleAddService,
      UpdateFunc: result.handleUpdateService,
      DeleteFunc: result.handleDeleteService,
    },
    resyncPeriod,
  )

  return result
}
First, listerSynced is set to serviceInformer.Informer().HasSynced, the function that reports whether the informer's local cache has been populated with all of the cluster's Service resources.

Second, AddEventHandlerWithResyncPeriod registers event handlers for the add, update, and delete events on Service objects. Whenever a Service triggers one of these actions, the corresponding function is called: handleAddService, handleUpdateService, or handleDeleteService.
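For readers unfamiliar with this mechanism, here is a minimal standalone client-go sketch of watching Service events the same way. This is illustrative code, not kube-proxy's; it assumes a reachable cluster and the default kubeconfig at ~/.kube/config:

package main

import (
  "fmt"
  "time"

  v1 "k8s.io/api/core/v1"
  "k8s.io/client-go/informers"
  "k8s.io/client-go/kubernetes"
  "k8s.io/client-go/tools/cache"
  "k8s.io/client-go/tools/clientcmd"
)

func main() {
  config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
  if err != nil {
    panic(err)
  }
  clientset, err := kubernetes.NewForConfig(config)
  if err != nil {
    panic(err)
  }

  factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)
  serviceInformer := factory.Core().V1().Services().Informer()

  serviceInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
    AddFunc: func(obj interface{}) {
      svc := obj.(*v1.Service)
      fmt.Printf("service added: %s/%s\n", svc.Namespace, svc.Name)
    },
    UpdateFunc: func(oldObj, newObj interface{}) {
      svc := newObj.(*v1.Service)
      fmt.Printf("service updated: %s/%s\n", svc.Namespace, svc.Name)
    },
    DeleteFunc: func(obj interface{}) {
      if svc, ok := obj.(*v1.Service); ok {
        fmt.Printf("service deleted: %s/%s\n", svc.Namespace, svc.Name)
      }
    },
  })

  stop := make(chan struct{})
  factory.Start(stop)
  // Wait until the local cache is populated; this is the role that
  // listerSynced (Informer().HasSynced) plays in ServiceConfig.
  cache.WaitForCacheSync(stop, serviceInformer.HasSynced)
  select {} // block forever; interrupt to exit
}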

Let's look at the implementation of the handleAddService handler:

func (c *ServiceConfig) handleAddService(obj interface{}) {
  service, ok := obj.(*v1.Service)
  if !ok {
    utilruntime.HandleError(fmt.Errorf("unexpected object type: %v", obj))
    return
  }
  for i := range c.eventHandlers {
    klog.V(4).Info("Calling handler.OnServiceAdd")
    c.eventHandlers[i].OnServiceAdd(service)
  }
}

When the watch observes that a new Service has been created in the Kubernetes cluster, handleAddService fires; it iterates over eventHandlers and calls each one's OnServiceAdd, which records the change in the proxier's serviceChanges and kicks off a sync of the corresponding rules.

OnServiceAdd is implemented as follows:

// OnServiceAdd is called whenever creation of new service object is observed.
func (proxier *Proxier) OnServiceAdd(service *v1.Service) {
  proxier.OnServiceUpdate(nil, service)
}

// OnServiceUpdate is called whenever modification of an existing service object is observed.
func (proxier *Proxier) OnServiceUpdate(oldService, service *v1.Service) {
  if proxier.serviceChanges.Update(oldService, service) && proxier.isInitialized() {
    proxier.syncRunner.Run()
  }
}
Note that OnServiceAdd is simply OnServiceUpdate(nil, service): an add is recorded as a change whose previous state is nil. ServiceChangeTracker is defined as follows:

// ServiceChangeTracker carries state about uncommitted changes to an arbitrary number of
// Services, keyed by their namespace and name.
type ServiceChangeTracker struct {
  // lock protects items.
  lock sync.Mutex
  // items maps a service to its serviceChange.
  items map[types.NamespacedName]*serviceChange
  // makeServiceInfo allows proxier to inject customized information when processing service.
  makeServiceInfo makeServicePortFunc
  // isIPv6Mode indicates if change tracker is under IPv6/IPv4 mode. Nil means not applicable.
  isIPv6Mode *bool
  recorder   record.EventRecorder
}

serviceChange is defined as follows:

// serviceChange contains all changes to services that happened since proxy rules were synced.  For a single object,
// changes are accumulated, i.e. previous is state from before applying the changes,
// current is state after applying all of the changes.
type serviceChange struct {
  previous ServiceMap
  current  ServiceMap
}
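The comment above is the key point: changes accumulate between syncs, with previous fixed at the first uncommitted change and current tracking the latest state. Here is a self-contained toy sketch of that accumulation logic (strings stand in for ServiceMap snapshots; this is not the real Update implementation):

package main

import "fmt"

// change is a toy analogue of serviceChange: previous is the state
// before any uncommitted changes, current the state after all of them.
type change struct {
  previous string
  current  string
}

type tracker struct {
  items map[string]*change
}

// Update records a transition old -> new for key and reports whether
// any uncommitted changes remain (mirroring the boolean that
// OnServiceUpdate checks before kicking syncRunner.Run).
func (t *tracker) Update(key, old, new string) bool {
  c, ok := t.items[key]
  if !ok {
    c = &change{previous: old}
    t.items[key] = c
  }
  c.current = new
  if c.previous == c.current {
    // The accumulated changes cancel out; nothing to sync for key.
    delete(t.items, key)
  }
  return len(t.items) > 0
}

func main() {
  t := &tracker{items: map[string]*change{}}
  fmt.Println(t.Update("default/svc-a", "v1", "v2")) // true: pending change
  fmt.Println(t.Update("default/svc-a", "v2", "v1")) // false: back to v1, canceled out
}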

At this point, looking back, the overall flow of the IPVS-based Proxier comes together. When ProxyServer.Run starts, it uses the Kubernetes LIST/WATCH mechanism to observe Service changes in the cluster in real time, continuously updates the Proxier's serviceChanges, and stores the changed Services in the ServiceMaps inside serviceChanges, ready for async.BoundedFrequencyRunner to consume when it executes the rule-sync function syncProxyRules.

8. endpointsConfig works exactly the same way as serviceConfig, so it is not described in detail here.

9. All of the preparation above takes effect at the informerFactory.Start step.

10. birthCry simply emits an event telling Kubernetes that kube-proxy has finished all of its preparation and is about to start.

func (s *ProxyServer) birthCry() {
  s.Recorder.Eventf(s.NodeRef, api.EventTypeNormal, "Starting", "Starting kube-proxy.")
}

11. Finally, SyncLoop starts the kube-proxy service. It immediately runs syncProxyRules once for an initial sync; after that, the IPVS, iptables, and ipset rules are synced periodically and asynchronously.

The syncProxyRules function is the core of kube-proxy. Its main logic is to iterate over serviceMap, and for each service walk its endpointsMap and its Service type (e.g. ClusterIP, LoadBalancer, NodePort) to create the corresponding IPVS rules.

syncProxyRules is defined as follows (abridged):

func (proxier *Proxier) syncProxyRules() {
  //.....

  // Build IPVS rules for each service.
  for svcName, svc := range proxier.serviceMap {
    //......

    // Handle traffic that loops back to the originator with SNAT.
    for _, e := range proxier.endpointsMap[svcName] {
      //....
    }

    // Capture the clusterIP.
    // ipset call
    entry := &utilipset.Entry{
      IP:       svcInfo.ClusterIP().String(),
      Port:     svcInfo.Port(),
      Protocol: protocol,
      SetType:  utilipset.HashIPPort,
    }
    // add service Cluster IP:Port to kubeServiceAccess ip set for the purpose of solving hairpin.
    // proxier.kubeServiceAccessSet.activeEntries.Insert(entry.String())
    if valid := proxier.ipsetList[kubeClusterIPSet].validateEntry(entry); !valid {
      klog.Errorf("%s", fmt.Sprintf(EntryInvalidErr, entry, proxier.ipsetList[kubeClusterIPSet].Name))
      continue
    }
    proxier.ipsetList[kubeClusterIPSet].activeEntries.Insert(entry.String())
    // ipvs call
    serv := &utilipvs.VirtualServer{
      Address:   svcInfo.ClusterIP(),
      Port:      uint16(svcInfo.Port()),
      Protocol:  string(svcInfo.Protocol()),
      Scheduler: proxier.ipvsScheduler,
    }
    // Set session affinity flag and timeout for IPVS service
    if svcInfo.SessionAffinityType() == v1.ServiceAffinityClientIP {
      serv.Flags |= utilipvs.FlagPersistent
      serv.Timeout = uint32(svcInfo.StickyMaxAgeSeconds())
    }
    // We need to bind ClusterIP to dummy interface, so set `bindAddr` parameter to `true` in syncService()
    if err := proxier.syncService(svcNameString, serv, true); err == nil {
      activeIPVSServices[serv.String()] = true
      activeBindAddrs[serv.Address.String()] = true
      // ExternalTrafficPolicy only works for NodePort and external LB traffic, does not affect ClusterIP
      // So we still need clusterIP rules in onlyNodeLocalEndpoints mode.
      if err := proxier.syncEndpoint(svcName, false, serv); err != nil {
        klog.Errorf("Failed to sync endpoint for service: %v, err: %v", serv, err)
      }
    } else {
      klog.Errorf("Failed to sync service: %v, err: %v", serv, err)
    }

    // Capture externalIPs.
    for _, externalIP := range svcInfo.ExternalIPStrings() {
      //....
    }

    // Capture load-balancer ingress.
    for _, ingress := range svcInfo.LoadBalancerIPStrings() {
      //.....
    }

    if svcInfo.NodePort() != 0 {
      //....
    }
  }

  // sync ipset entries
  for _, set := range proxier.ipsetList {
    set.syncIPSetEntries()
  }

  // Tail call iptables rules for ipset, make sure only call iptables once
  // in a single loop per ip set.
  proxier.writeIptablesRules()

  // Sync iptables rules.
  // NOTE: NoFlushTables is used so we don't flush non-kubernetes chains in the table.
  proxier.iptablesData.Reset()
  proxier.iptablesData.Write(proxier.natChains.Bytes())
  proxier.iptablesData.Write(proxier.natRules.Bytes())
  proxier.iptablesData.Write(proxier.filterChains.Bytes())
  proxier.iptablesData.Write(proxier.filterRules.Bytes())
}
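As a practical aside, the state programmed by a sync can be inspected directly on a node: ipvsadm -Ln lists the IPVS virtual servers and their real servers, ipset list shows the ipset entries the iptables rules reference, and ip addr show kube-ipvs0 shows the Service VIPs bound to the dummy interface.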

Summary

kube-proxy's code logic is fairly straightforward. The overall idea is that the kube-proxy service watches the Kubernetes cluster's Service and Endpoints objects and, whenever either resource changes state, records the changes in ServiceMap and EndpointsMap.

async.BoundedFrequencyRunner then executes syncProxyRules asynchronously to program the rules.
