An Analysis of How prometheus-adapter Works

This article examines how Prometheus and prometheus-adapter work together. It covers the aggregation layer inside the Kubernetes API server, how APIService objects claim and extend API paths, and how prometheus-adapter discovers, registers, and serves custom metric queries.


How It Works

How Prometheus and prometheus-adapter work together

How prometheus-adapter extends the Kubernetes API via the aggregation layer

The aggregation layer runs inside the kube-apiserver process. Until an extension resource is registered, the aggregation layer does nothing. To register an API, a user adds an APIService object that "claims" a URL path in the Kubernetes API. From then on, the aggregation layer forwards everything sent to that API path to the registered APIService.

In summary:

  • The aggregatorServer forwards requests by associating an APIService object with a Service; the type of the associated Service determines how requests are forwarded. The aggregatorServer consists of a GenericAPIServer and a set of controllers that maintain its state. The GenericAPIServer handles requests for APIService resources in the apiregistration.k8s.io group, while the controllers include:
    • apiserviceRegistrationController: builds a proxy from the aggregated server's Service defined in the APIService, and forwards CR requests to the backing aggregated server
    • availableConditionController: maintains the availability status of APIServices, including whether the referenced Service is reachable
    • autoRegistrationController: keeps a specific set of APIServices present in the API
    • crdRegistrationController: automatically registers CRD GroupVersions as APIServices
    • openAPIAggregationController: syncs changes to APIService resources into the served OpenAPI document
  • The apiserviceRegistrationController builds a proxy based on the aggregated server's Service defined in the APIService, and forwards CR requests to the backing aggregated server. An APIService comes in two types: Local (its Service field is empty) and Service (its Service field is non-empty). The controller sets up routing for both: Local APIServices are routed directly to kube-apiserver itself, while Service APIServices get a proxy that rewrites the request to target the aggregated Service (proxyPath := "/apis/" + apiService.Spec.Group + "/" + apiService.Spec.Version). For load balancing, local access to kube-apiserver is preferred (when the Service is the default kubernetes apiserver Service on :443); otherwise the request goes via the Service ClusterIP:Port (the default) or to a randomly selected Service endpoint backend.
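The Local-vs-Service routing decision described above can be sketched as follows. This is a simplified illustration, not the real aggregator code: the types and helper names are invented for demonstration, and only the proxyPath construction mirrors the formula quoted above.

```go
package main

import "fmt"

// serviceRef is a stripped-down stand-in for the APIService's Service
// reference; a nil reference marks a Local APIService.
type serviceRef struct {
	Namespace, Name string
}

type apiServiceSpec struct {
	Group, Version string
	Service        *serviceRef // nil => Local type
}

// routeFor mimics the aggregator's decision: Local APIServices are
// handled by kube-apiserver itself, while Service APIServices are
// proxied under proxyPath = "/apis/" + group + "/" + version.
func routeFor(spec apiServiceSpec) string {
	proxyPath := "/apis/" + spec.Group + "/" + spec.Version
	if spec.Service == nil {
		return "local: " + proxyPath
	}
	return fmt.Sprintf("proxy %s -> %s/%s", proxyPath, spec.Service.Namespace, spec.Service.Name)
}

func main() {
	// Local APIService: served by kube-apiserver directly.
	fmt.Println(routeFor(apiServiceSpec{Group: "apps", Version: "v1"}))
	// Service APIService: proxied to the aggregated server's Service.
	fmt.Println(routeFor(apiServiceSpec{
		Group: "custom.metrics.k8s.io", Version: "v1beta1",
		Service: &serviceRef{Namespace: "monitoring", Name: "prometheus-adapter"},
	}))
}
```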


(Diagram: aggregator request forwarding)

prometheus-adapter source code analysis

Source code: https://github.com/kubernetes-sigs/prometheus-adapter

The main entry point

File: cmd/adapter/adapter.go

func main() {
	logs.InitLogs()
	defer logs.FlushLogs()

	// set up flags
	cmd := &PrometheusAdapter{
		PrometheusURL:         "https://localhost",
		MetricsRelistInterval: 10 * time.Minute,
	}
	cmd.Name = "prometheus-metrics-adapter"

	cmd.OpenAPIConfig = genericapiserver.DefaultOpenAPIConfig(generatedopenapi.GetOpenAPIDefinitions, openapinamer.NewDefinitionNamer(api.Scheme, customexternalmetrics.Scheme))
	cmd.OpenAPIConfig.Info.Title = "prometheus-metrics-adapter"
	cmd.OpenAPIConfig.Info.Version = "1.0.0"

	// load command-line flags
	cmd.addFlags()
	cmd.Flags().AddGoFlagSet(flag.CommandLine) // make sure we get the klog flags
	if err := cmd.Flags().Parse(os.Args); err != nil {
		klog.Fatalf("unable to parse flags: %v", err)
	}

	// if --metrics-max-age is not set, make it equal to --metrics-relist-interval
	if cmd.MetricsMaxAge == 0*time.Second {
		cmd.MetricsMaxAge = cmd.MetricsRelistInterval
	}

	// construct the Prometheus client
	promClient, err := cmd.makePromClient()
	if err != nil {
		klog.Fatalf("unable to construct Prometheus client: %v", err)
	}

	// load the prometheus-adapter metrics discovery config file
	if err := cmd.loadConfig(); err != nil {
		klog.Fatalf("unable to load metrics discovery config: %v", err)
	}

	// stop channel closed on SIGTERM and SIGINT
	stopCh := genericapiserver.SetupSignalHandler()

	// metrics producer: fetches and caches the metric list; the runner periodically refreshes the cache
	cmProvider, err := cmd.makeProvider(promClient, stopCh)
	if err != nil {
		klog.Fatalf("unable to construct custom metrics provider: %v", err)
	}

	// attach the provider to the server, if it's needed
	if cmProvider != nil {
		cmd.WithCustomMetrics(cmProvider)
	}

	// construct the external provider
	emProvider, err := cmd.makeExternalProvider(promClient, stopCh)
	if err != nil {
		klog.Fatalf("unable to construct external metrics provider: %v", err)
	}

	// attach the provider to the server, if it's needed
	if emProvider != nil {
		cmd.WithExternalMetrics(emProvider)
	}

	// attach resource metrics support, if it's needed
	if err := cmd.addResourceMetricsAPI(promClient, stopCh); err != nil {
		klog.Fatalf("unable to install resource metrics API: %v", err)
	}

	// run the server; its custom APIs are exposed to kube-apiserver via the aggregation layer
	if err := cmd.Run(stopCh); err != nil {
		klog.Fatalf("unable to run custom metrics adapter: %v", err)
	}
}
  1. Parse the startup flags
  2. Load the Prometheus rule configuration
  3. Construct the Prometheus client
  4. Build the metric providers from the rules
  5. Register the API endpoints with kube-apiserver

Core code

prometheus-adapter metric rules

These rules define how Prometheus series are discovered, renamed, and queried:

rules:
  - seriesQuery: '{__name__=~"^container_.*",container_name!="POD",namespace!="",pod_name!=""}'
    seriesFilters: []
    resources:
      overrides:
        namespace:
          resource: namespace
        pod_name:
          resource: pod
    name:
      matches: ^container_(.*)_seconds_total$
      as: ""
    metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}[1m])) by (<<.GroupBy>>)
  - seriesQuery: '{__name__=~"^container_.*",container_name!="POD",namespace!="",pod_name!=""}'
    seriesFilters:
      - isNot: ^container_.*_seconds_total$
    resources:
      overrides:
        namespace:
          resource: namespace
        pod_name:
          resource: pod
    name:
      matches: ^container_(.*)_total$
      as: ""
    metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}[1m])) by (<<.GroupBy>>)
  - seriesQuery: '{__name__=~"^container_.*",container_name!="POD",namespace!="",pod_name!=""}'
    seriesFilters:
      - isNot: ^container_.*_total$
    resources:
      overrides:
        namespace:
          resource: namespace
        pod_name:
          resource: pod
    name:
      matches: ^container_(.*)$
      as: ""
    metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}) by (<<.GroupBy>>)
  - seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
    seriesFilters:
      - isNot: .*_total$
    resources:
      template: <<.Resource>>
    name:
      matches: ""
      as: ""
    metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
  - seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
    seriesFilters:
      - isNot: .*_seconds_total
    resources:
      template: <<.Resource>>
    name:
      matches: ^(.*)_total$
      as: ""
    metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)
  - seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
    seriesFilters: []
    resources:
      template: <<.Resource>>
    name:
      matches: ^(.*)_seconds_total$
      as: ""
    metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)
resourceRules:
  cpu:
    containerQuery: sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)
    nodeQuery: sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>, id='/'}[1m])) by (<<.GroupBy>>)
    resources:
      overrides:
        instance:
          resource: node
        namespace:
          resource: namespace
        pod_name:
          resource: pod
    containerLabel: container_name
  memory:
    containerQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>}) by (<<.GroupBy>>)
    nodeQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>,id='/'}) by (<<.GroupBy>>)
    resources:
      overrides:
        instance:
          resource: node
        namespace:
          resource: namespace
        pod_name:
          resource: pod
    containerLabel: container_name
  window: 1m
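The metricsQuery fields above are Go templates that prometheus-adapter renders with the series name, label matchers, and group-by clause before sending the query to Prometheus; the "<<" ">>" delimiters are used so the templates do not collide with Prometheus's own "{{ }}" syntax. A standalone sketch of that expansion (the function name and inputs here are invented for illustration):

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// expandQuery renders a prometheus-adapter style metricsQuery template,
// substituting the series name, label matchers, and group-by columns.
func expandQuery(queryTemplate, series, labelMatchers, groupBy string) (string, error) {
	tmpl, err := template.New("metricsQuery").Delims("<<", ">>").Parse(queryTemplate)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	err = tmpl.Execute(&buf, map[string]string{
		"Series":        series,
		"LabelMatchers": labelMatchers,
		"GroupBy":       groupBy,
	})
	return buf.String(), err
}

func main() {
	q, err := expandQuery(
		`sum(rate(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}[1m])) by (<<.GroupBy>>)`,
		"container_cpu_usage_seconds_total",
		`namespace="default",pod_name=~"web-0|web-1"`,
		"pod_name",
	)
	if err != nil {
		panic(err)
	}
	// Prints the final PromQL sent to Prometheus.
	fmt.Println(q)
}
```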

Generating the metrics

File: prometheus-adapter/cmd/adapter/adapter.go

func (cmd *PrometheusAdapter) makeProvider(promClient prom.Client, stopCh <-chan struct{}) (provider.CustomMetricsProvider, error) {
	if len(cmd.metricsConfig.Rules) == 0 {
		return nil, nil
	}

	if cmd.MetricsMaxAge < cmd.MetricsRelistInterval {
		return nil, fmt.Errorf("max age must not be less than relist interval")
	}
	
	// grab the mapper and dynamic client (used to resolve Kubernetes resource objects)
	mapper, err := cmd.RESTMapper()
	if err != nil {
		return nil, fmt.Errorf("unable to construct RESTMapper: %v", err)
	}
	dynClient, err := cmd.DynamicClient()
	if err != nil {
		return nil, fmt.Errorf("unable to construct Kubernetes client: %v", err)
	}

	// parse the adapter rule config into naming schemes ("namers")
	namers, err := naming.NamersFromConfig(cmd.metricsConfig.Rules, mapper)
	if err != nil {
		return nil, fmt.Errorf("unable to construct naming scheme from metrics rules: %v", err)
	}

	// construct the provider and start it
	cmProvider, runner := cmprov.NewPrometheusProvider(mapper, dynClient, promClient, namers, cmd.MetricsRelistInterval, cmd.MetricsMaxAge)
	runner.RunUntil(stopCh)

	return cmProvider, nil
}



func (l *cachingMetricsLister) RunUntil(stopChan <-chan struct{}) {
	go wait.Until(func() {
		if err := l.updateMetrics(); err != nil {
			utilruntime.HandleError(err)
		}
	}, l.updateInterval, stopChan)
}
1. Build the naming schemes from the adapter rule configuration
2. Periodically refresh the cached list of Prometheus metrics

Querying metrics

The metric query interface

File: /pkg/mod/sigs.k8s.io/custom-metrics-apiserver@v1.22.0/pkg/provider/interfaces.go

type CustomMetricsProvider interface {
	// GetMetricByName fetches a particular metric for a particular object.
	// The namespace will be empty if the metric is root-scoped.
	GetMetricByName(ctx context.Context, name types.NamespacedName, info CustomMetricInfo, metricSelector labels.Selector) (*custom_metrics.MetricValue, error)

	// GetMetricBySelector fetches a particular metric for a set of objects matching
	// the given label selector.  The namespace will be empty if the metric is root-scoped.
	GetMetricBySelector(ctx context.Context, namespace string, selector labels.Selector, info CustomMetricInfo, metricSelector labels.Selector) (*custom_metrics.MetricValueList, error)

	// ListAllMetrics provides a list of all available metrics at
	// the current time.  Note that this is not allowed to return
	// an error, so it is recommended that implementors cache and
	// periodically update this list, instead of querying every time.
	ListAllMetrics() []CustomMetricInfo
}
When a metric is queried (for example by the HPA), the request is served by the implementation in prometheus-adapter:

File: prometheus-adapter/pkg/custom-provider/provider.go

func (p *prometheusProvider) GetMetricBySelector(ctx context.Context, namespace string, selector labels.Selector, info provider.CustomMetricInfo, metricSelector labels.Selector) (*custom_metrics.MetricValueList, error) {
	// list the names of the matching resource objects
	resourceNames, err := helpers.ListObjectNames(p.mapper, p.kubeClient, namespace, selector, info)
	if err != nil {
		klog.Errorf("unable to list matching resource names: %v", err)
		// don't leak implementation details to the user
		return nil, apierr.NewInternalError(fmt.Errorf("unable to list matching resources"))
	}

	// query Prometheus for the metric values
	queryResults, err := p.buildQuery(ctx, info, namespace, metricSelector, resourceNames...)
	if err != nil {
		return nil, err
	}

	// return the resulting metrics
	return p.metricsFor(queryResults, namespace, resourceNames, info, metricSelector)
}
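The usual caller of this code path is the HPA controller. A hypothetical HPA manifest that would exercise GetMetricBySelector looks like this (the Deployment name, metric name, and target value are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests   # a custom metric exposed by prometheus-adapter
        target:
          type: AverageValue
          averageValue: "100"
```

The HPA controller turns this spec into a request against /apis/custom.metrics.k8s.io/v1beta1/namespaces/&lt;ns&gt;/pods/*/http_requests, which the aggregation layer proxies to the adapter and which lands in GetMetricBySelector above.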

prometheus-adapter API registration
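The registration itself is not performed by the adapter's Go code: it is typically done at deploy time by creating an APIService object that claims the custom.metrics.k8s.io group and points it at the adapter's Service, exactly as described in the aggregation-layer section above. A typical manifest (the Service name and namespace are illustrative):

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  group: custom.metrics.k8s.io
  version: v1beta1
  groupPriorityMinimum: 100
  versionPriority: 100
  insecureSkipTLSVerify: true
  service:
    name: prometheus-adapter   # the adapter's Service (illustrative)
    namespace: monitoring
```

Once this object exists, the aggregation layer forwards every request under /apis/custom.metrics.k8s.io/v1beta1/ to the adapter, which serves it via the providers constructed in main.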

References

一文读懂 Kubernetes APIServer 原理 - k8s-kb - 博客园
