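1. Configure Kubernetes API access
For a Prometheus server running outside the cluster to use kubernetes_sd_configs for service discovery, it must be able to reach the Kubernetes API server and hold sufficient permissions.
1.1 Create a Kubernetes ServiceAccount and grant permissions
First, create a ServiceAccount, a ClusterRole, and a ClusterRoleBinding in the cluster so that Prometheus can query the Kubernetes API for service discovery.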
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/metrics
      - nodes/proxy
      - services
      - services/proxy  # needed to scrape through the API server's service proxy
      - endpoints
      - pods
      - pods/proxy      # needed to scrape through the API server's pod proxy
    verbs: ["get", "list", "watch"]
  # Ingress lives in networking.k8s.io; the extensions group was removed in Kubernetes 1.22
  - apiGroups:
      - networking.k8s.io
    resources:
      - ingresses
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: prom
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: prom
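Once the manifests are applied, it can be worth confirming that the binding took effect before touching Prometheus. The sketch below uses kubectl auth can-i with impersonation. It assumes your own kubectl user is allowed to impersonate ServiceAccounts (cluster admins usually are); the helper name check_rbac and the manifest file name in the usage comment are made up for this example.

```shell
#!/usr/bin/env bash
# check_rbac: hypothetical helper, not part of kubectl or Prometheus.
# Asks the API server whether the prometheus ServiceAccount (namespace prom)
# may list each resource type that service discovery reads.
check_rbac() {
  local sa="system:serviceaccount:prom:prometheus"
  local resource answer status=0
  for resource in nodes services endpoints pods; do
    # `kubectl auth can-i` prints "yes" or "no" and exits non-zero on "no",
    # so the exit code is ignored and the printed answer is used instead.
    answer=$(kubectl auth can-i list "$resource" --as="$sa" 2>/dev/null || true)
    echo "list ${resource}: ${answer:-unknown}"
    [ "$answer" = "yes" ] || status=1
  done
  return "$status"
}

# Usage (requires a working kubeconfig):
#   kubectl apply -f prometheus-rbac.yaml   # file name is an assumption
#   check_rbac
```

The function returns a non-zero status if any of the four checks is not answered with "yes", which makes it easy to use in a provisioning script.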
1.2 Obtain credentials for the Kubernetes API server
- Use kubectl to retrieve the ServiceAccount token:
For Kubernetes versions before 1.24:
kubectl -n prom get secret $(kubectl -n prom get sa/prometheus -o jsonpath="{.secrets[0].name}") -o jsonpath="{.data.token}" | base64 --decode
For Kubernetes 1.24 and later:
Create a short-lived token (it expires):
kubectl create token prometheus -n prom
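Create a long-lived token
From Kubernetes 1.24 on, generated tokens are short-lived by default. To obtain a token that does not expire, create a Secret bound to the ServiceAccount; a token stored this way has no expiry time. The steps are as follows.
1. Create the ServiceAccount
First, make sure a ServiceAccount named **prometheus** exists. If it does not, create it with: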
kubectl create serviceaccount prometheus -n prom
2. Create a Secret bound to the ServiceAccount
Next, create a **Secret** for this ServiceAccount. This **Secret** will hold the long-lived token.
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-token
  namespace: prom
  annotations:
    kubernetes.io/service-account.name: "prometheus"
type: kubernetes.io/service-account-token
Save the YAML above as **prometheus-token-secret.yaml** and apply it:
kubectl apply -f prometheus-token-secret.yaml
3. Retrieve the long-lived token
Once the Secret is applied, Kubernetes automatically generates a long-lived token for the **prometheus** ServiceAccount. Retrieve it with:
kubectl get secret prometheus-token -n prom -o go-template='{{.data.token | base64decode}}'
The command prints a long string, which is the token.
Store the token in a file:
mkdir -p /etc/prometheus/token
kubectl get secret prometheus-token -n prom -o go-template='{{.data.token | base64decode}}' > /etc/prometheus/token/prometheus_bearer_token
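ServiceAccount tokens are JWTs, so one way to tell a short-lived token from a long-lived one is to look at the payload: tokens issued by kubectl create token carry an exp (expiry) claim, whereas, as far as I know, tokens stored in a kubernetes.io/service-account-token Secret do not. The following sketch decodes the payload with standard shell tools; jwt_payload and token_expires are hypothetical helper names, not existing commands.

```shell
#!/usr/bin/env bash
# jwt_payload: print the decoded JSON payload (second segment) of a JWT.
jwt_payload() {
  local payload
  # JWT segments are base64url without padding: map the URL-safe alphabet
  # back to standard base64 and restore the '=' padding before decoding.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 --decode
}

# token_expires: succeed if the token's payload contains an "exp" claim.
token_expires() {
  jwt_payload "$1" | grep -q '"exp"'
}

# Usage:
#   token=$(cat /etc/prometheus/token/prometheus_bearer_token)
#   jwt_payload "$token"; echo
#   token_expires "$token" && echo "token has an expiry" || echo "no exp claim"
```

Since the file holds a credential, it is also sensible to restrict it to the user that runs Prometheus (for example with chmod 600).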
- Note the address of the Kubernetes API server:
kubectl cluster-info
Create the CA certificate file:
mkdir -p /etc/prometheus/certs
kubectl get configmap -n kube-system kube-root-ca.crt -o jsonpath='{.data.ca\.crt}' > /etc/prometheus/certs/ca.crt
chown prometheus:prometheus /etc/prometheus/certs/ca.crt
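Before wiring these files into Prometheus, a direct request to the API server shows whether the token and the CA certificate work together. This is a minimal sketch: api_get is a hypothetical helper, and the TOKEN_FILE and CA_FILE variables are introduced here only so the paths used above can be overridden.

```shell
#!/usr/bin/env bash
# api_get: hypothetical helper that sends an authenticated GET to the API server.
#   $1 = API server base URL (as printed by `kubectl cluster-info`)
#   $2 = request path, for example /api/v1/nodes
api_get() {
  local api_server="$1" path="$2"
  local token_file="${TOKEN_FILE:-/etc/prometheus/token/prometheus_bearer_token}"
  local ca_file="${CA_FILE:-/etc/prometheus/certs/ca.crt}"
  local token
  token=$(cat "$token_file") || return 1
  # --cacert verifies the server certificate against the cluster CA, so the
  # insecure -k flag is not needed; --fail turns HTTP errors (401, 403, ...)
  # into a non-zero exit status.
  curl --silent --show-error --fail \
    --cacert "$ca_file" \
    -H "Authorization: Bearer ${token}" \
    "${api_server}${path}"
}

# Usage:
#   api_get https://139.196.12.198:6443 /api/v1/nodes | head
```

A 401 or 403 here points at the token or the RBAC rules. A certificate error usually means the address you are using is not listed in the API server certificate's subject alternative names.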
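2. Configure kubernetes_sd_configs in Prometheus
In the Prometheus configuration file prometheus.yml, configure kubernetes_sd_configs to use the API credentials obtained above and scrape the metrics that the exporters expose. Because Prometheus runs outside the cluster, it cannot easily reach addresses inside it. Exposing the exporters through a LoadBalancer or an Ingress is possible, but it is often unsuitable: both load-balance requests, whereas metrics must be scraped from every instance individually. This setup therefore discovers targets through the API server and scrapes them through the API server's proxy.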
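To make the proxy approach concrete: the API server exposes a proxy subresource for pods, services and nodes, and a request to a pod's or node's proxy path is forwarded to exactly that pod or node. The sketch below only assembles such URLs; proxy_url is a hypothetical helper, and the node name node-1 in the usage comment is invented.

```shell
#!/usr/bin/env bash
# proxy_url: hypothetical helper that builds an API server proxy URL.
#   $1 = API server base URL      $2 = kind: pod | service | node
#   $3 = namespace (ignored for nodes)
#   $4 = object name, optionally followed by ":port"
#   $5 = path on the target without a leading slash (default: metrics)
proxy_url() {
  local api_server="$1" kind="$2" namespace="$3" name="$4" path="${5:-metrics}"
  case "$kind" in
    pod)     echo "${api_server}/api/v1/namespaces/${namespace}/pods/${name}/proxy/${path}" ;;
    service) echo "${api_server}/api/v1/namespaces/${namespace}/services/${name}/proxy/${path}" ;;
    node)    echo "${api_server}/api/v1/nodes/${name}/proxy/${path}" ;;
    *)       echo "unknown kind: ${kind}" >&2; return 1 ;;
  esac
}

# Usage:
#   proxy_url https://139.196.12.198:6443 service prom kube-state-metrics:8080
#   proxy_url https://139.196.12.198:6443 node "" node-1 metrics/cadvisor
```

In the Prometheus configuration the same split appears as two labels: __address__ holds the API server's host and port, and __metrics_path__ holds everything from /api/v1 onward.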
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  scrape_timeout: 10s
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
# Alertmanager configuration
alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets:
            - localhost:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - rules/alert-rules-*.yml
  - rules/record-rules-*.yml
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
  # Monitors Kubernetes resource objects (nodes, pods, services, endpoints, ingresses, and so on)
  - job_name: 'kube-state-metrics'
    scheme: https
    metrics_path: /api/v1/namespaces/prom/services/kube-state-metrics:8080/proxy/metrics
    #metrics_path: /api/v1/namespaces/prom/services/kube-state-metrics:http/proxy/metrics
    kubernetes_sd_configs:
      - api_server: 'https://139.196.12.198:6443'
        role: pod
        # Credentials used for service discovery
        bearer_token_file: /etc/prometheus/token/prometheus_bearer_token
        tls_config:
          ca_file: /etc/prometheus/certs/ca.crt
    # Credentials used for the scrape itself
    bearer_token_file: /etc/prometheus/token/prometheus_bearer_token
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
    relabel_configs:
      # Send every scrape to the API server, which proxies it to the service
      - separator: ;
        regex: (.*)
        target_label: __address__
        replacement: 139.196.12.198:6443
        action: replace
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # The prometheus.io/scheme, prometheus.io/path and prometheus.io/port annotations
      # are deliberately not applied here: rewriting __scheme__, __metrics_path__ or
      # __address__ from them would point the scrape away from the API server proxy.
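  # Monitors the Kubernetes API servers
  - job_name: 'kubernetes-apiservers'
    kubernetes_sd_configs:
      # Outside the cluster, api_server and its credentials must be set explicitly
      - api_server: 'https://139.196.12.198:6443'
        role: endpoints
        bearer_token_file: /etc/prometheus/token/prometheus_bearer_token
        tls_config:
          ca_file: /etc/prometheus/certs/ca.crt
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
    bearer_token_file: /etc/prometheus/token/prometheus_bearer_token
    relabel_configs:
      # If the discovered endpoint address is not reachable from outside the cluster,
      # rewrite __address__ to the external address as the other jobs do.
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https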
  # Monitors cAdvisor, which is built into the kubelet
  - job_name: 'kubernetes-cadvisor'
    honor_timestamps: true # Prometheus keeps the timestamps returned by the target
    metrics_path: /metrics
    scheme: https
    kubernetes_sd_configs:
      - api_server: 'https://139.196.12.198:6443'
        role: node
        bearer_token_file: /etc/prometheus/token/prometheus_bearer_token
        tls_config:
          ca_file: /etc/prometheus/certs/ca.crt
    bearer_token_file: /etc/prometheus/token/prometheus_bearer_token
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - separator: ;
        regex: (.*)
        target_label: __address__
        replacement: 139.196.12.198:6443
        action: replace
      - source_labels: [__meta_kubernetes_node_name]
        separator: ;
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
        action: replace
  # Monitors pods
  - job_name: 'k8s-pods-metrics'
    scheme: https
    kubernetes_sd_configs:
      - api_server: 'https://139.196.12.198:6443'
        role: pod
        bearer_token_file: /etc/prometheus/token/prometheus_bearer_token
        tls_config:
          ca_file: /etc/prometheus/certs/ca.crt
    bearer_token_file: /etc/prometheus/token/prometheus_bearer_token
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
    relabel_configs:
      - separator: ;
        regex: (.*)
        target_label: __address__
        replacement: 139.196.12.198:6443
        action: replace
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_name]
        action: replace
        target_label: __metrics_path__
        replacement: /api/v1/namespaces/${1}/pods/${2}/proxy/metrics
        regex: (.+);(.+)
  # AlertManager
  - job_name: 'alertmanager'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    metrics_path: /metrics
    static_configs:
      - targets:
          - localhost:9093
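2.1 Optional: filter targets with relabel_configs
You can use relabel_configs to further filter or rewrite the discovered targets, for example to scrape only node-exporter metrics. The rule below keeps only targets whose Kubernetes node label name equals node-exporter. The __meta_kubernetes_node_label_* labels exist only with role: node; for role: pod the corresponding label is __meta_kubernetes_pod_label_name.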
relabel_configs:
  - source_labels: [__meta_kubernetes_node_label_name]
    action: keep
    regex: node-exporter
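3. Verify the configuration
- Make sure the Prometheus configuration file is syntactically valid, then restart the Prometheus service.
- Open the Prometheus web UI (http://<prometheus-server>/targets) and check that the node-exporter targets were discovered.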
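You can also test the token directly against a kubelet's cAdvisor endpoint (substitute a real token and node IP):
curl -k -H "Authorization: Bearer xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" https://10.0.0.100:10250/metrics/cadvisor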
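The checks in this section can be scripted. The sketch below validates the file with promtool check config (promtool ships with Prometheus) and then counts targets by health state through the /api/v1/targets HTTP API. It assumes Prometheus listens on localhost:9090 and relies on the API returning compact JSON; the helper names and the systemd unit name in the usage comment are assumptions.

```shell
#!/usr/bin/env bash
# check_config: validate the configuration file with promtool.
check_config() {
  promtool check config "${1:-/etc/prometheus/prometheus.yml}"
}

# target_health: count active targets by health state (up, down, unknown).
target_health() {
  local prometheus_url="${1:-http://localhost:9090}"
  # The grep pattern assumes compact JSON such as "health":"up"
  curl --silent --fail "${prometheus_url}/api/v1/targets?state=active" \
    | grep -o '"health":"[a-z]*"' \
    | sort | uniq -c
}

# Usage:
#   check_config && sudo systemctl restart prometheus   # unit name is an assumption
#   target_health
```

Any target reported as down has a lastError field in the same API response, which usually names the cause (for example a 403 from the API server proxy).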