Missing container_fs metrics with containerd

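This article describes how, after upgrading a k8s cluster from 1.19 to 1.25, the containerd-related metrics that the monitoring system gets from cadvisor were incomplete. The problem comes from incomplete support for containerd in kubelet and cadvisor. The fix is to replace the metrics served by kubelet with those of a standalone cadvisor: build cadvisor from source so that it matches containerd, package it as a Docker image, and adjust the kube-prometheus configuration.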


Problem

After upgrading k8s from 1.19 to 1.25, the storage-related panels in monitoring were all empty; labels such as container and image had no values.

kubectl get --raw /api/v1/nodes/node5/proxy/metrics/cadvisor | grep container_fs
...
container_fs_writes_total{container="",device="/dev/sda",id="/system.slice/kubelet.service",image="",name="",namespace="",pod=""} 664 1698223746230
container_fs_writes_total{container="",device="/dev/sda2",id="/",image="",name="",namespace="",pod=""} 423 1698223744850
container_fs_writes_total{container="",device="/dev/shm",id="/",image="",name="",namespace="",pod=""} 0 1698223744850
container_fs_writes_total{container="",device="/run",id="/",image="",name="",namespace="",pod=""} 0 1698223744850
container_fs_writes_total{container="",device="/run/lock",id="/",image="",name="",namespace="",pod=""} 0 1698223744850
container_fs_writes_total{container="",device="/run/snapd/ns",id="/",image="",name="",namespace="",pod=""} 0 1698223744850
container_fs_writes_total{container="",device="/sys/fs/cgroup",id="/",image="",name="",namespace="",pod=""} 0 1698223744850
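
The symptom above can also be checked mechanically. The following is a minimal Python sketch, not part of the original setup, that parses lines in the Prometheus text format and counts how many container_fs series have empty container, pod, and namespace labels. The names parse_sample and count_unlabelled are hypothetical, and the regular expressions assume that label values contain no escaped quotes.

```python
import re

# Matches: metric_name{labels} value [timestamp]
SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)(?:\s+\d+)?$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line):
    """Return (name, labels, value) for one exposition line, or None."""
    m = SAMPLE_RE.match(line.strip())
    if m is None:
        return None
    name, raw_labels, value = m.groups()
    return name, dict(LABEL_RE.findall(raw_labels)), float(value)


def count_unlabelled(lines, prefix="container_fs"):
    """Return (total, unlabelled) series counts for metrics starting with prefix."""
    total = unlabelled = 0
    for line in lines:
        sample = parse_sample(line)
        if sample is None or not sample[0].startswith(prefix):
            continue
        total += 1
        labels = sample[1]
        # A series is "unlabelled" when none of the pod-identifying labels is set.
        if not any(labels.get(k) for k in ("container", "pod", "namespace")):
            unlabelled += 1
    return total, unlabelled


# Two lines copied from the kubelet output above.
lines = [
    'container_fs_writes_total{container="",device="/dev/sda",id="/system.slice/kubelet.service",image="",name="",namespace="",pod=""} 664 1698223746230',
    'container_fs_writes_total{container="",device="/dev/sda2",id="/",image="",name="",namespace="",pod=""} 423 1698223744850',
]
print(count_unlabelled(lines))
```

If every container_fs series is counted as unlabelled, as in the output above, only node-level and system-slice cgroups are being reported and no per-container filesystem series exist.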

Cause

Some metrics do not work for internal containerd
In short, cadvisor and kubelet are not fully adapted to containerd.

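Fix

Approach: replace the metrics served by kubelet with those of a standalone cadvisor.

Community image

Image used: gcr.io/cadvisor/cadvisor:v0.45.0-containerd-cri (however, the fs_usage value this image reports is for the whole filesystem, not for the individual container).

Building cadvisor yourself

Use the containerd-cri branch; be careful not to pick the wrong one.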

  • Build
  1. Patch
    In 'build/assets.sh', change go install github.com/kevinburke/go-bindata/go-bindata@latest to go install github.com/kevinburke/go-bindata@latest; otherwise the build fails with the error shown below.
# make docker-build
>> building assets
go: downloading github.com/kevinburke/go-bindata v1.1.0
go: github.com/kevinburke/go-bindata/go-bindata@latest: module github.com/kevinburke/go-bindata@latest found (v1.1.0), but does not contain package github.com/kevinburke/go-bindata/go-bindata
make: *** [Makefile:65: assets] Error 1
make: *** [Makefile:75: docker-build] Error 2

# git diff build/assets.sh
diff --git a/build/assets.sh b/build/assets.sh
index 7faf3300..66f3dced 100755
--- a/build/assets.sh
+++ b/build/assets.sh
@@ -30,7 +30,7 @@ FORCE="${FORCE:-}" # Force assets to be rebuilt if FORCE=true
 
 # Install while in a temp dir to avoid polluting go.mod/go.sum
 pushd "${TMPDIR:-/tmp}" > /dev/null
-go install github.com/kevinburke/go-bindata/go-bindata@latest
+go install github.com/kevinburke/go-bindata@latest
 popd > /dev/null
 
 build_asset () {

# make docker-build
# _output/cadvisor --version
cAdvisor version v0.45.0-containerd-cri-dirty (f31dffa9)

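  2. Docker image
    Running make docker-image directly produced a pile of errors that I did not bother to chase down, so the image is built on top of the official one with the compiled binary copied in.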
FROM gcr.io/cadvisor/cadvisor:v0.45.0

COPY _output/cadvisor /usr/bin/cadvisor

docker build -t cadvisor:v0.45.0-containerd-cri .

Deployment

The Prometheus stack is deployed with kube-prometheus.

  1. cadvisor.yaml
    The YAML that defines the cadvisor service is adapted from: BUG, RKE1, Monitoring V2 RKE1 1.24 seems to be omitting relevant cadvisor container labels and metric series that break Monitoring V2 dashboards
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cadvisor
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  name: cadvisor
  labels:
    app: cadvisor
  namespace: kube-system
spec:
  type: NodePort
  selector:
    name: cadvisor
  ports:
  - name: cadvisor
    port: 8080
    protocol: TCP
    targetPort: 8080
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: kube-system
  annotations:
      seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
  selector:
    matchLabels:
      name: cadvisor
  template:
    metadata:
      labels:
        name: cadvisor
    spec:
      serviceAccountName: cadvisor
      containers:
      - name: cadvisor
        image: cadvisor:v0.45.0-containerd-cri
        args:
        - --housekeeping_interval=10s
        - --max_housekeeping_interval=15s
        - --event_storage_event_limit=default=0
        - --event_storage_age_limit=default=0
        - --enable_metrics=app,cpu,disk,diskIO,memory,network,process
        - --docker_only
        - --store_container_labels=false
        - --whitelisted_container_labels=io.kubernetes.container.name,io.kubernetes.pod.name,io.kubernetes.pod.namespace
        resources:
          requests:
            memory: 400Mi
            cpu: 400m
          limits:
            memory: 2000Mi
            cpu: 800m
        volumeMounts:
        - name: rootfs
          mountPath: /rootfs
          readOnly: true
        - name: var-run
          mountPath: /var/run
          readOnly: true
        - name: sys
          mountPath: /sys
          readOnly: true
        - name: docker
          mountPath: /var/lib/docker
          readOnly: true
        - name: disk
          mountPath: /dev/disk
          readOnly: true
        ports:
          - name: http
            containerPort: 8080
            protocol: TCP
      automountServiceAccountToken: false
      terminationGracePeriodSeconds: 30
      volumes:
      - name: rootfs
        hostPath:
          path: /
      - name: var-run
        hostPath:
          path: /var/run
      - name: sys
        hostPath:
          path: /sys
      - name: docker
        hostPath:
          path: /var/lib/docker
      - name: disk
        hostPath:
          path: /dev/disk
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    name: cadvisor
  name: cadvisor
  namespace: monitoring
spec:
  endpoints:
  - metricRelabelings:
    - action: replace
      sourceLabels:
      - container_label_io_kubernetes_pod_name
      targetLabel: pod
    - action: replace
      sourceLabels:
      - container_label_io_kubernetes_container_name
      targetLabel: container
    - action: replace
      sourceLabels:
      - container_label_io_kubernetes_pod_namespace
      targetLabel: namespace
    - action: labeldrop
      regex: container_label_io_kubernetes_pod_name
    - action: labeldrop
      regex: container_label_io_kubernetes_container_name
    - action: labeldrop
      regex: container_label_io_kubernetes_pod_namespace
    port: cadvisor
    relabelings:
    - replacement: {{ product.global.cluster.name }}
      targetLabel: k8scluster
    - action: replace
      sourceLabels:
      - __meta_kubernetes_pod_node_name
      targetLabel: node
    - action: replace
      replacement: /metrics/cadvisor
      sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    - action: replace
      replacement: kubelet
      sourceLabels:
      - job
      targetLabel: job
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app: cadvisor

  2. Deploy
    kubectl delete -f kubernetesControlPlane-serviceMonitorKubelet.yaml
    kubectl apply -f cadvisor.yaml
# kubectl -n kube-system  exec -it cadvisor-92nfp -- cadvisor --version
cAdvisor version v0.45.0-containerd-cri-dirty (f31dffa9)
### 10.233.100.186 is the pod IP of one of the cadvisor pods
# curl 10.233.100.186:8080/metrics | grep container_fs
......
container_fs_writes_total{container_label_io_kubernetes_container_name="vpn",container_label_io_kubernetes_pod_name="vpnserver-hnbkg",container_label_io_kubernetes_pod_namespace="default",device="/dev/mapper/ubuntu--vg-ubuntu--lv",id="/kubepods/burstable/podc2d8ad76-be77-4d03-ab67-3a8d4756cc11/89bb34b723fd0eb0a65a065d4eff686620c3da5ecbbc8f2cf0aa80838aa0718f",image="172.30.3.150/cloud/vpnserver@sha256:54f9ad7a9fa5ecfda183dfaca274358cc47f30706058a3f500556e49db0b9d39",name="k8s_vpnserver-hnbkg_default_c2d8ad76-be77-4d03-ab67-3a8d4756cc11_0"} 5.1353627e+07 1698227386170
container_fs_writes_total{container_label_io_kubernetes_container_name="vpn",container_label_io_kubernetes_pod_name="vpnserver-hnbkg",container_label_io_kubernetes_pod_namespace="default",device="/dev/sda",id="/kubepods/burstable/podc2d8ad76-be77-4d03-ab67-3a8d4756cc11/89bb34b723fd0eb0a65a065d4eff686620c3da5ecbbc8f2cf0aa80838aa0718f",image="172.30.3.150/cloud/vpnserver@sha256:54f9ad7a9fa5ecfda183dfaca274358cc47f30706058a3f500556e49db0b9d39",name="k8s_vpnserver-hnbkg_default_c2d8ad76-be77-4d03-ab67-3a8d4756cc11_0"} 0 1698227386170
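
The raw series above still carry the container_label_io_kubernetes_* labels; the metricRelabelings in cadvisor.yaml are what turn them into the pod, container, and namespace labels that dashboards expect. The Python sketch below is a simplified model of those rules, not Prometheus code. It assumes the default replace behaviour (regex (.*), replacement $1, and an empty result removes the target label), and relabel is a hypothetical helper name.

```python
import re

# (source label, target label) pairs mirroring the "replace" rules in cadvisor.yaml
REPLACE_RULES = [
    ("container_label_io_kubernetes_pod_name", "pod"),
    ("container_label_io_kubernetes_container_name", "container"),
    ("container_label_io_kubernetes_pod_namespace", "namespace"),
]
# Label-name patterns mirroring the "labeldrop" rules in cadvisor.yaml
DROP_PATTERNS = [source for source, _ in REPLACE_RULES]


def relabel(labels):
    """Apply the replace rules, then the labeldrop rules, to one label set."""
    out = dict(labels)
    for source, target in REPLACE_RULES:
        value = out.get(source, "")
        if value:
            out[target] = value
        else:
            # Assumption: an empty replacement result deletes the target label.
            out.pop(target, None)
    for pattern in DROP_PATTERNS:
        # labeldrop matches the pattern against label names, fully anchored.
        out = {k: v for k, v in out.items() if not re.fullmatch(pattern, k)}
    return out


# Labels taken from the cadvisor output above (id, image, and name omitted for brevity).
raw = {
    "container_label_io_kubernetes_container_name": "vpn",
    "container_label_io_kubernetes_pod_name": "vpnserver-hnbkg",
    "container_label_io_kubernetes_pod_namespace": "default",
    "device": "/dev/sda",
}
print(relabel(raw))
```

After these rules run, the series has the same label names as the ones kubelet used to serve, which is why the ServiceMonitor can also rewrite job to kubelet and keep existing dashboards working.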