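Problem
After upgrading k8s from 1.19 to 1.25, the storage-related part of the monitoring dashboards showed only empty values; labels such as container and image had no values.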
kubectl get --raw /api/v1/nodes/node5/proxy/metrics/cadvisor | grep container_fs
...
container_fs_writes_total{container="",device="/dev/sda",id="/system.slice/kubelet.service",image="",name="",namespace="",pod=""} 664 1698223746230
container_fs_writes_total{container="",device="/dev/sda2",id="/",image="",name="",namespace="",pod=""} 423 1698223744850
container_fs_writes_total{container="",device="/dev/shm",id="/",image="",name="",namespace="",pod=""} 0 1698223744850
container_fs_writes_total{container="",device="/run",id="/",image="",name="",namespace="",pod=""} 0 1698223744850
container_fs_writes_total{container="",device="/run/lock",id="/",image="",name="",namespace="",pod=""} 0 1698223744850
container_fs_writes_total{container="",device="/run/snapd/ns",id="/",image="",name="",namespace="",pod=""} 0 1698223744850
container_fs_writes_total{container="",device="/sys/fs/cgroup",id="/",image="",name="",namespace="",pod=""} 0 1698223744850
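A quick way to quantify the symptom is to count how many container_fs_* series carry an empty container label. The helper below is a minimal sketch: the function name count_empty_container is made up for illustration, and it only assumes that metrics in the Prometheus text format shown above arrive on stdin.

```shell
# Hypothetical helper: count container_fs_* series whose container label is empty.
# Reads Prometheus text-format metrics on stdin and prints the count.
count_empty_container() {
  # "|| true" keeps the exit status at 0 when grep -c finds no matches (it still prints 0).
  grep '^container_fs_' | grep -c -E '[{,]container=""' || true
}

# Example against a live node (node5 is the node name used above):
# kubectl get --raw /api/v1/nodes/node5/proxy/metrics/cadvisor | count_empty_container
```

A count equal to the total number of container_fs_* series means that none of them can be attributed to a container.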
Cause
Some metrics do not work for internal containerd
In short, cadvisor/kubelet has not been fully adapted to containerd.
Fix
Approach: run a standalone cadvisor to replace the metrics provided by kubelet.
Community image
Image used: gcr.io/cadvisor/cadvisor:v0.45.0-containerd-cri (however, this image reports fs_usage for the whole filesystem rather than per-container values)
Building cadvisor yourself
Use the containerd-cri branch; be careful not to pick the wrong branch.
- Build
- Modification
In build/assets.sh, change
go install github.com/kevinburke/go-bindata/go-bindata@latest
to
go install github.com/kevinburke/go-bindata@latest
Otherwise the build fails:
###
# make docker-build
>> building assets
go: downloading github.com/kevinburke/go-bindata v1.1.0
go: github.com/kevinburke/go-bindata/go-bindata@latest: module github.com/kevinburke/go-bindata@latest found (v1.1.0), but does not contain package github.com/kevinburke/go-bindata/go-bindata
make: *** [Makefile:65: assets] Error 1
make: *** [Makefile:75: docker-build] Error 2
# git diff build/assets.sh
diff --git a/build/assets.sh b/build/assets.sh
index 7faf3300..66f3dced 100755
--- a/build/assets.sh
+++ b/build/assets.sh
@@ -30,7 +30,7 @@ FORCE="${FORCE:-}" # Force assets to be rebuilt if FORCE=true
 # Install while in a temp dir to avoid polluting go.mod/go.sum
 pushd "${TMPDIR:-/tmp}" > /dev/null
-go install github.com/kevinburke/go-bindata/go-bindata@latest
+go install github.com/kevinburke/go-bindata@latest
 popd > /dev/null
 build_asset () {
# make docker-build
# _output/cadvisor --version
cAdvisor version v0.45.0-containerd-cri-dirty (f31dffa9)
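The one-line change to build/assets.sh can also be applied non-interactively. The following is a minimal sketch, assuming GNU sed (for -i) and that it is run from the root of the cadvisor checkout; the function name patch_assets is hypothetical.

```shell
# Hypothetical helper: rewrite the go-bindata install path in the given file.
# Assumes GNU sed; BSD/macOS sed needs `-i ''` instead of `-i`.
patch_assets() {
  sed -i 's#github.com/kevinburke/go-bindata/go-bindata@latest#github.com/kevinburke/go-bindata@latest#' "$1"
}

# Example, from the cadvisor source root:
# patch_assets build/assets.sh && make docker-build
```

Running it a second time is harmless, because the old path no longer occurs in the file.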
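- Docker image
Running make docker-image directly produced a pile of errors that were not worth chasing, so the self-built binary is simply copied onto the official image: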
FROM gcr.io/cadvisor/cadvisor:v0.45.0
COPY _output/cadvisor /usr/bin/cadvisor
docker build -t cadvisor:v0.45.0-containerd-cri .
Deployment
The Prometheus stack is deployed with kube-prometheus.
- cadvisor.yaml
The YAML that defines the cadvisor service is adapted from: BUG, RKE1, Monitoring V2 RKE1 1.24 seems to be omitting relevant cadvisor container labels and metric series that break Monitoring V2 dashboards
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cadvisor
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  name: cadvisor
  labels:
    app: cadvisor
  namespace: kube-system
spec:
  type: NodePort
  selector:
    name: cadvisor
  ports:
  - name: cadvisor
    port: 8080
    protocol: TCP
    targetPort: 8080
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: kube-system
  annotations:
    seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
  selector:
    matchLabels:
      name: cadvisor
  template:
    metadata:
      labels:
        name: cadvisor
    spec:
      serviceAccountName: cadvisor
      containers:
      - name: cadvisor
        image: cadvisor:v0.45.0-containerd-cri
        args:
        - --housekeeping_interval=10s
        - --max_housekeeping_interval=15s
        - --event_storage_event_limit=default=0
        - --event_storage_age_limit=default=0
        - --enable_metrics=app,cpu,disk,diskIO,memory,network,process
        - --docker_only
        - --store_container_labels=false
        - --whitelisted_container_labels=io.kubernetes.container.name,io.kubernetes.pod.name,io.kubernetes.pod.namespace
        resources:
          requests:
            memory: 400Mi
            cpu: 400m
          limits:
            memory: 2000Mi
            cpu: 800m
        volumeMounts:
        - name: rootfs
          mountPath: /rootfs
          readOnly: true
        - name: var-run
          mountPath: /var/run
          readOnly: true
        - name: sys
          mountPath: /sys
          readOnly: true
        - name: docker
          mountPath: /var/lib/docker
          readOnly: true
        - name: disk
          mountPath: /dev/disk
          readOnly: true
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
      automountServiceAccountToken: false
      terminationGracePeriodSeconds: 30
      volumes:
      - name: rootfs
        hostPath:
          path: /
      - name: var-run
        hostPath:
          path: /var/run
      - name: sys
        hostPath:
          path: /sys
      - name: docker
        hostPath:
          path: /var/lib/docker
      - name: disk
        hostPath:
          path: /dev/disk
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    name: cadvisor
  name: cadvisor
  namespace: monitoring
spec:
  endpoints:
  - metricRelabelings:
    - action: replace
      sourceLabels:
      - container_label_io_kubernetes_pod_name
      targetLabel: pod
    - action: replace
      sourceLabels:
      - container_label_io_kubernetes_container_name
      targetLabel: container
    - action: replace
      sourceLabels:
      - container_label_io_kubernetes_pod_namespace
      targetLabel: namespace
    - action: labeldrop
      regex: container_label_io_kubernetes_pod_name
    - action: labeldrop
      regex: container_label_io_kubernetes_container_name
    - action: labeldrop
      regex: container_label_io_kubernetes_pod_namespace
    port: cadvisor
    relabelings:
    - replacement: {{ product.global.cluster.name }}
      targetLabel: k8scluster
    - action: replace
      sourceLabels:
      - __meta_kubernetes_pod_node_name
      targetLabel: node
    - action: replace
      replacement: /metrics/cadvisor
      sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    - action: replace
      replacement: kubelet
      sourceLabels:
      - job
      targetLabel: job
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app: cadvisor
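To see what the metricRelabelings in the ServiceMonitor do to a series, the rename-and-drop can be imitated on a raw metrics line with sed. This is only an illustration of the label mapping under simplified assumptions, not how Prometheus implements relabeling; the function name relabel_preview is hypothetical.

```shell
# Hypothetical helper: imitate the replace + labeldrop rules by renaming
# the three container_label_io_kubernetes_* labels in text-format metrics on stdin.
relabel_preview() {
  sed -e 's/container_label_io_kubernetes_pod_name=/pod=/' \
      -e 's/container_label_io_kubernetes_container_name=/container=/' \
      -e 's/container_label_io_kubernetes_pod_namespace=/namespace=/'
}

# Example (10.233.100.186 stands for the pod IP of one cadvisor pod):
# curl -s 10.233.100.186:8080/metrics | grep container_fs | relabel_preview
```

The output carries the pod, container and namespace label names that dashboards built for the kubelet cadvisor endpoint expect.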
- Deploy
kubectl delete -f kubernetesControlPlane-serviceMonitorKubelet.yaml
kubectl apply -f cadvisor.yaml
# kubectl -n kube-system exec -it cadvisor-92nfp -- cadvisor --version
cAdvisor version v0.45.0-containerd-cri-dirty (f31dffa9)
### 10.233.100.186 is the pod IP of one of the cadvisor pods
# curl 10.233.100.186:8080/metrics | grep container_fs
......
container_fs_writes_total{container_label_io_kubernetes_container_name="vpn",container_label_io_kubernetes_pod_name="vpnserver-hnbkg",container_label_io_kubernetes_pod_namespace="default",device="/dev/mapper/ubuntu--vg-ubuntu--lv",id="/kubepods/burstable/podc2d8ad76-be77-4d03-ab67-3a8d4756cc11/89bb34b723fd0eb0a65a065d4eff686620c3da5ecbbc8f2cf0aa80838aa0718f",image="172.30.3.150/cloud/vpnserver@sha256:54f9ad7a9fa5ecfda183dfaca274358cc47f30706058a3f500556e49db0b9d39",name="k8s_vpnserver-hnbkg_default_c2d8ad76-be77-4d03-ab67-3a8d4756cc11_0"} 5.1353627e+07 1698227386170
container_fs_writes_total{container_label_io_kubernetes_container_name="vpn",container_label_io_kubernetes_pod_name="vpnserver-hnbkg",container_label_io_kubernetes_pod_namespace="default",device="/dev/sda",id="/kubepods/burstable/podc2d8ad76-be77-4d03-ab67-3a8d4756cc11/89bb34b723fd0eb0a65a065d4eff686620c3da5ecbbc8f2cf0aa80838aa0718f",image="172.30.3.150/cloud/vpnserver@sha256:54f9ad7a9fa5ecfda183dfaca274358cc47f30706058a3f500556e49db0b9d39",name="k8s_vpnserver-hnbkg_default_c2d8ad76-be77-4d03-ab67-3a8d4756cc11_0"} 0 1698227386170
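As a follow-up check, the pod names that now appear in the container_fs_* series can be listed directly from the raw endpoint. This is a minimal sketch; list_fs_pods is a hypothetical helper name, and it assumes the container_label_io_kubernetes_pod_name label shown in the output above.

```shell
# Hypothetical helper: print the distinct pod names found in container_fs_* series.
# Reads Prometheus text-format metrics on stdin.
list_fs_pods() {
  grep '^container_fs_' \
    | sed -n 's/.*container_label_io_kubernetes_pod_name="\([^"]*\)".*/\1/p' \
    | sort -u
}

# Example (10.233.100.186 is the cadvisor pod IP used above):
# curl -s 10.233.100.186:8080/metrics | list_fs_pods
```

An empty result would indicate that the labels are still missing, as they were on the kubelet endpoint.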