[Practice Notes] How to Collect Docker Container Performance Metrics with Telegraf

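This article describes how to collect Docker container performance metrics in an environment where Grafana + Telegraf + InfluxDB is already deployed, by configuring Telegraf's [[inputs.docker]] plugin and resolving a permission problem. The fix involves editing the Docker configuration file to open a TCP listening port and resolving a socket conflict, after which data is collected successfully.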

Problem Background

Goal: monitor the performance of Docker containers
Background: GTI (Grafana + Telegraf + InfluxDB) is already deployed
Problem: how can Telegraf collect performance metrics from Docker containers?

Solution Process

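Telegraf ships with many plugins for collecting metrics from different kinds of targets. The configuration file /etc/telegraf/telegraf.conf contains a [[inputs.docker]] section; uncomment it and configure it appropriately. The fairly complete set of options is as follows: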

[[inputs.docker]]
#   ## Docker Endpoint
#   ##   To use TCP, set endpoint = "tcp://[ip]:[port]"
#   ##   To use environment variables (ie, docker-machine), set endpoint = "ENV"
#   endpoint = "unix:///var/run/docker.sock"
#
#   ## Set to true to collect Swarm metrics(desired_replicas, running_replicas)
#   gather_services = false
#
#   ## Only collect metrics for these containers, collect all if empty
#   container_names = []
#
#   ## Set the source tag for the metrics to the container ID hostname, eg first 12 chars
#   source_tag = false
#
#   ## Containers to include and exclude. Globs accepted.
#   ## Note that an empty array for both will include all containers
#   container_name_include = []
#   container_name_exclude = []
#
#   ## Container states to include and exclude. Globs accepted.
#   ## When empty only containers in the "running" state will be captured.
#   ## example: container_state_include = ["created", "restarting", "running", "removing", "paused", "exited", "dead"]
#   ## example: container_state_exclude = ["created", "restarting", "running", "removing", "paused", "exited", "dead"]
#   # container_state_include = []
#   # container_state_exclude = []
#
#   ## Timeout for docker list, info, and stats commands
#   timeout = "5s"
#
#   ## Whether to report for each container per-device blkio (8:0, 8:1...),
#   ## network (eth0, eth1, ...) and cpu (cpu0, cpu1, ...) stats or not.
#   ## Usage of this setting is discouraged since it will be deprecated in favor of 'perdevice_include'.
#   ## Default value is 'true' for backwards compatibility, please set it to 'false' so that 'perdevice_include' setting
#   ## is honored.
#   perdevice = false
#
#   ## Specifies for which classes a per-device metric should be issued
#   ## Possible values are 'cpu' (cpu0, cpu1, ...), 'blkio' (8:0, 8:1, ...) and 'network' (eth0, eth1, ...)
#   ## Please note that this setting has no effect if 'perdevice' is set to 'true'
#   perdevice_include = ["cpu"]
#
#   ## Whether to report for each container total blkio and network stats or not.
#   ## Usage of this setting is discouraged since it will be deprecated in favor of 'total_include'.
#   ## Default value is 'false' for backwards compatibility, please set it to 'true' so that 'total_include' setting
#   ## is honored.
#   total = true
#
#   ## Specifies for which classes a total metric should be issued. Total is an aggregated of the 'perdevice' values.
#   ## Possible values are 'cpu', 'blkio' and 'network'
#   ## Total 'cpu' is reported directly by Docker daemon, and 'network' and 'blkio' totals are aggregated by this plugin.
#   ## Please note that this setting has no effect if 'total' is set to 'false'
#   total_include = ["cpu", "blkio", "network"]
#
#   ## Which environment variables should we use as a tag
#   ##tag_env = ["JAVA_HOME", "HEAP_SIZE"]
#
#   ## docker labels to include and exclude as tags.  Globs accepted.
#   ## Note that an empty array for both will include all labels as tags
#   docker_label_include = []
#   docker_label_exclude = []
#
#   ## Optional TLS Config
#   # tls_ca = "/etc/telegraf/ca.pem"
#   # tls_cert = "/etc/telegraf/cert.pem"
#   # tls_key = "/etc/telegraf/key.pem"
#   ## Use TLS but skip chain & host verification
#   # insecure_skip_verify = false

A minimal modification looks like this:

[[inputs.docker]]
# Specify the Docker API endpoint and which containers to collect metrics from
endpoint = "unix:///var/run/docker.sock"
container_names = []

After saving the change and restarting Telegraf, the service status showed that startup had failed with the following error:

[inputs.docker] Error in plugin: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.so…rmission denied

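It appears that Telegraf has no permission to access docker.sock. After some searching, I learned that the Docker daemon can listen for Docker API requests on three kinds of sockets: a unix socket file, a TCP listening port, and an fd file descriptor. So let's try opening the Docker daemon's TCP listening port.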
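Before changing anything, it can help to confirm that the error really is a file-permission problem on the socket. The following is a minimal Python sketch, not part of the original troubleshooting: the function name describe_socket_access is hypothetical, and it assumes the script is run as the same user that the Telegraf service runs as.

```python
import grp
import os
import pwd
import stat


def _name_or_id(lookup, numeric_id, attr):
    """Resolve a uid/gid to a name, falling back to the number if unknown."""
    try:
        return getattr(lookup(numeric_id), attr)
    except KeyError:
        return str(numeric_id)


def describe_socket_access(path="/var/run/docker.sock"):
    """Report who owns the socket file and whether the current user can use it.

    Connecting to a unix socket requires write permission on the socket
    file, so both read and write access are checked here.
    """
    st = os.stat(path)
    return {
        "is_socket": stat.S_ISSOCK(st.st_mode),
        "owner": _name_or_id(pwd.getpwuid, st.st_uid, "pw_name"),
        "group": _name_or_id(grp.getgrgid, st.st_gid, "gr_name"),
        "mode": oct(stat.S_IMODE(st.st_mode)),
        "current_user_can_connect": os.access(path, os.R_OK | os.W_OK),
    }
```

Calling describe_socket_access() with no arguments inspects /var/run/docker.sock. If current_user_can_connect is False for the Telegraf user, that matches the "permission denied" error above.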
Step 1: Create the file /etc/docker/daemon.json with the following content:

{
  "hosts": ["tcp://0.0.0.0:2375", "unix:///var/run/docker.sock"]
}

"unix:///var/run/docker.sock": the unix socket that local clients use to connect to the Docker daemon.
"tcp://0.0.0.0:2375": a TCP socket that allows any remote client to connect to the Docker daemon on port 2375.
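A malformed daemon.json also prevents dockerd from starting, so it is worth checking the file before restarting the service. The sketch below is only an illustration: the function name summarize_hosts and the KNOWN_SCHEMES set are assumptions made here, covering just the three socket types mentioned above.

```python
import json
from urllib.parse import urlsplit

# Socket types discussed in this article (an assumption of this sketch,
# not an exhaustive list of what dockerd accepts).
KNOWN_SCHEMES = {"unix", "tcp", "fd"}


def summarize_hosts(daemon_json_text):
    """Parse daemon.json text and return (scheme, target) for each host.

    Raises json.JSONDecodeError on invalid JSON and ValueError on an
    entry whose scheme is not one of KNOWN_SCHEMES.
    """
    config = json.loads(daemon_json_text)
    summary = []
    for host in config.get("hosts", []):
        parts = urlsplit(host)
        if parts.scheme not in KNOWN_SCHEMES:
            raise ValueError(f"unexpected scheme in hosts entry: {host!r}")
        # tcp entries carry "address:port" in netloc; unix entries carry a path.
        summary.append((parts.scheme, parts.netloc or parts.path))
    return summary
```

Passing the contents of /etc/docker/daemon.json to summarize_hosts lists each listener, which makes a typo in the scheme or a missing quote visible before Docker is restarted.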

After changing the configuration, reload the systemd configuration and restart the Docker service:

systemctl daemon-reload
systemctl restart docker

The following error then appeared:

Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.

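The cause is a conflict in Docker's socket configuration: the Docker systemd unit file already passes host information on the command line, and the Docker configuration file now specifies hosts as well, so the two conflict. The suggested fix is to remove "-H fd://" from the Docker unit file (/usr/lib/systemd/system/docker.service), then reload systemd and restart the Docker service.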
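The conflict described above can be detected by comparing the two files. The following Python sketch is only an illustration: find_conflicting_host_flags is a hypothetical helper, it handles only single-line ExecStart= entries (no backslash continuations or drop-in overrides), and it recognizes only the -H and --host flag spellings.

```python
import json
import shlex


def find_conflicting_host_flags(unit_text, daemon_json_text):
    """Return -H/--host values from ExecStart if daemon.json also sets hosts.

    An empty list means no conflict was found by this simple check.
    """
    daemon_hosts = json.loads(daemon_json_text).get("hosts", [])
    if not daemon_hosts:
        return []

    flag_hosts = []
    for line in unit_text.splitlines():
        line = line.strip()
        if not line.startswith("ExecStart="):
            continue
        args = shlex.split(line[len("ExecStart="):])
        for i, arg in enumerate(args):
            if arg in ("-H", "--host") and i + 1 < len(args):
                flag_hosts.append(args[i + 1])
            elif arg.startswith("--host="):
                flag_hosts.append(arg.split("=", 1)[1])
    return flag_hosts
```

Given the text of docker.service and daemon.json, a non-empty result such as ["fd://"] points at the flag to remove from the unit file.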

[root()@hzoffice40-152-163 system]# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2022-03-18 17:26:37 CST; 8s ago
     Docs: https://docs.docker.com
 Main PID: 287684 (dockerd)
    Tasks: 9
   Memory: 28.4M
   CGroup: /system.slice/docker.service
           └─287684 /usr/bin/dockerd --containerd=/run/containerd/containerd.sock

Mar 18 17:26:36 hzoffice40-152-163.iflyos.org dockerd[287684]: time="2022-03-18T17:26:36.796311180+08:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Mar 18 17:26:36 hzoffice40-152-163.iflyos.org dockerd[287684]: time="2022-03-18T17:26:36.807023536+08:00" level=info msg="[graphdriver] using prior storage driver: overlay2"
Mar 18 17:26:36 hzoffice40-152-163.iflyos.org dockerd[287684]: time="2022-03-18T17:26:36.836291988+08:00" level=info msg="Loading containers: start."
Mar 18 17:26:36 hzoffice40-152-163.iflyos.org dockerd[287684]: time="2022-03-18T17:26:36.992154660+08:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be use...red IP address"
Mar 18 17:26:37 hzoffice40-152-163.iflyos.org dockerd[287684]: time="2022-03-18T17:26:37.045762583+08:00" level=info msg="Loading containers: done."
Mar 18 17:26:37 hzoffice40-152-163.iflyos.org dockerd[287684]: time="2022-03-18T17:26:37.070431960+08:00" level=info msg="Docker daemon" commit=e2f740d graphdriver(s)=overlay2 version=20.10.10
Mar 18 17:26:37 hzoffice40-152-163.iflyos.org dockerd[287684]: time="2022-03-18T17:26:37.070528736+08:00" level=info msg="Daemon has completed initialization"
Mar 18 17:26:37 hzoffice40-152-163.iflyos.org systemd[1]: Started Docker Application Container Engine.
Mar 18 17:26:37 hzoffice40-152-163.iflyos.org dockerd[287684]: time="2022-03-18T17:26:37.108059171+08:00" level=info msg="API listen on [::]:2375"
Mar 18 17:26:37 hzoffice40-152-163.iflyos.org dockerd[287684]: time="2022-03-18T17:26:37.113034897+08:00" level=info msg="API listen on /var/run/docker.sock"
Hint: Some lines were ellipsized, use -l to show in full.

The Docker daemon has now opened an HTTP socket, which is what makes remote communication possible.
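Before pointing Telegraf at the TCP port, the listener can be checked directly. The sketch below uses the Docker Engine API's /_ping endpoint, which answers with the body "OK" when the daemon is reachable. The function names ping_url and docker_api_reachable are hypothetical, and the sketch assumes a plain HTTP listener without TLS, as configured above.

```python
import http.client
import urllib.request


def ping_url(host, port=2375):
    """Build the URL of the Docker Engine API ping endpoint."""
    return f"http://{host}:{port}/_ping"


def docker_api_reachable(host, port=2375, timeout=5.0):
    """Return True if the Docker daemon answers /_ping with 'OK'."""
    # An empty ProxyHandler bypasses any HTTP proxy set in the environment,
    # so the request goes straight to the daemon.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(ping_url(host, port), timeout=timeout) as resp:
            return resp.status == 200 and resp.read().strip() == b"OK"
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or a malformed reply.
        return False
```

For example, docker_api_reachable("127.0.0.1") should return True on the Docker host once the daemon logs "API listen on [::]:2375".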

Edit telegraf.conf again and change the endpoint of [[inputs.docker]]:

endpoint = "tcp://xxx.xxx.xxx.xxx:2375"

Restart the Telegraf service. At this point, Telegraf successfully collects the containers' performance metrics.