Deploying Metrics Server on Kubernetes for Resource Monitoring, and Viewing Logs

Common commands for viewing resources

kubectl get

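View resource information

kubectl get <resource-type> <resource-name>
kubectl get <resource-type> <resource-name> -o wide  # show additional columns (for example IP and node)
kubectl get <resource-type> <resource-name> -o yaml  # print the full resource definition as YAML

For example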

kubectl get node
kubectl get node k8s-master 

View node labels

# Label the node (the suffix after node-role.kubernetes.io/ is what appears in the ROLES column)
[root@k8s-master ~]# kubectl label node k8s-node1 node-role.kubernetes.io/node1=

[root@k8s-master ~]# kubectl get node
NAME         STATUS   ROLES                  AGE   VERSION
k8s-master   Ready    control-plane,master   33h   v1.23.0
k8s-node1    Ready    node1                  33h   v1.23.0
k8s-node2    Ready    <none>                 33h   v1.23.0

[root@k8s-master ~]# kubectl get node k8s-master --show-labels

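A few common abbreviations

kubectl get cs  # status of the scheduler, controller-manager and etcd (componentstatuses, deprecated since v1.19)
kubectl get po  # same as kubectl get pods
kubectl get svc  # same as kubectl get service

List all API resources in the cluster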

kubectl api-resources

kubectl describe

Show a detailed description of a resource

kubectl describe <resource-type> <resource-name>

kubectl get and kubectl describe can be used together:

[root@k8s-master ~]# kubectl get po
NAME                         READY   STATUS    RESTARTS   AGE
web-demo1-5ff6d576bb-5292c   1/1     Running   0          4h4m
web-demo1-5ff6d576bb-92gqk   1/1     Running   0          4h4m
web-demo1-5ff6d576bb-d26cf   1/1     Running   0          4h4m
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl get pod web-demo1-5ff6d576bb-5292c -o wide
NAME                         READY   STATUS    RESTARTS   AGE     IP               NODE        NOMINATED NODE   READINESS GATES
web-demo1-5ff6d576bb-5292c   1/1     Running   0          4h10m   10.244.169.134   k8s-node2   <none>           <none>
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl describe pod web-demo1-5ff6d576bb-5292c
Name: ...
Namespace: ...
Node: ...
Start Time: ...
Labels: ...
IP: ...
Containers: ...
Conditions: ...
Volumes: ...
Tolerations: ...


[root@k8s-master ~]# kubectl get node
NAME         STATUS   ROLES                  AGE   VERSION
k8s-master   Ready    control-plane,master   34h   v1.23.0
k8s-node1    Ready    node1                  34h   v1.23.0
k8s-node2    Ready    <none>                 34h   v1.23.0
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl get node k8s-node2 -o wide
NAME        STATUS   ROLES    AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION           CONTAINER-RUNTIME
k8s-node2   Ready    <none>   34h   v1.23.0   192.168.136.98    <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   docker://20.10.17
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl describe node k8s-node2
Name: ...
Roles: ...
Labels: ...
Taints: ...
Conditions: ...
Addresses: ...
Capacity:
  cpu: ...
  memory: ...
System Info: ...
PodCIDR: ...
Events: ...

The kubectl top command reports resource usage across the cluster, but it requires Metrics Server to be installed first; otherwise it returns an error.

[root@k8s-master ~]# kubectl top node
error: Metrics API not available
[root@k8s-master ~]# kubectl top pod
error: Metrics API not available

Deploying Metrics Server

Deploy metrics-server from a YAML file

Download the YAML manifest:

wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Edit components.yaml and add the --kubelet-insecure-tls argument, which tells Metrics Server not to verify the HTTPS certificate presented by the kubelet.

containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --kubelet-insecure-tls

Deploy Metrics Server

kubectl apply -f components.yaml

Check whether the deployment succeeded

[root@k8s-master ~]# kubectl get apiservices | grep metrics
v1beta1.metrics.k8s.io                 kube-system/metrics-server   False (MissingEndpoints)   8m27s
[root@k8s-master ~]# kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request

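If the status is True and the raw request returns data, Metrics Server is working. Here the APIService reports False (MissingEndpoints) and the request fails, so this deployment did not succeed.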
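The same check can be scripted instead of read by eye. The following is a minimal sketch, assuming the APIService is named v1beta1.metrics.k8s.io as in the output above; the function name metrics_api_available is hypothetical and only used for illustration.

```shell
# Hypothetical helper: succeed only when the metrics APIService
# reports the condition Available=True.
metrics_api_available() {
  status="$(kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}')"
  [ "$status" = "True" ]
}

# Usage (on a machine with cluster access):
#   metrics_api_available && echo "Metrics API is ready"
```

Because the function only returns an exit status, it can be used in a retry loop while waiting for the Pod to become Ready.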

Troubleshooting the deployment error

Check the Pod status:

kubectl get po -n kube-system
NAME                                       READY   STATUS             RESTARTS       AGE
...
metrics-server-574849569f-svt2v            0/1     ImagePullBackOff   0              8m8s

Check the Pod logs:

[root@k8s-master ~]# kubectl logs metrics-server-574849569f-svt2v -n kube-system
Error from server (BadRequest): container "metrics-server" in pod "metrics-server-574849569f-svt2v" is waiting to start: trying and failing to pull image

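Taken together, the output above shows that the image pull failed. The image address in the YAML file needs to be replaced with a registry mirror reachable from mainland China.

Delete the failed Metrics Server deployment:

kubectl delete -f components.yaml

Replace the image address

Change the image line to: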

image: registry.cn-shenzhen.aliyuncs.com/zengfengjin/metrics-server:v0.5.0
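The edit can also be made from the command line. This is a rough sketch, assuming the manifest contains a single image: line; set_manifest_image is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: replace the value of every "image:" line in a manifest.
# $1 = path to the manifest, $2 = new image reference.
set_manifest_image() {
  manifest="$1"
  new_image="$2"
  tmp="$(mktemp)"
  # Keep the original indentation and key, and replace only the value.
  sed -E "s#^([[:space:]]*(- )?image:[[:space:]]*).*#\1${new_image}#" \
    "$manifest" > "$tmp" || return 1
  # Write back in place so the file keeps its original permissions.
  cat "$tmp" > "$manifest" && rm -f "$tmp"
}

# Usage:
#   set_manifest_image components.yaml \
#     registry.cn-shenzhen.aliyuncs.com/zengfengjin/metrics-server:v0.5.0
```

If a manifest ever contains more than one container, this sketch would rewrite all of their images, so editing the file by hand is safer in that case.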

Redeploy

kubectl apply -f components.yaml

The resources are created and the Pod reaches Running, but it never becomes Ready:

[root@k8s-master ~]# kubectl get deployment,po -n kube-system
NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/calico-kube-controllers   1/1     1            1           35h
deployment.apps/coredns                   2/2     2            2           35h
deployment.apps/metrics-server            0/1     1            0           87s

NAME                                           READY   STATUS    RESTARTS        AGE
...
pod/metrics-server-798c598bb8-rv827            0/1     Running   0               87s

Check the Metrics Server Pod logs:

[root@k8s-master ~]# kubectl logs metrics-server-798c598bb8-rv827 -n kube-system
E0724 14:26:19.149545       1 scraper.go:139] "Failed to scrape node" err="GET \"https://192.168.x.x:10250/stats/summary?only_cpu_and_memory=true\": bad status code \"403 Forbidden\"" node="k8s-node1"

[root@k8s-master ~]# kubectl describe pod metrics-server-798c598bb8-rv827 -n kube-system
...
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  6m25s                 default-scheduler  Successfully assigned kube-system/metrics-server-798c598bb8-rv827 to k8s-node2
  Normal   Pulling    6m24s                 kubelet            Pulling image "registry.cn-shenzhen.aliyuncs.com/zengfengjin/metrics-server:v0.5.0"
  Normal   Pulled     5m51s                 kubelet            Successfully pulled image "registry.cn-shenzhen.aliyuncs.com/zengfengjin/metrics-server:v0.5.0" in 32.739844006s
  Normal   Created    5m51s                 kubelet            Created container metrics-server
  Normal   Started    5m51s                 kubelet            Started container metrics-server
  Warning  Unhealthy  75s (x29 over 5m25s)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 500

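Note that the following GET request was rejected (403 Forbidden):

GET \"https://192.168.x.x:10250/stats

The cause is a version mismatch: the downloaded YAML file is the latest release, written for 0.6.x, while the image was manually changed to 0.5.x. The two versions need different permissions. 0.5.x needs access to nodes/stats, whereas 0.6.x changed this to nodes/metrics, so the ClusterRole from the 0.6.x manifest does not grant what the 0.5.x image requests.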
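One way to confirm a permission gap like this is to ask the API server what the Metrics Server service account is allowed to do. The sketch below assumes the service account is metrics-server in the kube-system namespace, as in the manifest used here; metrics_server_can_get is a hypothetical helper name.

```shell
# Hypothetical helper: check whether the metrics-server service account
# may "get" a given node subresource.
# $1 = subresource name, for example "stats" or "metrics".
metrics_server_can_get() {
  answer="$(kubectl auth can-i get nodes --subresource="$1" \
    --as=system:serviceaccount:kube-system:metrics-server)"
  [ "$answer" = "yes" ]
}

# Usage (requires permission to impersonate the service account):
#   metrics_server_can_get stats   || echo "nodes/stats is not allowed"
#   metrics_server_can_get metrics || echo "nodes/metrics is not allowed"
```

Running the check before and after editing the ClusterRole shows whether the change took effect.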

Delete the failed Metrics Server deployment:

kubectl delete -f components.yaml

Modifying the metrics-server access permissions

Check the metrics-server ClusterRole in the manifest:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch

Add permission to access nodes/stats, so that the ClusterRole becomes:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes/stats
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch

Redeploy

kubectl apply -f components.yaml

[root@k8s-master ~]# kubectl get deployment,po -n kube-system
NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/calico-kube-controllers   1/1     1            1           35h
deployment.apps/coredns                   2/2     2            2           36h
deployment.apps/metrics-server            1/1     1            1           37s

NAME                                           READY   STATUS             RESTARTS        AGE
pod/metrics-server-798c598bb8-j69cc            1/1     Running            0               36s

Check whether the deployment succeeded

[root@k8s-master ~]# kubectl get apiservices | grep metrics
v1beta1.metrics.k8s.io                 kube-system/metrics-server   True        80s
[root@k8s-master ~]# kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
{"kind":"NodeMetricsList","apiVersion":"metrics.k8s.io/v1beta1","metadata":{},"items":[{"metadata":{"name":"k8s-master","creationTimestamp":"2022-07-24T15:06:31Z","labels":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"k8s-master","kubernetes.io/os":"linux","node-role.kubernetes.io/control-plane":"","node-role.kubernetes.io/master":"","node.kubernetes.io/exclude-from-external-load-balancers":""}},"timestamp":"2022-07-24T15:06:11Z","window":"10s","usage":{"cpu":"201981233n","memory":"1244120Ki"}},{"metadata":{"name":"k8s-node1","creationTimestamp":"2022-07-24T15:06:31Z","labels":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"k8s-node1","kubernetes.io/os":"linux","node-role.kubernetes.io/node1":""}},"timestamp":"2022-07-24T15:06:14Z","window":"20s","usage":{"cpu":"86504224n","memory":"846880Ki"}},{"metadata":{"name":"k8s-node2","creationTimestamp":"2022-07-24T15:06:31Z","labels":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"k8s-node2","kubernetes.io/os":"linux"}},"timestamp":"2022-07-24T15:06:10Z","window":"10s","usage":{"cpu":"77667531n","memory":"883624Ki"}}]}

View resource usage

[root@k8s-master ~]# kubectl top node
NAME         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k8s-master   208m         10%    1217Mi          70%
k8s-node1    83m          4%     829Mi           48%
k8s-node2    80m          4%     864Mi           50%
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl top pod
NAME                         CPU(cores)   MEMORY(bytes)
web-demo1-5ff6d576bb-5292c   1m           5Mi
web-demo1-5ff6d576bb-92gqk   1m           7Mi
web-demo1-5ff6d576bb-d26cf   1m           3Mi
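kubectl top can also sort its output, which helps when there are many Pods. The following sketch uses the --sort-by and --no-headers flags of kubectl top to print the name of the Pod using the most memory; top_memory_pod is a hypothetical helper name.

```shell
# Hypothetical helper: print the name of the Pod with the highest memory usage.
# $1 = namespace (defaults to "default").
top_memory_pod() {
  kubectl top pod -n "${1:-default}" --no-headers --sort-by=memory \
    | head -n 1 | awk '{print $1}'
}

# Usage:
#   top_memory_pod              # default namespace
#   top_memory_pod kube-system  # another namespace
```

Passing --sort-by=cpu instead ranks Pods by CPU usage.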

Common commands for viewing logs

kubelet logs

kubelet runs as a service managed by systemd. To view its logs:

journalctl -u kubelet -f

Or check the system log:

tail -f /var/log/messages

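Pod logs

The other Kubernetes components run as containers. To view their logs:

kubectl get po -n <namespace>
kubectl logs <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> -f  # follow the log in real time

With Docker as the container runtime, a container's standard output is stored on the host at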

/var/lib/docker/containers/<container-id>/<container-id>-json.log

To find the container ID:

kubectl get pods -o wide
# On the node where the Pod is running, look up the container ID
docker ps | grep web-demo1-5ff6d576bb-92gqk
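The container ID can also be read from the Pod status without logging in to the node. The sketch below assumes the Docker runtime used in this cluster and a Pod with a single container; docker_log_path is a hypothetical helper name.

```shell
# Hypothetical helper: print the host path of a Pod's Docker JSON log file.
# $1 = Pod name, $2 = namespace (defaults to "default").
docker_log_path() {
  raw_id="$(kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{.status.containerStatuses[0].containerID}')"
  # The API returns "docker://<id>"; strip the runtime prefix.
  container_id="${raw_id#*://}"
  echo "/var/lib/docker/containers/${container_id}/${container_id}-json.log"
}

# Usage:
#   docker_log_path web-demo1-5ff6d576bb-92gqk
```

The printed path exists only on the node that runs the Pod, so the file still has to be read there.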

You can also open a shell inside the container and read the logs in its log directory:

kubectl exec -it <pod-name> -- bash

References
【1】https://stackoverflow.com/questions/70362216/getting-error-while-implementing-metric-server-inside-the-kubernetes
【2】https://github.com/kubernetes-sigs/metrics-server/releases
