Troubleshooting k8s monitoring setup - 9093: connect: connection refused

This post covers a problem with Alertmanager that came up while setting up monitoring for a k8s cluster. The pod fails with CrashLoopBackOff, meaning it starts and then exits abnormally. kubectl describe shows the liveness and readiness probes failing with connection refused; the container logs and the StatefulSet show that alertmanager-main never becomes ready. A reference link is included at the end.


Setting up k8s cluster monitoring - Alertmanager troubleshooting

Pod startup error - CrashLoopBackOff

CrashLoopBackOff means the container starts normally but then exits abnormally, so kubelet keeps restarting it with an increasing back-off delay.
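
A quick way to spot the failing pod (namespace per the kube-prometheus convention used throughout this post):

kubectl get pods -n monitoring
# the alertmanager-main pods show CrashLoopBackOff in the STATUS column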

Inspect with kubectl describe

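The events below come from describing the failing pod; the exact command is not shown in the original, but it would be:

kubectl -n monitoring describe pod alertmanager-main-0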

Events:
  Type     Reason     Age                    From                   Message
  ----     ------     ----                   ----                   -------
  Normal   Scheduled  <unknown>              default-scheduler      Successfully assigned monitoring/alertmanager-main-0 to 192.168.6.11
  Normal   Pulled     23m                    kubelet, 192.168.6.11  Container image "quay.mirrors.ustc.edu.cn/prometheus/alertmanager:v0.21.0" already present on machine
  Normal   Created    23m                    kubelet, 192.168.6.11  Created container alertmanager
  Normal   Started    23m                    kubelet, 192.168.6.11  Started container alertmanager
  Normal   Pulled     23m                    kubelet, 192.168.6.11  Container image "quay.mirrors.ustc.edu.cn/prometheus-operator/prometheus-config-reloader:v0.47.0" already present on machine
  Normal   Created    23m                    kubelet, 192.168.6.11  Created container config-reloader
  Normal   Started    23m                    kubelet, 192.168.6.11  Started container config-reloader
  Warning  Unhealthy  23m (x6 over 23m)      kubelet, 192.168.6.11  Liveness probe failed: Get http://172.17.25.5:9093/-/healthy: dial tcp 172.17.25.5:9093: connect: connection refused
  Warning  Unhealthy  8m53s (x148 over 23m)  kubelet, 192.168.6.11  Readiness probe failed: Get http://172.17.25.5:9093/-/ready: dial tcp 172.17.25.5:9093: connect: connection refused
  Warning  BackOff    3m51s (x34 over 12m)   kubelet, 192.168.6.11  Back-off restarting failed container

Both the liveness and readiness probes fail: nothing is listening on 172.17.25.5:9093, so every TCP connection is refused.
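
For context, these probes come from the StatefulSet that prometheus-operator generates; they look roughly like the following (a sketch based on the kube-prometheus manifests, not copied from this cluster):

livenessProbe:
  httpGet:
    path: /-/healthy
    port: web          # named container port, 9093
readinessProbe:
  httpGet:
    path: /-/ready
    port: web

So the probes themselves are fine; the question is why nothing is listening on 9093.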

Check the logs


[root@k8s-node1 ~]#  kubectl logs pod/alertmanager-main-0 alertmanager -n monitoring
level=info ts=2021-06-02T02:11:49.274Z caller=main.go:216 msg="Starting Alertmanager" version="(version=0.21.0, branch=HEAD, revision=4c6c03ebfe21009c546e4d1e9b92c371d67c021d)"
level=info ts=2021-06-02T02:11:49.274Z caller=main.go:217 build_context="(go=go1.14.4, user=root@dee35927357f, date=20200617-08:54:02)"
[root@k8s-node1 ~]# kubectl logs pod/alertmanager-main-0 config-reloader -n monitoring
level=info ts=2021-06-02T01:57:31.669430944Z caller=main.go:147 msg="Starting prometheus-config-reloader" version="(version=0.47.0, branch=refs/tags/pkg/client/v0.47.0, revision=539108b043e9ecc53c4e044083651e2ebfbd3492)"
level=info ts=2021-06-02T01:57:31.669531061Z caller=main.go:148 build_context="(go=go1.16.3, user=simonpasquier, date=20210413-15:46:43)"
level=info ts=2021-06-02T01:57:31.669664237Z caller=main.go:182 msg="Starting web server for metrics" listen=:8080
level=info ts=2021-06-02T01:57:31.67010267Z caller=reloader.go:214 msg="started watching config file and directories for changes" cfg= out= dirs=/etc/alertmanager/config
level=error ts=2021-06-02T01:57:32.81121586Z caller=runutil.go:101 msg="function failed. Retrying in next tick" err="trigger reload: reload request failed: Post \"http://localhost:9093/-/reload\": dial tcp 127.0.0.1:9093: connect: connection refused"
level=error ts=2021-06-02T01:57:37.811710125Z caller=runutil.go:101 msg="function failed. Retrying in next tick" err="trigger reload: reload request failed: Post \"http://localhost:9093/-/reload\": dial tcp 127.0.0.1:9093: connect: connection refused"
level=error ts=2021-06-02T01:57:42.811117367Z caller=runutil.go:101 msg="function failed. Retrying in next tick" err="trigger reload: reload request failed: Post \"http://localhost:9093/-/reload\": dial tcp 127.0.0.1:9093: connect: connection refused"
level=error ts=2021-06-02T01:57:47.810889541Z caller=runutil.go:101 msg="function failed. Retrying in next tick" err="trigger reload: reload request failed: Post \"http://localhost:9093/-/reload\": dial tcp 127.0.0.1:9093: connect: connection refused"
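
kubelet and the config-reloader sidecar are reporting the same symptom from two directions: the alertmanager container starts logging but never binds its web port 9093, so both the probes and the reload POST are refused. This can be double-checked from inside the pod (a hedged sketch; it assumes the busybox-based alertmanager image, which ships a shell and wget):

kubectl -n monitoring exec alertmanager-main-0 -c alertmanager -- wget -qO- http://localhost:9093/-/healthy
# "connection refused" here confirms that the web server never came up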

Check the StatefulSet

alertmanager-main shows none of its replicas ready. Per the kube-prometheus issue referenced at the end of this post, this appears to happen when the Alertmanager replicas cannot reach each other across the pod network to form their gossip cluster; running the pods with hostNetwork: true is a known workaround.
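
The ready count can be checked directly:

kubectl -n monitoring get statefulset alertmanager-main
# the READY column shows 0 of the desired replicas available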

Export and edit the StatefulSet

kubectl -n monitoring get statefulset.apps/alertmanager-main -o yaml > dump.yaml
# add hostNetwork: true under spec.template.spec in dump.yaml
# delete the existing StatefulSet, then recreate it from the edited manifest
kubectl delete statefulsets.apps alertmanager-main -n monitoring
kubectl apply -f dump.yaml
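
For reference, the relevant part of dump.yaml after the edit should look roughly like this (only the hostNetwork line is added; the surrounding fields already exist in the exported manifest):

spec:
  template:
    spec:
      hostNetwork: true   # pod shares the node's network namespace

Note that with hostNetwork: true each replica binds ports 9093/9094 directly on its node, so multiple replicas need to land on different nodes to avoid port conflicts.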

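Once the StatefulSet is recreated, the pod should pass both probes; verify with:

kubectl get pods -n monitoring | grep alertmanager
# expect alertmanager-main-0 to reach 2/2 Running (alertmanager + config-reloader)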

Reference:
https://github.com/prometheus-operator/kube-prometheus/issues/653
