Error symptoms
[root@master ~]# kubeeasy install kubernetes --master 192.168.100.10 --worker 192.168.100.20 --user root --password 000000 --version 1.22.1 --offline-file /opt/kubernetes.tar.gz
[2024-03-13 00:57:44] INFO: [start] bash kubeeasy install kubernetes --master 192.168.100.10 --worker 192.168.100.20 --user root --password ****** --version 1.22.1 --offline-file /opt/kubernetes.tar.gz
[2024-03-13 00:57:44] INFO: [check] sshpass command exists.
[2024-03-13 00:57:44] INFO: [check] rsync command exists.
[2024-03-13 00:57:44] INFO: [check] ssh 192.168.100.10 connection succeeded.
[2024-03-13 00:57:44] INFO: [check] ssh 192.168.100.20 connection succeeded.
[2024-03-13 00:57:44] INFO: [offline] unzip offline package on local.
[2024-03-13 00:57:54] INFO: [offline] unzip offline package succeeded.
[2024-03-13 00:57:54] INFO: [offline] master 192.168.100.10: load offline file
[2024-03-13 00:57:54] INFO: [offline] load offline file to 192.168.100.10 succeeded.
[2024-03-13 00:57:54] INFO: [offline] master 192.168.100.10: disable the firewall
[2024-03-13 00:57:54] INFO: [offline] 192.168.100.10: disable the firewall succeeded.
[2024-03-13 00:57:54] INFO: [offline] worker 192.168.100.20: load offline file
[2024-03-13 00:57:55] INFO: [offline] load offline file to 192.168.100.20 succeeded.
[2024-03-13 00:57:55] INFO: [offline] worker 192.168.100.20: disable the firewall
[2024-03-13 00:57:55] INFO: [offline] 192.168.100.20: disable the firewall succeeded.
[2024-03-13 00:57:55] INFO: [get] Get 192.168.100.10 InternalIP.
[2024-03-13 00:57:56] INFO: [result] get MGMT_NODE_IP value succeeded.
[2024-03-13 00:57:56] INFO: [result] MGMT_NODE_IP is 192.168.100.10
[2024-03-13 00:57:56] INFO: [init] master: 192.168.100.10
[2024-03-13 00:57:57] INFO: [init] init master 192.168.100.10 succeeded.
[2024-03-13 00:57:57] INFO: [init] master: 192.168.100.10 set hostname and hosts
[2024-03-13 00:57:58] INFO: [init] 192.168.100.10 set hostname and hosts succeeded.
[2024-03-13 00:57:58] INFO: [init] worker: 192.168.100.20
[2024-03-13 00:58:00] INFO: [init] init worker 192.168.100.20 succeeded.
[2024-03-13 00:58:00] INFO: [init] master: 192.168.100.20 set hostname and hosts
[2024-03-13 00:58:00] INFO: [init] 192.168.100.20 set hostname and hosts succeeded.
[2024-03-13 00:58:00] INFO: [install] install docker on 192.168.100.10.
[2024-03-13 00:58:00] ERROR: [install] install docker on 192.168.100.10 failed.
[2024-03-13 00:58:00] INFO: [install] install kube on 192.168.100.10
[2024-03-13 00:58:01] INFO: [install] install kube on 192.168.100.10 succeeded.
[2024-03-13 00:58:01] INFO: [install] install docker on 192.168.100.20.
[2024-03-13 00:58:01] ERROR: [install] install docker on 192.168.100.20 failed.
[2024-03-13 00:58:01] INFO: [install] install kube on 192.168.100.20
[2024-03-13 00:58:02] INFO: [install] install kube on 192.168.100.20 succeeded.
[2024-03-13 00:58:02] INFO: [kubeadm init] kubeadm init on 192.168.100.10
[2024-03-13 00:58:02] INFO: [kubeadm init] 192.168.100.10: set kubeadm-config.yaml
[2024-03-13 00:58:02] INFO: [kubeadm init] 192.168.100.10: set kubeadm-config.yaml succeeded.
[2024-03-13 00:58:02] INFO: [kubeadm init] 192.168.100.10: kubeadm init start.
[2024-03-13 00:58:03] ERROR: [kubeadm init] 192.168.100.10: kubeadm init failed.
ERROR Summary:
[2024-03-13 00:58:00] ERROR: [install] install docker on 192.168.100.10 failed.
[2024-03-13 00:58:01] ERROR: [install] install docker on 192.168.100.20 failed.
[2024-03-13 00:58:03] ERROR: [kubeadm init] 192.168.100.10: kubeadm init failed.
See detailed log >> /var/log/kubeinstall.log
[root@master ~]#
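There was also the error below. It is not the main cause, though; the main cause is that the docker service was never installed in the first place.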
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Docker Buildx (Docker Inc., v0.8.1-docker)
scan: Docker Scan (Docker Inc., v0.17.0)
Server:
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
, error: exit status 1
[ERROR Service-Docker]: docker service is not active, please run 'systemctl start docker.service'
[ERROR SystemVerification]: error verifying Docker info: "Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
[2024-03-13 18:15:16] ERROR: [kubeadm init] 192.168.100.100: kubeadm init failed.
Troubleshooting steps
We simply used SecureFX to pull the log file /var/log/kubeinstall.log down to the local machine and read it there.
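If you would rather not copy the log off the machine, a rough equivalent is to search it in place on the master. This is an alternative to the SecureFX approach, not what was done here, and `show_install_errors` is a made-up helper name for illustration. It prints every line containing "ERROR" with its line number, as a starting point for reading the surrounding context:

```shell
#!/bin/sh
# Hypothetical helper: print every line of the kubeeasy install log that
# contains "ERROR", prefixed with its line number.
# The default path is the one kubeeasy prints at the end of its output.
show_install_errors() {
    log="${1:-/var/log/kubeinstall.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    grep -n 'ERROR' "$log"
}

# Usage (on the master node):
#   show_install_errors
#   show_install_errors /path/to/copied/kubeinstall.log
```

The line numbers let you jump straight to that spot in an editor (for example `less +N` or `vi +N`) and read upward from there, which is what the following steps do.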
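Then I found what looked like the spot where things went wrong:
this file (/etc/containerd/config.toml) should already have been created during the first step, installing dependencies, but here it was missing.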
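So I searched further up the log for the containerd service
and found that the containerd installation had been skipped because of a dependency problem (no wonder the file was never generated).
Surprisingly, docker was on that skipped list too.
A quick note on docker's state at this point: the docker command existed, but there was no docker service.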
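A small sketch for confirming this "command present, service missing" state on a node. It assumes a systemd host such as the CentOS 7 VMs here; `docker_state` is a hypothetical helper that relies only on `command -v`, `systemctl list-unit-files` and `docker info` (which fails when the daemon cannot be reached, as in the kubeadm output above):

```shell
#!/bin/sh
# Hypothetical helper: classify the docker situation on this node.
# Assumes a systemd-based host such as CentOS 7.
docker_state() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "no docker command"
        return 0
    fi
    # The CLI (docker-ce-cli) can be installed even when docker-ce,
    # which provides docker.service, was skipped.
    if command -v systemctl >/dev/null 2>&1 &&
        ! systemctl list-unit-files docker.service 2>/dev/null |
            grep -q '^docker\.service'; then
        echo "docker command present, docker.service not installed"
        return 0
    fi
    # docker info talks to the daemon, so it fails if the socket is unreachable.
    if docker info >/dev/null 2>&1; then
        echo "docker daemon reachable"
    else
        echo "docker command present, daemon not reachable"
    fi
}

# Usage: docker_state
```

On the broken nodes described here you would expect the second message; a node where the service exists but is merely stopped gives the last one, which `systemctl start docker` can fix.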
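Searching further up, I found an entry that stood out from all the others: it reported a missing dependency, libseccomp.
Searching the log for "Processing Dependency" then showed that every package that had been skipped was missing this same libseccomp.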
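The "Processing Dependency" search can be scripted. yum normally logs each dependency it resolves as `--> Processing Dependency: <dependency> for package: <package>`; the hypothetical helper below extracts those lines from the install log and de-duplicates them, so a dependency that recurs across the skipped packages (libseccomp in this case) is easy to spot. Note that it lists every dependency yum processed, not only the unresolved ones:

```shell
#!/bin/sh
# Hypothetical helper: list "dependency <- package" pairs from yum's
# "--> Processing Dependency: <dep> for package: <pkg>" lines, de-duplicated.
list_processed_deps() {
    log="${1:-/var/log/kubeinstall.log}"
    grep 'Processing Dependency:' "$log" |
        sed -n 's/.*Processing Dependency: \(.*\) for package: \(.*\)$/\1 <- \2/p' |
        LC_ALL=C sort -u
}

# Usage: list_processed_deps | grep libseccomp
```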
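At first I assumed the installer itself was at fault, but it turns out libseccomp
ships with the OS by default.
Even a freshly created VM has libseccomp installed,
yet the problematic VM, for some unknown reason, did not.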
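To find this kind of gap without guessing, you can compare the installed package names on a healthy VM with those on the broken one. A sketch, assuming each list was saved with `rpm -qa --qf '%{NAME}\n'`; the file names and the helper name `packages_missing_from` are illustrative:

```shell
#!/bin/sh
# Hypothetical helper: print package names listed in GOOD but not in BAD.
# Produce each list on the respective VM with:
#   rpm -qa --qf '%{NAME}\n' > pkgs-good.txt   # freshly created VM
#   rpm -qa --qf '%{NAME}\n' > pkgs-bad.txt    # problematic VM
packages_missing_from() {
    good="$1"
    bad="$2"
    # -v: keep lines NOT matching; -x: whole-line match; -F: fixed strings;
    # -f: read the patterns (one per line) from the BAD list.
    grep -vxF -f "$bad" "$good" | LC_ALL=C sort -u
}

# Usage: packages_missing_from pkgs-good.txt pkgs-bad.txt
```

For a single suspect package, `rpm -q libseccomp` on the node is enough: it prints `package libseccomp is not installed` when the package is absent.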
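Solution
There are two ways to fix this:
1. Install libseccomp manually (for those who don't want to start over)
2. Simply create a new virtual machine
Steps for method 1:
Download the libseccomp-2.3.1-4 offline package (libseccomp-2.3.1-4.el7.x86_64.rpm).
Upload it to the problematic VM and install it: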
[root@localhost ~]# ls
anaconda-ks.cfg libseccomp-2.3.1-4.el7.x86_64.rpm
[root@localhost ~]# yum install -y libseccomp-2.3.1-4.el7.x86_64.rpm
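In the log above, docker failed on both 192.168.100.10 and 192.168.100.20, so the package may be needed on every node. Below is a sketch that copies the RPM to each node and installs it with yum, using sshpass (which kubeeasy already checks for). Assumptions: root password login over SSH, the password supplied through the `SSHPASS` environment variable read by `sshpass -e`, and hypothetical helper names `run` and `install_rpm_on_nodes`. Setting `DRY_RUN=1` only prints the commands:

```shell
#!/bin/sh
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: copy an RPM to each given node and install it with yum.
# Assumes root password login; sshpass -e reads the password from $SSHPASS.
install_rpm_on_nodes() {
    rpm_file="$1"
    shift
    if [ "${DRY_RUN:-0}" != 1 ] && [ -z "${SSHPASS:-}" ]; then
        echo "export SSHPASS=<root password> first" >&2
        return 1
    fi
    name=$(basename "$rpm_file")
    for host in "$@"; do
        run sshpass -e scp -o StrictHostKeyChecking=no \
            "$rpm_file" "root@$host:/root/$name" || return 1
        run sshpass -e ssh -o StrictHostKeyChecking=no \
            "root@$host" "yum install -y /root/$name" || return 1
    done
}
```

Example: `export SSHPASS=000000`, then `install_rpm_on_nodes libseccomp-2.3.1-4.el7.x86_64.rpm 192.168.100.10 192.168.100.20`. `StrictHostKeyChecking=no` skips the host-key prompt, which is convenient on a lab network but not something to keep outside one.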
Then deploy the cluster again.
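Cluster setup steps
Note:
The skipped installations caused by the missing dependency, described above, happen during the first step, installing dependencies.
So the dependency-installation step must be run first.
After that first step, check on the problematic node whether /etc/containerd/config.toml has been generated.
If it is there, you are basically in the clear; any further problems can usually be tracked down with the troubleshooting steps above.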
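The check itself is a one-liner; here it is wrapped in a hypothetical helper, `check_containerd_config`, that returns non-zero when the file is missing so it can be used in scripts. Run it on every node after the dependency step:

```shell
#!/bin/sh
# Hypothetical helper: report whether containerd's config file exists.
# Run on each node after the dependency-installation step.
check_containerd_config() {
    cfg="${1:-/etc/containerd/config.toml}"
    if [ -f "$cfg" ]; then
        echo "OK: $cfg exists"
    else
        echo "MISSING: $cfg, look for skipped packages in /var/log/kubeinstall.log"
        return 1
    fi
}

# Usage: check_containerd_config
```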
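Next, run the Kubernetes installation; this time it really did finish without any errors:
Sometimes the missing dependency is a different package, but the underlying problem is still a missing dependency.
Troubleshooting missing dependencies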