Deploying Kubernetes 1.28 on CentOS 7

Table of Contents

Minimum system requirements

Basic preparation -- all three hosts

k8s cluster installation

Installing the Calico network plugin on the master

Joining worker nodes to the k8s cluster

k8s cluster setup complete

Switching the container runtime from containerd to cri-dockerd

Installing Helm

Common Helm commands

Completely removing the Calico network plugin

Pod commands

Namespace commands

ERROR:

cri-dockerd errors


Minimum system requirements

At least 4 GB of memory and at least 2 CPUs per node.

IP               Memory  CPU  Hostname
192.168.231.120  4G      4    K1
192.168.231.121  4G      4    K2
192.168.231.122  4G      4    K3

Basic preparation -- all three hosts

Disable the firewall

systemctl stop firewalld

Disable swap (permanently)

swapoff -a 

vim  /etc/fstab

#/dev/mapper/centos-swap swap                    swap    defaults        0 0
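If you prefer not to edit /etc/fstab by hand, the swap entry can also be commented out in one step; a minimal sketch using sed (it prefixes any uncommented line whose filesystem type is swap with #):

swapoff -a
# comment out every active swap entry in /etc/fstab
sed -ri 's/^([^#].*\sswap\s.*)$/#\1/' /etc/fstab
# verify that no swap is active any more
free -h | grep -i swap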

Set the hostname (run the matching command on each host) and populate the hosts file

hostnamectl set-hostname k1
hostnamectl set-hostname k2
hostnamectl set-hostname k3

vim /etc/hosts

192.168.241.129 k2
192.168.241.128 k1
192.168.241.130 k3

Kernel parameter settings

vim /etc/sysctl.d/k8s.conf

net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1

sysctl --system   # sysctl -p alone only loads /etc/sysctl.conf, so load everything under /etc/sysctl.d/ instead
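The net.bridge.* keys above only exist once the br_netfilter kernel module is loaded, and it usually is not on a fresh CentOS 7 install. A minimal sketch that loads it now and on every boot (overlay is included for containerd, which relies on the overlay filesystem):

# load the modules required by the bridge sysctls and containerd
cat <<EOF > /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

modprobe overlay
modprobe br_netfilter

# re-apply the sysctl settings once the modules are present
sysctl --system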

Configure time synchronization

yum install ntpdate
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
systemctl enable ntpdate
systemctl start ntpdate

systemctl status ntpdate



systemctl start crond
systemctl enable crond
 
# add a cron job that syncs the time every 5 minutes
$ crontab -e
*/5 * * * * /usr/sbin/ntpdate cn.pool.ntp.org
 
crontab -l lists all scheduled cron jobs.
Check the time with date; once it is correct, write it to the hardware clock:
date  

hwclock --systohc

Install the containerd service   # skip this if you choose Docker as the container runtime

yum install -y containerd.io-1.6.27
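The config.toml shipped by the containerd.io package disables the CRI plugin, which is what causes the "container runtime is not running" error shown in the ERROR section below. A minimal sketch of the usual adjustments, assuming containerd 1.6 and the Aliyun registry used elsewhere in this guide:

# regenerate a full default config (the packaged one disables the cri plugin)
containerd config default > /etc/containerd/config.toml

# use the systemd cgroup driver, matching kubelet's default in 1.28
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml

# point the sandbox (pause) image at the Aliyun mirror
sed -i 's#sandbox_image = .*#sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"#' /etc/containerd/config.toml

systemctl enable containerd --now
systemctl restart containerd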

Configure IPVS support

Run on all three machines

yum install ipset ipvsadm -y

# configure the kernel ipvs modules
vim  /etc/sysconfig/modules/ipvs.modules

#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack


Write the file and save with :wq



chmod +x /etc/sysconfig/modules/ipvs.modules


/bin/bash /etc/sysconfig/modules/ipvs.modules


Check that the modules are loaded
lsmod | grep -e ip_vs -e nf_conntrack_ipv4

ip_vs_sh               12688  0 
ip_vs_wrr              12697  0 
ip_vs_rr               12600  0 
ip_vs                 145458  6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
nf_conntrack_ipv4      15053  7 
nf_defrag_ipv4         12729  1 nf_conntrack_ipv4
nf_conntrack          139264  8 ip_vs,nf_nat,nf_nat_ipv4,nf_nat_ipv6,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_ipv4,nf_conntrack_ipv6
libcrc32c              12644  4 xfs,ip_vs,nf_nat,nf_conntrack
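The /etc/sysconfig/modules script above is the classic approach; on a systemd-based CentOS 7 you can additionally register the modules with systemd-modules-load so they are reloaded automatically after a reboot. A minimal sketch:

# load the ipvs modules automatically at boot via systemd
cat <<EOF > /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
EOF

systemctl restart systemd-modules-load.service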

Install Docker

# install the EPEL repo and yum-utils
yum install epel-release  yum-utils
# add the official Docker repo
yum-config-manager \
    --add-repo \
    https://download.docker.com/linux/centos/docker-ce.repo

# if the official Docker repo is unreachable or slow, use the Aliyun mirror instead
yum-config-manager \
    --add-repo \
    https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

# install Docker
yum install docker-ce -y

# start Docker and enable it at boot

systemctl enable docker --now

systemctl start docker

systemctl status docker
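If Docker (through cri-dockerd) will be the runtime, it should use the systemd cgroup driver to match kubelet 1.28's default. A minimal sketch of /etc/docker/daemon.json (overwrite carefully if the file already exists):

cat <<EOF > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF

systemctl daemon-reload
systemctl restart docker

# confirm the cgroup driver
docker info | grep -i cgroup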

  

Switching the container runtime from containerd to cri-dockerd (not needed if you keep the default)

Install the latest release (as of 2024-01-27): v0.3.9
wget https://github.com/Mirantis/cri-dockerd/releases/download/v0.3.9/cri-dockerd-0.3.9-3.el7.x86_64.rpm


# install on all three machines
rpm -ivh cri-dockerd-0.3.9-3.el7.x86_64.rpm


# systemctl daemon-reload

# systemctl enable cri-docker --now
Created symlink from /etc/systemd/system/multi-user.target.wants/cri-docker.service to /usr/lib/systemd/system/cri-docker.service.

#systemctl is-active cri-docker
active


# systemctl status  cri-docker

Pull the k8s images through the cri-dockerd socket

Run this only after kubeadm is installed (see the cluster installation section below); skip it otherwise.

kubeadm config images pull --cri-socket unix:///var/run/cri-dockerd.sock


# if the official registry is blocked, use the Aliyun mirror
kubeadm config images pull --image-repository=registry.aliyuncs.com/google_containers --cri-socket unix:///run/cri-dockerd.sock

k8s cluster installation

Configure the k8s yum repo. The current stable 1.28 release is 1.28.6, as can be seen from the pulled images.

vim /etc/yum.repos.d/k8s.repo

[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.28/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.28/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni


If the repo above does not work for you, use the Aliyun repo instead (pick only one of the two):
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg

Install kubelet, kubeadm and kubectl

# --disableexcludes=kubernetes is required because the pkgs.k8s.io repo above lists these packages in exclude=
yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
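kubelet must be enabled so that kubeadm can start it during init/join; you can also pin a specific patch release rather than taking the newest 1.28.x. A minimal sketch:

# optional: pin a specific patch version instead of the latest
# yum install -y kubelet-1.28.6 kubeadm-1.28.6 kubectl-1.28.6 --disableexcludes=kubernetes

# enable kubelet; it will crash-loop until kubeadm init/join writes its config
systemctl enable kubelet --now

# confirm the installed versions
kubeadm version
kubelet --version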

Initialize the cluster

Run on the master. If cri-dockerd is used as the container runtime, install cri-dockerd first.

First pre-pull the k8s images to make sure the image registry is reachable.

kubeadm config images pull --image-repository registry.aliyuncs.com/google_containers --cri-socket=unix:///var/run/cri-dockerd.sock 
I0129 12:38:25.026132   60167 version.go:256] remote version is much newer: v1.29.1; falling back to: stable-1.28
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-apiserver:v1.28.6
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-controller-manager:v1.28.6
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-scheduler:v1.28.6
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-proxy:v1.28.6
[config/images] Pulled registry.aliyuncs.com/google_containers/pause:3.9
[config/images] Pulled registry.aliyuncs.com/google_containers/etcd:3.5.9-0
[config/images] Pulled registry.aliyuncs.com/google_containers/coredns:v1.10.1





Without --image-repository the images are pulled from the official registry; run only one of the two commands.
kubeadm config images pull  --kubernetes-version=1.28.2   --cri-socket unix:///run/cri-dockerd.sock
[config/images] Pulled registry.k8s.io/kube-apiserver:v1.28.2
[config/images] Pulled registry.k8s.io/kube-controller-manager:v1.28.2
[config/images] Pulled registry.k8s.io/kube-scheduler:v1.28.2
[config/images] Pulled registry.k8s.io/kube-proxy:v1.28.2
[config/images] Pulled registry.k8s.io/pause:3.9
[config/images] Pulled registry.k8s.io/etcd:3.5.9-0
[config/images] Pulled registry.k8s.io/coredns/coredns:v1.10.1

Initialize the cluster

kubeadm  init --kubernetes-version=1.28.2 --image-repository registry.aliyuncs.com/google_containers  --apiserver-advertise-address=192.168.241.128 --service-cidr=10.96.0.0/12  --pod-network-cidr=10.244.0.0/16  --cri-socket=unix:///run/cri-dockerd.sock





# without --image-repository the default official k8s registry is used
kubeadm init --kubernetes-version=1.28.2 --apiserver-advertise-address=192.168.241.128 --service-cidr=10.96.0.0/12 --pod-network-cidr=10.244.0.0/16 --cri-socket=unix:///run/cri-dockerd.sock
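All of these flags can also be kept in a kubeadm configuration file, which is easier to review and re-run; a minimal sketch equivalent to the Aliyun-based init command above (apiVersion kubeadm.k8s.io/v1beta3 is the one used by kubeadm 1.28):

cat <<EOF > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.241.128
nodeRegistration:
  criSocket: unix:///run/cri-dockerd.sock
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: 1.28.2
imageRepository: registry.aliyuncs.com/google_containers
networking:
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
EOF

kubeadm init --config kubeadm-config.yaml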

After initialization succeeds,

run the following on K1:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config


# as the root user you can equivalently run:

cp -i /etc/kubernetes/admin.conf /root/.kube/config

chown 0:0 /root/.kube/config

Installing the Calico network plugin on the master

Calico 3.26.1:
wget --no-check-certificate https://docs.projectcalico.org/manifests/calico.yaml



kubectl apply -f calico.yaml

Configure the Calico network parameters

vim calico.yaml

Make sure the CALICO_IPV4POOL_CIDR value in calico.yaml matches the --pod-network-cidr=10.244.0.0/16 specified at init time.

It is commented out by default:

- name: CALICO_IPV4POOL_CIDR
  value: "10.244.0.0/16"



After modifying the config, redeploy:
kubectl replace -f calico.yaml

If the manifest has not been applied yet, install it with:

kubectl apply -f calico.yaml

Before the Calico components are installed, the nodes stay in NotReady status.
 

Get the node information to verify that the installation succeeded.

Joining worker nodes to the k8s cluster

That is, add hosts K2 and K3 to the k8s cluster.

Generate a token on the K1 master node

# generate on the K1 master
kubeadm token create --print-join-command

# create a token that never expires
kubeadm token create --ttl 0 --print-join-command



# run the following command on the k2/k3 worker nodes to join the k8s cluster
The generated (non-expiring) join command looks like this:

kubeadm join 192.168.241.128:6443 --token 5ajtxi.sx49u7jyygnmw0c4 --discovery-token-ca-cert-hash sha256:ada6bf229e93d346c4af69f953c96040c12c30b1f2b10eb2993052fbfaa48651



# if Docker (via cri-dockerd) is the container runtime, run this on K2 to join the cluster
kubeadm join 192.168.241.128:6443 --token iym0rs.q1at20rk4knx1b1x --discovery-token-ca-cert-hash sha256:e61313ccd385434aadf56b5fa2060e4ece95f442a6b64fbc54a8d522f98ce489 --cri-socket unix:///var/run/cri-dockerd.sock

k8s cluster setup complete

kubectl get nodes   # run on the master

Re-initializing the cluster

# for a thorough cleanup after kubeadm reset, see error 4 ("re-joining a node") below

kubeadm reset


kubeadm  init --apiserver-advertise-address=192.168.241.128 --token-ttl=0 --cri-socket unix:///run/cri-dockerd.sock





The CONTAINER-RUNTIME column shows that the runtime has been changed:
[root@k1 lib]# kubectl get node  -o wide
NAME   STATUS   ROLES           AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION           CONTAINER-RUNTIME
k1     Ready    control-plane   67m   v1.28.6   192.168.241.128   <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   docker://24.0.7
k2     Ready    <none>          31m   v1.28.6   192.168.241.129   <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   docker://24.0.7
k3     Ready    <none>          29m   v1.28.6   192.168.241.130   <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   docker://24.0.7

Deleting a k8s node



kubectl get nodes
NAME   STATUS     ROLES           AGE   VERSION
k1     Ready      control-plane   25h   v1.28.2
k2     NotReady   <none>          24h   v1.28.2
k3     NotReady   <none>          24h   v1.28.2


Delete the node
kubectl delete node k2


Wipe the node's data

kubeadm reset

With cri-dockerd as the runtime, wipe the data with:
kubeadm reset --cri-socket=unix:///run/cri-dockerd.sock

Installing Helm

curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash


# check the Helm version
helm version

Common Helm commands

# add a chart repository (brigade here is just an example)
helm repo add brigade https://brigadecore.github.io/charts

# add the ingress-nginx repository
helm repo add ng https://kubernetes.github.io/ingress-nginx


# remove a repository
helm repo remove stable


# list repositories
helm repo list

# list all versions of the ingress-nginx chart
helm search repo ingress-nginx/ingress-nginx -l

# search for ingress charts

helm search repo ingress
NAME            	CHART VERSION	APP VERSION	DESCRIPTION                                       
ng/ingress-nginx	4.9.0        	1.9.5      	Ingress controller for Kubernetes using NGINX a...


# pull the chart; it is stored locally
helm pull ng/ingress-nginx
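To actually deploy the chart, install it into its own namespace; a minimal sketch (the release name, namespace and the ng repo alias follow the examples above):

helm install ingress-nginx ng/ingress-nginx \
  --namespace ingress-nginx --create-namespace

# check the release status
helm -n ingress-nginx status ingress-nginx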



List releases in the namespace, including failed ones
helm -n ingress-nginx  ls -a


Delete a failed release
helm -n ingress-nginx  delete ingress-nginx

Completely removing the Calico network plugin

# run on the master; make sure the calico.yaml used for installation is still present
kubectl delete -f calico.yaml



# run on all nodes
# remove the tunnel interface; in IPIP mode it is named tunl0 (check with ifconfig)
modprobe -r ipip

# run on all nodes
# remove the Calico CNI configuration files
cd /etc/cni/net.d/  && rm -rf 10-calico.conflist  calico-kubeconfig


# run on all nodes


systemctl restart kubelet


systemctl restart docker

Pod commands

Delete a pod

1.
# delete a pod in the given namespace
kubectl delete pod ingress-nginx-admission-patch-j4s7z -n ingress-nginx

View a pod's details and events
kubectl -n kube-system describe po calico-kube-controllers-7ddc4f45bc-gb7mp



2. View deployment information


kubectl get deployment -n ingress-nginx

3. Delete a deployment
kubectl delete deployment ingress-nginx-controller  -n ingress-nginx

4. Then delete the pod
kubectl delete pod ingress-nginx-controller-6fcf745c45-24k9b  -n ingress-nginx

Namespace commands

Create a namespace

kubectl create namespace <namespace-name>

List all namespaces

kubectl get namespaces


# all pods in a namespace (here, kube-system)

kubectl get pods -n kube-system

View all resources in a given namespace (here, ingress-nginx)

kubectl get all -n ingress-nginx

Print the join command

kubeadm token create --print-join-command

# generate a token that never expires
kubeadm token create --ttl 0 --print-join-command

Remove a node from the cluster

kubectl  delete nodes k3

Wipe the data on that node

kubeadm reset

Schedule pods onto a specific node

kubectl label node master1 ingress=true
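The label is consumed through a nodeSelector in the pod spec (or the corresponding chart value); a minimal sketch of a pod that will only be scheduled onto nodes labelled ingress=true (pod name and image are illustrative):

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: ingress-test
spec:
  nodeSelector:
    ingress: "true"
  containers:
  - name: nginx
    image: nginx:1.25
EOF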

ERROR:

Error 1:

The K2 node fails to join with:

[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR CRI]: container runtime is not running: output: time="2024-01-18T10:41:00-05:00" level=fatal msg="validate service connection: validate CRI v1 runtime API for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

Solution:

On the master, restart containerd: systemctl restart containerd

On the node, delete the config (rm /etc/containerd/config.toml) and restart containerd: systemctl restart containerd

After joining, if the node still does not become Ready, also restart the services on the master side.

Error 2:

[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
Solution: run kubeadm reset on the node to wipe its data,

then re-join.

Error 3: errors caused by rebooting the server

E0126 20:46:32.845126    7509 memcache.go:265] couldn't get current server API group list: Get "https://192.168.241.128:6443/api?timeout=32s": dial tcp 192.168.241.128:6443: connect: connection refused
E0126 20:46:32.845281    7509 memcache.go:265] couldn't get current server API group list: Get "https://192.168.241.128:6443/api?timeout=32s": dial tcp 192.168.241.128:6443: connect: connection refused
E0126 20:46:32.853384    7509 memcache.go:265] couldn't get current server API group list: Get "https://192.168.241.128:6443/api?timeout=32s": dial tcp 192.168.241.128:6443: connect: connection refused
E0126 20:46:32.853530    7509 memcache.go:265] couldn't get current server API group list: Get "https://192.168.241.128:6443/api?timeout=32s": dial tcp 192.168.241.128:6443: connect: connection refused
E0126 20:46:32.855105    7509 memcache.go:265] couldn't get current server API group list: Get "https://192.168.241.128:6443/api?timeout=32s": dial tcp 192.168.241.128:6443: connect: connection refused
The connection to the server 192.168.241.128:6443 was refused - did you specify the right host or port?

Troubleshooting:

lsof -i:6443 shows nothing listening on the port.

The logs show that the k1 node cannot be registered:

journalctl -u kubelet

Attempting to register node" node="k1"
Jan 26 21:01:54 k1 kubelet[2209]: E0126 21:01:54.712910    2209 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://192.168.241.128:6443/api/v1/nodes\": dial tcp 192.168.241.128:6443: connect: connection refused" node="k1"
Jan 26 21:01:55 k1 kubelet[2209]: E0126 21:01:55.677638    2209 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://192.168.241.128:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/k1?timeout=10s\": dial tcp 192.168.241.128:6443: connect: connection refused" interval="7s"
Jan 26 21:01:57 k1 kubelet[2209]: E0126 21:01:57.385561    2209 eviction_manager.go:258] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"k1\" not found"
Jan 26 21:02:01 k1 kubelet[2209]: E0126 21:02:01.236048    2209 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"kube-scheduler-k1.17adf043686e3b00", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-scheduler-k1", UID:"84a01f6320ea2c31160c7acf2d558c4c", APIVersion:"v1", ResourceVersion:"", FieldPath:""}, Reason:"SandboxChanged", Message:"Pod sandbox changed, it will be killed and re-created.", Source:v1.EventSource{Component:"kubelet", Host:"k1"}, FirstTimestamp:time.Date(2024, time.January, 26, 19, 46, 46, 148815616, time.Local), LastTimestamp:time.Date(2024, time.January, 26, 19, 46, 46, 148815616, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"kubelet", ReportingInstance:"k1"}': 'Post "https://192.168.241.128:6443/api/v1/namespaces/kube-system/events": dial tcp 192.168.241.128:6443: connect: connection refused'(may retry after sleeping)

docker ps -a shows that all containers are down.

Restart the Docker containers: docker restart <container-id>

Restart kubelet on all three machines: systemctl restart kubelet

At this point the 6443 service is listening, but k1 still gets connection refused.

Running kubeadm reset (and re-initializing) resolves the error.

   

Error 4: errors when re-joining a node

Check the logs: journalctl -u kubelet

StopPodSandbox from runtime service failed" err="rpc error

The master shows the following, because no nodes have joined yet:

kube-system   coredns-5dd5756b68-lqkm5     0/1     ContainerCreating   0          23m
kube-system   coredns-5dd5756b68-s5czv     0/1     ContainerCreating   0          23m

When joining the node, the error is:

error execution phase kubelet-start: error uploading crisocket: Unauthorized

Check the node's logs:

journalctl -u kubelet

Jan 26 22:52:48 k2 kubelet[3986]: E0126 22:52:48.392703    3986 file_linux.go:61] "Unable to read config path" err="path does not exist, ignoring" path="/etc

Solution: completely remove everything kubeadm init left behind on the system.

kubeadm reset
# if cri-dockerd is used, the CRI socket must be specified
# run on all nodes, and also delete the k8s images that were pulled
kubeadm reset --cri-socket unix:///var/run/cri-dockerd.sock


# run on all three machines

iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

systemctl stop kubelet

systemctl stop docker

rm -rf /root/.kube
rm -rf /var/lib/cni/*
rm -rf /var/lib/calico/
rm -rf /var/lib/cri-dockerd/sandbox
rm -rf /var/lib/kubelet/*
rm -rf /var/lib/etcd/
rm -rf /etc/cni/*

rm -rf /etc/kubernetes/pki


ifconfig docker0 down

ip link delete cni0


systemctl start docker
systemctl start  cri-docker
systemctl status  cri-docker
systemctl status  docker

After all of the above is done,

re-run kubeadm init.

After re-joining, the nodes show NotReady status.

kubectl describe node k1 | grep Taints
Taints:             node.kubernetes.io/not-ready:NoSchedule

Pods stay in Pending status:

[root@k1 ~]# kubectl get pod -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-7ddc4f45bc-gb7mp   0/1     Pending   0          3s
kube-system   calico-node-fvj7h                          1/1     Running   0          10m
kube-system   calico-node-skgfb                          1/1     Running   0          10m
kube-system   calico-node-tcfjc                          1/1     Running   0          10m
kube-system   coredns-5dd5756b68-hsnwn                   0/1     Pending   0          12m
kube-system   coredns-5dd5756b68-vxw2b                   0/1     Pending   0          12m

Check the Calico pod's events:

kubectl -n kube-system describe po calico-kube-controllers-7ddc4f45bc-gb7mp

The pods cannot be scheduled because of a NoSchedule taint.

Remove the taint from the node:

 kubectl taint nodes k1 node.kubernetes.io/not-ready-


# restart the kubelet, docker and containerd services

systemctl restart kubelet

systemctl restart docker

systemctl restart containerd 

Everything then shows as normal.

Error: a possible cause of calico/coredns CrashLoopBackOff

The time zones/clocks of the machines were found to be out of sync.
cp /usr/share/zoneinfo/Asia/Dubai /etc/localtime 

systemctl restart ntpdate

After the machines' clocks were synchronized, calico and coredns recovered.

Error 5: Calico intermittently goes into CrashLoopBackOff

Inspect the failing Calico pod:

kubectl -n kube-system describe po calico-kube-controllers-7ddc4f45bc-bvzf2

kubelet  Back-off restarting failed container calico-kube-controllers in pod calico-kube-controllers-

kubelet  Readiness probe failed: Error initializing datastore: Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.96.0.1:443: i/o timeout

Solution: tear down the cluster and reinstall. The root cause was that --service-cidr=10.96.0.0/12 and --pod-network-cidr=10.244.0.0/16 were not specified during kubeadm init, and CALICO_IPV4POOL_CIDR (value: "10.244.0.0/16") was not set in calico.yaml to assign the pod network range.

Error:

Found multiple CRI endpoints on the host. Please define which one do you wish to use by setting the 'criSocket' field in the kubeadm configuration file: unix:///var/run/containerd/containerd.sock, unix:///var/run/cri-dockerd.sock
To see the stack trace of this error execute with --v=5 or higher

Solution: the CRI socket was not specified. Add --cri-socket unix:///var/run/cri-dockerd.sock to the command.

Use --v=2 (or higher) to see more detailed error output.

Error: joining a node fails

[discovery] Failed to request cluster-info, will try again: Get "https://192.168.241.128:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": dial tcp 192.168.241.128:6443: connect: no route to host
I0131 09:45:19.026519    8529 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://192.168.241.128:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": dial tcp 192.168.241.128:6443: connect: no route to host

Solution: port 6443 on the k1 master is unreachable because firewall rules block it. Open port 6443:

firewall-cmd --permanent --add-port=6443/tcp

firewall-cmd --reload

Error: joining a node fails (the node's clock is out of sync with the certificate validity window)

Failed to request cluster-info, will try again: Get "https://192.168.241.128:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2024-01-31T10:53:06+08:00 is before 2024-01-31T13:42:55Z

Error: caused by a version mismatch (k8s 1.28.2 with Calico 3.20.1)

error: resource mapping not found for name: "calico-kube-controllers" namespace: "kube-system" from "calico.yaml": no matches for kind "PodDisruptionBudget" in version "policy/v1beta1"

Error: one of the calico-node pods stays at 0/1 READY for a long time

NAMESPACE     NAME                                       READY   STATUS              RES
kube-system   calico-kube-controllers-7ddc4f45bc-l8fgm   1/1     Running             0               3h59m
kube-system   calico-node-97jxl                          1/1     Running             0               3h59m
kube-system   calico-node-kxx2g                          0/1     Running             1 (3h42m ago)   3h44m
kube-system   calico-node-swf7r                          1/1     Running             0  

After adding a new node, none of the calico-node pods are Ready:
kube-system   calico-kube-controllers-7ddc4f45bc-l8fgm   1/1     Running             0             22h
kube-system   calico-node-97jxl                          0/1     Running             0             22h
kube-system   calico-node-kxx2g                          0/1     Running             1 (22h ago)   22h
kube-system   calico-node-swf7r                          0/1     Running             0             22h
kube-system   coredns-5dd5756b68-2w6gl                   1/1     Running             0             23h
kube-system   coredns-5dd5756b68-b6sc8                   1/1     Running             0             23h

Solution:

cri-dockerd errors

Error 1: when running kubeadm reset

Found multiple CRI endpoints on the host. Please define which one do you wish to use by setting the 'criSocket' field in the kubeadm configuration file: unix:///var/run/containerd/containerd.sock, unix:///var/run/cri-dockerd.sock
To see the stack trace of this error execute with --v=5 or higher

 systemctl stop containerd

 systemctl stop kubelet

 systemctl stop docker 

Then run kubeadm reset again.
