1. Installation requirements
- One or more machines running CentOS 7.x x86_64
- Hardware: 2 GB of RAM or more, 2 or more CPUs, 30 GB of disk or more
- Full network connectivity between all machines in the cluster
- Swap disabled; swap must be disabled for the kubelet to work properly
- Internet access, required for pulling images
2. Environment preparation
Role | IP
master-140 | 192.168.100.140
node-141 | 192.168.100.141
node-142 | 192.168.100.142
1. Disable the firewall:
systemctl stop firewalld
systemctl disable firewalld
2. Disable SELinux:
sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config # permanent, requires a reboot
setenforce 0 # temporary
3. Disable swap:
swapoff -a # temporary
vim /etc/fstab # permanent: comment out the swap line
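If you prefer not to edit the file by hand, the swap entry can also be commented out with a one-liner (a sketch; verify /etc/fstab afterwards):
sed -i 's/.*swap.*/#&/' /etc/fstab # permanent: comments out any line mentioning swap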
4. Set the hostname on each machine according to the plan (a sample hostnamectl command follows the hosts block below)
Add the host entries on the master:
cat >>/etc/hosts << EOF
192.168.100.140 master-140
192.168.100.141 node-141
192.168.100.142 node-142
EOF
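The hostname itself can be set with hostnamectl, run once on each machine with its own name, e.g.:
hostnamectl set-hostname master-140 # on 192.168.100.140
hostnamectl set-hostname node-141 # on 192.168.100.141
hostnamectl set-hostname node-142 # on 192.168.100.142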
5. Adjust the Linux kernel parameters to enable bridge filtering and IP forwarding
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
6. Time synchronization:
yum install ntpdate -y
ntpdate cn.pool.ntp.org
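ntpdate performs a one-off sync; to keep the clocks aligned you can optionally add a cron entry (a sketch, adjust the interval to taste):
(crontab -l 2>/dev/null; echo "*/30 * * * * /usr/sbin/ntpdate cn.pool.ntp.org >/dev/null 2>&1") | crontab -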
3. Install containerd
1. containerd is used as the container runtime; download the containerd package
# wget https://github.com/containerd/containerd/releases/download/v1.6.6/cri-containerd-cni-1.6.6-linux-amd64.tar.gz
The archive must be extracted to /, because it already contains the full directory layout.
# tar zxvf cri-containerd-cni-1.6.6-linux-amd64.tar.gz -C /
2. Create the containerd configuration directory
# mkdir /etc/containerd
3. Generate the default containerd configuration file
# containerd config default > /etc/containerd/config.toml
4. Configure the systemd cgroup driver
# vim /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
...
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
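The same change can be made non-interactively; a sketch assuming the default config generated above, where SystemdCgroup = false appears exactly once:
# sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml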
5. Change the sandbox (pause) image address
# vim /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.2"
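This edit can also be scripted; the following sketch rewrites whatever sandbox_image value is currently in the file:
# sed -i 's#sandbox_image = ".*"#sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.2"#' /etc/containerd/config.toml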
6. Replace runc: the runc binary bundled in cri-containerd-cni-1.6.6-linux-amd64.tar.gz is broken (see the troubleshooting notes at the end). This step is important.
# wget https://github.com/opencontainers/runc/releases/download/v1.1.3/runc.amd64
# mv runc.amd64 /usr/local/sbin/runc
mv: overwrite '/usr/local/sbin/runc'? y
# chmod +x /usr/local/sbin/runc
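Verify that the replacement took effect (the output should report version 1.1.3):
# runc --version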
7. Start the containerd service and enable it on boot
# systemctl start containerd
# systemctl enable containerd
4. Install kubeadm, kubelet and kubectl
1. Add the Aliyun Kubernetes YUM repository
# cat >/etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
2. Install the packages at a pinned version
# yum install kubelet-1.24.2 kubeadm-1.24.2 kubectl-1.24.2
3. Configure the kubelet cgroup driver
# vim /etc/sysconfig/kubelet   # add the line below
KUBELET_CGROUP_ARGS="--cgroup-driver=systemd"
4. Enable kubelet on boot
# systemctl enable kubelet
5. Initialize the cluster
[Run this step on the master node only]
# kubeadm init \
--apiserver-advertise-address=192.168.100.140 \
--image-repository=registry.aliyuncs.com/google_containers \
--kubernetes-version=1.24.2 \
--pod-network-cidr=10.244.0.0/16 \
--service-cidr=10.96.0.0/12
- --apiserver-advertise-address: the address the control plane advertises to the cluster (the master's IP)
- --image-repository: the default registry k8s.gcr.io is not reachable from mainland China, so the Aliyun mirror is used instead
- --kubernetes-version: the Kubernetes version, matching the packages installed above
- --service-cidr: the virtual network for Services inside the cluster, the unified access entry for Pods
- --pod-network-cidr: the Pod network; must match the CIDR used in the CNI plugin YAML deployed below
[Log output below]
......
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.100.140:6443 --token dirta5.mvlho7gqshh9hw6o \
--discovery-token-ca-cert-hash sha256:fc2e5cf3feebbdf8fec37ca9ce7656431414ebf816f217b7d1c076dd89e9dadd
Follow the instructions from the output:
# mkdir -p $HOME/.kube
# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# sudo chown $(id -u):$(id -g) $HOME/.kube/config
Check the nodes:
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master-140 Ready control-plane 2m16s v1.24.2
6. Join worker nodes to the cluster
Run on the worker nodes. To add a node to the cluster, execute the kubeadm join command printed by kubeadm init.
# kubeadm join 192.168.100.140:6443 --token dirta5.mvlho7gqshh9hw6o --discovery-token-ca-cert-hash sha256:fc2e5cf3feebbdf8fec37ca9ce7656431414ebf816f217b7d1c076dd89e9dadd
Check the nodes (the join above was only run on node-141):
kubectl get nodes
NAME STATUS ROLES AGE VERSION
master-140 Ready control-plane 2m16s v1.24.2
node-141 Ready <none> 54s v1.24.2
The token is valid for 24 hours by default; after it expires a new one must be created:
1. List the tokens
# kubeadm token list
TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
dirta5.mvlho7gqshh9hw6o 23h 2022-06-27T05:01:40Z authentication,signing The default bootstrap token generated by 'kubeadm init'. system:bootstrappers:kubeadm:default-node-token
2. Create a new token
# kubeadm token create --print-join-command
kubeadm join 192.168.100.140:6443 --token 81zsrm.jvjhbg0mwlsdzdb7 --discovery-token-ca-cert-hash sha256:fc2e5cf3feebbdf8fec37ca9ce7656431414ebf816f217b7d1c076dd89e9dadd
3. List the tokens again
# kubeadm token list
TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
81zsrm.jvjhbg0mwlsdzdb7 23h 2022-06-27T05:11:20Z authentication,signing <none> system:bootstrappers:kubeadm:default-node-token
dirta5.mvlho7gqshh9hw6o 23h 2022-06-27T05:01:40Z authentication,signing The default bootstrap token generated by 'kubeadm init'. system:bootstrappers:kubeadm:default-node-token
4. Get the discovery-token-ca-cert-hash
# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
fc2e5cf3feebbdf8fec37ca9ce7656431414ebf816f217b7d1c076dd89e9dadd
Join node-142 to the cluster with the newly created token.
[Run this on node-142]
# kubeadm join 192.168.100.140:6443 --token 81zsrm.jvjhbg0mwlsdzdb7 --discovery-token-ca-cert-hash sha256:fc2e5cf3feebbdf8fec37ca9ce7656431414ebf816f217b7d1c076dd89e9dadd
[Check the nodes on the master]
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master-140 Ready control-plane 16m v1.24.2
node-141 Ready <none> 15m v1.24.2
node-142 Ready <none> 35s v1.24.2
7. Deploy the network plugin
The CNI plugin provides cross-host container networking; Calico is used here.
Reference: Quickstart for Calico on Kubernetes
1. Check the pods in the kube-system namespace
# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-74586cf9b6-5bfk7 0/1 ContainerCreating 0 22m
coredns-74586cf9b6-d29mj 0/1 ContainerCreating 0 22m
...
The two coredns pods are stuck because no CNI network plugin has been deployed yet.
2. Download the Calico YAML files
# wget https://projectcalico.docs.tigera.io/manifests/tigera-operator.yaml
# wget https://projectcalico.docs.tigera.io/manifests/custom-resources.yaml
3. Edit custom-resources.yaml
    ipPools:
    - blockSize: 26
      cidr: 10.244.0.0/16    # change this to the pod-network-cidr used during kubeadm init
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()
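If you prefer a one-liner, the CIDR can be rewritten with sed (a sketch, assuming the downloaded file still carries Calico's default 192.168.0.0/16):
# sed -i 's#cidr: 192.168.0.0/16#cidr: 10.244.0.0/16#' custom-resources.yaml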
4. Install Calico
# kubectl apply -f tigera-operator.yaml
# kubectl apply -f custom-resources.yaml
5. Check the Calico pods
# kubectl get pods -n calico-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-86dff98c45-jjflf 1/1 Running 0 2m20s
calico-node-27zbg 1/1 Running 0 2m20s
calico-node-kjphd 1/1 Running 0 2m20s
calico-node-ntw22 1/1 Running 0 2m20s
calico-typha-6c8778fdb7-bbpnh 1/1 Running 0 2m20s
calico-typha-6c8778fdb7-lpmdl 1/1 Running 0 2m11s
6. Verify that coredns is now healthy
kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-74586cf9b6-5bfk7 1/1 Running 0 28m
coredns-74586cf9b6-d29mj 1/1 Running 0 28m
...
Both coredns pods are now Running.
8. Cluster test
1. Create a deployment
# kubectl create deployment deploy-nginx --image=nginx:1.18
2. The deployment creates a single pod by default; scale it to 3 replicas
# kubectl scale deployment deploy-nginx --replicas=3
3. Expose the deployment via a NodePort service
# kubectl expose deployment deploy-nginx --name=svc-nginx --port=8081 --target-port=80 --type=NodePort
4. Check the resources
# kubectl get deployment,pods,svc -o wide
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
deployment.apps/deploy-nginx 3/3 3 3 5m5s nginx nginx:1.18 app=deploy-nginx
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/deploy-nginx-74565bf758-8dsp7 1/1 Running 0 5m5s 10.244.65.194 node-141 <none> <none>
pod/deploy-nginx-74565bf758-9kc74 1/1 Running 0 4m12s 10.244.56.3 node-142 <none> <none>
pod/deploy-nginx-74565bf758-j7gs9 1/1 Running 0 4m12s 10.244.56.4 node-142 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 37m <none>
service/svc-nginx NodePort 10.101.189.51 <none> 8081:31379/TCP 4s app=deploy-nginx
5. Access the pod IPs and the service IP
--- pod IPs
# curl -I 10.244.65.194
# curl -I 10.244.56.3
# curl -I 10.244.56.4
HTTP/1.1 200 OK
Server: nginx/1.18.0
Date: Sun, 26 Jun 2022 05:41:47 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 21 Apr 2020 14:09:01 GMT
Connection: keep-alive
ETag: "5e9efe7d-264"
Accept-Ranges: bytes
--- service IP
# curl -I 10.101.189.51:8081
HTTP/1.1 200 OK
Server: nginx/1.18.0
Date: Sun, 26 Jun 2022 05:42:36 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 21 Apr 2020 14:09:01 GMT
Connection: keep-alive
ETag: "5e9efe7d-264"
Accept-Ranges: bytes
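Because svc-nginx is of type NodePort, it can also be reached from outside the cluster through any node IP on the node port shown above (31379), for example:
# curl -I 192.168.100.141:31379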
9. Configure IPVS
Kubernetes Services support two proxy modes, one based on iptables and one based on IPVS. IPVS performs noticeably better, but using it requires loading the IPVS kernel modules manually.
1. Install ipset and ipvsadm
# yum install ipset ipvsadm -y
2. Write the modules to be loaded into a script file
# cat <<EOF> /etc/sysconfig/modules/ipvs.modules
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF
3. Make the script executable
# chmod +x /etc/sysconfig/modules/ipvs.modules
4. Run the script
# /bin/bash /etc/sysconfig/modules/ipvs.modules
5. Verify that the modules loaded successfully
# lsmod | grep -e ip_vs -e nf_conntrack_ipv4
Change the kube-proxy proxy mode
1. Run on the master node
# kubectl edit cm kube-proxy -n kube-system
...
kind: KubeProxyConfiguration
metricsBindAddress: ""
mode: "ipvs" # 此处修改为ipvs,默认为空
nodePortAddresses: null
...
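The same change can be applied without an interactive editor; a sketch assuming the mode field is currently empty as shown above:
# kubectl -n kube-system get cm kube-proxy -o yaml | sed 's/mode: ""/mode: "ipvs"/' | kubectl apply -f -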
2. List the current kube-proxy pods
# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-74586cf9b6-5bfk7 1/1 Running 0 75m
coredns-74586cf9b6-d29mj 1/1 Running 0 75m
etcd-master-140 1/1 Running 0 76m
kube-apiserver-master-140 1/1 Running 0 76m
kube-controller-manager-master-140 1/1 Running 0 76m
kube-proxy-f7rcx 1/1 Running 0 74m
kube-proxy-ggchx 1/1 Running 0 60m
kube-proxy-hbt94 1/1 Running 0 75m
kube-scheduler-master-140 1/1 Running 0 76m
3. Delete the current kube-proxy pods (the DaemonSet recreates them with the new mode)
# kubectl delete pod kube-proxy-f7rcx kube-proxy-ggchx kube-proxy-hbt94 -n kube-system
pod "kube-proxy-f7rcx" deleted
pod "kube-proxy-ggchx" deleted
pod "kube-proxy-hbt94" deleted
4. Check the newly created kube-proxy pods
# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-74586cf9b6-5bfk7 1/1 Running 0 77m
coredns-74586cf9b6-d29mj 1/1 Running 0 77m
etcd-master-140 1/1 Running 0 78m
kube-apiserver-master-140 1/1 Running 0 78m
kube-controller-manager-master-140 1/1 Running 0 78m
kube-proxy-7859q 1/1 Running 0 44s
kube-proxy-l4gqx 1/1 Running 0 43s
kube-proxy-nnjr2 1/1 Running 0 43s
kube-scheduler-master-140 1/1 Running 0 78m
Verification:
1. Check the service created earlier
# kubectl get svc -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 80m <none>
svc-nginx NodePort 10.101.189.51 <none> 8081:31379/TCP 42m app=deploy-nginx
2. Send a request
# curl -I 10.101.189.51:8081
HTTP/1.1 200 OK
Server: nginx/1.18.0
Date: Sun, 26 Jun 2022 06:22:14 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 21 Apr 2020 14:09:01 GMT
Connection: keep-alive
ETag: "5e9efe7d-264"
Accept-Ranges: bytes
3. Check the IPVS rules
# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
...
TCP 10.101.148.59:443 rr
-> 10.244.56.2:5443 Masq 1 0 0
-> 10.244.65.193:5443 Masq 1 0 0
(the entry below is the IPVS rule chain for the svc-nginx service)
TCP 10.101.189.51:8081 rr
-> 10.244.56.3:80 Masq 1 0 0
-> 10.244.56.4:80 Masq 1 0 0
-> 10.244.65.194:80 Masq 1 0 1
TCP 10.103.59.95:9094 rr
-> 10.244.56.1:9094 Masq 1 0 0
...
10. Troubleshooting
The following error occurred during cluster initialization. It is caused by the faulty runc binary inside the containerd tarball; downloading a fresh runc from the official releases and replacing it (as in step 3.6 above) resolves it. The exact root cause was not investigated further.
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
If anything above is wrong, corrections are welcome. Thank you!