1. Installation requirements
- One or more machines running CentOS 7.x x86_64
- Hardware: 2 GB of RAM or more, 2 or more CPUs, 30 GB of disk or more
- Full network connectivity between all machines in the cluster
- Swap disabled; swap must be disabled for the kubelet to work properly
- Internet access, required for pulling images
2. Environment preparation
Role | IP
master-140 | 192.168.100.140
node-141 | 192.168.100.141
node-142 | 192.168.100.142
1. Disable the firewall:
systemctl stop firewalld
systemctl disable firewalld
2. Disable SELinux:
sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config # permanent, requires a reboot
setenforce 0 # temporary
3. Disable swap:
swapoff -a # temporary
vim /etc/fstab # permanent: comment out the swap line
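If you prefer not to edit the file by hand, the swap entry can also be commented out with a one-liner (a sketch; verify /etc/fstab afterwards):
sed -i 's/.*swap.*/#&/' /etc/fstab # permanent: comments out any line mentioning swap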
4. Set the hostname on each machine according to the plan (a sample hostnamectl command follows the hosts block below)
Add the host entries on the master:
cat >>/etc/hosts << EOF
192.168.100.140 master-140
192.168.100.141 node-141
192.168.100.142 node-142
EOF
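The hostname itself can be set with hostnamectl, run once on each machine with its own name, e.g.:
hostnamectl set-hostname master-140 # on 192.168.100.140
hostnamectl set-hostname node-141 # on 192.168.100.141
hostnamectl set-hostname node-142 # on 192.168.100.142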
5. Adjust the Linux kernel parameters to enable bridge filtering and IP forwarding
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
6. Time synchronization:
yum install ntpdate -y
ntpdate cn.pool.ntp.org
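ntpdate performs a one-off sync; to keep the clocks aligned you can optionally add a cron entry (a sketch, adjust the interval to taste):
(crontab -l 2>/dev/null; echo "*/30 * * * * /usr/sbin/ntpdate cn.pool.ntp.org >/dev/null 2>&1") | crontab -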
3. Install containerd
1. containerd is used as the container runtime; download the containerd package
# wget https://github.com/containerd/containerd/releases/download/v1.6.6/cri-containerd-cni-1.6.6-linux-amd64.tar.gz
The archive must be extracted to /, because it already contains the full directory layout.
# tar zxvf cri-containerd-cni-1.6.6-linux-amd64.tar.gz -C /
2. Create the containerd configuration directory
# mkdir /etc/containerd
3. Generate the default containerd configuration file
# containerd config default > /etc/containerd/config.toml
4. Configure the systemd cgroup driver
# vim /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
...
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
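The same change can be made non-interactively; a sketch assuming the default config generated above, where SystemdCgroup = false appears exactly once:
# sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml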
5. Change the sandbox (pause) image address
# vim /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.2"
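This edit can also be scripted; the following sketch rewrites whatever sandbox_image value is currently in the file:
# sed -i 's#sandbox_image = ".*"#sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.2"#' /etc/containerd/config.toml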
6. Replace runc: the runc binary bundled in cri-containerd-cni-1.6.6-linux-amd64.tar.gz is broken (see the troubleshooting notes at the end). This step is important.
# wget https://github.com/opencontainers/runc/releases/download/v1.1.3/runc.amd64
# mv runc.amd64 /usr/local/sbin/runc
mv: overwrite '/usr/local/sbin/runc'? y
# chmod +x /usr/local/sbin/runc
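Verify that the replacement took effect (the output should report version 1.1.3):
# runc --version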
7. Start the containerd service and enable it on boot
# systemctl start containerd
# systemctl enable containerd
4. Install kubeadm, kubelet and kubectl
1. Add the Aliyun Kubernetes YUM repository
# cat >/etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
2. Install the packages at a pinned version
# yum install kubelet-1.24.2 kubeadm-1.24.2 kubectl-1.24.2
3. Configure the kubelet cgroup driver
# vim /etc/sysconfig/kubelet   # add the line below
KUBELET_CGROUP_ARGS="--cgroup-driver=systemd"
4. Enable kubelet on boot
# systemctl enable kubelet
5. Initialize the cluster
[Run this step on the master node only]
# kubeadm init \
--apiserver-advertise-address=192.168.100.140 \
--image-repository=registry.aliyuncs.com/google_containers \
--kubernetes-version=1.24.2 \
--pod-network-cidr=10.244.0.0/16 \
--service-cidr=10.96.0.0/12
- --apiserver-advertise-address: the address the control plane advertises to the cluster (the master's IP)
- --image-repository: the default registry k8s.gcr.io is not reachable from mainland China, so the Aliyun mirror is used instead
- --kubernetes-version: the Kubernetes version, matching the packages installed above
- --service-cidr: the virtual network for Services inside the cluster, the unified access entry for Pods
- --pod-network-cidr: the Pod network; must match the CIDR used in the CNI plugin YAML deployed below
[Log output below]
......
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.100.140:6443 --token dirta5.mvlho7gqshh9hw6o \
--discovery-token-ca-cert-hash sha256:fc2e5cf3feebbdf8fec37ca9ce7656431414ebf816f217b7d1c076dd89e9dadd
Follow the instructions from the output:
# mkdir -p $HOME/.kube
# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# sudo chown $(id -u):$(id -g) $HOME/.kube/config
Check the nodes:
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master-140 Ready control-plane 2m16s v1.24.2
6. Join worker nodes to the cluster
Run on the worker nodes. To add a node to the cluster, execute the kubeadm join command printed by kubeadm init.
# kubeadm join 192.168.100.140:6443 --token dirta5.mvlho7gqshh9hw6o --discovery-token-ca-cert-hash sha256:fc2e5cf3feebbdf8fec37ca9ce7656431414ebf816f217b7d1c076dd89e9dadd
Check the nodes (the join above was only run on node-141):
kubectl get nodes
NAME STATUS ROLES AGE VERSION
master-140 Ready control-plane 2m16s v1.24.2
node-141 Ready <none> 54s v1.24.2
The token is valid for 24 hours by default; after it expires a new one must be created:
1. List the tokens
# kubeadm token list
TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
dirta5.mvlho7gqshh9hw6o 23h 2022-06-27T05:01:40Z authentication,signing The default bootstrap token generated by 'kubeadm init'. system:bootstrappers:kubeadm:default-node-token
2. Create a new token
# kubeadm token create --print-join-command
kubeadm join 192.168.100.140:6443 --token 81zsrm.jvjhbg0mwlsdzdb7 --discovery-token-ca-cert-hash sha256:fc2e5cf3feebbdf8fec37ca9ce7656431414ebf816f217b7d1c076dd89e9dadd
3. List the tokens again
# kubeadm token list
TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
81zsrm.jvjhbg0mwlsdzdb7 23h 2022-06-27T05:11:20Z authentication,signing <none> system:bootstrappers:kubeadm:default-node-token
dirta5.mvlho7gqshh9hw6o 23h 2022-06-27T05:01:40Z authentication,signing The default bootstrap token generated by 'kubeadm init'. system:bootstrappers:kubeadm:default-node-token
4. Get the discovery-token-ca-cert-hash
# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
fc2e5cf3feebbdf8fec37ca9ce7656431414ebf816f217b7d1c076dd89e9dadd
Join node-142 to the cluster with the newly created token.
[Run this on node-142]
# kubeadm join 192.168.100.140:6443 --token 81zsrm.jvjhbg0mwlsdzdb7 --discovery-token-ca-cert-hash sha256:fc2e5cf3feebbdf8fec37ca9ce7656431414ebf816f217b7d1c076dd89e9dadd
[Check the nodes on the master]
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master-140 Ready control-plane 16m v1.24.2
node-141 Ready <none> 15m v1.24.2
node-142 Ready <none> 35s v1.24.2
7. Deploy the network plugin
The CNI plugin provides cross-host container networking; Calico is used here.
Reference: Quickstart for Calico on Kubernetes
1. Check the pods in the kube-system namespace
# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-74586cf9b6-5bfk7 0/1 ContainerCreating 0 22m
coredns-74586cf9b6-d29mj 0/1 ContainerCreating 0 22m
...
The two coredns pods are stuck because no CNI network plugin has been deployed yet.
2. Download the Calico YAML files
# wget https://projectcalico.docs.tigera.io/manifests/tigera-operator.yaml
# wget https://projectcalico.docs.tigera.io/manifests/custom-resources.yaml
3. Edit custom-resources.yaml
    ipPools:
    - blockSize: 26
      cidr: 10.244.0.0/16    # change this to the pod-network-cidr used during kubeadm init
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()
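If you prefer a one-liner, the CIDR can be rewritten with sed (a sketch, assuming the downloaded file still carries Calico's default 192.168.0.0/16):
# sed -i 's#cidr: 192.168.0.0/16#cidr: 10.244.0.0/16#' custom-resources.yaml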
4. Install Calico
# kubectl apply -f tigera-operator.yaml
# kubectl apply -f custom-resources.yaml
5. Check the Calico pods
# kubectl get pods -n calico-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-86dff98c45-jjflf 1/1 Running 0 2m20s
calico-node-27zbg 1/1 Running 0 2m20s
calico-node-kjphd 1/1 Running 0 2m20s
calico-node-ntw22 1/1 Running 0 2m20s
calico-typha-6c8778fdb7-bbpnh 1/1 Running 0 2m20s
calico-typha-6c8778fdb7-lpmdl 1/1 Running 0 2m11s
6. Verify that coredns is now healthy
kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-74586cf9b6-5bfk7 1/1 Running 0 28m
coredns-74586cf9b6-d29mj 1/1 Running 0 28m
...
Both coredns pods are now Running.
8. Cluster test
1. Create a deployment
# kubectl create deployment deploy-nginx --image=nginx:1.18
2. The deployment creates a single pod by default; scale it to 3 replicas
# kubectl scale deployment deploy-nginx --replicas=3
3. Expose the deployment via a NodePort service
# kubectl expose deployment deploy-nginx --name=svc-nginx --port=8081 --target-port=80 --type=NodePort
4. Check the resources
# kubectl get deployment,pods,svc -o wide
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
deployment.apps/deploy-nginx 3/3 3 3 5m5s nginx nginx:1.18 app=deploy-nginx
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/deploy-nginx-74565bf758-8dsp7 1/1 Running 0 5m5s 10.244.65.194 node-141 <none> <none>
pod/deploy-nginx-74565bf758-9kc74 1/1 Running 0 4m12s 10.244.56.3 node-142 <none> <none>
pod/deploy-nginx-74565bf758-j7gs9 1/1 Running 0 4m12s 10.244.56.4 node-142 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 37m <none>
service/svc-nginx NodePort 10.101.189.51 <none> 8081:31379/TCP 4s app=deploy-nginx
5. Access the pod IPs and the service IP
--- pod IPs
# curl -I 10.244.65.194
# curl -I 10.244.56.3
# curl -I 10.244.56.4
HTTP/1.1 200 OK
Server: nginx/1.18.0
Date: Sun, 26 Jun 2022 05:41:47 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 21 Apr 2020 14:09:01 GMT
Connection: keep-alive
ETag: "5e9efe7d-264"
Accept-Ranges: bytes
--- service IP
# curl -I 10.101.189.51:8081
HTTP/1.1 200 OK
Server: nginx/1.18.0
Date: Sun, 26 Jun 2022 05:42:36 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 21 Apr 2020 14:09:01 GMT
Connection: keep-alive
ETag: "5e9efe7d-264"
Accept-Ranges: bytes
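Because svc-nginx is of type NodePort, it can also be reached from outside the cluster through any node IP on the node port shown above (31379), for example:
# curl -I 192.168.100.141:31379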
9. Configure IPVS
Kubernetes Services support two proxy modes, one based on iptables and one based on IPVS. IPVS performs noticeably better, but using it requires loading the IPVS kernel modules manually.
1. Install ipset and ipvsadm
# yum install ipset ipvsadm -y
2. Write the modules to be loaded into a script file
# cat <<EOF> /etc/sysconfig/modules/ipvs.modules
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF
3. Make the script executable
# chmod +x /etc/sysconfig/modules/ipvs.modules
4. Run the script
# /bin/bash /etc/sysconfig/modules/ipvs.modules
5. Verify that the modules loaded successfully
# lsmod | grep -e ip_vs -e nf_conntrack_ipv4
Change the kube-proxy proxy mode
1. Run on the master node
# kubectl edit cm kube-proxy -n kube-system
...
kind: KubeProxyConfiguration
metricsBindAddress: ""
mode: "ipvs" # 此处修改为ipvs,默认为空
nodePortAddresses: null
...
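The same change can be applied without an interactive editor; a sketch assuming the mode field is currently empty as shown above:
# kubectl -n kube-system get cm kube-proxy -o yaml | sed 's/mode: ""/mode: "ipvs"/' | kubectl apply -f -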
2. List the current kube-proxy pods
# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-74586cf9b6-5bfk7 1/1 Running 0 75m
coredns-74586cf9b6-d29mj 1/1 Running 0 75m
etcd-master-140 1/1 Running 0 76m
kube-apiserver-master-140 1/1 Running 0 76m
kube-controller-manager-master-140 1/1 Running 0 76m
kube-proxy-f7rcx 1/1 Running 0 74m
kube-proxy-ggchx 1/1 Running 0 60m
kube-proxy-hbt94 1/1 Running 0 75m
kube-scheduler-master-140 1/1 Running 0 76m
3. Delete the current kube-proxy pods (the DaemonSet recreates them with the new mode)
# kubectl delete pod kube-proxy-f7rcx kube-proxy-ggchx kube-proxy-hbt94 -n kube-system
pod "kube-proxy-f7rcx" deleted
pod "kube-proxy-ggchx" deleted
pod "kube-proxy-hbt94" deleted
4. Check the newly created kube-proxy pods
# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-74586cf9b6-5bfk7 1/1 Running 0 77m
coredns-74586cf9b6-d29mj 1/1 Running 0 77m
etcd-master-140 1/1 Running 0 78m
kube-apiserver-master-140 1/1 Running 0 78m
kube-controller-manager-master-140 1/1 Running 0 78m
kube-proxy-7859q 1/1 Running 0 44s
kube-proxy-l4gqx 1/1 Running 0 43s
kube-proxy-nnjr2 1/1 Running 0 43s
kube-scheduler-master-140 1/1 Running 0 78m
Verification:
1. Check the service created earlier
# kubectl get svc -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 80m <none>
svc-nginx NodePort 10.101.189.51 <none> 8081:31379/TCP 42m app=deploy-nginx
2. Send a request
# curl -I 10.101.189.51:8081
HTTP/1.1 200 OK
Server: nginx/1.18.0
Date: Sun, 26 Jun 2022 06:22:14 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 21 Apr 2020 14:09:01 GMT
Connection: keep-alive
ETag: "5e9efe7d-264"
Accept-Ranges: bytes
3. Check the IPVS rules
# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
...
TCP 10.101.148.59:443 rr
-> 10.244.56.2:5443 Masq 1 0 0
-> 10.244.65.193:5443 Masq 1 0 0
(the entry below is the IPVS rule chain for the svc-nginx service)
TCP 10.101.189.51:8081 rr
-> 10.244.56.3:80 Masq 1 0 0
-> 10.244.56.4:80 Masq 1 0 0
-> 10.244.65.194:80 Masq 1 0 1
TCP 10.103.59.95:9094 rr
-> 10.244.56.1:9094 Masq 1 0 0
...
10. Troubleshooting
The following error occurred during cluster initialization. It is caused by the faulty runc binary inside the containerd tarball; downloading a fresh runc from the official releases and replacing it (as in step 3.6 above) resolves it. The exact root cause was not investigated further.
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
If anything above is wrong, corrections are welcome. Thank you!