Rescuing a kubeadm-built Kubernetes Cluster After Years of Disuse

Background:

A Kubernetes cluster built with kubeadm had sat unused for years. When it was brought back online, the cluster turned out to be down and was reporting errors.
The high-level container runtime is Docker.

Error message: couldn't get current server API group list: Get "https://192.168.121.141:6443/api?timeout=32s": dial tcp 192.168.121.141:6443: connect: connection refused

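Before digging further, it helps to confirm whether anything is even listening on the apiserver port. A minimal probe, assuming the `ss` utility is available (`apiserver_probe` is just an illustrative name):

```shell
# Check whether any process is listening on the apiserver port (6443,
# taken from the error message above)
apiserver_probe() {
  ss -ltn 2>/dev/null | grep -q ':6443 ' \
    && echo "something is listening on 6443" \
    || echo "nothing listening on 6443 -> apiserver is down"
}

if command -v ss >/dev/null 2>&1; then apiserver_probe; fi
```

A "connection refused" (rather than a timeout) already hints that the host is reachable but the apiserver process itself is not running.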

Remediation

1. First, check whether the runtime daemon is healthy

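Step 1 can be scripted roughly as follows, assuming a systemd-managed Docker daemon (`check_runtime` is a name made up for this sketch):

```shell
# Quick health check for the Docker daemon under systemd
check_runtime() {
  # `systemctl is-active --quiet` exits 0 when the unit is running
  if systemctl is-active --quiet docker; then
    echo "docker daemon is running"
  else
    echo "docker daemon is NOT running; try: systemctl start docker" >&2
    return 1
  fi
}

if command -v systemctl >/dev/null 2>&1; then check_runtime || true; fi
```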

2. Check whether the containers are running normally

The containers were all in a crashed state. After a one-shot restart of all of them, the api-server was checked again and was still failing to run.
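The "one-shot restart" can be sketched like this, assuming the docker CLI (`restart_crashed_containers` is a hypothetical helper name):

```shell
# Restart every exited container in one shot
restart_crashed_containers() {
  # -a includes stopped containers, -q prints only IDs,
  # --filter keeps only exited ones; xargs -r skips the docker restart
  # call entirely when the list is empty
  docker ps -aq --filter status=exited | xargs -r docker restart
}

if command -v docker >/dev/null 2>&1; then restart_crashed_containers || true; fi
```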

3. Check whether expired kubeadm certificates are preventing the api-server from starting

[root@master manifests]# openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text |grep Not
            Not Before: Apr 16 05:27:41 2023 GMT
            Not After : Apr 15 05:27:41 2024 GMT

The certificate had been expired for a full year. The first priority was to restore the certificates by regenerating them with a new expiration date.
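Before regenerating anything, it is worth surveying every certificate's expiry, not just the apiserver's. A small helper, assuming the default kubeadm PKI layout (on a v1.27 cluster, `kubeadm certs check-expiration` would also list most of these in one shot, provided the cluster config is still readable):

```shell
# Print the expiry date of each certificate passed in;
# silently skip unreadable paths
print_cert_expiry() {
  local crt
  for crt in "$@"; do
    [ -r "$crt" ] || continue
    printf '%s: %s\n' "$crt" \
      "$(openssl x509 -in "$crt" -noout -enddate | cut -d= -f2)"
  done
}

# On a control-plane node (default kubeadm layout, assumed):
# print_cert_expiry /etc/kubernetes/pki/*.crt /etc/kubernetes/pki/etcd/*.crt
```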

[root@master ~]# vim update-kubeadm-cert.sh
[root@master ~]# chmod +x update-kubeadm-cert.sh
[root@master ~]# ./update-kubeadm-cert.sh all
[2025-07-10T19:09:10.178353302+0800]: INFO: backup /etc/kubernetes to /etc/kubernetes.old-20250710
Signature ok
subject=/CN=etcd-server
Getting CA Private Key
[2025-07-10T19:09:10.203823693+0800]: INFO: generated /etc/kubernetes/pki/etcd/server.crt
Signature ok
subject=/CN=etcd-peer
Getting CA Private Key
[2025-07-10T19:09:10.233991908+0800]: INFO: generated /etc/kubernetes/pki/etcd/peer.crt
Signature ok
subject=/O=system:masters/CN=kube-etcd-healthcheck-client
Getting CA Private Key
[2025-07-10T19:09:10.252579614+0800]: INFO: generated /etc/kubernetes/pki/etcd/healthcheck-client.crt
Signature ok
subject=/O=system:masters/CN=kube-apiserver-etcd-client
Getting CA Private Key
[2025-07-10T19:09:10.270662717+0800]: INFO: generated /etc/kubernetes/pki/apiserver-etcd-client.crt
[2025-07-10T19:09:10.361241647+0800]: INFO: restarted etcd
Signature ok
subject=/CN=kube-apiserver
Getting CA Private Key
[2025-07-10T19:09:10.387280029+0800]: INFO: generated /etc/kubernetes/pki/apiserver.crt
Signature ok
subject=/O=system:masters/CN=kube-apiserver-kubelet-client
Getting CA Private Key
[2025-07-10T19:09:10.404954699+0800]: INFO: generated /etc/kubernetes/pki/apiserver-kubelet-client.crt
Signature ok
subject=/CN=system:kube-controller-manager
Getting CA Private Key
[2025-07-10T19:09:10.441468594+0800]: INFO: generated /etc/kubernetes/controller-manager.crt
[2025-07-10T19:09:10.446127295+0800]: INFO: generated new /etc/kubernetes/controller-manager.conf
Signature ok
subject=/CN=system:kube-scheduler
Getting CA Private Key
[2025-07-10T19:09:10.478495262+0800]: INFO: generated /etc/kubernetes/scheduler.crt
[2025-07-10T19:09:10.483909670+0800]: INFO: generated new /etc/kubernetes/scheduler.conf
Signature ok
subject=/O=system:masters/CN=kubernetes-admin
Getting CA Private Key
[2025-07-10T19:09:10.514043835+0800]: INFO: generated /etc/kubernetes/admin.crt
[2025-07-10T19:09:10.519546384+0800]: INFO: generated new /etc/kubernetes/admin.conf
[2025-07-10T19:09:10.526208608+0800]: INFO: copy the admin.conf to ~/.kube/config for kubectl
[2025-07-10T19:09:10.528293082+0800]: WARNING: does not need to update kubelet.conf
Signature ok
subject=/CN=front-proxy-client
Getting CA Private Key
[2025-07-10T19:09:10.545998609+0800]: INFO: generated /etc/kubernetes/pki/front-proxy-client.crt
[2025-07-10T19:09:10.617143451+0800]: INFO: restarted kube-apiserver
[2025-07-10T19:09:10.676738046+0800]: INFO: restarted kube-controller-manager
[2025-07-10T19:09:10.742478810+0800]: INFO: restarted kube-scheduler
[2025-07-10T19:09:10.790700987+0800]: INFO: restarted kubelet
You have new mail in /var/spool/mail/root
[root@master ~]# openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text |grep Not
            Not Before: Jul 10 11:09:10 2025 GMT
            Not After : Jun 16 11:09:10 2125 GMT

The script contents are as follows (borrowed from another author's update-kube-cert script; the project URL appears in its help text below):

#!/bin/bash
 
set -o errexit
set -o pipefail
# set -o xtrace
 
log::err() {
  printf "[$(date +'%Y-%m-%dT%H:%M:%S.%N%z')]: \033[31mERROR: \033[0m$@\n"
}
 
log::info() {
  printf "[$(date +'%Y-%m-%dT%H:%M:%S.%N%z')]: \033[32mINFO: \033[0m$@\n"
}
 
log::warning() {
  printf "[$(date +'%Y-%m-%dT%H:%M:%S.%N%z')]: \033[33mWARNING: \033[0m$@\n"
}
 
check_file() {
  if [[ ! -r  ${1} ]]; then
    log::err "can not find ${1}"
    exit 1
  fi
}
 
# get x509v3 subject alternative name from the old certificate
cert::get_subject_alt_name() {
  local cert=${1}.crt
  check_file "${cert}"
  local alt_name=$(openssl x509 -text -noout -in ${cert} | grep -A1 'Alternative' | tail -n1 | sed 's/[[:space:]]*Address//g')
  printf "${alt_name}\n"
}
 
# get subject from the old certificate
cert::get_subj() {
  local cert=${1}.crt
  check_file "${cert}"
  local subj=$(openssl x509 -text -noout -in ${cert}  | grep "Subject:" | sed 's/Subject:/\//g;s/\,/\//;s/[[:space:]]//g')
  printf "${subj}\n"
}
 
cert::backup_file() {
  local file=${1}
  if [[ ! -e ${file}.old-$(date +%Y%m%d) ]]; then
    cp -rp ${file} ${file}.old-$(date +%Y%m%d)
    log::info "backup ${file} to ${file}.old-$(date +%Y%m%d)"
  else
    log::warning "does not backup, ${file}.old-$(date +%Y%m%d) already exists"
  fi
}
 
# generate a client, server, or peer certificate
# Args:
#   $1 (the name of certificate)
#   $2 (the type of certificate, must be one of client, server, peer)
#   $3 (the subject of certificates)
#   $4 (the validity of certificates) (days)
#   $5 (the x509v3 subject alternative name of certificate when the type of certificate is server or peer)
cert::gen_cert() {
  local cert_name=${1}
  local cert_type=${2}
  local subj=${3}
  local cert_days=${4}
  local alt_name=${5}
  local cert=${cert_name}.crt
  local key=${cert_name}.key
  local csr=${cert_name}.csr
  local csr_conf="distinguished_name = dn\n[dn]\n[v3_ext]\nkeyUsage = critical, digitalSignature, keyEncipherment\n"
 
  check_file "${key}"
  check_file "${cert}"
 
  # backup certificate when certificate not in ${kubeconf_arr[@]}
  # kubeconf_arr=("controller-manager.crt" "scheduler.crt" "admin.crt" "kubelet.crt")
  # if [[ ! "${kubeconf_arr[@]}" =~ "${cert##*/}" ]]; then
  #   cert::backup_file "${cert}"
  # fi
 
  case "${cert_type}" in
    client)
      openssl req -new  -key ${key} -subj "${subj}" -reqexts v3_ext \
        -config <(printf "${csr_conf} extendedKeyUsage = clientAuth\n") -out ${csr}
      openssl x509 -in ${csr} -req -CA ${CA_CERT} -CAkey ${CA_KEY} -CAcreateserial -extensions v3_ext \
        -extfile <(printf "${csr_conf} extendedKeyUsage = clientAuth\n") -days ${cert_days} -out ${cert}
      log::info "generated ${cert}"
    ;;
    server)
      openssl req -new  -key ${key} -subj "${subj}" -reqexts v3_ext \
        -config <(printf "${csr_conf} extendedKeyUsage = serverAuth\nsubjectAltName = ${alt_name}\n") -out ${csr}
      openssl x509 -in ${csr} -req -CA ${CA_CERT} -CAkey ${CA_KEY} -CAcreateserial -extensions v3_ext \
        -extfile <(printf "${csr_conf} extendedKeyUsage = serverAuth\nsubjectAltName = ${alt_name}\n") -days ${cert_days} -out ${cert}
      log::info "generated ${cert}"
    ;;
    peer)
      openssl req -new  -key ${key} -subj "${subj}" -reqexts v3_ext \
        -config <(printf "${csr_conf} extendedKeyUsage = serverAuth, clientAuth\nsubjectAltName = ${alt_name}\n") -out ${csr}
      openssl x509 -in ${csr} -req -CA ${CA_CERT} -CAkey ${CA_KEY} -CAcreateserial -extensions v3_ext \
        -extfile <(printf "${csr_conf} extendedKeyUsage = serverAuth, clientAuth\nsubjectAltName = ${alt_name}\n") -days ${cert_days} -out ${cert}
      log::info "generated ${cert}"
    ;;
    *)
      log::err "unknown, unsupported etcd certs type: ${cert_type}, supported types: client, server, peer"
      exit 1
  esac
 
  rm -f ${csr}
}
 
cert::update_kubeconf() {
  local cert_name=${1}
  local kubeconf_file=${cert_name}.conf
  local cert=${cert_name}.crt
  local key=${cert_name}.key
 
  # generate  certificate
  check_file ${kubeconf_file}
  # get the key from the old kubeconf
  grep "client-key-data" ${kubeconf_file} | awk {'print$2'} | base64 -d > ${key}
  # get the old certificate from the old kubeconf
  grep "client-certificate-data" ${kubeconf_file} | awk {'print$2'} | base64 -d > ${cert}
  # get subject from the old certificate
  local subj=$(cert::get_subj ${cert_name})
  cert::gen_cert "${cert_name}" "client" "${subj}" "${CAER_DAYS}"
  # get certificate base64 code
  local cert_base64=$(base64 -w 0 ${cert})
 
  # backup kubeconf
  # cert::backup_file "${kubeconf_file}"
 
  # set certificate base64 code to kubeconf
  sed -i 's/client-certificate-data:.*/client-certificate-data: '${cert_base64}'/g' ${kubeconf_file}
 
  log::info "generated new ${kubeconf_file}"
  rm -f ${cert}
  rm -f ${key}
 
  # set config for kubectl
  if [[ ${cert_name##*/} == "admin" ]]; then
    mkdir -p ~/.kube
    cp -fp ${kubeconf_file} ~/.kube/config
    log::info "copy the admin.conf to ~/.kube/config for kubectl"
  fi
}
 
cert::update_etcd_cert() {
  PKI_PATH=${KUBE_PATH}/pki/etcd
  CA_CERT=${PKI_PATH}/ca.crt
  CA_KEY=${PKI_PATH}/ca.key
 
  check_file "${CA_CERT}"
  check_file "${CA_KEY}"
 
  # generate etcd server certificate
  # /etc/kubernetes/pki/etcd/server
  CART_NAME=${PKI_PATH}/server
  subject_alt_name=$(cert::get_subject_alt_name ${CART_NAME})
  cert::gen_cert "${CART_NAME}" "peer" "/CN=etcd-server" "${CAER_DAYS}" "${subject_alt_name}"
 
  # generate etcd peer certificate
  # /etc/kubernetes/pki/etcd/peer
  CART_NAME=${PKI_PATH}/peer
  subject_alt_name=$(cert::get_subject_alt_name ${CART_NAME})
  cert::gen_cert "${CART_NAME}" "peer" "/CN=etcd-peer" "${CAER_DAYS}" "${subject_alt_name}"
 
  # generate etcd healthcheck-client certificate
  # /etc/kubernetes/pki/etcd/healthcheck-client
  CART_NAME=${PKI_PATH}/healthcheck-client
  cert::gen_cert "${CART_NAME}" "client" "/O=system:masters/CN=kube-etcd-healthcheck-client" "${CAER_DAYS}"
 
  # generate apiserver-etcd-client certificate
  # /etc/kubernetes/pki/apiserver-etcd-client
  check_file "${CA_CERT}"
  check_file "${CA_KEY}"
  PKI_PATH=${KUBE_PATH}/pki
  CART_NAME=${PKI_PATH}/apiserver-etcd-client
  cert::gen_cert "${CART_NAME}" "client" "/O=system:masters/CN=kube-apiserver-etcd-client" "${CAER_DAYS}"
 
  # restart etcd
  docker ps | awk '/k8s_etcd/{print$1}' | xargs -r -I '{}' docker restart {} || true
  log::info "restarted etcd"
}
 
cert::update_master_cert() {
  PKI_PATH=${KUBE_PATH}/pki
  CA_CERT=${PKI_PATH}/ca.crt
  CA_KEY=${PKI_PATH}/ca.key
 
  check_file "${CA_CERT}"
  check_file "${CA_KEY}"
 
  # generate apiserver server certificate
  # /etc/kubernetes/pki/apiserver
  CART_NAME=${PKI_PATH}/apiserver
  subject_alt_name=$(cert::get_subject_alt_name ${CART_NAME})
  cert::gen_cert "${CART_NAME}" "server" "/CN=kube-apiserver" "${CAER_DAYS}" "${subject_alt_name}"
 
  # generate apiserver-kubelet-client certificate
  # /etc/kubernetes/pki/apiserver-kubelet-client
  CART_NAME=${PKI_PATH}/apiserver-kubelet-client
  cert::gen_cert "${CART_NAME}" "client" "/O=system:masters/CN=kube-apiserver-kubelet-client" "${CAER_DAYS}"
 
  # generate kubeconf for controller-manager,scheduler,kubectl and kubelet
  # /etc/kubernetes/controller-manager,scheduler,admin,kubelet.conf
  cert::update_kubeconf "${KUBE_PATH}/controller-manager"
  cert::update_kubeconf "${KUBE_PATH}/scheduler"
  cert::update_kubeconf "${KUBE_PATH}/admin"
  # check kubelet.conf
  # https://github.com/kubernetes/kubeadm/issues/1753
  set +e
  grep kubelet-client-current.pem /etc/kubernetes/kubelet.conf > /dev/null 2>&1
  kubelet_cert_auto_update=$?
  set -e
  if [[ "$kubelet_cert_auto_update" == "0" ]]; then
    log::warning "does not need to update kubelet.conf"
  else
    cert::update_kubeconf "${KUBE_PATH}/kubelet"
  fi
 
  # generate front-proxy-client certificate
  # use front-proxy-client ca
  CA_CERT=${PKI_PATH}/front-proxy-ca.crt
  CA_KEY=${PKI_PATH}/front-proxy-ca.key
  check_file "${CA_CERT}"
  check_file "${CA_KEY}"
  CART_NAME=${PKI_PATH}/front-proxy-client
  cert::gen_cert "${CART_NAME}" "client" "/CN=front-proxy-client" "${CAER_DAYS}"
 
  # restart apiserver, controller-manager, scheduler and kubelet
  docker ps | awk '/k8s_kube-apiserver/{print$1}' | xargs -r -I '{}' docker restart {} || true
  log::info "restarted kube-apiserver"
  docker ps | awk '/k8s_kube-controller-manager/{print$1}' | xargs -r -I '{}' docker restart {} || true
  log::info "restarted kube-controller-manager"
  docker ps | awk '/k8s_kube-scheduler/{print$1}' | xargs -r -I '{}' docker restart {} || true
  log::info "restarted kube-scheduler"
  systemctl restart kubelet
  log::info "restarted kubelet"
}
 
main() {
  local node_type=$1

  KUBE_PATH=/etc/kubernetes
  CAER_DAYS=36500

  # backup $KUBE_PATH to $KUBE_PATH.old-$(date +%Y%m%d)
  cert::backup_file "${KUBE_PATH}"

  case ${node_type} in
    etcd)
      # update etcd certificates
      cert::update_etcd_cert
    ;;
    master)
      # update master certificates and kubeconf
      cert::update_master_cert
    ;;
    all)
      # update etcd certificates
      cert::update_etcd_cert
      # update master certificates and kubeconf
      cert::update_master_cert
    ;;
    *)
      log::err "unknown, unsupported certs type: ${node_type}, supported types: all, etcd, master"
      printf "Documentation: https://github.com/yuyicai/update-kube-cert
  example:
    '\033[32m./update-kubeadm-cert.sh all\033[0m' update all etcd certificates, master certificates and kubeconf
      /etc/kubernetes
      ├── admin.conf
      ├── controller-manager.conf
      ├── scheduler.conf
      ├── kubelet.conf
      └── pki
          ├── apiserver.crt
          ├── apiserver-etcd-client.crt
          ├── apiserver-kubelet-client.crt
          ├── front-proxy-client.crt
          └── etcd
              ├── healthcheck-client.crt
              ├── peer.crt
              └── server.crt
    '\033[32m./update-kubeadm-cert.sh etcd\033[0m' update only etcd certificates
      /etc/kubernetes
      └── pki
          ├── apiserver-etcd-client.crt
          └── etcd
              ├── healthcheck-client.crt
              ├── peer.crt
              └── server.crt
    '\033[32m./update-kubeadm-cert.sh master\033[0m' update only master certificates and kubeconf
      /etc/kubernetes
      ├── admin.conf
      ├── controller-manager.conf
      ├── scheduler.conf
      ├── kubelet.conf
      └── pki
          ├── apiserver.crt
          ├── apiserver-kubelet-client.crt
          └── front-proxy-client.crt
"
      exit 1
    esac
}
 
main "$@"

4. Continuing to get resources surfaces a new error

Error message: couldn't get current server API group list: the server has asked for the client to provide credentials

Error log:

unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory


The kubelet process was stuck in an auto-restart loop, which broke communication with the apiserver. The error log says a config file is missing; while we're at it, check the kubelet certificate's validity period:
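The usual way to see why kubelet keeps flapping, assuming a systemd-managed kubelet as kubeadm sets up (`kubelet_diag` is just an illustrative name):

```shell
# Pull the most recent kubelet failure reason from the journal
kubelet_diag() {
  # unit state and restart counter, without the log tail
  systemctl status kubelet --no-pager --lines=0 || true
  # the fatal error (expired client cert, missing bootstrap kubeconfig, ...)
  # is usually within the last few dozen log lines
  journalctl -u kubelet -n 50 --no-pager || true
}

if command -v journalctl >/dev/null 2>&1; then kubelet_diag; fi
```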

[root@node1 pki]# openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate
notAfter=Apr 15 05:29:49 2024 GMT

This confirms that the expired kubelet certificate caused the failure.
Replace the certificate:

1. Generate a new kubeconfig on the master node
[root@master ~]# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.1", GitCommit:"4c9411232e10168d7b050c49a1b59f6df9d7ea4b", GitTreeState:"clean", BuildDate:"2023-04-14T13:21:19Z", GoVersion:"go1.20.3", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
error: You must be logged in to the server (the server has asked for the client to provide credentials)
[root@master ~]# mkdir test
You have new mail in /var/spool/mail/root
[root@master ~]# kubeadm init --kubernetes-version=v1.27.1 phase kubeconfig kubelet --node-name node1 --kubeconfig-dir ./test/
[kubeconfig] Writing "kubelet.conf" kubeconfig file

Do this on both the master and the worker nodes (check each node's certificate expiry first, and replace any that have expired):
[root@master ~]# scp test/kubelet.conf node1:/etc/kubernetes/
The authenticity of host 'node1 (192.168.121.142)' can't be established.
ECDSA key fingerprint is SHA256:RW+UCKRCkn+EBytxKm4Y+PA7z8YXjd7EMlUifsvljhI.
ECDSA key fingerprint is MD5:6f:97:64:8b:4c:5a:b0:33:1a:4e:95:ef:e3:a1:75:61.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node1,192.168.121.142' (ECDSA) to the list of known hosts.
root@node1's password: 
kubelet.conf     
[root@master ~]# scp test/kubelet.conf /etc/kubernetes/
[root@master ~]# cd /etc/kubernetes
[root@master kubernetes]# ll
total 36
-rw-------  1 root root 5511 Jul 10 19:09 admin.conf
-rw-------  1 root root 5551 Jul 10 19:09 controller-manager.conf
-rw-------  1 root root 5595 Jul 10 19:27 kubelet.conf
drwxr-xr-x. 2 root root  113 Apr 16  2023 manifests
drwxr-xr-x  3 root root 4096 Jul 10 19:09 pki
-rw-------  1 root root 5499 Jul 10 19:09 scheduler.conf
[root@master kubernetes]# systemctl restart kubelet


5. Errors when getting namespace resources

Error message:

[root@master kubernetes]# kubectl get ns -A
Error from server (Forbidden): namespaces is forbidden: User "system:node:node1" cannot list resource "namespaces" in API group "" at the cluster scope
[root@master ~]# kubectl get rolebinding,clusterrolebinding -A
Error from server (Forbidden): rolebindings.rbac.authorization.k8s.io is forbidden: User "system:node:node1" cannot list resource "rolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope
Error from server (Forbidden): clusterrolebindings.rbac.authorization.k8s.io is forbidden: User "system:node:node1" cannot list resource "clusterrolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope

Solution: kubectl is now authenticating as the node identity "system:node:node1" (from the kubelet kubeconfig), which RBAC does not allow to list cluster-scoped resources. Point kubectl back at the admin credentials:

[root@master kubernetes]# mkdir -p $HOME/.kube
[root@master kubernetes]# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
cp: overwrite '/root/.kube/config'? y
[root@master kubernetes]# export KUBECONFIG=/etc/kubernetes/admin.conf

6. Verification

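The final verification can be scripted roughly as follows, assuming kubectl now points at the admin kubeconfig (`verify_cluster` is a name invented for this sketch):

```shell
# Sanity checks after replacing the certificates
verify_cluster() {
  kubectl get nodes -o wide        # every node should report Ready
  kubectl get pods -n kube-system  # control-plane pods should be Running
  kubectl get --raw='/healthz'     # apiserver liveness endpoint, should return ok
}

if command -v kubectl >/dev/null 2>&1; then verify_cluster || true; fi
```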
