Background:
A Kubernetes cluster built with kubeadm had been unused for years. On bringing it back up, the cluster turned out to be down and reporting errors.
The high-level runtime is Docker.
Error message: couldn't get current server API group list: Get "https://192.168.121.141:6443/api?timeout=32s": dial tcp 192.168.121.141:6443: connect: connection refused
Resolution
1. First check whether the runtime daemon is healthy
2. Check whether the containers are running
All containers turned out to be in a crashed state; after restarting them in one go, the api-server was still failing.
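The two checks above can be sketched as a couple of shell helpers. This is a hedged sketch: the `k8s_` container-name prefix and the tab-separated `docker ps` format string are assumptions for a kubeadm cluster on the docker runtime, and `count_crashed` is a helper name of my own.

```shell
# Sketch, assuming docker as the runtime: confirm the daemon is up,
# then count kubelet-managed containers whose status is "Exited".
# Live commands are shown commented out; count_crashed is pure text
# processing over "Name<TAB>Status" lines.
count_crashed() {
  awk -F'\t' '$2 ~ /^Exited/ { n++ } END { print n + 0 }'
}

# systemctl is-active docker    # expect "active"
# docker ps -a --filter name=k8s_ --format '{{.Names}}\t{{.Status}}' | count_crashed
```

A non-zero count with the api-server among the exited containers points at the control plane itself rather than the network.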
3. Check whether expired kubeadm certificates are what is preventing the api-server from starting
[root@master manifests]# openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text |grep Not
Not Before: Apr 16 05:27:41 2023 GMT
Not After : Apr 15 05:27:41 2024 GMT
The certificate had already been expired for a full year. First priority: renew the certificates by pushing out their expiry dates.
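Before renewing anything, it can help to enumerate every certificate under the PKI directory and see exactly which ones are expired. A minimal sketch (the path is the kubeadm default; `check_certs` is a helper name of my own):

```shell
# List each certificate's expiry and flag the expired ones.
# `openssl x509 -checkend 0` exits non-zero once notAfter has passed.
check_certs() {
  local dir=$1
  find "$dir" -name '*.crt' -o -name '*.pem' | while read -r crt; do
    if openssl x509 -checkend 0 -noout -in "$crt" >/dev/null 2>&1; then
      printf '%s: valid until %s\n' "$crt" \
        "$(openssl x509 -enddate -noout -in "$crt" | cut -d= -f2)"
    else
      printf '%s: EXPIRED\n' "$crt"
    fi
  done
}

# usage: check_certs /etc/kubernetes/pki
```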
[root@master ~]# vim update-kubeadm-cert.sh
[root@master ~]# chmod +x update-kubeadm-cert.sh
[root@master ~]# ./update-kubeadm-cert.sh all
[2025-07-10T19:09:10.178353302+0800]: INFO: backup /etc/kubernetes to /etc/kubernetes.old-20250710
Signature ok
subject=/CN=etcd-server
Getting CA Private Key
[2025-07-10T19:09:10.203823693+0800]: INFO: generated /etc/kubernetes/pki/etcd/server.crt
Signature ok
subject=/CN=etcd-peer
Getting CA Private Key
[2025-07-10T19:09:10.233991908+0800]: INFO: generated /etc/kubernetes/pki/etcd/peer.crt
Signature ok
subject=/O=system:masters/CN=kube-etcd-healthcheck-client
Getting CA Private Key
[2025-07-10T19:09:10.252579614+0800]: INFO: generated /etc/kubernetes/pki/etcd/healthcheck-client.crt
Signature ok
subject=/O=system:masters/CN=kube-apiserver-etcd-client
Getting CA Private Key
[2025-07-10T19:09:10.270662717+0800]: INFO: generated /etc/kubernetes/pki/apiserver-etcd-client.crt
[2025-07-10T19:09:10.361241647+0800]: INFO: restarted etcd
Signature ok
subject=/CN=kube-apiserver
Getting CA Private Key
[2025-07-10T19:09:10.387280029+0800]: INFO: generated /etc/kubernetes/pki/apiserver.crt
Signature ok
subject=/O=system:masters/CN=kube-apiserver-kubelet-client
Getting CA Private Key
[2025-07-10T19:09:10.404954699+0800]: INFO: generated /etc/kubernetes/pki/apiserver-kubelet-client.crt
Signature ok
subject=/CN=system:kube-controller-manager
Getting CA Private Key
[2025-07-10T19:09:10.441468594+0800]: INFO: generated /etc/kubernetes/controller-manager.crt
[2025-07-10T19:09:10.446127295+0800]: INFO: generated new /etc/kubernetes/controller-manager.conf
Signature ok
subject=/CN=system:kube-scheduler
Getting CA Private Key
[2025-07-10T19:09:10.478495262+0800]: INFO: generated /etc/kubernetes/scheduler.crt
[2025-07-10T19:09:10.483909670+0800]: INFO: generated new /etc/kubernetes/scheduler.conf
Signature ok
subject=/O=system:masters/CN=kubernetes-admin
Getting CA Private Key
[2025-07-10T19:09:10.514043835+0800]: INFO: generated /etc/kubernetes/admin.crt
[2025-07-10T19:09:10.519546384+0800]: INFO: generated new /etc/kubernetes/admin.conf
[2025-07-10T19:09:10.526208608+0800]: INFO: copy the admin.conf to ~/.kube/config for kubectl
[2025-07-10T19:09:10.528293082+0800]: WARNING: does not need to update kubelet.conf
Signature ok
subject=/CN=front-proxy-client
Getting CA Private Key
[2025-07-10T19:09:10.545998609+0800]: INFO: generated /etc/kubernetes/pki/front-proxy-client.crt
[2025-07-10T19:09:10.617143451+0800]: INFO: restarted kube-apiserver
[2025-07-10T19:09:10.676738046+0800]: INFO: restarted kube-controller-manager
[2025-07-10T19:09:10.742478810+0800]: INFO: restarted kube-scheduler
[2025-07-10T19:09:10.790700987+0800]: INFO: restarted kubelet
[root@master ~]# openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text |grep Not
Not Before: Jul 10 11:09:10 2025 GMT
Not After : Jun 16 11:09:10 2125 GMT
The script content is as follows (quoted from another author's script):
#!/bin/bash
set -o errexit
set -o pipefail
# set -o xtrace
log::err() {
  printf "[$(date +'%Y-%m-%dT%H:%M:%S.%N%z')]: \033[31mERROR: \033[0m$@\n"
}
log::info() {
  printf "[$(date +'%Y-%m-%dT%H:%M:%S.%N%z')]: \033[32mINFO: \033[0m$@\n"
}
log::warning() {
  printf "[$(date +'%Y-%m-%dT%H:%M:%S.%N%z')]: \033[33mWARNING: \033[0m$@\n"
}
check_file() {
  if [[ ! -r ${1} ]]; then
    log::err "can not find ${1}"
    exit 1
  fi
}
# get x509v3 subject alternative name from the old certificate
cert::get_subject_alt_name() {
  local cert=${1}.crt
  check_file "${cert}"
  local alt_name=$(openssl x509 -text -noout -in ${cert} | grep -A1 'Alternative' | tail -n1 | sed 's/[[:space:]]*Address//g')
  printf "${alt_name}\n"
}
# get subject from the old certificate
cert::get_subj() {
  local cert=${1}.crt
  check_file "${cert}"
  local subj=$(openssl x509 -text -noout -in ${cert} | grep "Subject:" | sed 's/Subject:/\//g;s/\,/\//;s/[[:space:]]//g')
  printf "${subj}\n"
}
cert::backup_file() {
  local file=${1}
  if [[ ! -e ${file}.old-$(date +%Y%m%d) ]]; then
    cp -rp ${file} ${file}.old-$(date +%Y%m%d)
    log::info "backup ${file} to ${file}.old-$(date +%Y%m%d)"
  else
    log::warning "does not backup, ${file}.old-$(date +%Y%m%d) already exists"
  fi
}
# generate a client, server, or peer certificate
# Args:
#   $1 (the name of the certificate)
#   $2 (the type of the certificate, must be one of client, server, peer)
#   $3 (the subject of the certificate)
#   $4 (the validity of the certificate, in days)
#   $5 (the x509v3 subject alternative name, used when the type is server or peer)
cert::gen_cert() {
  local cert_name=${1}
  local cert_type=${2}
  local subj=${3}
  local cert_days=${4}
  local alt_name=${5}
  local cert=${cert_name}.crt
  local key=${cert_name}.key
  local csr=${cert_name}.csr
  local csr_conf="distinguished_name = dn\n[dn]\n[v3_ext]\nkeyUsage = critical, digitalSignature, keyEncipherment\n"
  check_file "${key}"
  check_file "${cert}"
  # backup certificate when certificate not in ${kubeconf_arr[@]}
  # kubeconf_arr=("controller-manager.crt" "scheduler.crt" "admin.crt" "kubelet.crt")
  # if [[ ! "${kubeconf_arr[@]}" =~ "${cert##*/}" ]]; then
  #   cert::backup_file "${cert}"
  # fi
  case "${cert_type}" in
    client)
      openssl req -new -key ${key} -subj "${subj}" -reqexts v3_ext \
        -config <(printf "${csr_conf} extendedKeyUsage = clientAuth\n") -out ${csr}
      openssl x509 -in ${csr} -req -CA ${CA_CERT} -CAkey ${CA_KEY} -CAcreateserial -extensions v3_ext \
        -extfile <(printf "${csr_conf} extendedKeyUsage = clientAuth\n") -days ${cert_days} -out ${cert}
      log::info "generated ${cert}"
      ;;
    server)
      openssl req -new -key ${key} -subj "${subj}" -reqexts v3_ext \
        -config <(printf "${csr_conf} extendedKeyUsage = serverAuth\nsubjectAltName = ${alt_name}\n") -out ${csr}
      openssl x509 -in ${csr} -req -CA ${CA_CERT} -CAkey ${CA_KEY} -CAcreateserial -extensions v3_ext \
        -extfile <(printf "${csr_conf} extendedKeyUsage = serverAuth\nsubjectAltName = ${alt_name}\n") -days ${cert_days} -out ${cert}
      log::info "generated ${cert}"
      ;;
    peer)
      openssl req -new -key ${key} -subj "${subj}" -reqexts v3_ext \
        -config <(printf "${csr_conf} extendedKeyUsage = serverAuth, clientAuth\nsubjectAltName = ${alt_name}\n") -out ${csr}
      openssl x509 -in ${csr} -req -CA ${CA_CERT} -CAkey ${CA_KEY} -CAcreateserial -extensions v3_ext \
        -extfile <(printf "${csr_conf} extendedKeyUsage = serverAuth, clientAuth\nsubjectAltName = ${alt_name}\n") -days ${cert_days} -out ${cert}
      log::info "generated ${cert}"
      ;;
    *)
      log::err "unknown or unsupported cert type: ${cert_type}, supported types: client, server, peer"
      exit 1
  esac
  rm -f ${csr}
}
cert::update_kubeconf() {
  local cert_name=${1}
  local kubeconf_file=${cert_name}.conf
  local cert=${cert_name}.crt
  local key=${cert_name}.key
  # generate certificate
  check_file ${kubeconf_file}
  # get the key from the old kubeconf
  grep "client-key-data" ${kubeconf_file} | awk {'print$2'} | base64 -d > ${key}
  # get the old certificate from the old kubeconf
  grep "client-certificate-data" ${kubeconf_file} | awk {'print$2'} | base64 -d > ${cert}
  # get subject from the old certificate
  local subj=$(cert::get_subj ${cert_name})
  cert::gen_cert "${cert_name}" "client" "${subj}" "${CAER_DAYS}"
  # get certificate base64 code
  local cert_base64=$(base64 -w 0 ${cert})
  # backup kubeconf
  # cert::backup_file "${kubeconf_file}"
  # set certificate base64 code to kubeconf
  sed -i 's/client-certificate-data:.*/client-certificate-data: '${cert_base64}'/g' ${kubeconf_file}
  log::info "generated new ${kubeconf_file}"
  rm -f ${cert}
  rm -f ${key}
  # set config for kubectl
  if [[ ${cert_name##*/} == "admin" ]]; then
    mkdir -p ~/.kube
    cp -fp ${kubeconf_file} ~/.kube/config
    log::info "copy the admin.conf to ~/.kube/config for kubectl"
  fi
}
cert::update_etcd_cert() {
  PKI_PATH=${KUBE_PATH}/pki/etcd
  CA_CERT=${PKI_PATH}/ca.crt
  CA_KEY=${PKI_PATH}/ca.key
  check_file "${CA_CERT}"
  check_file "${CA_KEY}"
  # generate etcd server certificate
  # /etc/kubernetes/pki/etcd/server
  CART_NAME=${PKI_PATH}/server
  subject_alt_name=$(cert::get_subject_alt_name ${CART_NAME})
  cert::gen_cert "${CART_NAME}" "peer" "/CN=etcd-server" "${CAER_DAYS}" "${subject_alt_name}"
  # generate etcd peer certificate
  # /etc/kubernetes/pki/etcd/peer
  CART_NAME=${PKI_PATH}/peer
  subject_alt_name=$(cert::get_subject_alt_name ${CART_NAME})
  cert::gen_cert "${CART_NAME}" "peer" "/CN=etcd-peer" "${CAER_DAYS}" "${subject_alt_name}"
  # generate etcd healthcheck-client certificate
  # /etc/kubernetes/pki/etcd/healthcheck-client
  CART_NAME=${PKI_PATH}/healthcheck-client
  cert::gen_cert "${CART_NAME}" "client" "/O=system:masters/CN=kube-etcd-healthcheck-client" "${CAER_DAYS}"
  # generate apiserver-etcd-client certificate
  # /etc/kubernetes/pki/apiserver-etcd-client
  check_file "${CA_CERT}"
  check_file "${CA_KEY}"
  PKI_PATH=${KUBE_PATH}/pki
  CART_NAME=${PKI_PATH}/apiserver-etcd-client
  cert::gen_cert "${CART_NAME}" "client" "/O=system:masters/CN=kube-apiserver-etcd-client" "${CAER_DAYS}"
  # restart etcd
  docker ps | awk '/k8s_etcd/{print$1}' | xargs -r -I '{}' docker restart {} || true
  log::info "restarted etcd"
}
cert::update_master_cert() {
  PKI_PATH=${KUBE_PATH}/pki
  CA_CERT=${PKI_PATH}/ca.crt
  CA_KEY=${PKI_PATH}/ca.key
  check_file "${CA_CERT}"
  check_file "${CA_KEY}"
  # generate apiserver server certificate
  # /etc/kubernetes/pki/apiserver
  CART_NAME=${PKI_PATH}/apiserver
  subject_alt_name=$(cert::get_subject_alt_name ${CART_NAME})
  cert::gen_cert "${CART_NAME}" "server" "/CN=kube-apiserver" "${CAER_DAYS}" "${subject_alt_name}"
  # generate apiserver-kubelet-client certificate
  # /etc/kubernetes/pki/apiserver-kubelet-client
  CART_NAME=${PKI_PATH}/apiserver-kubelet-client
  cert::gen_cert "${CART_NAME}" "client" "/O=system:masters/CN=kube-apiserver-kubelet-client" "${CAER_DAYS}"
  # generate kubeconf for controller-manager, scheduler, kubectl and kubelet
  # /etc/kubernetes/{controller-manager,scheduler,admin,kubelet}.conf
  cert::update_kubeconf "${KUBE_PATH}/controller-manager"
  cert::update_kubeconf "${KUBE_PATH}/scheduler"
  cert::update_kubeconf "${KUBE_PATH}/admin"
  # check kubelet.conf
  # https://github.com/kubernetes/kubeadm/issues/1753
  set +e
  grep kubelet-client-current.pem /etc/kubernetes/kubelet.conf > /dev/null 2>&1
  kubelet_cert_auto_update=$?
  set -e
  if [[ "$kubelet_cert_auto_update" == "0" ]]; then
    log::warning "does not need to update kubelet.conf"
  else
    cert::update_kubeconf "${KUBE_PATH}/kubelet"
  fi
  # generate front-proxy-client certificate
  # use the front-proxy CA
  CA_CERT=${PKI_PATH}/front-proxy-ca.crt
  CA_KEY=${PKI_PATH}/front-proxy-ca.key
  check_file "${CA_CERT}"
  check_file "${CA_KEY}"
  CART_NAME=${PKI_PATH}/front-proxy-client
  cert::gen_cert "${CART_NAME}" "client" "/CN=front-proxy-client" "${CAER_DAYS}"
  # restart apiserver, controller-manager, scheduler and kubelet
  docker ps | awk '/k8s_kube-apiserver/{print$1}' | xargs -r -I '{}' docker restart {} || true
  log::info "restarted kube-apiserver"
  docker ps | awk '/k8s_kube-controller-manager/{print$1}' | xargs -r -I '{}' docker restart {} || true
  log::info "restarted kube-controller-manager"
  docker ps | awk '/k8s_kube-scheduler/{print$1}' | xargs -r -I '{}' docker restart {} || true
  log::info "restarted kube-scheduler"
  systemctl restart kubelet
  log::info "restarted kubelet"
}
main() {
  local node_type=$1
  KUBE_PATH=/etc/kubernetes
  CAER_DAYS=36500
  # backup $KUBE_PATH to $KUBE_PATH.old-$(date +%Y%m%d)
  cert::backup_file "${KUBE_PATH}"
  case ${node_type} in
    etcd)
      # update etcd certificates
      cert::update_etcd_cert
      ;;
    master)
      # update master certificates and kubeconf
      cert::update_master_cert
      ;;
    all)
      # update etcd certificates
      cert::update_etcd_cert
      # update master certificates and kubeconf
      cert::update_master_cert
      ;;
    *)
      log::err "unknown or unsupported node type: $1, supported types: all, etcd, master"
      printf "Documentation: https://github.com/yuyicai/update-kube-cert
example:
'\033[32m./update-kubeadm-cert.sh all\033[0m' update all etcd certificates, master certificates and kubeconf
/etc/kubernetes
├── admin.conf
├── controller-manager.conf
├── scheduler.conf
├── kubelet.conf
└── pki
├── apiserver.crt
├── apiserver-etcd-client.crt
├── apiserver-kubelet-client.crt
├── front-proxy-client.crt
└── etcd
├── healthcheck-client.crt
├── peer.crt
└── server.crt
'\033[32m./update-kubeadm-cert.sh etcd\033[0m' update only etcd certificates
/etc/kubernetes
└── pki
├── apiserver-etcd-client.crt
└── etcd
├── healthcheck-client.crt
├── peer.crt
└── server.crt
'\033[32m./update-kubeadm-cert.sh master\033[0m' update only master certificates and kubeconf
/etc/kubernetes
├── admin.conf
├── controller-manager.conf
├── scheduler.conf
├── kubelet.conf
└── pki
├── apiserver.crt
├── apiserver-kubelet-client.crt
└── front-proxy-client.crt
"
      exit 1
  esac
}
main "$@"
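For reference, kubeadm also ships built-in renewal subcommands covering the same control-plane certificates as the script above (`kubeadm certs ...` on v1.20+, `kubeadm alpha certs ...` on older releases); after renewal the control-plane containers still have to be restarted by hand. A hedged sketch for the docker runtime, where `pick_control_plane` is a helper name of my own:

```shell
# Built-in renewal path (run on the control-plane node):
# kubeadm certs check-expiration
# kubeadm certs renew all

# Extract the control-plane container IDs from `docker ps` output so
# the renewed certificates get picked up on restart.
pick_control_plane() {
  awk '/k8s_kube-(apiserver|controller-manager|scheduler)|k8s_etcd/ { print $1 }'
}

# usage:
# docker ps | pick_control_plane | xargs -r docker restart
# systemctl restart kubelet
```

Note that `kubeadm certs renew` does not touch the kubelet client certificate, which is exactly the problem step 4 below runs into.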
4. Running kubectl get again surfaced a new error
Error message: couldn't get current server API group list: the server has asked for the client to provide credentials
Error from the kubelet log: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
The kubelet service was stuck in an auto-restart loop, breaking communication with the API server, and its log complained that the bootstrap kubeconfig did not exist. While here, also check the kubelet certificate's expiry:
[root@node1 pki]# openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate
notAfter=Apr 15 05:29:49 2024 GMT
That settles it: the expired kubelet client certificate is the cause.
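A small helper makes this expiry check reusable across nodes (a sketch; `days_left` is my own name, and GNU `date -d` is assumed for parsing the `notAfter` string):

```shell
# Days until a certificate's notAfter; integer division truncates
# toward zero, so a result <= 0 means expired or expiring today.
days_left() {
  local end end_s
  end=$(openssl x509 -enddate -noout -in "$1" | cut -d= -f2)
  end_s=$(date -d "$end" +%s)
  echo $(( (end_s - $(date +%s)) / 86400 ))
}

# usage: days_left /var/lib/kubelet/pki/kubelet-client-current.pem
```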
Replacing the certificate:
1. Generate a new kubeconfig on the master node
[root@master ~]# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.1", GitCommit:"4c9411232e10168d7b050c49a1b59f6df9d7ea4b", GitTreeState:"clean", BuildDate:"2023-04-14T13:21:19Z", GoVersion:"go1.20.3", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
error: You must be logged in to the server (the server has asked for the client to provide credentials)
[root@master ~]# mkdir test
[root@master ~]# kubeadm init --kubernetes-version=v1.27.1 phase kubeconfig kubelet --node-name node1 --kubeconfig-dir ./test/
[kubeconfig] Writing "kubelet.conf" kubeconfig file
Do this on the master and the worker nodes alike (check the certificate dates on each node first and replace whatever has expired). Note that the identity in kubelet.conf is tied to --node-name (system:node:<node-name>), so strictly each node should get a kubeconfig generated with its own name.
[root@master ~]# scp test/kubelet.conf node1:/etc/kubernetes/
The authenticity of host 'node1 (192.168.121.142)' can't be established.
ECDSA key fingerprint is SHA256:RW+UCKRCkn+EBytxKm4Y+PA7z8YXjd7EMlUifsvljhI.
ECDSA key fingerprint is MD5:6f:97:64:8b:4c:5a:b0:33:1a:4e:95:ef:e3:a1:75:61.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node1,192.168.121.142' (ECDSA) to the list of known hosts.
root@node1's password:
kubelet.conf
[root@master ~]# scp test/kubelet.conf /etc/kubernetes/
[root@master ~]# cd /etc/kubernetes
[root@master kubernetes]# ll
total 36
-rw------- 1 root root 5511 Jul 10 19:09 admin.conf
-rw------- 1 root root 5551 Jul 10 19:09 controller-manager.conf
-rw------- 1 root root 5595 Jul 10 19:27 kubelet.conf
drwxr-xr-x. 2 root root  113 Apr 16  2023 manifests
drwxr-xr-x  3 root root 4096 Jul 10 19:09 pki
-rw------- 1 root root 5499 Jul 10 19:09 scheduler.conf
[root@master kubernetes]# systemctl restart kubelet
5. Fetching namespace resources fails
Error message:
Error from server (Forbidden): namespaces is forbidden: User "system:node:node1" cannot list resource "namespaces" in API group "" at the cluster scope
[root@master kubernetes]# kubectl get ns -A
Error from server (Forbidden): namespaces is forbidden: User "system:node:node1" cannot list resource "namespaces" in API group "" at the cluster scope
[root@master ~]# kubectl get rolebinding,clusterrolebinding -A
Error from server (Forbidden): rolebindings.rbac.authorization.k8s.io is forbidden: User "system:node:node1" cannot list resource "rolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope
Error from server (Forbidden): clusterrolebindings.rbac.authorization.k8s.io is forbidden: User "system:node:node1" cannot list resource "clusterrolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope
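The `system:node:node1` in these errors is the subject of the client certificate kubectl is presenting, i.e. the node credential from the freshly copied kubelet.conf rather than an admin credential, and node-scoped authorization rightly refuses cluster-wide lists for it. A quick way to see which identity a kubeconfig carries (a sketch; `kubeconfig_subject` is my own name and an inline `client-certificate-data` field is assumed):

```shell
# Decode a kubeconfig's embedded client certificate and print its
# X.509 subject -- this is the user the API server authenticates.
kubeconfig_subject() {
  grep 'client-certificate-data' "$1" | awk '{print $2}' \
    | base64 -d | openssl x509 -noout -subject
}

# usage:
# kubeconfig_subject /etc/kubernetes/kubelet.conf   # CN=system:node:<name>
# kubeconfig_subject /etc/kubernetes/admin.conf     # CN=kubernetes-admin
```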
Solution: point kubectl back at the admin credentials:
[root@master kubernetes]# mkdir -p $HOME/.kube
[root@master kubernetes]# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
cp: overwrite '/root/.kube/config'? y
[root@master kubernetes]# export KUBECONFIG=/etc/kubernetes/admin.conf