When an OSD node in the cluster fails to start, throws baffling errors, and simply will not come back up, we need to rebuild that node and rejoin it to the Rook-Ceph cluster.
In my case, the expand-bluefs container of the OSD pod logged the following error:
NCB::__restore_allocator::No Valid allocation info on disk (empty file)
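For reference, a log like this can be pulled from the OSD pod's expand-bluefs init container. A minimal sketch, assuming the failing OSD is osd.1 (the pod name below is hypothetical; list the OSD pods first to get the real one):
# List the OSD pods to find the failing one
kubectl -n rook-ceph get pod -l app=rook-ceph-osd
# Read the expand-bluefs init container log of the failing OSD pod (hypothetical pod name)
kubectl -n rook-ceph logs rook-ceph-osd-1-xxxxxxxxxx-xxxxx -c expand-bluefs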
First, check whether the disk itself is healthy:
# Assume the failed OSD node's disk is nvme4n1
smartctl -a /dev/nvme4n1
# In the output, PASSED means the disk is healthy
...
SMART overall-health self-assessment test result: PASSED
...
- If the disk is healthy, the OSD is probably failing to start because some configuration is missing on its host, or for some other reason; in that case you can rebuild the OSD node and rejoin it to the cluster.
- If the disk is faulty, replace it first and then re-add the OSD node to the cluster; the procedure below still applies. A quick way to confirm which host and device the failed OSD maps to is sketched below.
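A minimal sketch of that check, using standard Ceph commands from the toolbox pod (substitute the failed OSD's id, which is 1 in the example that follows):
# Run inside the rook-ceph-tools pod
ceph osd tree                                        # shows each OSD and the host it belongs to
ceph osd metadata 1 | grep -E '"hostname"|"devices"' # reports the host and backing device, e.g. nvme4n1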
Procedure for rebuilding an OSD node in Rook-Ceph (v1.12):
- Prerequisites
Before rebuilding or replacing an OSD, make sure the cluster has enough free capacity. If it does not, add another OSD node first; otherwise, once the failed OSD is removed, Ceph will automatically rebalance its data onto the remaining OSDs, and a cluster that is short on space will start throwing even more errors. A quick capacity check is sketched below.
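A minimal sketch of that capacity check, run from the toolbox pod with standard Ceph commands (the 85% figure is Ceph's default nearfull ratio, mentioned here only as a rule of thumb):
# Run inside the rook-ceph-tools pod (see the exec command in the Steps below)
ceph df       # overall raw/used/available capacity and per-pool usage
ceph osd df   # per-OSD usage; keep the busiest OSDs well below the nearfull ratio (85% by default)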
- Steps
Assume the failed OSD's id is 1.
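The purge step below applies a manifest named osd-purge-osd1.yaml. One way to produce it, assuming you start from the osd-purge.yaml example shipped with Rook and only need to substitute the <OSD-IDs> placeholder with the failed OSD's id:
# Hypothetical filenames; replace <OSD-IDs> with 1 in the upstream example manifest
sed 's/<OSD-IDs>/1/' osd-purge.yaml > osd-purge-osd1.yaml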
# Stop the operator so it does not keep detecting the node and restarting the failed OSD
kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=0
# Scale down the osd.1 pod
kubectl -n rook-ceph scale deployment rook-ceph-osd-1 --replicas=0
# Delete the osd.1 deployment
kubectl delete deployment -n rook-ceph rook-ceph-osd-1
# Run the purge job; remember to change <OSD-IDs> in osd-purge.yaml to 1 (the failed OSD)
kubectl apply -f osd-purge-osd1.yaml
# While the job runs, take the failed OSD (id 1) out of the cluster manually from the toolbox
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash
ceph osd out osd.1
ceph osd purge 1 --yes-i-really-mean-it
# On the failed OSD's host, clean up the Ceph configuration and wipe the OSD disk
# so that it becomes a fresh raw device again (example disk: nvme4n1)
ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %
wipefs --all /dev/nvme4n1
mkfs.xfs /dev/nvme4n1 -f
sgdisk --zap-all /dev/nvme4n1
rm -rf /var/lib/rook /dev/ceph-* /dev/mapper/ceph--*
# Start the operator again so it rediscovers the osd.1 node; be patient
# (roughly 5-10 minutes, depending on the configured node polling interval)
kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=1
# If some rook-ceph pods misbehave during the rebuild (e.g. the mon on the rebuilt node fails to start),
# just delete them and let them be recreated
# Eventually all pods are healthy again:
NAME                                                     READY   STATUS      RESTARTS           AGE
csi-cephfsplugin-27nsq                                   2/2     Running     0                  82m
csi-cephfsplugin-fncj6                                   2/2     Running     0                  82m
csi-cephfsplugin-mgwsj                                   2/2     Running     0                  82m
csi-cephfsplugin-provisioner-7df6fdbbcf-6jpw8            5/5     Running     0                  82m
csi-cephfsplugin-provisioner-7df6fdbbcf-8xjsq            5/5     Running     0                  82m
csi-rbdplugin-b969d                                      2/2     Running     0                  82m
csi-rbdplugin-provisioner-7d6b6899cb-wtvxw               5/5     Running     0                  82m
csi-rbdplugin-provisioner-7d6b6899cb-zcxcm               5/5     Running     0                  82m
csi-rbdplugin-t6cn9                                      2/2     Running     0                  82m
csi-rbdplugin-w5fvh                                      2/2     Running     0                  79m
rook-ceph-crashcollector-hd1.dev.local-74cbdc8dfb-2zkj9  1/1     Running     0                  97d
rook-ceph-crashcollector-hd2.dev.local-7c9bcf9d69-l8q5s  1/1     Running     1 (28d ago)        70d
rook-ceph-crashcollector-hd3.dev.local-598f8f78f8-jqr2z  1/1     Running     0                  153m
rook-ceph-mds-myfs-a-bbcf98dd5-tnvlx                     2/2     Running     0                  153m
rook-ceph-mds-myfs-b-59c446df67-7vtlm                    2/2     Running     4 (160d ago)       160d
rook-ceph-mgr-a-88b95766b-tb4vz                          3/3     Running     4 (160d ago)       160d
rook-ceph-mgr-b-75c7f6bfb5-l8sjz                         3/3     Running     0                  153m
rook-ceph-mon-d-6c8b886c-tpzfb                           2/2     Running     0                  160d
rook-ceph-mon-g-5fddb59f46-msd2h                         2/2     Running     0                  34m
rook-ceph-mon-i-847cdb7bbf-4vsvs                         2/2     Running     0                  113m
rook-ceph-operator-75f86557fd-k84mt                      1/1     Running     0                  37m
rook-ceph-osd-0-6fd89755f4-wj5mq                         2/2     Running     3 (133m ago)       160d
rook-ceph-osd-1-6d6bfdc9b7-p8rvf                         2/2     Running     0                  32m
rook-ceph-osd-2-576dd7f99f-l22mc                         2/2     Running     5705 (3h47m ago)   161d
rook-ceph-osd-prepare-hd1.dev.local-4v566                0/1     Completed   0                  32m
rook-ceph-osd-prepare-hd2.dev.local-7t8w8                0/1     Completed   0                  32m
rook-ceph-osd-prepare-hd3.dev.local-bq7ks                0/1     Completed   0                  32m
rook-ceph-purge-osd-b4j7l                                0/1     Completed   0                  52m
rook-ceph-rgw-my-store-a-69b5c788dd-gml4g                2/2     Running     18149 (3h4m ago)   70d
rook-ceph-tools-5d89d79b77-drsr5                         1/1     Running     0                  70d
# Once the pods are healthy, Rook-Ceph treats this as a newly added OSD node and automatically
# rebalances data onto it. Be patient: depending on the amount of data and disk performance this can
# take many hours. A few commands for watching the rebalance are sketched below.
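A minimal sketch of watching that rebalance from the toolbox pod, using standard Ceph commands; the cluster is settled once health returns to HEALTH_OK and all PGs report active+clean:
# Run inside the rook-ceph-tools pod
ceph -s       # shows recovery/backfill progress and overall health
ceph osd df   # the rebuilt osd.1 gradually fills up as data moves onto it
ceph -w       # streams cluster events; watch until HEALTH_OK and all PGs are active+clean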
- Notes
Commonly used basic Ceph commands:
ceph status
ceph osd tree
ceph osd status
ceph osd df
ceph osd utilization
Reference:
https://rook.io/docs/rook/v1.12/Storage-Configuration/Advanced/ceph-osd-mgmt/