Recovering a Ceph Cluster That Has Run Out of Disk Space

Problem Description

In an OpenStack + Ceph cluster, a large amount of new data copied into the virtual machines rapidly consumed the cluster's disks. With no free space left, the virtual machines became inoperable and the Ceph cluster rejected every operation.

Symptoms

  • Restarting the virtual machines through OpenStack had no effect.
  • Deleting the block image directly with the rbd command also failed:
[root@controller ~]# rbd -p volumes rm volume-c55fd052-212d-4107-a2ac-cf53bfc049be
2015-04-29 05:31:31.719478 7f5fb82f7760  0 client.4781741.objecter  FULL, paused modify 0xe9a9e0 tid 6
  • Checking the Ceph health status:
cluster 059f27e8-a23f-4587-9033-3e3679d03b31
 health HEALTH_ERR 20 pgs backfill_toofull; 20 pgs degraded; 20 pgs stuck unclean; recovery 7482/129081 objects degraded (5.796%); 2 full osd(s); 1 near full osd(s)
 monmap e6: 4 mons at {node-5e40.cloud.com=10.10.20.40:6789/0,node-6670.cloud.com=10.10.20.31:6789/0,node-66c4.cloud.com=10.10.20.36:6789/0,node-fb27.cloud.com=10.10.20.41:6789/0}, election epoch 886, quorum 0,1,2,3 node-6670.cloud.com,node-66c4.cloud.com,node-5e40.cloud.com,node-fb27.cloud.com
 osdmap e2743: 3 osds: 3 up, 3 in
        flags full
  pgmap v6564199: 320 pgs, 4 pools, 262 GB data, 43027 objects
        786 GB used, 47785 MB / 833 GB avail
        7482/129081 objects degraded (5.796%)
             300 active+clean
              20 active+degraded+remapped+backfill_toofull
HEALTH_ERR 20 pgs backfill_toofull; 20 pgs degraded; 20 pgs stuck unclean; recovery 7482/129081 objects degraded (5.796%); 2 full osd(s); 1 near full osd(s)
pg 3.8 is stuck unclean for 7067109.597691, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.7d is stuck unclean for 1852078.505139, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.21 is stuck unclean for 7072842.637848, current state active+degraded+remapped+backfill_toofull, last acting [0,2]
pg 3.22 is stuck unclean for 7070880.213397, current state active+degraded+remapped+backfill_toofull, last acting [0,2]
pg 3.a is stuck unclean for 7067057.863562, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.7f is stuck unclean for 7067122.493746, current state active+degraded+remapped+backfill_toofull, last acting [0,2]
pg 3.5 is stuck unclean for 7067088.369629, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.1e is stuck unclean for 7073386.246281, current state active+degraded+remapped+backfill_toofull, last acting [0,2]
pg 3.19 is stuck unclean for 7068035.310269, current state active+degraded+remapped+backfill_toofull, last acting [0,2]
pg 3.5d is stuck unclean for 1852078.505949, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.1a is stuck unclean for 7067088.429544, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.1b is stuck unclean for 7072773.771385, current state active+degraded+remapped+backfill_toofull, last acting [0,2]
pg 3.3 is stuck unclean for 7067057.864514, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.15 is stuck unclean for 7067088.825483, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.11 is stuck unclean for 7067057.862408, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.6d is stuck unclean for 7067083.634454, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.6e is stuck unclean for 7067098.452576, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.c is stuck unclean for 5658116.678331, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.e is stuck unclean for 7067078.646953, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.20 is stuck unclean for 7067140.530849, current state active+degraded+remapped+backfill_toofull, last acting [0,2]
pg 3.7d is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.7f is active+degraded+remapped+backfill_toofull, acting [0,2]
pg 3.6d is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.6e is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.5d is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.20 is active+degraded+remapped+backfill_toofull, acting [0,2]
pg 3.21 is active+degraded+remapped+backfill_toofull, acting [0,2]
pg 3.22 is active+degraded+remapped+backfill_toofull, acting [0,2]
pg 3.1e is active+degraded+remapped+backfill_toofull, acting [0,2]
pg 3.19 is active+degraded+remapped+backfill_toofull, acting [0,2]
pg 3.1a is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.1b is active+degraded+remapped+backfill_toofull, acting [0,2]
pg 3.15 is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.11 is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.c is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.e is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.8 is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.a is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.5 is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.3 is active+degraded+remapped+backfill_toofull, acting [2,0]
recovery 7482/129081 objects degraded (5.796%)
osd.0 is full at 95%
osd.2 is full at 95%
osd.1 is near full at 93%

Solution 1 (Verified)

Add OSD nodes. This is also the approach recommended in the official documentation. After the new node joined, Ceph began rebalancing data and the space used on the existing OSDs started to drop:

2015-04-29 06:51:58.623262 osd.1 [WRN] OSD near full (91%)
2015-04-29 06:52:01.500813 osd.2 [WRN] OSD near full (92%)
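In outline, the verified fix is to bring up a new OSD and let the cluster rebalance onto it. The exact commands depend on how the cluster was deployed; with ceph-deploy, common at the time of writing, the procedure is roughly the following sketch (the hostname `node-new` and device `/dev/sdb` are placeholders, not from the original incident):

```shell
# Prepare and activate a new OSD on the added node
ceph-deploy osd prepare node-new:/dev/sdb
ceph-deploy osd activate node-new:/dev/sdb1

# Confirm the new OSD joined the CRUSH map, then watch the
# rebalance drain the full OSDs
ceph osd tree
ceph -w
```

Once backfill starts, the `OSD near full` warnings shown above gradually replace the `full` errors, and the stuck `backfill_toofull` PGs can complete their recovery.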

Solution 2 (Theoretical, Not Verified)

With no spare disks available, the only remaining option is to change the thresholds. In this state Ceph refuses all reads and writes, so none of the usual Ceph commands work. The idea is to relax Ceph's definition of "full": the log above shows the full ratio is 95%, so we temporarily raise that ratio, then delete data as quickly as possible so that actual usage drops back below it.

  • Setting the value directly with the command below failed: the cluster did not start resynchronizing data, presumably because the monitor service itself still needs a restart.
ceph mon tell \* injectargs '--mon-osd-full-ratio 0.98'
  • Alternatively, modify the configuration file and then restart the monitor services. We did not dare try this at the time for fear of making things worse, but it was later confirmed on the mailing list that the method should not affect data, provided that no virtual machine writes anything to Ceph during the recovery.

By default the full ratio is 95% and the near-full ratio is 85%; adjust them to fit your actual situation:

[global]
    mon osd full ratio = .98
    mon osd nearfull ratio = .80
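Putting the pieces together, the theoretical recovery sequence would look roughly like this (untested, as noted above; the image name is a placeholder and the restart command depends on your init system):

```shell
# 1. Relax the full threshold so the cluster accepts operations again
ceph mon tell \* injectargs '--mon-osd-full-ratio 0.98'
# If injectargs does not take effect, set the ratios in ceph.conf as
# shown above and restart the monitor services, e.g.:
/etc/init.d/ceph restart mon      # SysV-style; adapt to your distribution

# 2. Free space as fast as possible, e.g. by deleting expendable RBD images
rbd -p volumes rm <volume-name>   # placeholder image name

# 3. Once usage is safely below 95% again, restore the default threshold
ceph mon tell \* injectargs '--mon-osd-full-ratio 0.95'
```

The window between steps 1 and 3 is dangerous: at 98% an OSD can still fill completely, so deletion has to happen before any new writes land.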

Analysis and Summary

Root Cause

According to the official Ceph documentation, once any OSD reaches the full ratio (95%), the cluster stops accepting read and write requests from Ceph clients. This is why the virtual machines could not come back up after a reboot.
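To confirm which thresholds a running monitor is actually enforcing, its admin socket can be queried; a sketch (the monitor id `a` is a placeholder):

```shell
# Dump the monitor's running configuration and filter the two ratios
ceph daemon mon.a config show | grep -E 'mon_osd_(near)?full_ratio'
```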

Resolution

The official documentation clearly favors adding new OSDs. Temporarily raising the full ratio is a workaround, but not a recommended one: it relies on manually deleting data, and if another node fills up or fails, the cluster will hit the threshold again. The real fix is to expand capacity.

Reflections

Two points from this incident are worth thinking about:

  • Monitoring: a DNS misconfiguration made during server setup prevented the monitoring emails from going out, so the Ceph WARN notifications were never received.
  • The cloud platform itself: because of how Ceph allocates space, storage in an OpenStack platform is usually overcommitted. From the user's point of view there is nothing wrong with copying in large amounts of data, but the platform had no corresponding early-warning mechanism, which is what allowed this failure to happen.
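Even a trivial capacity check would have caught this long before HEALTH_ERR. The sketch below is hypothetical (not from the original incident): it hard-codes the default 85%/95% thresholds discussed above and takes used and total capacity as plain numbers in the same unit.

```shell
#!/bin/sh
# Classify cluster usage against the default nearfull/full thresholds.
# Arguments: used capacity, total capacity (same unit, e.g. GB).
check_usage() {
    used=$1; total=$2
    # Integer percentage, truncated
    pct=$(awk -v u="$used" -v t="$total" 'BEGIN { printf "%d", u / t * 100 }')
    if [ "$pct" -ge 95 ]; then
        echo "FULL ($pct%)"
    elif [ "$pct" -ge 85 ]; then
        echo "NEARFULL ($pct%)"
    else
        echo "OK ($pct%)"
    fi
}

# Example with the figures from the pgmap output above: 786 GB used of 833 GB
check_usage 786 833   # prints "NEARFULL (94%)"
```

In practice the numbers would come from parsing `ceph df`, and anything above OK should trigger an alert through a channel that does not depend on the cluster's own DNS or mail setup.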
