Ceph (Luminous) Failure Repair Log

This post records the repair of a failure on a Ceph Luminous cluster: starting the Ceph OSD directly, repairing the filesystem, handling the errors encountered along the way, and forcing an XFS repair and rebuilding the journal data.


Background

A power loss on the storage node left the OSDs unable to start after boot.
ceph osd tree shows where the failed OSDs are.
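
To list only the problem OSDs, the tree output can be filtered (a minimal sketch; the grep pattern is an assumption and depends on the column layout of your release):

[root@ns-ceph-208191 ~]# ceph osd tree | grep -w down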

Current architecture

2 SATA disks in RAID 1 as the system disk
10 SATA disks, each configured as an independent RAID 0 and used as a separate Ceph OSD data disk
2 SSDs, each split into 5 partitions; every partition serves one Ceph OSD as a raw journal device (note: the partitions carry no filesystem)
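
On Luminous, ceph-disk list prints the data-disk/journal-partition mapping described above, which is a quick way to confirm which SSD partition backs which OSD (shown here only as a verification aid; output omitted):

[root@ns-ceph-208191 ~]# ceph-disk list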

Repair plan

1. Try to start the ceph osd
2. If it will not start, repair the filesystem (XFS)
3. If the OSD still will not start after the filesystem repair, rebuild the journal data (see the sketch at the end of this post)

Procedure

Starting the ceph osd directly

[root@ns-ceph-208191 ~]# systemctl status ceph-osd@104
● ceph-osd@104.service - Ceph object storage daemon osd.104
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; disabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: core-dump) since Fri 2018-01-26 22:58:56 CST; 1s ago
  Process: 13464 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=dumped, signal=ABRT)
  Process: 13458 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 13464 (code=dumped, signal=ABRT)

Jan 26 22:58:56 ns-ceph-208191.vclound.com systemd[1]: Unit ceph-osd@104.service entered failed state.
Jan 26 22:58:56 ns-ceph-208191.vclound.com systemd[1]: ceph-osd@104.service failed.

Summary

The OSD stays in activating state, per the line below:
    Active: activating (auto-restart) (Result: core-dump) since Fri 2018-01-26 22:58:56 CST; 1s ago  
systemd keeps restarting the ceph-osd process, but every attempt aborts with a core dump, so the OSD never comes up.
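
To see why each attempt aborts, the unit's log can be pulled from the systemd journal (standard systemd tooling, nothing cluster-specific; the line count is arbitrary):

[root@ns-ceph-208191 ~]# journalctl -u ceph-osd@104 --no-pager -n 50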

Repairing the filesystem

Error handling

[root@ns-ceph-208191 ~]# umount /dev/sdh1
[root@ns-ceph-208191 ~]# xfs_repair  /dev/sdh1
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed.  Mount the filesystem to replay the log, and unmount it before re-running xfs_repair.  If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

xfs_repair refuses to run because the XFS log still holds unreplayed metadata changes; to proceed, the -L option is needed to zero the log and force the repair.
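
As the message advises, it is worth attempting a mount first so the log can replay, and only falling back to -L if that fails. A minimal check, assuming the usual /var/lib/ceph/osd/<cluster>-<id> mount point (the exact path here is an assumption):

[root@ns-ceph-208191 ~]# mount /dev/sdh1 /var/lib/ceph/osd/ceph-104 && umount /var/lib/ceph/osd/ceph-104

If the mount fails as well, proceed with the forced repair below.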

[root@ns-ceph-208191 ~]# xfs_repair -L /dev/sdh1
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
block (20,3439872-3439872) multiply claimed by cnt space tree, state - 2
block (30,3786168-3786168) multiply claimed by cnt space tree, state - 2
agf_freeblks 29573933, counted 29572605 in ag 29
block (30,3866568-3866568) multiply claimed by cnt space tree, state - 2
agf_freeblks 28607146, counted 28612138 in ag 20
agf_freeblks 32093411, counted 32091874 in ag 23
agf_freeblks 25081233, counted 25081039 in ag 10
agf_freeblks 24764670, counted 24764542 in ag 6
block (30,30832461-30832461) multiply claimed by cnt space tree, state - 2
agf_freeblks 26690466, counted 26697464 in ag 30
agf_longest 14937600, counted 14945600 in ag 30
sb_icount 576256, counted 576768
sb_ifree 1085, counted 1269
sb_fdblocks 915177700, counted 914722481
        - 22:39:16: scanning filesystem freespace - 32 of 32 allocation groups done
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - 22:39:16: scanning agi unl
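
If the OSD still refuses to start once xfs_repair completes, the last step of the repair plan is to rebuild the journal. For a filestore OSD whose journal sits on a raw partition, the journal can be recreated in place; a minimal sketch, assuming osd.104 and that the journal symlink inside the OSD data directory still points at the correct SSD partition:

[root@ns-ceph-208191 ~]# systemctl stop ceph-osd@104
[root@ns-ceph-208191 ~]# ceph-osd -i 104 --mkjournal --setuser ceph --setgroup ceph
[root@ns-ceph-208191 ~]# systemctl start ceph-osd@104

--mkjournal writes a fresh journal header; any transactions that existed only in the lost journal are discarded, so expect the cluster to recover/backfill this OSD afterwards.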