Background:
A storage power failure left the host unable to bring one of its OSDs back up after reboot.
The location of the failed OSD can be found with ceph osd tree.
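A trimmed sketch of what the failure looks like in the tree; only the host name and osd.104 come from this incident, the weights, bucket ID and neighbouring OSD are placeholders:

# weights and IDs below are illustrative -- only host ns-ceph-208191 and osd.104 are from this case
[root@ns-ceph-208191 ~]# ceph osd tree | grep -A 3 'host ns-ceph-208191'
-12       72.80000     host ns-ceph-208191
104        7.28000         osd.104            down        0          1.00000
105        7.28000         osd.105              up  1.00000          1.00000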
Current architecture
2 SATA disks in a RAID1 set, used as the system disk
10 SATA disks, each configured as its own single-disk RAID0 and used as an independent Ceph OSD data disk
2 SSD disks, each split into 5 partitions; every partition is assigned to one Ceph OSD as its raw journal device (note: these partitions carry no filesystem; the OSD-to-partition mapping can be checked as sketched below)
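With FileStore and a raw journal partition, the OSD data directory keeps a journal symlink pointing at the SSD partition. A minimal sketch for confirming which partition backs which OSD; the data path is the default FileStore layout and the partuuid target is a placeholder:

# paths are the default FileStore layout; the partuuid is a placeholder
[root@ns-ceph-208191 ~]# ls -l /var/lib/ceph/osd/ceph-104/journal
lrwxrwxrwx 1 ceph ceph 58 ... /var/lib/ceph/osd/ceph-104/journal -> /dev/disk/by-partuuid/<journal-partuuid>
[root@ns-ceph-208191 ~]# ceph-disk list    # lists each data partition together with its journal partition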
Repair plan
1. Try to start the Ceph OSD
2. If it will not start, try to repair the OSD's filesystem (XFS)
3. If it still will not start after the filesystem repair, try to repair the journal data (a sketch of this step follows the list)
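A sketch of step 3, in case the filesystem repair alone is not enough. It assumes a FileStore OSD with its journal on a raw partition; recreating the journal discards whatever was still queued in it, so flush first and only fall back to --mkjournal if the flush itself fails:

# try to flush any replayable entries from the journal into the filestore
[root@ns-ceph-208191 ~]# ceph-osd -i 104 --flush-journal
# if the journal is too damaged to flush, recreate an empty one on the existing partition
[root@ns-ceph-208191 ~]# ceph-osd -i 104 --mkjournal
[root@ns-ceph-208191 ~]# systemctl start ceph-osd@104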
Notes
Starting the Ceph OSD directly
[root@ns-ceph-208191 ~]# systemctl status ceph-osd@104
● ceph-osd@104.service - Ceph object storage daemon osd.104
Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; disabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: core-dump) since Fri 2018-01-26 22:58:56 CST; 1s ago
Process: 13464 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=dumped, signal=ABRT)
Process: 13458 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
Main PID: 13464 (code=dumped, signal=ABRT)
Jan 26 22:58:56 ns-ceph-208191.vclound.com systemd[1]: Unit ceph-osd@104.service entered failed state.
Jan 26 22:58:56 ns-ceph-208191.vclound.com systemd[1]: ceph-osd@104.service failed.
Summary
The OSD is stuck in an activating loop, as the status line shows:
Active: activating (auto-restart) (Result: core-dump) since Fri 2018-01-26 22:58:56 CST; 1s ago
The ceph-osd process keeps being restarted and retries the startup, but never comes up cleanly.
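systemctl only reports that the daemon aborted; the assert or backtrace that explains the crash loop is in the systemd journal for the unit or in the OSD log. The log path below is the default for a cluster named ceph:

[root@ns-ceph-208191 ~]# journalctl -u ceph-osd@104 --no-pager -n 50
[root@ns-ceph-208191 ~]# tail -n 100 /var/log/ceph/ceph-osd.104.log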
Repairing the filesystem
Handling the error
[root@ns-ceph-208191 ~]# umount /dev/sdh1
[root@ns-ceph-208191 ~]# xfs_repair /dev/sdh1
Phase 1 - find and verify superblock...
- reporting progress in intervals of 15 minutes
Phase 2 - using internal log
- zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
As the output above shows, xfs_repair refuses to repair the filesystem directly while the log is dirty, so the -L option has to be used to zero the log and force the repair.
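Note that the error message itself recommends a mount/umount cycle first so the log can be replayed rather than destroyed, with -L as the fallback when that mount fails. A sketch, assuming /dev/sdh1 backs osd.104 and mounts at the default FileStore path:

[root@ns-ceph-208191 ~]# mount /dev/sdh1 /var/lib/ceph/osd/ceph-104 && umount /var/lib/ceph/osd/ceph-104
[root@ns-ceph-208191 ~]# xfs_repair /dev/sdh1    # only reach for -L if the mount above fails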
[root@ns-ceph-208191 ~]# xfs_repair -L /dev/sdh1
Phase 1 - find and verify superblock...
- reporting progress in intervals of 15 minutes
Phase 2 - using internal log
- zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
- scan filesystem freespace and inode maps...
block (20,3439872-3439872) multiply claimed by cnt space tree, state - 2
block (30,3786168-3786168) multiply claimed by cnt space tree, state - 2
agf_freeblks 29573933, counted 29572605 in ag 29
block (30,3866568-3866568) multiply claimed by cnt space tree, state - 2
agf_freeblks 28607146, counted 28612138 in ag 20
agf_freeblks 32093411, counted 32091874 in ag 23
agf_freeblks 25081233, counted 25081039 in ag 10
agf_freeblks 24764670, counted 24764542 in ag 6
block (30,30832461-30832461) multiply claimed by cnt space tree, state - 2
agf_freeblks 26690466, counted 26697464 in ag 30
agf_longest 14937600, counted 14945600 in ag 30
sb_icount 576256, counted 576768
sb_ifree 1085, counted 1269
sb_fdblocks 915177700, counted 914722481
- 22:39:16: scanning filesystem freespace - 32 of 32 allocation groups done
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- 22:39:16: scanning agi unl