Background:
After a storage power loss, the OSD fails to start once the host boots.
The failed OSD can be located with ceph osd tree.
Current architecture
2 SATA disks in RAID 1, used as the system disk
10 SATA disks, each configured as its own single-disk RAID 0 and used as an independent Ceph OSD data disk
2 SSD disks, each split into 5 partitions; every partition serves one Ceph OSD as its raw journal device (note: these partitions carry no filesystem). A quick way to confirm the OSD-to-journal mapping is sketched below.
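A minimal sketch for confirming which data partition and which raw journal partition belong to a given OSD, assuming the default FileStore layout under /var/lib/ceph/osd/ (osd.104 is the OSD repaired below; substitute your own id):

lsblk                                      # disks, RAID devices and partitions
df -h /var/lib/ceph/osd/ceph-104           # data partition backing osd.104
ls -l /var/lib/ceph/osd/ceph-104/journal   # symlink to the raw journal partition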
Repair approach
1. Try to start the Ceph OSD.
2. If it will not start, try to repair the OSD's filesystem (XFS).
3. If it still will not start after the filesystem repair, try to rebuild the journal data. A condensed sketch of the whole flow follows this list.
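A condensed sketch of the recovery flow, using the names from the session below (osd.104 on /dev/sdh1, journal on /dev/sda5); treat it as an outline rather than a script to run blindly:

systemctl start ceph-osd@104                # step 1: just try the daemon
umount /dev/sdh1                            # step 2: repair the XFS data partition offline
xfs_repair /dev/sdh1                        #         try a plain repair first
xfs_repair -L /dev/sdh1                     #         last resort: zeroes the log, may lose data
mount /dev/sdh1                             # step 3: remount, wipe and recreate the journal
dd if=/dev/zero of=/dev/sda5 bs=1M count=30
ceph-osd -i 104 --mkjournal
systemctl start ceph-osd@104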
Session log
Starting the Ceph OSD directly
[root@ns-ceph-208191 ~]# systemctl status ceph-osd@104
● ceph-osd@104.service - Ceph object storage daemon osd.104
Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; disabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: core-dump) since Fri 2018-01-26 22:58:56 CST; 1s ago
Process: 13464 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=dumped, signal=ABRT)
Process: 13458 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
Main PID: 13464 (code=dumped, signal=ABRT)
Jan 26 22:58:56 ns-ceph-208191.vclound.com systemd[1]: Unit ceph-osd@104.service entered failed state.
Jan 26 22:58:56 ns-ceph-208191.vclound.com systemd[1]: ceph-osd@104.service failed.
Summary
The OSD is stuck in the activating state, as shown by this line of the status output:
Active: activating (auto-restart) (Result: core-dump) since Fri 2018-01-26 22:58:56 CST; 1s ago
The ceph-osd process keeps crashing and being restarted, but never comes up successfully.
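Before repairing anything, it is worth checking why the daemon dumps core; a minimal sketch, assuming the default OSD log location:

journalctl -u ceph-osd@104 --since "10 minutes ago"   # systemd's view of the restart loop
tail -n 100 /var/log/ceph/ceph-osd.104.log            # usually contains the assert/abort backtrace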
Repairing the filesystem
Handling the repair error
[root@ns-ceph-208191 ~]# umount /dev/sdh1
[root@ns-ceph-208191 ~]# xfs_repair /dev/sdh1
Phase 1 - find and verify superblock...
- reporting progress in intervals of 15 minutes
Phase 2 - using internal log
- zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
The output above shows that xfs_repair refuses to run while the filesystem has a dirty log. The safest first step is to mount and unmount the filesystem so the log gets replayed (a sketch follows); if that fails, the -L option forces the log to be zeroed, which is what was done here.
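A minimal attempt at replaying the log by mounting, assuming /dev/sdh1 has an fstab entry for its OSD mount point (it is mounted this way later in this note); only fall back to -L if the mount itself fails:

mount /dev/sdh1 && umount /dev/sdh1   # a clean mount/umount replays the XFS log
xfs_repair /dev/sdh1                  # retry the plain repair once the log is replayed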
[root@ns-ceph-208191 ~]# xfs_repair -L /dev/sdh1
Phase 1 - find and verify superblock...
- reporting progress in intervals of 15 minutes
Phase 2 - using internal log
- zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
- scan filesystem freespace and inode maps...
block (20,3439872-3439872) multiply claimed by cnt space tree, state - 2
block (30,3786168-3786168) multiply claimed by cnt space tree, state - 2
agf_freeblks 29573933, counted 29572605 in ag 29
block (30,3866568-3866568) multiply claimed by cnt space tree, state - 2
agf_freeblks 28607146, counted 28612138 in ag 20
agf_freeblks 32093411, counted 32091874 in ag 23
agf_freeblks 25081233, counted 25081039 in ag 10
agf_freeblks 24764670, counted 24764542 in ag 6
block (30,30832461-30832461) multiply claimed by cnt space tree, state - 2
agf_freeblks 26690466, counted 26697464 in ag 30
agf_longest 14937600, counted 14945600 in ag 30
sb_icount 576256, counted 576768
sb_ifree 1085, counted 1269
sb_fdblocks 915177700, counted 914722481
- 22:39:16: scanning filesystem freespace - 32 of 32 allocation groups done
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- 22:39:16: scanning agi unlinked lists - 32 of 32 allocation groups done
- process known inodes and perform inode discovery...
- agno = 0
- agno = 15
- agno = 30
- agno = 16
- agno = 1
- agno = 17
data fork in ino 8068204063 claims free block 2044093696
data fork in ino 8068204064 claims free block 2044095680
data fork in ino 8068204077 claims free block 2017052059
- agno = 31
data fork in ino 4577227304 claims free block 1166820864
data fork in ino 4577227304 claims free block 1166820865
data fork in ino 4577227307 claims free block 1166825856
data fork in ino 4577227307 claims free block 1166825857
data fork in ino 4577227309 claims free block 1144323341
data fork in ino 4577227309 claims free block 1144323342
data fork in ino 4577227311 claims free block 1144331906
data fork in ino 4577227311 claims free block 1144331907
imap claims a free inode 4577227312 is in use, correcting imap and clearing inode
cleared inode 4577227312
imap claims a free inode 4577227313 is in use, correcting imap and clearing inode
cleared inode 4577227313
imap claims a free inode 4577227314 is in use, correcting imap and clearing inode
cleared inode 4577227314
- agno = 18
imap claims a free inode 282586124 is in use, correcting imap and clearing inode
cleared inode 282586124
imap claims a free inode 282586125 is in use, correcting imap and clearing inode
cleared inode 282586125
imap claims a free inode 282586126 is in use, correcting imap and clearing inode
cleared inode 282586126
- agno = 2
- agno = 19
- agno = 20
- agno = 21
imap claims a free inode 8333390396 is in use, correcting imap and clearing inode
cleared inode 8333390396
imap claims a free inode 8333390397 is in use, correcting imap and clearing inode
cleared inode 8333390397
imap claims a free inode 8333390398 is in use, correcting imap and clearing inode
cleared inode 8333390398
- agno = 3
- agno = 22
- agno = 4
- agno = 23
imap claims a free inode 1088498745 is in use, correcting imap and clearing inode
cleared inode 1088498745
imap claims a free inode 1088498746 is in use, correcting imap and clearing inode
cleared inode 1088498746
imap claims a free inode 1088498747 is in use, correcting imap and clearing inode
cleared inode 1088498747
- agno = 5
- agno = 6
data fork in ino 6186232088 claims free block 1565314240
data fork in ino 6186232088 claims free block 1565314241
data fork in ino 6186232090 claims free block 1565316224
data fork in ino 6186232090 claims free block 1565316225
- agno = 24
data fork in ino 1627270445 claims free block 437061696
data fork in ino 1627270445 claims free block 437061697
data fork in ino 1627270447 claims free block 437065482
imap claims a free inode 1627270453 is in use, correcting imap and clearing inode
cleared inode 1627270453
imap claims a free inode 1627270454 is in use, correcting imap and clearing inode
cleared inode 1627270454
- agno = 7
data fork in ino 1892223262 claims free block 495111424
data fork in ino 1892223262 claims free block 495111425
data fork in ino 1892223263 claims free block 495113408
data fork in ino 1892223263 claims free block 495113409
data fork in ino 1892223266 claims free block 495117376
data fork in ino 1892223266 claims free block 495117377
data fork in ino 1892223267 claims free block 495119360
data fork in ino 1892223267 claims free block 495119361
- agno = 8
- agno = 25
- agno = 9
- agno = 26
imap claims a free inode 6993639185 is in use, correcting imap and clearing inode
cleared inode 6993639185
- agno = 27
- agno = 10
- agno = 28
data fork in ino 2698993982 claims free block 702052947
- agno = 11
imap claims a free inode 7531901467 is in use, correcting imap and clearing inode
cleared inode 7531901467
imap claims a free inode 7531901468 is in use, correcting imap and clearing inode
cleared inode 7531901468
imap claims a free inode 7531901469 is in use, correcting imap and clearing inode
cleared inode 7531901469
- agno = 12
- agno = 29
- agno = 13
- agno = 14
- 22:40:39: process known inodes and inode discovery - 576768 of 576256 inodes done
- process newly discovered inodes...
- 22:40:39: process newly discovered inodes - 32 of 32 allocation groups done
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- 22:40:39: setting up duplicate extent list - 32 of 32 allocation groups done
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 2
- agno = 9
- agno = 13
- agno = 15
- agno = 18
- agno = 23
- agno = 26
- agno = 28
- agno = 6
- agno = 10
- agno = 11
- agno = 12
- agno = 14
- agno = 3
- agno = 4
- agno = 16
- agno = 17
- agno = 5
- agno = 21
- agno = 22
- agno = 24
- agno = 7
- agno = 25
- agno = 8
- agno = 20
- agno = 27
- agno = 19
- agno = 1
- agno = 29
- agno = 30
- agno = 31
- 22:40:39: check for inodes claiming duplicate blocks - 576768 of 576256 inodes done
Phase 5 - rebuild AG headers and trees...
- 22:40:40: rebuild AG headers and trees - 32 of 32 allocation groups done
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
fixing ftype mismatch (2/1) in directory/child inode 5637169961/7798299966
entry "DIR_C" in dir 7785023237 points to an already connected directory inode 7798299957
rebuilding directory inode 7785023237
- traversal finished ...
- moving disconnected inodes to lost+found ...
disconnected inode 1074935066, moving to lost+found
disconnected inode 1075069725, moving to lost+found
disconnected inode 1075364134, moving to lost+found
disconnected inode 1627270452, moving to lost+found
disconnected inode 5638039565, moving to lost+found
disconnected inode 5638569728, moving to lost+found
disconnected inode 5639909924, moving to lost+found
disconnected inode 5640129061, moving to lost+found
disconnected inode 5640810004, moving to lost+found
disconnected inode 5641809668, moving to lost+found
disconnected inode 5642181676, moving to lost+found
disconnected inode 5643848239, moving to lost+found
disconnected inode 5644897075, moving to lost+found
disconnected inode 5645330993, moving to lost+found
disconnected inode 5645401887, moving to lost+found
disconnected inode 5646127119, moving to lost+found
disconnected inode 5646485508, moving to lost+found
disconnected inode 5647125261, moving to lost+found
disconnected inode 5647394563, moving to lost+found
disconnected inode 5647394594, moving to lost+found
disconnected inode 5647775749, moving to lost+found
disconnected inode 5649927683, moving to lost+found
disconnected inode 5906268185, moving to lost+found
disconnected inode 7785451013, moving to lost+found
disconnected inode 7786417943, moving to lost+found
disconnected inode 7788027413, moving to lost+found
disconnected inode 7788027450, moving to lost+found
disconnected inode 7788865851, moving to lost+found
disconnected inode 7789176612, moving to lost+found
disconnected inode 7789203729, moving to lost+found
disconnected inode 7789707839, moving to lost+found
disconnected inode 7790392620, moving to lost+found
disconnected inode 7791052558, moving to lost+found
disconnected inode 7791197746, moving to lost+found
disconnected inode 7791358997, moving to lost+found
disconnected inode 7791458591, moving to lost+found
disconnected inode 7791458595, moving to lost+found
disconnected inode 7792672559, moving to lost+found
disconnected inode 7792974087, moving to lost+found
disconnected inode 7793121553, moving to lost+found
disconnected inode 7793439263, moving to lost+found
disconnected inode 7793756700, moving to lost+found
disconnected inode 7793869066, moving to lost+found
disconnected inode 7794057482, moving to lost+found
disconnected inode 7794457358, moving to lost+found
disconnected inode 7794624394, moving to lost+found
disconnected inode 7795757349, moving to lost+found
disconnected inode 7796501291, moving to lost+found
disconnected inode 7797312779, moving to lost+found
disconnected inode 7797643033, moving to lost+found
disconnected inode 7797991733, moving to lost+found
disconnected inode 7798172454, moving to lost+found
Phase 7 - verify and correct link counts...
resetting inode 5637169961 nlinks from 18 to 17
resetting inode 7785023237 nlinks from 18 to 17
resetting inode 7798299956 nlinks from 1 to 2
resetting inode 7798299958 nlinks from 1 to 2
resetting inode 7798299959 nlinks from 1 to 2
resetting inode 7798299960 nlinks from 1 to 2
resetting inode 7798299962 nlinks from 1 to 2
resetting inode 7798299963 nlinks from 1 to 2
resetting inode 7798299964 nlinks from 1 to 2
resetting inode 7798299965 nlinks from 1 to 2
resetting inode 7798299966 nlinks from 1 to 2
- 22:41:13: verify and correct link counts - 32 of 32 allocation groups done
Maximum metadata LSN (5:2829320) is ahead of log (1:8).
Format log to cycle 8.
done
Running xfs_repair again now completes cleanly. A small amount of data was probably lost by the forced repair; Ceph will recover the affected objects on its own.
[root@ns-ceph-208191 ~]# xfs_repair /dev/sdh1
Phase 1 - find and verify superblock...
- reporting progress in intervals of 15 minutes
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- 22:50:52: scanning filesystem freespace - 32 of 32 allocation groups done
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- 22:50:52: scanning agi unlinked lists - 32 of 32 allocation groups done
- process known inodes and perform inode discovery...
- agno = 0
- agno = 30
- agno = 15
- agno = 1
- agno = 31
- agno = 16
- agno = 2
- agno = 17
- agno = 18
- agno = 19
- agno = 20
- agno = 21
- agno = 22
- agno = 3
- agno = 4
- agno = 23
- agno = 24
- agno = 5
- agno = 25
- agno = 6
- agno = 26
- agno = 27
- agno = 7
- agno = 8
- agno = 28
- agno = 9
- agno = 29
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- 22:52:13: process known inodes and inode discovery - 576768 of 576768 inodes done
- process newly discovered inodes...
- 22:52:13: process newly discovered inodes - 32 of 32 allocation groups done
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- 22:52:13: setting up duplicate extent list - 32 of 32 allocation groups done
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 6
- agno = 11
- agno = 14
- agno = 19
- agno = 8
- agno = 22
- agno = 23
- agno = 4
- agno = 17
- agno = 30
- agno = 7
- agno = 15
- agno = 18
- agno = 10
- agno = 20
- agno = 21
- agno = 9
- agno = 5
- agno = 3
- agno = 25
- agno = 26
- agno = 27
- agno = 24
- agno = 28
- agno = 12
- agno = 29
- agno = 13
- agno = 31
- agno = 16
- 22:52:13: check for inodes claiming duplicate blocks - 576768 of 576768 inodes done
Phase 5 - rebuild AG headers and trees...
- 22:52:14: rebuild AG headers and trees - 32 of 32 allocation groups done
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
- 22:52:47: verify and correct link counts - 32 of 32 allocation groups done
done
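The forced repair moved a number of disconnected inodes to lost+found; the affected objects will be reported missing and recovered by Ceph from other replicas. To see what was orphaned, list the directory once the filesystem is remounted in the next step (a sketch, assuming the OSD mounts at /var/lib/ceph/osd/ceph-104):

ls /var/lib/ceph/osd/ceph-104/lost+found | head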
Once the xfs_repair pass is complete, it is advisable to rebuild the journal data as well.
Before recreating the journal, two preparatory steps are needed:
1. Make sure the OSD data partition that this journal belongs to has been mounted again.
2. Zero out (initialize) the corresponding journal device first.
3. Only after the two steps above, recreate the journal on the journal device.
Commands
[root@ns-ceph-208191 ~]# mount /dev/sdh1
[root@ns-ceph-208191 ~]# dd if=/dev/zero of=/dev/sda5 bs=1M count=30
30+0 records in
30+0 records out
31457280 bytes (31 MB) copied, 0.112486 s, 280 MB/s
[root@ns-ceph-208191 ~]# ceph-osd -i 104 --mkjournal
2018-01-26 22:55:41.176342 7f03455bdd00 -1 journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected 540236d7-ff22-4439-902e-f9d5d2a4faed, invalid (someone else's?) journal
2018-01-26 22:55:41.176863 7f03455bdd00 -1 created new journal /dev/sda5 for object store /var/lib/ceph/osd/ceph-104
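The "ondisk fsid ... doesn't match expected" message is expected here: the dd above wiped the old journal header, so ceph-osd treats the partition as an unknown journal and writes a fresh one. Optionally, confirm that the OSD's journal link really points at the partition that was just re-initialized; a sketch, assuming the usual symlink created by ceph-disk:

ls -l /var/lib/ceph/osd/ceph-104/journal   # should resolve to /dev/sda5 (possibly via a by-partuuid alias)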
Starting the OSD
[root@ns-ceph-208191 ~]# systemctl start ceph-osd@104
[root@ns-ceph-208191 ~]# systemctl status ceph-osd@104
● ceph-osd@104.service - Ceph object storage daemon osd.104
Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; disabled; vendor preset: disabled)
Active: active (running) since Fri 2018-01-26 22:58:03 CST; 4s ago
Process: 13364 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
Main PID: 13370 (ceph-osd)
CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@104.service
└─13370 /usr/bin/ceph-osd -f --cluster ceph --id 104 --setuser ceph --setgroup ceph
Jan 26 22:58:03 ns-ceph-208191.vclound.com systemd[1]: Starting Ceph object storage daemon osd.104...
Jan 26 22:58:03 ns-ceph-208191.vclound.com systemd[1]: Started Ceph object storage daemon osd.104.
Jan 26 22:58:03 ns-ceph-208191.vclound.com ceph-osd[13370]: starting osd.104 at - osd_data /var/lib/ceph/osd/ceph-1...sda5
Hint: Some lines were ellipsized, use -l to show in full.
At this point the Ceph OSD has been repaired successfully.
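As a final check, confirm that the OSD has rejoined the cluster and let recovery finish; a minimal sketch:

ceph osd tree | grep 'osd.104'   # the repaired OSD should be back up and in
ceph -s                          # watch recovery/backfill until the cluster is healthy again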
This note documents a fault-recovery procedure on Ceph Luminous: attempting a direct start of the Ceph OSD, repairing its filesystem, handling the xfs_repair error with a forced (-L) repair, and rebuilding the journal data.