RHEL: recovering an XFS disk on which system commands fail (Ending clean mount)

Article link: https://blog.youkuaiyun.com/wy_hhxx/article/details/145638304

Environment: RHEL 8.10

Symptom: system commands stop working under one particular path

[root@RHEL8-xxx opt]# ll
ls: cannot open directory '.': Input/output error
[root@RHEL8-xxx opt]# ll
ls: cannot open directory '.': Input/output error
[root@RHEL8-xxx opt]# pwd
/opt

df -h shows the mount as follows:
/dev/mapper/rhel-opt 4.2T 4.1T 90G 98% /opt

Check /var/log/messages:

[root@RHEL8-xxx ~]# vim /var/log/messages
[root@RHEL8-xxx ~]#
[root@RHEL8-xxx ~]# grep dm-0 /var/log/messages
Feb 12 22:20:16 RHEL8-xxx kernel: Workqueue: xfs-conv/dm-0 xfs_end_io [xfs]
Feb 12 22:20:16 RHEL8-xxx kernel: XFS (dm-0): Internal error xfs_trans_cancel at line 957 of file fs/xfs/xfs_trans.c.  Caller xfs_iomap_write_unwritten+0x281/0x2a0 [xfs]
Feb 12 22:20:16 RHEL8-xxx kernel: Workqueue: xfs-conv/dm-0 xfs_end_io [xfs]
Feb 12 22:20:16 RHEL8-xxx kernel: XFS (dm-0): Corruption of in-memory data (0x8) detected at xfs_trans_cancel+0xc6/0x130 [xfs] (fs/xfs/xfs_trans.c:958).  Shutting down filesystem
Feb 12 22:20:16 RHEL8-xxx kernel: XFS (dm-0): Please unmount the filesystem and rectify the problem(s)
Feb 14 10:06:27 RHEL8-xxx kernel: dm-0: writeback error on inode 8006597246, offset 8613888, sector 1322459936
Feb 14 14:38:53 RHEL8-xxx kernel: XFS (dm-0): Unmounting Filesystem
Feb 14 15:10:57 RHEL8-xxx kernel: XFS (dm-0): Mounting V5 Filesystem
Feb 14 15:10:58 RHEL8-xxx kernel: XFS (dm-0): Ending clean mount
[root@RHEL8-xxx ~]#
[root@RHEL8-xxx ~]#

If vim cannot be used, try the following command instead:

[root@RHEL8-xxx ~]# dmesg | grep dm-0
[    8.230683] XFS (dm-0): Mounting V5 Filesystem
[    8.464148] XFS (dm-0): Starting recovery (logdev: internal)
[    9.278087] XFS (dm-0): Ending recovery (logdev: internal)
[559929.348236] Workqueue: xfs-conv/dm-0 xfs_end_io [xfs]
[559929.349739] XFS (dm-0): Internal error xfs_trans_cancel at line 957 of file fs/xfs/xfs_trans.c.  Caller xfs_iomap_write_unwritten+0x281/0x2a0 [xfs]
[559929.349947] Workqueue: xfs-conv/dm-0 xfs_end_io [xfs]
[559929.352650] XFS (dm-0): Corruption of in-memory data (0x8) detected at xfs_trans_cancel+0xc6/0x130 [xfs] (fs/xfs/xfs_trans.c:958).  Shutting down filesystem
[559929.352872] XFS (dm-0): Please unmount the filesystem and rectify the problem(s)
[559929.353785] dm-0: writeback error on inode 8006597246, offset 8613888, sector 1322459936
[705046.354376] XFS (dm-0): Unmounting Filesystem
[706970.301055] XFS (dm-0): Mounting V5 Filesystem
[706970.617985] XFS (dm-0): Ending clean mount
[root@RHEL8-xxx ~]#

Following the hint "Please unmount the filesystem and rectify the problem(s)", the next steps are: unmount /dev/mapper/rhel-opt -> repair -> remount.
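
The kernel messages name the device dm-0, while the mount table names /dev/mapper/rhel-opt. A minimal bash sketch for confirming that both refer to the same volume, assuming the usual layout in which the /dev/mapper entry is a symlink to the dm-N node. dm_node_of is a hypothetical helper name, not an existing tool:

```shell
# Print the kernel device-mapper node (for example dm-0) behind a /dev/mapper name.
# dm_node_of is a hypothetical helper name used only for this illustration.
dm_node_of() {
    # /dev/mapper/<vg>-<lv> is normally a symlink to ../dm-N;
    # readlink -f resolves it and basename keeps only the node name.
    basename "$(readlink -f "$1")"
}
```

Called as dm_node_of /dev/mapper/rhel-opt on the affected host, this should print the dm-N name that appears in the kernel log. lsblk and dmsetup ls show the same mapping.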

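1. To remount a disk managed by LVM (Logical Volume Manager), whose device node usually lives under /dev/mapper, first confirm the mount point, the filesystem type, and so on.

/dev/mapper/rhel-opt is mounted at /opt. The filesystem type still has to be confirmed; any of the following methods shows that it is xfs.

Method 1: df -T. Note: -T prints the filesystem type.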

[root@RHEL8-xxx ~]# df -Th /opt
Filesystem           Type  Size  Used Avail Use% Mounted on
/dev/mapper/rhel-opt xfs   4.2T  3.5T  677G  85% /opt
[root@RHEL8-xxx ~]#

Method 2:
If the disk is mounted automatically at boot, check /etc/fstab

# After editing this file, run 'systemctl daemon-reload' to update systemd
# units generated from this file.
#
UUID=xxxxxxxx-94d5-xxxx-xxxx-xxxxxxxxxxxx  /                       xfs     defaults        0 0
UUID=xxxxxxxx-1218-xxxx-xxxx-xxxxxxxxxxxx /boot                   xfs     defaults        0 0
UUID=xxxxxxxx-445a-xxxx-xxxx-xxxxxxxxxxxx /home                   xfs     defaults        0 0
/dev/mapper/rhel-opt    /opt                    xfs     defaults        0 0
UUID=xxxxxxxx-4f2b-xxxx-xxxx-xxxxxxxxxxxx /var                    xfs     defaults        0 0
UUID=xxxxxxxx-669f-4c0c-xxxx-xxxx-xxxxxxxxxxxx none                    swap    defaults        0 0
~

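Note: in this file /dev/mapper/rhel-opt is listed by device path rather than by UUID. The UUID can be looked up with the following command and then used in the entry.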

[root@RHEL8-xxx ~]# sudo blkid | grep /dev/mapper/rhel-opt
/dev/mapper/rhel-opt: UUID="xxxxxxxx-de5e-xxxx-xxxx-xxxxxxxxxxxx" BLOCK_SIZE="512" TYPE="xfs"
[root@RHEL8-xxx ~]#
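
A small bash sketch of how the looked-up UUID could be turned into an fstab entry in the same style as the other lines. fstab_entry is a hypothetical helper that only prints the line and does not edit /etc/fstab; the defaults 0 0 fields are copied from the existing entries:

```shell
# Build an /etc/fstab line that mounts by UUID instead of by device path.
# fstab_entry is a hypothetical helper; it prints the line and changes nothing on disk.
fstab_entry() {
    local uuid=$1 mountpoint=$2 fstype=$3
    # Same option, dump and pass fields as the existing entries in the file.
    printf 'UUID=%s %s %s defaults 0 0\n' "$uuid" "$mountpoint" "$fstype"
}
```

One possible call is fstab_entry "$(blkid -s UUID -o value /dev/mapper/rhel-opt)" /opt xfs, where blkid -s UUID -o value prints only the UUID string. The printed line would then replace the /dev/mapper/rhel-opt entry by hand.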

Method 3

[root@RHEL8-xxx ~]# mount | grep /dev/mapper/rhel-opt
/dev/mapper/rhel-opt on /opt type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
[root@RHEL8-xxx ~]#
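
For use in a script, the same lookup can be done against the kernel mount table. A minimal bash sketch, assuming the standard /proc/mounts column layout. fstype_of is a hypothetical helper name:

```shell
# Print the filesystem type recorded for a mount point.
# Reads /proc/mounts by default; a second argument may name any file in the same format.
fstype_of() {
    local mountpoint=$1 table=${2:-/proc/mounts}
    # Columns: device, mount point, type, options, dump, pass.
    awk -v mp="$mountpoint" '$2 == mp { print $3 }' "$table"
}
```

fstype_of /opt prints the type column for /opt. findmnt -no FSTYPE /opt and lsblk -f report the same information without any helper.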

2. Make sure no process is using /opt
Run the following command to check whether any process is using the /opt directory:

# lsof +D /opt

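If processes are using it, try stopping them. If it is unclear which processes are using the directory, a lazy or forced unmount can be attempted:
umount -l /opt (lazy unmount) or umount -f /opt (forced unmount)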

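In this case, however, lsof failed with Input/output error and umount -f /opt reported target is busy.
This means that some process or kernel component was still holding /opt. Because lsof was unusable (Input/output error), another way was needed to find the processes holding /opt before unmounting.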

Run the following command to see which processes are using /opt:

[root@RHEL8-xxx ~]# fuser -vm /dev/mapper/rhel-opt
Cannot stat file /proc/68141/fd/3: Input/output error
Cannot stat file /proc/68141/fd/4: Input/output error
Cannot stat file /proc/68141/fd/5: Input/output error
......
                     USER        PID ACCESS COMMAND
/dev/dm-0:           tcuser    68141 ....m java
                     tcuser    68204 ....m java
                     tcuser    68561 ....m redis-server
......
                     tcuser    68921 ....m java
                     tcuser    68974 ....m java

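Inspect the processes and kill them (pgrep -u selects by process owner, so the grep process itself and unrelated command lines that merely contain the string tcuser are not matched):
ps -ef | grep tcuser
kill -9 $(pgrep -u tcuser)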
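
Killing everything owned by tcuser is broader than strictly necessary. As an alternative sketch, the PIDs can be taken directly from the fuser -vm table, so that only processes actually holding the filesystem are targeted. fuser_pids is a hypothetical bash helper that reads the table on stdin:

```shell
# Extract PIDs from the verbose table printed by `fuser -vm`.
# fuser_pids is a hypothetical helper; it reads the table text on stdin.
fuser_pids() {
    # The PID is the first purely numeric field on each process row.
    # Header rows and "Cannot stat file" rows contain no such field and are skipped.
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+$/) { print $i; break } }' | sort -un
}
```

A possible use is fuser -vm /opt 2>&1 | fuser_pids, reviewing the list before passing it to kill. The redirect is there because fuser writes the verbose table to stderr.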

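3. Unmount the filesystem
If /opt is still busy, try the options listed in the reference notes at the end of this article (for example a lazy unmount, umount -l /opt).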

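4. Run the XFS filesystem check
Use xfs_repair to repair /dev/mapper/rhel-opt.
If xfs_repair refuses to proceed because the log holds changes that cannot be replayed by mounting the filesystem, run xfs_repair -L /dev/mapper/rhel-opt
Note: the -L option zeroes the XFS log and may cause some data loss. When the filesystem cannot be made accessible any other way, it is a necessary last resort.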

[root@RHEL8-xxx ~]# xfs_repair -L /dev/mapper/rhel-opt
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
sb_icount 11373792, counted 11373728
sb_ifree 201840, counted 306347
sb_fdblocks 13143922, counted 31306926
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
correcting nblocks for inode 6444415873, was 3233 - counted 3249
correcting nextents for inode 6444415873, was 81 - counted 82
        - agno = 4
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 4
        - agno = 1
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (83:1692886) is ahead of log (1:2).
Format log to cycle 86.
done
[root@RHEL8-xxx ~]#
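
Because -L discards the log, xfs_repair(8) advises mounting the filesystem first so the log can be replayed, and falling back to -L only when that mount fails. A hedged bash sketch of that order, which by default only prints the commands. repair_plan and run_or_echo are hypothetical helper names, and the sketch assumes the device is already unmounted:

```shell
# Echo a command when DRY_RUN=1 (the default here), otherwise execute it.
run_or_echo() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# repair_plan is a hypothetical wrapper; it assumes the device is already unmounted.
repair_plan() {
    local dev=$1 mountpoint=$2
    # Mounting and unmounting once gives the kernel a chance to replay the log.
    if run_or_echo mount "$dev" "$mountpoint" && run_or_echo umount "$mountpoint"; then
        run_or_echo xfs_repair -n "$dev"   # no-modify mode: report problems only
        run_or_echo xfs_repair "$dev"      # actual repair, log left intact
    else
        echo "log replay failed; xfs_repair -L $dev is the remaining last resort" >&2
        return 1
    fi
}
```

repair_plan /dev/mapper/rhel-opt /opt prints the four commands in order. Setting DRY_RUN=0 would execute them, which should only be done after reviewing the printed plan.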

5. Remount
Once the repair has finished, try mounting again:
# mount /dev/mapper/rhel-opt /opt

Then check that everything is back to normal:
df -h /opt
dmesg | tail -n 20

Check the boot-time mount entry in /etc/fstab, then reboot and verify the mount again.
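
Since dmesg keeps the old shutdown messages, it can help to look only at XFS errors logged after the most recent mount. A minimal bash sketch, assuming the message wording seen in the logs earlier in this article. xfs_errors_since_mount is a hypothetical helper that reads kernel log text on stdin:

```shell
# Print XFS error lines for one device that were logged after its latest mount.
# xfs_errors_since_mount is a hypothetical helper; it reads kernel log text on stdin.
xfs_errors_since_mount() {
    awk -v tag="XFS ($1):" '
        index($0, tag " Mounting") { n = 0; next }   # a new mount resets the list
        index($0, tag) && /Internal error|Corruption|Shutting down/ { hit[n++] = $0 }
        END { for (i = 0; i < n; i++) print hit[i] }
    '
}
```

dmesg | xfs_errors_since_mount dm-0 prints nothing when no matching error has been logged since the last mount of dm-0.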

=============================================================

Reference: ChatGPT
Q: On RHEL8, how do I remount a disk that has run into a problem? Details: dmesg reports XFS (dm-0): Please unmount the filesystem and rectify the problem(s); the affected device is /dev/mapper/rhel-opt; the output of # mount | grep /dev/mapper/rhel-opt is /dev/mapper/rhel-opt on /opt type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota); the output of lvdisplay is --- Logical volume --- LV Path /dev/rhel/opt LV Name opt VG Name rhel LV UUID ……
Q: lsof does not work; it reports Input/output error
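1. Try fuser to free the mount
Run the following command to see whether any process is using /opt:
fuser -vm /opt
If fuser works, it lists the IDs of the processes using /opt. Then force-kill those processes (-m is needed so that every process using the mounted filesystem is matched, not only those that have the /opt directory itself open):
fuser -km /opt
If fuser also reports Input/output error, move on to the next step.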

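2. Lazy unmount with umount -l
If umount -f has no effect, try the -l option (lazy unmount). It detaches /opt from the directory tree immediately and completes the cleanup once no process is using the filesystem any longer:
umount -l /opt
If the unmount succeeds and the device is no longer in use, continue with xfs_repair to repair the filesystem.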

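3. Force-stop any related processes
Some processes may not appear to use /opt directly yet still hold file handles on it. For example:
ps aux | grep /opt
Then terminate the relevant processes manually:

kill -9 <PID>
If ps also hits Input/output error, rebooting the system may be the only option.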
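
ps aux | grep /opt only matches command lines. As a complementary sketch, the symlinks under /proc can be read to find processes whose working directory or open files point below /opt. This assumes bash and the standard Linux /proc layout. pids_using_path is a hypothetical helper, and whether reading these links succeeds on a shut-down filesystem is not guaranteed:

```shell
# List PIDs whose cwd or open file descriptors point at or below a given path.
# pids_using_path is a hypothetical helper; the second argument (default /proc)
# exists only so the function can be tried against a test directory.
pids_using_path() {
    local prefix=${1%/} proc=${2:-/proc} link target pid
    for link in "$proc"/[0-9]*/cwd "$proc"/[0-9]*/fd/*; do
        # readlink only reads the link text; unreadable or missing links are skipped.
        target=$(readlink "$link" 2>/dev/null) || continue
        case $target in
            "$prefix"|"$prefix"/*)
                pid=${link#"$proc"/}     # strip the /proc/ prefix
                echo "${pid%%/*}"        # keep only the PID component
                ;;
        esac
    done | sort -un
}
```

Run as root, pids_using_path /opt prints one PID per line, which can then be checked with ps -p <PID> before deciding what to stop.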

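4. Enter single-user (rescue) mode
If the filesystem still cannot be unmounted, reboot the system into single-user mode (rescue mode):

In the GRUB menu, select the RHEL kernel and press e to edit the entry.
Find the kernel line (it starts with linux or linux16) and append the following to the end of it:
systemd.unit=rescue.target
Press Ctrl + X to boot.

In rescue mode, run:
umount /opt
xfs_repair /dev/mapper/rhel-opt
When the repair has finished, run reboot.
If the repair still fails in rescue mode, try a Live CD or repairing from the initramfs.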

5. Check the LVM device
If xfs_repair still fails, the LVM device itself may be in a bad state:
lvchange -an /dev/mapper/rhel-opt
lvchange -ay /dev/mapper/rhel-opt
Then try umount and xfs_repair again.

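6. Hardware check
If xfs_repair reports device I/O errors (I/O error), the disk may be damaged. Run:
smartctl -a /dev/sdX # replace X with the actual device letter
To scan the disk for bad blocks (a read-only test by default), you can try:
badblocks -sv /dev/sdX
If the disk is badly damaged, replace the hardware or attempt data recovery with ddrescue.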
