Recovering an overflowed LVM volume configured with --virtualsize

This article describes how to use LVM's sparse (thin) provisioning to create a logical volume larger than the space actually available, and walks through an example of recovering the data when writes overflow the allocation, cause filesystem errors, and invalidate the volume: the logical volume is resized and its CoW metadata repaired.

/dev/vg/somevolume: read failed after 0 of 4096 at nnnnn: Input/output error

If you’ve ever seen the error above, it usually means you have run out of space in the CoW (copy-on-write) area of a snapshot volume.

…but there is another use for snapshots, and that is thin provisioning for sparse data. If you create an LVM volume using the --virtualsize option, you can give it a logical size that is much larger than the actual underlying volume. If you exceed the space for such a volume, you will get the same error as above, and all data on the volume will be invalidated and inaccessible.

LVM silently uses the ‘zero’ device-mapper target as the underlying volume. Thus, even though the data is invalidated, nothing is lost. By overlaying the CoW data back on top of a zero device, we can resurrect the data.
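You can see this arrangement with dmsetup, using the vg/virtual_test names from the example below (the exact device names and table layout vary by LVM version, so treat this as a sketch):

dmsetup table | grep virtual_test
 [expect a 'snapshot' target for the visible volume, the -cow
  device holding the real allocation, and a hidden virtual-origin
  device built on the 'zero' target]
dmsetup status vg-virtual_test
 [for a healthy snapshot this reports how much of the CoW area
  is used; an overflowed one typically reports Invalid]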

We prepared the example volume with the following:

lvcreate -L 100m --virtualsize 200m -n virtual_test vg
mkfs.ext4 /dev/vg/virtual_test
 [...]
mount /dev/vg/virtual_test /mnt/tmp/
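
Before filling it up, it is worth confirming what we have: only 100MB of real space backs a 200MB filesystem. Something like the following shows both sides (exact figures and columns depend on your LVM and ext4 versions):

lvs -a vg
 [the hidden [virtual_test_vorigin] volume appears alongside
  virtual_test; the Data% column shows how much of the 100MB
  CoW allocation is already used]
df -h /mnt/tmp
 [the filesystem reports roughly the full 200MB virtual size,
  less ext4 overhead]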

And now we fill the disk:

dd if=/dev/zero of=/mnt/tmp/overflow-file
dd: writing to `/mnt/tmp/overflow-file': Input/output error

Message from syslogd@backup at Aug 27 15:17:27 ...
 kernel:journal commit I/O error
272729+0 records in
272728+0 records out
139636736 bytes (140 MB) copied
[I had to reboot here.  The kernel still thought
 the filesystem was mounted and I could not continue.
 Obviously we are working near the kernel's limits on
 this CentOS 6.2 2.6.32-based kernel]

We now have a 200MB volume backed by only 100MB of real space, and that 100MB is full.  LVM has marked the volume as invalid and the data is no longer accessible.

First, resize the volume so there is free space once it is brought back online.  Otherwise, the first byte written to the volume would immediately invalidate it again:

lvresize -L +100m /dev/vg/virtual_test
 [errors, possibly, just ignore them]
  Extending logical volume virtual_test to 200.00 MiB
  Logical volume virtual_test successfully resized

Now we poke the -cow device directly with a short perl one-liner.  The 5th byte is the ‘valid’ flag (see http://www.redhat.com/archives/linux-lvm/2006-September/msg00132.html), so all we need to do is set it to 1:

 perl -MFcntl=SEEK_SET -e 'open(F, "+<", "/dev/mapper/vg-virtual_test-cow") or die $!; sysseek(F, 4, SEEK_SET); syswrite(F, "\x01", 1); close(F);'
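
If perl is not to your taste, the same byte can be poked in with dd instead; this is an equivalent sketch, not part of the original procedure, so double-check the -cow device name before writing to it:

printf '\001' | dd of=/dev/mapper/vg-virtual_test-cow bs=1 seek=4 count=1 conv=notrunc
dd if=/dev/mapper/vg-virtual_test-cow bs=1 count=8 2>/dev/null | od -An -tx1
 [the fifth byte should now read 01]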

Now have lvm re-read the CoW metadata and you’re in business:

lvchange -an /dev/vg/virtual_test
  [ignore errors]
lvchange -ay /dev/vg/virtual_test
  [shouldn't have any errors]
lvs
  LV           VG   Attr     LSize   Pool Origin                 Data%
  virtual_test vg   swi-a-s- 200.00m      [virtual_test_vorigin] 33.63

At this point you should probably fsck your filesystem; it may be damaged, or at least need a journal replay, since it stopped abruptly at the end of its allocated space.  And as you can see, the "overflow" file is there, up to the point where it filled the disk.

[root@backup mapper]# e2fsck /dev/vg/virtual_test
e2fsck 1.41.12 (17-May-2010)
/dev/vg/virtual_test: recovering journal
/dev/vg/virtual_test: clean, 12/51200 files, 66398/204800 blocks
[root@backup mapper]# mount /dev/vg/virtual_test /mnt/tmp/
[root@backup mapper]# ls -lh /mnt/tmp/
total 54M
drwx------. 2 root root 12K Aug 27 15:16 lost+found
-rw-r--r--. 1 root root 54M Aug 27 15:17 overflow-file
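
To avoid ending up here again, LVM can monitor snapshot/CoW usage with dmeventd and grow the volume before it fills.  The following goes in the activation section of /etc/lvm/lvm.conf; the threshold and percentage here are only illustrative, and it is worth verifying that your LVM version applies them to sparse --virtualsize volumes as well as ordinary snapshots:

activation {
    # start growing the CoW area once it is 70% full...
    snapshot_autoextend_threshold = 70
    # ...and grow it by 20% of its current size each time
    snapshot_autoextend_percent = 20
}

Failing that, keep an eye on the Data% column in lvs and lvextend by hand before it reaches 100%.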
