Configuring Write Barriers: File System Data Integrity over Power Failures in RHEL

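This article describes how to configure write barriers in Red Hat Enterprise Linux so that file system data stays intact across power failures. Write barriers are a Linux 2.6 kernel mechanism that issues cache flush commands before and after a journaling file system commits its data, ensuring that metadata is correctly ordered and persisted. The article explains why write barriers matter, particularly for the ext3, ext4 and XFS file systems, and gives the steps for enabling and verifying them.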

https://access.redhat.com/articles/22540

Updated November 26, 2014 at 19:12

Issue

Data integrity over power failures is one of the critical features for many Red Hat Enterprise Linux (RHEL) users, especially for mission critical data. Specifically, users have the expectation that file systems can survive a power failure without requiring a long running file system check on recovery. This article is meant to give a broad overview of what Write Barriers are in Linux and how to configure and verify them in RHEL.

Environment

Red Hat Enterprise Linux 4

Red Hat Enterprise Linux 5

Resolution

Background

Setting up and verifying data integrity can be a confusing process. Before describing the specifics of how to configure write barriers, it is useful to define some common terms and give an overview of how application data moves from the application's data buffers to persistent storage. In a simplified view of the data path, application data moves from the application data buffer into the kernel's page cache after a "write()" system call. This application data will be referred to as user data in this article. In addition to this user data, file systems maintain metadata for their internal bookkeeping, including things like allocation maps, file names, directory entries, file system super blocks and inodes. Think of the metadata as the structure of the file system which allows the kernel to keep track of where the user data is stored.

File systems take great care to update the metadata in a safe way. Journalling file systems bundle metadata updates into transactions which are sent to persistent storage in an ordered way: first the body of the transaction is sent to the storage device and then a commit block. If the transaction and its corresponding commit block are both present after a power failure, the file system assumes that the transaction will survive any power failure.

Things get complicated when storage devices add extra caches for data. For example, hardware RAID controllers often contain internal write caches which can have battery backup. Storage target devices like a local S-ATA or SAS drive also have write caches which can range up to 32 or 64 MB in size with modern drives. High end arrays, like those from NetApp, IBM, Hitachi and EMC among others, also have large caches.

The key for file system data integrity is that the IO sent for any given transaction and its commit block must not be reordered or lost on power failure by any of these potential caches.

What are Write Barriers?

Write barriers are a 2.6 kernel mechanism that issues cache flush commands before and after the commit block used by journalling file systems. This is a brute force way to ensure that the metadata is correctly ordered on persistent storage, and it can have a substantial performance impact for some applications when enabled. Specifically, applications which create and delete lots of small files and applications that are heavy "fsync()" users will often run much slower.

In RHEL 4 and RHEL 5, write barriers for ext3 are not enabled by default. To enable barriers for ext3, use the "-o barrier=1" mount option:

# mount -o barrier=1 /dev/sda1 /test
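To confirm which barrier option a mounted file system is actually using, the options column of /proc/mounts can be inspected. The following is a minimal sketch, not a Red Hat tool: the function name `barrier_option` is hypothetical, and it assumes the kernel lists the option (for example `barrier=1` or `nobarrier`) in /proc/mounts, which may not be the case when the file system is running with its default setting.

```shell
# Hypothetical helper: reads /proc/mounts-style lines on stdin and prints
# the barrier-related option for the given mount point.
# Prints "none listed" when the options column does not mention barriers,
# which usually means the file system default is in effect.
barrier_option() {
    # $1: mount point to look up, e.g. /test
    awk -v mp="$1" '
        $2 == mp {
            n = split($4, opts, ",")
            found = "none listed"
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^(barrier(=.*)?|nobarrier)$/) found = opts[i]
            print found
        }'
}

# Usage:
#   barrier_option /test < /proc/mounts
```

To make the option persistent across reboots, the same `barrier=1` string can be placed in the options field of the file system's /etc/fstab entry.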

Note that barriers are enabled by default for ext4 and XFS. Barriers are enabled by default for GFS2 in RHEL 6 and above, but are not supported by GFS or GFS2 in RHEL 5 or earlier.

The kernel will automatically disable barriers when it detects devices that advertise themselves as having a write through cache, or when the system is configured with MD or LVM devices that do not properly handle write barriers. When barriers are disabled, a message is logged to /var/log/messages. The messages may look like:

Jun 23 11:54:06 hostname kernel: JBD: barrier-based sync failed on dm-1-8 - disabling barriers
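A quick way to find out whether the kernel has silently turned barriers off is to search the system log for this message. The sketch below assumes the exact JBD wording shown in the sample above; other file systems may log different text, so an empty result is not proof that barriers are active. The function name `barriers_disabled_devices` is hypothetical.

```shell
# Hypothetical helper: reads syslog lines on stdin and prints, once each,
# the device names that appear in a JBD "disabling barriers" message.
barriers_disabled_devices() {
    sed -n 's/.*barrier-based sync failed on \([^ ]*\) - disabling barriers.*/\1/p' |
        sort -u
}

# Usage:
#   barriers_disabled_devices < /var/log/messages
```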

Note that write barriers do not help user data survive power failures in most configurations. To protect user data, applications must make explicit "fsync()" calls. This also applies to applications that use direct IO: they still need to call "fsync()" in order to flush data from the downstream IO devices.
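From a shell script, one way to get this behaviour without writing a program is the `conv=fsync` option of GNU dd, which synchronizes the output file's data and metadata before dd exits. The following is a minimal sketch under that assumption (GNU coreutils dd); the function name `durable_copy` is hypothetical.

```shell
# Hypothetical helper: copy a file and ask the kernel to flush the copy
# to storage before returning. Relies on GNU dd's conv=fsync.
durable_copy() {
    # $1: source file, $2: destination file
    dd if="$1" of="$2" conv=fsync 2>/dev/null
}

# Usage:
#   durable_copy /tmp/report.txt /test/report.txt
```

Whether the flush actually reaches persistent media still depends on the cache configuration described in the rest of this article.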

Does My System Need Write Barrier Support?

Several configurations of systems do not need to use write barriers.

The first way to avoid data integrity issues is to make sure that there are no write caches that could lose data on power failures. For a simple server or desktop, say with one or more S-ATA drives off a local S-ATA controller like the Intel AHCI part, users can disable the write cache on the target S-ATA drives with the hdparm command:

# hdparm -W0 /dev/sda
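Running `hdparm -W` with no value queries the current setting instead of changing it, which is useful for checking that the change took effect. The sketch below assumes that hdparm reports the state on a line containing `write-caching` followed by `(on)` or `(off)`; that output format is an assumption and may differ between hdparm versions. The function name `write_cache_state` is hypothetical.

```shell
# Hypothetical helper: reads `hdparm -W <device>` output on stdin and
# prints "on", "off", or "unknown" if no recognizable line is found.
write_cache_state() {
    awk '
        /write-caching/ {
            if ($0 ~ /\(on\)/)       s = "on"
            else if ($0 ~ /\(off\)/) s = "off"
        }
        END { print (s == "" ? "unknown" : s) }'
}

# Usage (as root):
#   hdparm -W /dev/sda | write_cache_state
```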

The second type of system that can avoid using write barriers is one with a hardware RAID controller that has a battery backed write cache. If the system has this kind of hardware RAID card and its component drives have their write caches disabled, the controller will advertise itself as a write through cache, which tells the kernel that it can be trusted to persist data. A specific example would be a controller like the LSI megaraid SAS controller. The backend drives are normally hidden by such controllers, so users need vendor specific tools to query and change the state of the target drives. For LSI megaraid SAS, the tool is the LSI MegaCli command:

# MegaCli64 -LDGetProp  -DskCache  -LAll -aALL

The above will show the state of the back end drives.

# MegaCli64 -LDSetProp -DisDskCache -Lall -aALL

The above disables the write cache for those drives.

As mentioned above, this command is very vendor (even HBA) specific.

The cache status of a SCSI disk device can be read from /sys/block/(Block device name)/device/scsi_disk/(Address)/cache_type:

# cat /sys/block/sda/device/scsi_disk/5:0:0:0/cache_type
write back
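On a system with several disks, the same sysfs attribute can be listed for every SCSI disk in one pass. This is a minimal sketch that follows the path layout given above; the function name `list_cache_types` and its optional sysfs-root argument (handy for trying it against a copy of the tree) are illustrative assumptions.

```shell
# Hypothetical helper: print "<device>: <cache_type>" for every SCSI disk
# found under the given sysfs root (defaults to /sys).
list_cache_types() {
    root="${1:-/sys}"
    for f in "$root"/block/*/device/scsi_disk/*/cache_type; do
        # Skip the unexpanded glob when no SCSI disks are present.
        [ -r "$f" ] || continue
        dev=${f#"$root"/block/}   # strip the leading "<root>/block/"
        dev=${dev%%/*}            # keep only the block device name
        printf '%s: %s\n' "$dev" "$(cat "$f")"
    done
}

# Usage:
#   list_cache_types
```

Any device reported as "write back" has a volatile write cache from the kernel's point of view and relies on barriers, or on one of the hardware protections described in this section.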

Note that hardware RAID cards recharge their batteries while the system is operational. If a system is powered off for an extended period of time, the batteries will lose their charge and the system will not be protected over a power failure.

The third major class of storage that does not need write barriers is high end arrays, which have various ways of maintaining data across a power failure. There is no need to try to verify the state of the internal drives in external RAID storage.

Note that NFS clients do not need to enable write barriers since the data integrity is handled by the NFS server side. NFS servers should be configured and run on local file systems which do have barriers enabled as mentioned above.
