ARM Linux

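Summary: Since kernel version 5.4, AArch64 systems have frequently become unstable, with filesystem data corruption causing the root filesystem to be remounted read-only. The main symptom is EXT4 filesystem errors reporting invalid inode checksums. Investigation showed that the problem is not tied to particular hardware, media type, or kernel version (5.4 to 5.9), and it was reproduced with xfstests. In some cases the data in memory does not match the data on the media, and correcting the inode checksum with debugfs resolves the problem temporarily. Further investigation raised the suspicion of a cache coherency or memory ordering problem, and adding memory barrier instructions to specific functions has temporarily reduced how often the problem occurs. In the end, however, the problem was found to be possibly related to a gcc-4.9 bug, which the Android and Linaro gcc-4.9 releases are known to have fixed.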


Update: Friday 8 January 2021 - Fixed!

Since kernel version 5.4, my Aarch64 systems have become very unreliable, requiring regular reboots to keep them working. Worryingly, symptoms have so far pointed towards filesystem data corruption, which results in the root filesystem being marked read-only. This normally results in something like one of these messages:

EXT4-fs error (device nvme0n1p2): ext4_lookup:1707: inode #271688: comm mandb: iget: checksum invalid
[7478798.720368] EXT4-fs error (device mmcblk0p1): ext4_lookup:1707: inode #157096: comm mandb: iget: checksum invalid
EXT4-fs error (device mmcblk0p1): ext4_lookup:1707: inode #173544: comm mandb: iget: checksum invalid
[365750.234472] EXT4-fs error (device mmcblk0p1): ext4_lookup:1707: inode #166384: comm mandb: iget: checksum invalid
[4175456.231948] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:1004: inode #396582: comm find: Directory block failed checksum

The result is that the journal is aborted and the rootfs is marked read-only.
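When collecting these messages across several machines, it can help to pull out the device, inode number, and reporting process in a structured form. The following is a minimal sketch that assumes the message layout seen in the examples above; the names `Ext4Error` and `parse_ext4_error` are hypothetical, and other EXT4-fs error messages may use a different layout.

```python
import re
from typing import NamedTuple, Optional


class Ext4Error(NamedTuple):
    uptime: Optional[float]  # seconds since boot, if the line has a timestamp
    device: str
    function: str            # kernel function that reported the error
    line: int                # source line within that function's file
    inode: int
    comm: str                # process that triggered the lookup
    message: str


# Layout assumed from the sample messages above, e.g.
# "[365750.234472] EXT4-fs error (device mmcblk0p1): ext4_lookup:1707: ..."
_PATTERN = re.compile(
    r"^(?:\[\s*(?P<uptime>\d+\.\d+)\]\s*)?"
    r"EXT4-fs error \(device (?P<device>[^)]+)\): "
    r"(?P<function>\w+):(?P<line>\d+): "
    r"inode #(?P<inode>\d+): "
    r"comm (?P<comm>[^:]+): "
    r"(?P<message>.*)$"
)


def parse_ext4_error(text: str) -> Optional[Ext4Error]:
    """Return the parsed fields, or None if the line does not match."""
    m = _PATTERN.match(text.strip())
    if m is None:
        return None
    uptime = m.group("uptime")
    return Ext4Error(
        uptime=float(uptime) if uptime is not None else None,
        device=m.group("device"),
        function=m.group("function"),
        line=int(m.group("line")),
        inode=int(m.group("inode")),
        comm=m.group("comm"),
        message=m.group("message"),
    )
```

Grouping the parsed results by `device` and `message` makes it easier to see at a glance that the same failure appears on unrelated media.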

The known facts so far:

- It has not been seen on kernel 5.2 on Armada 8040 hardware (with an uptime of 560 days).
- It has been seen on all mainline kernel versions from 5.4 to 5.9.
- It occurs on several of my Armada 8040 and NXP LX2160A based systems, both of which are Cortex-A72 based. I have all the errata enabled in the kernel.
- It seems independent of the media; it has been seen on the rootfs of two different NVMes on two different platforms, uSD, and eMMC.
- It takes between a week and three months to occur, which makes bisecting the changes between 5.2 and 5.4 infeasible.
- I've run xfstests (as suggested by tytso) on the LX2160A, and generic/531 triggered the inode checksum error.

Investigation with debugfs sometimes shows that the inode checksum is invalid, but if the block device is flushed (via hdparm) and re-read from the media, the inode checksum is then correct. This implies that the data in memory/CPU caches does not match the data on the media, especially when the inode has not changed for days.
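The flush-and-compare check described above could be scripted roughly as follows. This is only a sketch under assumptions, not necessarily the exact commands used here: it assumes `hdparm -f` as the way to flush the buffer cache for the device, reads the checksum from the "Inode checksum:" line that `debugfs` prints for `stat`, and needs root to run against a real device. The function names are hypothetical.

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_inode_checksum(stat_output: str) -> Optional[int]:
    """Extract the value from the 'Inode checksum: 0x...' line, if present."""
    m = re.search(r"Inode checksum:\s*(0x[0-9a-fA-F]+)", stat_output)
    return int(m.group(1), 16) if m else None


def debugfs_stat(device: str, inode: int) -> str:
    # "debugfs -R" runs a single request; "<N>" addresses an inode by number.
    # debugfs opens the filesystem read-only unless told otherwise.
    result = subprocess.run(
        ["debugfs", "-R", f"stat <{inode}>", device],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def flush_block_device(device: str) -> None:
    # Assumption: "hdparm -f" (sync and flush the buffer cache for the device).
    subprocess.run(["hdparm", "-f", device], check=True)


def checksum_before_and_after_flush(
    device: str, inode: int
) -> Tuple[Optional[int], Optional[int]]:
    """Read the inode checksum, flush the device's cache, and read it again."""
    before = parse_inode_checksum(debugfs_stat(device, inode))
    flush_block_device(device)
    after = parse_inode_checksum(debugfs_stat(device, inode))
    return before, after
```

If the two values differ for an inode that has not been written recently, the cached copy and the copy on the media disagree, which is the situation described above.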

Below is a log of some of the recent instances:

29th February 2020
Error: [73729.556544] EXT4-fs error (device nvme0n1p2): ext4_lookup:1700: inode #917524: comm rm: iget: checksum invalid
Platform: NXP LX2160A
Media: XPG SX8200PNP NVMe
Kernel: 5.5
Uptime: 20 hours

Inode #917524 was /var/backups/dpkg.status.6.gz.

Running e2fsck -n /dev/nvme0n1p2 without rebooting showed that the checksum was incorrect, so further investigation with debugfs was warranted:

debugfs: id <917524>

0000 a481 0000 30ff 0300 3d3d 465e bd77 4f5e ....0...==F^.wO^
0020 29ca 345e 0000 0000 0000 0100 0002 0000 ).4^............
0040 0000 0800 0100 0000 0af3 0100 0400 0000 ................
0060 0000 0000 0000 0000 4000 0000 c088 3800 ........@.....8.
0100 0000 0000 0000 0000 0000 0000 0000 0000 ................
*
0140 0000 0000 5fc4 cfb4 0000 0000 0000 0000 ...._...........
0160 0000 0000 0000 0000 0000 0000 af23 0000 .............#..
0200 2000 1cc3 ac95 c9c8 a4d2 9883 583e addf  ...........X>..
0220 3de0 485e b04d 7151 0000 0000 0000 0000 =.H^.MqQ........
0240 0000 0000 0000 0000 0000 0000 0000 0000 ................
*

debugfs: stat <917524>
Inode: 917524 Type: regular Mode: 0644 Flags: 0x80000
Generation: 3033515103 Version: 0x00000000:00000001
User: 0 Group: 0 Project: 0 Size: 261936
File ACL: 0
Links: 1 Blockcount: 512
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x5e4f77bd:c8c995ac -- Fri Feb 21 06:25:01 2020
atime: 0x5e463d3d:dfad3e58 -- Fri Feb 14 06:25:01 2020
mtime: 0x5e34ca29:8398d2a4 -- Sat Feb 1 00:45:29 2020
crtime: 0x5e48e03d:51714db0 -- Sun Feb 16 06:25:01 2020
Size of extra inode fields: 32
Inode checksum: 0xc31c23af
EXTENTS:
(0-63):3705024-3705087

This is, as I remember, operating on the in-memory data rather than the on-disk data, and the inode checksum of 0xc31c23af was incorrect. I corrected the checksum using the debugfs "sif" command, which wrote a corrected checksum. This resulted in:

debugfs: id <917524>
0000 a481 0000 30ff 0300 3d3d 465e bd77 4f5e ....0...==F^.wO^
0020 29ca 345e 0000 0000 0000 0100 0002 0000 ).4^............
0040 0000 0800 0100 0000 0af3 0100 0400 0000 ................
0060 0000 0000 0000 0000 4000 0000 c088 3800 ........@.....8.
0100 0000 0000 0000 0000 0000 0000 0000 0000 ................
*
0140 0000 0000 5fc4 cfb4 0000 0000 0000 0000 ...._...........
0160 0000 0000 0000 0000 0000 0000 b61f 0000 ................
                                   ^^^^
0200 2000 aa15 ac95 c9c8 a4d2 9883 583e addf  ...........X>..
          ^^^^
0220 3de0 485e b04d 7151 0000 0000 0000 0000 =.H^.MqQ........
0240 0000 0000 0000 0000 0000 0000 0000 0000 ................
*
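The two fields that differ between the dumps are the two halves of the 32-bit inode checksum. The sketch below shows how the value reported by "stat" can be reassembled from the dump. It assumes the ext4 on-disk inode layout (the low 16 bits at byte 0x7c, the extra inode size at 0x80, the high 16 bits at 0x82, all little-endian) and that the dump offsets are octal; the helper names are hypothetical and only the two relevant rows are copied in.

```python
import struct

# Rows copied from the "id <917524>" dumps; keys are the octal row offsets.
ROWS_BEFORE = {
    0o160: "0000 0000 0000 0000 0000 0000 af23 0000",
    0o200: "2000 1cc3 ac95 c9c8 a4d2 9883 583e addf",
}
ROWS_AFTER = {
    0o160: "0000 0000 0000 0000 0000 0000 b61f 0000",
    0o200: "2000 aa15 ac95 c9c8 a4d2 9883 583e addf",
}

# Byte offsets within the on-disk inode (assumed from struct ext4_inode).
CSUM_LO_OFF = 0x7C      # octal 0174, in the row starting at 0160
EXTRA_ISIZE_OFF = 0x80  # octal 0200
CSUM_HI_OFF = 0x82      # octal 0202, in the row starting at 0200


def le16(rows, offset):
    """Read a little-endian 16-bit value at a byte offset in the dump."""
    for base, text in rows.items():
        data = bytes.fromhex(text.replace(" ", ""))
        if base <= offset < base + len(data):
            return struct.unpack_from("<H", data, offset - base)[0]
    raise KeyError(f"offset {offset:#x} not covered by the supplied rows")


def inode_checksum(rows):
    """Combine the high and low halves into the 32-bit checksum."""
    return (le16(rows, CSUM_HI_OFF) << 16) | le16(rows, CSUM_LO_OFF)


if __name__ == "__main__":
    print(f"extra isize: {le16(ROWS_BEFORE, EXTRA_ISIZE_OFF)}")
    print(f"before sif:  {inode_checksum(ROWS_BEFORE):#010x}")
    print(f"after sif:   {inode_checksum(ROWS_AFTER):#010x}")
```

For the first dump this gives 0xc31c23af, matching the "Inode checksum" line in the stat output, and 32 for the extra inode size, matching "Size of extra inode fields". For the dump taken after "sif" it gives 0x15aa1fb6.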

With
