Kernel messages on the Linux console: a Linux kernel problem

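This post describes a problem on Linux: a virtual machine whose connection was unstable and whose tasks kept hanging, the khugepaged process in particular. The system log shows several tasks blocked for more than 120 seconds, apparently because of a kernel lock. The author searched for information and found a bug report that may be related to khugepaged. As a temporary workaround to keep the system from freezing, the author turned off defragmentation for khugepaged and for transparent hugepages by writing to the corresponding files under /sys.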


A bug in Linux

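Recently a virtual machine used for testing kept becoming unreachable. Neither ssh nor VNC nor anything else got through.

At the console the machine was up and appeared to be running fine, but the output of ps stopped partway and never finished.

To keep the testers working, I rebooted the cluster first, which fixed it. After that I took my time looking into the cause.

The system log contained the following records: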

Feb 6 14:13:02 localhost kernel: INFO: task khugepaged:28 blocked for more than 120 seconds.

Feb 6 14:13:02 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Feb 6 14:13:02 localhost kernel: khugepaged D ffff88003b06da78 0 28 2 0x00000000

Feb 6 14:13:02 localhost kernel: ffff88003af69c78 0000000000000046 ffffffff81059d12 0000000000000000

Feb 6 14:13:02 localhost kernel: 0000000000016980 0000000000000000 ffff88003b06da00 000000010100c183

Feb 6 14:13:02 localhost kernel: ffff88003aefe5f8 ffff88003af69fd8 0000000000010518 ffff88003aefe5f8

Feb 6 14:13:02 localhost kernel: Call Trace:

Feb 6 14:13:02 localhost kernel: [] ? finish_task_switch+0x42/0xd0

Feb 6 14:13:02 localhost kernel: [] ? lock_timer_base+0x3c/0x70

Feb 6 14:13:02 localhost kernel: [] rwsem_down_failed_common+0x95/0x1d0

Feb 6 14:13:02 localhost kernel: [] ? del_timer_sync+0x22/0x30

Feb 6 14:13:02 localhost kernel: [] rwsem_down_read_failed+0x26/0x30

Feb 6 14:13:02 localhost kernel: [] call_rwsem_down_read_failed+0x14/0x30

Feb 6 14:13:02 localhost kernel: [] ? down_read+0x24/0x30

Feb 6 14:13:02 localhost kernel: [] khugepaged+0x1b2/0x1190

Feb 6 14:13:02 localhost kernel: [] ? autoremove_wake_function+0x0/0x40

Feb 6 14:13:02 localhost kernel: [] ? khugepaged+0x0/0x1190

Feb 6 14:13:02 localhost kernel: [] kthread+0x96/0xa0

Feb 6 14:13:02 localhost kernel: [] child_rip+0xa/0x20

Feb 6 14:13:02 localhost kernel: [] ? kthread+0x0/0xa0

Feb 6 14:13:02 localhost kernel: [] ? child_rip+0x0/0x20

Feb 6 14:13:02 localhost kernel: INFO: task fuse_dfs:1593 blocked for more than 120 seconds.

Feb 6 14:13:02 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Feb 6 14:13:02 localhost kernel: fuse_dfs D ffffffff8110c060 0 1593 1 0x00000080

Feb 6 14:13:02 localhost kernel: ffff88003cd599b8 0000000000000086 00000000ffffffff 00010bb591837fe6

Feb 6 14:13:02 localhost kernel: 0000000000000000 ffff88003b4b9980 0000000000266bba ffffffffaeee4f12

Feb 6 14:13:02 localhost kernel: ffff88003cd550a8 ffff88003cd59fd8 0000000000010518 ffff88003cd550a8

Feb 6 14:13:02 localhost kernel: Call Trace:

Feb 6 14:13:02 localhost kernel: [] ? ktime_get_ts+0xa9/0xe0

Feb 6 14:13:02 localhost kernel: [] ? sync_page+0x0/0x50

Feb 6 14:13:02 localhost kernel: [] io_schedule+0x73/0xc0

Feb 6 14:13:02 localhost kernel: [] sync_page+0x3d/0x50

Feb 6 14:13:02 localhost kernel: [] __wait_on_bit_lock+0x5a/0xc0

Feb 6 14:13:02 localhost kernel: [] __lock_page+0x67/0x70

Feb 6 14:13:02 localhost kernel: [] ? wake_bit_function+0x0/0x50

Feb 6 14:13:02 localhost kernel: [] lock_page+0x30/0x40

Feb 6 14:13:02 localhost kernel: [] migrate_pages+0x59d/0x5d0

Feb 6 14:13:02 localhost kernel: [] ? ____pagevec_lru_add+0x167/0x180

Feb 6 14:13:02 localhost kernel: [] ? compaction_alloc+0x0/0x370

Feb 6 14:13:02 localhost kernel: [] compact_zone+0x4ac/0x5e0

Feb 6 14:13:02 localhost kernel: [] ? get_page_from_freelist+0x15c/0x820

Feb 6 14:13:02 localhost kernel: [] compact_zone_order+0x7e/0xb0

Feb 6 14:13:02 localhost kernel: [] try_to_compact_pages+0x109/0x170

Feb 6 14:13:02 localhost kernel: [] __alloc_pages_nodemask+0x55c/0x810

Feb 6 14:13:02 localhost kernel: [] ? page_add_new_anon_rmap+0xb5/0xd0

Feb 6 14:13:02 localhost kernel: [] alloc_pages_vma+0x84/0x110

Feb 6 14:13:02 localhost kernel: [] ? anon_vma_prepare+0x30/0x160

Feb 6 14:13:02 localhost kernel: [] do_huge_pmd_anonymous_page+0x135/0x360

Feb 6 14:13:02 localhost kernel: [] ? apic_timer_interrupt+0xe/0x20

Feb 6 14:13:02 localhost kernel: [] handle_mm_fault+0x245/0x2b0

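This looks like a kernel-level error. Looking closely, every error follows from the khugepaged process hanging first; its call trace shows it waiting in rwsem_down_read_failed, that is, on a read-write semaphore. Material online also attributes the problem to a kernel lock.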
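To see at a glance which tasks the kernel reported as hung, the "blocked for more than" lines can be pulled out of the log. The following is a minimal sketch, not part of the original investigation: the function name hung_tasks is hypothetical, the default log path /var/log/messages is an assumption, and the pattern assumes the line format shown in the log above.

```shell
# List each task reported as hung (name and PID), with a count of reports.
# Assumes lines of the form:
#   ... kernel: INFO: task NAME:PID blocked for more than 120 seconds.
hung_tasks() {
    # $1: log file to scan; the default path is an assumption
    grep 'blocked for more than' "${1:-/var/log/messages}" \
        | sed -E 's/.*INFO: task ([^:]+):([0-9]+) blocked.*/\1 \2/' \
        | sort | uniq -c
}
```

Called as `hung_tasks /var/log/messages`, it prints one line per task, which makes it easier to check whether khugepaged is always among the blocked tasks.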

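The system was frozen without actually being down. I am not familiar with the kernel, so I started by searching online.

I found a bug report about khugepaged: http://bugs.centos.org/view.php?id=5716

I could not think of anything better for now, so I tried turning off defragmentation for khugepaged and for transparent hugepages: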

echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag

echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
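After writing these values it is worth reading them back. The sketch below is an illustration under assumptions rather than a verified procedure for this kernel: the helper name active_value is hypothetical, and it assumes the files mark the active option with square brackets (for example "always madvise [never]"), falling back to the raw contents when no brackets are present.

```shell
# Print the option currently in effect for a transparent hugepage sysfs file.
active_value() {
    # $1: path to a sysfs file such as .../redhat_transparent_hugepage/defrag
    if grep -q '\[' "$1"; then
        # Keep only the bracketed (active) option
        sed -E 's/.*\[([^]]+)\].*/\1/' "$1"
    else
        # No brackets: show the raw contents unchanged
        cat "$1"
    fi
}

thp=/sys/kernel/mm/redhat_transparent_hugepage
for f in "$thp/defrag" "$thp/khugepaged/defrag"; do
    if [ -r "$f" ]; then
        echo "$f: $(active_value "$f")"
    fi
done
```

Values written under /sys this way apply only to the running kernel, so they have to be written again after a reboot.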
