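1. Problem description:
My service was suddenly killed, and nobody had touched it.
2. Troubleshooting:
- Check the application log. Nothing abnormal shows up there.
- Check the JVM out-of-memory log. The startup parameters enable out-of-memory logging (xxx), but no such log was written.
- Check the Linux system log for anomalies. Log path: /var/log/messages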
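The last check can be scripted. The following is a minimal sketch, not a definitive tool: it assumes the log is at /var/log/messages and readable by the current user (usually root), and it matches only the message wording seen in this log from a 3.10 kernel. The names `find_oom_events` and `scan_log` are hypothetical helpers introduced for illustration.

```python
import re

# Wording of OOM-killer messages as they appear in this log (3.10 kernel).
OOM_PATTERN = re.compile(
    r"invoked oom-killer|Out of memory: Kill process|Killed process \d+"
)

def find_oom_events(lines):
    """Return the lines that report an OOM-killer invocation or a kill."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]

def scan_log(path="/var/log/messages"):
    """Scan a syslog file; the default path is the one used in this post."""
    with open(path, errors="replace") as fh:
        return find_oom_events(fh)

# Three lines copied from the log in this post, used as a self-contained sample.
SAMPLE = [
    "Jan 20 14:46:04 uip-uat-app-4 kernel: java invoked oom-killer: "
    "gfp_mask=0x201da, order=0, oom_score_adj=0\n",
    "Jan 20 14:46:04 uip-uat-app-4 kernel: Mem-Info:\n",
    "Jan 20 14:46:04 uip-uat-app-4 kernel: Killed process 8336 (java), "
    "UID 1101, total-vm:8673788kB, anon-rss:2643300kB, file-rss:0kB, "
    "shmem-rss:0kB\n",
]

hits = find_oom_events(SAMPLE)
for hit in hits:
    print(hit)
```

On a real host, `scan_log()` would replace the sample list.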
Contents of the messages log:
Jan 20 14:46:04 uip-uat-app-4 kernel: java invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Jan 20 14:46:04 uip-uat-app-4 kernel: java cpuset=/ mems_allowed=0
Jan 20 14:46:04 uip-uat-app-4 kernel: CPU: 3 PID: 21826 Comm: java Kdump: loaded Not tainted 3.10.0-1160.119.1.el7.x86_64 #1
Jan 20 14:46:04 uip-uat-app-4 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
Jan 20 14:46:04 uip-uat-app-4 kernel: Call Trace:
Jan 20 14:46:04 uip-uat-app-4 kernel: [] dump_stack+0x19/0x1f
Jan 20 14:46:04 uip-uat-app-4 kernel: [] dump_header+0x90/0x22d
Jan 20 14:46:04 uip-uat-app-4 kernel: [] ? ktime_get_ts64+0x52/0xf0
Jan 20 14:46:04 uip-uat-app-4 kernel: [] ? delayacct_end+0x8f/0xc0
Jan 20 14:46:04 uip-uat-app-4 kernel: [] oom_kill_process+0x2d5/0x4a0
Jan 20 14:46:04 uip-uat-app-4 kernel: [] ? oom_unkillable_task+0xcd/0x120
Jan 20 14:46:04 uip-uat-app-4 kernel: [] out_of_memory+0x31a/0x500
Jan 20 14:46:04 uip-uat-app-4 kernel: [] __alloc_pages_nodemask+0xae4/0xbf0
Jan 20 14:46:04 uip-uat-app-4 kernel: [] alloc_pages_current+0x98/0x110
Jan 20 14:46:04 uip-uat-app-4 kernel: [] __page_cache_alloc+0x97/0xb0
Jan 20 14:46:04 uip-uat-app-4 kernel: [] filemap_fault+0x270/0x420
Jan 20 14:46:04 uip-uat-app-4 kernel: [] __xfs_filemap_fault+0x7e/0x1d0 [xfs]
Jan 20 14:46:04 uip-uat-app-4 kernel: [] xfs_filemap_fault+0x2c/0x40 [xfs]
Jan 20 14:46:04 uip-uat-app-4 kernel: [] __do_fault.isra.61+0x8a/0x100
Jan 20 14:46:04 uip-uat-app-4 kernel: [] ? hrtimer_get_res+0x50/0x50
Jan 20 14:46:04 uip-uat-app-4 kernel: [] do_read_fault.isra.63+0x4c/0x1b0
Jan 20 14:46:04 uip-uat-app-4 kernel: [] handle_mm_fault+0xa33/0x1190
Jan 20 14:46:04 uip-uat-app-4 kernel: [] __do_page_fault+0x213/0x510
Jan 20 14:46:04 uip-uat-app-4 kernel: [] do_page_fault+0x35/0x90
Jan 20 14:46:04 uip-uat-app-4 kernel: [] page_fault+0x28/0x30
Jan 20 14:46:04 uip-uat-app-4 kernel: Mem-Info:
Jan 20 14:46:04 uip-uat-app-4 kernel: active_anon:1838776 inactive_anon:335123 isolated_anon:9#012 active_file:218 inactive_file:336 isolated_file:0#012 unevictable:0 dirty:2 writeback:47 unstable:0#012 slab_reclaimable:27150 slab_unreclaimable:23069#012 mapped:6318 shmem:7667 pagetables:8317 bounce:0#012 free:33840 free_pcp:99 free_cma:0
Jan 20 14:46:04 uip-uat-app-4 kernel: Node 0 DMA free:15888kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 20 14:46:04 uip-uat-app-4 kernel: lowmem_reserve[]: 0 2830 15857 15857
Jan 20 14:46:04 uip-uat-app-4 kernel: Node 0 DMA32 free:64012kB min:12052kB low:15064kB high:18076kB active_anon:1315600kB inactive_anon:465924kB active_file:292kB inactive_file:816kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129152kB managed:2898720kB mlocked:0kB dirty:0kB writeback:24kB mapped:6612kB shmem:7608kB slab_reclaimable:19032kB slab_unreclaimable:15408kB kernel_stack:7808kB pagetables:5304kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:11266 all_unreclaimable? yes
Jan 20 14:46:04 uip-uat-app-4 kernel: lowmem_reserve[]: 0 0 13026 13026
Jan 20 14:46:04 uip-uat-app-4 kernel: Node 0 Normal free:55460kB min:55464kB low:69328kB high:83196kB active_anon:6039504kB inactive_anon:874568kB active_file:580kB inactive_file:528kB unevictable:0kB isolated(anon):36kB isolated(file):0kB present:13631488kB managed:13342488kB mlocked:0kB dirty:8kB writeback:164kB mapped:18660kB shmem:23060kB slab_reclaimable:89568kB slab_unreclaimable:76852kB kernel_stack:23904kB pagetables:27964kB unstable:0kB bounce:0kB free_pcp:396kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:15437 all_unreclaimable? yes
Jan 20 14:46:04 uip-uat-app-4 kernel: lowmem_reserve[]: 0 0 0 0
Jan 20 14:46:04 uip-uat-app-4 kernel: Node 0 DMA: 0*4kB 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (EM) = 15888kB
Jan 20 14:46:04 uip-uat-app-4 kernel: Node 0 DMA32: 8970*4kB (EM) 2523*8kB (UEM) 567*16kB (EM) 3*32kB (UE) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 65296kB
Jan 20 14:46:04 uip-uat-app-4 kernel: Node 0 Normal: 13675*4kB (UM) 197*8kB (UEM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 56276kB
Jan 20 14:46:04 uip-uat-app-4 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jan 20 14:46:04 uip-uat-app-4 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan 20 14:46:04 uip-uat-app-4 kernel: 19814 total pagecache pages
Jan 20 14:46:04 uip-uat-app-4 kernel: 11522 pages in swap cache
Jan 20 14:46:04 uip-uat-app-4 kernel: Swap cache stats: add 2345322, delete 2332288, find 3860069/4023116
Jan 20 14:46:04 uip-uat-app-4 kernel: Free swap = 0kB
Jan 20 14:46:04 uip-uat-app-4 kernel: Total swap = 1048572kB
Jan 20 14:46:04 uip-uat-app-4 kernel: 4194157 pages RAM
Jan 20 14:46:04 uip-uat-app-4 kernel: 0 pages HighMem/MovableOnly
Jan 20 14:46:04 uip-uat-app-4 kernel: 129879 pages reserved
Jan 20 14:46:04 uip-uat-app-4 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 511] 0 511 25239 7779 56 51 0 systemd-journal
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 548] 0 548 11272 2 25 283 -1000 systemd-udevd
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 549] 0 549 47582 0 29 97 0 lvmetad
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 743] 0 743 13877 26 27 88 -1000 auditd
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 745] 0 745 21138 25 12 28 0 audispd
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 747] 0 747 13899 16 32 72 0 sedispatch
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 770] 0 770 101813 530 54 116 0 accounts-daemon
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 776] 999 776 137315 112 65 1312 0 polkitd
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 780] 0 780 4221 5 13 44 0 alsactl
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 783] 172 783 49690 0 32 92 0 rtkit-daemon
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 784] 81 784 15113 129 33 75 -900 dbus-daemon
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 787] 32 787 17305 15 37 120 0 rpcbind
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 794] 0 794 1096 1 7 33 0 acpid
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 798] 0 798 24913 0 42 402 0 VGAuthService
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 799] 0 799 5419 43 15 41 0 irqbalance
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 800] 0 800 115681 145 92 283 0 udisksd
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 802] 0 802 3304 2 11 49 0 rngd
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 805] 0 805 50325 11 38 121 0 gssproxy
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 806] 70 806 15561 36 35 80 0 avahi-daemon
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 807] 0 807 80012 85 69 280 0 vmtoolsd
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 808] 0 808 6108 5 16 136 0 smartd
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 809] 0 809 55930 1 62 466 0 abrtd
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 811] 0 811 55307 10 63 345 0 abrt-watch-log
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 813] 0 813 55307 1 61 354 0 abrt-watch-log
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 816] 996 816 2144 8 9 29 0 lsmd
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 817] 0 817 6594 44 18 41 0 systemd-logind
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 820] 998 820 29438 52 28 54 0 chronyd
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 830] 70 830 15528 1 34 98 0 avahi-daemon
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 831] 0 831 124119 182 96 452 0 NetworkManager
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 838] 0 838 1652 1 8 37 0 mcelog
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 842] 0 842 28954 170 12 21 0 ksmtuned
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 1112] 0 1112 68322 1976 69 162 0 rsyslogd
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 1116] 0 1116 49482 0 52 288 0 cupsd
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 1120] 0 1120 143453 134 99 2643 0 tuned
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 1124] 0 1124 28924 24 13 17 0 rhsmcertd
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 1131] 0 1131 27522 2 12 32 0 agetty
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 1132] 0 1132 6476 6 19 47 0 atd
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 1133] 0 1133 27522 2 10 31 0 agetty
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 1136] 0 1136 31570 26 21 131 0 crond
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 1180] 0 1180 7705 23 19 86 -1000 sshd
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 1209] 0 1209 26991 32 11 6 0 rhnsd
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 1420] 0 1420 22927 21 44 240 0 master
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 1433] 89 1433 22997 20 45 247 0 qmgr
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 1839] 0 1839 104986 159 60 178 0 packagekitd
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 1914] 1101 1914 2165947 147640 590 96974 0 java
Jan 20 14:46:04 uip-uat-app-4 kernel: [26643] 1101 26643 214195 13047 85 1403 0 filebeat
Jan 20 14:46:04 uip-uat-app-4 kernel: [26679] 1101 26679 214395 12273 87 1981 0 filebeat
Jan 20 14:46:04 uip-uat-app-4 kernel: [26782] 1101 26782 393304 2786 86 1821 0 metricbeat
Jan 20 14:46:04 uip-uat-app-4 kernel: [31886] 1101 31886 304621 2337 42 1662 0 agentdaemon
Jan 20 14:46:04 uip-uat-app-4 kernel: [32068] 1101 32068 579918 16662 277 5447 0 cdc
Jan 20 14:46:04 uip-uat-app-4 kernel: [20995] 1101 20995 2731097 336832 1370 144123 0 java
Jan 20 14:46:04 uip-uat-app-4 kernel: [40581] 1101 40581 2746534 650356 1623 0 0 java
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 8336] 1101 8336 2168447 660825 1582 0 0 java
Jan 20 14:46:04 uip-uat-app-4 kernel: [34287] 1101 34287 1770449 278297 760 0 0 java
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 7034] 0 7034 25449 215 53 0 0 sshd
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 7037] 1101 7037 25970 216 54 0 0 sshd
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 7038] 1101 7038 5937 95 17 0 0 sftp-server
Jan 20 14:46:04 uip-uat-app-4 kernel: [21801] 89 21801 22953 252 43 0 0 pickup
Jan 20 14:46:04 uip-uat-app-4 kernel: [28689] 0 28689 26987 18 12 0 0 sleep
Jan 20 14:46:04 uip-uat-app-4 kernel: Out of memory: Kill process 7127 (java) score 153 or sacrifice child
Jan 20 14:46:04 uip-uat-app-4 kernel: Killed process 8336 (java), UID 1101, total-vm:8673788kB, anon-rss:2643300kB, file-rss:0kB, shmem-rss:0kB
2.1 Log analysis
The following is Kimi's analysis:
5. Call trace
Jan 20 14:46:04 uip-uat-app-4 kernel: Call Trace:
Jan 20 14:46:04 uip-uat-app-4 kernel: [] dump_stack+0x19/0x1f
Jan 20 14:46:04 uip-uat-app-4 kernel: [] dump_header+0x90/0x22d
Jan 20 14:46:04 uip-uat-app-4 kernel: [] ? ktime_get_ts64+0x52/0xf0
Jan 20 14:46:04 uip-uat-app-4 kernel: [] ? delayacct_end+0x8f/0xc0
Jan 20 14:46:04 uip-uat-app-4 kernel: [] oom_kill_process+0x2d5/0x4a0
Jan 20 14:46:04 uip-uat-app-4 kernel: [] ? oom_unkillable_task+0xcd/0x120
Jan 20 14:46:04 uip-uat-app-4 kernel: [] out_of_memory+0x31a/0x500
Jan 20 14:46:04 uip-uat-app-4 kernel: [] __alloc_pages_nodemask+0xae4/0xbf0
Jan 20 14:46:04 uip-uat-app-4 kernel: [] alloc_pages_current+0x98/0x110
Jan 20 14:46:04 uip-uat-app-4 kernel: [] __page_cache_alloc+0x97/0xb0
Jan 20 14:46:04 uip-uat-app-4 kernel: [] filemap_fault+0x270/0x420
Jan 20 14:46:04 uip-uat-app-4 kernel: [] __xfs_filemap_fault+0x7e/0x1d0 [xfs]
Jan 20 14:46:04 uip-uat-app-4 kernel: [] xfs_filemap_fault+0x2c/0x40 [xfs]
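Meaning: the call trace records the kernel code path that triggered the OOM killer. The most recent call is at the top, so the sequence of events reads from the bottom up:
- dump_stack and dump_header: the kernel prints the call trace and the OOM report header.
- oom_kill_process: the kernel kills a process to free memory.
- out_of_memory: the kernel's OOM handling logic.
- __alloc_pages_nodemask and alloc_pages_current: the page allocator could not satisfy the request.
- filemap_fault and xfs_filemap_fault: the request was a page-cache allocation made for a fault on a file-backed mapping on XFS. This identifies the allocation that happened to fail; it does not by itself show that the file system consumed the memory.
6. Memory info
7. Per-zone memory info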
Jan 20 14:46:04 uip-uat-app-4 kernel: Node 0 DMA free:15888kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 20 14:46:04 uip-uat-app-4 kernel: lowmem_reserve[]: 0 2830 15857 15857
Jan 20 14:46:04 uip-uat-app-4 kernel: Node 0 DMA32 free:64012kB min:12052kB low:15064kB high:18076kB active_anon:1315600kB inactive_anon:465924kB active_file:292kB inactive_file:816kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129152kB managed:2898720kB mlocked:0kB dirty:0kB writeback:24kB mapped:6612kB shmem:7608kB slab_reclaimable:19032kB slab_unreclaimable:15408kB kernel_stack:7808kB pagetables:5304kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:11266 all_unreclaimable? yes
Jan 20 14:46:04 uip-uat-app-4 kernel: lowmem_reserve[]: 0 0 13026 13026
Jan 20 14:46:04 uip-uat-app-4 kernel: Node 0 Normal free:55460kB min:55464kB low:69328kB high:83196kB active_anon:6039504kB inactive_anon:874568kB active_file:580kB inactive_file:528kB unevictable:0kB isolated(anon):36kB isolated(file):0kB present:13631488kB managed:13342488kB mlocked:0kB dirty:8kB writeback:164kB mapped:18660kB shmem:23060kB slab_reclaimable:89568kB slab_unreclaimable:76852kB kernel_stack:23904kB pagetables:27964kB unstable:0kB bounce:0kB free_pcp:396kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:15437 all_unreclaimable? yes
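Meaning: the kernel prints details for each memory zone of node 0 (DMA, DMA32, Normal):
- free: free memory in the zone.
- min, low, high: the zone's watermarks, which the kernel uses to decide when to reclaim memory. Here the Normal zone has free:55460kB against min:55464kB, so it is already below its min watermark.
- active_anon and inactive_anon: the active and inactive parts of anonymous memory, that is, memory not backed by a file, such as a process heap.
- managed: the amount of memory in the zone that the page allocator manages.
- all_unreclaimable? yes: the kernel has scanned the zone without freeing enough pages and now treats it as unreclaimable. All three zones report this, which confirms severe memory pressure.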
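The watermark comparison can be checked directly from the log. The sketch below is a simplified approximation of the kernel's watermark test (the real check applies further adjustments): a zone is treated as unable to serve a request when free memory is at or below min plus that zone's lowmem_reserve entry for the requesting zone class. It assumes 4 kB pages, assumes the failing allocation was allowed to use the Normal zone (index 2), and uses shortened copies of the log lines. `parse_zone` and `is_blocked` are hypothetical names.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, as on x86_64

ZONE_RE = re.compile(r"Node \d+ (\w+) free:(\d+)kB min:(\d+)kB")
RESERVE_RE = re.compile(r"lowmem_reserve\[\]: ([\d ]+)")

def parse_zone(zone_line, reserve_line):
    """Extract free/min (kB) and the lowmem_reserve array (pages) for a zone."""
    zone, free_kb, min_kb = ZONE_RE.search(zone_line).groups()
    reserve = [int(n) for n in RESERVE_RE.search(reserve_line).group(1).split()]
    return {"zone": zone, "free_kb": int(free_kb),
            "min_kb": int(min_kb), "reserve_pages": reserve}

def is_blocked(zone, class_idx=2):
    """Simplified check: free <= min + lowmem_reserve[class_idx]."""
    threshold_kb = zone["min_kb"] + zone["reserve_pages"][class_idx] * PAGE_KB
    return zone["free_kb"] <= threshold_kb

# Shortened copies of the zone lines and the lowmem_reserve line after each.
SAMPLE = [
    ("Node 0 DMA free:15888kB min:64kB low:80kB high:96kB",
     "lowmem_reserve[]: 0 2830 15857 15857"),
    ("Node 0 DMA32 free:64012kB min:12052kB low:15064kB high:18076kB",
     "lowmem_reserve[]: 0 0 13026 13026"),
    ("Node 0 Normal free:55460kB min:55464kB low:69328kB high:83196kB",
     "lowmem_reserve[]: 0 0 0 0"),
]

zones = [parse_zone(z, r) for z, r in SAMPLE]
blocked = {z["zone"]: is_blocked(z) for z in zones}
print(blocked)
```

Under these assumptions all three zones come out as blocked. DMA and DMA32 look healthy when compared against min alone, but their lowmem_reserve entries push the effective threshold above their free memory.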
8. Free memory by block size
Jan 20 14:46:04 uip-uat-app-4 kernel: Node 0 DMA: 0*4kB 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (EM) = 15888kB
Jan 20 14:46:04 uip-uat-app-4 kernel: Node 0 DMA32: 8970*4kB (EM) 2523*8kB (UEM) 567*16kB (EM) 3*32kB (UE) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 65296kB
Jan 20 14:46:04 uip-uat-app-4 kernel: Node 0 Normal: 13675*4kB (UM) 197*8kB (UEM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 56276kB
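Meaning: for each zone, the kernel prints how many free blocks of each size the buddy allocator holds, written as count*size, followed by the total. The letters in parentheses are the migrate types of the blocks on that free list: U = unmovable, E = reclaimable, M = movable. They do not mean "unused" or "used"; every block listed here is free.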
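The count*size reading can be verified by recomputing the total. A minimal sketch, assuming each entry has the kernel's `count*sizekB` form; `buddy_total_kb` is a hypothetical helper name.

```python
import re

ENTRY_RE = re.compile(r"(\d+)\*(\d+)kB")   # one "count*sizekB" entry
TOTAL_RE = re.compile(r"= (\d+)kB")        # the total the kernel reports

def buddy_total_kb(line):
    """Return (computed_kb, reported_kb) for one free-block line."""
    computed = sum(int(count) * int(size)
                   for count, size in ENTRY_RE.findall(line))
    reported = int(TOTAL_RE.search(line).group(1))
    return computed, reported

# The Normal zone line from the log, in count*size notation.
NORMAL = ("Node 0 Normal: 13675*4kB (UM) 197*8kB (UEM) 0*16kB 0*32kB 0*64kB "
          "0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 56276kB")

computed, reported = buddy_total_kb(NORMAL)
print(computed, reported)  # 56276 56276
```

The computed sum equals the reported total, and it shows that all free memory left in the Normal zone sits in 4 kB and 8 kB blocks.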
9. Huge pages, page cache, and swap
Jan 20 14:46:04 uip-uat-app-4 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jan 20 14:46:04 uip-uat-app-4 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan 20 14:46:04 uip-uat-app-4 kernel: 19814 total pagecache pages
Jan 20 14:46:04 uip-uat-app-4 kernel: 11522 pages in swap cache
Jan 20 14:46:04 uip-uat-app-4 kernel: Swap cache stats: add 2345322, delete 2332288, find 3860069/4023116
Jan 20 14:46:04 uip-uat-app-4 kernel: Free swap = 0kB
Jan 20 14:46:04 uip-uat-app-4 kernel: Total swap = 1048572kB
Meaning:
- hugepages_total=0: no huge pages are configured.
- Free swap = 0kB: swap space is completely used up.
- Total swap = 1048572kB: total swap space is about 1 GB.
11. Process list
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 511] 0 511 25239 7779 56 51 0 systemd-journal
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 1914] 1101 1914 2165947 147640 590 96974 0 java
Jan 20 14:46:04 uip-uat-app-4 kernel: [ 8336] 1101 8336 2168447 660825 1582 0 0 java
Jan 20 14:46:04 uip-uat-app-4 kernel: [40581] 1101 40581 2746534 650356 1623 0 0 java
Jan 20 14:46:04 uip-uat-app-4 kernel: Out of memory: Kill process 7127 (java) score 153 or sacrifice child
Jan 20 14:46:04 uip-uat-app-4 kernel: Killed process 8336 (java), UID 1101, total-vm:8673788kB,
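Meaning:
The kernel lists the memory usage of each process in the system.
- total_vm: the process's virtual memory size, in pages (4 kB each on x86_64), not in kB.
- rss: the process's resident set size, also in pages.
- oom_score_adj: the process's OOM score adjustment.
- Out of memory: Kill process 7127 (java) score 153 or sacrifice child: the kernel selects task 7127 (java), with an OOM score of 153, as the victim.
- Killed process 8336 (java): the kernel finally kills process 8336 (java) and frees its memory. PID 7127 does not appear in the process list, so it is probably a thread of process 8336 rather than a separate process, although the log alone cannot confirm this.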
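Because total_vm and rss are page counts, converting them to kB makes the table easier to compare with the kill message. A minimal sketch, assuming 4 kB pages and the column order from the table header ([ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name); `parse_row` is a hypothetical helper name, and the sample rows are the five java entries from the full log.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, as on x86_64

ROW_RE = re.compile(
    r"\[\s*(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(\S+)"
)

def parse_row(line):
    """Parse one process-table row; return None for non-matching lines."""
    m = ROW_RE.search(line)
    if m is None:
        return None
    pid, uid, tgid, total_vm, rss, nr_ptes, swapents, adj, name = m.groups()
    return {
        "pid": int(pid),
        "name": name,
        "total_vm_kb": int(total_vm) * PAGE_KB,
        "rss_kb": int(rss) * PAGE_KB,
        "swap_kb": int(swapents) * PAGE_KB,
        "oom_score_adj": int(adj),
    }

SAMPLE = [
    "kernel: [ 1914] 1101 1914 2165947 147640 590 96974 0 java",
    "kernel: [20995] 1101 20995 2731097 336832 1370 144123 0 java",
    "kernel: [40581] 1101 40581 2746534 650356 1623 0 0 java",
    "kernel: [ 8336] 1101 8336 2168447 660825 1582 0 0 java",
    "kernel: [34287] 1101 34287 1770449 278297 760 0 0 java",
]

rows = [parse_row(line) for line in SAMPLE]
by_pid = {row["pid"]: row for row in rows}
java_rss_kb = sum(row["rss_kb"] for row in rows if row["name"] == "java")

print(by_pid[8336]["total_vm_kb"], by_pid[8336]["rss_kb"])  # 8673788 2643300
print(java_rss_kb)                                          # 8295800
```

For process 8336 the converted values equal total-vm:8673788kB and anon-rss:2643300kB from the kill message, which confirms the page-based reading. The five java processes together hold about 8.3 million kB resident.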
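Summary
The log shows that the OOM killer was triggered for the following reasons:
- Out of memory: free memory was very low (the Normal zone had only about 55 MB free, below its min watermark), and swap space was exhausted.
- Java processes were using too much memory: five Java processes were running with large total_vm and rss values. Process 8336, for example, had a total_vm of about 8.7 GB and an rss of about 2.6 GB (660825 pages of 4 kB, matching anon-rss:2643300kB in the kill message), not 660 MB.
- Memory reclaim failed: the kernel tried to reclaim memory, but every zone was reported as unreclaimable (all_unreclaimable? yes).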
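Recommendations
- Tune the Java memory configuration: review the heap settings (-Xmx, -Xms) and off-heap memory settings of each Java application so that the combined footprint of all JVMs on the host fits in physical memory.
- Add swap space: more swap does not fix the memory shortage itself, but it gives the system more buffer.
- Monitor for and investigate memory leaks: use tools such as jmap (heap histograms and dumps) and jstack (thread dumps) to analyze the Java processes and check for leaks.
- Adjust OOM killer behavior: write a lower value to /proc/<pid>/oom_score_adj for the Java process to reduce the chance that it is killed. The minimum, -1000, exempts the process entirely, in which case the kernel has to kill something else.
- Add physical memory: if possible, give the server more RAM.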
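Before changing oom_score_adj it helps to see the current values. The sketch below only reads them; it assumes a Linux host with /proc mounted, and `read_oom_values` is a hypothetical helper name. Lowering the value for another process normally requires root, so that step is left out.

```python
from pathlib import Path

def read_oom_values(pid="self", proc_root="/proc"):
    """Read oom_score and oom_score_adj for a process.

    Returns a dict of ints, or None if the files are missing or
    unreadable (for example on a non-Linux system).
    """
    base = Path(proc_root) / str(pid)
    values = {}
    for name in ("oom_score", "oom_score_adj"):
        try:
            values[name] = int((base / name).read_text().strip())
        except (OSError, ValueError):
            return None
    return values

current = read_oom_values()  # "self" refers to this Python process
print(current)
```

Passing the PID of a Java process instead of "self" shows how likely the kernel currently is to pick it; a higher oom_score means a more likely victim.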