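Overview:
This case is very simple: the Dameng database memory parameters were set unreasonably and allocated too much memory, which eventually exhausted system memory, and the dmserver process was killed by the Linux OOM killer.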
Environment:
OS: Kylin Linux V10 SP1
DB: 8.1.3.26
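Symptom:
The test Dameng database could not be connected to, and the dmserver process was gone.
Analysis:
Check the Dameng log: it contains no record of an instance interruption, exit, or exception.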
dmdba@cjc-db-01:/dm/dmdbms/log$tail -100f dm_OATEST_202406.log
......
2024-06-17 18:07:53.736 [INFO] database P0003615846 T0000000000003615871 checkpoint begin, used_space[421888], free_space[536440832]...
2024-06-17 18:07:58.833 [INFO] database P0003615846 T0000000000003615871 ckpt2_log_adjust: full_status: 160, ptx_reserved: 0
2024-06-17 18:08:00.272 [INFO] database P0003615846 T0000000000003615871 ckpt2_log_adjust: ckpt_lsn(129863537), ckpt_fil(0), ckpt_off(236103680), cur_lsn(129864731), l_next_seq(23764148), g_next_seq(23764148), cur_free(236516864), total_space(536862720), used_space(413184), free_space(536449536), n_ep(1)
2024-06-17 18:08:02.513 [INFO] database P0003615846 T0000000000003615871 checkpoint end, 0 pages flushed, used_space[413184], free_space[536449536].
2024-06-17 18:11:12.883 [INFO] database P0003615846 T0000000000003615916 checkpoint requested by CKPT_INTERVAL, rlog free space[536442368], used space[420352]
......
Check the Dameng core directory: no core file was generated.
Check the shell command history: no related operations were found.
history
Check the operating system log: the system ran out of memory (OOM) and the kernel OOM killer killed dmserver:
Out of memory: Kill process 3615846 (dmserver) score 706 or sacrifice child
The detailed log is as follows:
tail -n xxx /var/log/messages
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.401855] dm_sql_thd invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.401856] dm_sql_thd cpuset=/ mems_allowed=0
......
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.401863] Call Trace:
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.401871] dump_stack+0x66/0x8b
......
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.401968] Mem-Info:
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.401973] active_anon:3390823 inactive_anon:408959 isolated_anon:0#012 active_file:210 inactive_file:603 isolated_file:36#012 unevictable:0 dirty:0 writeback:0 unstable:0#012 slab_reclaimable:11970 slab_unreclaimable:18197#012 mapped:315 shmem:365 pagetables:11200 bounce:0#012 free:33057 free_pcp:1969 free_cma:0
......
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402005] 1796 total pagecache pages
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402007] 476 pages in swap cache
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402008] Swap cache stats: add 9558818, delete 9558297, find 34404375/34807113
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402009] Free swap = 0kB
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402009] Total swap = 2097148kB
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402010] 4194078 pages RAM
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402010] 0 pages HighMem/MovableOnly
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402011] 286657 pages reserved
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402011] 0 pages hwpoisoned
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402012] Tasks state (memory values in pages):
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402012] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402021] [ 743] 0 743 9341 14 98304 557 -1000 systemd-udevd
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402024] [ 988] 0 988 740 0 40960 30 0 mdadm
......
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402085] [ 103036] 631 103036 84105 247 245760 1022 0 redis-server
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402088] [ 169561] 661 169561 102507 5 397312 891 0 dmap
......
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402142] [3615846] 661 3615846 4551575 2996132 30134272 127199 0 dmserver
......
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402188] Out of memory: Kill process 3615846 (dmserver) score 706 or sacrifice child
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402361] Killed process 3615846 (dmserver) total-vm:18206300kB, anon-rss:11984528kB, file-rss:0kB, shmem-rss:0kB
......
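The task table in the OOM report counts memory in pages, while the "Killed process" line reports kB. As a rough cross-check, the page counts can be converted as below. This is only a sketch: the helper name `pages_to_kb` is made up for illustration, and the 4 kB page size is an assumption (`getconf PAGESIZE` reports the real value on the host).

```shell
#!/bin/sh
# Convert a page count from the OOM report into kB.
# The default of 4 kB per page is an assumption; verify with: getconf PAGESIZE
pages_to_kb() {
    pages=$1
    page_kb=${2:-4}
    echo $(( pages * page_kb ))
}

pages_to_kb 2996132   # dmserver rss      -> 11984528 (matches anon-rss in the log)
pages_to_kb 4551575   # dmserver total_vm -> 18206300 (matches total-vm in the log)
pages_to_kb 4194078   # "pages RAM"       -> 16776312
```

Under that assumption, dmserver held about 11.4 GiB of resident memory on a host with roughly 16 GiB of RAM, and swap was already exhausted (Free swap = 0kB).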
The current memory-related kernel parameter is configured as follows:
cat /etc/sysctl.conf |grep vm.overcommit_memory
vm.overcommit_memory=1
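The overcommit_memory parameter sets the kernel's memory overcommit policy. Valid values are 0, 1, and 2.
0: heuristic overcommit (the default). The kernel estimates whether enough memory is available; obviously excessive requests fail and the error is returned to the application, while other requests are allowed.
1: always overcommit. The kernel grants every allocation request regardless of the current memory state, so a shortage only surfaces later, when the pages are actually used, and is resolved by the OOM killer.
2: never overcommit. The kernel refuses an allocation once total committed memory would exceed swap plus overcommit_ratio percent of physical RAM (50 by default).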
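For value 2, the kernel documentation gives the commit limit as swap plus overcommit_ratio percent of RAM (when vm.overcommit_kbytes is not set). A minimal sketch of that arithmetic follows. The function name `commit_limit_kb` is hypothetical, huge pages are ignored, and the sample figures are illustrative round numbers rather than values read from this host.

```shell
#!/bin/sh
# Approximate CommitLimit (kB) under vm.overcommit_memory=2:
#   CommitLimit = swap + RAM * overcommit_ratio / 100
# Simplification: huge pages and vm.overcommit_kbytes are ignored.
commit_limit_kb() {
    ram_kb=$1
    swap_kb=$2
    ratio=${3:-50}      # the kernel default for vm.overcommit_ratio is 50
    echo $(( swap_kb + ram_kb * ratio / 100 ))
}

# Illustrative input: 16 GiB RAM, 2 GiB swap, default ratio
commit_limit_kb 16777216 2097152      # prints 10485760

# On a live system the kernel's own figures can be read with:
#   sysctl vm.overcommit_memory vm.overcommit_ratio
#   grep -E 'CommitLimit|Committed_AS' /proc/meminfo
```

With vm.overcommit_memory=1, as configured here, no such limit applies, so large allocations succeed up front and the shortage is only resolved later by the OOM killer.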
Check the database memory parameters:
dmdba@cjc-db-01:/dm/data/cjc$more dm.ini
......
MAX_OS_MEMORY = 100 #Maximum Percent Of OS Memory
MEMORY_POOL = 500 #Memory Pool Size In Megabyte
MEMORY_TARGET = 15000 #Memory Share Pool Target Size In Megabyte
BUFFER = 6144 #Initial System Buffer Size In Megabytes
......
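One rough way to compare these settings with physical RAM is to add up the MB-valued parameters. This is a sketch only: treating MEMORY_POOL + MEMORY_TARGET + BUFFER as an upper estimate is a simplifying assumption and not Dameng's own accounting, and the helper name `dm_mem_sum_mb` is hypothetical. It reads dm.ini-style text from standard input.

```shell
#!/bin/sh
# Sum selected MB-valued parameters from dm.ini-style text on stdin.
# Assumption: these three parameters give only a rough upper estimate.
dm_mem_sum_mb() {
    awk -F'=' '
        { key = $1; gsub(/[ \t]/, "", key) }
        key == "MEMORY_POOL" || key == "MEMORY_TARGET" || key == "BUFFER" {
            val = $2
            sub(/#.*/, "", val)        # drop the trailing comment
            gsub(/[ \t]/, "", val)     # drop surrounding whitespace
            sum += val
        }
        END { print sum + 0 }
    '
}

dm_mem_sum_mb <<'EOF'
MAX_OS_MEMORY = 100 #Maximum Percent Of OS Memory
MEMORY_POOL = 500 #Memory Pool Size In Megabyte
MEMORY_TARGET = 15000 #Memory Share Pool Target Size In Megabyte
BUFFER = 6144 #Initial System Buffer Size In Megabytes
EOF
# prints 21644
```

21644 MB is well above the roughly 16383 MB of RAM implied by "4194078 pages RAM" in the OOM report (assuming 4 kB pages), and MAX_OS_MEMORY = 100 sets no cap below total OS memory.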
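Solution:
Lower the Dameng memory parameters such as MEMORY_TARGET and BUFFER to reduce memory usage and avoid exhausting system memory. In addition, consider adjusting the Linux OOM-killer policy. Finally, restart the Dameng database.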
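As one example of an OOM-killer policy adjustment, Linux exposes /proc/<pid>/oom_score_adj (range -1000 to 1000; lower values make a process a less likely victim). The sketch below is illustrative only: `set_oom_score_adj` and its optional third argument (an alternate proc root, useful for dry runs) are hypothetical, the value -500 is arbitrary, and lowering the score normally requires root. Shielding dmserver only shifts the kill to another process, so reducing the memory parameters remains the actual fix.

```shell
#!/bin/sh
# Write an OOM score adjustment for a process.
# Usage: set_oom_score_adj PID VALUE [PROC_ROOT]
# PROC_ROOT defaults to /proc; the override exists only for dry runs.
set_oom_score_adj() {
    pid=$1
    adj=$2
    proc_root=${3:-/proc}
    # The kernel accepts values from -1000 to 1000 only.
    if [ "$adj" -lt -1000 ] || [ "$adj" -gt 1000 ]; then
        echo "oom_score_adj must be between -1000 and 1000" >&2
        return 1
    fi
    echo "$adj" > "$proc_root/$pid/oom_score_adj"
}

# Example (as root), making dmserver a less likely victim:
#   set_oom_score_adj "$(pidof dmserver)" -500
```

The setting applies to the running process only and has to be reapplied after each restart of dmserver.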
###chenjuchao 20240619###