Dameng (DM) database killed unexpectedly on Kylin Linux


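Summary:

This case is a simple one: the Dameng memory parameters were set too high, the instance allocated more memory than the host could provide, system memory was exhausted, and the kernel OOM killer terminated the dmserver process.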

Environment:

OS: Kylin Linux V10 SP1
DB: DM 8.1.3.26

Symptom:

The test Dameng database could not be reached, and the dmserver process was gone.

Analysis:

The Dameng log shows no instance interruption, shutdown, or error messages.

dmdba@cjc-db-01:/dm/dmdbms/log$tail -100f dm_OATEST_202406.log
......

2024-06-17 18:07:53.736 [INFO] database P0003615846 T0000000000003615871  checkpoint begin, used_space[421888], free_space[536440832]...
2024-06-17 18:07:58.833 [INFO] database P0003615846 T0000000000003615871  ckpt2_log_adjust: full_status: 160, ptx_reserved: 0
2024-06-17 18:08:00.272 [INFO] database P0003615846 T0000000000003615871  ckpt2_log_adjust: ckpt_lsn(129863537), ckpt_fil(0), ckpt_off(236103680), cur_lsn(129864731), l_next_seq(23764148), g_next_seq(23764148), cur_free(236516864), total_space(536862720), used_space(413184), free_space(536449536), n_ep(1)
2024-06-17 18:08:02.513 [INFO] database P0003615846 T0000000000003615871  checkpoint end, 0 pages flushed, used_space[413184], free_space[536449536].
2024-06-17 18:11:12.883 [INFO] database P0003615846 T0000000000003615916  checkpoint requested by CKPT_INTERVAL, rlog free space[536442368], used space[420352]
......

The Dameng core directory contains no core file.
The shell history shows no related commands.

history

The operating system log shows that the system ran out of memory and the kernel OOM killer killed dmserver:

Out of memory: Kill process 3615846 (dmserver) score 706 or sacrifice child

The detailed log is as follows:

tail -n xxx /var/log/messages
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.401855] dm_sql_thd invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.401856] dm_sql_thd cpuset=/ mems_allowed=0
......
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.401863] Call Trace:
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.401871]  dump_stack+0x66/0x8b
......
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.401968] Mem-Info:
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.401973] active_anon:3390823 inactive_anon:408959 isolated_anon:0#012 active_file:210 inactive_file:603 isolated_file:36#012 unevictable:0 dirty:0 writeback:0 unstable:0#012 slab_reclaimable:11970 slab_unreclaimable:18197#012 mapped:315 shmem:365 pagetables:11200 bounce:0#012 free:33057 free_pcp:1969 free_cma:0
......
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402005] 1796 total pagecache pages
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402007] 476 pages in swap cache
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402008] Swap cache stats: add 9558818, delete 9558297, find 34404375/34807113
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402009] Free swap  = 0kB
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402009] Total swap = 2097148kB
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402010] 4194078 pages RAM
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402010] 0 pages HighMem/MovableOnly
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402011] 286657 pages reserved
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402011] 0 pages hwpoisoned
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402012] Tasks state (memory values in pages):
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402012] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402021] [    743]     0   743     9341       14    98304      557         -1000 systemd-udevd
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402024] [    988]     0   988      740        0    40960       30             0 mdadm
......
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402085] [ 103036]   631 103036    84105      247   245760     1022             0 redis-server
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402088] [ 169561]   661 169561   102507        5   397312      891             0 dmap
......
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402142] [3615846]   661 3615846  4551575  2996132 30134272   127199             0 dmserver
......
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402188] Out of memory: Kill process 3615846 (dmserver) score 706 or sacrifice child
Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402361] Killed process 3615846 (dmserver) total-vm:18206300kB, anon-rss:11984528kB, file-rss:0kB, shmem-rss:0kB
......
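
The task table in the log reports memory in pages, while the final "Killed process" line reports kB. The conversion can be sketched in Python as below. This is only an illustration: the function names are made up here, and the 4 KiB page size is an assumption, although it is consistent with this log (an rss of 2996132 pages corresponds to anon-rss:11984528kB).

```python
import re

PAGE_KB = 4  # assumption: 4 KiB pages, consistent with rss vs anon-rss in the log

# Columns of the "Tasks state" table:
# [pid] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
TASK_RE = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)\s+"
    r"(?P<rss>\d+)\s+(?P<pgtables_bytes>\d+)\s+(?P<swapents>\d+)\s+"
    r"(?P<oom_score_adj>-?\d+)\s+(?P<name>\S+)\s*$"
)


def parse_task_line(line):
    """Return the per-process fields of one OOM task-table line, or None."""
    m = TASK_RE.search(line)
    if m is None:
        return None
    fields = {k: int(v) for k, v in m.groupdict().items() if k != "name"}
    fields["name"] = m.group("name")
    return fields


def pages_to_kb(pages, page_kb=PAGE_KB):
    """Convert a page count to kB."""
    return pages * page_kb


line = ("Jun 17 18:11:22 cjc-db-01 kernel: [42256179.402142] [3615846]   661 3615846  "
        "4551575  2996132 30134272   127199             0 dmserver")
task = parse_task_line(line)
ram_kb = pages_to_kb(4194078)        # from "4194078 pages RAM"
rss_kb = pages_to_kb(task["rss"])    # resident memory of dmserver
print(task["name"], rss_kb, "kB resident,", round(100 * rss_kb / ram_kb, 1), "% of RAM")
```

For the dmserver line this gives 11984528 kB resident, which matches the anon-rss value in the "Killed process" line and is roughly 71% of the reported RAM. With "Free swap  = 0kB" there was also no swap left to absorb further growth.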

The current memory-related kernel setting is:

cat /etc/sysctl.conf |grep vm.overcommit_memory
vm.overcommit_memory=1

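The overcommit_memory parameter selects the kernel's memory overcommit policy and takes the value 0, 1, or 2.

0: heuristic overcommit (the default). The kernel refuses only obviously excessive requests and returns an error to the application in that case; other requests are granted.

1: always overcommit. The kernel grants every request regardless of the current memory state.
2: strict accounting. The kernel refuses requests once the total committed memory would exceed swap plus overcommit_ratio percent of physical memory.

With the value 1 configured here, allocation requests never fail up front, so a shortage only surfaces later, when the pages are actually used, as an OOM kill.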
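
To make the strict mode (value 2) concrete, the following Python sketch estimates the commit limit the kernel would enforce. It follows the formula in the kernel documentation (swap plus overcommit_ratio percent of RAM, excluding huge pages). The inputs are assumptions: overcommit_ratio is taken at its default of 50, and usable RAM is approximated from the OOM log as "pages RAM" minus "pages reserved" at 4 KiB per page. The function name is made up for illustration.

```python
def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=50, hugetlb_kb=0):
    """Approximate CommitLimit (kB) that applies when vm.overcommit_memory=2."""
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb


# Values approximated from the OOM log (assumption: 4 KiB pages).
mem_total_kb = (4194078 - 286657) * 4   # pages RAM minus pages reserved
swap_total_kb = 2097148                 # "Total swap = 2097148kB"

limit_kb = commit_limit_kb(mem_total_kb, swap_total_kb)
print("approximate CommitLimit:", limit_kb, "kB")
```

Under these assumptions the limit is a little under 10 GB, well below the 18206300 kB of virtual memory that dmserver had mapped. With mode 2, some of those allocations would probably have failed with an error instead of ending in an OOM kill; whether that is preferable depends on how the application handles allocation failures.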

Check the database memory parameters:

dmdba@cjc-db-01:/dm/data/cjc$more dm.ini 
......

MAX_OS_MEMORY                   = 100                   #Maximum Percent Of OS Memory
MEMORY_POOL                     = 500                   #Memory Pool Size In Megabyte
MEMORY_TARGET                   = 15000                 #Memory Share Pool Target Size In Megabyte
BUFFER                          = 6144                  #Initial System Buffer Size In Megabytes
......
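
A quick way to see the mismatch is to compare these parameters with the RAM reported in the OOM log. The Python sketch below does this under loudly stated assumptions: it treats BUFFER plus MEMORY_TARGET as a rough upper bound on what the instance may grow to, ignores every other consumer (sessions, other caches, and other processes on the host such as redis-server), and approximates usable RAM as "pages RAM" minus "pages reserved" at 4 KiB per page. The helper name is made up for illustration.

```python
import re


def parse_dm_ini(text):
    """Parse 'NAME = value  #comment' lines into a dict of integer parameters."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop the trailing comment
        m = re.fullmatch(r"(\w+)\s*=\s*(\d+)", line)
        if m:
            params[m.group(1)] = int(m.group(2))
    return params


DM_INI = """
MAX_OS_MEMORY                   = 100                   #Maximum Percent Of OS Memory
MEMORY_POOL                     = 500                   #Memory Pool Size In Megabyte
MEMORY_TARGET                   = 15000                 #Memory Share Pool Target Size In Megabyte
BUFFER                          = 6144                  #Initial System Buffer Size In Megabytes
"""

params = parse_dm_ini(DM_INI)
ram_mb = (4194078 - 286657) * 4 // 1024                 # assumption: 4 KiB pages
demand_mb = params["BUFFER"] + params["MEMORY_TARGET"]  # rough upper bound only
print("configured:", demand_mb, "MB; usable RAM:", ram_mb, "MB; over budget:", demand_mb > ram_mb)
```

Even by this rough estimate the configured sizes add up to about 21 GB on a host with about 15 GB of usable RAM and 2 GB of swap, and MAX_OS_MEMORY = 100 places no cap below total OS memory.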

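Solution:

Lower the Dameng memory parameters such as MEMORY_TARGET and BUFFER so that the instance uses less memory and the system is not pushed into memory pressure. Adjusting the Linux OOM killer policy is a further option. Finally, restart the Dameng database.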
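
One possible form of the OOM killer adjustment is to lower the oom_score_adj of dmserver through /proc/<pid>/oom_score_adj, which accepts values from -1000 to 1000. The Python sketch below is a hedged illustration and not a procedure taken from this case: the function name and the proc_root parameter are made up here, lowering the value normally requires root, and the setting does not survive a restart of the process. It only makes dmserver a less likely victim; it does not free any memory, so reducing the database parameters remains the actual fix.

```python
import os


def set_oom_score_adj(pid, value, proc_root="/proc"):
    """Write an oom_score_adj value for the given pid and return the path written."""
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    with open(path, "w") as f:
        f.write(f"{value}\n")
    return path


# Example call (run as root against a live dmserver pid; the pid is a placeholder):
# set_oom_score_adj(dmserver_pid, -500)
```

A value of -1000 exempts the process from OOM kills entirely, which would shift the kill to other processes on the host, so a moderate negative value is usually the more cautious choice.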

###chenjuchao 20240619###

Follow my WeChat official account 《IT小Chen》.
