I. Symptom
An operating system memory usage alert fired, with usage reaching 98%. The alert read as follows:
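[Panoramic Monitoring: Oracle host memory usage monitoring]
[Hostname]: XXXXX11
[Host IP]: *.126.15
[Alert content]: Current memory usage is 98.9%, above the 90% warning threshold
Please handle promptly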
II. Analysis
1. Check operating system memory usage
free -g
top
M    (pressed inside top: Shift+M sorts processes by memory usage)
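As a rough cross-check of the alert figure, the percentage can be recomputed from /proc/meminfo. The sketch below is only an illustration: the function name mem_used_pct is made up here, and it assumes that free memory, buffers and page cache all count as available, which may not match how the monitoring tool arrives at 98.9%.

```shell
# mem_used_pct [FILE]: print the used-memory percentage from a meminfo-style file.
# Hypothetical helper; defaults to /proc/meminfo when no file is given.
# Assumption: MemFree + Buffers + Cached is treated as available memory.
mem_used_pct() {
    awk '
        $1 == "MemTotal:" { total   = $2 }
        $1 == "MemFree:"  { free    = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached  = $2 }
        END {
            if (total == 0) { exit 1 }   # no MemTotal line found
            printf "%.1f\n", (total - free - buffers - cached) * 100 / total
        }
    ' "${1:-/proc/meminfo}"
}
```

Calling mem_used_pct with no argument reads the live /proc/meminfo of the host; a saved copy of the file can be passed as the first argument instead.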
2. Check the memory allocated to the SGA and PGA: 140 GB and 40 GB respectively.
3. The clusterware log shows the following errors:
2022-03-20 20:22:19.247:
[ohasd(30064)]CRS-10000:CLSU-00100: Operating System function: mkdir failed with error data: 28
CLSU-00101: Operating System error message: No space left on device
CLSU-00103: error location: authprep6
CLSU-00104: additional error information: failed to make dir /u01/app/11.2.0/grid/auth/ohasd/gw11/A1076079
2022-03-20 20:22:19.248:
[ohasd(30064)]CRS-10000:CLSU-00100: Operating System function: mkdir failed with error data: 28
CLSU-00101: Operating System error message: No space left on device
CLSU-00103: error location: authprep6
CLSU-00104: additional error information: failed to make dir /u01/app/11.2.0/grid/auth/ohasd/gw11/A0906893
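4. Sort processes by memory in top (Shift+M): ohasd has the highest usage.
5. Following the path in the alert log errors, check disk usage: /u01 is at 77%.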
[grid@gw11 crsd]$ df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/vg_user01-LogVol04
                                  50G  4.5G   43G  10% /
tmpfs                            126G  736M  126G   1% /dev/shm
/dev/sda1                        190M   42M  139M  24% /boot
/dev/mapper/vg_user01-LogVol01
                                  20G   45M   19G   1% /tmp
/dev/mapper/vg_user01-LogVol02
                                 148G  108G   33G  77% /u01
/dev/mapper/vg_user01-LogVol00
                                 9.8G  636M  8.6G   7% /var
/dev/mapper/vg_user01-ogg_vg
                                 493G  8.8G  459G   2% /ogg
/dev/sds                         2.0T   76G  1.8T   4% /nfs
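6. Judging by current usage, space is not short. Checking further for space that was deleted but never released reveals the anomaly: there are a large number of deleted files whose space has not been freed. At this point the cause is clear: the files were deleted while a process still held them open, so their space could not be released, which led to the out-of-space condition.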
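To put a number on the space still held by deleted files, an lsof listing can be totalled with awk. This is a minimal sketch, not the method used on this host: sum_deleted is a hypothetical helper, and it assumes the default lsof column layout (TYPE in column 5, DEVICE in 6, SIZE/OFF in 7, NODE in 8), which can differ between lsof versions and options.

```shell
# sum_deleted: read lsof-style lines on stdin and total the size of
# deleted-but-still-open regular files.
# Hypothetical helper; assumes the default lsof column order.
sum_deleted() {
    awk '
        $NF == "(deleted)" && $5 == "REG" {
            key = $6 ":" $8            # DEVICE:NODE identifies one file
            if (!(key in seen)) {      # count a file once even if several
                seen[key] = 1          # descriptors or processes hold it open
                n++
                total += $7            # SIZE/OFF column, in bytes
            }
        }
        END { printf "%d deleted files, %.0f bytes still held\n", n, total }
    '
}
```

It could be run as `lsof -nP 2>/dev/null | sum_deleted`. The space itself is only returned once the processes holding the files close them or exit.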
[root@gw11 oraagent_grid]# lsof |grep delete
oracle 47876 oracle 3u REG 253,5 7607 21495944 /ogg/dirrpt/R_P_JF9.rpt (deleted)
oracle 47876 oracle 5w REG 253,5 10490641 114 /ogg/ggserr.log.9 (deleted)
oracle 47876 oracle 8w REG