http://t.askmaclean.com/thread-897-1-1.html
Environment:
1. OS:
[root@11grac1 ~]# cat /etc/issue
Oracle Linux Server release 6.3
Kernel \r on an \m
2. Oracle database version:
[oracle@11grac1 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Fri Feb 15 17:39:25 2013
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> desc v$version
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 BANNER                                             VARCHAR2(80)
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
3. Memory on each RAC node:
[root@11grac1 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          3016       2385        631          0        134        879
-/+ buffers/cache:       1372       1644
Swap:         3071        137       2934
4. The problem and the abnormal symptoms:
[root@11grac1 ~]# dstat --top-io --top-bio -dr
----most-expensive---- ----most-expensive---- -dsk/total- --io/total-
i/o process | block i/o process | read writ| read writ
ologgerd 44M 43M|ologgerd 4138B 8406k| 139k 9346k|7.80 137
ologgerd 236k 832k|ora_ckpt_ra 128k 48k| 144k 74k|9.00 12.0
ologgerd 64M 81M|ologgerd 0 21M| 32k 24M|2.00 377
ologgerd 137M 136M|ologgerd 0 22M| 32k 25M|2.00 394
ologgerd 98M 72M|ora_ckpt_ra 64k 48k| 80k 307k|5.00 67.0
osysmond.bi 492k 0 |ora_lmon_ra 32k 0 | 32k 1536B|2.00 3.00
ologgerd 248k 940k|ora_lmon_ra 16k 0 | 16k 1536B|1.00 3.00
[root@11grac1 ~]# iostat 2
Linux 2.6.39-300.26.1.el6uek.x86_64 (11grac1)  02/15/2013  _x86_64_  (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          13.29    0.00   10.39   18.32    0.00   58.00

Device:             tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              134.07       144.57     18631.06    3950612  509135788
sdb                1.36         1.20         3.59      32673      98077
sdc                1.38         1.03         3.81      28147     104055
sdd                1.39         1.16         3.82      31581     104383
sde                3.83        89.54        20.85    2446835     569831
sdf                2.22        38.06        20.85    1040170     569831
sdg                0.85         1.64        15.23      44734     416167

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.28    0.00    7.27    8.27    0.00   74.19

Device:             tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              168.50         0.00     22432.00          0      44864
sdb                1.50         0.00         5.00          0         10
sdc                1.50         0.00         5.00          0         10
sdd                1.50         0.00         5.00          0         10
sde                3.50        80.00        20.00        160         40
sdf                4.00        96.00        20.00        192         40
sdg                1.00         0.00        20.00          0         40

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          18.95    0.00   14.96    6.98    0.00   59.10

Device:             tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              209.50         0.00     28736.00          0      57472
sdb                1.50         0.00         5.00          0         10
sdc                1.50         0.00         5.00          0         10
sdd                1.50         0.00         5.00          0         10
sde                2.00        48.00         4.00         96          8
sdf                0.50         0.00         4.00          0          8
sdg                0.50         0.00         4.00          0          8
A process called ologgerd has been consuming a large amount of disk I/O for an extended period.
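To put the iostat numbers in more familiar units, here is a minimal Python sketch that converts the Blk_read/s and Blk_wrtn/s columns to MiB/s. It assumes iostat's "blocks" are 512-byte sectors, as the sysstat man page describes for 2.6 kernels. The function names and the SAMPLE text (taken from the second iostat report in this post) are only for illustration.

```python
SECTOR_BYTES = 512  # assumption: iostat "blocks" are 512-byte sectors


def blocks_per_sec_to_mib(blocks_per_sec):
    """Convert an iostat blocks-per-second figure to MiB per second."""
    return blocks_per_sec * SECTOR_BYTES / (1024 * 1024)


def parse_device_rates(report):
    """Return {device: (read MiB/s, write MiB/s)} for one iostat report."""
    rates = {}
    for line in report.splitlines():
        fields = line.split()
        # Device rows have 6 fields and start with a device name (a letter).
        # The avg-cpu value row also has 6 fields but starts with a digit.
        if len(fields) != 6 or not fields[0][0].isalpha():
            continue
        try:
            read_blocks, write_blocks = float(fields[2]), float(fields[3])
        except ValueError:
            continue  # the "Device:" header row ends up here
        rates[fields[0]] = (
            blocks_per_sec_to_mib(read_blocks),
            blocks_per_sec_to_mib(write_blocks),
        )
    return rates


SAMPLE = """\
avg-cpu: %user %nice %system %iowait %steal %idle
10.28 0.00 7.27 8.27 0.00 74.19
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 168.50 0.00 22432.00 0 44864
sde 3.50 80.00 20.00 160 40
"""

if __name__ == "__main__":
    for device, (read_mib, write_mib) in parse_device_rates(SAMPLE).items():
        print(f"{device}: read {read_mib:.2f} MiB/s, write {write_mib:.2f} MiB/s")
```

Under that assumption, the 22432 blocks/s written to sda works out to roughly 11 MiB/s of sustained writes, which is in the same range as the ologgerd block I/O figures in the dstat output.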
So what kind of process is ologgerd?
Someone else's article on the web: http://blog.youkuaiyun.com/jjwspj/article/details/7857106
Oracle official documentation: http://docs.oracle.com/cd/E11882_01/rac.112/e16794/troubleshoot.htm#autoId0
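ologgerd is the cluster logger daemon of Cluster Health Monitor; it writes the data gathered by osysmond on each node into the CHM repository.
Cluster Health Monitor (CHM below) is a tool provided by Oracle that automatically collects operating system resource usage (CPU, memory, swap, processes, I/O, network, and so on). CHM collects data once per second.
This system resource data is very helpful for diagnosing node reboots, hangs, instance evictions, and performance problems in a cluster. CHM can also be used to spot high system load, abnormal memory usage, and similar issues early, before they turn into more serious problems.
CHM is installed automatically with the following software:
Oracle Grid Infrastructure 11.2.0.2 and later for Linux (excluding Linux Itanium) and Solaris (SPARC 64 and x86-64)
Oracle Grid Infrastructure 11.2.0.3 and later for AIX and Windows (excluding Windows Itanium)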
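Solution:
The fix is to install the 11.2.0.3.1 PSU: p13348650_112030_Linux-x86-64.zip
However, I don't have a Metalink account and can't apply the patch,
so the only option is to stop the CHM service on all nodes: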
[grid@11grac2 ~]$ crsctl status res -t -init | grep ora.crf
ora.crf
[grid@11grac2 ~]$ crsctl stop res ora.crf -init
CRS-2673: Attempting to stop 'ora.crf' on '11grac2'
CRS-2677: Stop of 'ora.crf' on '11grac2' succeeded
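Since the service has to be stopped on every node, the same command can be run in a loop. The following is a rough Python sketch, not a tested procedure. It assumes passwordless ssh as the grid user to each node, and that crsctl is on the PATH of a non-interactive ssh session (often it is not, in which case the full Grid home path would be needed). The node list and function names are only for illustration; the crsctl command is the one shown above.

```python
import shlex
import subprocess

NODES = ["11grac1", "11grac2"]  # node names taken from the prompts in this post
GRID_USER = "grid"              # assumption: Grid Infrastructure owner on every node
STOP_CMD = "crsctl stop res ora.crf -init"  # same command as in the transcript


def build_stop_commands(nodes, user=GRID_USER):
    """Build one ssh argv list per node that runs the CHM stop command."""
    return [["ssh", f"{user}@{node}", STOP_CMD] for node in nodes]


def stop_chm(nodes, dry_run=True):
    """Print the commands (dry run) or execute them and return exit codes."""
    exit_codes = {}
    for argv in build_stop_commands(nodes):
        if dry_run:
            # Only show what would be executed; nothing is changed.
            print(" ".join(shlex.quote(arg) for arg in argv))
            continue
        completed = subprocess.run(argv, capture_output=True, text=True)
        exit_codes[argv[1]] = completed.returncode
    return exit_codes


if __name__ == "__main__":
    stop_chm(NODES)  # dry run by default; pass dry_run=False to really stop CHM
```

The dry-run default is deliberate, so the commands can be reviewed before anything is stopped. This post does not show whether ora.crf stays stopped after Clusterware restarts, so that is worth checking with the status command above after the next restart.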