Oracle 11.2.0.2.0 RAC环境一次内存溢出ORA-04031问题的处理

本文记录了一次Oracle 11.2.0.2.0 RAC环境下出现的ORA-04031共享池内存不足的问题及处理过程。通过对SAR报告、ASH、AWR等工具的分析,定位到问题SQL并给出解决方案。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

Oracle 11.2.0.2.0 RAC环境一次内存溢出ORA-04031问题的处理

环境信息

数据库版本:11.2.0.2.0

操作系统版本:AIX 6.1

一.现象

使用sar 观察cpu使用情况,注意观察 %usr 参数

1.RAC环境中其中一台服务器node2负载上升(平时为idle为90左右),node1数据库正常,下面为node2当时的SAR报告

bash-3.00#sar -f sa15|more

AIXp740db2 1 6 00F688714C00 08/15/12

Systemconfiguration: lcpu=32 mode=Capped

00:00:00%usr %sys %wio%idle physc

16:15:00 5 2 4 89 8.00

16:16:008 50 87 8.00

16:17:00 36 6 0 57 8.00

16:18:0050 54 42 8.00

16:19:007 27 85 8.00

16:20:006 25 88 8.00

16:21:006 28 84 8.00

16:22:006 29 83 8.00

16:23:007 27 84 8.00

16:24:006 25 87 8.00

16:25:006 26 87 8.00

16:26:006 26 86 8.00

16:27:005 25 89 8.00

16:28:005 25 88 8.00

16:29:005 24 89 8.00

16:30:005 24 90 8.00

16:31:0025 33 70 8.00

16:32:0163 20 34 8.00

16:33:0027 20 71 8.00

16:34:0046 20 51 8.00

16:35:0066 20 32 8.00

16:36:0137 20 61 8.00

16:37:0027 20 72 8.00

16:38:0015 10 83 8.00

16:39:0029 20 69 8.00

16:40:0041 20 57 8.00

16:41:0045 20 53 8.00

16:42:0047 20 51 8.00

16:43:0039 20 59 8.00

16:44:0050 21 48 8.00

16:45:0015 25 78 8.00

16:46:001 13 95 8.00

16:47:006 21 91 8.00

2.登陆数据库一直挂起,长时间无响应,查看alert log,发现从16:15开始报ORA-04031 shared pool不够的错

ArchivedLog entry 15904 added forthread 2 sequence 2418 ID 0xffffffffd412add4 dest 1:

WedAug 15 16:15:09 2012

Errorsin file/db/diag/rdbms/xxxxshprod/xxxxshpd2/trace/xxxxshpd2_ora_24707320.trc(incident=5241532):

ORA-04031:鎊谞娉^斟^萟崎^臹?32瀛^阻^蔨络^赹腻^疟浜 ^芧佩璣?("shared pool","selectcompany_id,user_id fr...","KGLH0^28668775","k

glHeapInitialize:temp")

Incidentdetails in:/db/diag/rdbms/xxxxshprod/xxxxshpd2/incident/incdir_5241532/xxxxshpd2_ora_24707320_i5241532.trc

WedAug 15 16:15:09 2012

Noncritical error ORA-48913 caught while writing to trace file"/db/diag/rdbms/xxxxshprod/xxxxshpd2/trace/xxxxshpd2_ora_40042558.trc"

Writingto the above trace file is disabled for now on...

Noncritical error ORA-48913 caught while writing to trace file"/db/diag/rdbms/xxxxshprod/xxxxshpd2/incident/incdir_5241532/xxxxshp

d2_ora_24707320_i5241532.trc"

Writingto the above trace file is disabled for now on...

WedAug 15 16:16:02 2012

Errorsin file/db/diag/rdbms/xxxxshprod/xxxxshpd2/trace/xxxxshpd2_ora_36044968.trc(incident=5241340):

ORA-04031:unable to allocate 32 bytes of shared memory ("sharedpool","select count(*) as totalrec...","KGLH0^c0f94bbb","kglHeapInitialize:temp")

二.诊断过程

1.查看这段时间的ASH可以得知过去数据库发生了什么事,其中最严重的等待事件为“cursor: pin S wait on X

SQL> select * from (

select /*+ rule */

event,count(1) cnt

from dba_hist_active_sess_history

where sample_time > to_date('2013-09-2310:15:00', 'yyyy-MM-dd HH24:mi:ss')

and sample_time <to_date('2013-09-23 11:40:00', 'yyyy-MM-dd HH24:mi:ss')

group by event

order by cnt desc) whererownum<=5;

EVENTCNT

--------------------------------------------------------------------------

cursor:pinwaitonX944latch:sharedpool 305 dbfilesequentialread277nullevent95library cache: mutex 87

SQL>

--根据等待事情查找相关的SQL

SQL> select sql_id,count(1)

fromdba_hist_active_sess_history

where

sample_time > to_date('2013-09-2316:15:00', 'yyyy-MM-dd HH24:mi:ss')

and sample_time < to_date('2013-09-2316:40:00', 'yyyy-MM-dd HH24:mi:ss')

and event = 'cursor: pin S wait on X'

group by sql_id;

SQL_IDCOUNT(1)

------------- ----------

22bb9t1167adc5

6yx2qgbpw6j8x909

bunvx480ynf5722

dcrjrtxqrsrsw8

根据上面结果,有问题的SQL_id为6yx2qgbpw6j8x,根据SQL_ID查找问题的SQL

Select *from v$sql where sql_id='6yx2qgbpw6j8x'

取得的SQL如下:

selectcompany_id,user_id from t_xxxx_user where user_name=:1

同时查看这个SQL的执行计划是否正常,通过下面结果可以看到,这个SQL非常简单,全表只有883条记录,查询时走的是唯一索引

SQL>

SQL> --查看历吏执行计划

SQL> SELECT sp.id,

2lpad(' ', sp.DEPTH * 3, ' ') || sp.OPERATION || ' ' || sp.OPTIONS operation,

3sp.OBJECT_NAME

4 FROM dba_hist_sql_plan sp

5 WHERESQL_ID ='6yx2qgbpw6j8x'

6 ORDER BY ID;

ID OPERATIONOBJECT_NAME

-------------------------------------------------------------------------------------------------------------------------

0SELECTSTATEMENT

1TABLEACCESSBYINDEXROWIDT_XXXX_USER

2 INDEX UNIQUESCANU_TXXXX_USER__USER_NAME

SQL> select * fromtable(dbms_xplan.display_awr('6yx2qgbpw6j8x'));

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

SQL_ID 6yx2qgbpw6j8x

--------------------

select company_id,user_id from t_xxxx_user whereuser_name=:1

Plan hash value: 377560326

--------------------------------------------------------------------------------

| Id |Operation|Name| Rows | Bytes |

--------------------------------------------------------------------------------

| 0 | SELECTSTATEMENT|| | |

| 1 | TABLE ACCESS BY INDEX ROWID|T_XXXX_USER| 1 | 20 |

| 2 | INDEX UNIQUESCAN | U_TXXXX_USER__USER_NAME| 1 | |

--------------------------------------------------------------------------------

14 rows selected

2.分析这期间的AWR报告如下

产生AWR比对报告

@?/rdbms/admin/awrddrpt

比较出问题时的AWR和正常情况下的AWR,发现平均每秒的硬解析和软解析没有明显的增长,再比较了AWR中的SQL ordered by Parse Calls和SQL ordered by Version Count都和平时差不多,包括SQL_ID为6yx2qgbpw6j8x的SQL的执行次数,初步怀疑为oraclebug

出问题期间的AWR

Per Second

Per Transaction

Per Exec

Per Call

DB Time(s):

13.8

0.3

0.02

0.01

DB CPU(s):

0.7

0.0

0.00

0.00

Redo size:

194,547.2

4,767.2

Logical reads:

7,571.0

185.5

Block changes:

815.7

20.0

Physical reads:

296.2

7.3

Physical writes:

71.7

1.8

User calls:

1,339.8

32.8

Parses:

497.4

12.2

Hard parses:

18.2

0.5

W/A MB processed:

1.3

0.0

Logons:

0.1

0.0

Executes:

774.7

19.0

Rollbacks:

0.0

0.0

Transactions:

40.8

Instance Efficiency Percentages (Target 100%)

Buffer Nowait %:

99.90

Redo NoWait %:

100.00

Buffer Hit %:

96.12

In-memory Sort %:

100.00

Library Hit %:

92.39

Soft Parse %:

96.35

Execute to Parse %:

35.79

Latch Hit %:

99.73

Parse CPU to Parse Elapsd %:

2.49

% Non-Parse CPU:

73.0

正常情况下的AWR

Load Profile

Per Second

Per Transaction

Per Exec

Per Call

DB Time(s):

2.7

0.1

0.00

0.00

DB CPU(s):

0.3

0.0

0.00

0.00

Redo size:

182,077.7

4,810.9

Logical reads:

9,038.7

238.8

Block changes:

756.9

20.0

Physical reads:

338.2

8.9

Physical writes:

78.9

2.1

User calls:

1,284.9

34.0

Parses:

482.7

12.8

Hard parses:

17.5

0.5

W/A MB processed:

1.0

0.0

Logons:

0.1

0.0

Executes:

744.9

19.7

Rollbacks:

0.0

0.0

Transactions:

37.9

Instance Efficiency Percentages (Target100%)

Buffer Nowait %:

99.94

Redo NoWait %:

100.00

Buffer Hit %:

96.29

In-memory Sort %:

100.00

Library Hit %:

92.88

Soft Parse %:

96.37

Execute to Parse %:

35.20

Latch Hit %:

99.78

Parse CPU to Parse Elapsd %:

16.07

% Non-Parse CPU:

92.86

3.metalink1349387.1中发现下面一段话,If thedatabasehas been migrated from10gto11gand mutexperformance issue surfaces,please consider the 11.2.0.2.2 psu + fix for12431716. Many mutex fixes are already included in this patch,查看Bug12431716的说明,版本和现象都比较相似,oracle强烈推荐在11.2.0.2版本中打上补丁12431716

Troubleshooting 'cursor: pin S wait on X' waits. [ID1349387.1]

To Bottom


Modified:Jan 7, 2012Type:TROUBLESHOOTINGStatus:PUBLISHEDPriority:1

Comments (0)

In this Document
Purpose
LastReview Date
Instructionsfor the Reader
TroubleshootingDetails
Whatis a 'Cursor: pin S wait on X' wait?
Whatcauses 'Cursor: pin S wait on X' waits?
How todiagnose the cause.
How toExamine the Diagnostics.
PotentialSolutions
TroubleshootingOther Issues
References


Applies to:

OracleServer- Enterprise Edition - Version:10.2.0.1 to 11.2.0.2 - Release: 10.2 to 11.2
Information in this document applies to any platform.

Purpose

The purpose of the note is to helpcustomers troubleshoot 'Cursor: pin S wait on X' waits.

Last Review Date

August 17, 2011

Instructions for the Reader

A Troubleshooting Guide is provided toassist in debugging a specific issue. When possible, diagnostic tools areincluded in the document to assist in troubleshooting.

Troubleshooting Details

What is a 'Cursor: pin S wait on X'wait?

A cursor wait is associated with parsing in some form. Asession may wait for this event when it is trying to get a mutex pin in Sharemode but another session is holding the mutex pin on the same cursor object inexclusive. Frequently, waits for 'Cursor: pin S wait on X' is a symptomand not the cause. There may be underlying tuning requirements or knownissues.

What causes 'Cursor: pin S wait on X'waits?

·Frequent Hard Parses
If the frequency of Hard Parsing is extremely high, then contention can occuron this pin.

·High Version Counts
When Version counts become excessive, a long chain of versions needs to beexamined and this can lead to contention on this event

·Known bugs

How to diagnose the cause.

Obtain information and diagnostics to help locate thecause.

1. Run AWR and ADDM reports during the time that you see 'Cursor: pin S wait onX' waits and also collect baseline reports when you do not see the waits tomake comparisons. The baseline shows us the typical 'background'concurrency and activity going on during the time of both the issue and thebaseline and may help identify (for example) culprit SQLs with highversion counts.

To gather this it is suggested to run AWR and ADDM for half an hour to an hourinterval as follows:

SQL>@$ORACLE_HOME/rdbms/admin/awrrpt.sql

SQL>@$ORACLE_HOME/rdbms/admin/addmrpt.sql



2. Sometimes system state dump is necessary to match known issues. Forexample, if there is no obvious candidate SQL in AWR, capturing holder orwaiter processes in systemstate allows you to focus in on potential problems.Run system state when processes appear hung on 'Cursor: pin S wait on X' waits:

(a) Non-Rac
sqlplus "/as sysdba"

oradebugsetmypid
oradebug unlimit
oradebug dumpsystemstate 266
wait 90 seconds
oradebug dumpsystemstate 266
wait 90 seconds
oradebug dumpsystemstate 266
quit

(b)RAC
$ sqlplus '/ assysdba'
oradebugsetmypid
oradebug unlimit
oradebug setinstall
oradebug -g allhanganalyze 4
oradebug -g alldump systemstate 266
quit



3. Errorstacks: Another way to obtain process information is witherrorstack. Assuming you can identify a blocker, taking errorstacks willprovide much the same information as systemstates but with a much reduced diskfootprint for trace. Once the ospid of the blocker has been found, anerrorstack can be generated:


$ sqlplus
SQL> oradebugsetospid <p.spid from above>
oradebug dumperrorstack 3
<< wait1min>>
oradebug dumperrorstack 3
<< wait1min>>
oradebug dumperrorstack 3
exit


In particular, the stack from the resultant trace can be used to match knownissues.

The system state and errorstacks are not easily readable; so a Service Requestmay need to be opened to read the files.

4. It is not always feasible to run system state dump. Thus, this note onfinding the blocker can be used as well:

Note:786507.1How toDetermine the Blocking Session for Event: 'cursor: pin S wait on X'


5. Furthermore, the following sqls can be ran to identify the waiters:

SELECT s.sid, t.sql_text
FROM v$sessions, v$sql t
WHERE s.eventLIKE '%cursor: pin S wait on X%'
AND t.sql_id =s.sql_id


6. In 11g RAC, there is another less resource intensive tool that can be usedwhen compared with taking system state dumps:

Note:459694.1Procwatcher:Script. to Monitor and Examine Oracle DB and Clusterware Processes

How to Examine the Diagnostics.

1. Look for high parsing and high version counts fromAWR.
Click on*SQL Statistics under Main Report of AWR:



Then, under SQL Statistics click on 'SQL ordered by Parse Calls' or 'SQLordered by Version Count' to view that information:

SQL ordered by Parse Calls

* Total Parse Calls: 2,935,390
* Captured SQLaccount for 95.6% of Total

Parse CallsExecutions % Total Parses SQL Id SQL Module SQL Text
668,174 668,01422.76 gpkdazvcfwkv select DATA_TABLE (...
667,440 668,23422.74 5p9vjzht9jqb INSERT INTO DATA_TABLE (DATA_I...


From this list, investigate the top SQLs to determine whether this volume ofparse calls is excessive or can be reduced.

SQL ordered by Version Count

* Only Statements with Version Countgreater than 20 are displayed

Version CountExecutions SQL Id SQL Module SQL Text
277 392,73730d5a6v16mpb select FLOW_CONTEXT (...
78 131,1047c0gj35488xs INSERT INTO PROJECT (DOC_ID, ...


From this list, investigate the SQLs with the high version count. What are thereasons that these statements are not shared? Can this be addressed?

2. From systemstate and errorstack, we are looking for stacks to match knownissues.

For systemstates/errorstacks to be useful, it relies them being taken "atthe right time". On a fast moving system, processes may have moved on soholders and waiters are not shown. In a perfect example the systemstatesshould show the blocker and processes being blocked or one of these. Thestack from blocker may indicate known issues.

Potential Solutions

1. Tune sqls with high parse count by eitherinvestigating the application or the sql.

From AWR, the parse statistics is at the top of the report under load profile:





In this case there are mainly soft parses, however, if there were high hardparses, this may indicate high usage of literals or introduction of many newsqls. In this case consider using bind variables in place ofliterals.

If there is high percentage of soft parses, then check the application to seeif it is using shareable sqls.

Ideally, Execute to Parse should be closer to 100%. The applicationshould parse less and execute many times. See:

Note:62143.1Understandingand Tuning the Shared Pool


Also remember if the shared pool is flushed, then sqls will need to be hardparsed. This also may cause mutex waits. So make sure sqls are inmemory once hard parsed and monitor to see if mutex waits arealleviated.

2. High version counts can also cause 'cursor: pin S wait on X' waits.

Check V$SQL_SHARED_CURSOR to see the potential reason for the high versioncount using:

Note:438755.1FormatedV$SQL_SHARED_CURSOR Report by SQLID or Hash Value
Note:296377.1Troubleshooting:High Version Count Issues (Doc ID 296377.1)


There are some notable bugs with high version counts:

Note:10157392.8Bug10157392 - High version counts for SQL with binds (BIND_MISMATCH)
Note:9689310.8Bug9689310 - Excessive child cursors / high VERSION_COUNT / OERI:17059 due to bindmismatch


3. For more known defects, please go to following note and click on known bugs:

Note:1298015.1WAITEVENT:"cursor: pin S wait on X" Reference Note


Click on the version that applies and review bug or bugs with similar scenario.

4. If the database has been migrated from 10g to 11g and mutex performanceissue surfaces,
please consider the 11.2.0.2.2 psu + fix for 12431716. Manymutex fixes are already included in this patch:

Note:1291879.1Oracle Database Patch Set Update11.2.0.2.2 Known Issues




Troubleshooting Other Issues

For guidance troubleshooting other performance issuessee:

Document 1377446.1TroubleshootingPerformance Issues

References

NOTE:1291879.1- Oracle DatabasePatch Set Update 11.2.0.2.2 Known Issues
NOTE:1298015.1-WAITEVENT: "cursor: pin S wait on X" Reference Note
NOTE:296377.1-Troubleshooting: High Version Count Issues
NOTE:459694.1-Procwatcher: Script. to Monitor and Examine Oracle DB and Clusterware Processes
NOTE:786507.1- How toDetermine the Blocking Session for Event: 'cursor: pin S wait on X'
NOTE:1356828.1- FAQ:'cursor: ' waits


Attachments

·Parse_Vers.png(13.75 KB)

·Parses.png(16.69 KB)

·SQL_Stats_link.png(9.94 KB)

·Soft_Parse.png(13.39 KB)

Mutex waits may cause higher CPU usage in 11.2.0.2.2 PSU - Overlay fix Recommended [ID 1321817.1]

To Bottom


Modified:May 17, 2011Type:ALERTStatus:PUBLISHEDPriority:1

Comments (0)

In this Document
Description
Likelihoodof Occurrence
PossibleSymptoms
Workaroundor Resolution
Patches
ModificationHistory
References


Applies to:

Oracle Server - Enterprise Edition -Version: 11.2.0.2 to 11.2.0.2 - Release: 11.2 to 11.2
Information in this document applies to any platform.
This issue affects the following server releases only:
- 11.2.0.2.2 Database Patch Set Update (PSU) [ 11724916 ]
- 11.2.0.2.2 Grid Infrastructure Patch Set Update (GI PSU) [ 12311357 ]
- Windows 11.2.0.2 Patch bundles 4, 5 and 6.

This issue also affects 11.2.0.2.1 interim overlay patches for Bug 10411618 onUnix platforms downloaded before 14/May/2011.
(Interim patches for other versions are NOT affected. Interim patchesdownloaded on or after 14/May/2011 are NOT affected).

This issue does NOT affect any of the Exadata Database Recommended Patches.

Description

An incomplete fix in the above releasescan cause database mutex waits to be accompanied by higher CPU usage thanexpected.

Likelihood of Occurrence

This problem is most likely to impactlarge heavily concurrent systems especially if those systems have some degreeof mutex contention.

Possible Symptoms

There are 2 mains symptoms of theproblem:

1. Higher CPU usagemay be seen for mutex waits compared to earlier releases. This may appear asCPU spikes or as general higher CPU usage levels depending on the nature of thecontention.

2. The hiddenparameters described inNote:10411618.8, which allowconfiguration of the mutex wait scheme, do not work as expected. Notablysetting "_mutex_wait_scheme" has no effect.

Workaround or Resolution

·For non-Windows platforms it is recommended toapplyPatch:12431716on top of the 11.2.0.2.2 PSU.

·For Windows platforms on 11.2.0.2 bundle patch 4, 5 or 6it is recommended to apply bundle patch 7 when available. If that patch cannotbe applied then it is recommended to set "_mutex_wait_time"=8 atinstance level until such time as it is applied.

·11.2.0.2.1 systems with overlay patches for bug 10411618from before 14/May/2011 are advised to replace the patch with the latestversion.

·Once the relevant patch is in place then the mutex waitscheme can be configured as described inNote:10411618.8ifrequired.

Patches

·For non-Windows platforms with 11.2.0.2.2 PSU installedit is recommended to apply overlayPatch:12431716(or any patch containing thatfix).

·For Windows platforms it is recommended to apply 11.2.0.2bundle patch 7 (or later) once available.

·11.2.0.2.1 overlay patches for 10411618 on non-Windowsplatforms were reloaded on 14/May/2011. If you have an 11.2.0.2.1 overlay patchfor bug 10411618 from before that date then it is advisable to replace it withthe current version.
eg: Rollback the installed patch, download the latest version of
Patch:10411618for 11.2.0.2.1, apply thereplacement patch.

·The problem will be fixed generally in the next 11.2.0.2Patch Set Update (11.2.0.2.3) due for release in July 2011.

Modification History

13-May-2011 - Created
16-May-2011 - Confirm this does NOT affect Exadata Recommended Patches
17-May-2011 - Add references to 11.2.0.2.2 PSU / GI PSU Known Issues(whichmention this problem)

References

NOTE:10411618.8- Bug10411618 - Enhancement to add different "Mutex" wait schemes
NOTE:11724916.8- Bug11724916 - 11.2.0.2.2 Patch Set Update (PSU)
NOTE:12311357.8- Bug12311357 - 11.2.0.2.2 Grid Infrastructure Patch Set Update (GI PSU)
NOTE:12431716.8- Bug12431716 - Mutex waits may cause higher CPU usage in 11.2.0.2.2 PSU / GI PSU
NOTE:1272288.1-11.2.0.2.X Grid Infrastructure Bundle/PSU Known Issues
NOTE:1291879.1- OracleDatabase Patch Set Update 11.2.0.2.2 Known Issues

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值