AIX: ORA-29770: LMS0 (OSID 123) is hung for more than 70 seconds in 'gcs remote message'

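This post documents the ORA-29770 error seen with Oracle Database 11.2.0.4 on IBM AIX: the LMS0 process hangs for more than 70 seconds while waiting on the 'gcs remote message' event, which leads to the instance being terminated. It also gives the corresponding fix.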

Source:

AIX: ORA-29770: LMS0 (OSID 123) is hung for more than 70 seconds in 'gcs remote message' (Doc ID 2237182.1)

APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.4 and later
IBM AIX on POWER Systems (64-bit)

SYMPTOMS

Instance 1 was terminated by LMHB because LMS0 had been waiting on 'gcs remote message' for too long:

Sun Jan 01 15:14:38 2017
Archived Log entry 33282 added for thread 1 sequence 10837 ID 0xffffffffff7b47ac dest 1:
Sun Jan 01 15:25:02 2017
LMS0 (ospid: 33621970) waits for event 'gcs remote message' for 83 secs. 
Sun Jan 01 15:25:17 2017
Errors in file /u01/app/oracle/diag/rdbms/fcubsprd/FCUBSPRD1/trace/FCUBSPRD1_lmhb_30148200_FCUBSSTB.trc (incident=1184194):
ORA-29770: global enqueue process LMS0 (OSID 33621970) is hung for more than 70 seconds
Incident details in: /u01/app/oracle/diag/rdbms/fcubsprd/FCUBSPRD1/incident/incdir_1184194/FCUBSPRD1_lmhb_30148200_i1184194.trc
Sun Jan 01 15:25:23 2017
Sweep [inc][1184194]: completed
Sweep [inc2][1184194]: completed
Sun Jan 01 15:25:28 2017
ERROR: Some process(s) is not making progress.
LMHB (ospid: 30148200) is terminating the instance.
Please check LMHB trace file for more details.
Please also check the CPU load, I/O load and other system properties for anomalous behavior
ERROR: Some process(s) is not making progress.
LMHB (ospid: 30148200): terminating the instance due to error 29770
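
The key lines in the alert log excerpt above can be picked out mechanically. The following is a minimal sketch, not an Oracle tool: the regular expressions are modelled only on the three line shapes quoted in this note (other releases may word them differently), and the names `scan_alert_log` and `SAMPLE` are illustrative.

```python
import re

# Patterns modelled on the alert log lines quoted in this note (assumption:
# other Oracle releases may format these messages differently).
WAIT_RE = re.compile(
    r"^(?P<proc>LM\w+) \(ospid: (?P<ospid>\d+)\) waits for event "
    r"'(?P<event>[^']+)' for (?P<secs>\d+) secs"
)
HUNG_RE = re.compile(
    r"^ORA-29770: global enqueue process (?P<proc>\w+) "
    r"\(OSID (?P<ospid>\d+)\) is hung for more than (?P<limit>\d+) seconds"
)
TERM_RE = re.compile(r"terminating the instance due to error (?P<err>\d+)")


def scan_alert_log(lines):
    """Return (kind, details) tuples for the LMHB-related lines found."""
    findings = []
    for raw in lines:
        line = raw.strip()
        m = WAIT_RE.match(line)
        if m:
            findings.append(("wait", {
                "proc": m["proc"],
                "ospid": int(m["ospid"]),
                "event": m["event"],
                "secs": int(m["secs"]),
            }))
            continue
        m = HUNG_RE.match(line)
        if m:
            findings.append(("hung", {
                "proc": m["proc"],
                "ospid": int(m["ospid"]),
                "limit": int(m["limit"]),
            }))
            continue
        m = TERM_RE.search(line)
        if m:
            findings.append(("terminate", {"error": int(m["err"])}))
    return findings


# Three lines copied from the excerpt above.
SAMPLE = """\
LMS0 (ospid: 33621970) waits for event 'gcs remote message' for 83 secs.
ORA-29770: global enqueue process LMS0 (OSID 33621970) is hung for more than 70 seconds
LMHB (ospid: 30148200): terminating the instance due to error 29770
"""

if __name__ == "__main__":
    for kind, details in scan_alert_log(SAMPLE.splitlines()):
        print(kind, details)
```

In practice the function would be fed the lines of the real alert log file; the sample here only reuses text already shown in the excerpt.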

The LMS0 trace file has no entries for this period.

 

FCUBSPRD1_lmhb_30148200_FCUBSSTB.trc:

*** 2017-01-01 15:25:02.621
....
LMS0 (ospid: 33621970) has no heartbeats for 86 sec. (threshold 70 sec)
: waiting for event 'gcs remote message' for 83 secs with wait_id 121809912.
===[ Wait Chain ]===
Wait chain is empty.
==============================
Dumping PROCESS LMS0 (ospid: 33621970) States
==============================
===[ System Load State ]===
CPU Total 72 Raw 72 Core 18 Socket -1
Load normal: Cur 2718 Highmark 331776 (10.61 1296.00) 
===[ Latch State ]===
Not in Latch Get
===[ Session State Object ]===
----------------------------------------
SO: 0x7000115b1c5e3f0, type: 4, owner: 0x7000115917f01f0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x7000115917f01f0, name=session, file=ksu.h LINE:12729 ID:, pg=0
(session) sid: 1093 ser: 1 trans: 0x0, creator: 0x7000115917f01f0
flags: (0x51) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x409) -/-/INC
DID: , short-term DID:
txn branch: 0x0
edition#: 0 oct: 0, prv: 0, sql: 0x0, psql: 0x0, user: 0/SYS
ksuxds FALSE at location: 0
service name: SYS$BACKGROUND
Current Wait Stack:
0: waiting for 'gcs remote message'
waittime=0x1e, poll=0x0, event=0x0
wait_id=121809912 seq_num=45901 snap_id=1
wait times: snap=1 min 23 sec, exc=1 min 23 sec, total=1 min 23 sec

....

*** 2017-01-01 15:25:02.624
Process diagnostic dump for oracle@padc2dbs01 (LMS0), OS id=33621970,
pid: 13, proc_ser: 1, sid: 1093, sess_ser: 1
-------------------------------------------------------------------------------
os thread scheduling delay history: (sampling every 1.000000 secs)
0.000000 secs at [ 15:25:02 ]
NOTE: scheduling delay has not been sampled for 0.323812 secs
0.000000 secs from [ 15:24:58 - 15:25:03 ], 5 sec avg
0.000000 secs from [ 15:24:02 - 15:25:03 ], 1 min avg
0.003528 secs from [ 15:20:03 - 15:25:03 ], 5 min avg
loadavg : 10.62 11.33 11.31
swap info: free_mem = 34902.60M rsv = 596.00M
alloc = 2533.36M avail = 152576.00M swap_free = 150042.64M
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
240001 A oracle 33621970 1 0 39 20 af39ae590 249848 Dec 19 - 181:11 ora_lms0_FCUBSPRD1

*** 2017-01-01 15:25:07.637
Short stack dump: ORA-32516: cannot wait for process 'Unix process pid: 33621970, image: oracle@padc2dbs01 (LMS0)' to finish executing ORADEBUG command 'SHORT_STACK'; wait time exceeds 4940 ms 

-------------------------------------------------------------------------------
Process diagnostic dump actual duration=5.012000 sec
(max dump time=5.000000 sec)

....
*** 2017-01-01 15:25:17.697
==============================
LMS0 (ospid: 33621970) has not moved for 101 sec (1483277117.1483277016)
Incident 1184194 created, dump file: /u01/app/oracle/diag/rdbms/fcubsprd/FCUBSPRD1/incident/incdir_1184194/FCUBSPRD1_lmhb_30148200_i1184194.trc
ORA-29770: global enqueue process LMS0 (OSID 33621970) is hung for more than 70 seconds

ORA-32515: cannot issue ORADEBUG command 'SHORT_STACK' to process 'Unix process pid: 33621970, image: oracle@padc2dbs01 (LMS0)'; prior command execution time exceeds 4955 ms
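
The two numbers in parentheses on the "has not moved" line look like Unix epoch timestamps. That reading is an assumption, since the trace does not label them, but it can be checked: their difference should reproduce the reported 101 seconds. A minimal sketch (the name `check_not_moved` is illustrative):

```python
import re
from datetime import datetime, timezone

# Matches e.g. "has not moved for 101 sec (1483277117.1483277016)".
MOVED_RE = re.compile(r"has not moved for (\d+) sec \((\d+)\.(\d+)\)")


def check_not_moved(line):
    """Compare the reported stall time with the two embedded numbers.

    Assumption: the pair is (current time, last heartbeat time) in
    seconds since the Unix epoch.
    """
    m = MOVED_RE.search(line)
    if m is None:
        return None
    reported, now_ts, last_ts = (int(g) for g in m.groups())
    return {
        "reported_secs": reported,
        "computed_secs": now_ts - last_ts,
        "now_utc": datetime.fromtimestamp(now_ts, tz=timezone.utc),
        "last_utc": datetime.fromtimestamp(last_ts, tz=timezone.utc),
    }


LINE = "LMS0 (ospid: 33621970) has not moved for 101 sec (1483277117.1483277016)"

if __name__ == "__main__":
    info = check_not_moved(LINE)
    print(info["computed_secs"], info["now_utc"].isoformat(), info["last_utc"].isoformat())
```

Under this reading the first number converts to 13:25:17 UTC, which agrees with the 15:25:17 trace timestamp if the host runs at UTC+2 (the time zone is not stated in the source).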

Other LMS processes are waiting normally on the same event, for example LMS1:

OSD pid info: Unix process pid: 19923988, image: oracle@padc2dbs01 (LMS1)
....
0: waiting for 'gcs remote message'
waittime=0x1e, poll=0x0, event=0x0
wait_id=121728350 seq_num=29877 snap_id=1
wait times: snap=0.368767 sec, exc=0.368767 sec, total=0.368767 sec

 

 

CAUSE

The AIX fix for "ENABLING 'TCP_FASTLO' OPTION COULD LEAD TO MEMORY LEAK" is missing from the OS.

SOLUTION

Apply the relevant AIX fix:

http://www-01.ibm.com/support/docview.wss?uid=isg1fixinfo153400

7100 TL3 SP5 7100-03-05-1524 IV66228

6100 TL9 SP6 6100-09-06-1543 IV67463

6100 TL9 SP5 6100-09-05-1524 IV67463
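
A quick way to see whether a host sits below the levels listed above is to compare its `oslevel -s` output against them. The following is a rough sketch under stated assumptions: that a level string has the form release-TL-SP-build (as in the list above), and that for each technology level the lowest listed service pack is the first one containing the fix (the source only lists the levels and does not say this). `FIX_LEVELS` and `needs_fix` are illustrative names.

```python
# (release, technology level) -> (lowest listed service pack, build, APAR),
# taken from the list above. Treating these as minimum levels is an assumption.
FIX_LEVELS = {
    ("7100", "03"): (5, 1524, "IV66228"),
    ("6100", "09"): (5, 1524, "IV67463"),
}


def needs_fix(oslevel):
    """Classify an `oslevel -s` style string such as '7100-03-04-1441'.

    Returns 'apply <APAR>' if the level is below the listed one, 'ok' if it
    is at or above it, and 'unknown' if the release/TL is not in the list.
    """
    release, tl, sp, build = oslevel.strip().split("-")
    entry = FIX_LEVELS.get((release, tl))
    if entry is None:
        return "unknown"
    min_sp, min_build, apar = entry
    if (int(sp), int(build)) < (min_sp, min_build):
        return "apply " + apar
    return "ok"


if __name__ == "__main__":
    # Hypothetical example levels, not taken from the affected system.
    for level in ("7100-03-04-1441", "6100-09-06-1543", "7200-01-01-1642"):
        print(level, "->", needs_fix(level))
```

This only compares version strings. An interim fix may already be installed on a lower level, so querying the APAR on the host itself (for example with `instfix -ik IV66228`) remains the more reliable check.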

