故障描述
业务执行特定模块时报错,比如:
lis中语句如下:
select mzhm, max(shsj) as shsj, jg from (select a.outpatient_id as mzhm, a.check_time as shsj, (select quantitative_result from lis_inspection_result rst where rst.inspection_id = a.inspection_id) as jg from lis_inspection_sample a, lis_requisition_item b where a.requisition_id = b.requisition_id and check_time > sysdate - 4 and b.charge_item_id in ('LIS062943', 'LIS080455', 'LIS069172', 'LIS080454', 'LIS080073', 'LIS080072', 'LIS080450', 'LIS080459', 'LIS080451', 'LIS080491', 'LIS080492')) aa group by mzhm, jg;
报错如下图:
解决方法
查询dba_2pc_pending视图,可看到确实有此事务
一、尝试将此分布式事务rollback
rollback force '13.3.394006';
等待数分钟未能正常回滚
二、如果rollback,可以采用手工清理事务的方式
set transaction use rollback segment SYSTEM; delete from sys.pending_trans$ where local_tran_id = '13.3.394006'; delete from sys.pending_sessions$ where local_tran_id = '13.3.394006'; delete from sys.pending_sub_sessions$ where local_tran_id ='13.3.394006'; commit;
再次查询dba_2pc_pending视图
已无事务信息
但在当前数据库执行语句时依旧报错
ORA-01591: lock held by in-doubt distributed transaction 13.3.394006
即使是重启数据库实例也无法释放
三、如果前两者都不行,都不能恢复业务,那么进一步处理
参考文档:(DBMS_LOGSTDBY.BUILD Seems to Hang And Does Not Return To SQL prompt. (Doc ID 747495.1) )
查询rollback segment 基表 x$ktuxe
SELECT KTUXEUSN, KTUXESLT, KTUXESQN, /* Transaction ID */ KTUXESTA Status,KTUXECFL Flags FROM x$ktuxe WHERE ktuxesta!='INACTIVE' AND ktuxeusn= 1;
KTUXEUSN KTUXESLT KTUXESQN STATUS FLAGS
---------- ---------- ---------- ---------------- ------------------------
9 26 1443117 ACTIVE NONE
11 14 9246366 ACTIVE NONE
13 3 394006 PREPARED SCO|COL|REV|DEAD
尝试手工回滚这个事务
commit force '13.3.394006';
报错:ORA-02058: no prepared transaction found with ID 13.3.394006
处理方法:手工插入pending_trans$基表数据后再进行删除
a
lter system disable distributed recovery; insert into pending_trans$ ( LOCAL_TRAN_ID, GLOBAL_TRAN_FMT, GLOBAL_ORACLE_ID, STATE, STATUS, SESSION_VECTOR, RECO_VECTOR, TYPE#, FAIL_TIME, RECO_TIME) values( ‘13.3.394006’, /* <== 这里替换成报错的事务号,即13.3.394006 */ 306206, /* */ 'XXXXXXX.12345.1.2.3', /* 这些不用修改 */ 'prepared','P', hextoraw( '00000001' ), hextoraw( '00000000' ), 0, sysdate, sysdate ); insert into pending_sessions$ values( ‘13.3.394006’, /* <== 这里替换成报错的事务号,即13.3.394006 */ 1, hextoraw('05004F003A1500000104'), /* 这些不用修改 */ 'C', 0, 30258592, '', 146); /* 这些不用修改 */ commit; commit force ‘13.3.394006’; If commit force raises an error then note the error message and execute the following: delete from pending_trans$ where local_tran_id=‘13.3.394006’; delete from pending_sessions$ where local_tran_id=‘13.3.394006’; commit;
alter system enable distributed recovery;
总结与建议
ORA-01591错误一般是由于分布式事务造成的,造成分布式事务失败的原因主要是库之间的网络突然异常,造成两个库中的事务信息不一致,所以会有残余的分布式事务信息。对于绝大多数情况,当恢复连接或CRASH的数据库重新启动后,会自动解决分布式事务,不需要人工干预。当特殊情况,网络异常,触发特殊数据库BUG时,未成自动recovery事务时,才使用人工操作的方式来维护分布式事务。