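Environment:
DB: Oracle 11.2.0.4.0 RAC
OS: AIX 7.1
Symptoms:
1. Automatic database backups on the CV backup platform failed with ORA-01513: INVALID CURRENT TIME RETURNED BY OPERATING SYSTEM. The automatic retry then succeeded (the problem is intermittent).
2. The database alert log reported the following error: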
Sun Jan 00 00:00:00 1900
opidcl aborting process unknown ospid (18875110) as a result of ORA-1513
Where:
ORA-01513 means the operating system returned an invalid time, that is, a time not between 1988 and 2121.
ORA-01513: invalid current time returned by operating system
Cause: The operating system returned a time that was not between 1988 and 2121.
Action: Correct the time kept by the operating system.
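The alert log timestamp Sun Jan 00 00:00:00 1900 is consistent with a C struct tm whose fields are all zero: tm_year counts years since 1900, and tm_mon and tm_wday are zero-based, so zero means January and Sunday. The Python sketch below illustrates that reading together with the 1988 to 2121 range check. The function names, the output layout, and the inclusive bounds are assumptions made for illustration, not Oracle's actual code.

```python
# Illustrative sketch only: shows how a zero-filled C "struct tm" would print
# and why it falls outside the year range quoted by ORA-01513.
DAY_NAMES = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_struct_tm(tm_sec, tm_min, tm_hour, tm_mday, tm_mon, tm_year, tm_wday):
    """Format struct tm style fields in the layout seen in the alert log."""
    return "{} {} {:02d} {:02d}:{:02d}:{:02d} {}".format(
        DAY_NAMES[tm_wday],      # tm_wday: days since Sunday (0..6)
        MONTH_NAMES[tm_mon],     # tm_mon: months since January (0..11)
        tm_mday, tm_hour, tm_min, tm_sec,
        1900 + tm_year,          # tm_year: years since 1900
    )


def year_is_accepted(tm_year):
    """True if the calendar year lies in 1988..2121 (inclusive bounds assumed)."""
    return 1988 <= 1900 + tm_year <= 2121


if __name__ == "__main__":
    # A struct tm in which every field is zero.
    print(format_struct_tm(0, 0, 0, 0, 0, 0, 0))  # Sun Jan 00 00:00:00 1900
    print(year_is_accepted(0))                    # False: 1900 is out of range
```

Under this reading, the 1900 date points to a zeroed time structure being handed back to the process, not to a system clock that was really set to 1900.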
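Root cause analysis:
Did the operating system clock really reset itself to the year 1900?
The operating system logs contained no related entries, and the database showed no other anomalies during the incident.
The suspicion was that the time lookup made on the database server went wrong while the CV backup platform was running the backup, and that the operating system clock did not actually go back to 1900; otherwise the RAC cluster would have misbehaved as well.
The relevant information was eventually found on the IBM support site.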
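One way to support this suspicion is to check that the alert log entries around each ORA-1513 abort carry normal timestamps. The Python sketch below pairs each abort with the timestamp line before it and the one it was logged under. The regular expressions are modelled on the two alert log lines shown earlier; the first two lines of the sample text and all function names are hypothetical.

```python
import re

# Timestamp lines such as "Sun Jan 00 00:00:00 1900" (layout taken from the log above).
TS_RE = re.compile(
    r"^(Sun|Mon|Tue|Wed|Thu|Fri|Sat) "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    r"[ 0-9]\d \d{2}:\d{2}:\d{2} \d{4}$"
)
# Abort lines such as "... ospid (18875110) as a result of ORA-1513".
ABORT_RE = re.compile(r"ospid \((\d+)\) as a result of ORA-1513")


def find_ora_1513(lines):
    """Return (previous_timestamp, own_timestamp, ospid) for each ORA-1513 abort."""
    previous_ts = None
    current_ts = None
    hits = []
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            # Remember the last two timestamp lines seen so far.
            previous_ts, current_ts = current_ts, line
            continue
        match = ABORT_RE.search(line)
        if match:
            hits.append((previous_ts, current_ts, int(match.group(1))))
    return hits


# Hypothetical sample: only the last two lines come from the real alert log.
SAMPLE = """Tue Aug 31 02:00:01 2021
Thread 1 advanced to log sequence 100 (LGWR switch)
Sun Jan 00 00:00:00 1900
opidcl aborting process unknown ospid (18875110) as a result of ORA-1513
"""

if __name__ == "__main__":
    for previous_ts, own_ts, ospid in find_ora_1513(SAMPLE.splitlines()):
        print("ospid", ospid, "| logged under:", own_ts, "| previous entry:", previous_ts)
```

If the previous entry shows a normal date while only the abort itself is stamped 1900, the bad time is limited to the aborting process, which fits the reasoning above.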
https://www.ibm.com/support/pages/apar/IT14941
IT14941: RMAN ABORT IN CONTEXT WITH TIVOLI STORAGE MANAGER BACKUP: ORA-01513: INVALID CURRENT TIME RETURNED BY OPERATING SYSTEM
Excerpts follow:
Error description
Users who are backing up Oracle on UNIX or Linux may encounter oracle channel aborts with oracle reporting the error: ORA-01513: invalid current time returned by operating system
When these shared libraries call the system function localtime_r(), that call is received by the oracle process which negotiates it with the operating system.
Oracle development have admitted that there is a problem in their software which can result in invalid values being repeatedly returned to these localtime_r() calls.
Once localtime_r() has started to return invalid data, it will be doing this until the oracle channel process finally aborts with the error ORA-01513
Oracle Reference:
SR 3-12029500711 : Zero time stamps in the sbtio.log
Bug 22617228
Patch 22617228: AIX SYSTEM CALL LOCATIME_R() - ZERO TIME STAMPS IN THE SBTIO.LOG
A fix has been provided by Oracle to address this issue in 'Patches and Updates', Patch Name/Number = 22617228 The platform can be selected.
The risk for the occurrence of the symptom increases with the frequency of localtime_r() calls.
By contrast, during normal backup / restore operations without verbose tracing the problem often does not become apparent.
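Solution:
Based on the APAR, downloading patch 22617228 from MOS and applying it should resolve this kind of problem. For production environments, test the patch thoroughly in advance before applying it.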
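Before and after patching, it helps to confirm whether patch 22617228 is already listed in the Oracle home inventory. The Python sketch below assumes the output of "opatch lsinventory" has been saved as text and that applied patches appear on lines of the form "Patch <number> : applied on <date>". That layout, the sample text, and the function names are assumptions to verify against real output.

```python
import re

# Assumed line layout: "Patch  22617228     : applied on Tue Aug 31 20:15:00 CST 2021"
PATCH_LINE_RE = re.compile(r"^\s*Patch\s+(\d+)\s*:\s*applied on", re.MULTILINE)


def applied_patches(inventory_text):
    """Return the set of patch numbers found in saved lsinventory output."""
    return set(PATCH_LINE_RE.findall(inventory_text))


def is_patch_applied(inventory_text, patch_id="22617228"):
    """True if the given patch number is listed as applied."""
    return patch_id in applied_patches(inventory_text)


# Hypothetical inventory text, for illustration only.
SAMPLE_INVENTORY = """Interim patches (1) :

Patch  22617228     : applied on Tue Aug 31 20:15:00 CST 2021
   Created on 1 Jan 2017, 00:00:00 hrs
"""

if __name__ == "__main__":
    print(is_patch_applied(SAMPLE_INVENTORY))          # patch number is listed
    print(is_patch_applied("Interim patches (0) :"))   # nothing listed
```

On RAC, the same check would need to be run against the Oracle home on every node.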
#####chenjuchao 20210831 20:15#####
You are welcome to follow my WeChat official account "IT小Chen".