Using Oracle Trace File Analyzer (TFA), Part 2

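This article describes how to use Oracle Trace File Analyzer (TFA): how to collect logs with diagcollect in different scenarios (all nodes, specific instances, alert logs, and so on), and how to use the analyze command to examine error messages and logs when troubleshooting database instances.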

Part 1 covered the basic installation of TFA and some troubleshooting. This part looks at how to use TFA.

I. Collecting logs with diagcollect

This command collects diagnostic logs. Its syntax is as follows:

tfactl diagcollect [-all | [component_name1] [component_name2] ... [component_nameN]] [-node all|local|n1,n2,..] [-tag description] [-z filename] [-since nh|d | -from time -to time | -for time] [-nocopy] [-notrim] [-silent] [-nocores] [-collectalldirs] [-collectdir dir1,dir2..] [-examples]
components: -ips|-database|-asm|-crsclient|-dbclient|-dbwlm|-tns|-rhp|-procinfo|-afd|-crs|-wls|-emagent|-oms|-ocm|-emplugins|-em|-acfs|-install|-cfgtools|-os|-ips|-ashhtml|-ashtext|-awrhtml|-awrtext
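It looks long and complicated. The meaning of each parameter is explained in the official documentation (diagcollect parameter reference). The most common forms are shown below.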

1. Collect all trace logs from all nodes

tfactl> diagcollect -all                                                        

Collecting data for the last 12 hours for this component ...
Collecting data for all nodes
Creating ips package in master node ...
Trying ADR basepath /u01/app/oracle
Trying to use ADR homepath diag/rdbms/orcl/orcl ...
Submitting request to generate package for ADR homepath /u01/app/oracle/diag/rdbms/orcl/orcl
Trying to use ADR homepath diag/rdbms/prod1/PROD1 ...
Submitting request to generate package for ADR homepath /u01/app/oracle/diag/rdbms/prod1/PROD1
Master package completed for ADR homepath /u01/app/oracle/diag/rdbms/orcl/orcl
Created package 9 based on time range 2017-03-21 20:26:00.000000 +08:00 to 2017-03-22 08:26:00.000000 +08:00, correlation level basic
Master package completed for ADR homepath /u01/app/oracle/diag/rdbms/prod1/PROD1
Created package 9 based on time range 2017-03-21 20:26:00.000000 +08:00 to 2017-03-22 08:26:00.000000 +08:00, correlation level basic


Collection Id : 2017032208265012cr2

Detailed Logging at : /u01/app/oracle/tfa/repository/collection_Wed_Mar_22_08_26_50_CST_2017_node_all/diagcollect_20170322082650_12cr2.log
2017/03/22 08:27:06 CST : Collection Name : tfa_Wed_Mar_22_08_26_50_CST_2017.zip
2017/03/22 08:27:06 CST : Collecting diagnostics from hosts : [12cr2]
2017/03/22 08:27:06 CST : Scanning of files for Collection in progress...
2017/03/22 08:27:06 CST : Collecting additional diagnostic information...
2017/03/22 08:27:11 CST : Getting list of files satisfying time range [03/21/2017 20:27:06 CST, 03/22/2017 08:27:11 CST]
2017/03/22 08:27:12 CST : Collecting ADR incident files...
2017/03/22 08:27:28 CST : Completed collection of additional diagnostic information...
2017/03/22 08:27:32 CST : Completed Local Collection
.----------------------------------.
|        Collection Summary        |
+-------+-----------+-------+------+
| Host  | Status    | Size  | Time |
+-------+-----------+-------+------+
| 12cr2 | Completed | 244kB |  26s |
'-------+-----------+-------+------'

Logs are being collected to: /u01/app/oracle/tfa/repository/collection_Wed_Mar_22_08_26_50_CST_2017_node_all
/u01/app/oracle/tfa/repository/collection_Wed_Mar_22_08_26_50_CST_2017_node_all/12cr2.tfa_Wed_Mar_22_08_26_50_CST_2017.zip

As you can see, I have two instances, and both were collected. To save space, the examples below show only the commands, not their output.

2. Collect all logs from the last 8 hours

tfactl diagcollect -all -since 8h

3. Collect trace logs for the hrdb and fdb databases from the last day, and name the packaged file foo

tfactl diagcollect -database hrdb,fdb -since 1d -z foo

4. Collect CRS trace logs and OS logs from RAC nodes 1 and 2 for the last 6 hours

tfactl diagcollect -crs -os -node node1,node2 -since 6h

5. Collect ASM logs on node1 between 2016-09-22 and 2016-09-23 21:00:00

tfactl diagcollect -asm -node node1 -from Sep/22/2016 -to "Sep/23/2016 21:00:00"

6. Collect trace logs for 2016-09-23

tfactl diagcollect -for Sep/23/2016

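7. Adding a time of day to the command above collects logs from 12 hours before that time to 12 hours after it

The following collects trace logs between 2016-09-22 09:00 and 2016-09-23 09:00:
tfactl diagcollect -for "Sep/22/2016 21:00:00"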
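
The window implied by a -for value with a time of day can be checked with a little date arithmetic. The sketch below is only an illustration of the 12-hours-before, 12-hours-after behaviour described above; the helper name for_window is hypothetical, and the strptime format is an assumed Python equivalent of MMM/DD/YYYY HH24:MI:SS (it relies on English month abbreviations, as in the default C locale).

```python
from datetime import datetime, timedelta

# Assumed Python equivalent of the MMM/DD/YYYY HH24:MI:SS format.
TFA_TIME_FORMAT = "%b/%d/%Y %H:%M:%S"


def for_window(for_value, half_span=timedelta(hours=12)):
    """Return (start, end) of the window centred on a -for timestamp."""
    center = datetime.strptime(for_value, TFA_TIME_FORMAT)
    return center - half_span, center + half_span


start, end = for_window("Sep/22/2016 21:00:00")
print(start)  # 2016-09-22 09:00:00
print(end)    # 2016-09-23 09:00:00
```

This only reproduces the arithmetic; it does not call tfactl, and the actual file selection is done by TFA itself.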

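The relevant portions of these logs are trimmed out and packaged. By default the result is stored under $ORACLE_BASE/tfa/repository/; the collection output shows the exact file location, and you can go to that directory to view the logs.
You can also add or remove collection directories, collect automatically, add or remove nodes, and so on, to suit your needs. These are not covered here; see Diagnostic Collection Commands in the official documentation.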
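
If you run these collections from a script, the common forms in this section can be assembled from a few options. The following is a minimal sketch, not part of TFA: build_diagcollect is a hypothetical helper, and it only emits flags that appear in the syntax above (-all, component flags, -node, -since, -z). It builds the argument list and prints it; it does not run tfactl.

```python
def build_diagcollect(components=None, nodes=None, since=None,
                      zip_name=None, collect_all=False):
    """Build the argument list for a tfactl diagcollect call.

    components maps a component name (without the leading dash) to an
    optional list of targets, e.g. {"database": ["hrdb", "fdb"], "os": None}.
    """
    args = ["tfactl", "diagcollect"]
    if collect_all:
        args.append("-all")
    for name, targets in (components or {}).items():
        args.append("-" + name)
        if targets:
            # Targets are passed as one comma-separated value.
            args.append(",".join(targets))
    if nodes:
        args += ["-node", ",".join(nodes)]
    if since:
        args += ["-since", since]
    if zip_name:
        args += ["-z", zip_name]
    return args


# Example 3 above: hrdb and fdb, last day, package named foo.
print(" ".join(build_diagcollect(components={"database": ["hrdb", "fdb"]},
                                 since="1d", zip_name="foo")))

# Example 4 above: CRS and OS logs from two nodes, last 6 hours.
print(" ".join(build_diagcollect(components={"crs": None, "os": None},
                                 nodes=["node1", "node2"], since="6h")))
```

The returned list can be passed to subprocess.run on a host where tfactl is on the PATH.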

II. Analyzing logs with the analyze command

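Collection would be of limited use without automated analysis, so Oracle provides the analyze command to help examine the current alert and system logs. Some commonly used forms are described below.
First, the syntax is as follows; for details on each parameter, see the Analyze parameter reference.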
tfactl analyze [-search "pattern"] [-comp db | asm | crs | acfs | os | osw | oswslabinfo | all] [-type error | warning | generic] [-since nh|d] [-from "MMM/DD/YYYY HH24:MI:SS"] [-to "MMM/DD/YYYY HH24:MI:SS"] [-for "MMM/DD/YYYY HH24:MI:SS"] [-node all | local | n1,n2,...] [-verbose] [-o file]

1. Analyze messages matching "error" from the last two days

tfactl> analyze -search "error" -since 2d                                       
INFO: analyzing all (Alert and Unix System Logs) logs for the last 2880 minutes...  Please wait...
INFO: analyzing host: 12cr2

                    Report title: Analysis of Alert,System Logs
               Report date range: last ~2 day(s)
      Report (default) time zone: CST - China Standard Time
             Analysis started at: 22-Mar-2017 09:12:44 AM CST
           Elapsed analysis time: 5 second(s).
              Configuration file: /u01/app/oracle/tfa/12cr2/tfa_home/ext/tnt/conf/tnt.prop
             Configuration group: all
                       Parameter: error
             Total message count:         18,645, from 14-Mar-2017 09:47:46 AM CST to 22-Mar-2017 09:10:02 AM CST
Messages matching last ~2 day(s):          4,755, from 20-Mar-2017 09:21:15 AM CST to 22-Mar-2017 09:10:02 AM CST
                  Matching regex: error
                  Case sensitive: false
                     Match count: 228

[Source: /var/log/messages-20170320, Line: 9021]
Mar 20 09:21:25 2017
12cr2 rngd: read error

[Source: /var/log/messages-20170320, Line: 9022]
Mar 20 09:21:25 2017
12cr2 rngd: read error
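To save space, only the key part of the output is shown above. There were 228 matches in the last two days. The string after -search is a keyword of your choosing, so the ORA- errors we usually care about are just as easy to find: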
tfactl> analyze -search "ORA-" -since 7d                                        
INFO: analyzing all (Alert and Unix System Logs) logs for the last 10080 minutes...  Please wait...
INFO: analyzing host: 12cr2

                    Report title: Analysis of Alert,System Logs
               Report date range: last ~7 day(s)
      Report (default) time zone: CST - China Standard Time
             Analysis started at: 22-Mar-2017 09:17:48 AM CST
           Elapsed analysis time: 5 second(s).
              Configuration file: /u01/app/oracle/tfa/12cr2/tfa_home/ext/tnt/conf/tnt.prop
             Configuration group: all
                       Parameter: ORA-
             Total message count:         21,276, from 14-Mar-2017 09:47:46 AM CST to 22-Mar-2017 09:17:18 AM CST
Messages matching last ~7 day(s):         11,382, from 15-Mar-2017 09:20:01 AM CST to 22-Mar-2017 09:17:18 AM CST
                  Matching regex: ORA-
                  Case sensitive: false
                     Match count: 373

[Source: /u01/app/oracle/diag/rdbms/prod1/PROD1/trace/alert_PROD1.log, Line: 459]
Mar 15 10:59:47 2017
Dispatchers and shared servers shutdown
ALTER DATABASE CLOSE NORMAL
Stopping Emon pool
ORA-1109 signalled during: ALTER DATABASE CLOSE NORMAL...
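
When a search returns hundreds of matches, it can help to summarize the report rather than read it top to bottom. The sketch below assumes the report has been saved as text (for example with the -o option) and that it has the layout shown above: a "Match count:" line followed by "[Source: path, Line: n]" headers. The function name summarize_report is hypothetical.

```python
import re

# Patterns based on the report layout shown in the sample output above.
COUNT_RE = re.compile(r"^\s*Match count:\s*([\d,]+)\s*$")
SOURCE_RE = re.compile(r"^\[Source: (?P<path>.+), Line: (?P<line>\d+)\]$")


def summarize_report(text):
    """Return (match_count, {source_file: number_of_entries})."""
    match_count = None
    per_file = {}
    for line in text.splitlines():
        count = COUNT_RE.match(line)
        if count:
            match_count = int(count.group(1).replace(",", ""))
            continue
        source = SOURCE_RE.match(line.strip())
        if source:
            path = source.group("path")
            per_file[path] = per_file.get(path, 0) + 1
    return match_count, per_file


sample = """\
                     Match count: 228

[Source: /var/log/messages-20170320, Line: 9021]
Mar 20 09:21:25 2017
12cr2 rngd: read error

[Source: /var/log/messages-20170320, Line: 9022]
Mar 20 09:21:25 2017
12cr2 rngd: read error
"""
print(summarize_report(sample))
```

Grouping by source file shows quickly whether the matches come from the OS logs or from a database alert log.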

2. Analyze the database instance logs from the last two days

tfactl> analyze -comp db -since 2d                                              
INFO: analyzing db (DB Alert Logs) logs for the last 2880 minutes...  Please wait...
INFO: analyzing host: 12cr2

                      Report title: DB Alert Logs
                 Report date range: last ~2 day(s)
        Report (default) time zone: CST - China Standard Time
               Analysis started at: 22-Mar-2017 09:33:29 AM CST
             Elapsed analysis time: 2 second(s).
                Configuration file: /u01/app/oracle/tfa/12cr2/tfa_home/ext/tnt/conf/tnt.prop
               Configuration group: db
               Total message count:          5,514, from 14-Mar-2017 04:20:29 PM CST to 17-Mar-2017 02:10:31 PM CST
  Messages matching last ~2 day(s):              0
        last ~2 day(s) error count:              0
last ~2 day(s) ignored error count:              0
 last ~2 day(s) unique error count:              0
The -comp parameter accepts os, db, asm, acfs, crs, osw, oswslabinfo, or all. The default is all, which analyzes every component.

3. Analyze logs from the last 5 hours

tfactl analyze -since 5h
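
The analyze output reports the -since window in minutes (2880 for 2d, 10080 for 7d, 60 for 1h in the runs shown in this article). A small sketch of that conversion follows, assuming values have the nh or nd form from the syntax line; since_to_minutes is a hypothetical helper, not a TFA function.

```python
import re


def since_to_minutes(value):
    """Convert a -since value such as '5h' or '2d' to minutes."""
    match = re.fullmatch(r"(\d+)([hd])", value)
    if match is None:
        raise ValueError("expected <n>h or <n>d, got %r" % value)
    amount, unit = int(match.group(1)), match.group(2)
    minutes_per_unit = 60 if unit == "h" else 24 * 60
    return amount * minutes_per_unit


for value in ("1h", "5h", "2d", "7d"):
    print(value, since_to_minutes(value))
```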

4. Analyze messages of type error from the last hour

tfactl>  analyze -since 1h -type error                                          
INFO: analyzing all (Alert and Unix System Logs) logs for the last 60 minutes...  Please wait...
INFO: analyzing host: 12cr2

                       Report title: Analysis of Alert,System Logs
                  Report date range: last ~1 hour(s)
         Report (default) time zone: CST - China Standard Time
                Analysis started at: 22-Mar-2017 09:41:22 AM CST
              Elapsed analysis time: 1 second(s).
                 Configuration file: /u01/app/oracle/tfa/12cr2/tfa_home/ext/tnt/conf/tnt.prop
                Configuration group: all
                Total message count:          3,467, from 20-Mar-2017 10:48:02 AM CST to 22-Mar-2017 09:40:01 AM CST
  Messages matching last ~1 hour(s):             60, from 22-Mar-2017 08:43:16 AM CST to 22-Mar-2017 09:40:01 AM CST
        last ~1 hour(s) error count:              0
last ~1 hour(s) ignored error count:              0
 last ~1 hour(s) unique error count:              0

Message types for last ~1 hour(s)
   Occurrences percent  server name          type
   ----------- -------  -------------------- -----
            60  100.0%  12cr2                generic
   ----------- -------
            60  100.0%

Unique error messages for last ~1 hour(s)
   Occurrences percent  server name          error
   ----------- -------  -------------------- -----
   ----------- -------
             0  100.0%
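The error in -type is not the same as the "error" in -search. Here it refers to the message category, and -type accepts three values: error, warning, and generic.
Roughly, these are three classes of message: severe, warning, and general.
The Oracle documentation describes them as follows: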
Argument Description
error

Error message patterns for database and Oracle ASM alert logs:

.*ORA-00600:.*
.*ORA-07445:.*
.*IPC Send timeout detected. Sender: ospid.*
.*Direct NFS: channel id .* path .* to filer .* PING timeout.*
.*Direct NFS: channel id .* path .* to filer .* is DOWN.*
.*ospid: .* has not called a wait for .* secs.*
.*IPC Send timeout to .* inc .* for msg type .* from opid.*
.*IPC Send timeout: Terminating pid.*
.*Receiver: inst .* binc .* ospid.*
.* terminating instance due to error.*
.*: terminating the instance due to error.*
.*Global Enqueue Services Deadlock detected

Error message patterns for Oracle Grid Infrastructure alert logs:

.*CRS-8011:.*,.*CRS-8013:.*,.*CRS-1607:.*,.*CRS-1615:.*,
.*CRS-1714:.*,.*CRS-1656:.*,.*PRVF-5305:.*,.*CRS-1601:.*,
.*CRS-1610:.*,.*PANIC. CRSD exiting:.*,.*Fatal Error from AGFW Proxy:.*
warning

Warning message patterns for database and Oracle ASM alert logs:

NOTE: process .* initiating offline of disk .*
.*WARNING: cache read a corrupted block group.*
.*NOTE: a corrupted block from group FRA was dumped to
generic

Any messages that do not match any of the preceding patterns.
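
To see why a line such as "rngd: read error" is found by -search "error" yet counts as generic under -type, the classification can be imitated with a handful of the patterns quoted above. This is only a sketch: it uses a subset of the documented patterns, and how TFA applies them internally (anchoring, case sensitivity, order) is an assumption here. The function name classify is hypothetical.

```python
import re

# A subset of the patterns quoted from the Oracle documentation above.
ERROR_PATTERNS = [
    r".*ORA-00600:.*",
    r".*ORA-07445:.*",
    r".* terminating instance due to error.*",
    r".*CRS-1601:.*",
]
WARNING_PATTERNS = [
    r"NOTE: process .* initiating offline of disk .*",
    r".*WARNING: cache read a corrupted block group.*",
]


def classify(message):
    """Return 'error', 'warning', or 'generic' for one log message."""
    # Assumption: errors are checked first, then warnings.
    if any(re.match(p, message) for p in ERROR_PATTERNS):
        return "error"
    if any(re.match(p, message) for p in WARNING_PATTERNS):
        return "warning"
    return "generic"


print(classify("ORA-00600: internal error code, arguments: [kgl]"))
print(classify("12cr2 rngd: read error"))
print(classify("ORA-1109 signalled during: ALTER DATABASE CLOSE NORMAL..."))
```

Under these patterns only the first message is an error; the other two fall through to generic, which is consistent with the zero error count in the -type error run above.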


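In principle you can also analyze logs for a specific time, as the official documentation describes, but in practice the command returned an error: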
tfactl> analyze -comp os -for "Mar/21/2017 11:00:00" -search "ORA"              

ERROR: Invalid value for -for/-from. Supported format MMM/DD/YYYY HH24:MI:SS
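The value matches the format the message asks for, so this appears to be a bug that has not yet been fixed; the time parameters of analyze are best skipped for now.
That covers the basic usage. TFA has many other features and commands, but they seem to be used less often; if you are interested, consult the official documentation.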
