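On January 7, 2019, I saw an alert in Zabbix: the table newsadmin.logs had not had any data changes for a long time. Based on our experience in production, this means OGG is lagging. This table changes very frequently. We added this alert because of an OGG bug: when the lag occurs, Lag at Chkpt and Time Since Chkpt both still look normal, so we monitor data changes on the table instead.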
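The idea behind this alert can be sketched in a few lines of Python. This is only an illustration, not the actual Zabbix item: the function name `is_table_stale` and the 30-minute threshold are assumptions, and how the last change time of the table is obtained (for example from a timestamp column) is left out.

```python
from datetime import datetime, timedelta


def is_table_stale(last_change: datetime, now: datetime, max_idle: timedelta) -> bool:
    """Return True when the table has gone longer than max_idle without a change."""
    return (now - last_change) > max_idle


# Hypothetical values: the 30-minute threshold is an assumption, not from this post.
stale = is_table_stale(
    last_change=datetime(2019, 1, 7, 13, 40, 5),
    now=datetime(2019, 1, 7, 18, 15, 0),
    max_idle=timedelta(minutes=30),
)
print("raise alert" if stale else "ok")
```

The check does not depend on anything OGG reports about itself, which is why it still fires when the lag columns look healthy.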
OGG version
Version 12.2.0.1.160419 OGGCORE_12.2.0.1.0OGGBP_PLATFORMS_160322.1529.2_FBO
info all shows no lag
GGSCI (oradb-4023.jwe.com.sh) 4> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING 2201 00:00:02 00:00:04
REPLICAT RUNNING 1516P1R1 00:00:04 00:00:08
REPLICAT RUNNING 2201R1 00:00:03 00:00:00
Run info on the replicat process and note the RBA information
GGSCI (oradb-4023.jwe.com.sh) 2> info 2201R1
REPLICAT 2201R1 Last Started 2019-01-07 13:31 Status RUNNING
INTEGRATED
Checkpoint Lag 00:00:03 (updated 00:00:00 ago)
Process ID 57058
Log Read Checkpoint File /u01/app/grid/acfsmounts/acfsvol1/products/ogg/dirdat/2201/E0309732
2019-01-07 18:15:04.085190 RBA 17035412
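If you want to pick the trail file and RBA out of this output with a script, a minimal Python sketch could look like the following. The regular expressions are written against the output shown above only; the names `ReadCheckpoint` and `parse_read_checkpoint` are made up for this example, and other OGG versions may lay the text out differently.

```python
import re
from typing import NamedTuple, Optional


class ReadCheckpoint(NamedTuple):
    trail_file: str
    timestamp: str
    rba: int


FILE_RE = re.compile(r"Log Read Checkpoint\s+File\s+(\S+)")
RBA_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)?)\s+RBA\s+(\d+)")


def parse_read_checkpoint(info_output: str) -> Optional[ReadCheckpoint]:
    """Extract trail file, checkpoint timestamp and RBA from `info <replicat>` text."""
    file_match = FILE_RE.search(info_output)
    rba_match = RBA_RE.search(info_output)
    if file_match is None or rba_match is None:
        return None
    return ReadCheckpoint(
        trail_file=file_match.group(1),
        timestamp=rba_match.group(1),
        rba=int(rba_match.group(2)),
    )


# Sample copied from the info output above.
SAMPLE = """\
Log Read Checkpoint File /u01/app/grid/acfsmounts/acfsvol1/products/ogg/dirdat/2201/E0309732
2019-01-07 18:15:04.085190 RBA 17035412
"""
print(parse_read_checkpoint(SAMPLE))
```

The trail file path and the RBA are the two values needed later to position logdump.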
Check the rpt file:
2019-01-07 13:31:10 INFO OGG-06323 NOUSEANSISQLQUOTES is specified. Non ANSI SQL string syntax is used for parameter.
Using the following default columns with matching names:
2019-01-07 13:39:55 INFO OGG-06510 Using the following key columns for target table EMAUD.BOND_RA_RATETENDENCY_V: EID.
2019-01-07 13:40:09 WARNING OGG-01431 Aborted grouped transaction on 'EMAUD.FC_NON', Mapping error.
2019-01-07 13:40:09 WARNING OGG-01003 Repositioning to rba 17732522 in seqno 309732.
2019-01-07 13:40:11 WARNING OGG-01431 Aborted grouped transaction on 'EMAUD.FC_NON', Mapping error.
2019-01-07 13:40:11 WARNING OGG-01003 Repositioning to rba 17732522 in seqno 309732.
2019-01-07 13:40:13 WARNING OGG-01431 Aborted grouped transaction on 'EMAUD.FC_NON', Mapping error.
2019-01-07 13:40:13 WARNING OGG-01003 Repositioning to rba 17732522 in seqno 309732.
2019-01-07 13:40:14 WARNING OGG-01431 Aborted grouped transaction on 'EMAUD.FC_NON', Mapping error.
2019-01-07 13:40:14 WARNING OGG-01003 Repositioning to rba 17732522 in seqno 309732.
2019-01-07 13:40:15 WARNING OGG-01431 Aborted grouped transaction on 'EMAUD.FC_NON', Mapping error.
2019-01-07 13:40:15 WARNING OGG-01003 Repositioning to rba 17732522 in seqno 309732.
2019-01-07 13:40:16 WARNING OGG-01431 Aborted grouped transaction on 'EMAUD.FC_NON', Mapping error.
2019-01-07 13:40:16 WARNING OGG-01003 Repositioning to rba 17732522 in seqno 309732.
2019-01-07 13:40:18 WARNING OGG-01431 Aborted grouped transaction on 'EMAUD.FC_NON', Mapping error.
2019-01-07 13:40:18 WARNING OGG-01003 Repositioning to rba 17732522 in seqno 309732.
2019-01-07 13:40:19 WARNING OGG-01431 Aborted grouped transaction on 'EMAUD.FC_NON', Mapping error.
2019-01-07 13:40:19 WARNING OGG-01003 Repositioning to rba 17732522 in seqno 309732.
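After checking the rpt file, I confirmed this is an old, known problem. Usually restarting the replicat process is enough.
But there is one thing to watch out for: you cannot just restart it blindly. The rpt file shows that the problem first appeared at around 2019-01-07 13:40:09, and this has to be compared with the point in time of the RBA obtained from the info command earlier.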
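Reading the first error time out of the rpt file by eye works, but it can also be scripted. The sketch below is an assumption-laden example: it only understands the two warning formats shown in the rpt excerpt above (OGG-01431 and OGG-01003), and `summarize_rpt` is a name invented here.

```python
import re
from datetime import datetime
from typing import Iterable, Optional, Set, Tuple

ABORT_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+WARNING\s+OGG-01431\b")
REPOS_RE = re.compile(r"WARNING\s+OGG-01003\s+Repositioning to rba (\d+) in seqno (\d+)")


def summarize_rpt(lines: Iterable[str]) -> Tuple[Optional[datetime], Set[Tuple[int, int]]]:
    """Return the time of the first OGG-01431 line and the set of (seqno, rba) reposition targets."""
    first_abort: Optional[datetime] = None
    positions: Set[Tuple[int, int]] = set()
    for line in lines:
        abort = ABORT_RE.match(line.strip())
        if abort and first_abort is None:
            first_abort = datetime.strptime(abort.group(1), "%Y-%m-%d %H:%M:%S")
        repos = REPOS_RE.search(line)
        if repos:
            positions.add((int(repos.group(2)), int(repos.group(1))))
    return first_abort, positions


# Sample lines copied from the rpt excerpt above.
RPT_SAMPLE = [
    "2019-01-07 13:40:09 WARNING OGG-01431 Aborted grouped transaction on 'EMAUD.FC_NON', Mapping error.",
    "2019-01-07 13:40:09 WARNING OGG-01003 Repositioning to rba 17732522 in seqno 309732.",
    "2019-01-07 13:40:11 WARNING OGG-01431 Aborted grouped transaction on 'EMAUD.FC_NON', Mapping error.",
    "2019-01-07 13:40:11 WARNING OGG-01003 Repositioning to rba 17732522 in seqno 309732.",
]
first_abort, positions = summarize_rpt(RPT_SAMPLE)
print(first_abort, positions)
```

When every reposition warning points at the same (seqno, rba) pair, as here, the replicat is retrying the same spot over and over rather than making progress.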
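In fact, from around the initial time A reported in the rpt file, the replicat process was no longer working properly. If the time B at the RBA in the trail has nevertheless moved on past that point (B > A), rashly restarting the replicat process may lose that data and leave the two sides inconsistent.
So you need to look in the trail file for the point in time that corresponds to the RBA address reported by the info command.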
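The comparison described above is simple enough to write down as code. This is a sketch of the rule as stated in this post only (B later than A means do not restart blindly); the function name is invented, and the two B values in the example are hypothetical.

```python
from datetime import datetime


def restart_looks_safe(rba_time: datetime, first_error_time: datetime) -> bool:
    """Apply the rule from the text: safe only if B (time at the RBA) is not later than A."""
    return rba_time <= first_error_time


# A is the first error time from the rpt file; both B values are hypothetical.
a = datetime(2019, 1, 7, 13, 40, 9)
print(restart_looks_safe(datetime(2019, 1, 7, 13, 40, 5), a))  # B before A
print(restart_looks_safe(datetime(2019, 1, 7, 13, 55, 0), a))  # B after A
```

A result of False does not say what to do instead; it only says that a plain restart should not be the next step.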
Here is how to find it:
[ogg@oradb-4023 ogg]$ logdump
Oracle GoldenGate Log File Dump Utility for Oracle
Version 12.2.0.1.160419 OGGCORE_12.2.0.1.0OGGBP_PLATFORMS_160322.1529.2
Copyright (C) 1995, 2016, Oracle and/or its affiliates. All rights reserved.
2019-01-07 18:24:13 WARNING OGG-10173 (GLOBALS) line 1: Parsing error, [NOUSEANSISQLQUOTES] is deprecated.
2019-01-07 18:24:13 INFO OGG-06323 NOUSEANSISQLQUOTES is specified. Non ANSI SQL string syntax is used for parameter.
Logdump 302 >open /u01/app/grid/acfsmounts/acfsvol1/products/ogg/dirdat/2201/E0309732
Current LogTrail is /u01/app/grid/acfsmounts/acfsvol1/products/ogg/dirdat/2201/E0309732
Logdump 303 >detail data
Logdump 304 >ggstoken detail
Logdump 305 >ghdr on
Logdump 307 >pos 17035412
Reading forward from RBA 17035412
Logdump 308 >n
___________________________________________________________________
Hdr-Ind : E (x45) Partition : . (x0c)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 648 (x0288) IO Time : 2019/01/07 13:40:05.000.000
IOType : 134 (x86) OrigNode : 255 (xff)
TransInd : . (x00) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
AuditRBA : 263078 AuditPos : 21579280
Continued : N (x00) RecCount : 1 (x01)
2019/01/07 13:40:05.000.000 GGSUnifiedUpdate Len 648 RBA 17035412
Name: NEWSADMIN.FO_FD_YLIST (TDR Index: 1)
After Image: Partition 12 GU b
0000 0142 0000 000a 0000 0218 75e0 d66b f0b4 0001 | ...B........u..k....
000a 0000 0000 0000 0000 270e 0002 0015 0000 3230 | ..........'.......20
3138 2d30 382d 3134 3a32 313a 3130 3a35 3400 0300 | 18-08-14:21:10:54...
1500 0032 3031 392d 3031 2d30 373a 3132 3a33 313a | ...2019-01-07:12:31:
3131 0004 0003 0000 3000 0500 0a00 0000 0000 0000 | 11......0...........
0000 5a00 0600 15ff ff31 3930 302d 3031 2d30 313a | ..Z......1900-01-01:
3030 3a30 303a 3030 0007 000f 0000 000b 3233 3338 | 00:00:00........2338
Before Image Len 326 (x00000146)
BeforeColumnLen 322 (x00000142)
Column 0 (x0000), Len 10 (x000a)
0000 0218 75e0 d66b f0b4 | ....u..k..
Column 1 (x0001), Len 10 (x000a)
0000 0000 0000 0000 270e | ........'.
Column 2 (x0002), Len 21 (x0015)
0000 3230 3138 2d30 382d 3134 3a32 313a 3130 3a35 | ..2018-08-14:21:10:5
34 | 4
Column 3 (x0003), Len 21 (x0015)
0000 3230 3139 2d30 312d 3037 3a31 323a 3331 3a31 | ..2019-01-07:12:31:1
31 | 1
Column 4 (x0004), Len 3 (x0003)
0000 30 | ..0
Column 5 (x0005), Len 10 (x000a)
0000 0000 0000 0000 005a | .........Z
Column 6 (x0006), Len 21 (x0015)
ffff 3139 3030 2d30 312d 3031 3a30 303a 3030 3a30 | ..1900-01-01:00:00:0
30 | 0
Column 7 (x0007), Len 15 (x000f)
0000 000b 3233 3338 3937 3435 3034 31 | ....23389745041
Column 8 (x0008), Len 10 (x000a)
0000 0000 0000 0000 0004 | ..........
Column 9 (x0009), Len 10 (x000a)
0000 0000 0000 0000 0568 | .........h
Column 10 (x000a), Len 4 (x0004)
ffff 0000 | ....
Column 11 (x000b), Len 10 (x000a)
0000 0166 d2fa b89c 0810 | ...f......
Column 12 (x000c), Len 14 (x000e)
0000 000a 3130 3030 3833 3232 3430 | ....1000832240
Column 13 (x000d), Len 12 (x000c)
0000 0008 3738 3034 3937 3732 | ....78049772
Column 14 (x000e), Len 21 (x0015)
0000 3230 3138 2d30 312d 3031 3a30 303a 3030 3a30 | ..2018-01-01:00:00:0
30 | 0
Column 15 (x000f), Len 12 (x000c)
0000 0008 b9dc c0ed c6da bbf5 | ............
Column 16 (x0010), Len 15 (x000f)
0000 000b 2d39 2e37 3234 3431 3833 34 | ....-9.72441834
Column 17 (x0011), Len 13 (x000d)
0000 0009 3131 3630 2f31 3338 36 | ....1160/1386
Column 18 (x0012), Len 14 (x000e)
0000 000a 322e 3933 3030 3136 3133 | ....2.93001613
After Image Len 322 (x00000142)
Column 0 (x0000), Len 10 (x000a)
0000 0218 75e0 d66b f0b4 | ....u..k..
Column 1 (x0001), Len 10 (x000a)
0000 0000 0000 0000 270e | ........'.
Column 2 (x0002), Len 21 (x0015)
0000 3230 3138 2d30 382d 3134 3a32 313a 3130 3a35 | ..2018-08-14:21:10:5
34 | 4
Column 3 (x0003), Len 21 (x0015)
0000 3230 3139 2d30 312d 3037 3a31 333a 3430 3a30 | ..2019-01-07:13:40:0
35 | 5
Column 4 (x0004), Len 3 (x0003)
0000 30 | ..0
Column 5 (x0005), Len 10 (x000a)
0000 0000 0000 0000 005a | .........Z
Column 6 (x0006), Len 21 (x0015)
ffff 3139 3030 2d30 312d 3031 3a30 303a 3030 3a30 | ..1900-01-01:00:00:0
30 | 0
Column 7 (x0007), Len 15 (x000f)
0000 000b 3233 3339 3032 3535 3834 30 | ....23390255840
Column 8 (x0008), Len 10 (x000a)
0000 0000 0000 0000 0004 | ..........
Column 9 (x0009), Len 10 (x000a)
0000 0000 0000 0000 0568 | .........h
Column 10 (x000a), Len 4 (x0004)
ffff 0000 | ....
Column 11 (x000b), Len 10 (x000a)
0000 0166 d2fa b89c 0810 | ...f......
Column 12 (x000c), Len 14 (x000e)
0000 000a 3130 3030 3833 3232 3430 | ....1000832240
Column 13 (x000d), Len 12 (x000c)
0000 0008 3738 3034 3937 3732 | ....78049772
Column 14 (x000e), Len 21 (x0015)
0000 3230 3138 2d30 312d 3031 3a30 303a 3030 3a30 | ..2018-01-01:00:00:0
30 | 0
Column 15 (x000f), Len 12 (x000c)
0000 0008 b9dc c0ed c6da bbf5 | ............
Column 16 (x0010), Len 15 (x000f)
0000 000b 2d39 2e37 3234 3431 3833 34 | ....-9.72441834
Column 17 (x0011), Len 13 (x000d)
0000 0009 3131 3630 2f31 3338 36 | ....1160/1386
Column 18 (x0012), Len 14 (x000e)
0000 000a 322e 3933 3331 3533 3039 | ....2.93315309
GGS tokens:
TokenID x52 'R' ORAROWID Info x00 Length 20
4141 4245 3162 4142 6541 4143 6f49 3741 4151 0001 | AABE1bABeAACoI7AAQ..
TokenID x74 't' ORATAG Info x01 Length 1
00 | .
TokenID x4c 'L' LOGCSN Info x00 Length 12
3439 3737 3133 3931 3834 3334 | 497713918434
TokenID x36 '6' TRANID Info x00 Length 14
3230 2e31 372e 3131 3133 3537 3633 | 20.17.11135763
TokenID x69 'i' ORATHREADID Info x01 Length 2
0002 | ..
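From logdump, the time at that RBA is 2019/01/07 13:40:05.000.000, slightly earlier than the 2019-01-07 13:40:09 in the rpt file. A small delay between the problem occurring and it being written to the rpt file is normal. This shows that the RBA did not advance this time, so it is safe to restart.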
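For reference, the IO Time in the record header can be extracted and compared with the rpt timestamp programmatically. The sketch below assumes the two trailing groups in `13:40:05.000.000` are milliseconds and microseconds; `parse_io_time` is a name made up for this example.

```python
import re
from datetime import datetime
from typing import Optional

IO_TIME_RE = re.compile(
    r"IO Time\s*:\s*(\d{4})/(\d{2})/(\d{2}) (\d{2}):(\d{2}):(\d{2})\.(\d{3})\.(\d{3})"
)


def parse_io_time(record_text: str) -> Optional[datetime]:
    """Return the IO Time of a logdump record header, or None if it is not present."""
    match = IO_TIME_RE.search(record_text)
    if match is None:
        return None
    year, month, day, hour, minute, second, millis, micros = map(int, match.groups())
    # Assumption: the last two groups are milliseconds and microseconds.
    return datetime(year, month, day, hour, minute, second, millis * 1000 + micros)


# Header line copied from the logdump output above.
HEADER = "RecLength : 648 (x0288) IO Time : 2019/01/07 13:40:05.000.000"
rba_time = parse_io_time(HEADER)
first_error_time = datetime(2019, 1, 7, 13, 40, 9)  # from the rpt file
gap = first_error_time - rba_time
print(rba_time, gap)
```

Here the record at the checkpoint RBA is 4 seconds older than the first warning in the rpt file, which matches the conclusion that the RBA did not move on.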
Restart procedure
GGSCI (oradb-4023.jwe.com.sh) 7> stop 2201R1
Sending STOP request to REPLICAT 2201R1 ...
Request processed.
GGSCI (oradb-4023.jwe.com.sh) 8> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING 2201 00:00:00 00:00:08
REPLICAT RUNNING 1516P1R1 00:00:00 00:00:09
REPLICAT STOPPED 2201R1 00:00:07 00:00:05
GGSCI (oradb-4023.jwe.com.sh) 9> start 2201R1
Sending START request to MANAGER ...
REPLICAT 2201R1 starting
Once it has started normally, the Lag at Chkpt column shows the real lag
GGSCI (oradb-4023.jwe.com.sh) 10> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING 2201 00:00:02 00:00:06
REPLICAT RUNNING 1516P1R1 00:00:06 00:00:00
REPLICAT RUNNING 2201R1 04:47:38 00:00:08
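To turn the Lag at Chkpt column into a number that can be compared against a threshold, the info all output can be parsed roughly like this. It is a sketch that assumes the whitespace-separated layout shown above (five fields per EXTRACT or REPLICAT row); the function names are invented for this example.

```python
from datetime import timedelta
from typing import Dict


def parse_hms(text: str) -> timedelta:
    """Convert an HH:MM:SS string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def lag_by_group(info_all_output: str) -> Dict[str, timedelta]:
    """Map each EXTRACT/REPLICAT group name to its Lag at Chkpt value."""
    lags: Dict[str, timedelta] = {}
    for line in info_all_output.splitlines():
        fields = line.split()
        # Process rows have five fields: program, status, group, lag, time since checkpoint.
        if len(fields) == 5 and fields[0] in ("EXTRACT", "REPLICAT"):
            lags[fields[2]] = parse_hms(fields[3])
    return lags


# Sample copied from the info all output above.
INFO_ALL = """\
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING 2201 00:00:02 00:00:06
REPLICAT RUNNING 1516P1R1 00:00:06 00:00:00
REPLICAT RUNNING 2201R1 04:47:38 00:00:08
"""
for group, lag in lag_by_group(INFO_ALL).items():
    print(group, int(lag.total_seconds()))
```

Keep in mind the bug described at the start of this post: before the restart this column looked normal, so a check like this complements the table-change alert rather than replacing it.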
From "ITPUB Blog", link: http://blog.itpub.net/31480688/viewspace-2374947/. If you repost this article, please credit the source; otherwise legal liability will be pursued.