DataGuard(一主一备一级联之failover)

本文详细介绍了Oracle数据库在主库发生故障时,如何通过failover机制将备库切换为主库的过程,并通过两个实验演示了使用11g版本的flush redo功能减少数据丢失,以及在主库无法启动到mount状态时,如何手动注册并应用主库日志。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >


fail over

1 failover介绍

 

failover转换,主要考虑的是如何减少数据量的丢失,如果可以允许数据丢失,可以直接切换备库为主库,不必在进行下面的讨论!

 

 

减少数据的丢失,实际就是弥补日志数据没有传递的问题

在11g中,推出了日志手工flush的功能,来弥补日志数据没有传递的问题。前提是主库能启动到mount 状态。Flush 可以把没有发送的归档和current online redo 发送到备库。

SQL> startup mount                    

SQL> alter system flush redo to 'orclps1';

注意:这个过程并不会经常成功执行,而且在10g这样的版本下也没有办法自动flush redo。解决的方法也是有的,就是从Primary目录中,将日志拷贝到Standby端,手工去加载。

 

另外failover之后,原主库如何重建为备库:

1.利用flashback database将原主库恢复到failover之前的时间点,前提是开启flashback on。

2.rman备份恢复,如果有failover之前的备份,可以先恢复到之前时间点,然后在将主库转为备库。

3.环境重建,最简单的方法,删除原主库,再搭建新的备库。

 

备库转为主库的时间戳

select to_char(standby_became_primary_scn) from v$database;

常用于数据库开启了flashback on后原主库闪回到备库切换为主库的时间点,以保证原主库切换为新备库后还能继续和新主库同步并应用日志。

 

下面介绍一下后面的实验

实验一:利用11g新功能flush redo同步数据,减少数据丢失。

实验二:当主库宕机后,无法启动到mount时,通过拷贝主库归档和在线日志到备库注册加载,减少数据量的丢失。(当然,如果运气好,注册后就能应用)

 

2 实验一:failover之flush redo同步数据

现数据库环境如下

节点

网络ip地址

数据库名

unique name

数据库实例名

数据文件位置

zyx.test.com(备库)

192.168.11.111

orcl

orcl

test

/u01/app/oracle/oradata/orcl/

orcl.test.com(主库)

192.168.11.22

orcl

orclps

orclps

/u01/app/oracle/oradata/orcl/

dg2.orcl.com(级联库)

192.168.11.23

orcl

orclstd

orclstd

/u01/app/oracle/oradata/orcl/

 

----实验中,将主库failover为备库,级联库不变(为了后续实验,主库备库级联库都开启flashback)

 

2.1 环境准备

----首先,orclps主库开启flahback

sys@ORCL>shut immediate

sys@ORCL>startup mount

sys@ORCL> alter database flashback on;

sys@ORCL>alter database open;

sys@ORCL>select name,flashback_on from v$database;

----备库orcl开启flashback

sys@ORCL> alter database recover managed standby database cancel;

sys@ORCL>shut immediate

sys@ORCL>startup mount

sys@ORCL> alter database flashback on;

sys@ORCL>alter database open;

sys@ORCL>alter database recover managed standby database using current logfile disconnect;

----级联库orclstd开启flashback

SQL> alter database recover managed standby database cancel;

SQL> shut immediate

SQL> startup mount

SQL> alter database flashback on;

SQL> alter database recover managed standby database using current logfile disconnect;

 

2.2 模拟故障

----备库orcl关闭网络,然后在主库上创建表插入数据并提交

service network stop

 

----此时主库状态

sys@ORCL>select database_role,switchover_status,db_unique_name  from v$database;

DATABASE_ROLE    SWITCHOVER_STATUS    DB_UNIQUE_NAME

---------------- -------------------- ------------------------------

PRIMARY          RESOLVABLE GAP       orclps

 

----主库上模拟故障

sys@ORCL>create table shall(shall int);

 

begin

for i in 1..100000 loop

insert into shall values(i);

end loop;

commit;

end;

/

PL/SQL procedure successfully completed.

 

sys@ORCL>alter system switch logfile;

 

sys@ORCL>insert into shall values (111);

sys@ORCL>commit;

 

sys@ORCL>shut abort

 

2.3 flush redo同步数据

----备库开启网络

service network start

 

----此时备库是无法查询到主库故障之前未传输的数据

sys@ORCL>select count(*) from shall;

select count(*) from shall

                     *

ERROR at line 1:

ORA-00942: table or view does not exist

 

----此时如果主库能启动到mount,可以利用flush redo同步

------先查看未同步日志

sys@ORCL>startup mount

sys@ORCL>select sequence#, applied from v$archived_log;

 

------开始同步

sys@ORCL> alter system flush redo to 'orcl';

 

------等待传输完毕后,在备库查看之前未传输的数据

sys@ORCL>select count(*) from shall;

  COUNT(*)

----------

    100001

 

2.4 flush同步后主备库切换

--------现备切换为新主库

sys@ORCL>alter database recover managed standby database cancel;

sys@ORCL>alter database recover managed standby database finish;

sys@ORCL>select switchover_status,db_unique_name from v$database;

sys@ORCL>alter database commit to switchover to primary with session shutdown;

sys@ORCL>alter database open;

 

-------原主库切换为新备库(没有开启flashback on和原rman备份,就重建备库)

----------先查询新主库切换时间戳

sys@ORCL>select to_char(standby_became_primary_scn) from v$database;

TO_CHAR(STANDBY_BECAME_PRIMARY_SCN)

----------------------------------------

1045602

 

----------闪回数据库

sys@ORCL>flashback database to scn 1045602;

 

----------原主库切换备库

sys@ORCL>alter database convert to physical standby;

sys@ORCL>shut immediate

sys@ORCL>startup

sys@ORCL>alter database recover managed standby database using current logfile disconnect;

 

----查看主备日志应用情况

sys@ORCL>select sequence#, applied from v$archived_log;

 

2.5 数据同步测试

------主库orcl删除shall表

sys@ORCL>drop table shall purge;

 

------备库orclps备库已经无法查看该表咯

sys@ORCL>select count(*) from shall;

select count(*) from shall

                     *

ERROR at line 1:

ORA-00942: table or view does not exist

 

------此时级联库orclstd还能查看到数据

SQL> select count(*) from shall;

  COUNT(*)

----------

    100001

 

------主库orcl切换一下日志

sys@ORCL>alter system switch logfile;

 

------此时级联库orclstd也无法查看数据咯

SQL> select count(*) from shall;

select count(*) from shall

                     *

ERROR at line 1:

ORA-00942: table or view does not exist

 

 

---------------为什么这次failover级联库上没有任何更改,还能找到新的备库去同步数据?因为之前做switover时参数都已经配置好了,主库和备库任何一个库只要是备库就会传归档到级联库。

 

 

3 实验二:failover之备库手动注册主库日志后无法应用

现数据库环境如下

节点

网络ip地址

数据库名

unique name

数据库实例名

数据文件位置

zyx.test.com(主库)

192.168.11.111

orcl

orcl

test

/u01/app/oracle/oradata/orcl/

orcl.test.com(备库)

192.168.11.22

orcl

orclps

orclps

/u01/app/oracle/oradata/orcl/

dg2.orcl.com(级联库)

192.168.11.23

orcl

orclstd

orclstd

/u01/app/oracle/oradata/orcl/

 

----实验中,将备库failover为主库

 

3.1模拟故障

----备库orclps关闭网络,然后在主库上创建表插入数据并提交

service network stop

 

----此时主库orcl状态

sys@ORCL>select database_role,switchover_status,db_unique_name  from v$database;

DATABASE_ROLE    SWITCHOVER_STATUS    DB_UNIQUE_NAME

---------------- -------------------- ------------------------------

PRIMARY          RESOLVABLE GAP       orcl

 

----主库orcl上模拟故障

sys@ORCL>create table shall(shall int);

 

begin

for i in 1..100000 loop

insert into shall values(i);

end loop;

commit;

end;

/

PL/SQL procedure successfully completed.

 

sys@ORCL>alter system switch logfile;

 

begin

for i in 1..1234 loop

insert into shall values(i);

end loop;

commit;

end;

/

PL/SQL procedure successfully completed.

 

sys@ORCL>shut abort

 

3.2手动注册主库归档

----如果flush redo不能完成日志传递,或者说主库已经无法进入到mount模式,那么就需要手动去完成日志同步

----备库orclps开启网络

service network start

 

----此时备库orclps是无法查询到主库故障之前未传输的数据

sys@ORCL>select count(*) from shall;

select count(*) from shall

                     *

ERROR at line 1:

ORA-00942: table or view does not exist

 

----假如主库orcl已经无法启动,这时候先查看备库orclps应用日志序号,然后把主库orcl未传送归档拷贝过去

sys@ORCL>select sequence#, applied from v$archived_log;

 SEQUENCE# APPLIED

        38 YES

        38 YES

        39 YES

        39 YES

        40 YES

        40 YES

        41 IN-MEMORY

        41 YES

70 rows selected.

 

------从上面可以看到,备库orclps日志正应用到序号41,且还为发送已经同步完毕信号给主库,此时需要去找主库orcl把对应的41号序号之后的归档,传输到备库进行应用。(41 yes表示级联库已经应用完毕)---->此时建议将级联库orclstd转为主库,就不用再在当前备库orclps中考虑如何让注册的归档进行应用,为了实验环境的需要,这里还是将当前备库orclps转为主库。

------>看下面,呵呵,当前备库orclps无法应用日志,但是级联库orclstd可以应用

 

------手工归档应用 -------复制主库orcl未应用归档到备库

[oracle@zyx ~]$ cd  /u01/app/oracle/fast_recovery_area/ORCL/archivelog/2016_04_23/

[oracle@zyx 2016_04_23]$ ll -rth

-rw-r----- 1 oracle oinstall 120K Apr 23 04:02 o1_mf_1_40_cko0vbn5_.arc

-rw-r----- 1 oracle oinstall 228K Apr 23 04:10 o1_mf_1_41_cko195d2_.arc

-rw-r----- 1 oracle oinstall  31K Apr 23 04:11 o1_mf_1_42_cko1cm23_.arc

-rw-r----- 1 oracle oinstall  25M Apr 23 04:12 o1_mf_1_43_cko1dq1c_.arc

 

[oracle@zyx 2016_04_23]$ scp o1_mf_1_42_cko1cm23_.arc o1_mf_1_43_cko1dq1c_.arc o1_mf_1_42_cko1cm23_.arc 192.168.11.22:/home/oracle/

oracle@192.168.11.22's password:

o1_mf_1_42_cko1cm23_.arc         100%   31KB  31.0KB/s   00:00   

o1_mf_1_43_cko1dq1c_.arc          100%   24MB  24.5MB/s   00:01   

 

-------备库orclps上注册复制过来的归档日志

sys@ORCL>select sequence#, applied from v$archived_log;

 

sys@ORCL>alter database register physical logfile '/home/oracle/o1_mf_1_42_cko1cm23_.arc';

sys@ORCL>alter database register physical logfile '/home/oracle/o1_mf_1_43_cko1dq1c_.arc';

 

sys@ORCL>select sequence#, applied from v$archived_log;

        39 YES

        40 YES

        40 YES

        41 IN-MEMORY

        41 YES

        42 NO

        43 NO

72 rows selected.

------------如果日志未应用

 

-------应用归档日志

----如果日志未应用,需要更改日志应用方式,必须更改才能应用日志(备用数据库断开)

sys@ORCL>alter database recover managed standby database cancel;

sys@ORCL> alter database recover managed standby database disconnect from session;

sys@ORCL>select sequence#, applied from v$archived_log;

        40 YES

        41 YES

        41 YES

        42 NO

        43 NO

        42 YES

        43 YES

74 rows selected.

----发现级联库orclstd日志已经应用完毕,但是当前备库orclps还是没有应用

----级联库orclstd可以查看表shall,但是备库orclps是无法查看shall表的

SQL> select count(*) from shall;

  COUNT(*)

----------

    100000

 

3.3 处理当前备库orclps日志无法应用问题

------首先,应该查看alert日志,作为数据库管理员,任何时候都应该时刻关注日志内容

----查看告警日志之前,我再试一试看能否应用日志

sys@ORCL> alter database recover managed standby database cancel;

sys@ORCL>alter database recover managed standby database using current logfile disconnect;

sys@ORCL>select sequence#, applied from v$archived_log;

----发现还是无法应用日志

 

Mon Apr 25 21:18:16 2016

Completed:  alter database recover managed standby database disconnect from session

Mon Apr 25 21:56:02 2016

 alter database recover managed standby database cancel

Mon Apr 25 21:56:03 2016

MRP0: Background Media Recovery cancelled with status 16037

Errors in file /u01/app/oracle/diag/rdbms/orclps/orclps/trace/orclps_pr00_45894.trc:

ORA-16037: user requested cancel of managed recovery operation

Recovery interrupted!

Mon Apr 25 21:56:03 2016

 

------看一下这个trc的内容

[oracle@orcl ~]$ vim /u01/app/oracle/diag/rdbms/orclps/orclps/trace/orclps_pr00_45894.trc

Started Parallel Media Recovery

*** 2016-04-25 21:18:15.833 4329 krsh.c

Managed Standby Recovery not using Real Time Apply

Dumping database incarnation table:

Resetlogs 0 scn and time: 0x0000.000e2006 04/22/2016 22:09:31

Recovery target incarnation = 2, activation ID = 1437653945

Influx buffer limit = 8330 min(50% x 16660, 100000)

Start recovery at thread 1 ckpt scn 1046709 logseq 42 block 58

Initial buffer sizes: read 1024K, overflow 832K, change 805K

 

 

--------再看看当前的数据库scn时间戳及ckpt检查点记录的scn时间戳。

--------发现当前数据库的时间戳和恢复点ckpt时间戳不一致,怎么办??

sys@ORCL>select current_scn,checkpoint_change# from v$database;

CURRENT_SCN CHECKPOINT_CHANGE#

----------- ------------------

    1046708            1024546

 

-------利用ORA debug 修改数据库KCSGSCN(bbed 也可以直接修改)

---------修改备库当前检查点时间戳,把时间戳和恢复点时间戳改为一致

------------当前的数据库scn

sys@ORCL>oradebug setmypid

Statement processed.

sys@ORCL>oradebug dumpvar sga kcsgscn_

kcslf kcsgscn_ [06001AE70, 06001AEA0) = 000FF8B4 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 6001AB50 00000000

-------------计算一下000FF8B4该值为16进制转10进制得1046708

 

----先计算一下上面日志中ckpt scn 1046709的16进制值其实就是000FF8B5

sys@ORCL>select to_char(1046709,'xxxxxxxx') from dual;

TO_CHAR(1

---------

    ff8b5

 

----下面修改数据库scn

sys@ORCL>oradebug poke 0x06001AE70 8 0x00000000000ff8b5

BEFORE: [06001AE70, 06001AE78) = 000FF8B4 00000000

AFTER:  [06001AE70, 06001AE78) = 000FF8B5 00000000

 

sys@ORCL>oradebug dumpvar sga kcsgscn_

kcslf kcsgscn_ [06001AE70, 06001AEA0) = 000F2A26 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 6001AB50 00000000

 

------现在查看一下日志是否应用

sys@ORCL>select sequence#, applied from v$archived_log;

 SEQUENCE# APPLIED

---------- ---------

        41 YES

        41 YES

        42 NO

        43 NO

        42 YES

        43 YES

74 rows selected.

 

------重新开启日志应用在查看

sys@ORCL>alter database recover managed standby database using current logfile disconnect;

--------发现数据库hang住,强制退出,然后shut baort重启在应用日志

sys@ORCL>shut abort

sys@ORCL>startup

--------查看下当前scn

sys@ORCL>select current_scn,checkpoint_change# from v$database;

CURRENT_SCN CHECKPOINT_CHANGE#

----------- ------------------

    1046708            1024546

 

------进行日志应用

sys@ORCL>alter database recover managed standby database using current logfile disconnect;

Database altered.

 

--------再次查看当前scn

sys@ORCL>select current_scn,checkpoint_change# from v$database;

 

CURRENT_SCN CHECKPOINT_CHANGE#

----------- ------------------

    1047091            1024546

--------查看日志应用情况

sys@ORCL>select sequence#, applied from v$archived_log;

 SEQUENCE# APPLIED

---------- ---------

        41 YES

        41 YES

        42 YES

        43 YES

        42 YES

        43 IN-MEMORY

74 rows selected.

-------------哈哈!现在应用日志咯!!!

 

------查询下之前未应用的数据

sys@ORCL>select count(*) from shall;

  COUNT(*)

----------

    100000

 

 

3.4 手工注册主库redo日志

--------上面可以看到,数据归档部分数据已经完成恢复,但是主库shutdown之前的在线redo数据丢失,下面就找回主库在线redo数据

 

------复制主库redo到备库

------如果主库能看到current redo,可以只传current redo到备库

sys@ORCL>select group#,sequence#,status from v$log;

 

[oracle@zyx ~]$ scp /u01/app/oracle/oradata/orcl/redo0*.log 192.168.11.22:/home/oracle

oracle@192.168.11.22's password:

redo01.log                                        100%   50MB  16.7MB/s   00:03   

redo02.log                                        100%   50MB  50.0MB/s   00:01   

redo03.log                                        100%   50MB   4.6MB/s   00:11

 

------recover standby

------如果不知道主库current log,就每一个redo都应用一次,如果不对会报错,对了就会出现下面恢复完成的。当然你也可以dump redo查看里面scn,再确定恢复哪一个redo

sys@ORCL>alter database recover managed standby database cancel;

sys@ORCL>recover standby database until cancel;

ORA-00279: change 1047092 generated at 04/23/2016 04:11:57 needed for thread 1

ORA-00289: suggestion :

/u01/app/oracle/fast_recovery_area/ORCLPS/archivelog/2016_04_25/o1_mf_1_44_%u_.arc

ORA-00280: change 1047092 for thread 1 is in sequence #44

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

/home/oracle/redo02.log

Log applied.

Media recovery complete.

 

----再次打开数据库,然后查询数据是否完成

sys@ORCL>alter database open;

Database altered.

 

sys@ORCL>select count(*) from shall;

  COUNT(*)

----------

   101234

------现在数据已经全部找回!!!下面把该备库强制转为主库

 

 

3.5 强制切换备库orclps为主库

------备库orclps切换为主库

sys@ORCL>select name,open_mode,database_role,db_unique_name,primary_db_unique_name from v$database;

NAME  OPEN_MODE  DATABASE_ROLE  DB_UNIQUE_NAME  PRIMARY_DB_UNIQUE_NAME

----- ---------- ---------------- --------------- ------------------------------

ORCL  READ ONLY  PHYSICAL STANDBY orclps        orcl

 

sys@ORCL>alter  database  activate  physical  standby  database;

 

sys@ORCL>select name,open_mode,database_role,db_unique_name,primary_db_unique_name from v$database;

NAME  OPEN_MODE  DATABASE_ROLE  DB_UNIQUE_NAME  PRIMARY_DB_UNIQUE_NAME

----- ---------- ---------------- --------------- ------------------------------

ORCL  MOUNTED    PRIMARY          orclps         orcl

 

sys@ORCL>alter database open;

 

---------备库修改scn后转为主库后,原主库切换为备库,无法应用日志,不知道如何处理,估计只能重建备库吧,告警日志如下,望高手解答

Started Parallel Media Recovery
*** 2016-04-25 19:48:59.677 4320 krsh.c
Managed Standby Recovery not using Real Time Apply
Datafile 1 belongs to incarnation with resetlogs SCN : 995548, timestamp: 363f3c82
Datafile 2 belongs to incarnation with resetlogs SCN : 995548, timestamp: 363f3c82
Datafile 3 belongs to incarnation with resetlogs SCN : 995548, timestamp: 363f3c82
Datafile 4 belongs to incarnation with resetlogs SCN : 995548, timestamp: 363f3c82
Datafile 5 belongs to incarnation with resetlogs SCN : 995548, timestamp: 363f3c82
Dumping database incarnation table:
Resetlogs 0 scn and time: 0x0000.0011d146 04/30/2016 05:23:05
Resetlogs 1 scn and time: 0x0000.000f30dc 04/25/2016 17:25:22
Recovery target incarnation = 3, activation ID = 0
Influx buffer limit = 18412 min(50% x 36825, 100000)
Start recovery at thread 1 ckpt scn 1167684 logseq 40 block 2
Initial buffer sizes: read 1024K, overflow 832K, change 805K


*** 2016-04-25 19:48:59.739
Media Recovery add redo thread 1
*** 2016-04-25 19:48:59.812 4320 krsh.c
Media Recovery Waiting for thread 1 sequence 40 branch(resetlogs_id) 910113922


*** 2016-04-25 19:49:21.838
*** 2016-04-25 19:49:21.838 4320 krsh.c
MRP0: Background Media Recovery cancelled with status 16037
ORA-16037: user requested cancel of managed recovery operation


*** 2016-04-25 19:49:21.846
Media Recovery drop redo thread 1


*** 2016-04-25 19:49:21.894
Completed Media Recovery
Managed Recovery: Not Active posted.

 

 

 

 

 

 

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/30130773/viewspace-2119324/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/30130773/viewspace-2119324/

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值