After the reboot, the databases on both nodes came up normally, but the bad blocks in the data file were still there:
SQL> run
1* select name,status from v$datafile
NAME STATUS
-------------------- --------------------
/dev/rlv_system_8g SYSTEM
/dev/rlv_undot11_8g ONLINE
/dev/rlv_sysaux_8g ONLINE
/dev/rlv_user_8g ONLINE
/dev/rlv_undot12_8g ONLINE
/dev/rlv_raw37_16g RECOVER
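(Before recovering, v$recover_file is a quick way to confirm which files need media recovery and from what SCN. This check is mine, not part of the original transcript:)

select file#, online_status, error, change#
from v$recover_file;
-- the file behind /dev/rlv_raw37_16g should be listed here,
-- together with the SCN that recovery must start from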
On zhyw2 I ran media recovery against the damaged file:
SQL> recover datafile '/dev/rlv_raw37_16g';
ORA-00279: change 11318004822236 generated at 08/13/2010 16:42:39 needed for
thread 2
ORA-00289: suggestion : /arch2/bsp1922_2_229_713969898.arc
ORA-00280: change 11318004822236 for thread 2 is in sequence #229
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
auto
ORA-00279: change 11321028506146 generated at 08/17/2010 09:42:36 needed for
thread 2
ORA-00289: suggestion : /arch2/bsp1922_2_230_713969898.arc
ORA-00280: change 11321028506146 for thread 2 is in sequence #230
ORA-00278: log file '/arch2/bsp1922_2_229_713969898.arc' no longer needed for
this recovery
Log applied.
Media recovery complete.
The datafile was repaired successfully. At this point it was still offline, so I brought it back online:
alter database datafile '/dev/rlv_raw37_16g' online;
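(To double-check the result, the file header can be inspected as well; a small verification of my own, not from the original session:)

select name, status, recover, fuzzy
from v$datafile_header
where name = '/dev/rlv_raw37_16g';
-- after a successful recovery and ONLINE, RECOVER should show NO for this file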
The customer also mentioned that the temporary tablespace kept running out of space, and asked me to look into it while we had the downtime.
I first checked whether oravg7 still had any unused logical volumes:
[root@zhyw2]#lsvg -l oravg7
oravg7:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
lv_raw93_16g raw 512 512 1 open/syncd N/A
lv_raw94_16g raw 512 512 1 open/syncd N/A
lv_raw95_16g raw 512 512 1 open/syncd N/A
lv_raw96_16g raw 512 512 1 open/syncd N/A
lv_raw97_16g raw 512 512 1 closed/syncd N/A
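(lsvg without -l reports the volume group's free physical partitions directly; a complementary check of my own, output omitted:)

lsvg oravg7
# the FREE PPs field shows how much unallocated space is left in the VG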
lv_raw97_16g was still closed/syncd, so I planned to use it as an additional file for the TEMP tablespace.
I then checked the ownership of the corresponding raw devices on both nodes; the oracle user had read/write access on each:
[root@zhyw2]#cd /dev/
[root@zhyw2]#ls -l *raw97*
brw-rw---- 1 root system 106, 5 Mar 26 05:43 lv_raw97_16g
crw-rw---- 1 oracle dba 106, 5 Mar 26 05:43 rlv_raw97_16g
[root@zhyw2]#
[root@zhyw1]#ls -l *raw97*
brw-rw---- 1 root system 106, 5 Mar 16 14:49 lv_raw97_16g
crw-rw-r-- 1 oracle oinstall 106, 5 Mar 16 14:49 rlv_raw97_16g
Then I added the raw device to the TEMP tablespace from SQL*Plus:
alter tablespace temp add tempfile '/dev/rlv_raw97_16g' size 15872m;
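(A quick look at v$tempfile, my own verification step rather than part of the original transcript, confirms the new file is registered:)

select name, bytes/1024/1024 as size_mb, status
from v$tempfile;
-- /dev/rlv_raw97_16g should now appear next to the existing tempfile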
With all of that done, Mr. Huang's team brought the application back up and confirmed it was healthy. I stayed on site a while longer and everything remained normal. I thought I could finally go home and get a good night's sleep.
A little after 4 p.m., while I was dozing, Mr. Ding called again: "Sorry, Mr. Cheng, our production database is in trouble again. The business side reports that queries are very slow. Could you come over and take another look?" Well, nothing worth doing comes easy. What would be waiting for me this time? (to be continued)
When I arrived at the customer site, Mr. Huang, the application vendor's lead, was already waiting for me. He told me the database was behaving very oddly: there were no error messages, but any sort operation had become extremely slow, and some application queries were failing frequently. He strongly suspected the EMC storage still had bad blocks, most likely under the LV backing the temp tablespace.

I checked the current I/O utilization on both nodes, which looked fairly normal, and then generated an AWR report on each node:

zhyw2:

Instance efficiency:
Buffer Nowait %:              99.95    Redo NoWait %:      100.00
Buffer Hit %:                 99.90    In-memory Sort %:   100.00
Library Hit %:                79.74    Soft Parse %:        65.09
Execute to Parse %:           60.45    Latch Hit %:         98.06
Parse CPU to Parse Elapsd %:   8.98    % Non-Parse CPU:     99.68

Top 5 Timed Events:
Event                        Waits        Time(s)    Avg Wait(ms)  % Total Call Time  Wait Class
CPU time                                  1,335,611                68.5
enq: CF - contention         503,175      239,024    475           12.3               Other
enq: TS - contention         388,571      175,924    453            9.0               Other
library cache lock           74,915       31,273     417            1.6               Concurrency
gc cr multi block request    162,634,284  29,927     0              1.5               Cluster

zhyw1:

Instance efficiency:
Buffer Nowait %:             100.00    Redo NoWait %:      100.00
Buffer Hit %:                 99.87    In-memory Sort %:   100.00
Library Hit %:                80.86    Soft Parse %:        61.23
Execute to Parse %:           64.13    Latch Hit %:         98.09
Parse CPU to Parse Elapsd %:  12.10    % Non-Parse CPU:     99.56

Top 5 Timed Events:
Event                        Waits    Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
CPU time                              942,235                40.9
enq: CF - contention         847,042  404,685  478           17.6               Other
enq: DX - contention         71,048   208,067  2,929          9.0               Other
enq: TS - contention         140,442  63,747   454            2.8               Other
inactive transaction branch  35,518   34,682   976            1.5               Other

Two things stood out: the library cache hit ratio was low on both instances, and enqueue (lock) waits dominated the global wait events. I also looked at SGA_MAX_SIZE and SGA_TARGET on the two nodes; both were currently 16G. Given that each node in this RAC has 72G of physical memory, typically more than 40% of it idle, carving out more for the SGA was worth considering. My suspicion was that recent business growth had pushed some of the system's original resources into bottlenecks, which would explain the slow access. Still, since the EMC DMX1500 had already failed once, the possibility that it was affecting system I/O could not be ruled out, so I first asked the EMC engineers to check whether the disks allocated to this RAC system had bad blocks.

We listed the disks with the following command:

# ./symdev list pd

Symmetrix ID: [hidden]

Sym   Physical               SA :P  DA :IT   Config     Attribute     Sts   Cap (MB)
----  ---------------------  -----  -------  ---------  ------------  ----  --------
0022  /dev/rhdisk3           08A:1  01A:C0   2-Way Mir  N/Grp'd VCM   WD           3
0029  /dev/rhdiskpower0      08A:1  16B:D1   2-Way Mir  N/Grp'd       RW           3
002A  /dev/rhdiskpower1      08A:1  01A:C2   2-Way Mir  N/Grp'd       RW           3
0059  /dev/rhdiskpower64     09A:1  16B:C2   2-Way Mir  N/Grp'd       RW           3
005A  /dev/rhdiskpower65     09A:1  01C:C2   2-Way Mir  N/Grp'd       RW           3
0084  /dev/rhdiskpower2      08A:1  16A:D0   2-Way Mir  N/Grp'd       RW        1024
0085  /dev/rhdiskpower3      08A:1  16B:D3   2-Way Mir  N/Grp'd       RW        1024
013A  /dev/rhdiskpower4      08A:1  16B:CE   RDF1+Mir   Grp'd (M)     RW       49140
013E  /dev/rhdiskpower5      08A:1  01C:D5   RDF1+Mir   Grp'd (M)     RW       49140
0142  /dev/rhdiskpower6      08A:1  16C:D6   RDF1+Mir   Grp'd (M)     RW       49140
0146  /dev/rhdiskpower7      08A:1  01C:D7   RDF1+Mir   Grp'd (M)     RW       49140
014A  /dev/rhdiskpower8      08A:1  16C:D8   RDF1+Mir   Grp'd (M)     RW       49140
014E  /dev/rhdiskpower9      08A:1  16A:C3   RDF1+Mir   Grp'd (M)     RW       49140
0152  /dev/rhdiskpower10     08A:1  01A:C2   RDF1+Mir   Grp'd (M)     RW       49140
0156  /dev/rhdiskpower11     08A:1  16A:C1   RDF1+Mir   Grp'd (M)     RW       49140
015A  /dev/rhdiskpower12     08A:1  01A:D9   RDF1+Mir   Grp'd (M)     RW       49140
015E  /dev/rhdiskpower13     08A:1  16A:D8   RDF1+Mir   Grp'd (M)     RW       49140
0162  /dev/rhdiskpower14     08A:1  01A:DB   RDF1+Mir   Grp'd (M)     RW       49140
0166  /dev/rhdiskpower15     08A:1  16A:DA   RDF1+Mir   Grp'd (M)     RW       49140
016A  /dev/rhdiskpower16     08A:1  01A:D5   RDF1+Mir   Grp'd (M)     RW       49140
016E  /dev/rhdiskpower17     08A:1  16A:D4   RDF1+Mir   Grp'd (M)     RW       49140
0172  /dev/rhdiskpower18     08A:1  01A:D7   RDF1+Mir   Grp'd (M)     RW       49140
0176  /dev/rhdiskpower19     08A:1  16A:D6   RDF1+Mir   Grp'd (M)     RW       49140
017A  /dev/rhdiskpower20     08A:1  16C:C3   RDF1+Mir   Grp'd (M)     RW       49140
017E  /dev/rhdiskpower21     08A:1  01C:C2   RDF1+Mir   Grp'd (M)     RW       49140
0182  /dev/rhdiskpower22     08A:1  16C:C1   RDF1+Mir   Grp'd (M)     RW       49140
0186  /dev/rhdiskpower23     08A:1  01C:C0   RDF1+Mir   Grp'd (M)     RW       49140
018A  /dev/rhdiskpower24     08A:1  16C:D0   RDF1+Mir   Grp'd (M)     RW       49140
018E  /dev/rhdiskpower25     08A:1  16C:DA   RDF1+Mir   Grp'd (M)     RW       49140
0192  /dev/rhdiskpower26     08A:1  01C:D9   RDF1+Mir   Grp'd (M)     RW       49140
0196  /dev/rhdiskpower27     08A:1  16C:DC   RDF1+Mir   Grp'd (M)     RW       49140
019A  /dev/rhdiskpower28     08A:1  01C:DB   RDF1+Mir   Grp'd (M)     RW       49140
019E  /dev/rhdiskpower29     08A:1  16B:CE   RDF1+Mir   Grp'd (M)     RW       49140
01A2  /dev/rhdiskpower30     08A:1  01C:D5   RDF1+Mir   Grp'd (M)     RW       49140
01A6  /dev/rhdiskpower31     08A:1  16C:D6   RDF1+Mir   Grp'd (M)     RW       49140
01AA  /dev/rhdiskpower32     08A:1  01C:D7   RDF1+Mir   Grp'd (M)     RW       49140
01AE  /dev/rhdiskpower33     08A:1  16C:D8   RDF1+Mir   Grp'd (M)     RW       49140
01B2  /dev/rhdiskpower34     08A:1  16A:C3   RDF1+Mir   Grp'd (M)     RW       49140
01B6  /dev/rhdiskpower35     08A:1  01A:C2   RDF1+Mir   Grp'd (M)     RW       49140
01BA  /dev/rhdiskpower36     08A:1  16A:C1   RDF1+Mir   Grp'd (M)     RW       49140
01BE  /dev/rhdiskpower37     08A:1  01A:D9   RDF1+Mir   Grp'd (M)     RW       49140
01C2  /dev/rhdiskpower38     08A:1  16A:D8   RDF1+Mir   Grp'd (M)     RW       49140
01C6  /dev/rhdiskpower39     08A:1  01A:DB   RDF1+Mir   Grp'd (M)     RW       49140
01CA  /dev/rhdiskpower40     08A:1  16A:DA   RDF1+Mir   Grp'd (M)     RW       49140
01CE  /dev/rhdiskpower41     08A:1  01A:D5   RDF1+Mir   Grp'd (M)     RW       49140
01D2  /dev/rhdiskpower42     08A:1  16A:D4   RDF1+Mir   Grp'd (M)     RW       49140
01D6  /dev/rhdiskpower43     08A:1  01A:D7   RDF1+Mir   Grp'd (M)     RW       49140
01DA  /dev/rhdiskpower44     08A:1  16A:D6   RDF1+Mir   Grp'd (M)     RW       49140
01DE  /dev/rhdiskpower45     08A:1  16C:C3   RDF1+Mir   Grp'd (M)     RW       49140
01E2  /dev/rhdiskpower46     08A:1  01C:C2   RDF1+Mir   Grp'd (M)     RW       49140
01E6  /dev/rhdiskpower47     08A:1  16C:C1   RDF1+Mir   Grp'd (M)     RW       49140
01EA  /dev/rhdiskpower48     08A:1  01C:C0   RDF1+Mir   Grp'd (M)     RW       49140
01EE  /dev/rhdiskpower49     08A:1  16C:D0   RDF1+Mir   Grp'd (M)     RW       49140
01F2  /dev/rhdiskpower50     08A:1  16C:DA   RDF1+Mir   Grp'd (M)     RW       49140
01F6  /dev/rhdiskpower51     08A:1  01C:D9   RDF1+Mir   Grp'd (M)     RW       49140
01FA  /dev/rhdiskpower52     08A:1  16C:DC   RDF1+Mir   Grp'd (M)     RW       49140
01FE  /dev/rhdiskpower53     08A:1  01C:DB   RDF1+Mir   Grp'd (M)     RW       49140
0202  /dev/rhdiskpower54     08A:1  16B:CE   RDF1+Mir   Grp'd (M)     RW       49140
0206  /dev/rhdiskpower55     08A:1  01C:D5   RDF1+Mir   Grp'd (M)     RW       49140
020A  /dev/rhdiskpower56     08A:1  16C:D6   RDF1+Mir   Grp'd (M)     RW       49140
020E  /dev/rhdiskpower57     08A:1  01C:D7   RDF1+Mir   Grp'd (M)     RW       49140
0212  /dev/rhdiskpower58     08A:1  16C:D8   RDF1+Mir   Grp'd (M)     RW       49140
0216  /dev/rhdiskpower59     08A:1  16A:C3   RDF1+Mir   Grp'd (M)     RW       49140
021A  /dev/rhdiskpower60     08A:1  01A:C2   RDF1+Mir   Grp'd (M)     RW       49140
021E  /dev/rhdiskpower61     08A:1  16A:C1   RDF1+Mir   Grp'd (M)     RW       49140
0222  /dev/rhdiskpower62     08A:1  01A:D9   RDF1+Mir   Grp'd (M)     RW       49140
0226  /dev/rhdiskpower63     08A:1  16A:D8   RDF1+Mir   Grp'd (M)     RW       49140

I asked EMC's Mr. Wang to check the devices with IDs from 0022 through 0226 for bad blocks.

Meanwhile, an SRDF relationship from the DMX1500 to a DMX950 had already been set up for this database but had never been synchronized, so it made sense to start the underlying replication now and protect the business data. I suggested Mr. Wang get SRDF running, and we started it as follows.

The EMC engineer first rediscovered the configuration:

# ./symcfg disc

This operation may take up to a few minutes. Please be patient...
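(Right after a discover, I would normally confirm that both frames are visible to the host before touching the device group; a minimal check of my own, output omitted:)

# ./symcfg list
# both the local DMX1500 and the remote DMX950 should appear in the array list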
He then listed the device group and queried the state of the RDF pairs:

[root@zhyw1]#./symdg list

                D E V I C E      G R O U P S

                                                    Number of
Name            Type  Valid  Symmetrix ID  Devs  GKs  BCVs  VDEVs  TGTs
zhyw_1500_950   RDF1  Yes    [hidden]        60    0     0      0     0

[root@zhyw1]#./symrdf -g zhyw_1500_950 query |more

Device Group (DG) Name : zhyw_1500_950
DG's Type              : RDF1
DG's Symmetrix ID      : [hidden]      (Microcode Version: 5773)
Remote Symmetrix ID    : 000290301387  (Microcode Version: 5773)
RDF (RA) Group Number  : 2 (01)

         Source (R1) View               Target (R2) View      MODES
--------------------------------   ------------------------  -----  ------------
         ST                   LI        ST
Standard A                    N         A
Logical  T  R1 Inv   R2 Inv   K         T  R1 Inv   R2 Inv        RDF Pair
Device   Dev E Tracks Tracks  S  Dev    E  Tracks   Tracks   MDA  STATE
--------------------------------   ------------------------  -----  ------------
DEV001  013A RW       0  786240 NR 02EB WD       0  786240  C.D  Suspended
DEV002  013E RW       0  786240 NR 02EF WD       0  786240  C.D  Suspended
DEV003  0142 RW       0  786240 NR 02F3 WD       0  786240  C.D  Suspended
DEV004  0146 RW       0  786240 NR 02F7 WD       0  786240  C.D  Suspended
DEV005  014A RW       0  786240 NR 02FB WD       0  786240  C.D  Suspended
DEV006  014E RW       0  786240 NR 02FF WD       0  786240  C.D  Suspended
DEV007  0152 RW       0  786240 NR 0303 WD       0  786240  C.D  Suspended
DEV008  0156 RW       0  786240 NR 0307 WD       0  786240  C.D  Suspended
DEV009  015A RW       0  786240 NR 030B WD       0  786240  C.D  Suspended
DEV010  015E RW       0  786240 NR 030F WD       0  786240  C.D  Suspended
DEV011  0162 RW       0  786240 NR 0313 WD       0  786240  C.D  Suspended
DEV012  0166 RW       0  786240 NR 0317 WD       0  786240  C.D  Suspended
DEV013  016A RW       0  786240 NR 031B WD       0  786240  C.D  Suspended
DEV014  016E RW       0  786240 NR 031F WD       0  786240  C.D  Suspended
DEV015  0172 RW       0  786240 NR 0323 WD       0  786240  C.D  Suspended
DEV016  0176 RW       0  786240 NR 0327 WD       0  786240  C.D  Suspended
DEV017  017A RW       0  786240 NR 032B WD       0  786240  C.D  Suspended
DEV018  017E RW       0  786240 NR 032F WD       0  786240  C.D  Suspended
DEV019  0182 RW       0  786240 NR 0333 WD       0  786240  C.D  Suspended
DEV020  0186 RW       0  786240 NR 0337 WD       0  786240  C.D  Suspended
DEV021  018A RW       0  786240 NR 033B WD       0  786240  C.D  Suspended
DEV022  018E RW       0  786240 NR 033F WD       0  786240  C.D  Suspended
DEV023  0192 RW       0  786240 NR 0343 WD       0  786240  C.D  Suspended
DEV024  0196 RW       0  786240 NR 0347 WD       0  786240  C.D  Suspended
DEV025  019A RW       0  786240 NR 034B WD       0  786240  C.D  Suspended
DEV026  019E RW       0  786240 NR 034F WD       0  786240  C.D  Suspended
DEV027  01A2 RW       0  786240 NR 0353 WD       0  786240  C.D  Suspended
DEV028  01A6 RW       0  786240 NR 0357 WD       0  786240  C.D  Suspended
DEV029  01AA RW       0  786240 NR 035B WD       0  786240  C.D  Suspended
DEV030  01AE RW       0  786240 NR 035F WD       0  786240  C.D  Suspended
DEV031  01B2 RW       0  786240 NR 0363 WD       0  786240  C.D  Suspended
DEV032  01B6 RW       0  786240 NR 0367 WD       0  786240  C.D  Suspended
DEV033  01BA RW       0  786240 NR 036B WD       0  786240  C.D  Suspended
DEV034  01BE RW       0  786240 NR 036F WD       0  786240  C.D  Suspended
DEV035  01C2 RW       0  786240 NR 0373 WD       0  786240  C.D  Suspended
DEV036  01C6 RW       0  786240 NR 0377 WD       0  786240  C.D  Suspended
DEV037  01CA RW       0  786240 NR 037B WD       0  786240  C.D  Suspended
DEV038  01CE RW       0  786240 NR 037F WD       0  786240  C.D  Suspended
DEV039  01D2 RW       0  786240 NR 0383 WD       0  786240  C.D  Suspended
DEV040  01D6 RW       0  786240 NR 0387 WD       0  786240  C.D  Suspended
DEV041  01DA RW       0  786240 NR 038B WD       0  786240  C.D  Suspended
DEV042  01DE RW       0  786240 NR 038F WD       0  786240  C.D  Suspended
DEV043  01E2 RW       0  786240 NR 0393 WD       0  786240  C.D  Suspended
DEV044  01E6 RW       0  786240 NR 0397 WD       0  786240  C.D  Suspended
DEV045  01EA RW       0  786240 NR 039B WD       0  786240  C.D  Suspended
DEV046  01EE RW       0  786240 NR 039F WD       0  786240  C.D  Suspended
DEV047  01F2 RW       0  786240 NR 03A3 WD       0  786240  C.D  Suspended
DEV048  01F6 RW       0  786240 NR 03A7 WD       0  786240  C.D  Suspended
DEV049  01FA RW       0  786240 NR 03AB WD       0  786240  C.D  Suspended
DEV050  01FE RW       0  786240 NR 03AF WD       0  786240  C.D  Suspended
DEV051  0202 RW       0  786240 NR 03B3 WD       0  786240  C.D  Suspended
DEV052  0206 RW       0  786240 NR 03B7 WD       0  786240  C.D  Suspended
DEV053  020A RW       0  786240 NR 03BB WD       0  786240  C.D  Suspended
DEV054  020E RW       0  786240 NR 03BF WD       0  786240  C.D  Suspended
DEV055  0212 RW       0  786240 NR 03C3 WD       0  786240  C.D  Suspended
DEV056  0216 RW       0  786240 NR 03C7 WD       0  786240  C.D  Suspended
DEV057  021A RW       0  786240 NR 03CB WD       0  786240  C.D  Suspended
DEV058  021E RW       0  786240 NR 03CF WD       0  786240  C.D  Suspended
DEV059  0222 RW       0  786240 NR 03D3 WD       0  786240  C.D  Suspended
DEV060  0226 RW       0  786240 NR 03D7 WD       0  786240  C.D  Suspended

Total             -------- --------            -------- --------
  Track(s)               0 47174400                   0 47174400
  MB(s)                  0  2948400                   0  2948400

Legend for MODES:
 M(ode of Operation): A = Async, S = Sync, E = Semi-sync, C = Adaptive Copy
 D(omino)           : X = Enabled, . = Disabled
 A(daptive Copy)    : D = Disk Mode, W = WP Mode, . = ACp off

We then started the synchronization from the source array to the target:

[root@zhyw1]#./symrdf -g zhyw_1500_950 resume

Execute an RDF 'Resume' operation for device group 'zhyw_1500_950' (y/[n]) ? y

An RDF 'Resume' operation execution is in progress for device group 'zhyw_1500_950'. Please wait...

Merge device track tables between source and target.......Started.
Devices: 015A-0179 in (3435,002)......................... Merged.
Devices: 013A-0159 in (3435,002)......................... Merged.
Devices: 019A-01B9 in (3435,002)......................... Merged.
Devices: 017A-0199 in (3435,002)......................... Merged.
Devices: 01BA-01D9 in (3435,002)......................... Merged.
Devices: 01DA-01F9 in (3435,002)......................... Merged.
Devices: 021A-0229 in (3435,002)......................... Merged.
Devices: 01FA-0219 in (3435,002)......................... Merged.
Merge device track tables between source and target.......Done.
Resume RDF link(s)........................................Started.
Resume RDF link(s)........................................Done.

The RDF 'Resume' operation successfully executed for device group 'zhyw_1500_950'.

We watched the synchronization progress with:

[root@zhyw1]#./symrdf -g zhyw_1500_950 query -i 5

Once the copy settled into a steady state, the rate from the source array to the target was around 280MB/s. Since the whole synchronization would take about three hours, I decided to stretch out on the sofa for a while. When Mr. Ding came into the monitoring room and found me there, he suggested I use their break room instead. That sounded good; better to recharge before the next round.

I was woken from my nap by Mr. Xu from the application team, who told me the synchronization was nearly finished but had slowed badly toward the end:

[root@zhyw1]#./symrdf -g zhyw_1500_950 query -i 5

DEV001  013A RW       0       0 RW 02EB WD       0       0  S..  Synchronized
DEV002  013E RW       0       0 RW 02EF WD       0       0  S..  Synchronized
DEV003  0142 RW       0       0 RW 02F3 WD       0       0  S..  Synchronized
DEV004  0146 RW       0       0 RW 02F7 WD       0       0  S..  Synchronized
DEV005  014A RW       0   31569 RW 02FB WD       0   31569  S..  SyncInProg
DEV006  014E RW       0       0 RW 02FF WD       0       0  S..  Synchronized
DEV007  0152 RW       0       0 RW 0303 WD       0       0  S..  Synchronized
DEV008  0156 RW       0       0 RW 0307 WD       0       0  S..  Synchronized
DEV009  015A RW       0       0 RW 030B WD       0       0  S..  Synchronized
DEV010  015E RW       0       0 RW 030F WD       0       0  S..  Synchronized
DEV011  0162 RW       0       0 RW 0313 WD       0       0  S..  Synchronized
DEV012  0166 RW       0       0 RW 0317 WD       0       0  S..  Synchronized
DEV013  016A RW       0       0 RW 031B WD       0       0  S..  Synchronized
DEV014  016E RW       0       0 RW 031F WD       0       0  S..  Synchronized
DEV015  0172 RW       0       0 RW 0323 WD       0       0  S..  Synchronized
DEV016  0176 RW       0       0 RW 0327 WD       0       0  S..  Synchronized
DEV017  017A RW       0   35641 RW 032B WD       0   35641  S..  SyncInProg
DEV018  017E RW       0       0 RW 032F WD       0       0  S..  Synchronized
DEV019  0182 RW       0       0 RW 0333 WD       0       0  S..  Synchronized
DEV020  0186 RW       0       0 RW 0337 WD       0       0  S..  Synchronized
DEV021  018A RW       0       0 RW 033B WD       0       0  S..  Synchronized
DEV022  018E RW       0       0 RW 033F WD       0       0  S..  Synchronized
DEV023  0192 RW       0       0 RW 0343 WD       0       0  S..  Synchronized
DEV024  0196 RW       0       0 RW 0347 WD       0       0  S..  Synchronized
DEV025  019A RW       0       0 RW 034B WD       0       0  S..  Synchronized
DEV026  019E RW       0       0 RW 034F WD       0       0  S..  Synchronized
DEV027  01A2 RW       0       0 RW 0353 WD       0       0  S..  Synchronized
DEV028  01A6 RW       0       0 RW 0357 WD       0       0  S..  Synchronized
DEV029  01AA RW       0       0 RW 035B WD       0       0  S..  Synchronized
DEV030  01AE RW       0   33371 RW 035F WD       0   33363  S..  SyncInProg
DEV031  01B2 RW       0       0 RW 0363 WD       0       0  S..  Synchronized
DEV032  01B6 RW       0       0 RW 0367 WD       0       0  S..  Synchronized
DEV033  01BA RW       0       0 RW 036B WD       0       0  S..  Synchronized
DEV034  01BE RW       0       0 RW 036F WD       0       0  S..  Synchronized
DEV035  01C2 RW       0       0 RW 0373 WD       0       0  S..  Synchronized
DEV036  01C6 RW       0       0 RW 0377 WD       0       0  S..  Synchronized
DEV037  01CA RW       0       0 RW 037B WD       0       0  S..  Synchronized
DEV038  01CE RW       0       0 RW 037F WD       0       0  S..  Synchronized
DEV039  01D2 RW       0       0 RW 0383 WD       0       0  S..  Synchronized
DEV040  01D6 RW       0       0 RW 0387 WD       0       0  S..  Synchronized
DEV041  01DA RW       0       0 RW 038B WD       0       0  S..  Synchronized
DEV042  01DE RW       0   30881 RW 038F WD       0   30848  S..  SyncInProg
DEV043  01E2 RW       0       0 RW 0393 WD       0       0  S..  Synchronized
DEV044  01E6 RW       0       0 RW 0397 WD       0       0  S..  Synchronized
DEV045  01EA RW       0       0 RW 039B WD       0       0  S..  Synchronized
DEV046  01EE RW       0       0 RW 039F WD       0       0  S..  Synchronized
DEV047  01F2 RW       0       0 RW 03A3 WD       0       0  S..  Synchronized
DEV048  01F6 RW       0       0 RW 03A7 WD       0       0  S..  Synchronized
DEV049  01FA RW       0       0 RW 03AB WD       0       0  S..  Synchronized
DEV050  01FE RW       0       0 RW 03AF WD       0       0  S..  Synchronized
DEV051  0202 RW       0       0 RW 03B3 WD       0       0  S..  Synchronized
DEV052  0206 RW       0       0 RW 03B7 WD       0       0  S..  Synchronized
DEV053  020A RW       0       0 RW 03BB WD       0       0  S..  Synchronized
DEV054  020E RW       0       0 RW 03BF WD       0       0  S..  Synchronized
DEV055  0212 RW       0   30301 RW 03C3 WD       0   30301  S..  SyncInProg
DEV056  0216 RW       0       0 RW 03C7 WD       0       0  S..  Synchronized
DEV057  021A RW       0       0 RW 03CB WD       0       0  S..  Synchronized
DEV058  021E RW       0       0 RW 03CF WD       0       0  S..  Synchronized
DEV059  0222 RW       0       0 RW 03D3 WD       0       0  S..  Synchronized
DEV060  0226 RW       0       0 RW 03D7 WD       0       0  S..  Synchronized

Total             -------- --------            -------- --------
  Track(s)               0   161763                   0   161722
  MB(s)                0.0  10110.2                 0.0  10107.6

Synchronization rate         : 0.0 MB/S
Estimated time to completion : 3 days, 02:53:24

Legend for MODES:
 M(ode of Operation): A = Async, S = Sync, E = Semi-sync, C = Adaptive Copy
 D(omino)           : X = Enabled, . = Disabled
 A(daptive Copy)    : D = Disk Mode, W = WP Mode, . = ACp off

The synchronization had essentially hung, with only a handful of disks left unfinished. The EMC engineer's explanation was that fewer disk channels were available now, so the copy had slowed down. I figured that stopping I/O against the source array should speed things up, so I asked Mr. Huang whether we could stop the database application. No problem, he said; the approved downtime window ran until 5 a.m., so there was still time. I quickly shut the database application down.

Sure enough, with the application stopped the rate climbed back up to 38MB/s, and we all felt a surge of relief. Soon only one DEV was left, but oddly its synchronization never completed. Faced with such a slow rate, the EMC engineers had no explanation either, and went off to examine the array. A while later, EMC's Mr. Zhang confirmed it for me: the disk below had bad blocks.

DEV017  017A RW       0       1 RW 032B WD       0       1  S..  SyncInProg

I looked it up: 017A corresponds to hdiskpower20,

017A  /dev/rhdiskpower20  09B:1  16C:C3  RDF1+Mir  N/Grp'd (M)  RW  49140

and that disk belongs to oravg7, exactly the VG holding the temp tablespace. Mr. Huang's suspicion had been right from the start!
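(On the AIX side, lspv -l maps a physical volume straight to the logical volumes that have extents on it; a quick sketch of the check I would pair with the steps below, output omitted:)

# lspv -l hdiskpower20
# lists each LV on this disk together with its LP/PP counts and distribution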
The disk had to be repaired, but how to repair it deserved careful thought. First I checked how many database files the disk actually affected.

Map the Symmetrix device to its pdisk:

./symdev list pd

017A  /dev/rhdiskpower20  09B:1  16C:C3  RDF1+Mir  N/Grp'd (M)  RW  49140

Map the pdisk to its volume group:

# lspv
hdiskpower16  00c450b57c60a7cf  oravg7  concurrent
hdiskpower17  00c450b57c56d256  oravg7  concurrent
hdiskpower18  00c450b57c5a2e83  oravg7  concurrent
hdiskpower19  00c450b57c5ee285  oravg7  concurrent
hdiskpower20  00c450b57c530a23  oravg7  concurrent
hdiskpower21  00c450b57c57d497  oravg7  concurrent
hdiskpower22  00c450b57c5bb0e0  oravg7  concurrent
hdiskpower23  00c450b57c5f9003  oravg7  concurrent

List the LVs in that VG:

# lsvg -l oravg7
oravg7:
LV NAME        TYPE  LPs  PPs  PVs  LV STATE      MOUNT POINT
lv_raw93_16g   raw   512  512  1    open/syncd    N/A
lv_raw94_16g   raw   512  512  1    open/syncd    N/A
lv_raw95_16g   raw   512  512  1    open/syncd    N/A
lv_raw96_16g   raw   512  512  1    open/syncd    N/A
lv_raw97_16g   raw   512  512  1    open/syncd    N/A
lv_raw98_16g   raw   512  512  1    closed/syncd  N/A
lv_raw99_16g   raw   512  512  1    closed/syncd  N/A
lv_raw100_16g  raw   512  512  1    closed/syncd  N/A
lv_raw101_16g  raw   512  512  1    closed/syncd  N/A
lv_raw102_16g  raw   512  512  1    closed/syncd  N/A
lv_raw104_16g  raw   512  512  1    closed/syncd  N/A
lv_raw105_16g  raw   512  512  1    closed/syncd  N/A
lv_raw106_16g  raw   512  512  1    closed/syncd  N/A
lv_raw107_16g  raw   512  512  1    closed/syncd  N/A
lv_raw108_16g  raw   512  512  1    closed/syncd  N/A
lv_raw109_16g  raw   512  512  1    closed/syncd  N/A
lv_raw110_16g  raw   512  512  2    closed/syncd  N/A
lv_raw111_16g  raw   512  512  2    closed/syncd  N/A
lv_raw112_16g  raw   512  512  2    closed/syncd  N/A
lv_raw113_16g  raw   512  512  2    closed/syncd  N/A
lv_raw114_16g  raw   512  512  2    closed/syncd  N/A
lv_raw115_16g  raw   512  512  2    closed/syncd  N/A
lv_raw116_16g  raw   512  512  2    closed/syncd  N/A

Then check how the LVs are used at the database level. They map to the tablespaces BSPLOG, TEMP, and SRADM_TBS, so those are the affected ones:

SQL> run
1* select a.name,b.name from v$tablespace a ,v$datafile b where a.ts#=b.ts#

BSPLOG     /dev/rlv_raw94_16g
BSPLOG     /dev/rlv_raw93_16g
BSPLOG     /dev/rlv_raw95_16g
SRADM_TBS  /dev/rlv_raw96_16g

SQL> select name from v$tempfile;

NAME
--------------------------------------------------------------------------------
/dev/rlv_temp_8g
/dev/rlv_raw97_16g

Mr. Wang explained to me that the BSPLOG tablespace held very little data and was easy to restore, that SRADM_TBS belonged to an earlier Streams replication setup and could be ignored, and that the last LV backed a temp tablespace file, which was also repairable. That gave me the confidence to proceed with the following plan:

1> Take an exp backup of the users on the BSPLOG tablespace.
2> Move the temporary tablespace onto the healthy oravg3 and oravg6, then stop the database.
3> Have the EMC engineers repair the disk fracture at the storage layer.
4> Reopen the database and recover if anything is broken.
5> Re-create the BSPLOG tablespace and its users, then imp the data from the step-1 backup.

Time to get to work.

1> Create the LVs for the temporary tablespace and set their ownership:

mklv -y 'lv_temp2_14g' -T O -w n -s n -r n -t raw oravg3 448
mklv -y 'lv_temp3_14g' -T O -w n -s n -r n -t raw oravg6 448
mklv -y 'lv_templinshi_14g' -T O -w n -s n -r n -t raw oravg2 448

# chown oracle:oinstall /dev/rlv_temp2_14g
# chown oracle:oinstall /dev/rlv_temp3_14g
[root@zhyw2]#chown oracle:oinstall /dev/rlv_templinshi_14g

2> Create a temporary tablespace temp1 as a bridge, and use it to rebuild temp:

SQL> create temporary tablespace temp1
  2  tempfile '/dev/rlv_templinshi_14g' size 500m reuse
  3  autoextend on next 100m maxsize unlimited
  4  extent management local uniform size 1m;

Tablespace created.

SQL> alter database default temporary tablespace temp1;

Database altered.

SQL> drop tablespace temp including contents and datafiles;

Tablespace dropped.

SQL> create temporary tablespace temp
  2  tempfile '/dev/rlv_temp_8g' size 8000m reuse
  3  autoextend on next 100m maxsize unlimited
  4  extent management local uniform size 1m;

Tablespace created.

SQL> alter tablespace temp add tempfile '/dev/rlv_temp2_14g' size 13824m;

Tablespace altered.
SQL> alter tablespace temp add tempfile '/dev/rlv_temp3_14g' size 13824m;

Tablespace altered.

alter database default temporary tablespace temp;
drop tablespace temp1 including contents and datafiles;

3> Export giaplog, the user on the BSPLOG tablespace:

nohup exp giaplog/password file='/tmp/exp/giaplog.dmp' owner=giaplog log='/tmp/exp/giaplog.log' &

4> The EMC engineers repaired the fault at the storage layer.

5> Start the database with the listener stopped, re-create the bsplog tablespace and the user above, and import the data back.

6> Analyze the schema:

SQL> execute dbms_utility.analyze_schema('GIAPLOG','COMPUTE');

PL/SQL procedure successfully completed
Executed in 6.9 seconds
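(As a final sanity check after the import, one of my own rather than part of the original session, the object counts can be compared with what the export recorded:)

select object_type, count(*)
from dba_objects
where owner = 'GIAPLOG'
group by object_type;
-- compare with the object list in /tmp/exp/giaplog.log from step 3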
