近日,应客户要求对生产服务器进行了一次巡检,发现了很多问题,逐一帮助排除,其中首要问题是存储空间的告警,特别是备机同步失败,特记录下本次DG恢复的过程。
一、验证是否同步
备库
archive log list;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /u01/app/oracle/archive/netdata
Oldest online log sequence 63
Next log sequence to archive 0
Current log sequence 67
col name format A50;
col dest_name format A40
col error format A20
set line 200;
select name,sequence#,applied from v$archived_log a where a.sequence#=(select max(sequence#) from v$archived_log);
NAME SEQUENCE# APP
-------------------------------------------------- ---------- ---
/u01/app/oracle/archive/netdata/1_66_857898543.arc 66 YES
col dest_name format A40
SQL> select dest_name,status,error from v$archive_dest where rownum<3;
DEST_NAME STATUS ERROR
---------------------------------------- --------- --------------------
LOG_ARCHIVE_DEST_1 VALID
LOG_ARCHIVE_DEST_2 VALID
主库:
archive log list;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /u01/app/oracle/archive/netdata
Oldest online log sequence 65
Next log sequence to archive 67
Current log sequence 67
col name format A50;
SQL> col dest_name format A40
col error format A20
set line 200;
select name,sequence#,applied from v$archived_log a where a.sequence#=(select max(sequence#) from v$archived_log);
SQL> col dest_name format A40
select dest_name,status,error from v$archive_dest where rownum<3;
NAME SEQUENCE# APP
-------------------------------------------------- ---------- ---
/u01/app/oracle/archive/netdata/1_66_857898543.arc 66 NO
netdata_sd 66 YES
SQL> SQL>
DEST_NAME STATUS ERROR
---------------------------------------- --------- --------------------
LOG_ARCHIVE_DEST_1 VALID
LOG_ARCHIVE_DEST_2 INACTIVE
#LOG_ARCHIVE_DEST_2这里不可用,是之前做处理的时候把这个参数置空了,先恢复过来
SQL> show parameter DEST_2
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_create_online_log_dest_2 string
log_archive_dest_2 string
alter system set log_archive_dest_2='service=SFLMESDG lgwr async valid_for=(online_logfiles,primary_roles) db_unique_name=SFLMESDG' scope=both;
SQL> select dest_name,status,error from v$archive_dest where rownum<3;
DEST_NAME STATUS ERROR
---------------------------------------- --------- --------------------
LOG_ARCHIVE_DEST_1 VALID
LOG_ARCHIVE_DEST_2 VALID
通过上面的检查处理后,先在主库生成RMAN备份
提示:要运用backup as compressed backupset,可以节省相当的空间
run
{
allocate channel c1 device type disk;
backup as compressed backupset format '/oradata_his/arch0/rmanbak_dg_%U.dbf' database;
sql 'alter system archive log current';
release channel c1;
}
在主库生成备份的过程中我同步进行下面的一些准备工作(以下内容根据各自情况可选)
#1. 备库空间整理,/disk/arch/sflmes是从库归档目录,空间略小,其中有些无用的归档日志,转移到其他存储位置,为后面备份文件导入预留空间
[oracle@sfloracle02 oradata_his]$ su - root
Password:
[root@sfloracle02 ~]# cd /oradata_his
[root@sfloracle02 oradata_his]# ls
hisdata
[root@sfloracle02 oradata_his]# mkdir disk_arch_sflmes
[root@sfloracle02 oradata_his]# mv /disk/arch/sflmes/lost_arch*.dbf /oradata_his/disk_arch_sflmes
[root@sfloracle02 oradata_his]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 3.2T 2.6T 490G 85% /
tmpfs 63G 2.7G 61G 5% /dev/shm
/dev/sda1 380M 85M 271M 24% /boot
/dev/nvme0n1p1 734G 8.3G 688G 2% /disk/arch/sflmes ***空间增加了
#2. 确定从库控制文件存放位置
SQL> select name from v$controlfile;
NAME
--------------------------------------------------------------------------------
/disk/oracle/oradata/SFLMES/control01.ctl
/disk/oracle/oradata/SFLMES/control02.ctl
/disk/oracle/oradata/SFLMES/control03.ctl
#3. 把已生成的备份文件复制到从库指定目录
scp rmanbak*.dbf root@192.168.70.254:/disk/arch/sflmes
#4. 检查主备目录存储使用情况
du -h --max-depth=1 /disk/oracle/oradata ***--max-depth参数指定检索目录层级
二、主库生成从库的控制文件
SQL> alter database create standby controlfile as '/oradata_his/control01.ctl';
Database altered.
三、主库生成的RMAN备份和控制文件放置到从库制定的位置
因为RMAN备份直接放置到了用从库建立了NFS上,所以无需额外的传输,但控制文件需要移动到备库的相应目录
[oracle@dgserver standby]$ scp /oradata_his/control01.ctl root@192.168.70.254:/disk/arch/sflmes
[root@***]chown oracle:oinstall /disk/arch/sflmes -R #文件夹拥有者修改为oracle
[oracle@dgserver standby]$ cp /disk/arch/sflmes/control01.ctl /disk/oracle/oradata/SFLMES/control01.ctl
[oracle@dgserver standby]$ cp control01.ctl control03.ctl
[oracle@dgserver standby]$ cp control01.ctl control02.ctl
# 启动备库到nomount
rman>restore standby controlfile from '/disk/oracle/oradata/SFLMES/control01.ctl';
四、从库RMAN恢复
[oracle@dgserver standby]$ rman target /
Recovery Manager: Release 10.2.0.5.0 - Production on Sun May 5 09:38:16 2013
Copyright (c) 1982, 2007, Oracle. All rights reserved.
connected to target database: PRIMARY (DBID=1656746419, not open)
RMAN> run {
allocate channel c1 device type disk;
allocate channel c2 device type disk;
allocate channel c3 device type disk;
allocate channel c4 device type disk;
restore database;
switch datafile all;
release channel c1;
release channel c2;
release channel c3;
release channel c4;
}
#报错,找不到备份文件,注册目录后重新执行
RMAN> catalog start with '/disk/arch/sflmes';
#执行完成恢复后,同步应用日志出错
SQL>show parameter dump **查看告警日志所在目录
[oracle@dgserver standby]$ cd /disk/oracle/diag/rdbms/sflmesdg/SFLMES/trace
在查看告警日志的过程中发现应用日志还是正常在走的
#由于磁盘空间有限,需要定期删除无效或已应用的归档日志文件节省空间,root用户下
mkdir -p /home/temp
cd temp
mkdir bin log tmp
chown -R oracle:oinstall /home/temp/bin
chown -R oracle:oinstall /home/temp/log
chown -R oracle:oinstall /home/temp/tmp
cd /home/temp/bin
vi del_arc.sh
vi del_arc.sh
#! /bin/bash
source ~/.bash_profile
$ORACLE_HOME/bin/rman target /<<EOF
crosscheck archivelog all;
delete noprompt expired archivelog all;
delete noprompt archivelog until time 'sysdate-7';
exit;
EOF
crontab -e
* 1 * * * /home/temp/bin/del_arc.sh
#清理其他空间
du -lh --max-depth=1 /
查看文件夹大小,--max-depth:设置递归层级 ,/:指定文件夹,可选,不写则查看当前文件夹,
统计当前文件夹(目录)大小,并按文件大小排序
du -sh * | sort -n
# 以下为查询得到的占用空间最大的部分文件夹
1.1T /disk >> 1.1T /disk/oracle >>1.1T /disk/oracle/oradata>>1.1T # /disk/oracle/oradata/SFLMES ***都是数据文件不能删
1.6T /u01>>1.5T /u01/archbak