Monday, February 04, 2008

Practicing Block Recovery in Oracle Database

Dear all,

During backup and recovery practice sessions, we often struggle to work through a block recovery scenario, because it is not obvious how to corrupt an Oracle block on purpose.

I performed this test on Oracle 10g Release 2 (10.2.0.1) on Windows XP. The database must be running in archivelog mode for this test, because RMAN block recovery uses the archived redo logs to roll the restored block forward. In this article I will show how to corrupt an Oracle data block, but before I begin, I would like to answer one question:

Why corrupt an Oracle block?

We corrupt an Oracle block in order to practice the recovery procedure for block corruption before we meet it in a production environment. If a block then gets corrupted in one of our production databases, we will be in a position to correct the error ourselves instead of searching for help.

This is purely for educational purposes. Please do not practice this on any of your production, development, or testing databases; create a new database for this purpose and practice there.

For the purpose of this test, I have created a separate tablespace and a new schema.

SQL> create tablespace corrupt_ts datafile 'c:\mydb\data\corrupt01.dbf' size 10m; 
Tablespace created. 
SQL>
SQL> create user test identified by test default tablespace corrupt_ts; 
User created. 
SQL>
SQL> grant connect, resource to test;
Grant succeeded. 
SQL>

Create a test table and populate it with dummy data as shown:

SQL> conn test/test
Connected.
SQL>
SQL> create table t1 as select rownum rno, object_name from all_objects
  2  where object_name like 'AQ%';

Table created.

SQL> select count(*) from t1;

  COUNT(*)
----------
        42

Insert the record that we will be corrupting into this table:

SQL> insert into t1 values (99, 'LET ME CORRUPT');

1 row created.

SQL> commit;

Commit complete.

Let us take a full database backup with RMAN before we corrupt the block.

RMAN> backup format 'c:\mydb\rman\fulldb_%U' database plus archivelog;

Starting backup at 01-FEB-08
current log archived
:
:
piece handle=C:\MYDB\RMAN\FULLDB_0LJ7LIML_1_1 tag=TAG20080201T234641 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:28
Finished backup at 02-FEB-08

Starting backup at 02-FEB-08
current log archived
using channel ORA_DISK_1
channel ORA_DISK_1: starting archive log backupset
channel ORA_DISK_1: specifying archive log(s) in backup set
input archive log thread=1 sequence=105 recid=105 stamp=645581560
channel ORA_DISK_1: starting piece 1 at 02-FEB-08
channel ORA_DISK_1: finished piece 1 at 02-FEB-08
piece handle=C:\MYDB\RMAN\FULLDB_0MJ7LINS_1_1 tag=TAG20080202T001242 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:04
Finished backup at 02-FEB-08

RMAN>

Take the tablespace offline so that we can make changes to the datafile. Many freeware and shareware hex editors are available; I am using UltraEdit to make the changes to our datafile.

SQL> alter tablespace corrupt_ts offline;

Tablespace altered.

Open the datafile “c:\mydb\data\corrupt01.dbf” in UltraEdit (press “Ctrl+H” to toggle Hex Mode). Search for our record entry “LET ME CORRUPT”, change “CORRUPT” to “NORRUPT”, save the file, and close UltraEdit. I changed just one character, “C” to “N”, which is enough: the checksum stored in the block header no longer matches the block contents.
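For readers without a hex editor, the same one-byte change can be scripted. The following is a minimal sketch and not part of the original test: `patch_first_match` is a hypothetical helper, it assumes the text is stored in a single-byte character set so that it appears in the file as plain ASCII, and it is demonstrated on a throwaway file rather than on a real datafile.

```python
import os
import tempfile

def patch_first_match(path, old, new):
    """Overwrite the first occurrence of `old` with `new` in place.

    Hypothetical helper for illustration. `old` and `new` must have the
    same length so that no bytes shift and the file size is unchanged.
    Returns the byte offset of the match, or -1 if `old` was not found.
    """
    if len(old) != len(new):
        raise ValueError("replacement must have the same length")
    with open(path, "r+b") as f:
        data = f.read()              # acceptable for a 10 MB practice file
        offset = data.find(old)
        if offset == -1:
            return -1
        f.seek(offset)
        f.write(new)
    return offset

# Demonstration on a scratch file that stands in for the offline datafile.
scratch = os.path.join(tempfile.mkdtemp(), "scratch.dbf")
with open(scratch, "wb") as f:
    f.write(b"\x00" * 100 + b"LET ME CORRUPT" + b"\x00" * 100)

offset = patch_first_match(scratch, b"LET ME CORRUPT", b"LET ME NORRUPT")
with open(scratch, "rb") as f:
    patched = f.read()
print(offset, patched[offset:offset + 14])
```

As with the hex editor, this should only ever be pointed at a datafile of a practice database, and only while the tablespace is offline.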

Bring the tablespace back online.

SQL> alter tablespace corrupt_ts online;

Tablespace altered.

You may notice that Oracle doesn’t complain when it brings the datafile online because the file header wasn’t modified. Oracle will complain only when it tries to access the corrupt blocks. Let’s see what happens when we try to query table “T1”.

SQL> conn test/test
Connected.
SQL> select * from t1;

       RNO OBJECT_NAME
---------- ------------------------------
         1 AQ$_AGENT
         2 AQ$_DEQUEUE_HISTORY
          :
          :

        30 AQ$_JMS_NAMEARRAY
ERROR:
ORA-01578: ORACLE data block corrupted (file # 6, block # 13)
ORA-01110: data file 6: 'C:\MYDB\DATA\CORRUPT01.DBF'


30 rows selected.

The query returns 30 records and then complains of block corruption in file 6, where block number 13 is reported as corrupt. Let us see which blocks are corrupt in the “corrupt01.dbf” datafile by running the dbv utility.

C:\ora10g\BIN>dbv file=C:\MYDB\data\corrupt01.dbf blocksize=8192

DBVERIFY: Release 10.2.0.1.0 - Production on Mon Feb 4 00:00:11 2008

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

DBVERIFY - Verification starting : FILE = C:\MYDB\data\corrupt01.dbf
Page 13 is marked corrupt
Corrupt block relative dba: 0x0180000d (file 6, block 13)
Bad check value found during dbv:
Data in bad block:
 type: 6 format: 2 rdba: 0x0180000d
 last change scn: 0x0000.0039aa9f seq: 0x3 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xaa9f0603
 check value in block header: 0x85b0
 computed block checksum: 0x1b00

DBVERIFY - Verification complete

Total Pages Examined         : 1280
Total Pages Processed (Data) : 4
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 11
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 1264
Total Pages Marked Corrupt   : 1
Total Pages Influx           : 0
Highest block SCN            : 3779231 (0.3779231)
C:\ora10g\BIN>

This utility scans all the blocks in a given datafile and reports the corrupt ones. In my case, one block is marked as corrupt. Make a note of all the corrupt blocks, as we need to recover each of them to its previous state.
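The numbers in the dbv report can be cross-checked with a little arithmetic. The sketch below is my own illustration, with hypothetical function names. It assumes the usual smallfile layout of a relative data block address, where the upper 10 bits hold the relative file number and the lower 22 bits hold the block number, and it assumes that the SCN is printed as a wrap and a base part separated by a dot.

```python
def decode_rdba(rdba):
    """Split a 32-bit relative data block address into (file, block).

    Assumes the smallfile layout: upper 10 bits are the relative file
    number, lower 22 bits are the block number within that file.
    """
    return rdba >> 22, rdba & 0x3FFFFF

def scn_to_int(wrap, base):
    """Combine the SCN wrap and base parts shown as 0xWRAP.BASE."""
    return (wrap << 32) | base

file_no, block_no = decode_rdba(0x0180000D)   # "rdba: 0x0180000d"
scn = scn_to_int(0x0000, 0x0039AA9F)          # "last change scn: 0x0000.0039aa9f"
pages = 10 * 1024 * 1024 // 8192              # 10m datafile, blocksize=8192

print(file_no, block_no)   # 6 13
print(scn)                 # 3779231
print(pages)               # 1280
```

These values agree with the report: dbv prints "(file 6, block 13)", "Highest block SCN : 3779231" and "Total Pages Examined : 1280".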

Start an RMAN session and recover all the corrupt blocks. The beauty of RMAN block recovery is that the rest of the datafile stays online; we need to recover only the corrupt blocks instead of the entire datafile.
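When dbv reports more than one corrupt block, noting them by hand gets tedious. As a rough sketch (the helper name `blockrecover_command` is hypothetical, and the regular expression assumes the "Corrupt block relative dba" line format shown in the dbv output above), the report can be turned into a single blockrecover command. It also assumes that the file number dbv prints is the one RMAN expects, which holds here because the relative and absolute file numbers are both 6.

```python
import re
from collections import defaultdict

# Matches dbv lines such as:
#   Corrupt block relative dba: 0x0180000d (file 6, block 13)
CORRUPT_LINE = re.compile(
    r"Corrupt block relative dba: 0x[0-9a-fA-F]+ \(file (\d+), block (\d+)\)"
)

def blockrecover_command(dbv_output):
    """Return one RMAN blockrecover command covering every corrupt block
    reported in the dbv text, or None if no corrupt block is reported."""
    blocks = defaultdict(set)
    for file_no, block_no in CORRUPT_LINE.findall(dbv_output):
        blocks[int(file_no)].add(int(block_no))
    if not blocks:
        return None
    clauses = [
        "datafile %d block %s"
        % (f, ", ".join(str(b) for b in sorted(blocks[f])))
        for f in sorted(blocks)
    ]
    return "blockrecover " + " ".join(clauses) + ";"

sample = """\
Page 13 is marked corrupt
Corrupt block relative dba: 0x0180000d (file 6, block 13)
Bad check value found during dbv:
"""
command = blockrecover_command(sample)
print(command)   # blockrecover datafile 6 block 13;
```

Review the generated command before pasting it into RMAN; the script only formats what dbv reported and does not check that a usable backup exists.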

 
RMAN> blockrecover datafile 6 block 13;

Starting blockrecover at 04-FEB-08
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=44 devtype=DISK

channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00006
channel ORA_DISK_1: reading from backup piece C:\MYDB\RMAN\FULLDB_0KJ7LH72_1_1
channel ORA_DISK_1: restored block(s) from backup piece 1
piece handle=C:\MYDB\RMAN\FULLDB_0KJ7LH72_1_1 tag=TAG20080201T234641
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:36

starting media recovery

archive log thread 1 sequence 105 is already on disk as file C:\MYDB\FRA\MYDB\ARCHIVELOG\2008_02_02\O1_MF_1_105_3T72T48S_.ARC
archive log thread 1 sequence 106 is already on disk as file C:\MYDB\FRA\MYDB\ARCHIVELOG\2008_02_03\O1_MF_1_106_3TD97K0Z_.ARC
media recovery complete, elapsed time: 00:00:35
Finished blockrecover at 04-FEB-08

RMAN>

RMAN reports that the block recovery command succeeded. Let us query the table again from SQL*Plus:

 
SQL> select * from t1;

       RNO OBJECT_NAME
---------- ------------------------------
         1 AQ$_AGENT
         2 AQ$_DEQUEUE_HISTORY
          :
          :

        41 AQ$_JMS_ARRAY_ERROR_INFO
        42 AQ$_JMS_ARRAY_ERRORS
        99 LET ME CORRUPT

43 rows selected.

SQL>

Wow, the query runs successfully and our original record is restored. A similar article on block recovery in a UNIX environment can be found here.

Happy recovery (block)!!!