Thursday, September 17, 2009

Mailbox: “How to optimize this query when table is large?”

I received a mail seeking advice on tuning a query against a table with 4 million records. According to the information provided, the query runs very frequently. Here's the query:

select max(to_char(dt, 'YYYYMMDDHH24MISS')) 
from big_table;

Let's create a table and insert 4 Million records to reproduce the problem.

SQL> create table big_table(dt date);

Table created.

SQL> set timing on
SQL> begin
  2    for i in 1..365 loop
  3      insert into big_table
  4          select to_date('01-01-1980', 'dd-mm-yyyy') + level - 1
  5            from dual
  6            connect by level <= 11000;
  7    end loop;
  8    commit;
  9  end;
 10  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:09.86
SQL>
SQL>
SQL> exec dbms_stats.gather_table_stats(user, 'BIG_TABLE');

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.51
SQL> select num_rows from user_tables where table_name = 'BIG_TABLE';

  NUM_ROWS
----------
   4020746

Elapsed: 00:00:00.10
SQL>

Enable SQL trace at the session-level and execute the problem query:

SQL> alter session set tracefile_identifier='BIG_TABLE';

Session altered.

Elapsed: 00:00:00.01
SQL> alter session set sql_trace = true;

Session altered.

Elapsed: 00:00:00.00
SQL> select max(to_char(dt, 'YYYYMMDDHH24MISS')) from big_table;

MAX(TO_CHAR(DT
--------------
20100211000000

Elapsed: 00:00:03.37
SQL>

The TKPROF output shows the following information:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.02          0          0          0           0
Execute      1      0.00       0.04          0          0          0           0
Fetch        2      3.85       3.86          0       7265          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      3.85       3.93          0       7265          0           1

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=7265 pr=0 pw=0 time=3868185 us)
4015000   TABLE ACCESS FULL BIG_TABLE (cr=7265 pr=0 pw=0 time=79 us)

Information from the trace:

Query Elapsed Time: 3.93 Sec
Time spent on CPU: 3.85 Sec
Consistent Gets: 7265
Table Access:  Full Table Scan

So, most of the time the query was on the CPU, and 7265 consistent gets were performed to arrive at the maximum value. We can do two things to speed up this query: 1) apply the TO_CHAR function after fetching the maximum value, and 2) create an index on the DT column.

Let's trace the modified query (before creating the index) and hope things get better:

SQL> alter session set tracefile_identifier='BIG_TABLE_MAX';

Session altered.

Elapsed: 00:00:00.01
SQL> alter session set sql_trace = true;

Session altered.

Elapsed: 00:00:00.00
SQL> select to_char( max(dt), 'YYYYMMDDHH24MISS') from big_table;

TO_CHAR(MAX(DT
--------------
20100211000000

Elapsed: 00:00:00.84
SQL> alter session set sql_trace =false;

Session altered.

Elapsed: 00:00:00.00
SQL>

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.02          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.67       0.67          0       7265          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.68       0.69          0       7265          0           1

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=7265 pr=0 pw=0 time=671191 us)
4015000   TABLE ACCESS FULL BIG_TABLE (cr=7265 pr=0 pw=0 time=61 us)

Fantastic! The CPU usage has come down drastically and the query is no longer a burden on the poor CPU. However, to arrive at the maximum value, Oracle still has to perform 7265 consistent gets, which is still high.

How did it help:

In the original query, the TO_CHAR function was applied to all 4 million rows and only then was the maximum value selected from the results. After interchanging the positions of the TO_CHAR and MAX functions, the database has less work to do: MAX picks the maximum value from the 4-million-row table, and TO_CHAR is then applied just once, to that single value. So we got rid of 4 million minus one TO_CHAR calls, and that is what saved the CPU time.
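
Note that the rewrite is only safe because the 'YYYYMMDDHH24MISS' format is fixed-width with the most significant field first, so string order matches date order. A quick sanity check against the test table:

select max(to_char(dt, 'YYYYMMDDHH24MISS')) str_max,
       to_char(max(dt), 'YYYYMMDDHH24MISS') date_max
  from big_table;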

To further enhance the query, we will create an index on the "DT" column and trace it again:

SQL> create index t_idx on big_table(dt);

Index created.

Elapsed: 00:00:44.39
SQL> alter session set tracefile_identifier='BIG_TABLE_IDX';

Session altered.

Elapsed: 00:00:00.17
SQL> alter session set sql_trace = true;

Session altered.

Elapsed: 00:00:00.03
SQL> select to_char( max(dt), 'YYYYMMDDHH24MISS') from big_table;

TO_CHAR(MAX(DT
--------------
20100211000000

Elapsed: 00:00:00.04
SQL> alter session set sql_trace =false;

Session altered.

Elapsed: 00:00:00.01

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.01          2          3          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.01          2          3          0           1


Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=3 pr=2 pw=0 time=14415 us)
      1   INDEX FULL SCAN (MIN/MAX) T_IDX (cr=3 pr=2 pw=0 time=14394 us)(object id 93231)

The index makes a huge difference, and everything now seems to be under control: the consistent gets dropped from 7265 to 3, and both the CPU and elapsed times are near zero. Measured by consistent gets, the query now does roughly 2,400 times less work than the previous versions.

How did it help:

After creating the index, instead of doing a Full Table Scan the optimizer went with the "INDEX FULL SCAN (MIN/MAX)" operation.

Oracle reaches the requested value by taking a shortcut through the index: it reads the root block, follows the rightmost branch blocks, and finally reads the rightmost leaf block, which holds the maximum key.
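
One caveat worth remembering (a known limitation of this optimization, not demonstrated above): asking for MIN and MAX in the same statement typically disables the MIN/MAX index access and falls back to a full scan, so it can pay to split the aggregates:

select min(dt), max(dt) from big_table;          -- typically a full scan

select (select min(dt) from big_table) min_dt,   -- two MIN/MAX index probes
       (select max(dt) from big_table) max_dt
  from dual;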

Tuesday, July 14, 2009

Who says Full Table Scans are Evil?

Recently, a developer complained that a query which normally runs very fast was now taking a huge amount of time to complete.

Following is the culprit query and its execution plan:

SELECT  DISTINCT SUBSTR(A.MF_NO,4) MF_NO,
 B.SRT_SEQ,
 SUBSTR(B.MAIN_CODE,1,6) MAIN_CODE
  FROM  DETAIL_TBL A,
        MASTER_TBL B    
  WHERE A.MRF_ID   = B.MRF_ID    
    AND A.MRF_CODE = B.MRF_CODE    
    AND TRIM(SUBSTR(A.MF_NO,4)) IN (SELECT LPAD(MF_NO,4,0) 
                                     FROM REMOTE_TBL@RMLINK 
                                    WHERE USER_LEVEL = 'RO'  
                                      AND STATUS_CODE <> 'D');

Execution Plan
----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                  | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                       |       |       |  6646 (100)|          |        |      |
|   1 |  HASH UNIQUE                 |                       |     1 |    62 |  6646   (2)| 00:01:34 |        |      |
|   2 |   TABLE ACCESS BY INDEX ROWID| MASTER_TBL            |     1 |    30 |     5   (0)| 00:00:01 |        |      |
|   3 |    NESTED LOOPS              |                       |     1 |    62 |  6645   (2)| 00:01:34 |        |      |
|*  4 |     HASH JOIN                |                       |     1 |    32 |  6640   (2)| 00:01:33 |        |      |
|   5 |      REMOTE                  | REMOTE_TBL            |     1 |    12 |    24   (0)| 00:00:01 |  REMDB | R->S |
|   6 |      TABLE ACCESS FULL       | DETAIL_TBL            |  4159K|    79M|  6589   (2)| 00:01:33 |        |      |
|*  7 |     INDEX RANGE SCAN         | SYS_C0011692          |     3 |       |     2   (0)| 00:00:01 |        |      |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access(TRIM(SUBSTR("A"."MF_NO",:SYS_B_3))=LPAD("MF_NO",:SYS_B_4,TO_CHAR(:SYS_B_5)))
   7 - access("A"."MRF_ID"="B"."MRF_ID" AND "A"."MRF_CODE"="B"."MRF_CODE")

The AWR SQL report (awrsqrpt.sql) for the period when the query was performing better shows that a Full Table Scan was performed on the MASTER_TBL.
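
For reference, that report is generated with a script shipped in the Oracle home; run from SQL*Plus, it prompts for the begin and end snapshots and the SQL_ID:

@?/rdbms/admin/awrsqrpt.sql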

Execution Plan
----------------------------------------------------------
Plan hash value: 1931109479

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name                  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Inst   |IN-OUT|
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                       |  2919K|   172M|       | 42564   (2)| 00:09:56 |        |      |
|   1 |  HASH UNIQUE         |                       |  2919K|   172M|   424M| 42564   (2)| 00:09:56 |        |      |
|*  2 |   HASH JOIN          |                       |  2919K|   172M|    11M| 15147   (2)| 00:03:33 |        |      |
|   3 |    TABLE ACCESS FULL | MASTER_TBL            |   289K|  8494K|       |  5590   (2)| 00:01:19 |        |      |
|*  4 |    HASH JOIN         |                       |  1225K|    37M|       |  6833   (2)| 00:01:36 |        |      |
|   5 |     REMOTE           | REMOTE_TBL            |    29 |   348 |       |    24   (0)| 00:00:01 |  REMDB | R->S |
|   6 |     TABLE ACCESS FULL| DETAIL_TBL            |  4191K|    79M|       |  6781   (2)| 00:01:35 |        |      |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("A"."MRF_ID"="B"."MRF_ID" AND "A"."MRF_CODE"="B"."MRF_CODE")
   4 - access(TRIM(SUBSTR("A"."MF_NO",4))=LPAD("MF_NO",4,'0'))

When we added a Full Table Scan (FTS) hint for the MASTER_TBL, the same query ran in less than 20 seconds. Since this is a home-grown application, we can simply modify the query to include the hint.

SELECT  /*+ FULL(B) */ DISTINCT SUBSTR(A.MF_NO,4) MF_NO,
 B.SRT_SEQ,
 SUBSTR(B.MAIN_CODE,1,6) MAIN_CODE
  FROM  DETAIL_TBL A,
        MASTER_TBL B    
  WHERE A.MRF_ID   = B.MRF_ID    
    AND A.MRF_CODE = B.MRF_CODE    
    AND TRIM(SUBSTR(A.MF_NO,4)) IN (SELECT LPAD(MF_NO,4,0) 
                                     FROM REMOTE_TBL@RMLINK 
                                    WHERE USER_LEVEL = 'RO'  
                                      AND STATUS_CODE <> 'D');


Execution Plan
----------------------------------------------------------
Plan hash value: 1931109479

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name                  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Inst   |IN-OUT|
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                       |  2919K|   172M|       | 42564   (2)| 00:09:56 |        |      |
|   1 |  HASH UNIQUE         |                       |  2919K|   172M|   424M| 42564   (2)| 00:09:56 |        |      |
|*  2 |   HASH JOIN          |                       |  2919K|   172M|    11M| 15147   (2)| 00:03:33 |        |      |
|   3 |    TABLE ACCESS FULL | MASTER_TBL            |   289K|  8494K|       |  5590   (2)| 00:01:19 |        |      |
|*  4 |    HASH JOIN         |                       |  1225K|    37M|       |  6833   (2)| 00:01:36 |        |      |
|   5 |     REMOTE           | REMOTE_TBL            |    29 |   348 |       |    24   (0)| 00:00:01 |  REMDB | R->S |
|   6 |     TABLE ACCESS FULL| DETAIL_TBL            |  4191K|    79M|       |  6781   (2)| 00:01:35 |        |      |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("A"."MRF_ID"="B"."MRF_ID" AND "A"."MRF_CODE"="B"."MRF_CODE")
   4 - access(TRIM(SUBSTR("A"."MF_NO",4))=LPAD("MF_NO",4,'0'))

This worked as an emergency painkiller until the routine lab tests were performed and an appropriate action was decided.

In the bad execution plan, to access the master table (MASTER_TBL) the optimizer does an index range scan ("SYS_C0011692"), and the cardinality reported is "3". This is where the real problem hides. Drilling into the tables, columns, and indexes for both tables revealed this information:

Table Name: MASTER_TBL
Primary Key: MRF_ID, MRF_CODE, ORDNO
Num Rows: 287,456

Table Name: DETAIL_TBL
Primary Key: MRF_ID, MRF_CODE, ORDNO
Num Rows: 4,159,964

Oops… there are nearly 300,000 records in MASTER_TBL, but the execution plan indicates a cardinality of merely 3, so something is seriously wrong somewhere. The first suspect is table and index statistics, and as suspected, the statistics on MASTER_TBL were stale. These were the steps that had been performed on MASTER_TBL:

1) Data was cleaned
2) The initial data load was run
3) Statistics were collected
4) The remaining major chunk of the data was loaded

After collecting fresh statistics on the MASTER_TBL, things got back on track. Note also that the data in the "MRF_ID" and "MRF_CODE" columns is evenly spread across all the used blocks of the MASTER_TBL and DETAIL_TBL tables, and there is no limiting condition on these columns, so an index access path would make no sense for the optimizer here anyway.
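
The fix itself is a single call; a minimal sketch (CASCADE also refreshes the statistics of the table's indexes):

exec dbms_stats.gather_table_stats(user, 'MASTER_TBL', cascade => true);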

Sunday, May 17, 2009

Cardinality Analysis - A Review

After reading Michelle Deng's "Tuning by Cardinality Feedback" paper on Jonathan's blog, I decided to re-create the whole scenario to see for myself what is going on.

I ran this test on an Oracle 10g Release 2 (10.2.0.3) database with a block size of 16K. We first create the table and populate it so that it exactly matches Michelle's data.

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bi
PL/SQL Release 10.2.0.3.0 - Production
CORE    10.2.0.3.0      Production
TNS for IBM/AIX RISC System/6000: Version 10.2.0.3.0 - Productio
NLSRTL Version 10.2.0.3.0 - Production

Elapsed: 00:00:00.03
SQL>

SQL> show parameter db_block_size

NAME                                 TYPE        VALUE
------------------------------------ ----------- ----------
db_block_size                        integer     16384
SQL>

I performed a couple of tests with different data distributions; here's the first test case:

Test Case 1

Create and Populate Data

drop table prod_fctr_mv purge;

create table prod_fctr_mv(
 extract_mkt_ssk varchar2(5), 
 mkt_ssk     varchar2(20), 
 mkt_id      number(16), 
 hier_type    varchar2(20), 
 txt       varchar2(500));

insert into prod_fctr_mv 
 select '00006', '00006', 3, 'CORP', 
        rpad('*', 285, '*') 
   from dual 
  connect by level <= 42;


insert into prod_fctr_mv
  select
         lpad(mod(level, 100), 5, '0') extract_mkt_ssk ,
         lpad(mod(level, 73), 5, '0') mkt_ssk ,
         mod(level, 91) mkt_id ,
         decode(mod(level, 5), 0, 'CORP', 1, 'CALL', 2, 'CORP', 
                               3, 'WALL', 'CORP') hier_type ,
         rpad('*', 285, '*')
    from dual
   connect by level <= 4965 - 42 ;


update prod_fctr_mv set mkt_ssk = null 
 where mkt_ssk <> '00006' 
   and rownum <= 296;

update (select mkt_id 
          from prod_fctr_mv 
         where mkt_id <> 3) 
   set mkt_id = 3
 where rownum <= 72;

update (select extract_mkt_ssk  
          from prod_fctr_mv 
         where extract_mkt_ssk <> '00006'
           and mkt_id <> 3) 
   set extract_mkt_ssk = '00006' 
 where rownum <= 77;

select count(*) 
  from prod_fctr_mv
 where mkt_ssk  = '00006' ;

update (select mkt_ssk 
          from prod_fctr_mv 
         where mkt_ssk <> '00006' 
           and mkt_ssk is not null ) 
   set mkt_ssk = '00006' 
 where rownum <= 59;

update (select hier_type 
          from prod_fctr_mv 
         where hier_type = 'CORP' 
           and mkt_id <> 3
           and extract_mkt_ssk <> '00006')
   set hier_type = decode(mod(rownum, 2), 0, 'TALL', 'BALL') 
 where rownum <= 554;

commit;

Verify Data

Now that the data is populated, let's see whether we are on the right track.

SQL> exec dbms_stats.gather_table_stats(user, 'prod_fctr_mv');

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.42
SQL> 
SQL> select table_name, num_rows, blocks, empty_blocks, avg_space, 
  2         chain_cnt, avg_row_len
  3   from user_tables
  4  where table_name = 'PROD_FCTR_MV';

TABLE_NAME                       NUM_ROWS     BLOCKS EMPTY_BLOCKS  AVG_SPACE  CHAIN_CNT AVG_ROW_LEN
------------------------------ ---------- ---------- ------------ ---------- ---------- -----------
PROD_FCTR_MV                         4965        108            0          0          0         305

Elapsed: 00:00:00.01
SQL> 
SQL> 
SQL> select column_name, num_distinct, density, num_buckets, 
  2         num_nulls, histogram
  3    from user_tab_columns
  4   where table_name = 'PROD_FCTR_MV';

COLUMN_NAME                    NUM_DISTINCT    DENSITY NUM_BUCKETS  NUM_NULLS HISTOGRAM
------------------------------ ------------ ---------- ----------- ---------- ---------------
TXT                                       1          1           1          0 NONE
HIER_TYPE                                 5 .000100705           5          0 FREQUENCY
MKT_ID                                   91 .000100705          91          0 FREQUENCY
MKT_SSK                                  73 .000107089          73        296 FREQUENCY
EXTRACT_MKT_SSK                         100 .000100705         100          0 FREQUENCY

Elapsed: 00:00:00.07
SQL> 
SQL> 
SQL> select count(decode(mkt_id, 3, 1)) mkt_id,
  2         count(decode(extract_mkt_ssk, '00006', 1)) extract_mkt_ssk,
  3         count(decode(mkt_ssk, '00006', 1)) mkt_ssk,
  4         count(decode(hier_type, 'CORP', 1)) hier_type
  5    from prod_fctr_mv ;

    MKT_ID EXTRACT_MKT_SSK    MKT_SSK  HIER_TYPE
---------- --------------- ---------- ----------
       169             169        169       2441

Elapsed: 00:00:00.01
SQL> 
SQL> select count(*) 
  2    from prod_fctr_mv
  3   where mkt_id  = 3
  4     and extract_mkt_ssk = '00006'
  5     and mkt_ssk = '00006'
  6     and hier_type = 'CORP';

  COUNT(*)
----------
        42

Elapsed: 00:00:00.03
SQL> 
SQL>

Search an optimal value for OPTIMIZER_DYNAMIC_SAMPLING parameter

The total number of rows, the counts for the key columns, the count for the query in question, and the statistics all match what Michelle presented.

The real fun starts here:

SQL> set autotrace traceonly exp
SQL> 
SQL> select count(*) 
  2    from prod_fctr_mv
  3   where mkt_id  = 3
  4     and extract_mkt_ssk = '00006'
  5     and mkt_ssk = '00006'
  6     and hier_type = 'CORP';
Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 2899207340

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |     1 |    20 |    42   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |              |     1 |    20 |            |          |
|*  2 |   TABLE ACCESS FULL| PROD_FCTR_MV |     1 |    20 |    42   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("MKT_ID"=3 AND "EXTRACT_MKT_SSK"='00006' AND
              "MKT_SSK"='00006' AND "HIER_TYPE"='CORP')

Delete the statistics and run the same query:

SQL> exec dbms_stats.delete_table_stats(user, 'prod_fctr_mv');

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.39
SQL> 
SQL> select count(*) 
  2    from prod_fctr_mv
  3   where mkt_id  = 3
  4     and extract_mkt_ssk = '00006'
  5     and mkt_ssk = '00006'
  6     and hier_type = 'CORP';
Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 2899207340

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |     1 |    41 |    42   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |              |     1 |    41 |            |          |
|*  2 |   TABLE ACCESS FULL| PROD_FCTR_MV |    72 |  2952 |    42   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("MKT_ID"=3 AND "EXTRACT_MKT_SSK"='00006' AND
              "MKT_SSK"='00006' AND "HIER_TYPE"='CORP')

Note
-----
   - dynamic sampling used for this statement

SQL>

The optimizer used dynamic sampling, but the number of rows reported is "72", which is nearly twice the actual number. Let's gradually increase the value of the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter and check the plan (the @t script below simply re-runs the same COUNT(*) query).

SQL> alter session set optimizer_dynamic_sampling=3;

Session altered.

Elapsed: 00:00:00.00
SQL> @t
Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 2899207340

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |     1 |    41 |    42   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |              |     1 |    41 |            |          |
|*  2 |   TABLE ACCESS FULL| PROD_FCTR_MV |    72 |  2952 |    42   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("MKT_ID"=3 AND "EXTRACT_MKT_SSK"='00006' AND
              "MKT_SSK"='00006' AND "HIER_TYPE"='CORP')

Note
-----
   - dynamic sampling used for this statement

SQL> alter session set optimizer_dynamic_sampling=4;

Session altered.

Elapsed: 00:00:00.00
SQL>  @t
Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 2899207340

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |     1 |    41 |    42   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |              |     1 |    41 |            |          |
|*  2 |   TABLE ACCESS FULL| PROD_FCTR_MV |    72 |  2952 |    42   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("MKT_ID"=3 AND "EXTRACT_MKT_SSK"='00006' AND
              "MKT_SSK"='00006' AND "HIER_TYPE"='CORP')

Note
-----
   - dynamic sampling used for this statement

SQL> alter session set optimizer_dynamic_sampling=5;

Session altered.

Elapsed: 00:00:00.00
SQL>   @t
Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 2899207340

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |     1 |    41 |    42   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |              |     1 |    41 |            |          |
|*  2 |   TABLE ACCESS FULL| PROD_FCTR_MV |    72 |  2952 |    42   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("MKT_ID"=3 AND "EXTRACT_MKT_SSK"='00006' AND
              "MKT_SSK"='00006' AND "HIER_TYPE"='CORP')

Note
-----
   - dynamic sampling used for this statement

SQL> alter session set optimizer_dynamic_sampling=6;

Session altered.

Elapsed: 00:00:00.01
SQL> @t
Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 2899207340

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |     1 |    41 |    42   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |              |     1 |    41 |            |          |
|*  2 |   TABLE ACCESS FULL| PROD_FCTR_MV |    42 |  1722 |    42   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("MKT_ID"=3 AND "EXTRACT_MKT_SSK"='00006' AND
              "MKT_SSK"='00006' AND "HIER_TYPE"='CORP')

Note
-----
   - dynamic sampling used for this statement

SQL>
SQL> set autotrace off
SQL>

Finally, when the OPTIMIZER_DYNAMIC_SAMPLING parameter reached a value of "6", the plan reports the correct cardinality (42).
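
If changing the session-level parameter is too broad a brush, the same sampling level can be requested for just this statement with a hint; a sketch against the test table:

select /*+ dynamic_sampling(prod_fctr_mv 6) */ count(*)
  from prod_fctr_mv
 where mkt_id  = 3
   and extract_mkt_ssk = '00006'
   and mkt_ssk = '00006'
   and hier_type = 'CORP';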

Test Case 2

Let's run through the entire test again with a slight modification: instead of inserting all 42 rows of interest together, we will scatter them throughout the table.

Create and Populate Data

In this test case, we will insert one row of our interest after every 100 rows. So, here goes the script:

drop table prod_fctr_mv purge;

create table prod_fctr_mv(
 extract_mkt_ssk varchar2(5), 
 mkt_ssk     varchar2(20), 
 mkt_id      number(16), 
 hier_type    varchar2(20), 
 txt       varchar2(500));

begin
  for i in 1..42 loop
    insert into prod_fctr_mv 
     select '00006', '00006', 3, 'CORP', 
            rpad('*', 285, '*') 
       from dual 
      connect by level <= 1;

    insert into prod_fctr_mv
      select
            lpad(mod(level, 100), 5, '0') extract_mkt_ssk ,
            lpad(mod(level, 73), 5, '0') mkt_ssk ,
            mod(level, 91) mkt_id ,
            decode(mod(level, 5), 0, 'CORP', 1, 'CALL', 2, 'CORP', 
                                  3, 'WALL', 'CORP') hier_type ,
            rpad('*', 285, '*')
       from dual
      connect by level <= 100 ;
  end loop;

  commit;
end;
/

select count(*) 
  from prod_fctr_mv;

  insert into prod_fctr_mv
    select
          lpad(mod(level, 100), 5, '0') extract_mkt_ssk ,
          lpad(mod(level, 73), 5, '0') mkt_ssk ,
          mod(level, 91) mkt_id ,
          decode(mod(level, 5), 0, 'CORP', 1, 'CALL', 2, 'CORP', 
                                3, 'WALL', 'CORP') hier_type ,
          rpad('*', 285, '*')
     from dual
    connect by level <= 723 ;

update prod_fctr_mv set mkt_ssk = null 
 where mkt_ssk <> '00006' 
   and rownum <= 296;

update (select mkt_id 
          from prod_fctr_mv 
         where mkt_id <> 3
           and extract_mkt_ssk <> '00006') 
   set mkt_id = 3
 where rownum <= 35;

update (select extract_mkt_ssk  
          from prod_fctr_mv 
         where extract_mkt_ssk <> '00006'
           and mkt_id <> 3) 
   set extract_mkt_ssk = '00006' 
 where rownum <= 77;

update (select mkt_ssk 
          from prod_fctr_mv 
         where mkt_ssk <> '00006' 
           and mkt_ssk is not null ) 
   set mkt_ssk = '00006' 
 where rownum <= 33;

update (select hier_type 
          from prod_fctr_mv 
         where hier_type = 'CORP' 
           and mkt_id <> 3
           and extract_mkt_ssk <> '00006')
   set hier_type = decode(mod(rownum, 2), 0, 'TALL', 'BALL') 
 where rownum <= 554;

commit;

Verify Data

Let us run through the same queries to verify data population.

SQL> exec dbms_stats.gather_table_stats(user, 'prod_fctr_mv');

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.09
SQL> select table_name, num_rows, blocks, empty_blocks, avg_space, 
  2         chain_cnt, avg_row_len
  3   from user_tables
  4  where table_name = 'PROD_FCTR_MV';

TABLE_NAME                       NUM_ROWS     BLOCKS EMPTY_BLOCKS  AVG_SPACE  CHAIN_CNT AVG_ROW_LEN
------------------------------ ---------- ---------- ------------ ---------- ---------- -----------
PROD_FCTR_MV                         4965        108            0          0          0         305

Elapsed: 00:00:00.01
SQL> 
SQL> 
SQL> select column_name, num_distinct, density, num_buckets, 
  2         num_nulls, histogram
  3    from user_tab_columns
  4   where table_name = 'PROD_FCTR_MV';

COLUMN_NAME                    NUM_DISTINCT    DENSITY NUM_BUCKETS  NUM_NULLS HISTOGRAM
------------------------------ ------------ ---------- ----------- ---------- ---------------
TXT                                       1          1           1          0 NONE
HIER_TYPE                                 5 .000100705           5          0 FREQUENCY
MKT_ID                                   91 .000100705          91          0 FREQUENCY
MKT_SSK                                  73 .000107089          73        296 FREQUENCY
EXTRACT_MKT_SSK                         100 .000100705         100          0 FREQUENCY

Elapsed: 00:00:00.03
SQL> 
SQL> select count(decode(mkt_id, 3, 1)) mkt_id,
  2         count(decode(extract_mkt_ssk, '00006', 1)) extract_mkt_ssk,
  3         count(decode(mkt_ssk, '00006', 1)) mkt_ssk,
  4         count(decode(hier_type, 'CORP', 1)) hier_type
  5    from prod_fctr_mv ;

    MKT_ID EXTRACT_MKT_SSK    MKT_SSK  HIER_TYPE
---------- --------------- ---------- ----------
       169             169        169       2441

Elapsed: 00:00:00.01
SQL> 
SQL> select count(*) 
  2    from prod_fctr_mv
  3   where mkt_id  = 3
  4     and extract_mkt_ssk = '00006'
  5     and mkt_ssk = '00006'
  6     and hier_type = 'CORP';

  COUNT(*)
----------
        42

Elapsed: 00:00:00.01
SQL>

Search an optimal value for OPTIMIZER_DYNAMIC_SAMPLING parameter

We managed to get the number of records and the other statistics the same as in the original document. Now we will hunt for the optimal value of the OPTIMIZER_DYNAMIC_SAMPLING parameter.

SQL> set autotrace traceonly exp
SQL> 
SQL> @t
Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 2899207340

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |     1 |    20 |    42   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |              |     1 |    20 |            |          |
|*  2 |   TABLE ACCESS FULL| PROD_FCTR_MV |     1 |    20 |    42   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("MKT_ID"=3 AND "EXTRACT_MKT_SSK"='00006' AND
              "MKT_SSK"='00006' AND "HIER_TYPE"='CORP')

SQL> exec dbms_stats.delete_table_stats(user, 'prod_fctr_mv');

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.25
SQL> 
SQL> @t
Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 2899207340

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |     1 |    41 |    42   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |              |     1 |    41 |            |          |
|*  2 |   TABLE ACCESS FULL| PROD_FCTR_MV |    36 |  1476 |    42   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("MKT_ID"=3 AND "EXTRACT_MKT_SSK"='00006' AND
              "MKT_SSK"='00006' AND "HIER_TYPE"='CORP')

Note
-----
   - dynamic sampling used for this statement

SQL>

This time we get a much better cardinality estimate (36 versus the actual 42) from the optimizer at the default level of dynamic sampling (OPTIMIZER_DYNAMIC_SAMPLING=2).

Conclusion

In the above test cases, the optimizer started reporting the correct number of rows at OPTIMIZER_DYNAMIC_SAMPLING=6 when the rows of interest were inserted together, while the default setting was already adequate when the data was almost evenly distributed.

Michelle's data distribution might be such that the optimizer was able to report the correct number of rows at OPTIMIZER_DYNAMIC_SAMPLING=4.

Lessons learned include:

1) Column correlation can really mislead the optimizer, and

2) Data distribution plays an equally critical role.

Wednesday, April 15, 2009

ORA-07445 When Querying V$SQL_PLAN View

On one of our production databases, I wanted to know how often a particular index was used and at what times. To find out, I did a little mining on the AWR repository tables and found that it was used 11 times in the last month (the AWR retention period).

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for IBM/AIX RISC System/6000: Version 10.2.0.4.0 - Productio
NLSRTL Version 10.2.0.4.0 - Production

Elapsed: 00:00:00.01
SQL>


SQL> select sp.sql_id,
  2         sp.options,
  3         count(1) cnt
  4    from dba_hist_sql_plan sp,
  5         dba_hist_sqlstat ss
  6    where sp.sql_id = ss.sql_id
  7      and sp.object_owner  =  'PAYMAST' 
  8      and sp.operation like '%INDEX%'
  9      and sp.object_name = 'IDX_COMMIT_DET'
 10  group by sp.sql_id, 
 11           sp.options
 12  order by sp.sql_id, 
 13           sp.options;

SQL_ID        OPTIONS                               CNT
------------- ------------------------------ ----------
1b4xf87ntvfgn SAMPLE FAST FULL SCAN                   1
8qaj7c6wqcyvg SAMPLE FAST FULL SCAN                   7
9jcrv4kunuhg4 SAMPLE FAST FULL SCAN                   1
afzq942kp848t RANGE SCAN                              1
ar6c0r4s4kntp RANGE SCAN                              1

Elapsed: 00:00:00.54
SQL>          

However, querying the V$SQL_PLAN view for the same index threw an ORA-03113 error. I reconnected to the database and ran the same query again, and it failed the same way.

SQL> select * from V$SQL_PLAN where  operation  ='INDEX' and object_name ='IDX_COMMIT_DET';

ADDRESS          HASH_VALUE SQL_ID        PLAN_HASH_VALUE CHILD_ADDRESS    CHILD_NUMBER TIMESTAMP       
---------------- ---------- ------------- --------------- ---------------- ------------ ------------
FILTER_PREDICATES                                                                                       
----------------------------------------------------------------------------------------------------
REMARKS                                                                                                 
----------------------------------------------------------------------------------------------------
07000000D98392B0 2773750041 afzq942kp848t       380836487 07000000D986D420            0 15-APR2009 12:2
"VOU_TYPE_NAME"=:B1                                                                                     


ERROR:
ORA-03113: end-of-file on communication channel



ERROR:
ORA-03114: not connected to ORACLE


Elapsed: 00:00:03.82
SQL>

Oracle dumped some information into the alert log; here it is:

Errors in file /dbdata1/oradba/admin/PAYPROD/udump/payprod_ora_6488244.trc:
ORA-07445: exception encountered: core dump [msqsub+0008] [SIGSEGV] [Address not mapped to object] [0x35B000000000070] [] []

and it created an 18 MB trace file in USER_DUMP_DEST, which has all the gory details for Oracle Support to work on. Below is an excerpt from the trace file:

*** 2009-04-15 12:27:46.646
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [msqsub+0008] [SIGSEGV] [Address not mapped to object] [0x35B000000000070] [] []
Current SQL statement for this session:
select * from V$SQL_PLAN where  operation  ='INDEX' and object_name ='IDX_COMMIT_DET'

The ORA-00600/ORA-07445 Lookup Tool in Oracle Metalink reports “A description for this ORA-07445 error is not yet available”.

I ended up lodging an SR and am now waiting for Oracle Support's response.

Tuesday, March 17, 2009

Strange Data Guard Issue

We recently failed one of our databases over to the DRC. At that time the production archivelog sequence was 80,000 plus. After staying there for a couple of hours, we re-created a physical standby database at the original primary site and switched back. Everything went fine, without any hoo-ha.

The failover and switchover took place on February 20, 2009. We have an automated job on all our standby databases that checks whether the standby is lagging behind the primary; if so, it automatically restores the missing archivelogs and applies them.
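
As an aside, the heart of such a lag check can be as simple as comparing the highest received log sequence with the highest applied one on the standby; a minimal sketch using the standard views:

select max(sequence#) received
  from v$archived_log;

select max(sequence#) applied
  from v$archived_log
 where applied = 'YES';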

This job was working fine until the day before yesterday (March 15, 2009), but then started reporting errors. It failed with an "RMAN-20242" error.

Starting restore at 15-MAR-09
released channel: ch12
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 03/15/2009 07:26:59
RMAN-06004: ORACLE error from recovery catalog database: 
RMAN-20242: specification does not match any archive log in the recovery catalog

When notified, I queried V$ARCHIVE_GAP, and the output of this query surprised me:

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bi
PL/SQL Release 10.2.0.3.0 - Production
CORE    10.2.0.3.0      Production
TNS for IBM/AIX RISC System/6000: Version 10.2.0.3.0 - Production
NLSRTL Version 10.2.0.3.0 - Production

SQL>
SQL> select * from v$archive_gap;

   THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#
---------- ------------- --------------
         1           776          82468

SQL> 

Well, all of a sudden Oracle thinks that the standby is 81 thousand archive logs behind the primary. I tried switching a couple of log files on the production database, but the gap remained. I really don't know where Oracle got this information from.

The automated job was trying to restore the archives between 776 and 82468 and was kicked out by the recovery catalog database saying there are no such archive logs (and indeed, no such archivelogs exist).

I created a standby controlfile on the production database and replaced the controlfile on the standby database with the new one. The standby database seems happy with this version of the controlfile: it has stopped reporting any gap, and the automated job is leading a happy life again.
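
For reference, the fix boils down to the following (a sketch; the path is illustrative, and the standby must be shut down while its controlfiles are replaced):

-- On the primary:
alter database create standby controlfile as '/tmp/standby.ctl';

-- Copy the file over every location named in the standby's CONTROL_FILES
-- parameter, then on the standby:
startup mount;
alter database recover managed standby database disconnect from session;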

SQL> select * from v$archive_gap;

no rows selected

SQL>

Saturday, February 07, 2009

Working with DBMS_CRYPTO Package

Oracle Database 10g provides a means to encrypt and decrypt your sensitive data through the built-in DBMS_CRYPTO API. With it, one can encrypt sensitive information such as SSNs, credit card numbers, and debit card numbers, which should not be stored in the database in plain text. The DBMS_CRYPTO toolkit is easy to use and is intended to replace DBMS_OBFUSCATION_TOOLKIT, which was introduced in 8.1.6.

The concept behind any encryption toolkit is to have two things: 1) an algorithm, the actual logic, and 2) the encryption key.

Using this combination, we can safely encrypt our data.
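
The algorithm is selected by adding DBMS_CRYPTO's package constants together, or by using a pre-built shortcut such as DES_CBC_PKCS5 (used below). As a standalone sketch, here is the same kind of call with AES-128 instead; the key value is for illustration only, and AES-128 needs exactly 16 bytes:

declare
  l_key raw(16)   := utl_raw.cast_to_raw('0123456789abcdef');  -- 16-byte key
  l_enc raw(2048);
begin
  l_enc := dbms_crypto.encrypt(
             src => utl_raw.cast_to_raw('123456701'),
             typ => dbms_crypto.encrypt_aes128    -- algorithm
                  + dbms_crypto.chain_cbc         -- chaining mode
                  + dbms_crypto.pad_pkcs5,        -- padding
             key => l_key);
  dbms_output.put_line(rawtohex(l_enc));
end;
/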

In this blog entry, let us see how DBMS_CRYPTO can be used to encrypt data and also consider its performance implications.

To start with, let's create a test table and insert a couple of records in plain text:

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for 32-bit Windows: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL>
SQL> create table emp(empid number, ename varchar2(100), ssn varchar2(80));

Table created.

SQL> insert into emp values (1, 'SMITH',        123456701);

1 row created.

SQL> insert into emp values (2, 'ALLEN',        123456702);

1 row created.

SQL> insert into emp values (3, 'WARD',         123456703);

1 row created.

SQL> commit;
SQL> set line 10000
SQL> column ename format a30
SQL> column ssn format a40
SQL> select * from emp;

     EMPID ENAME                          SSN
---------- ------------------------------ -------------
         1 SMITH                          123456701
         2 ALLEN                          123456702
         3 WARD                           123456703

SQL>

The SSN has been inserted into the table in plain text and can be read by anybody who has the SELECT privilege on the EMP table. Now, let's create a package using the DBMS_CRYPTO API to handle encryption and decryption.

SQL> Create or replace package cryptit is
  2    Function encrypt_data( p_data IN VARCHAR2 ) Return RAW DETERMINISTIC;
  3    Function decrypt_data( p_data IN RAW ) Return VARCHAR2 DETERMINISTIC;
  4  End cryptit;
  5  /

Package created.

SQL>
SQL> Create or replace package body cryptit is
  2    V_Key       RAW(128) := UTL_RAW.cast_to_raw('testkey1');          -- Key
  3
  4    Function encrypt_data( p_data IN VARCHAR2 ) Return RAW DETERMINISTIC
  5    IS
  6      l_data RAW(2048) := utl_raw.cast_to_raw(p_data);
  7      l_encrypted RAW(2048);
  8    BEGIN
  9      NULL;
 10      l_encrypted := dbms_crypto.encrypt                        -- Algorithm
 11                     ( src => l_data,
 12                       typ => DBMS_CRYPTO.DES_CBC_PKCS5,
 13                       key => V_KEY );
 14
 15      Return l_encrypted;
 16    END encrypt_data;
 17
 18    Function decrypt_data( p_data IN RAW ) Return VARCHAR2 DETERMINISTIC
 19    IS
 20      l_decrypted RAW(2048);
 21    BEGIN
 22      l_decrypted := dbms_crypto.decrypt                              -- Algorithm
 23                      ( src => p_data,
 24                        typ => DBMS_CRYPTO.DES_CBC_PKCS5,
 25                        key => V_KEY );
 26
 27      Return utl_raw.cast_to_varchar2(l_decrypted);
 28    END decrypt_data;
 29  End cryptit;
 30  /

Package body created.

SQL>

The package body consists of functions to encrypt and decrypt data using a key. As these functions always produce the same output for a given input, we can mark them DETERMINISTIC. (A deterministic function always returns the same result whenever it is called with the same set of input values.)

Let's now make use of these functions to encrypt our sensitive data.

SQL> update emp set ssn = cryptit.encrypt_data(ssn);

3 rows updated.

SQL> select * from emp;

     EMPID ENAME                          SSN
---------- ------------------------------ ----------------------------------
         1 SMITH                          5F3168C22E54060DE7D97B31F7E38BB6
         2 ALLEN                          5F3168C22E54060D0C2527FBBD3DCD6C
         3 WARD                           5F3168C22E54060D8166D3757932A112

SQL>
SQL>

When inserting new records, we simply use the "cryptit.encrypt_data" function in the INSERT statement.

SQL> insert into emp values( 4, 'MOMEN', cryptit.encrypt_data(123456704));

1 row created.

SQL> commit;

Commit complete.

SQL> select * from emp;

     EMPID ENAME                          SSN
---------- ------------------------------ ----------------------------------------
         1 SMITH                          5F3168C22E54060DE7D97B31F7E38BB6
         2 ALLEN                          5F3168C22E54060D0C2527FBBD3DCD6C
         3 WARD                           5F3168C22E54060D8166D3757932A112
         4 MOMEN                          5F3168C22E54060D2CFF7E7A35B14187

SQL>

SQL> column decrypted_ssn format a15
SQL> select empid, ename, cryptit.decrypt_data(ssn) decrypted_ssn, ssn from emp;

     EMPID ENAME                          DECRYPTED_SSN   SSN
---------- ------------------------------ --------------- ----------------------------------------
         1 SMITH                          123456701       5F3168C22E54060DE7D97B31F7E38BB6
         2 ALLEN                          123456702       5F3168C22E54060D0C2527FBBD3DCD6C
         3 WARD                           123456703       5F3168C22E54060D8166D3757932A112
         4 MOMEN                          123456704       5F3168C22E54060D2CFF7E7A35B14187

SQL>

All is fine as far as encryption is concerned, but let's see what happens when SSN is used in the WHERE clause of a query. We will create an index on SSN and run a query against it.

SQL> create index t_ssn on emp(ssn);

Index created.

SQL> set autotrace on exp
SQL> select * from emp where ssn = cryptit.encrypt_data('123456701');

     EMPID ENAME                          SSN
---------- ------------------------------ ----------------------------------------
         1 SMITH                          5F3168C22E54060DE7D97B31F7E38BB6


Execution Plan
----------------------------------------------------------
Plan hash value: 2894032564

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |   107 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP   |     1 |   107 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_SSN |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("SSN"=RAWTOHEX("CRYPTIT"."ENCRYPT_DATA"('123456701')))

Note
-----
   - dynamic sampling used for this statement

SQL>

The optimizer picked the "T_SSN" index as the cost-efficient access path for the equality predicate. But when a range of plaintext values is requested, the plain index on the ciphertext is useless: the predicate applies DECRYPT_DATA to the column, and in any case ciphertext order bears no relation to plaintext order, so the optimizer falls back to a full table scan.

SQL> select * from emp where cryptit.decrypt_data(ssn) between '123456702' and '123456704';

     EMPID ENAME                          SSN
---------- ------------------------------ ----------------------------------------
         2 ALLEN                          5F3168C22E54060D0C2527FBBD3DCD6C
         3 WARD                           5F3168C22E54060D8166D3757932A112
         4 MOMEN                          5F3168C22E54060D2CFF7E7A35B14187


Execution Plan
----------------------------------------------------------
Plan hash value: 3956160932

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |   107 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMP  |     1 |   107 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CRYPTIT"."DECRYPT_DATA"(HEXTORAW("SSN"))>='123456702'
              AND "CRYPTIT"."DECRYPT_DATA"(HEXTORAW("SSN"))<='123456704')

Note
-----
   - dynamic sampling used for this statement

We can also create function-based indexes on encrypted columns so that such data is accessed faster, with a better execution plan. This is the reason I marked both ENCRYPT_DATA and DECRYPT_DATA as DETERMINISTIC functions.

SQL> create index f_ssn_idx on emp(cryptit.decrypt_data(ssn));

Index created.

SQL> select * from emp where cryptit.decrypt_data(ssn) between '123456702' and '123456704';

     EMPID ENAME                          SSN
---------- ------------------------------ ----------------------------------------
         2 ALLEN                          5F3168C22E54060D0C2527FBBD3DCD6C
         3 WARD                           5F3168C22E54060D8166D3757932A112
         4 MOMEN                          5F3168C22E54060D2CFF7E7A35B14187


Execution Plan
----------------------------------------------------------
Plan hash value: 9274740

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |     1 |   107 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP       |     1 |   107 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | F_SSN_IDX |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CRYPTIT"."DECRYPT_DATA"(HEXTORAW("SSN"))>='123456702' AND
              "CRYPTIT"."DECRYPT_DATA"(HEXTORAW("SSN"))<='123456704')

Note
-----
   - dynamic sampling used for this statement

SQL>

The optimizer has rightly picked the function-based index to access the requested data quickly.

DBMS_CRYPTO is a great way to encrypt sensitive data, and we can also create indexes on the encrypted columns to speed up our queries. Lastly, remember to wrap the CRYPTIT package body so that the key is not exposed.
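
Wrapping is done with the wrap utility from the operating system prompt; the file names here are illustrative:

wrap iname=cryptit_body.sql oname=cryptit_body.plb

Then install cryptit_body.plb instead of the plain-text source, so the key is not readable in DBA_SOURCE.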

DBMS_CRYPTO can also be applied recursively, encrypting data that is already encrypted. Let us try to encrypt the already encrypted SSN from the above example.

SQL> select * from emp;

     EMPID ENAME    SSN
---------- -------- --------------------------------------------------------------------------------
         1 SMITH    5F3168C22E54060DE7D97B31F7E38BB6
         2 ALLEN    5F3168C22E54060D0C2527FBBD3DCD6C
         3 WARD     5F3168C22E54060D8166D3757932A112
         4 MOMEN    5F3168C22E54060D2CFF7E7A35B14187

SQL> update emp set ssn = cryptit.encrypt_data(ssn);

4 rows updated.

SQL> select * from emp;

     EMPID ENAME    SSN
---------- -------- --------------------------------------------------------------------------------
         1 SMITH    455E357D80B1C81F91D52088DA0FE03A0C133D129CFC3879EE7A8DE993FD459DB5C9E1EBEF37062C
         2 ALLEN    455E357D80B1C81F91D52088DA0FE03A98FC410A71CE64D57935691692D433B770C5F8AF07C9113F
         3 WARD     455E357D80B1C81F91D52088DA0FE03A9D32D14FE64F41DD66BE7FEF4C31384F032C16826EAA4830
         4 MOMEN    455E357D80B1C81F91D52088DA0FE03A04D5C0AD79E689EABD46F8EC335308E32D6F7E94167F6EDF

SQL> commit;

Commit complete.

SQL> select empid, ename, cryptit.decrypt_data(cryptit.decrypt_data(ssn)) decrypted_ssn, ssn from emp;

     EMPID ENAME    DECRYPTED_SSN   SSN
---------- -------- --------------- --------------------------------------------------------------------------------
         1 SMITH    123456701       455E357D80B1C81F91D52088DA0FE03A0C133D129CFC3879EE7A8DE993FD459DB5C9E1EBEF37062C
         2 ALLEN    123456702       455E357D80B1C81F91D52088DA0FE03A98FC410A71CE64D57935691692D433B770C5F8AF07C9113F
         3 WARD     123456703       455E357D80B1C81F91D52088DA0FE03A9D32D14FE64F41DD66BE7FEF4C31384F032C16826EAA4830
         4 MOMEN    123456704       455E357D80B1C81F91D52088DA0FE03A04D5C0AD79E689EABD46F8EC335308E32D6F7E94167F6EDF

SQL>
For more information on DBMS_CRYPTO, check the Oracle documentation.

Sunday, February 01, 2009

Performance Impact of NOT NULL Constraint in PL/SQL

A NOT NULL constraint in PL/SQL requires that a variable always contain a value; if no value is assigned, an error occurs. But using this constraint in PL/SQL carries a small performance price.
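
For example, a NOT NULL variable must be initialized in its declaration or the block does not even compile; a minimal sketch:

declare
  x number not null;   -- PLS-00218: a variable declared NOT NULL must
begin                  --            have an initialization assignment
  null;
end;
/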

Let us create two procedures and perform a small test: the first procedure (TEST_NOTNULL) will contain the NOT NULL constraint, while the second procedure (TEST_NULL) will be without it.

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for 32-bit Windows: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL>
SQL> create or replace procedure test_notnull(p_iterations in number) is
  2    i  number;
  3    x  number NOT NULL := 0;
  4    t1 number;
  5  begin
  6    t1 := dbms_utility.get_time;
  7    for i in 1..p_iterations loop
  8      x := x + i;
  9    end loop;
 10    dbms_output.put_line( 'Time taken : ' || to_char(dbms_utility.get_time - t1));
 11  end;
 12  /

Procedure created.

SQL>
SQL>
SQL> create or replace procedure test_null(p_iterations in number) is
  2    i  number;
  3    x  number := 0;
  4    t1 number;
  5  begin
  6    t1 := dbms_utility.get_time;
  7    for i in 1..p_iterations loop
  8      x := x + i;
  9      if x is null then
 10        dbms_output.put_line('i know this will never happen, but still ... ');
 11      end if;
 12    end loop;
 13    dbms_output.put_line( 'Time taken : ' || to_char(dbms_utility.get_time - t1));
 14  end;
 15  /

Procedure created.

SQL>
SQL>
SQL>
SQL> set serveroutput on
SQL> exec test_notnull(1000000);
Time taken : 20

PL/SQL procedure successfully completed.

SQL> exec test_null(1000000);
Time taken : 17

PL/SQL procedure successfully completed.

SQL>

Executing these procedures, we notice that the TEST_NOTNULL procedure consumes a little more time than its counterpart TEST_NULL. Yes, we are looping heavily, but the point is to compare the performance of the NOT NULL constraint in PL/SQL.

The reason the NOT NULL constraint incurs a performance cost is this: because x is constrained to be NOT NULL, the value of the expression "x + i" is first assigned to a temporary variable, and that temporary is then checked for nullity. If it is not null, its value is assigned to "x"; otherwise an exception is raised. This check on every assignment is where the extra time goes when you apply a NOT NULL constraint in PL/SQL.
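
Conceptually (a sketch of the semantics, not actual compiler output), every assignment to x behaves like this:

declare
  x   number not null := 0;
  tmp number;                 -- unconstrained temporary
begin
  tmp := x + 1;               -- evaluate the expression first
  if tmp is null then
    raise value_error;        -- nullity check before the real assignment
  end if;
  x := tmp;
end;
/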

An alternative approach is to test for nullity manually, as we did in the TEST_NULL procedure, and raise an exception (or take other action) only when needed.

Wednesday, January 21, 2009

"ORA-01403: no data found" on Oracle University's Website

Hi,

Today, while looking for information on the Oracle Certified Master (OCM) certification, I stumbled upon an "ORA-01403" error on OU's website. Here is how I ended up with this error.

This page lists the availability of seats in US cities; when I tried to change my country to "Saudi Arabia" (from the drop-down list in the upper-right corner of the page), it gave me the "ORA-01403" error.

It's time now to add the EXCEPTION section.

Regards