Hotsos Symposium 2013 continued

April 19, 2013

During my presentation at Hotsos 2013, an attendee questioned my statement that each filter/selection predicate contributes to the overall cardinality error. She believed that only the predicate with the maximum error mattered.

I believe the following paper supports my position:

http://www.btw-2013.de/proceedings/Taking%20the%20Edge%20off%20Cardinality%20Estimation%20Errors%20using%20Incremental%20Execution.pdf

I got the Hotsos 2013 “Speaker scoring and verbatim comments” information a couple of weeks ago.

I am happy that the presentation was very well received!!!

Thanks for your feedback. I appreciate all your comments. They would not only making  this presentation better, but also improve my skills as a speaker.

Here are some of the comments:

This was an intriguing   session, but I left feeling like there was still much more to be said on the   subject.

There is indeed much more to be said. The problem with digging deeper in the subject is that it gets quite academic and might be less interesting for the audience. The article I mentioned earlier in the post, as well as Ioannidis Y. And S. Christodoulakis (1991) paper cited in the reference can be a good starting point for further research.

is this accurate, did Maria review this info?

Good question :)

I am not sure who reviewed the paper before the presentation, but I believe that Maria Colgan was in audience. I too will be very interested in her feedback…

He spent a lot of time talking about Terradata, The package he showed seemed way to simplistic to be of any real practical use. He did show some creative ideas on solving some issues, however there wasn’t a real example to show how to do what he purposed.

The package is quite limited for most practical uses. It was intended to be for “demonstration purposes only”.

Fair point about examples though. I’ll try to publish some examples in this blog in the coming months. Stay tuned…


Hotsos Symposium 2013

March 14, 2013

It was a great pleasure presenting at Hotsos.

Thank you for attending my presentation – your comments and questions were quite insightful. I owe a bit more explanation to the lady who asked about the mechanism/formula for passing filter errors into the resulting set. I’ll get more information about that soon – stay tuned.

Title:

Working with Confidence: How Sure Is the Oracle CBO about its Cardinality Estimates, and Why Does It Matter?

Abstract:

The Oracle CBO is tasked to provide an execution plan for every syntactically correct statement. To tackle this very broad requirement, the CBO has to come up with selectivity coefficients for every possible type of predicate. In some cases, the selectivity coefficients are deeply rooted in math and common sense — we know for a fact that lookup by primary key would result in one or zero records.

In other cases, however, the CBO has to take a guess, sometimes even a wild guess, on how much a particular predicate would affect the cardinality of the resulting set. A simple LIKE clause with a leading wildcard would force the CBO to default the selectivity coefficient to around 5%, a guess that could be very far from reality.

To measure the level of guesswork employed by the CBO, the notion of “confidence” level of a cardinality estimate is introduced. Estimates of the confidence level the Oracle CBO should be getting for common SQL predicates are presented. The effects of joins on the confidence levels are reviewed as well as methods to reduce the level of guesswork the optimizer has to employ.

The practical implications of considering CBO’s confidence in its cardinality estimates are discussed in detail. Guidelines on how to vet new SQL and PL/SQL features, designs, and coding standards are given to aid the CBO in avoiding unnecessary guesses about predicate selectivity. A technique for analysis and resolution of performance issues based on dealing with constructs that force the CBO to make guesses is presented.

Files:

The presentation is here.

The white paper is here. It is quite comprehensive, with lots of details and references.

Use XPLAN_CONFIDENCE package entirely at your own risk:

The deployment instructions are here. The spec of the package is here, and the body is here.


Dependent/correlated sets in Oracle – definition, problems and solutions.

February 27, 2013

To better understand dependent/correlated sets, let’s take a brief look at dependent/correlated columns.

Oracle works under the assumption that the data in each column is independent. If an equality predicate on column COL1 leaves 10% of the records, and an equality predicate on column COL2 leaves 20% of the records, then by default, the Oracle CBO would assume that predicates on both COL1 and COL2 would leave 2 % (10%x20%) of the records. If the data is COL1 is correlated with the data in COL2 then equality predicates on both COL1 and COL2 would leave significantly more than 2% of the records. That difference in estimated cardinality could cause huge troubles when estimating consequent execution steps.

Oracle recognized the problem and came with a comprehensive solution – extended statistics in Oracle 11g.

Now about dependent/correlated sets – two sets are dependent/correlated if they contain the same or similar records. Oracle assumes that sets are independent. If a predicate like this “COL1 IN (select.. from .. SET1 ) “  leaves 10% of the records, and a similar predicate “COL1 IN (select.. from .. SET2 ) “  leaves 20% of the records, then Oracle assumed that both predicates “COL1 IN (select.. from .. SET1 ) “ AND “COL1 IN (select.. from .. SET2 ) “   would leave 2%(10%x20%) of the records. If SET1 is identical or very similar to SET2, then the “COL1 IN (select.. from .. SET1 ) “ AND “COL1 IN (select.. from .. SET2 ) “ would leave significantly more than 2% of the records. Needless to say, the difference in estimated cardinality could cause huge troubles when estimating consequent execution steps.

The big challenge is that there is nothing out of the box that can help us in the above scenario.

Tables TAB1 and TAB2 would provide us with the correlated sets (SET1,SET2)

create table tab1 as
with generator as (
select
  rownum id
 from dual
 connect by
  rownum <= 4000
)
select
  id col1, 
  mod(id,1024) col2 , 
  mod(id,2) col3
from (
   select
     /*+ no_merge */
     rownum id
   from
     generator,
     generator
   where
     rownum <= 1000000
)
;

create table tab2 as
with generator as (
select
  rownum id
from dual
connect by
  rownum <= 4000
)
select
  id col1, 
  mod(id,1024) + 134 col2 , 
  mod(id,2) col3
from (
   select
     /*+ no_merge */
     rownum id
   from
     generator,
     generator
   where
rownum <= 1000000
)
;

TAB3 would be the table we are going to apply the filters with those correlated sets

create table tab3 as
with generator as (
select
  rownum id
from dual
connect by
  rownum <= 4000
)
select
  id col1
from (
   select
     /*+ no_merge */
     rownum id
   from
     generator,
     generator
   where
rownum <= 1000000000
)

Gather stats:

exec dbms_stats.gather_table_stats(NULL,'TAB1');

exec dbms_stats.gather_table_stats(NULL,'TAB2');

exec dbms_stats.gather_table_stats(NULL,'TAB3');

The query we are interested in is

select
    t3.col1
from
    tab3 t3 ,
    tab1 t1 ,
    tab2 t2
where
    t3.col1 = t1.col1
and t1.col2 in (66,166,316,416,516,616)
and t3.col1 = t2.col1
and t2.col2 in (200,300,450,550,650,750)
and t1.col3 = t2.col3

Please note that the sets that come from TAB1 and TAB2 are identical, so this clause

t3.col1 = t1.col1 and t3.col1 = t2.col1

would eliminate significantly fewer records that Oracle CBO’s estimate.

Due to the described dependent/correlated sets behaviors, the CBO (DBMS_XPLAN.DISPLAY_CURSOR) estimates that only 17 records will be returned by the query, even though the query returns 5861 records.

---------------------------------------------------------
| Id | Operation         |Name |Rows |Bytes |Cost (%CPU)|
---------------------------------------------------------
| 0  | SELECT STATEMENT  |     |     |      |8182 (100) |
|* 1 | HASH JOIN         |     |17   |510   |8182 (4)   |
|* 2 | TABLE ACCESS FULL |TAB2 |5859 |70308 |657 (4)    |
|* 3 | HASH JOIN         |     |5872 |103K  |7524 (4)   |
|* 4 | TABLE ACCESS FULL |TAB1 |5859 |70308 |653 (4)    |
| 5  | TABLE ACCESS FULL |TAB3 |16M  |91M   |6786 (2)   |
---------------------------------------------------------

Predicate Information (identified by operation id):
—————————————————

1 - access("T3"."COL1"="T2"."COL1" AND "T1"."COL3"="T2"."COL3")
2 - filter(("T2"."COL2"=200 OR "T2"."COL2"=300 OR "T2"."COL2"=450 OR
"T2"."COL2"=550 OR "T2"."COL2"=650 OR "T2"."COL2"=750))
3 - access("T3"."COL1"="T1"."COL1")
4 - filter(("T1"."COL2"=66 OR "T1"."COL2"=166 OR "T1"."COL2"=316 OR
"T1"."COL2"=416 OR "T1"."COL2"=516 OR "T1"."COL2"=616))

Dynamic sampling, even at the max level, actually makes the estimate worse

exec dbms_stats.delete_table_stats(NULL,'TAB1');

exec dbms_stats.delete_table_stats(NULL,'TAB2');

exec dbms_stats.delete_table_stats(NULL,'TAB3');

alter session set optimizer_dynamic_sampling=10

---------------------------------------------------------
| Id | Operation         |Name |Rows |Bytes |Cost (%CPU)|
---------------------------------------------------------
| 0  | SELECT STATEMENT  |     |     |      |8156 (100) |
|* 1 | HASH JOIN         |     |1    |91    |8156 (4)   |
|* 2 | TABLE ACCESS FULL |TAB2 |5861 |223K  |644 (3)    |
|* 3 | HASH JOIN         |     |5861 |297K  |7511 (4)   |
|* 4 | TABLE ACCESS FULL |TAB1 |5861 |223K  |640 (3)    |
| 5  | TABLE ACCESS FULL |TAB3 |16M  |198M  |6786 (2)   |
---------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("T3"."COL1"="T2"."COL1" AND "T1"."COL3"="T2"."COL3")
2 - filter(("T2"."COL2"=200 OR "T2"."COL2"=300 OR "T2"."COL2"=450 OR
"T2"."COL2"=550 OR "T2"."COL2"=650 OR "T2"."COL2"=750))
3 - access("T3"."COL1"="T1"."COL1")
4 - filter(("T1"."COL2"=66 OR "T1"."COL2"=166 OR "T1"."COL2"=316 OR
"T1"."COL2"=416 OR "T1"."COL2"=516 OR "T1"."COL2"=616))

Note
-----
- dynamic sampling used for this statement (level=10)

One way to resolve this is to force the execution order, so TAB3 is visited before TAB2 or TAB1.

Another solution is to create a virtual column identical to COL1

alter table tab3 add col1_corr generated always as (col1*1 ) not null ;

,gather stats

exec dbms_stats.gather_table_stats(NULL,'TAB1');

exec dbms_stats.gather_table_stats(NULL,'TAB2');

exec dbms_stats.gather_table_stats(NULL,'TAB3');

Then set the number of distinct records for that new virtual column to 1

exec dbms_stats.set_column_stats(NULL,'TAB3','COL1_CORR',distcnt=>1);

and use the new virtual column (COL1_CORR), instead of COL1,  for one of the predicates

select
t3.col1
from 
    tab3 t3 ,
    tab1 t1 ,
    tab2 t2
where
    t3.col1_corr = t1.col1
and t1.col2 in (66,166,316,416,516,616)
and t3.col1 = t2.col1
and t2.col2 in (200,300,450,550,650,750)
and t1.col3 = t2.col3

Now, the CBO expects 2936 records – much better than the original 17 records.

---------------------------------------------------------
| Id | Operation         |Name |Rows |Bytes |Cost (%CPU)|
---------------------------------------------------------
| 0  | SELECT STATEMENT  |     |     |      |8182 (100) |
|* 1 | HASH JOIN         |     |2936 |103K  |8182 (4)   |
|* 2 | TABLE ACCESS FULL |TAB1 |5859 |70308 |653 (4)    |
|* 3 | HASH JOIN         |     |5872 |137K  |7528 (4)   |
|* 4 | TABLE ACCESS FULL |TAB2 |5859 |70308 |657 (4)    |
| 5  | TABLE ACCESS FULL |TAB3 |16M  |183M  |6786 (2)   |
---------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("COL1"*1="T1"."COL1" AND "T1"."COL3"="T2"."COL3")
2 - filter(("T1"."COL2"=66 OR "T1"."COL2"=166 OR "T1"."COL2"=316 OR
"T1"."COL2"=416 OR "T1"."COL2"=516 OR "T1"."COL2"=616))
3 - access("T3"."COL1"="T2"."COL1")
4 - filter(("T2"."COL2"=200 OR "T2"."COL2"=300 OR "T2"."COL2"=450 OR
"T2"."COL2"=550 OR "T2"."COL2"=650 OR "T2"."COL2"=750))

Setting the number of distinct values for COL1_CORR to 1 forced the CBO to believe that the clause on COL1_CORR would not eliminate any records. That’s the only way (for me anyway) to tell the optimizer that the COL1_CORR clause would not further reduce the number of records.

Interestingly, I get the same result for any value of distcnt …


“ORA-20008 Number of extensions …” – what to do about this unnecessary error

October 28, 2012

There are times when Oracle puts restrictions to save us from our own bad practices. For instance, committing in triggers is forbidden so we do not make a mess, among other things.

And then there is “ORA-20008: Number of extensions in table ….. already reaches the upper limit (20)”. We can create thousands of tables, each of them with thousands of columns, but when we create column groups/extended stats, we face severe limitations – 20 or 10% of the number of columns. Why?!? It takes little disk space to accommodate those statistics. The incremental impact on the time to gather stats would be quite small as well.

In short, this restriction has no technologies or business sense, it is probably trivial to correct, but it is still there (as of 11.2.0.3.0).

Some would think that this restriction would almost never affect a “real” system. I disagree! A fact table in a data-warehouse environment could be queried in many different ways – and that’s the beauty of the Kimball approach. I have personally seen this situation in my systems a couple of times so far.

Then what can we do about it?

If the extended statics were generated automatically using the methods here

http://iiotzov.wordpress.com/2011/11/01/get-the-max-of-oracle-11gr2-right-from-the-start-create-relevant-extended-statistics-as-a-part-of-the-upgrade

http://blogs.oracle.com/optimizer/entry/how_do_i_know_what_extended_statistics_are_needed_for_a_given_workload

, then we can weed out the extended statistics that do not help the CBO a lot, and put in place only those extended stats that make a significant difference.

Since CBO can use column groups with more columns than are present in predicates (http://www.hotsos.com/sym12/sym_speakers_matuszyk.html) ,we can eliminate extended stats groups that are covered by another extended stats group.

Another way is to measure the dependent columns error (err)/correlation strength (http://iiotzov.wordpress.com/2011/07/05/getting-the-best-of-oracle-extended-statistics) and eliminate those column groups with smallest dependent columns error.

Here is an example:

create table test_tab as
with v1 as (select rownum n from dual connect by level <= 10000)
select
rownum  id , mod (rownum , 100) corr_gr1_col1, mod (rownum , 25) corr_gr1_col2 , mod (rownum ,
50) corr_gr1_col3 ,
200 +  mod (rownum , 11)  corr_gr2_col4 , 200 +  mod (rownum , 22)  corr_gr2_col5 , 200 + mod (
rownum , 33) corr_gr2_col6
from
v1, v1
where
rownum <= 1000000
;

Table created.

select dbms_stats.create_extended_stats (NULL,'TEST_TAB','(corr_gr1_col1,corr_gr1_col2)') from dual

--------------------------------------------------------------------------------

SYS_STU_4QATWX9PNVNKREP$YZIDOX

… 19 more CREATE_EXTENDED_STATS statements

select dbms_stats.create_extended_stats (NULL,'TEST_TAB','(corr_gr1_col1,corr_gr1_col2,corr_gr2_col4,corr_gr2_col5,corr_gr2_col6)') from dual ;
select dbms_stats.create_extended_stats (NULL,'TEST_TAB','(corr_gr1_col1,corr_gr1_col2,corr_gr2_col
*
ERROR at line 1:
ORA-20008: Number of extensions in table SCOTT.TEST_TAB already reaches the
upper limit (20
ORA-06512: at "SYS.DBMS_STATS", line 8415
ORA-06512: at "SYS.DBMS_STATS", line 32587

We can now score each column group and decide whnich ones will remain and which ones will be dropped.

select (count(distinct(corr_gr1_col1) ) * count(distinct(corr_gr1_col3)))/count(distinct(sys_op
_combined_hash(corr_gr1_col1,corr_gr1_col3))) from TEST_TAB ;

(COUNT(DISTINCT(CORR_GR1_COL1))*COUNT(DISTINCT(CORR_GR1_COL3)))/COUNT(DISTINCT(S

--------------------------------------------------------------------------------

50

… a few column group queries

select (count(distinct(corr_gr1_col1) ) * count(distinct(corr_gr2_col4)))/count(distinct(sys_op
_combined_hash(corr_gr1_col1,corr_gr2_col4))) from TEST_TAB ;
(COUNT(DISTINCT(CORR_GR1_COL1))*COUNT(DISTINCT(CORR_GR2_COL4)))/COUNT(DISTINCT(S
--------------------------------------------------------------------------------
1

The column group (corr_gr1_col1,corr_gr2_col4) does not add any value to the CBO, so we can drop it and create  (corr_gr1_col1,corr_gr1_col2,corr_gr2_col4,corr_gr2_col5,corr_gr2_col6) instead.


Oracle Enterprise Manager Oracle Enterprise Manager Cloud Control 12c is here. Let’s look under the hood!

May 18, 2012

       There are many great features in the new OEMCC 12c, and probably as many great presentations or blog posts about them.

       This post is not about a particular new feature, it is about the internal OEMCC 12c repository structures and how to make use of them. Most, if not all, of OEM 10g/11g repository views are still present and used by OEMCC12c . Also, all the ideas and techniques I presented earlier (presentation, white paper) are still conceptually valid.

        The way the structures are populated, however, can be slightly different. User Defined Metrics (UDMs) functionality is totally overhauled in OEM 12c. The new Metrics Extensions provide a flexible, structured and efficient mechanism for creating custom metrics. The new functionality now populates Metric_Name column in mgmt$metric_current with the Metric Extension name; prior versions simply inserted the hardcoded “SQLUDM” value.  Metric_Column column in mgmt$metric_current exhibits the same pattern – in OEM 12c one can populate it with a custom value, while in prior versions it always contained a pre-defined strings ( ‘StrValue’).

        The good news is that all reports and queries discussed in my presentation and white paper are still valid, except for the previously mentioned UDM/Metrics Extension change.  It took my just a couple of hours to move all custom OEM repository structures, including the advanced tablespace forecasting, to the new 12c OEM.

        This is what needs to be changed in the ForcedLoggingAlert UDM/ Metrics Extension to work with Metrics Extensions:

and   not exists (

      select 
               * 
         from
               mgmt$metric_current i
         where
               i.target_guid           = o.member_target_guid
         and   metric_name             =  ‘ME$ForcedLogging’
         and   column_label            = 'ForcedLogging'
         and   Metric_Column           = ‘ForcedLogging’
         and   collection_timestamp    > sysdate - 20/1440
      and   value                   = 'YES‘
             )

Where ‘ME$ForcedLogging’  is the name of the Metrics Extension and ‘ForcedLogging’ is the specified metric column.


Cardinality feedback for queries that use volatile tables with locked statistics.

April 27, 2012

As if dealing with volatile tables alone is not complicated enough, in Oracle 11gR2 we also have to consider how the new cardinality feedback feature (on by default) would affect us.

While there is a lot to be said about volatile tables (presentation, white paper)  , there are two mainstream approaches to dealing with them – dynamic sampling and locking table statistics.
In 11gR2, dynamic sampling turns off cardinality feedback, so there is no change in behavior from previous versions.
Locking table statistics, however, does not turn cardinality feedback off and that is a big concern. Oracle’s official blog entry acknowledges that carnality feedback is not useful for managing volatile tables: “Cardinality feedback is useful for queries where the data volume being processed is stable over time. For a query on volatile tables, the first execution statistics are not necessarily reliable. This feature is not meant to evolve plans over time as the data in the table changes”. The statement is a little vague though. The precise statement should be “Cardinality feedback should not be used for volatile tables with locked statistics”.

Here is why:
According to MOS Best Practices for Automatic Statistics Collection [ID 377152.1], we lock the table statistics when the table is at its maximum size. The idea is to use those stats regardless of what is actually in the table. Cardinality feedback essentially overwrites the estimates derived from the data dictionary with real-time statistics derived from the table at that time, which is exactly what we do not want to do when managing volatile tables.

This example will illustrate this point.

Let’s create volatile_tab and static_tab tables, with 50K each and an index on static_tab

create table volatile_tab
as
with generator as (
       select --+ materialize
                rownum id
       from dual
       connect by
                level < 10000
)
select
       mod(rownum,100) col1,
       rownum          col2
from
       generator v1,
       generator v2
where
       rownum < 50000;

Table created.

create table static_tab
as
with generator as (
        select --+ materialize
             rownum id
        from dual
        connect by
              level < 10000)
select
        mod(rownum,100) col1,
        rownum          col2
from
        generator v1,
        generator v2
where
        rownum < 50000 ;

Table created.

create index static_tab_i on static_tab (col2);

Index created.

Now, let’s gather tables statistics and lock the statistics of volatile_tab, the volatile table.

SQL>exec dbms_stats.gather_table_stats('TEST','static_tab');

PL/SQL procedure successfully completed.

SQL>exec dbms_stats.gather_table_stats('TEST','volatile_tab');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.lock_table_stats('TEST','volatile_tab');

PL/SQL procedure successfully completed.

Since volatile_tab is a volatile table with a max size of 50K, it could contain significantly fewer records at some point in time. That is a typical scenario for a volatile table.

SQL> truncate table volatile_tab ;

Table truncated.

SQL> insert into volatile_tab
with generator as (
        select --+ materialize
               rownum id
        from dual
        connect by
               level  < 10000 )
select
        mod(rownum,100) col1,
        rownum col2
from
        generator v1,
        generator v2
where
        rownum < 5 ;

commit;

Commit complete.

Now, let run this simple query:

SQL>
select
           /*+ gather_plan_statistics */ count(*)
from
        volatile_tab  v,
        static_tab    s
where
        v.col2 = s.col2
and     mod(v.col1,130) = 3 ;


COUNT(*)
----------
1

SQL> select * from table(dbms_xplan.display_cursor('','','ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 6dx6gbc8kc030, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from volatile_tab v,
static_tab s where v.col2 = s.col2 and mod(v.col1,130) = 3


Plan hash value: 2579530871


-------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 98 | | | |

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 98 | | | |
|* 2 | HASH JOIN | | 1 | 500 | 1 |00:00:00.01 | 98 | 1517K| 1517K| 371K (0)|
|* 3 | TABLE ACCESS FULL| VOLATILE_TAB | 1 | 500 | 1 |00:00:00.01 | 6 | | | |
| 4 | TABLE ACCESS FULL| STATIC_TAB | 1 | 50000 | 50000 |00:00:00.03 | 92 | | | |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------


2 - access("V"."COL2"="S"."COL2")
3 - filter(MOD("V"."COL1",130)=3)

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


23 rows selected.

SQL>

We got exactly what we want when dealing with volatile table – the robust HASH join. The plan would perform OK when volatile_tab is small and would work great when volatile_tab is large.

Let run this statement one more time:

SQL> select
        /*+ gather_plan_statistics */ count(*)
from
        volatile_tab   v,
        static_tab     s
where
        v.col2 = s.col2
and     mod(v.col1,130) = 3 ;


COUNT(*)
----------
1

SQL> select * from table(dbms_xplan.display_cursor('','','ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 6dx6gbc8kc030, child number 1
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from volatile_tab v,
static_tab s where v.col2 = s.col2 and mod(v.col1,130) = 3

Plan hash value: 3273547765

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 8 |


PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 8 |
| 2 | NESTED LOOPS | | 1 | 1 | 1 |00:00:00.01 | 8 |
|* 3 | TABLE ACCESS FULL| VOLATILE_TAB | 1 | 1 | 1 |00:00:00.01 | 6 |
|* 4 | INDEX RANGE SCAN | STATIC_TAB_I | 1 | 1 | 1 |00:00:00.01 | 2 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter(MOD("V"."COL1",130)=3)
4 - access("V"."COL2"="S"."COL2")

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Note
-----
- cardinality feedback used for this statement

27 rows selected.

Now, the cardinality feedback option kicked in, and the plan uses nested loops (NL) – the less robust join method. Even though this plan is OK for the current volume of volatile_tab, it would be bad when volatile_tab grows.

Let’s see:

SQL> insert into volatile_tab
     with generator as (
select --+ materialize
     rownum id
from dual
connect by
     level < 10000 )
select
     mod(rownum,100) col1,
     rownum          col2
from
     generator v1,
     generator v2
where
     rownum < 50000 ;


commit ;

Commit complete.

SQL>
select
      /*+ gather_plan_statistics */ count(*)
from
       volatile_tab v,
       static_tab s
where
       v.col2 = s.col2
and    mod(v.col1,130) = 3 ;


COUNT(*)
----------
501


SQL> select * from table(dbms_xplan.display_cursor('','','ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 6dx6gbc8kc030, child number 2
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from volatile_tab v,
static_tab s where v.col2 = s.col2 and mod(v.col1,130) = 3


Plan hash value: 3273547765

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.02 | 643 |

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.02 | 643 |
| 2 | NESTED LOOPS | | 1 | 1 | 501 |00:00:00.02 | 643 |
|* 3 | TABLE ACCESS FULL| VOLATILE_TAB | 1 | 1 | 501 |00:00:00.02 | 98 |
|* 4 | INDEX RANGE SCAN | STATIC_TAB_I | 501 | 1 | 501 |00:00:00.01 | 545 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------


3 - filter(MOD("V"."COL1",130)=3)
4 - access("V"."COL2"="S"."COL2")

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Note
-----
- cardinality feedback used for this statement

27 rows selected.

The nested loops (NL) plans, a clearly bad choice, is used even when volatile_tab has plenty of records.

Cardinality feedback for volatile tables with locked statistics is can be one, although not the only,  reason for plan instability. Oracle should modify  MOS Best Practices for Automatic Statistics Collection [ID 377152.1] to specifically request cardinality feedback to be disabled for volatile tables.


Hotsos 2012 – the benefits of a discussion with a really knowledgeable audience

March 12, 2012

               After a short conversation with Alex Gorbachev, I realized that maybe I should have added some implementation details about the “two-phase removal of data” approach for reducing table volatility.

               First, the approach does not reduce the volatility of the content of the tab view, it reduces the volatility of what the CBO sees or assumes about the tab view. But, again, that is all that matters…

               Second, please do not create a histogram on the deleted column of tab_internal table, even though the column would likely be heavily skewed. Doing so would introduce volatility because the histogram would represent the deleted/active distribution as of when the stats are gathered. What we need is the “standard” 50/50 split we get when the column does not have a histogram. Make sure that there are exactly two distinct values for the deleted column in tab_internal table.

               Third, an index (regular of function-based /covering only the “N” value/ ) on the deleted column in tab_internal would be quite useful.

               On a different note, someone from the back of the hall, I think that was Kerry Osborne, but I am not quite sure, mentioned that the JUST_STATS package should explicitly invalidate all cursors that use the table we are gathering stats on. I answered that I thought the package did invalidate cursors, but I was incorrect. Anyway, the current version does invalidate cursors (NO_INVALIDATE=> FALSE).  Thanks a lot for this observation.


Advanced Methods for Managing Statistics of Volatile Tables in Oracle (Hotsos 2012)

March 7, 2012

Thanks to all who attended my presentation at Hotsos 2012. You were a great audience with insightful questions and relevant comments.

Here is my presentation  along with the white paper.

The code for JUST_STATS package (spec and body) is available for free. Note that I cannot support the package and I do not accept any legal liability or responsibility for the content or the accuracy of the information so provided, or, for any loss or damage caused arising directly or indirectly in connection with reliance on the use of such information.

The readme and the file with examples could quite useful as well.

Hotsos is such a great conference… I had to fly back home on Tuesday morning, but I wish I could stay longer.


Presenting at Hotsos 2012

February 17, 2012

I am very excited to present at this great symposium. My presentation will start at 11 am on Monday (March 5, 2012) at Salon B.

This is my abstract:

               Adequate, up-to-date table and index statistics are of utmost importance for achieving optimal database performance. Unlike profiles, hints, and outlines, which can only help tune a narrow set of queries, database statistics assist the optimizer, a very sophisticated program, to deliver excellent results for every query every time.  

               Oracle’s default statistics gathering process collects statistics every night and over the weekend. The process works well for tables that follow a traditional growth pattern or have low volatility; however, it is inadequate for tables which experience fluctuations in size or data distribution on a frequent basis. The challenge is even greater when we are not allowed to change the application design and queries, an increasingly common situation thanks to the growing use of off-the-shelf solutions.

               After a definition of volume and distribution volatility, methods and consequences of reducing it are explored. Tradeoffs between statistics management simplicity and resource utilization are discussed.

               Following a review of the benefits and the pitfalls of using dynamic sampling and locking statistics to manage the statistics of volatile tables, a robust algorithm that delivers both plan stability and system flexibility is proposed. The algorithm works by allowing the statistics to change only under specific circumstances.

               Since handing volatile table statistics often involves directly invoking DBMS_STATS procedures, the missed opportunities due to the fact that Oracle’s DBMS_STATS package issues an implicit commit are discussed. Transaction consistency and easiness to recover after a failure, need for functional testing, and inability to gather statistics in triggers are covered. JUST_STATS, a novel custom PL/SQL package for collecting table and index statistics, is proposed and explained. The JUST_STATS package is functionally equivalent to a subset of DBMS_STATS package, except that it does not issue a commit. Examples illustrating the use of JUST_STATS in batch processing and off-the-shelf applications, including statistics gathering in table triggers, are shown.

Hope to see you there!


Get the max of Oracle 11gR2 right from the start – create relevant extended statistics as a part of the upgrade

November 1, 2011

Introduction

As I  posted earlier, dbms_stats.seed_col_usage is a very powerful new function (available in 11gR2) that enables us to get the column groups that are used in filter combinations, join conditions and “group by”s.

There are number of examples out there of how to use this new functionality to generate extended stats based on a running  representative load.

This entry is about how to get a set of column groups from the AWR of production 10g or 11gR1 databases, convert it into a SQL tuning tet,  and use it to generate extended stats for the upgraded 11gR2 database. By following this method, the DBA can get advantage of the power of extended statistics right from the the start, instead of reactively adding them when issues occur.

The more AWR data we have, the smaller chance of missing an important SQL, the better. Thus, the DBA should try to increase the AWR retention as much as possible before the upgrade.

Get a SQL tuning set from a representative workload

First, we have to create a SQL tuning set in the 10g database.

then

then

then specify that the SQL tuning set will contain AWR snapshots.

then specify SQL statements that are to be excluded (none in this case)

then

then

then let the DB job run for a couple of minutes and verify that completed successfully

After that, verify that the SQL tuning set was indeed created and start exporting it to a file

and then specify an Oracle Directory to hold the export (EXPDAT_TS_TEST.DMP)

then verify that the export is done

The next step is to import the SQL tuning set into 11gR2

The export file must be available on the 11gR2 server. Oracle Directory mapping to the directory that holds the file must be created.

then

and then verify that the import was OK

and finally check in the SQL tuning set tab

Extract column groups from the SQL tuning set

Now, we’ll start extracting the column group usage from this SQL tuning set.

First, we need to make sure that no previous work would interfere with this exercise:

select
     *
from
     sys.col_group_usage$

If the query returns nothing, then we are good to go.

If there are records, we should clear them before proceeding. dbms_stats.reset_col_usage function, introduced in  11.2.0.2, can do that job.

Now, we can start extracting column usage with the following statement

exec dbms_stats.seed_col_usage('TS_TEST','SYSTEM',3600);

where “TS_TEST” is the SQL tuning set name ,”SYSTEM” is DB user that owns “TS_TEST” and 3600 is the time limit (in seconds).

This operation could take a while – allocate at least couple of hours for it.

Review column groups 

 dbms_stats.seed_col_usage procedure will populate sys.col_group_usage$. We can also view the results in a user friendly way with dbms_stats.report_col_usage. The function works in SQL*Plus and requires some settings

set long 100000
set lines 120
set pages 0 

select
       dbms_stats.report_col_usage('SCOTT','EMP')
from
       dual ;

where “SCOTT” is the DB user and “EMP” is the table.

Create extended stats in the 11gR2 DB 

To create the extended stats we can use dbms_stats.create_extended_stats function. This is how to create extended stats on an individual table:

set long 100000
set lines 120
set pages 0 

select
     dbms_stats.create_extended_stats('SCOTT','EMP')
from
     dual ;

To create all extended stats for a DB user, we can utilize the following syntax

set long 100000
set lines 120
set pages 0

select
      dbms_stats.create_extended_stats('SCOTT',NULL)
from
       dual ;

Create a SQL script with the extended stats  

Since we do not want to repeat this procedure for every upgraded database, we have to extract a simple SQL script with the create statements for all extended stats, and execute it immediately after upgrade (or more specifically, immediately after the compatible init.ora parameter is updated). The script below can be used to extract the SQL statements

select
     'select dbms_stats.create_extended_stats(ownname => '''||owner||''',tabname => '''||table_name||''', extension => '''||extension||''') from dual ;'
from
      dba_stat_extensions
where
      owner not in ('SYS','SYSTEM');

Please note that just declaring extended stats is not enough. We need to actually  gather statistics for the tables/DB owners that have newly created extended stats to get benefit.

Conclusion

 Best practices are techniques that present little or no risks and consistently deliver in superior results.  The process described here allows the DBA to reap the benefits of a great new feature with very little risk, and should be utilized in every upgrade to Oracle 11gR2.


Follow

Get every new post delivered to your Inbox.