Oracle 12c Adaptive Execution Plans – Do Collected Stats Still Matter?

July 31, 2013

Adaptive execution plans are one of the most exciting new features in Oracle 12c. The optimizer can now pick the join method based on the actual amount of data processed, rather than on the estimates it had at compile time. Does that mean that the estimated cardinalities/set sizes, and the table/index statistics they are based upon, no longer matter?

Not so! Here is why:

Even though Oracle uses actual information from the STATISTICS COLLECTOR step, it also uses plenty of estimated information to decide the join strategy. I am not aware of the specific formula used to select the join type, but I think it is reasonable to assume that Oracle compares the estimated cost of a Nested Loop against the estimated cost of a Hash Join. Those estimated costs are based on the collected data dictionary statistics (DBA_TABLES, DBA_INDEXES, etc.).
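For reference, these queries show some of the collected statistics that feed those cost estimates (the column lists are illustrative, not exhaustive; '&tab' is just a SQL*Plus substitution placeholder):

-- Table-level statistics feeding the cost estimates
SELECT owner, table_name, num_rows, blocks, avg_row_len, last_analyzed
FROM   dba_tables
WHERE  table_name = '&tab';

-- Index-level statistics (relevant mostly to the Nested Loop costing)
SELECT index_name, blevel, leaf_blocks, clustering_factor
FROM   dba_indexes
WHERE  table_name = '&tab';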

Here is an example:

Table TEST1 has 20 million records. The data dictionary statistics (DBA_TABLES, etc.) for TEST1 accurately represent the data in the table.

Table TEST2 has 2 million records. The data dictionary statistics (DBA_TABLES, etc.) for TEST2 accurately represent the data in the table.
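A sketch of a comparable setup is shown below. My exact DDL may differ; in particular, an index on TEST1.ID is assumed here so that Nested Loops is a realistic alternative to the Hash Join.

-- Build the two test tables (the cartesian joins are only a row generator)
CREATE TABLE test1 AS
SELECT rownum AS id, rpad('x', 30, 'x') AS str
FROM   (SELECT level n FROM dual CONNECT BY level <= 5000),
       (SELECT level n FROM dual CONNECT BY level <= 4000);   -- 20,000,000 rows

CREATE TABLE test2 AS
SELECT rownum AS id, rpad('x', 30, 'x') AS str
FROM   (SELECT level n FROM dual CONNECT BY level <= 2000),
       (SELECT level n FROM dual CONNECT BY level <= 1000);   -- 2,000,000 rows

CREATE INDEX test1_id_ix ON test1 (id);

-- Gather accurate statistics so the dictionary matches the data
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'TEST1', cascade => TRUE);
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'TEST2', cascade => TRUE);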

This sample query

SELECT sum(length(TEST1.STR))
FROM   TEST1,
       TEST2
WHERE  TEST1.ID = TEST2.ID

generates an adaptive plan, which picks a Hash Join, as expected.
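One way to confirm that the plan is adaptive is to run the query and then display the cursor with the ADAPTIVE format option (this assumes the test query is the last statement executed in the session):

-- The Note section should report something like "this is an adaptive plan"
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => '+ADAPTIVE'));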

Let’s see how much data we need in TEST2 for the plan to switch from Nested Loops to Hash Join.

To do that, we truncate TEST2 without re-gathering stats, then run a loop that inserts data into TEST2 in small increments (again without gathering stats) and saves the execution plan after each step.
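My actual driver differs, but a minimal sketch of that kind of loop might look like this (bash plus sqlplus is assumed; the connection string, increment size, and iteration count are placeholders):

#!/bin/bash
# Grow TEST2 in small increments (no stats gathering), run the test query
# after each increment, and append the resulting plan to a log file.
CONN="scott/tiger"                     # placeholder connection string

sqlplus -s "$CONN" <<EOF
TRUNCATE TABLE test2;
EOF

for i in $(seq 1 50)
do
  sqlplus -s "$CONN" <<EOF >> plans.log
  SET SERVEROUTPUT OFF
  INSERT INTO test2
  SELECT rownum + (SELECT NVL(MAX(id), 0) FROM test2), rpad('x', 30, 'x')
  FROM   dual CONNECT BY level <= 500;
  COMMIT;

  -- the changing /* pass $i */ comment forces a fresh cursor each time,
  -- so the adaptive plan is resolved again for the new TEST2 row count
  SELECT /* pass $i */ sum(length(TEST1.STR))
  FROM   TEST1, TEST2
  WHERE  TEST1.ID = TEST2.ID;

  SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => '+ADAPTIVE'));
EOF
done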

In my case, with this setup, the switch from NL to HJ happens when TEST2 has approximately 11,500 records.

The switch point depends on the TEST1 stats, not the actual data in TEST1.

We can double the data in TEST1 and we’ll get the same result as long as the stats are not re-gathered.

Conversely, we can delete 70% of the data in TEST1 and we’ll get the same result, as long as the stats are not re-gathered.
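For completeness, the two variations look roughly like this (the exact statements and predicates are illustrative, not my original test script); the key point is that no DBMS_STATS call follows either one:

-- Double the data in TEST1 ...
INSERT INTO test1 SELECT id, str FROM test1;
COMMIT;

-- ... or, alternatively, remove roughly 70% of it
DELETE FROM test1 WHERE MOD(id, 10) < 7;
COMMIT;

-- Either way, DBA_TABLES.NUM_ROWS for TEST1 still says 20,000,000,
-- and the NL-to-HJ switch point observed above does not move.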

In short, the cardinality estimates and the stats they are based upon are still very important.

If the actual cardinality of the second set in a join is higher than the estimated cardinality, then for a range of driving-set sizes Oracle will use a Hash Join, even though Nested Loops would produce better results.

Conversely, if the actual cardinality of the second set in a join is lower than the estimated cardinality, then for a range of driving-set sizes Oracle will use Nested Loops, even though a Hash Join would produce better results.

P.S.

I was not able to get the Adaptive Execution Plan to work properly from a PL/SQL procedure. I had to write a shell script…
