In 11gR2 Oracle introduced DBMS_STATS.SEED_COL_USAGE (http://download.oracle.com/docs/cd/E18283_01/appdev.112/e16760/d_stats.htm#i997460) procedure to show us which column groups are used by a workload, so we can add extended stats on those column groups.
Oracle added in 22.214.171.124.0 another two useful DBMS_STATS subroutines ( http://www.morganslibrary.com/reference/pkgs/dbms_stats.html ):
REPORT_COL_USAGE function – reports column usage in friendly format, need to set serverout (SQLPlus)
RESET_COL_USAGE procedure – resets column usage statistics to unused
The whole process is nicely explained in
In short, the above techniques and procedures answer the question which column groups are used by a workload.
On the other hand, the procedure I developed tries to find out which column groups would benefit from creating extended statistics. It is an effort to separate the good from the useless extended stats.
Why would we care which column groups will help the optimizer when we know all columns groups that are used?
We should create extended stats on all column groups that are used by the workload?!?
That would be a good start, but there are some things to consider.
First, maintaining extended stats consumes resources and we should not expend resources for things or processes that bring no benefit. In short, we should get rid of the “useless” extended stats.
Second, there is limit of 20 extended stats for a table. Functional indexes and virtual column count towards that limit, so it is possible, particularly for fact tables in DW system, to reach the limit. If that is to happen, we would have to pick “the best” extended stats.
Knowing which extended stats would benefit the optimizer the most can help up to be truly proactive. We can create extended stats not only for column groups that are used by an existing workload, but also for column groups that are not used by a query yet.
I will be enhancing my procedure to take advantage of this great new Oracle functionality…