Importance of Data Clustering when Deleting in Batches

July 27, 2015

The inspiration for this post comes from the OTN discussion “Bulk Deletes from a Large Table”, where I volunteered the idea that we can improve performance by taking into account the clustering of the data to be deleted. Since my statement was rather general, I decided to create this post to fill in the details.

First, we start with the table definition:

CREATE TABLE test
(
  dt        DATE,
  st        NUMBER,
  other_num NUMBER,
  other_str VARCHAR2(100)
);

The first column of the table is a DATE; the second one is a status – a number between 0 and 99.

We want to populate the table with test data that has specific clustering characteristics. We want to simulate the data distribution (clustering) we would get if users inserted data in chronological order – i.e. the oldest data (the smallest DT value) gets inserted first, then slightly newer data, and so forth.
The following code fragment will not only fill the table with data, but it will also get us the desired data clustering.

BEGIN
  FOR i IN 1..200
  LOOP
    INSERT INTO test
    WITH v1 AS
      (SELECT rownum n FROM dual CONNECT BY level <= 10000)
    SELECT trunc(sysdate - i),
           mod(rownum, 100),
           rownum,
           'BLAHBLAH123'
    FROM v1, v1
    WHERE rownum <= 5000;
    COMMIT;
  END LOOP;
END;
/

The data in the TEST table is clustered by DT. Records with the same DT are likely to be in the same block. That cannot be said for records with the same ST. Those records are scattered all over the table.
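To get a feel for the difference, here is a small back-of-the-envelope simulation in plain Python (not Oracle; the day count, rows per day, and rows per block are scaled-down assumptions, not the numbers from the insert loop above): rows are laid out in insertion order, and we count how many distinct blocks a delete would have to touch when driven by DT versus by ST.

```python
# Scaled-down model: 20 days of data, 1000 rows per day, ST = row number mod 10,
# rows stored in insertion order with 100 rows per block.
ROWS_PER_BLOCK = 100

rows = [(day, n % 10) for day in range(20) for n in range(1000)]

def blocks_touched(predicate):
    """Count distinct blocks holding at least one row matching the predicate."""
    return len({i // ROWS_PER_BLOCK for i, r in enumerate(rows) if predicate(r)})

# Deleting one day's data via DT: the matching rows are contiguous,
# so only a handful of blocks are visited.
by_dt = blocks_touched(lambda r: r[0] == 5)

# Deleting one status's data via ST: the matching rows are scattered,
# so every block in the table contains at least one.
by_st = blocks_touched(lambda r: r[1] == 7)

print(by_dt, by_st)  # -> 10 200
```

The DT-driven delete visits a small cluster of blocks, while the ST-driven delete has to touch every block in the table – the same effect the physical-read counts below demonstrate against real Oracle.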

Now, let’s create indexes that would support each of the purge methods equally well, and gather stats:

CREATE INDEX idx1 ON test (dt, st);

CREATE INDEX idx2 ON test (st, dt);

EXEC dbms_stats.gather_table_stats('??????', 'TEST')

To better simulate memory pressure, let’s create a procedure that flushes the buffer cache.
As SYS:

CREATE PROCEDURE flush_bc
AS
BEGIN
  EXECUTE IMMEDIATE 'alter system flush buffer_cache';
END;
/
GRANT EXECUTE ON flush_bc TO ??????;

The first purge technique accesses the data via DT. Since the data is clustered on DT, this technique is expected to be faster.

BEGIN
  FOR i IN 100..200
  LOOP
    DELETE FROM test WHERE dt BETWEEN trunc(sysdate - i) AND trunc(sysdate - i + 1);
    COMMIT;
    sys.flush_bc;
  END LOOP;
END;
/

It takes 20263 physical reads.

The second technique (please rebuild the TEST table before retrying) accesses the data via ST. The data for an ST value is spread across many blocks, so this technique is expected to be slower.

BEGIN
  FOR i IN 0..101
  LOOP
    DELETE FROM test WHERE st = i AND dt < trunc(sysdate - 98);
    COMMIT;
    sys.flush_bc;
  END LOOP;
END;
/

It takes 239100 physical reads – more than 10 times as many as the first technique.

This test clearly shows that the first technique is better than the second one. The memory pressure in the test scenario is significant, so the difference between the techniques would likely not be as great in most real-world settings.


Detecting Connection Imbalance in Oracle RAC

May 8, 2015

For Oracle RAC configurations that rely on spreading the load equally among instances, ensuring that inbound connections are balanced is quite important.

Since I was not able to find a suitable metric in the Oracle Enterprise Manager, I had to create one myself.

As I started thinking about it, some fundamental questions started popping up. How much difference in connection count should trigger an alert? Should I measure the percentage difference, the actual difference (number of sessions) , or something else?

Resisting the temptation to start coding convoluted logic, I reviewed what applied statistics has to offer, and this is what I found – the Chi-square test!

Here is an example for homogeneity testing and here is one for independence testing.

The way I understand it, you can use that test to see if the connections are independently (uniformly) distributed across DB instances, or if some DB instances tend to get more or fewer connections than “expected”.

Another great thing about the Chi-square test is that it is already implemented in the Oracle RDBMS.

The built-in Oracle function (STATS_CROSSTAB) can give us the value of Chi-squared (CHISQ_OBS), the degrees of freedom (CHISQ_DF), and the statistical significance (CHISQ_SIG). What we are interested in is the statistical significance. A number less than 0.05 indicates that the data is likely not distributed uniformly.

Here is the query that can detect if a DB user prefers/avoids a DB instance:
—————————————————————————————–

SELECT COUNT(*) cnt
FROM
    (SELECT STATS_CROSSTAB(inst_id, username, 'CHISQ_SIG') p_value
    FROM gv$session
    )
WHERE p_value < 0.05

—————————————————————————————–
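As a sanity check on what CHISQ_SIG computes, here is a minimal Pearson chi-square test of independence in plain Python. The session counts are made up, and the closed-form p-value used here is exact only for an even number of degrees of freedom – a restriction accepted to keep the sketch free of external libraries:

```python
import math

def chisq_sig(table):
    """Pearson chi-square test of independence on a contingency table
    (rows = DB instances, columns = users). Returns the p-value.
    The closed-form survival function below is exact for even df only."""
    nrow, ncol = len(table), len(table[0])
    row_tot = [sum(r) for r in table]
    col_tot = [sum(r[j] for r in table) for j in range(ncol)]
    grand = sum(row_tot)
    chisq = sum(
        (table[i][j] - row_tot[i] * col_tot[j] / grand) ** 2
        / (row_tot[i] * col_tot[j] / grand)
        for i in range(nrow) for j in range(ncol)
    )
    df = (nrow - 1) * (ncol - 1)
    if df % 2 != 0:
        raise ValueError("this sketch handles even df only")
    half = chisq / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

# Balanced: 3 instances x 2 users, identical counts -> high p-value, no alert.
balanced = [[100, 100], [100, 100], [100, 100]]

# Skewed: two instances strongly prefer different users -> p-value near 0.
skewed = [[200, 50], [50, 200], [125, 125]]

print(chisq_sig(balanced) > 0.05, chisq_sig(skewed) < 0.05)  # -> True True
```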

Detecting connection imbalance at the client machine level is a bit more tricky, because each instance receives a few connections from the server it runs on.
That can be easily accounted for by excluding the servers that run the DBs:
—————————————————————————————–

SELECT COUNT (*) cnt
FROM
    (SELECT STATS_CROSSTAB(inst_id, machine, 'CHISQ_SIG') p_value
     FROM gv$session
     WHERE machine NOT LIKE '%dbp%.newsamerica.com'
     )
WHERE p_value < 0.05

—————————————————————————————–

These monitoring queries work without modification for any size RAC cluster. Adding or removing nodes is handled without issues, apart from a temporary imbalance that may come with adding nodes.


Confidence of Cardinality Estimates Optimization Techniques – When to Use?

March 31, 2015

Presenting at professional conferences frequently brings out important points that were not highlighted well enough.

RMOUG 2015 was no different.

After I presented my techniques for performance tuning by accounting for the confidence of cardinality estimates (slides 35-46), an attendee asked why my way of optimization was better than well-established optimization methods, such as tuning by cardinality feedback.

Well, that was a good question!

The short answer is that the methods I presented are better when Oracle gets low confidence cardinality estimates, i.e. it is forced to guess, because the selectivity varies greatly across executions and Oracle has no way to account for that.

Since the matter got a bit abstract, let’s go through an example.

Let’s have a column NAME that contains the names of people, and let’s see what our options are for dealing with a predicate such as

 NAME like ‘%<Specific Name>%’
This predicate can be very selective if the name is something like… IOTZOV. That is, the predicate

 name like ‘%IOTZOV%’

would return very few records.

The very same predicate can be not that selective if the name is something like SMITH. That is, the predicate

 name like ‘%SMITH%’

could return quite a few records.

If there is no way for Oracle to figure out that one name (IOTZOV) is much more selective than another (SMITH) then the techniques I proposed for accounting for  the confidence of cardinality estimates are probably the best choice.

If there were a way for Oracle to figure out that one name (IOTZOV) is much more selective than another (SMITH), then Oracle would have probably gotten a good plan anyway.

If all names (IOTZOV, SMITH) have similar selectivity, but Oracle cannot figure it out for whatever reason, then we can use other techniques to feed Oracle the correct info. In this case, the other optimization techniques can lead to faster execution plans than the confidence of cardinality techniques I proposed.
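The point can be made concrete with a toy name list (the data below is entirely made up): the same %name% predicate has wildly different selectivity depending on the value bound into it, which is exactly what makes a single cardinality estimate unreliable.

```python
# Toy data: a column of surnames with one very common and one very rare value.
names = ["SMITH"] * 400 + ["JONES"] * 300 + ["IOTZOV"] * 2 + ["BROWN"] * 298

def selectivity(pattern):
    """Fraction of rows matched by a substring (LIKE '%pattern%') predicate."""
    return sum(pattern in n for n in names) / len(names)

print(selectivity("SMITH"), selectivity("IOTZOV"))  # -> 0.4 0.002
```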


RMOUG 2015

December 31, 2014

I am very excited to be selected to present at RMOUG 2015.

I got two presentations – “Managing Statistics of Volatile Tables in Oracle” on Wednesday and “Working with Confidence: How Sure Is the Oracle CBO about Its Cardinality Estimates, and Why Does It Matter?”  on Thursday. The presentations are updated with new 12c stuff, so they could be useful even if you have seen them before.

I’ll be there from Wednesday noon to Thursday afternoon. Hope to see you!

 

Update (Jan. 14 2015):

I just uploaded my presentations and white papers to the RMOUG 2015 web site.

Volatile Tables Presentation:

Here are the latest package spec and package body  and the install instructions for JUST_STATS. Here is the code for the example.

 

Working with Confidence Presentation:

Here are the latest package spec and package body for XPLAN_CONFIDENCE. Here are the install instructions.


What to Do about Tools Hoarding Parallel Slaves in Oracle

November 26, 2014

Most GUI tools use cursors with relatively small fetch size (around 50) to retrieve data from Oracle. They open the cursor, fetch some data, show it and then wait on user input. All resources related to the connection and the open session are held while the tool waits on the user. While those resources are usually trivial for serial SELECT statements, they can be significant for parallel SELECT statements.

Each parallel statement gets assigned a number of parallel slaves, and those slaves are not freed until their respective cursor is closed, regardless of the amount of work the slaves do. Since there is a limited number of parallel slaves available in an instance (the PARALLEL_MAX_SERVERS init.ora parameter), the hoarding of parallel slaves can prevent future statements from being executed in parallel, severely impacting the performance of those statements.

The following blog post describes the situation quite well:
http://structureddata.org/2012/02/29/pitfalls-of-using-parallel-execution-with-sql-developer/

Since this behavior is not a bug, i.e. there is never going to be a fix, we need to find a way to manage it.

One solution is to disable parallelism for all sessions coming from GUI tools that use cursors and therefore could cause this problem. This is a radical step that would deprive the users of the ability to run anything in parallel.

The second option is to educate the users about the problems associated with open cursors and ask them to close all cursors as soon as possible. This approach is ideal when executed diligently, but in reality not all users are compliant.

The approach I would like to propose is to allow parallelism for all, but monitor those who do not close their open cursors. Here is the query that I use to monitor:

select count(*)
from
  (
    select *
    from gv$px_session px_QC
    where px_QC.qcinst_id IS NULL
    minus
    select *
    from gv$px_session px_QC
    where px_QC.qcinst_id IS NULL
    and exists
        (select *
         from gv$px_session px_Slaves, gv$session sess
         where px_QC.qcsid = px_Slaves.qcsid
         and px_Slaves.sid = sess.sid
         and (sess.wait_class != 'Idle'
              or (sess.seconds_in_wait < 600
                  and sess.wait_class = 'Idle'
                 )
             )
        )
  )

This query returns zero if the parallel slaves are actively used, and greater than zero if there is a set of parallel slaves that have been idle for 600 seconds. It can be used to identify the offending sessions and “talk” with the end users. It could be integrated with OEM using Metric Extensions, or it could be part of a monitoring script that kills the offending sessions. The possibilities are endless.
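The set logic of the query can be mirrored in a few lines of plain Python (the session snapshot below is entirely hypothetical, standing in for the gv$ views): a query coordinator is flagged only when every one of its slaves has been in an Idle wait for at least the threshold.

```python
# Hypothetical snapshot of parallel slaves, keyed by coordinator session id:
# each entry is (wait_class, seconds_in_wait).
slaves_by_qc = {
    101: [("Idle", 900), ("Idle", 1200)],   # all slaves idle >= 600s -> hoarding
    102: [("Idle", 30), ("Idle", 700)],     # one slave idle < 600s   -> fine
    103: [("CPU", 0), ("Idle", 800)],       # one slave still active  -> fine
}

def hoarding_qcs(slaves_by_qc, threshold=600):
    """Coordinators whose slaves are ALL idle for at least `threshold` seconds."""
    return sorted(
        qc for qc, slaves in slaves_by_qc.items()
        if all(wc == "Idle" and wait >= threshold for wc, wait in slaves)
    )

print(hoarding_qcs(slaves_by_qc))  # -> [101]
```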


“Orphaned” LOB Segments

October 31, 2014

Recently, after doing a reorganization in an Oracle 11.2.0.2.0 DB, I came upon a LOB segment that did not link to a LOB.
That is, there was an entry in USER_SEGMENTS and USER_OBJECTS, but there was no entry in USER_TABLES, USER_LOBS, or USER_INDEXES.
It turned out that the segment was in the recycle bin (USER_RECYCLEBIN). It appears that the DROP command did not “remove” all references to the dropped object.
The moral of the story is that if something is missing, look for it in the trash (the recycle bin, that is). Some things never change…


A Patch for JUST_STATS Package

August 11, 2014

An alert user recently notified me about a problem with the JUST_STATS package. It appears that it does not work properly with partitions. So, click here to download the first patch.
Please note that you are free to review and modify the code of the package.

