Measuring the Benefits of Database Virtualization/Thin Provisioning Solutions

May 14, 2013

Overview

Database virtualization and thin provisioning are new and powerful ways to share/reuse storage among non-production database copies.

Solutions for reducing the disk footprint of non-production Oracle databases range from complete database virtualization (Delphix, etc), to solutions build around storage vendor’s features (EMC,NetApps, etc), to purely software solutions (cloneDB, etc).

Sharing read-only tablespaces – a “poor man’s” database virtualization – is less known approach for saving disk space and improving refresh times in non-production environments. Here is a link to the presentation I did at NYOUG (2012) about this topic.

What are we measuring and why?

In general, most solutions are quite effective, both in terms of storage and refresh time, in delivering a copy of the production database.

Once the copy is delivered, it is open for users and batch jobs, so it starts changing. Those changes belong to the new database and therefore cannot be shared with the other non-production databases. That means that they consume “real” disk space. The more DB changes since refresh, the lower the benefits of the DB virtualization/thin provisioning solution.

Measuring the amount of change after a refresh is important in understanding how much disk space would be needed for the non-production Oracle environments after DB virtualization/thin provisioning is in place. The number is essential in computing ROI of the project.

How are we measuring it?

One way to measure the change is to run an incremental RMAN backup and see how big the resulting backup file(s) is. In many cases, however, that is not doable.

The method described here works only on 11gR2. It only requires the SCN of the DB at the time of refresh.

The information can be find in V$DATABASE (RESETLOG_CHANGE#) for databases created by RMAN DUPLICATE.

If the data was transferred with DataPump, or other similar method, a suitable SCN can be found in V$ARCHIVED_LOG (FIRST_CHANGE#, NEXT_CHANGE#).

The new DBVerify feature allows us to utilize HIGH_SCN parameter to find out how many blocks where changes since HIGH_SCN.

Let’s see how many blocks were modified after SCN 69732272706.

It is really easy:

dbv userid=sys/?????????? file=/tmp/test.dbf HIGH_SCN=69732272706

Page 64635 SCN 1012797077 (16.1012797077) exceeds highest scn to check 1012795970 (16.1012795970)

Page 66065 SCN 1012796687 (16.1012796687) exceeds highest scn to check 1012795970 (16.1012795970)

Page 66187 SCN 1012796687 (16.1012796687) exceeds highest scn to check 1012795970 (16.1012795970)

Page 87759 SCN 1012796692 (16.1012796692) exceeds highest scn to check 1012795970 (16.1012795970)

DBVERIFY - Verification complete

Total Pages Examined : 93440

Total Pages Processed (Data) : 62512

Total Pages Failing (Data) : 0

Total Pages Processed (Index): 9399

Total Pages Failing (Index): 0

Total Pages Processed (Other): 6119

Total Pages Processed (Seg) : 1705

Total Pages Failing (Seg) : 0

Total Pages Empty : 13705

Total Pages Marked Corrupt : 0

Total Pages Influx : 0

Total Pages Encrypted : 0

Total Pages Exceeding Highest Block SCN Specified: 39

Highest block SCN : 1012797131 (16.1012797131)

In this case, only 39 blocks (out of 100K) were modified after SCN 69732272706.

Please note that even though DBVerify (dbv) can work with open DB files, it will not account for changed blocks that were not yet written to the file.

Also, as per MOS document 985284.1 – ” Why HIGH_SCN Option Doesn’t Work While Running DBV Against ASM Files?” – the HIGH_SCN flag works only for files that are not stored in ASM. Hope they fix that problem soon.

 


Hotsos 2013 and some more useful OEM queries

January 31, 2013

Busy preparing for Hotsos 2013. My presentation this year is “Working with Confidence: How Sure Is the Oracle CBO about Its Cardinality Estimates, and Why Does It Matter?” I know it is a mouthful. The presentation/white paper would be up to the point with lots of pictures though :)

The only thing I have for the blog now is a revised version of a query I posted the last time. The original version was way too crude.

select   a.rollup_timestamp ,

         count(*) ,

         sum(c.cpu_count) ,

         sum(a.average*c.cpu_count)/ sum(c.cpu_count) ,

         sum(a.maximum*c.cpu_count)/ sum(c.cpu_count) ,

         sum((a.average+3*a.standard_deviation)*c.cpu_count) / sum(c.cpu_count)

from

         mgmt$metric_hourly a ,

         mgmt$target b ,

         sysman.MGMT_ECM_HW c

where    a.metric_name = 'Load'

and      a.column_label = 'CPU Utilization (%)'

and      a.target_guid = b.target_guid

and      b.target_name like 'dev%'

and      c.hostname||'.'||c.domain = b.target_name

and      c.vendor_name = 'Intel Based Hardware'

group by a.rollup_timestamp

order by 5 desc

The query now takes into account the number of CPUs on a server. I use the undocumented MGMT_ECM_HW view – it contains a wealth of information.

The query assumes that all CPUs are identical – not perfect, but much better than the original.


More Useful OEM Repository Queries

December 31, 2012

I already included a number of queries against the OEM repository in my presentation.
Now, I would like to add two more.

The first query shows an approximation of the average and max CPU utilization for a group of servers (names starting with “dev”).
Since the query assumed that each server has the same “weight”, the results are not completely correct for most environments.
They are a good start for a consolidation and/or virtualization projects though.

select
      a.rollup_timestamp           time, 
      count(*)                     number_of_servers,
      sum(a.average)/count(*)      average_load ,
      sum(a.maximum)/count(*)      max_recorded,
      sum(a.average+3*a.standard_deviation)/count(*) max_derived
from
      mgmt$metric_hourly a ,
      mgmt$target b
where
      a.metric_name = 'Load'
  and a.column_label = 'CPU Utilization (%)'
  and a.target_guid = b.target_guid
  and b.target_name like 'dev%'
group by a.rollup_timestamp
order by max_derived desc

1/31/2013 – The query above is quite crude - I just posted a better query  – http://wp.me/p1DHW2-5J

The second query shows all databases that belong to the TEST OEM group and are in ARCHIVELOG mode. The query can be modified to look at all databases or to any subset.

select *
from 
sysman.MGMT$GROUP_DERIVED_MEMBERSHIPS O ,
    sysman.MGMT$TARGET T ,
    sysman.MGMT$AVAILABILITY_CURRENT st
where    
     o.composite_target_name = 'TEST'
and  o.member_target_type IN ('oracle_database', 'rac_database')
and  ( t.target_type ='rac_database'
       or (t.target_type ='oracle_database'
           and t.type_qualifier3 != 'RACINST'))
and  o. member_target_guid = t.target_guid
and  t.target_guid = st.target_guid
and  st.availability_status = 'Target Up'
and  (not exists 
      (select
             *
       from
            sysman.mgmt$metric_current i
       where 
             i.target_guid = o.member_target_guid 
       and   metric_name   = 'archFull'
       and   metric_Column = 'archTotal'
       and   metric_label  = 'Archive Area'
       and   column_label  = 'Total Archive Area (KB)'
       and   key_value     = 'NOARCHIVELOG'
    ));

This query is not strictly based on the documentation, but it work in OEM 12c.


NYOUG Sept. 2012 – Reduce Your Disk Footprint by Sharing Read-Only Tablespaces

September 13, 2012

Thanks to all who attended my presentation (Reduce Your Disk Footprint by Sharing Read-Only Tablespaces)

at NYOUG.

We had a great discussion!

The Power Point slides are here – I did some animation, so this is the best way to look at the presentation. Here is the PDF version of the slides. The white paper is here - it is much more detailed than the presentation.

If you think that the non-supported universal method can save you disk space, feel free to call Oracle and ask them to take a look at it.


How to Use the New OEM 12c Metric Extensions to Enforce Enterprise-Wide Custom Policies

August 8, 2012

Oracle Enterprise Manager’s User-Defined Metrics (UDMs) were a very suitable vehicle for enforcing enterprise-wide policies. My NYOUG 2011/VirtaThon presentation and white paper contain a comprehensive example – monitoring forced logging status of all production databases.

Since UDMs are gone in OEM 12c, we’ll see how to use Metric Extensions, the successor of UDMs, to accomplish the same goal.

Metric Extensions (MEs) represent a major improvement over UDMs in terms of code maintenance and ability to deploy to large number of targets. MEs, however, do not allow us to customize the SQL query for a specific target. Even though there is a good reason behind the behavior, it requires us to substantially change the architecture we have previously used.

In the original presentation, I used one UDM ( ForcedLogging ) that had different definitions for different targets.

In OEM 12c, we’ll need to have different Metric Extensions for each of the definitions and target types.

For most single instance DB targets, we’ll use the standard “ForcedLogging” ME that is defined with this query:


select force_logging from v$database

For most RAC DB targets, we’ll use “ForcedLoggingRAC” ME that is defined with the same query:


select force_logging from v$database

And if we want to ignore the “force logging” status of a database, then we can use ForcedLoggingIgnore” ME defined with this query:


select 'YES' from v$database

The query/Metalized View that actually enforces the enterprise-wide force logging policy gets a little more complex:


CREATE MATERIALIZED VIEW "SYSMAN"."UDM_FORCED_LOGGING" ("CNT") ORGANIZATION HEAP PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "MGMT_TABLESPACE" BUILD IMMEDIATE USING INDEX REFRESH COMPLETE ON DEMAND START WITH sysdate+0 NEXT SYSDATE + 1/96 USING DEFAULT LOCAL ROLLBACK SEGMENT USING ENFORCED CONSTRAINTS DISABLE QUERY REWRITE
AS
SELECT COUNT( member_target_guid ) cnt
FROM SYSMAN.MGMT$GROUP_DERIVED_MEMBERSHIPS O ,
SYSMAN.MGMT$TARGET T ,
sysman.MGMT$AVAILABILITY_CURRENT st
WHERE o.composite_target_name = 'PRODUCTION'
AND o.member_target_type     IN ('oracle_database', 'rac_database')
AND ( t.target_type           ='rac_database'
OR (t.target_type             ='oracle_database'
AND t.type_qualifier3        != 'RACINST'))
AND o. member_target_guid     = t.target_guid
AND t.target_guid             = st.target_guid
AND st.availability_status    = 'Target Up'
AND (NOT EXISTS
(SELECT *
FROM sysman.mgmt$metric_current i
WHERE I.TARGET_GUID      = O.MEMBER_TARGET_GUID
AND metric_name          = 'ME$ForcedLogging'
AND column_label         = 'ForcedLogging'
AND Metric_Column        = 'ForcedLogging'
AND collection_timestamp > sysdate - 30/1440
AND value                = 'YES'
)
AND NOT EXISTS
(SELECT *
FROM sysman.mgmt$metric_current i
WHERE i.target_guid      = o.member_target_guid
AND metric_name          = 'ME$ForcedLoggingRAC'
AND column_label         = 'ForcedLogging'
AND Metric_Column        = 'ForcedLogging'
AND collection_timestamp > sysdate - 30/1440
AND value                = 'YES'
)
AND NOT EXISTS
(SELECT *
FROM sysman.mgmt$metric_current i
WHERE i.target_guid      = o.member_target_guid
AND metric_name          = 'ME$ForcedLoggingIgnore'
AND column_label         = 'ForcedLogging'
AND Metric_Column        = 'ForcedLogging'
AND collection_timestamp > sysdate - 30/1440
AND value                = 'YES'
));


To autoextend or not to autoextend?

December 28, 2011

This important Oracle DB question has been around for a long time without a definitive answer. This post not only gives an objective overview of the pros and cons of using autoextend (on) ,but also provides some guidance on how to handle common issues related to the autoextend option.

Here are some definitions I am going to use:

  • “Autoextend ” is the primary disk space management mechanism when DBA does not do any monitoring on tablespace level, but rather forecasts and monitors the file system(s) or ASM disk group(s) that hold the database.
  • “Autoextend ” is a secondary disk space management mechanism when DBA monitors and forecasts tablespace usage and makes sure that there is free space in each tablespace to accommodate growth for some time. The DBA uses “autoextend on” only as a way to handle unexpected data growth.

Pros of autoextend on:

  • Ability to accommodate unexpected data growth.

That is the major benefit of that feature.  Some believe that rigorous forecasting and monitoring are enough for disk space management. I firmly believe that a DBA should monitor and forecast tablespace usage. I even presented a paper that explains, among many other things, how to forecast tablespace usage using OEM repository tables. Nevertheless, I am convinced that no monitoring or forecasting would be able handle an unexpected surge in activity (hence unexpected :)  ).

Resumable space allocation (RESUMABLE_TIMEOUT ) is another way to handle unexpected data growth.

  • Easiness to manage

Autoextend as a primary disk space management mechanism can simplify somehow disk monitoring and forecasting of databases with large number of tablespaces but small number of  ASM diskgroups /file systems.

In general, however, what autoextend does is just shift the burden of monitoring and forecasting from DB tablespace monitoring and forecasting (DBA) to file system/ASM monitoring and forecasting (SA/DBA). That’s not that big of a benefit.

Cons of autoextend on:

/Wording slightly modified on 1/22/2013 – Thanks to J. Garry for the useful suggestion/

  • As a result of an application or user error, a single tablespace can fill up the whole filesystem/ASM group, bringing all applications using that filesystem/ASM group to a halt.

This is a valid concern that can be addressed in most cases.

First, tablespaces that have higher chance of experiencing uncontrolled growth, such as TEMP, UNDO, “staging”, etc, should not use autoextend on.

Second, DBA should limit (utilizing max file size parameter) how much a single tablespace can grow. A single tablesopace should not be allowed to use all available disk space. For instance, if there is 50G free in an ASM group, DBA should make sure that no single tablespace can grow more than 40G (leaving 10G safety buffer).

If a single table growth can affect more than one tablespace (DATA and INDEX, for instance), then the combined tablesopace growth should be less than the free disk space. If we have 50G free in an ASM group, DBA should allow DATA and INDEX tablespace to grow by only 20G each, so both can grow no more than 40G.

This is a workable solution only when autoextend is a secondary disk management mechanism. If autoextend is a primary disk management mechanism then the free space will change rather frequently (after each autoextend), so staying on top of those limitations can be very hard.

  • The actual process of extending DB files, allowed by autoextend on, causes performance issues.

This could be a valid concern, particularly for OLTP system with end-users who expect response in sub-second time.  Using autoextend as a secondary mechanism greatly reduces the chances of this happening.  In that case, autoextend growth, with performance issues and all, it is much better than the alternative – outright failure.

Conclusion

Autoextend is an excellent DB option that can provide us with protection against unexpected data growth. Used properly, autoextend has minimal downside. In overwhelming majority of cases, autoextend should be used as a secondary disk space management mechanism.


Is controlling parallelism with an Oracle resource plan ever a good option?

September 30, 2011

Ever wanted to be able to run some code in parallel, while keeping the rest execute sequentially? Oracle Resource Groups allows us to dispatch sessions into groups and impose restrictions about the resource usage of those groups. Parallelism is one of the resources that can be managed by Resource Groups, so things look promising.

 It works too. I was fortunate enough to avoid hitting any bugs.

 So here the problem: All plans are generated as if all parallel processes are available for the query. A query that is to be executed serially can get a parallel execution plan, which often evolves full table scans. When it comes to actually running the query, Oracle will abide by restriction of the Resource Plan and run it sequentially. This is major structural problem causing suboptimal plans. It is like planning your move based on the assumption that would get a large U-Haul truck, and coming move day all you can get is a compact car… Not pretty.


Follow

Get every new post delivered to your Inbox.