Wednesday, March 12, 2014

Querying the Top 5 Timed Events

In a previous life, I spent the better part of two years digesting other people's AWR reports.  Text, HTML, Statspack etc., all in an effort to show how slow their storage was and how much better their databases would perform on our all-flash storage array.

One of the frustrations of that job was that every problem generally had only one solution (a flash storage array, if you hadn't guessed).  Along the way I encountered some pretty horribly tuned databases, but of course I could do nothing about that.

I did get to work with some excellent and talented people though, including the English duo of Chris Buckel and Ash Ubhi, who have produced some sterling work in the field of AWR analysis - parsing AWR reports and transforming them into lovely Excel charts...  All very useful sales stuff, and actually quite essential for in-house DBAs for reporting purposes.  Most managers like a pretty chart.

Writing my own AWR scripts to avoid having to run a thousand reports led me down various rat holes - what I was really looking for was the top 5 timed events in query form... and then I stumbled across the blog of my fellow Enkitecian, Alex Fatkulin.  In his blog entry he provides a script to do just what I wanted - grab his original from there.
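To give a flavour of the approach, here's a minimal sketch in the same spirit - this is not Alex's code, just a simplified query of my own that takes per-snapshot deltas from DBA_HIST_SYSTEM_EVENT and, for brevity, ignores DB CPU from the time model:

  -- Sketch only: top 5 non-idle wait events per AWR snapshot, by time waited.
  -- Deltas between consecutive snapshots are taken with LAG(); DB CPU from
  -- DBA_HIST_SYS_TIME_MODEL is deliberately left out to keep it short.
  SELECT snap_id,
         event_name,
         wait_class,
         ROUND (delta_micro / 1e6, 2) AS seconds_waited
    FROM (SELECT snap_id,
                 event_name,
                 wait_class,
                 delta_micro,
                 DENSE_RANK ()
                    OVER (PARTITION BY snap_id
                          ORDER BY delta_micro DESC NULLS LAST)
                    event_rank
            FROM (SELECT e.snap_id,
                         e.event_name,
                         e.wait_class,
                           e.time_waited_micro
                         - LAG (e.time_waited_micro)
                              OVER (PARTITION BY e.dbid, e.instance_number, e.event_name
                                    ORDER BY e.snap_id)
                            AS delta_micro
                    FROM dba_hist_system_event e
                   WHERE e.wait_class <> 'Idle'))
   WHERE event_rank <= 5
   ORDER BY snap_id, event_rank;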

Either way, the result is a nicely formatted output:

[Screenshot: Top 5 Wait Events]
Great for various purposes, so building on the work Alex had already done, I modified it slightly to give me statistics that I could turn into useful presentation materials.  Here's a snippet of the data I extracted:

[Screenshot: More Top 5 Wait Events]
In general, I only care about a few events - I/O- or contention-related ones, for my purposes on this client engagement.

[Screenshot: graphical wait event data]
Not rocket science but it makes my life easier.  My code modifications to the original are minimal.
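For the curious, the sort of change I mean is roughly this - a sketch only, with the wait-class list purely illustrative: drop the formatting, keep the wait classes of interest and return raw numbers that go straight into a chart.

  -- Sketch: raw per-snapshot wait-time deltas, restricted to I/O and
  -- contention-related wait classes, ready for charting
  SELECT e.snap_id,
         e.wait_class,
         e.event_name,
           e.time_waited_micro
         - LAG (e.time_waited_micro)
              OVER (PARTITION BY e.dbid, e.instance_number, e.event_name
                    ORDER BY e.snap_id)
            AS delta_micro
    FROM dba_hist_system_event e
   WHERE e.wait_class IN ('User I/O', 'System I/O', 'Concurrency')
   ORDER BY e.snap_id, delta_micro DESC NULLS LAST;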

Have fun with your own AWR analysis.

Sunday, February 23, 2014

Beware the transparent HugePages in Linux 6

I came across this one earlier this week on a client site which had me scratching my head.


[root@someServer ~]# grep Huge /proc/meminfo
AnonHugePages:    458752 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

All looks good, except what are Anonymous HugePages?

It turns out that, new with Linux 6 (amongst others), there is a weird set of dynamic HugePages that are different from the static ones defined and created at boot time.

Digging into Oracle Support, I see the lovely note 1557478.1 that indicates that these are generally not good things to have around on Oracle servers and that they should be disabled.

For your edification, this is the important part of that note and describes how to tell if it's enabled and if so, how to disable it.


ALERT: Disable Transparent HugePages on SLES11, RHEL6, OL6 and UEK2 Kernels (Doc ID 1557478.1)
Starting with RedHat6, OL6, SLES11 and UEK2 kernels, Transparent HugePages are implemented and enabled (default) in an attempt to improve the memory management.  Transparent HugePages are similar to the HugePages that have been available in previous Linux releases.  The main difference is that the Transparent HugePages are set up dynamically at run time by the khugepaged thread in kernel while the regular HugePages had to be preallocated at the boot up time.

Because Transparent HugePages are known to cause unexpected node reboots and performance problems with RAC, Oracle strongly advises to disable the use of Transparent HugePages. In addition, Transparent Hugepages may cause problems even in a single-instance database environment with unexpected performance problems or delays. As such, Oracle recommends disabling Transparent HugePages on all Database servers running Oracle.

# cat /sys/kernel/mm/*transparent_hugepage/enabled
[always] never

To disable Transparent HugePages at boot time, either one of the following two methods may be used:
Add the following to the kernel boot line in /etc/grub.conf (a symlink to /boot/grub/grub.conf) and reboot the server (this is the preferred method):
transparent_hugepage=never

Once modified the line will read similar to the following example:
title Oracle Linux Server (2.6.32-300.25.1.el6uek.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-300.25.1.el6uek.x86_64 ro root=LABEL=/ transparent_hugepage=never
        initrd /initramfs-2.6.32-300.25.1.el6uek.x86_64.img

OR
Add the following lines in /etc/rc.local and reboot the server:
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
  echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
  echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi

Please change the file path for the RHEL kernel to /sys/kernel/mm/redhat_transparent_hugepage/ accordingly.
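Whichever method is used, after the reboot the same check as before should show never selected (the exact list of options varies slightly between kernels):

# cat /sys/kernel/mm/*transparent_hugepage/enabled
always [never]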


Tuesday, January 21, 2014

Time Warp Oddity

So a few days ago, a colleague was converting a client's database from 10.2 single instance (on Linux) to 11.2 RAC (on Linux).  Everything went smoothly.  As with the source database server, the new server uses UTC time.

However, he noticed that SYSDATE was off for all remote clients (or when logging in from the server using sqlplus and a connect string) - whilst the current time might be 17:00 UTC, the time returned to the client was 12:00 EDT.

Puzzling - the listener is clearly involved, yet there's no timezone setting for a listener as such; the problem turned out to come from a little-known configuration file used as part of the Grid Infrastructure install.

Using "cat /proc/<pid>/environ for our listener process we saw that the process environment (derived from the configuration file below) was incorrect.

Oracle Support document 1390015.1 describes this problem; the fix boils down to either editing the file shown below or using srvctl setenv, and then, either way, bouncing the clusterware.

From their note:
For 11.2.0.2 and above, the TZ entry in $GRID_HOME/crs/install/s_crsconfig_<nodename>_env.txt sets the correct time zone. This lets resources started under the Grid Infrastructure home (listener, databases) use the timezone set there, so there is no longer a requirement to set TZ for the DB and listener via setenv. For more details see: How To Change Timezone for 11gR2 Grid Infrastructure (Document 1209444.1)
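In practice the fix looks something like this (listener name and timezone value are illustrative), followed by a bounce of the clusterware:

# Option 1: set the TZ entry in the Grid Infrastructure environment file
#   $GRID_HOME/crs/install/s_crsconfig_<nodename>_env.txt, e.g.
TZ=UTC

# Option 2: set (and verify) the environment on the resource with srvctl
srvctl setenv listener -l LISTENER -t TZ=UTC
srvctl getenv listener -l LISTENER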

Nice and obscure.

Friday, December 13, 2013

Identifying the interloper

I'm often looking to identify top consumers - SQL statements that we know will regularly chew through time, CPU, memory etc.  After a while of living with an application, you get used to seeing certain statements up there; however, over time you may miss something that creeps in.

Most sites I visit as a consultant have the DBMS_WORKLOAD_REPOSITORY snapshots taken automatically every 30 or 60 minutes, which may be fine for applications that run well and predictably; however, finding problematic events (bad SQL, spikes in latency etc.) requires that snapshots be created more frequently.  I personally like snapshots to be generated every 10 minutes, which is the minimum automatic interval permissible in Oracle 11g and 12c.  I should add that this only governs how often automatic snapshots are taken - there is nothing stopping you from creating manual snapshots almost as frequently as you like (beware granularity issues should you try this).
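For reference, setting that up is a quick call to DBMS_WORKLOAD_REPOSITORY (the interval is in minutes; retention and top-N settings are left untouched here):

-- Take automatic AWR snapshots every 10 minutes
BEGIN
   DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS (interval => 10);
END;
/

-- ...and nothing stops you taking a manual snapshot on demand
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;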

Recently I've been getting a lot of requests to identify the top consuming SQL statements and to spot new statements that appear as new software is deployed.  You can certainly do this by reading the AWR reports, but the comparison between snapshots requires a little manipulation unless you like scraping data into Excel (or SuperCalc 4 for those in the stone age), so I came up with some SQL to do the hard work for me.

In this particular case, the SQL looks for the statements executed most frequently, specifically by "executions_delta", which is quite a useful statistic.  You could of course care about any of the measurements - elapsed time, CPU consumed, I/O waits etc. - just modify the SQL as appropriate.
So, here it is.  It will identify the top 5 ranked SQL statements for snapshots taken in the past ten hours.

-- Pivot the five most frequently executed SQL statements for each AWR snapshot
-- taken in roughly the last ten hours (excluding the most recent 15 minutes)
  SELECT snap_id,
         MAX (CASE WHEN exec_rank = 1 THEN sql_id ELSE NULL END) AS "First",
         MAX (CASE WHEN exec_rank = 2 THEN sql_id ELSE NULL END) AS "Second",
         MAX (CASE WHEN exec_rank = 3 THEN sql_id ELSE NULL END) AS "Third",
         MAX (CASE WHEN exec_rank = 4 THEN sql_id ELSE NULL END) AS "Fourth",
         MAX (CASE WHEN exec_rank = 5 THEN sql_id ELSE NULL END) AS "Fifth"
    FROM (SELECT a.snap_id,
                 a.sql_id,
                 a.executions_delta,
                 DENSE_RANK ()
                 OVER (PARTITION BY a.snap_id ORDER BY a.executions_delta DESC)
                    exec_rank
            FROM sys.wrh$_sqlstat a, sys.dba_hist_snapshot d
           WHERE     a.snap_id = d.snap_id
                 AND (    (d.begin_interval_time > SYSDATE - 10 / 24)
                      AND (d.end_interval_time < SYSDATE - 1 / 96)))
   WHERE exec_rank < 6
GROUP BY snap_id
ORDER BY snap_id;

That done, you should end up with data that looks like this. Now it should be somewhat simpler to identify what has changed between snapshots.

SNAP_ID  First          Second         Third          Fourth         Fifth
-------  -------------  -------------  -------------  -------------  -------------
   2909  96g93hntrzjtr  3nkd3g3ju5ph1  db78fxqxwxt7r  53saa2zkr6wc3  32hbap2vtmf53
   2910  cm5vu20fhtnq1  3c1kubcdjnppq  b2gnxm5z6r51n  95mpkn5xz9001  db78fxqxwxt7r
   2911  cm5vu20fhtnq1  61a161nm1ttjj  5ms6rbzdnq16t  3c1kubcdjnppq  089dbukv1aanh
   2912  cm5vu20fhtnq1  61a161nm1ttjj  5ms6rbzdnq16t  089dbukv1aanh  csnp95dz2r8ss
   2913  cm5vu20fhtnq1  61a161nm1ttjj  089dbukv1aanh  9pvbjqqt222b4  csnp95dz2r8ss
   2914  cm5vu20fhtnq1  61a161nm1ttjj  089dbukv1aanh  9pvbjqqt222b4  csnp95dz2r8ss
   2915  cm5vu20fhtnq1  61a161nm1ttjj  5ms6rbzdnq16t  089dbukv1aanh  9pvbjqqt222b4
   2916  cm5vu20fhtnq1  61a161nm1ttjj  089dbukv1aanh  3c1kubcdjnppq  9pvbjqqt222b4
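And should elapsed time, CPU or I/O waits be more interesting than execution counts for your purposes, the only piece that needs changing is the ranking expression in the inline view, for example:

                 -- rank by elapsed time per snapshot instead of execution count
                 DENSE_RANK ()
                 OVER (PARTITION BY a.snap_id ORDER BY a.elapsed_time_delta DESC)
                    exec_rank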

Friday, November 22, 2013

Welcome - and an introduction

I should perhaps start by declaring that I am not a blogger, nor does the social media aspect of modern internet life come naturally to me.

I am, by profession, an Oracle Database Administrator and have been since 1992, when a kind CEO offered me a job as a "DBA".  Now, I had no idea what a DBA did; my prior experience with Oracle was with an AT&T Istel patient administration system in 1988 at a London hospital.  Of course I couldn't Google it, and my only hopes were one of the Oracle Forms developers (green screen) and the Oracle Version 6 Administrator's Guide, which was relatively thin.

Almost 22 years later, I'm still in the field in spite of a very nice offer from IBM for me to specialize in AIX.  Very glad to be here now, what with Exadata, Big Data Appliance, 12c and a host of features too numerous to mention.

I'm convinced that the role of the DBA is not going away any time soon - as naysayers and "thought leaders" have predicted over the past few years.  As more automation comes in, there's more complexity to manage, more products and features to learn and optimize, and new roadblocks to overcome.
I recall in version 6 spending a lot of time managing rollback segments.  Now it's all automatic and yet I don't feel my role is any less relevant because I don't have to do that any more.

Of late I am very excited to have joined the Enkitec team where the brilliant minds of many Oracle Aces and Ace Directors have congregated.  I feel smarter just breathing the same air :)

So from time to time I will post something here about situations I've encountered, perhaps some useful SQL or techniques, and maybe some of those head-scratching, mind-boggling or just plain "WTF" moments.

You may also experience the odd photo.  Cat, dog or travel.  Or just bizarre.


For those interested, my photos are available at my Flickr site.

Enjoy!  And avoid those ORA-00600s.