Tag Archives: performance

Slow Oracle Database Performance on a NetApp (10g, 11g, 12c) Tips

http://www.netapp.com
http://www.netapp.com

If you have a NetApp storage appliance, device, SAN, whatever you want to call it, review this first (NetApp Best Practices for Oracle Databases – published March 2014): http://www.netapp.com/us/media/tr-3633.pdf

The quick and dirty (for those experiencing Production issues right now):

The database-side init.ora:

grep filesystemio $ORACLE_HOME/dbs/initSID.ora

*.filesystemio_options=’SETALL’

The options for filesystemio_options can be summarized as follows:

  • ASYNCH: Asynchronous I/O. Oracle should submit I/O requests to the operating system for processing. This permits Oracle to carry on with other work rather than waiting for I/O completion and increases parallelization of I/O.
  • DIRECTIO: Direct I/O. Oracle should perform I/O directly against physical files rather than routing I/O through the host operating system cache.
  • NONE: Use synchronous and buffered I/O. In these configurations, the choice between shared and dedicated server processes and the number of dbwriters will become more important.
  • SETALL: Use both asynchronous and direct I/O. (preferred for NetApp)

Note: The filesystemio_options parameter has no effect in DNFS and ASM environments. The use of Direct NFS (DNFS) or Automatic Storage Management (ASM) automatically results in the use of both asynchronous and direct I/O.

However…. Because you might end up cloning to an environment that doesn’t support DNFS (remember it’s stuck in the ORACLE_HOME/rdbms/lib object binary), you should have it set to SETALL anyway to allow fall-back to standard NFS (but with asynch and DI/O.).

[This refers to the: cd $ORACLE_HOME/rdbms/lib ; make -f ins_rdbms.mk [dnfs_on / dnfs_off] used to enable and disable DNFS (you also need the /etc/oranfstab created to support this – server:   local:   path: export: mount: etc…)]

If you have 11.2.0.4.x or later (12c) there’s a new DNFS-related init.ora parameter critical for managing packet requests by the database to avoid packet flooding the NFS file server:  Without it (i.e. older versions) DNFS only works well on dedicated DNFS storage because an oracle database on modern hardware can easily over-whelm the NetApp’s ability to service TCP packets (can send upwards of 4,000+ requests per second)

$> grep dnfs $ORACLE_HOME/dbs/initSID.ora

*.dnfs_batch_size=128

On the Linux OS side:

$> grep tcp /etc/sysctl.conf

sunrpc.tcp_slot_table_entries = 128

$> grep nosharecache /etc/fstab

hostname.corp:/vol/testoracled/d01     /u01/app/Oracle              nfs rw,bg,hard,vers=3,proto=tcp,timeo=600,rsize=65536,wsize=65536,nointr,nosharecache

hostname.corp:/vol/testdata            /u01/app/Oracle/oradata     nfs rw,bg,hard,vers=3,proto=tcp,timeo=600,rsize=65536,wsize=65536,nointr,nosharecache

Incidentally, if you want to measure your I/O rates, I have been using SLOB (Silly Little Oracle Basher – http://kevinclosson.wordpress.com/slob/ ) with one small modification to the iostat command in the runit.sh script to capture the I/O activity on all the shares (iostat -mn instead of -xm):

#      ( iostat -xm 3 > iostat.out 2>&1 ) &
( iostat -mn 3 > iostat.out 2>&1 ) &
misc_pids=”${misc_pids} $!”
( vmstat 3 > vmstat.out 2>&1 ) &
misc_pids=”${misc_pids} $!”
( mpstat -P ALL 3  > mpstat.out 2>&1) &
misc_pids=”${misc_pids} $!”

All this besides the usual OEM recommendations (bigger log_buffer, manage SGA size, get block sizes correct), dropped latency from over 40000ms to < 80ms on heavy load, and is producing 12000 IOPS on the same device.

This is an R12 e-Business Suite environment running on an 11.2.0.3.13 (PSU JAN2015) database (5TB).

What triggered this investigation was that the Log Writer process (LGWR) began dumping trace files about 6 months ago with entries that look like:

*** 2014-12-21 19:49:29.827
Warning: log write elapsed time 9988ms, size 1KB

After investigating all the usual suspects about slow disk, I happened upon the aforementioned NetApp white paper. This occurs when multiple hosts are competing for resources under a single NetApp appliance environment (multiple enclosures, multiple dedicated or shared aggregates, etc.)

Situation addressed and under control.

Advertisements

R12 e-Business Suite and OEM Monitoring – Oracle Spins Freezes

Every so often, system load on an e-Business Suite instance ramps up and response time to users starts climbing, often resulting in user observed errors such as:

  • FRM-92100 Your connect to the server has been interrupted
  • FRM-92102 A network error has occurred

    FRM-92102 Forms Error R12 EBS
    That dreaded FRM-91201 / FRM-91200 error causing you to restart your session.

Or sometimes, the screen just freezes (aka spins, stops, is broken, stuck, motionless, looks like a screen saver,can’t do anything, won’t work, froze-up, etc.) and the person has to close their browser, or even shut-down their workstation and restart.

It's simply not doing anything - Nothing to see here, just move your cursor around. And wait... and wait.
It’s simply not doing anything – Nothing to see here, just move your cursor around. And wait… and wait.

Old technology often barks with unrelated error messages to the actual cause.  If there’s a lot going on with concurrent requests, or interfaces, or analytic extracts running, the front-end response-time slows down, sometimes sufficiently to trigger these kinds of Form errors, even though technically there was no interruption to the network connectivity, either between the hosts, nor the workstation and the middle-tier application server.

However, on the database, the user-experience can be seen, although not necessarily in the place you might expect.  OEM  had introduced it’s Adaptive Metric Thresholds technology back in OEM 11g (in a slightly different place than in 12c (in Oracle Management Server/OMS 12.1.0.4.0).  In OEM 11g, they were a link under the AWR Baseline Reports page.

OEM 11g AWR Baselines Page
See the Baseline Metric Thresholds link at the bottom.

In OEM 12c, you’ll find them under

Targets -> Database -> Peformance -> Adaptive Thresholds -> Baseline Metric Thresholds -> Edit Thresholds:

OEM 12c Baseline Metric Thresholds
Where those adaptive metric thresholds moved in 12c.

 

 

On this page and in the list of Baseline Metrics, when you click into them, you can access the trending statistics being gathered for each metric.  Many times this will provide direct insight into what a user experiences as the “the system is frozen” translates into “the back-end database response time is incredibly bad.”

OEM 12c Baseline Metric Response Time per Transaction vs. Baseline
See the spikes around 7AM and 11:30AM? Those are being associated with “System Froze” reports.

 

In the example here, the database experienced a dramatic slow-down in response almost 5 to 10 times slower than usual, which only lasted a few seconds. But that can be enough to show up in many users’ sessions who might have just kicked off a query, or were trying to save something.  Based upon the information gathered, we set the Warning and Critical thresholds to 1500ms and 2000ms respectively to start sending e-mail alert notifications upon breach of the levels. If the settings are left at “None”, no incident would be raised, and thus, no notification would be sent.

If you’re experiencing odd transient outages or sluggish behavior that defies the normal AWR and ADDM snapshot analysis, go take a look at what OEM has been gathering in the background over time and see if the statistics correlate to any of your issues.  There’s value in that data. Just mine it.

12c Histograms pt.2

Oracle Scratchpad

In part 2 of this mini-series I’ll be describing the new mechanism for the simple frequency histogram and the logic of the Top-N frequency histogram. In part 3 I’ll be looking at the new hybrid histogram.

You need to know about the approximate NDV (number of distinct values) before you start examining the 12c implementation of the frequency and top-frequency histograms – but there’s a thumbnail sketch at the end of the posting if you need a quick reminder.

View original post 1,155 more words