If you have a NetApp storage appliance, device, SAN, whatever you want to call it, review this first (NetApp Best Practices for Oracle Databases – published March 2014): http://www.netapp.com/us/media/tr-3633.pdf
The quick and dirty (for those experiencing Production issues right now):
The database-side init.ora:
grep filesystemio $ORACLE_HOME/dbs/initSID.ora
The options for filesystemio_options can be summarized as follows:
- ASYNCH: Asynchronous I/O. Oracle should submit I/O requests to the operating system for processing. This permits Oracle to carry on with other work rather than waiting for I/O completion and increases parallelization of I/O.
- DIRECTIO: Direct I/O. Oracle should perform I/O directly against physical files rather than routing I/O through the host operating system cache.
- NONE: Use synchronous and buffered I/O. In these configurations, the choice between shared and dedicated server processes and the number of dbwriters will become more important.
- SETALL: Use both asynchronous and direct I/O. (preferred for NetApp)
Note: The filesystemio_options parameter has no effect in DNFS and ASM environments. The use of Direct NFS (DNFS) or Automatic Storage Management (ASM) automatically results in the use of both asynchronous and direct I/O.
However…. Because you might end up cloning to an environment that doesn’t support DNFS (remember it’s stuck in the ORACLE_HOME/rdbms/lib object binary), you should have it set to SETALL anyway to allow fall-back to standard NFS (but with asynch and DI/O.).
[This refers to the: cd $ORACLE_HOME/rdbms/lib ; make -f ins_rdbms.mk [dnfs_on / dnfs_off] used to enable and disable DNFS (you also need the /etc/oranfstab created to support this – server: local: path: export: mount: etc…)]
If you have 126.96.36.199.x or later (12c) there’s a new DNFS-related init.ora parameter critical for managing packet requests by the database to avoid packet flooding the NFS file server: Without it (i.e. older versions) DNFS only works well on dedicated DNFS storage because an oracle database on modern hardware can easily over-whelm the NetApp’s ability to service TCP packets (can send upwards of 4,000+ requests per second)
$> grep dnfs $ORACLE_HOME/dbs/initSID.ora
On the Linux OS side:
$> grep tcp /etc/sysctl.conf
sunrpc.tcp_slot_table_entries = 128
$> grep nosharecache /etc/fstab
hostname.corp:/vol/testoracled/d01 /u01/app/Oracle nfs rw,bg,hard,vers=3,proto=tcp,timeo=600,rsize=65536,wsize=65536,nointr,nosharecache
hostname.corp:/vol/testdata /u01/app/Oracle/oradata nfs rw,bg,hard,vers=3,proto=tcp,timeo=600,rsize=65536,wsize=65536,nointr,nosharecache
Incidentally, if you want to measure your I/O rates, I have been using SLOB (Silly Little Oracle Basher – http://kevinclosson.wordpress.com/slob/ ) with one small modification to the iostat command in the runit.sh script to capture the I/O activity on all the shares (iostat -mn instead of -xm):
# ( iostat -xm 3 > iostat.out 2>&1 ) &
( iostat -mn 3 > iostat.out 2>&1 ) &
( vmstat 3 > vmstat.out 2>&1 ) &
( mpstat -P ALL 3 > mpstat.out 2>&1) &
All this besides the usual OEM recommendations (bigger log_buffer, manage SGA size, get block sizes correct), dropped latency from over 40000ms to < 80ms on heavy load, and is producing 12000 IOPS on the same device.
This is an R12 e-Business Suite environment running on an 188.8.131.52.13 (PSU JAN2015) database (5TB).
What triggered this investigation was that the Log Writer process (LGWR) began dumping trace files about 6 months ago with entries that look like:
*** 2014-12-21 19:49:29.827
Warning: log write elapsed time 9988ms, size 1KB
After investigating all the usual suspects about slow disk, I happened upon the aforementioned NetApp white paper. This occurs when multiple hosts are competing for resources under a single NetApp appliance environment (multiple enclosures, multiple dedicated or shared aggregates, etc.)
Situation addressed and under control.