When Your Oracle DataGuarded Database Is Crashed (Unintentionally)

It happens: you have a 4-node Oracle 11gR2 database protected by Data Guard, and your system administrators need to take the boxes down for maintenance. But they don’t know that it’s under Data Guard with fast-start failover enabled, which (via the Data Guard observer) automatically fails over from primary to standby, switches roles, reconfigures listeners, and tries to keep everything 99.99995% available. So they use good old:

sqlplus "/ as sysdba"

SQL> shutdown immediate;

Database closed.

And now you need to start it back up. But when your admins get back to their familiar old prompt, it burps out:

SQL> startup

Error: ORA-16825: multiple errors or warnings, including fast-start failover-related errors or warnings, detected for the database

Now, there are a number of blogs which will walk you through the tedious steps of recovering from this condition manually via DGMGRL, with multiple control file and standby redo log restores to all the targets. But it was 1:00 AM local time and I didn’t want to spend the rest of the night crawling through these trying to get this beast back on its feet.

So I returned to the never-ending exploratory world of what is contained in Oracle Enterprise Manager 12cR3 (12.1.0.3.0). And (tada sound) OEM can handle it all for you relatively automatically.

To get your Primary DB back on-line (and avoid the complaints from the users who have been disconnected), first disable fast-start failover from DGMGRL, then start the database from SQL*Plus:

DGMGRL> connect SYS/<pwd>@<primaryDGDBtarget>

DGMGRL> disable fast_start failover force;

SQL> startup

Database opened.
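
Before heading over to OEM, it can be worth asking the broker what it currently thinks of the configuration. This is only a minimal sketch, assuming you are still connected in DGMGRL as in the step above; both are standard DGMGRL commands, but what they report will depend entirely on your own setup:

```
DGMGRL> show configuration;
DGMGRL> show fast_start failover;
```

The first should list the primary and its standbys along with any outstanding warnings or errors; the second should confirm that fast-start failover is now disabled.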

Now that the Primary is back open and accessible, return to OEM OMS and visit:

Targets -> Database -> (your Primary DB)

Availability -> Data Guard Administration

There’s a text link at the bottom which reads

Additional Administration:

Verify Configuration

That’s your ticket back to green arrows and a healthy DG in 90% of situations. It will perform health checks throughout your configuration, re-create standby redo logs, re-synchronize disconnected standby databases, ship archive logs wherever they are needed, repair disrupted communications, restore fast-start failover observers, and whatever else you have.

The other alternative is to simply remove the failed Standby database: select it and click [Remove]; then visit its host and delete the existing datafiles, redo logs, archive logs and tempfiles (you do not have to touch the TNSNAMES or Listener configurations, as those will be re-created for you); then use [Add Standby Database] on the same screen to restore functionality. This is a good if-all-else-fails method of recovery.
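
Because fast-start failover was force-disabled earlier, it presumably has to be switched back on once everything is healthy, unless the verification has already done that for you. A hedged sketch from DGMGRL, assuming the observer is running again and the configuration reports no errors:

```
DGMGRL> show configuration;
DGMGRL> enable fast_start failover;
DGMGRL> show fast_start failover;
```

If the first command still shows warnings or errors, it is probably better to sort those out before re-enabling anything.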

Happy Guarding Data!
