Tag Archives: dataguard

Oracle DataGuard and Standby Database Archive Logs

Users see your wonderful DataGuard implementation like this:

Simple Oracle Dataguard Architecture
Simple Oracle Dataguard Architecture courtesy of https://appsdbatraining.files.wordpress.com

And yet, you know the actual picture looks more like this:

A map of Oracle DataGuard Architecture components
A map of Oracle DataGuard Architecture components

High-availability – the concept behind it makes every DBA shudder because every time it seems you deal with one element and have it protected, there’s another underlying component that also needs protection and redudancy, or else your solution is still insufficient.

Real Application Clusters (RAC) covers individual database host failures but is sensitive to failure of the storage subsystem or the network interconnections between the hosts.

Recovery Manager (RMAN) is your vital tool to keeping track of what’s backed up and where is it.  And its catalog of recovery information could reside locally in copies of the controlfiles, or centrally in another database.  Depends on your backup strategies, really.  Are you using SAN-based backups (snaps, virtual images, deduplicated block replication) or off-site methods that would have to be shipped back to start recovery?

But the typical first-time setup scenario, is you use the OEM-based jiffy whizbang method to setup your new DataGuard environment at the recommendation of one of the steps int the Maximum Availability Advisor (MAA), and everything’s up and running nicely.  You schedule a new weekly full backup, plus daily incremental backup as Oracle recommended practices prescribe, and notice everything’s running smoothly.

Except on your standby database, the archivelogs are piling up and not being deleted automatically.  What’s next?

This thread was a basic discussion in the Oracle Community forums of the topic:

https://community.oracle.com/thread/2388130?start=0&tstart=0

This is a typical RMAN-based configuration:

On Primary

RMAN> CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON STANDBY;

On Standby ( Depends upon where backup is preformed )

RMAN> CONFIGURE ARCHIVELOG DELETION POLICY TO NONE;

Or

RMAN> CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;

/* if Standby Where Backups Are Not Performed  */

Source:

Data Guard Concepts and Administration 12c

http://docs.oracle.com/database/121/SBYDB/toc.htm

12 Using RMAN to Back Up and Restore FilesRMAN Configurations at the Primary Database
12.3.4 RMAN Configurations at a Standby Where Backups Are Not Performed

The following RMAN configurations are recommended at a standby database where backups are not done:

  1. Connect RMAN to the standby database as target, and to the recovery catalog.
  2. Enable automatic deletion of archived logs once they are applied at the standby database (this is also applicable to all terminal databases when the cascading or far sync instance features are in use):
  3. CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;

 

** However, that doesn’t really take into consideration what might happen if a final archivelog before switchover of roles doesn’t quite make it on the standby (for whatever reason, it gets corrupted during playback or something similar which results in a Database Needs More Recovery error.)

Based upon:

12.3.3 RMAN Configurations at a Standby Database Where Backups are Performed

The following RMAN configurations are recommended at a standby database where backups are done:

  1. Connect RMAN to the standby database (where backups are performed) as target, and to the recovery catalog.
  2. Enable automatic backup of the control file and the server parameter file:

3.  RMAN > CONFIGURE CONTROLFILE AUTOBACKUP ON;

  1. Skip backing up data files for which there already exists a valid backup with the same checkpoint:

5.  RMAN > CONFIGURE BACKUP OPTIMIZATION ON;

  1. Configure the tape channels to create backups as required by media management software:

7.  RMAN > CONFIGURE CHANNEL DEVICE TYPE SBT PARMS ‘<channel parameters>’;

  1. Because the archived logs are backed up at the standby database, Oracle recommends that you configure the BACKED UP option for the log deletion policy:

9.  RMAN > CONFIGURE ARCHIVELOG DELETION POLICY BACKED UP n TIMES TO [DEVICE TYPE SBT];

 

I came up with the configuration of:

 

On Standby ( Depends upon where backup is preformed )

# If no DataGuard is present (single DB host):

# Ensure daily RMAN backup job is being executed in OEM or via cron.

RMAN> CONFIGURE ARCHIVELOG DELETION POLICY TO BACKED UP 1 TIMES TO DISK;

 

And setup a 2nd backup set for the Standby Host DB.  This puts the backups into the defined Fast Recovery Area and manages both the backups and archivelog retention in the same mountpoint.

 

Each has its pros and cons depending on the scenario.You need to lay out your entire architecture scheme including backup solutions and play out the various scenarios that you’re required to cover as far as your Quality of Service (QOS) guarantee to your end-user population.

And of course, if you’re using the advanced cross-WAN FarSync DataGuard implementation architecture (wherein there’s a separate Failover Archive Log (FAL) standby database whose sole purpose in life is to cache archivelogs in case the data replication stream is too much for WAN bandwidth to handle in real-time.) this all still applies because the FAL server is basically just another standby target which needs managing just as much as any regular LAN-based full DataGuard standby instance (it’s just missing the big datafiles and handles all the archivelog traffic.)  You’ll just have even more servers and services involved in keeping the whole thing running (like your Global Names Service servers and databases, which might also be RAC and DataGuard protected, or your Single-Sign On authentication services, or even the OEM Cloud Control OMS itself orchestrating all of that.)

 

Advertisements

DataGuard and OEM 12c OMS DB Failover Configuration

# When Oracle DataGuard high-availability for the OMS database is configured using the OEM DataGuard Administration Wizard, and fast-start failover is configured, fail-overs automatically rename the standby as primary, and vice-versa and establish the change-over in roles.  While this accomplishes the database staying online and available on the secondary host (or all other databases in the DG group), the Enterprise Manager OMS must be told how to connect to it – preferably transparently.

# DataGuard OMS Registration
# Enterprise Manager Grid Control 11g: How to Configure the OMS Connect String when Repository is in a Dataguard setup (Doc ID 1328768.1)
# OEMPR11 is our primary DB SID/Service Name
# OEMPR11_DGMGRL is our alias for the fail-over service (pointing to all DG instances)

SQLPLUS as SYS:
SQL> exec DBMS_SERVICE.CREATE_SERVICE (service_name => ‘OEMPR11_DGMGRL’,network_name => ‘OEMPR11_DGMGRL’,aq_ha_notifications => true,failover_method => ‘BASIC’,failover_type => ‘SELECT’,failover_retries => 180,failover_delay => 1);

SQL> exec dbms_service.start_service(‘OEMPR11_DGMGRL’);

# Verify operation:

$> lsnrctl services     # Should see the new OEMPR11_DGMGRL service listed

# Create a Database Trigger so that the service can be stopped when the Database role becomes standby and started only when the Database role is Primary:

SQL> CREATE OR REPLACE TRIGGER manage_OCIservice after startup on database
DECLARE
role VARCHAR(30);
BEGIN
SELECT DATABASE_ROLE INTO role FROM V$DATABASE;
IF role = ‘PRIMARY’ THEN
DBMS_SERVICE.START_SERVICE(‘OEMPR11_DGMGRL’);
ELSE
DBMS_SERVICE.STOP_SERVICE(‘OEMPR11_DGMGRL’);
END IF;
END;

# Re-configure the OMS (All MT hosts) to have the connection string as:
$>  cd <OMS_HOME>/bin
# Following is a single-line command (basically an entire JDBC style connect string)
$> ./emctl config oms -store_repos_details -repos_conndesc ‘(DESCRIPTION=(FAILOVER=ON)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=<primaryDBHostnameFQDN>)(PORT=1522))(ADDRESS=(PROTOCOL=TCP)(HOST=<secondaryDBHostnameFQDN>)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=OEMPR11_DGMGRL))(FAILOVER_MODE=(TYPE=select)(METHOD=basic)))’ -repos_user sysman

# Example output
$> ./emctl config oms <…> TYPE=select)(METHOD=basic)))’ -repos_user sysman               <
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation.  All rights reserved.
Enter Repository User’s Password :
Successfully updated datasources and stored repository details in Credential Store.
If there are multiple OMSs in this environment, run this store_repos_details command on all of them.
And finally, restart all the OMSs using ’emctl stop oms -all’ and ’emctl start oms’.
It is also necessary to restart the BI Publisher Managed Server.

# Add the tnsnames.ora entry (all DB hosts at minimum)
OEMPR11_DGMGRL=
(DESCRIPTION=
(ADDRESS_LIST=
(ADDRESS=(PROTOCOL=TCP)(HOST=<primaryDBHostnameFQDN>)(PORT=1522))
(ADDRESS=(PROTOCOL=TCP)(HOST=<secondaryDBHostnameFQDN>)(PORT=1521))
)
(CONNECT_DATA=(SERVICE_NAME=OEMPR11_DGMGRL))
(FAILOVER_MODE=(TYPE=select)(METHOD=basic))
)

# Testing connectivity:

$> sqlplus sysman/$SYSMAN_PW@'(DESCRIPTION=(FAILOVER=ON)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=<primaryDBHostnameFQDN>)(PORT=1522))(ADDRESS=(PROTOCOL=TCP)(HOST=<secondaryDBHostnameFQDN>)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=OEMPR11_DGMGRL))(FAILOVER_MODE=(TYPE=select)(METHOD=basic)))’

# Test the Failover Service:

# Connect to the Database from sqlplus using SYSMAN user via the new service created above:

$> sqlplus sysman/$SYSMAN_PW@OEMPR11_DGMGRL

# Execute these queries to verify the Database name and service names:

SQL> select db_unique_name from v$database;

DB_UNIQUE_NAME
——————-
OEMPR11

SQL> show parameter service_names

NAME            TYPE     VALUE
————-   ——-  ————————————
service_names   string   OEMPR11, OEMPR11_DGMGRL

# Re-start the OMS once so that the connection string change is saved:

cd <OMS_HOME>/bin
./emctl stop oms -all    #on AdminServer MT
./emctl stop oms         #other MTs
./emctl start oms

When Your Oracle DataGuarded Database is Crashed (unintentionall)

It happens:  You have a 4-node DataGuarded Oracle 11gR2 database and your System Administrators need to take the boxes down for maintenance. But they don’t know that it’s protected under DataGuard, with Fast-start failover enabled (which automatically performs (via DG Observer) the switchover from Primary to Standby, switches roles, reconfigures listeners, and tries to keep everything 99.99995% available.)  So they use good old:

sqlplus “/ as sysdba”

SQL> shutdown immediate;

Database closed.

And now you need to start it back up.  But when your admins get to their familiar old prompt it burps out:

SQL> startup

Error: ORA-16825: multiple errors or warnings, including fast-start failover-related errors or warnings, detected for the database

Now there are a number of blogs which will walk you through the tedious steps of recovering this condition manually via DGMGRL and multiple control file and standby redo log restores to all the targets. But it was 1:00A local time and I didn’t want to spend the rest of the night crawling through these trying to get this beast back on its feet.

So, I returned to the never-ending exploratory world of what is contained in Oracle Enterprise Manager 12cR3 (12.1.0.3.0)  And (tada sound), OEM can handle it all for you relatively automatically.

To get your Primary DB back on-line (to avoid the complaints from the users who have been disconnected)

DGMGRL> connect SYS/<pwd>@<primaryDGDBtarget>

DGMGRL> disable fast_start failover force;

SQL> startup

Database opened.

Now that the Primary is back open and accessible, return to OEM OMS and visit:

Targets -> Database -> (your Primary DB)

Availability -> Data Guard Administration

There’s a text link at the bottom which reads

Additional Administration:

Verify Configuration

That’s your ticket back to green-arrows and healthy DG in 90% of the situations.  It will perform health-checks throughout your configuration, re-created standby redo logs, re-sychronizing disconnected standby databases, shipping archive logs whereever they are needed, repairing disrupted communications, restoring fast-start failover observers, and whatever else you have.

The other alternative is to simply Remove the existing failed Standby database (select, then [Remove] the existing standby database; then visit its host and delete the existing datafiles, redo, archive and tempfiles (you do not have to touch the TNSNAMES or Listener configurations – that will be re-created for you); then use [Add Standby Database] on the same screen to restore functionality (this is good as an if-all-else-fails, so this method of recovery).

Happy Guarding Data!