Investigating an OEM 12c E-Business Suite Alert (AMS 12.1.0.x)

OEM 12 Home Page

The Applications Management Suite plug-in for Oracle Enterprise Manager simplifies discovery of the many subtargets that make up an Oracle E-Business Suite instance. The flip side is that the number of alerts sent out can rise dramatically, because the components are interrelated and one failure is reported against several targets.

For example, the outage of a single Apache process causes all of the following associated targets to report a Down status as well:

INSTANCE-Oracle E-Business Suite
INSTANCE-Infrastructure INSTANCE_host-APPL_TOP Context
HTTP_Server

The number of downed targets increases if a subcomponent of a primary component (such as a single JVM thread under the OACore process) experiences an outage.
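To see why one failure fans out into several alerts, here is a toy model in Python. It is only a sketch: the parent links in PARENT are my assumption about how the three targets above nest, and the function name is made up. It does not reflect how OEM computes availability internally.

```python
# Hypothetical parent links between the targets listed above.
# The real membership is whatever AMS discovery recorded on your system.
PARENT = {
    "HTTP_Server": "INSTANCE-Infrastructure INSTANCE_host-APPL_TOP Context",
    "INSTANCE-Infrastructure INSTANCE_host-APPL_TOP Context": "INSTANCE-Oracle E-Business Suite",
}


def targets_flagged_down(failed):
    """Return the failed target followed by every ancestor that would also show Down."""
    flagged = [failed]
    # Walk upwards until we reach a target that has no parent.
    while flagged[-1] in PARENT:
        flagged.append(PARENT[flagged[-1]])
    return flagged


for name in targets_flagged_down("HTTP_Server"):
    print(name)
```

Each extra level of nesting adds one more target to the list, which is why a failure deep inside a component produces more alerts than a failure near the top.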

This is a simple walk-through of navigating one of the e-mail alerts to start figuring out what happened.

The e-mail alert looks like this:

From: OEM12 Burbank
Sent: Monday, December 07, 2015 8:30 AM
To: DBAs
Subject: EM Event: Fatal:INSTANCE-Oracle E-Business Suite – Target is down; 1 member is down: INSTANCE_EBS Availability System

Host=hostname
Target type=Oracle E-Business Suite
Target name=INSTANCE-Oracle E-Business Suite
Categories=Availability
Message=Target is down; 1 member is down: INSTANCE_EBS Availability System
Severity=Fatal
Event reported time=Dec 7, 2015 8:29:14 AM PST
Target Lifecycle Status=Production
Operating System=Linux
Platform=x86_64
Associated Incident Id=390885
Associated Incident Status=New
Associated Incident Owner=
Associated Incident Acknowledged By Owner=No
Associated Incident Priority=None
Associated Incident Escalation Level=0
Event Type=Target Availability
Event name=Status
Availability status=Down
Root Cause Analysis Status=Symptom
Rule Name=EBS Notifications,Rule_EBS_Notifications
(Not part of the e-mail: to get notified, you set up Rule Sets that tell OEM when to notify you and what about.)

Rule Owner=DBA
Update Details:
Target is down; 1 member is down: INSTANCE_EBS Availability System
Incident created by rule (Name = Incident management rule set for all targets, Incident creation rule for a Target Down availability status [System generated rule]).
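If you receive a lot of these e-mails, it can help to pull the Key=Value fields out with a script. The following Python is a minimal sketch, assuming the body looks like the example above; parse_alert is a hypothetical helper name, and the sample is a shortened copy of the alert shown here.

```python
def parse_alert(body):
    """Collect the simple Key=Value lines of an alert body into a dict."""
    fields = {}
    for line in body.splitlines():
        key, sep, value = line.partition("=")
        # Keep only lines whose key is plain words, which skips free text
        # such as "Incident created by rule (Name = ...)".
        if sep and key.replace(" ", "").isalpha():
            fields[key.strip()] = value.strip()
    return fields


sample = """\
Host=hostname
Target type=Oracle E-Business Suite
Target name=INSTANCE-Oracle E-Business Suite
Message=Target is down; 1 member is down: INSTANCE_EBS Availability System
Severity=Fatal
Associated Incident Id=390885
Availability status=Down
Root Cause Analysis Status=Symptom
Update Details:
Target is down; 1 member is down: INSTANCE_EBS Availability System
"""

alert = parse_alert(sample)
print(alert["Associated Incident Id"], alert["Severity"], alert["Root Cause Analysis Status"])
```

The Root Cause Analysis Status field is worth checking first: a value of Symptom, as in this alert, suggests the target named in the subject is not where the problem started.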


To investigate an event alert, click on the Associated Incident Id in the e-mail (390885 in this example; on your system it will be a link into OEM). This takes you to the associated Incident Summary page.

Click on Related Events to investigate what raised the event alert (there may be more than one cause):

OEM 12c AMS 12.1.0.4 – Incident Details

The screen shows a red mark against PRODARMK-Infrastructure PRODARMK_ascopofinm01-APPL_TOP Context (Oracle E-Business Suite Node).

Click on that link in the list of Targets.

Navigate to Monitoring -> Status History:

OEM 12c AMS 12.1.0.4 – Navigation Target: Monitoring -> Status History

Change the Availability History view to All History; the related underlying event is then displayed.

OEM 12c AMS 12.1.0.4 – Target: Status History Details

If you click on the related Message (e.g. Target is down; 1 member is down: INSTANCE_hostname.auca.corp_oacore_JVM_…), you will be shown the related Event page for that target:

OEM 12c AMS 12.1.0.4 – Target: Event Details

Click on the Related Events tab for this target to confirm the service alert that was recorded:

OEM 12c AMS 12.1.0.4 – Target: Event Details -> Related Events Timeline

If this is a recurring issue, slide the timeline back and forth (and widen the period view to a larger sample) to see whether the occurrences follow a time-related pattern that can help identify the root cause.
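The same check for a time-related pattern can be done outside the console if you have a list of event times. This Python sketch counts events per hour of day. It assumes the timestamps use the "Event reported time" format from the e-mail and an English locale for the month name; hour_histogram is a hypothetical helper, and only the first timestamp below comes from the alert in this post.

```python
from collections import Counter
from datetime import datetime


def hour_histogram(stamps):
    """Count events per hour of day, ignoring the trailing time zone label."""
    hours = Counter()
    for stamp in stamps:
        # Drop the last word ("PST"); strptime does not parse such labels reliably.
        text = stamp.rsplit(" ", 1)[0]
        when = datetime.strptime(text, "%b %d, %Y %I:%M:%S %p")
        hours[when.hour] += 1
    return hours


# Only the first timestamp is real; the other two are made up for illustration.
events = [
    "Dec 7, 2015 8:29:14 AM PST",
    "Dec 8, 2015 8:31:02 AM PST",
    "Dec 9, 2015 2:15:40 PM PST",
]

for hour, count in hour_histogram(events).most_common():
    print(f"{hour:02d}:00  {count} event(s)")
```

A cluster in one hour would point at something scheduled, such as a batch job or a daily load peak, as a place to look next.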

For the specific issue, log in to the associated host and view the output and error logs for the process itself to determine what triggered the alert (in this case, the OACore JVM process had run out of memory and was restarted automatically).
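When the logs are long, a quick scan for the usual out-of-memory marker saves some scrolling. The Python below is a sketch only: find_oom_lines is a hypothetical helper, the log excerpt is made up, and the location of the real log files depends on your installation, so no path is assumed here.

```python
def find_oom_lines(lines):
    """Return (line_number, text) for each line mentioning an OutOfMemoryError."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(lines, start=1)
        if "java.lang.OutOfMemoryError" in line
    ]


# Made-up log excerpt for illustration; real log content will differ.
sample_log = [
    "request served\n",
    'Exception in thread "main" java.lang.OutOfMemoryError: Java heap space\n',
    "process restarted\n",
]

for number, text in find_oom_lines(sample_log):
    print(number, text)
```

To run it against a real file, open the file and pass the file object in place of sample_log, since the function accepts any iterable of lines.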
