Tag Archives: OEM4APPS

Investigating an OEM 12c E-Business Suite Alert (AMS 12.1.0.x)

OEM 12 Home Page
OEM 12 Home Page

The Applications Management Suite plug-in for Oracle Enterprise Manager simplifies discovery of the myriad of subtargets that make up an Oracle E-Business Suite instance.  Correspondingly, the number of alerts sent out can rise dramatically because of the inter-relationships between the components.

For example, the outage of a single Apache process triggers all of the following associated targets also to flag as a service Down status:

INSTANCE-Oracle E-Business Suite
INSTANCE-Infrastructure INSTANCE_host-APPL_TOP Context
HTTP_Server

The number of downed targets increases if a subcomponent of a primary component (such as a single JVM thread under the OACore process) experiences an outage.

This is a simple walk-through of navigating one of the e-mail alerts to start figuring out what happened.

The e-mail alert looks like this:

From: OEM12 Burbank
Sent: Monday, December 07, 2015 8:30 AM
To: DBAs
Subject: EM Event: Fatal:INSTANCE-Oracle E-Business Suite – Target is down; 1 member is down: INSTANCE_EBS Availability System

Host=hostname
Target type=Oracle E-Business Suite
Target name=INSTANCE-Oracle E-Business Suite
Categories=Availability
Message=Target is down; 1 member is down: INSTANCE_EBS Availability System
Severity=Fatal
Event reported time=Dec 7, 2015 8:29:14 AM PST
Target Lifecycle Status=Production
Operating System=Linux
Platform=x86_64
Associated Incident Id=390885
Associated Incident Status=New
Associated Incident Owner=
Associated Incident Acknowledged By Owner=No
Associated Incident Priority=None
Associated Incident Escalation Level=0
Event Type=Target Availability
Event name=Status
Availability status=Down
Root Cause Analysis Status=Symptom
Rule Name=EBS Notifications,Rule_EBS_Notifications
(to get notified, you set up Rule Sets that tell OEM when and what to notify you about)

Rule Owner=DBA
Update Details:
Target is down; 1 member is down: INSTANCE_EBS Availability System
Incident created by rule (Name = Incident management rule set for all targets, Incident creation rule for a Target Down availability status [System generated rule]).


To investigate an event alert, click on the Associated Incident ID (e.g. the 390885 which on your system will be a URL taking you into OEM) which will take you to the associated Incident Summary page.

Click on Related Events to investigate what raised the event alert (there may be more than one cause):

ss1
OEM 12c AMS 12.1.0.4 – Incident Details

From the screen, it shows the red mark on PRODARMK-Infrastructure PRODARMK_ascopofinm01-APPL_TOP Context (Oracle E-Business Suite Node).

Click on that link in the list of Targets.

Navigate to Monitoring -> Status History:

ss2
OEM 12c AMS 12.1.0.4 – Navigation Target: Monitoring -> Status History

Change the Availability History view to All History (the related underlying event caused is displayed.)

ss3
OEM 12c AMS 12.1.0.4 – Target: Status History Details

If you click on the related Message (e.g. Target is down; 1 member is down: INSTANCE_hostname.auca.corp_oacore_JVM_…); you will then be shown the related Event page for that target:

ss4
OEM 12c AMS 12.1.0.4 – Target: Event Details

Click on the Related Events tab for this target, to confirm the service alert recorded:

ss5
OEM 12c AMS 12.1.0.4 – Target: Event Details -> Related Events Timeline

If this is a recurring issue, by sliding the timeline back and forth (and adjusting the period view to a larger sample) you can see if there are any associated time-related occurrences that can be used to identify root cause.

For the specific issue, login to the associated host, and view the output and error logs for the process itself to determine what triggered the alert (in this case, the JVM automatically restarted the OACore process that had run out of memory.)

R12 e-Business Suite and OEM Monitoring – Oracle Spins Freezes

Every so often, system load on an e-Business Suite instance ramps up and response time to users starts climbing, often resulting in user observed errors such as:

  • FRM-92100 Your connect to the server has been interrupted
  • FRM-92102 A network error has occurred

    FRM-92102 Forms Error R12 EBS
    That dreaded FRM-91201 / FRM-91200 error causing you to restart your session.

Or sometimes, the screen just freezes (aka spins, stops, is broken, stuck, motionless, looks like a screen saver,can’t do anything, won’t work, froze-up, etc.) and the person has to close their browser, or even shut-down their workstation and restart.

It's simply not doing anything - Nothing to see here, just move your cursor around. And wait... and wait.
It’s simply not doing anything – Nothing to see here, just move your cursor around. And wait… and wait.

Old technology often barks with unrelated error messages to the actual cause.  If there’s a lot going on with concurrent requests, or interfaces, or analytic extracts running, the front-end response-time slows down, sometimes sufficiently to trigger these kinds of Form errors, even though technically there was no interruption to the network connectivity, either between the hosts, nor the workstation and the middle-tier application server.

However, on the database, the user-experience can be seen, although not necessarily in the place you might expect.  OEM  had introduced it’s Adaptive Metric Thresholds technology back in OEM 11g (in a slightly different place than in 12c (in Oracle Management Server/OMS 12.1.0.4.0).  In OEM 11g, they were a link under the AWR Baseline Reports page.

OEM 11g AWR Baselines Page
See the Baseline Metric Thresholds link at the bottom.

In OEM 12c, you’ll find them under

Targets -> Database -> Peformance -> Adaptive Thresholds -> Baseline Metric Thresholds -> Edit Thresholds:

OEM 12c Baseline Metric Thresholds
Where those adaptive metric thresholds moved in 12c.

 

 

On this page and in the list of Baseline Metrics, when you click into them, you can access the trending statistics being gathered for each metric.  Many times this will provide direct insight into what a user experiences as the “the system is frozen” translates into “the back-end database response time is incredibly bad.”

OEM 12c Baseline Metric Response Time per Transaction vs. Baseline
See the spikes around 7AM and 11:30AM? Those are being associated with “System Froze” reports.

 

In the example here, the database experienced a dramatic slow-down in response almost 5 to 10 times slower than usual, which only lasted a few seconds. But that can be enough to show up in many users’ sessions who might have just kicked off a query, or were trying to save something.  Based upon the information gathered, we set the Warning and Critical thresholds to 1500ms and 2000ms respectively to start sending e-mail alert notifications upon breach of the levels. If the settings are left at “None”, no incident would be raised, and thus, no notification would be sent.

If you’re experiencing odd transient outages or sluggish behavior that defies the normal AWR and ADDM snapshot analysis, go take a look at what OEM has been gathering in the background over time and see if the statistics correlate to any of your issues.  There’s value in that data. Just mine it.

#OOW14 – @Oracle Enterprise Management for Applications SIG Mtg – 9/28 MW3003 3:30P-4:15P

http://em4apps.oaug.org/News.php

oow14_0928_web
OOW14 Sunday Schedule 28-SEP-2014

We’ll be at OpenWorld 2014 in San Francisco on User Group Sunday, September 28 (Session Time Mostone West – 3003; 15:30 – 16:15): SIG9177 – OAUG Enterprise Manager for Applications SIG Meeting

Click Here to Add to Outlook/iCalendar

LinkedIn Discussion Forum created by consensus of the attendees:

https://www.linkedin.com/groups/Oracle-Enterprise-Manager-Applications-OEM4APPS-6780481