Table Of Contents
Understanding Common Service Faults
RCA Correlation Tree
Common Failures
UC1 - VMware ESXi Host Failure - CUCM
UC2 - VMware ESXi Host Failure - CUCxn
UC3 - UCS Blade Failure - CUCM
UC4 - UCS Blade Failure - CUCxn
UC5 - Application Cold Failure - CUCM
UC6 - Application Cold Failure - CUCxn
UC7 - Changing the Number of Registered Gateways and Media Devices
UC8 - TFTP Server for UC Services - Critical Processes Failure
UC9 - Detecting and Correlating Customer Voice Quality Degradation
UC11 - VMware VM Failure - CUCM
UC12 - CUCM Clustering Problems
UC13 - Change in Number of Registered Phones
UC15 - CUCxn Critical Process Failure
UC16 - VMware VM Failure - CUCxn
UC17 - CUCxn Clustering Problems
UC18 - CUCM Critical Process Failure
UC19 - UCS Chassis Failure - CUCM
UC20 - UCS Chassis Failure - CUCxn
UC21 - Insufficient Virtual Memory
UC22 - CPU Utilization Problems
UC23 - Call Throttling Failures (Code Red)
UC24 - Call Throttling Failures (Code Yellow)
UC25 - Route List Exhausted
UC26 - Media List Exhausted
UC27 - High Resource Utilization by all Customer Sites
UC28 - Memory, CPU, Disk Threshold Exceeded - CUCxn
UC29 - Low Number Of Available Licenses - CUCxn
UC30 - VM Resources - Memory
UC31 - VM Resources - CPU
UC32 - VM Resources - Disk usage
UC33 - VM Resources - CPU ready time
UC34 - VM Resources - Disk latency
UC35 - ASR1K - Chassis Failure
UC36 - ASR1K - Power Supply/Fan Failure
UC37 - ASR1K - RP/ES/SPA Failure
UC38 - SIP Trunk from Leaf to CUBE-SP - Loss of SIP Trunk
UC39 - CUBE-SP Adjacency Status
UC40 - Voice Quality Degradation
UC41 - CUBE - SP Security Violation
UC42 - CUBE-SP Resource Performance Degradation
UC43 - CUBE-SP SLA Violation
CP1 - CUCMIP Critical Processes Failure
CP2 - Application Cold Failure - CUCMIP
CP3 - VMware VM Failure - CUCMIP
CP4 - CUCMIP VMware ESXi Host Failure
CP5 - CUCMIP UCS Blade Failure
CP6 - CUCMIP UCS Chassis Failure
CP7 - Application Resources Degradation - CUCMIP
CP11 - IM Resources Exceeded - CUCMIP
Understanding Common Service Faults
The Service Impact Analysis (SIA) function is performed by Service Models defined within the Prime Central for HCS SV server. The SV server, however, relies on an event enrichment function that is performed primarily by the CE server in conjunction with the HCM-Fulfillment SDR. The following terms are used in this chapter:
• Service—Voice/VoiceMail/Presence are the types of services deployed by HCS customers.
• Impact—The service state can be Up, Down, or Marginal. A Marginal state indicates that the service is at risk (for example, a CUCM node goes down and phones register with the standby node, so the service remains up but is considered at risk). The Prime Central for HCS SIA takes the following into consideration:
– The level of application-level redundancy deployed by the customer and the fault location (for example, customer-specific components as opposed to common components).
– The initial service state. For example, the first VM failure could change the service state from Up to Marginal. A subsequent VM failure for the same customer could change the service state from Marginal to Down if the initial problem is not addressed.
– Service dependency. For example, VoiceMail and Presence services rely on the Voice service provided by CUCM, so if the Voice service goes Marginal or Down, it has the potential to affect the VoiceMail and Presence services as well.
• Scope—Scope defines the level of impact of a single fault, which could affect a single device user, a location, a customer, multiple customers using the same blade or chassis, or the Data Center. Prime Central for HCS SIA is currently limited to the customer level. Support for finer granularity, such as Location, User, and Device level, is under consideration for future releases.
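The sketch below illustrates, in simplified form, how these three considerations can combine to compute a service state. It is a minimal illustration only; the function names and the two-state redundancy model are assumptions for this example and do not represent the Prime Central for HCS implementation.

```python
# Minimal sketch (not the product implementation) of how redundancy,
# the current state, and service dependency could drive the SIA state.

UP, MARGINAL, DOWN = "Up", "Marginal", "Down"

def evaluate_service_state(current_state, node_failed, redundancy_available):
    """Return the new state of a single service after a node failure."""
    if not node_failed:
        return current_state
    if redundancy_available:
        # The service keeps running on the standby node but is now at risk.
        return MARGINAL if current_state == UP else DOWN
    return DOWN

def propagate_dependency(voice_state, dependent_state):
    """VoiceMail/Presence depend on Voice: they cannot be healthier than Voice."""
    order = {UP: 0, MARGINAL: 1, DOWN: 2}
    return voice_state if order[voice_state] > order[dependent_state] else dependent_state

# Example: the first CUCM VM failure with redundancy leaves Voice Marginal,
# and VoiceMail is pulled down to Marginal through the dependency.
voice = evaluate_service_state(UP, node_failed=True, redundancy_available=True)
voicemail = propagate_dependency(voice, UP)
print(voice, voicemail)  # Marginal Marginal
```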
Table A-1 lists a few typical faults with associated impact and scope.
Table A-1 Typical Faults in HCS System
Type of fault | Services affected | Impact | Scope | Notes
Process failure | Related service hosted by the VM with the process failure | Marginal/Down | Customer | Deploy application-level redundancy
VM failure | Related service hosted by the affected VM | Marginal/Down | Customer | Deploy application-level redundancy/VMware HA to minimize impact
Host failure | Related services hosted by all VMs deployed on the host | Marginal/Down | Multiple customers | Deploy VMware HA/distribute applications for a given customer across different hosts to minimize impact
Blade failure | Related services hosted by all VMs deployed on the blade | Marginal/Down | Multiple customers | Deploy VMware HA/distribute applications for a given customer across different blades to minimize impact
Chassis failure | Related services hosted by all VMs deployed on the chassis | Marginal/Down | Multiple customers | Deploy a VMware HA cluster across multiple chassis to minimize impact
CPE router failure | All services at the location | Marginal/Down | Customer Location (in a future release) | Deploy SRST/redundant connectivity options; SIA not currently supported by Prime Central for HCS
WAN connectivity failure | All services at the location | Marginal/Down | Customer Location (in a future release) | Deploy SRST/redundant connectivity options; SIA not currently supported by Prime Central for HCS
CUBE failure | Offnet services | Marginal/Down | Multiple customers | Deploy CUBE HA; SIA support is available in Prime Central for HCS 9.2.1
RCA Correlation Tree
This section explains the generic RCA correlation tree that applies to the failure scenarios listed in this appendix. For example, when a UCS Chassis failure occurs, the UCS Chassis failure event is marked as the root cause. UCS Blade failure events correlate to the UCS Chassis failure event, ESXi host failure events correlate to UCS Blade failure events, and so on.
Note that it takes a few minutes for the correlation tree to converge. This is because the correlation tree is computed and updated as events arrive. For example, if the VM failure event is seen first, before the ESXi host failure events, then the VM failure events are first marked as a root cause. When the ESXi host events are seen later, the ESXi host events are marked as root causes and the VM failure events are remarked as symptoms.
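As an illustration of this convergence behavior, the sketch below keeps a simple layered relationship between failure events and re-marks earlier events as symptoms when a higher-level cause arrives. The layer ordering, class, and method names are assumptions for illustration only, not the product's correlation algorithm.

```python
# Illustrative sketch only: re-rooting a correlation tree as events arrive.
# Layer order (highest-level cause first) is assumed from the example above.
LAYERS = ["UCS_Chassis", "UCS_Blade", "ESXi_Host", "VM"]

class CorrelationTree:
    def __init__(self):
        self.events = {}  # layer -> role ("root cause" or "symptom")

    def add_event(self, layer):
        self.events[layer] = "root cause"
        # The highest layer currently present stays the root cause;
        # every layer below it is re-marked as a symptom.
        present = [l for l in LAYERS if l in self.events]
        for l in present[1:]:
            self.events[l] = "symptom"

tree = CorrelationTree()
tree.add_event("VM")         # VM failure seen first -> marked as root cause
tree.add_event("ESXi_Host")  # host failure arrives -> VM re-marked as symptom
print(tree.events)  # {'VM': 'symptom', 'ESXi_Host': 'root cause'}
```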
Common Failures
This section documents the use cases (UC) of events observed in Prime Central for HCS during common service faults.
There are many ways to trigger the same event. The exact events that Prime Central for HCS receives may vary depending on how the fault is triggered and on the environment. The examples below illustrate how to use Prime Central for HCS to identify root cause and/or service impact events for the specific faults triggered. This section contains the following topics:
• UC1 - VMware ESXi Host Failure - CUCM
• UC2 - VMware ESXi Host Failure - CUCxn
• UC3 - UCS Blade Failure - CUCM
• UC4 - UCS Blade Failure - CUCxn
• UC5 - Application Cold Failure - CUCM
• UC6 - Application Cold Failure - CUCxn
• UC7 - Changing the Number of Registered Gateways and Media Devices
• UC8 - TFTP Server for UC Services - Critical Processes Failure
• UC9 - Detecting and Correlating Customer Voice Quality Degradation
• UC11 - VMware VM Failure - CUCM
• UC12 - CUCM Clustering Problems
• UC13 - Change in Number of Registered Phones
• UC15 - CUCxn Critical Process Failure
• UC16 - VMware VM Failure - CUCxn
• UC17 - CUCxn Clustering Problems
• UC18 - CUCM Critical Process Failure
• UC19 - UCS Chassis Failure - CUCM
• UC20 - UCS Chassis Failure - CUCxn
• UC21 - Insufficient Virtual Memory
• UC22 - CPU Utilization Problems
• UC23 - Call Throttling Failures (Code Red)
• UC24 - Call Throttling Failures (Code Yellow)
• UC25 - Route List Exhausted
• UC26 - Media List Exhausted
• UC27 - High Resource Utilization by all Customer Sites
• UC28 - Memory, CPU, Disk Threshold Exceeded - CUCxn
• UC29 - Low Number Of Available Licenses - CUCxn
• UC30 - VM Resources - Memory
• UC31 - VM Resources - CPU
• UC32 - VM Resources - Disk usage
• UC33 - VM Resources - CPU ready time
• UC34 - VM Resources - Disk latency
• UC35 - ASR1K - Chassis Failure
• UC36 - ASR1K - Power Supply/Fan Failure
• UC37 - ASR1K - RP/ES/SPA Failure
• UC38 - SIP Trunk from Leaf to CUBE-SP - Loss of SIP Trunk
• UC39 - CUBE-SP Adjacency Status
• UC40 - Voice Quality Degradation
• UC41 - CUBE - SP Security Violation
• UC42 - CUBE-SP Resource Performance Degradation
• UC43 - CUBE-SP SLA Violation
• CP1 - CUCMIP Critical Processes Failure
• CP2 - Application Cold Failure - CUCMIP
• CP3 - VMware VM Failure - CUCMIP
• CP4 - CUCMIP VMware ESXi Host Failure
• CP5 - CUCMIP UCS Blade Failure
• CP6 - CUCMIP UCS Chassis Failure
• CP7 - Application Resources Degradation - CUCMIP
• CP11 - IM Resources Exceeded - CUCMIP
UC1 - VMware ESXi Host Failure - CUCM
This use case describes the events that Prime Central for HCS receives if the VMware ESXi Host fails. This type of failure generates both Root Cause (RC) and Service Impact (SI) events. The CUCM VM is automatically brought up in another host if HA is enabled on the cluster. If HA is not configured for the cluster, CUCM nodes stay down until the ESXi host is recovered.
Observed RC-EL Events
When the ESXi host shuts down, many synthetic RCA events are observed, including VC_Host_Avlblty, VC_VM_Avlblty, UCS_BladeLinks, OM_CUCM_Redundancy, and OM_CUCM_Registration. Eventually, there is only one primary synthetic RCA event, VC_Host_Avlblty, along with the following two events:
• OM_CUCM_Registration, which is triggered when the VM moves to the new ESXi host.
• UCS_BladeLinks, which is a sibling event of VC_Host_Avlblty in the correlation tree.
Table A-2 Observed Root Cause Events for UC1
Severity | EventTypeID | Summary
Critical | VC_Host_Avlblty | Synthetic Event for VC_Host_Avlblty group events from 10.11.3.152
Warning | OM_CUCM_Registration | Synthetic Event for OM_CUCM_Registration group events from CUCM-CL-C071-1
Major | UCS_BladeLinks | Synthetic Event for UCS_BladeLinks group events from 10.11.2.8
Observed SI-EL Events
CUCM voice service impacts voice mail and presence.
Table A-3 Observed Service Events for UC1
Severity | Summary
Minor | Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C071-1 is Marginal
Minor | Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C071-1 is Marginal
Minor | Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C071-1 is Marginal
Observed Other-EL Events
Prime Central for HCS does not analyze these events, but they could point to potential root causes for the impacted services.
Table A-4 Observed Other Events for UC 1
Severity (S)/Customer (C)/Node (N) | EventName (EN)/EventTypeId (ET) | Summary
S = Warning, C = C071, N = CUCM-71-pub | EN = KVM_VM_RestartOnAlt_Host_Cisco, ET = VC_VM_Restored | Virtual machine CUCM-71-pub was restarted on 10.11.3.148 since 10.11.3.152 failed. Message: KVM_VM_RestartOnAlt_Host_Cisco[(Event_Type=N"VmRestartedOnAlternateHostEvent" ON VM:cisco-10.11.3.148:ESX ON 30522010 (Event_Type=VmRestartedOnAlternateHostEvent)]
S = Major, N = 10.11.2.8 | — | Link Down (vethernet1060)
S = Major, N = 10.11.2.8 | — | Network Interface (ifIndex = 486548517) Down, should be Up (ifEntry.486548517)
S = Major, N = 10.11.2.9 | — | Link Down (vethernet1059)
S = Major, N = 10.11.2.8 | — | Link Down (vethernet9254)
S = Major, N = 10.11.2.8 | — | Network Interface (ifIndex = 486540323) Down, should be Up (ifEntry.486540323)
S = Major, N = 10.11.2.8 | — | Network Interface (ifIndex = 503317541) Administratively Down (ifEntry.503317541)
S = Major, N = 10.11.2.9 | — | Link Down (vethernet9253)
S = Minor, N = 10.11.2.9 | — | Network Interface (ifIndex = 503317540) Administratively Down (ifEntry.503317540)
S = Major, N = 10.11.2.9 | — | Network Interface (ifIndex = 486540322) Down, should be Up (ifEntry.486540322)
S = Major, N = 10.11.2.9 | — | Network Interface (ifIndex = 486548516) Down, should be Up (ifEntry.486548516)
S = Major, N = 10.11.2.8 | — | Link Down (Ethernet5/1/2)
S = Major, N = 10.11.2.9 | — | Network Interface (ifIndex = 520355904) Down, should be Up (ifEntry.520355904)
S = Major, N = 10.11.2.9 | — | Link Down (Ethernet5/1/2)
S = Major, N = 10.11.2.8 | — | Network Interface (ifIndex = 520355904) Down, should be Up (ifEntry.520355904)
Service Tree Event Overlay Location and Content
SIA events are overlaid on the Service Tree in the Service Availability view.
Table A-5 Observed Service Tree Events for UC1
Location | Summary
... -> Voice Service | Meta event for Voice Service - C071
... -> Cluster_Availability --> Internode_Trunks | SDL Link Out Of Service::Component= 192.6.4.124-192.6.4.123; Local Application ID= CCM; Remote Node ID= 1; Unique Link ID= 2:100:1:100; Remote Application IP Address= 192.6.4.123; Local Node ID= 2; Remote Application ID= CCM; Default Event Name= SDL Link Out Of Service; DescriptionURL= < >;
... -> CUCM-71-pub | PerformancePollingStopped::Component= cucm-71-pub.customer.com; Error Message String= 27-Jun-2012 16:23:59 EDT,cucm-71-pub.customer.com,192.6.4.123,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < >;
... -> CUCM-71-pub | DeviceRestarted::Component= cucm-71-pub.customer.com; Default Event Name= DeviceRestarted; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=DeviceRestarted >;
... -> CUCM-71-sub | ServiceDown::Component= VS-cucm-71-sub.customer.com/Cisco DRF Local; ProductName= Cisco DRF Local; CurrentState= Stopped; Default Event Name= ServiceDown; DescriptionURL= < >;
... -> CUCM-71-pub --> VM Resources | The virtual machine guest memory usage is high on CUCM-71-pub. Message: KVM_VM_Guest_Memory_Util_High[(Guest_Util>40) ON VM:cisco-10.11.3.148:ESX ON CUCM-71-pub (Guest_Util=75)]
... -> Call Control --> Registration | Number Of Registered MediaDevices Increased::Component= VE-CUCM-CL-C071-1-RTMTSyslog-Id#1340828781555; Detail= Number of registered Media Devices increased in consecutive polls. Current monitored precanned object has increased by 3 The alert is generated on Wed Jun 27 16:26:22 EDT 2012 on cluster CUCM-CL-C071-1.] ClusterID=: RTMT Alert; Default Event Name= Number Of Registered MediaDevices Increased; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=NumberOfRegisteredMediaDevicesIncreased >;
Next Steps
Step 1 The VM on the host is automatically brought up in another host through HA.
Step 2 The original host is brought back using the following steps:
a. Select UCS Manager > Service Profiles > root > 862-10-c5b2 > Boot Server.
b. Troubleshoot and resolve the ESXi host issue.
c. Drag and drop CUCM-71-pub from the host that it moved to back to the ESXi host (10.11.3.152).
d. Clear any alarms on the CUCM VM.
UC2 - VMware ESXi Host Failure - CUCxn
This use case describes the events that Prime Central for HCS receives if the VMware ESXi host fails. This type of incident generates both Root Cause (RC) and Service Impact (SI) events. The CUCxn VM is automatically brought up in another host if HA is enabled on the cluster. If HA is not configured for the cluster, CUCxn nodes stay down until the ESXi Host is recovered.
Observed RC-EL Events
When the ESXi host shuts down, numerous synthetic RCA events are observed, including VC_Host_Avlblty, VC_VM_Avlblty, UCS_BladeLinks, and OM_CUCxn_OM_Connectivity. Eventually, there is only one primary synthetic RCA event of VC_Host_Avlblty.
Table A-6 Observed Root Cause Events for UC2
Severity | EventTypeID | Summary
Critical | VC_Host_Avlblty | Synthetic Event for VC_Host_Avlblty group events from 10.11.3.152
Major | UCS_BladeLinks | Synthetic Event for UCS_BladeLinks group events from 10.11.2.8
Observed SI-EL Events
CUCxn voice mail service is impacted.
Table A-7 Observed Service Events for UC2
Severity | Summary
Minor | Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C071-1 is Marginal.
Observed Other-EL Events
Prime Central for HCS does not analyze these events, but they could point to potential root causes for impacted services.
Note
10.11.2.8 and 10.11.2.9 are the IP addresses of UCS6140 side A and UCS6140 side B, respectively.
Table A-8 Observed Other Events for UC2
Severity (S)/Customer (C)/Node (N) | EventName (EN)/EventTypeId (ET) | Summary
S = Warning, C = C071, N = CUCxn-71-pub | EN = KVM_VM_RestartOnAlt_Host_Cisco, ET = VC_VM_Restored | Virtual machine CUCxn-71-pub was restarted on 10.11.3.141 since 10.11.3.152 failed. Message: KVM_VM_RestartOnAlt_Host_Cisco[(Event_Type=N"VmRestartedOnAlternateHostEvent" ON VM:cisco-10.11.3.141:ESX ON 31536476 (Event_Type=VmRestartedOnAlternateHostEvent)]
S = Major, N = 10.11.2.8 | — | Link Down (vethernet1060)
S = Major, N = 10.11.2.8 | — | Network Interface (ifIndex = 486540323) Down, should be Up (ifEntry.486540323)
S = Major, N = 10.11.2.9 | — | Link Down (vethernet1059)
S = Major, N = 10.11.2.8 | — | Link Down (vethernet9254)
S = Major, N = 10.11.2.8 | — | Network Interface (ifIndex = 486540323) Down, should be Up (ifEntry.486540323)
S = Major, N = 10.11.2.8 | — | Network Interface (ifIndex = 503317541) Administratively Down (ifEntry.503317541)
S = Major, N = 10.11.2.9 | — | Link Down (vethernet9253)
S = Major, N = 10.11.2.9 | — | Network Interface (ifIndex = 503317540) Administratively Down (ifEntry.503317540)
S = Major, N = 10.11.2.9 | — | Network Interface (ifIndex = 486540322) Down, should be Up (ifEntry.486540322)
S = Major, N = 10.11.2.9 | — | Network Interface (ifIndex = 486548516) Down, should be Up (ifEntry.486548516)
S = Major, N = 10.11.2.9 | — | Link Down (Ethernet5/1/2)
S = Major, N = 10.11.2.9 | — | Network Interface (ifIndex = 520355904) Down, should be Up (ifEntry.520355904)
S = Major, N = 10.11.2.9 | — | Link Down (Ethernet5/1/2)
S = Major, N = 10.11.2.9 | — | Network Interface (ifIndex = 520355904) Down, should be Up (ifEntry.520355904)
Service Tree Event Overlay Location and Content
SIA events are overlaid on the Service Tree in the Service Availability view.
Table A-9 Observed Service Tree Events for UC2
Location | Summary
... -> CUCxn-71-pub | PerformancePollingStopped::Component= cucxn-71-pub.customer.com; Error Message String= 06-Jul-2012 10:20:43 EDT,cucxn-71-pub.customer.com,192.6.4.125,Cannot collect data. The device returned no data from a required MIB.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped >;
Next Steps
Step 1 The VM on the host is automatically brought up in another host through HA.
Step 2 The original host is brought back via the following steps:
a. Select UCS Manager > Service Profiles > root > 862-10-c5b2 > Boot Server.
b. Troubleshoot and resolve the ESXi host issue.
c. Drag and drop CUCxn-71-pub from the host that it moved to back to the ESXi host (10.11.3.152).
d. Clear any alarms on the CUCxn VM.
UC3 - UCS Blade Failure - CUCM
This fault generates both Root Cause (RC) and Service Impact (SI) events. The CUCM VM is automatically brought up in another host if HA is enabled on the cluster. If HA is not configured for the cluster, CUCM nodes stay down until the UCS blade is replaced.
Observed RC-EL Events
When the UCS blade fails, numerous synthetic RCA events are observed, including VC_Host_Avlblty, VC_VM_Avlblty, UCS_Blade_Avlblty, UCS_BladeLinks, and OM_CUCM_OM_Connectivity. Eventually, there is only one primary synthetic RCA event of UCS_Blade_Avlblty.
Table A-10 Observed Root Cause Events for UC3
Severity | EventTypeID | Summary
Critical | UCS_Blade_Avlblty | Synthetic Event for UCS_Blade_Avlblty group events from 10.11.2.10
Observed SI-EL Events
CUCM voice service impacts voice mail and presence.
Table A-11 Service Events observed for UC3 - UCS Blade Failure - CUCM
Severity | Summary
Minor | Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C071-1 is Marginal.
Minor | Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C071-1 is Marginal.
Minor | Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C071-1 is Marginal.
Observed Other-EL Events
Prime Central for HCS does not currently analyze these events, but they could point to potential root causes for the impacted services.
Note
10.11.2.8, 10.11.2.9, and 10.11.2.10 are the IP addresses of UCS6140 side A, UCS6140 side B, and UCSM, respectively.
Table A-12 Other Events Observed for UC3 - UCS Blade Failure - CUCM
Severity (S)/Customer (C)/Node (N) | EventName (EN)/EventTypeId (ET) | Summary
S = Warning, C = C071, N = CUCM-71-pub | EN = KVM_VM_RestartOnAlt_Host_Cisco, ET = VC_VM_Restored | Virtual machine CUCM-71-pub was restarted on 10.11.3.141 since 10.11.3.152 failed. Message: KVM_VM_RestartOnAlt_Host_Cisco[(Event_Type=N"VmRestartedOnAlternateHostEvent" ON VM:cisco-10.11.3.141:ESX ON 31551556 (Event_Type=VmRestartedOnAlternateHostEvent)]
S = Indeterminate, N = 10.11.2.10 | EN = fltEquipmentFanPerfThresholdNonCritical, ET = default | Fan 2 in Fan Module 3/1-1 speed: upper-noncritical(FaultCode:fltEquipmentFanPerfThresholdNonCritical
S = Major, N = 10.11.2.8 | — | Link Down (vethernet1060)
S = Major, N = 10.11.2.8 | — | Link Down (vethernet1059)
S = Major, N = 10.11.2.8 | — | Network Interface (ifIndex = 486540323) Down, should be Up (ifEntry.486540323)
S = Major, N = 10.11.2.8 | — | Network Interface (ifIndex = 486548517) Down, should be Up (ifEntry.486548517)
S = Major, N = 10.11.2.9 | — | Network Interface (ifIndex = 486548516) Down, should be Up (ifEntry.486548516)
S = Major, N = 10.11.2.8 | — | Network Interface (ifIndex = 503317541) Administratively Down (ifEntry.503317541)
S = Major, N = 10.11.2.9 | — | Network Interface (ifIndex = 503317540) Administratively Down (ifEntry.503317540)
S = Major, N = 10.11.2.9 | — | Network Interface (ifIndex = 486540322) Down, should be Up (ifEntry.486540322)
S = Major, N = 10.11.2.9 | — | Link Down (vethernet9253)
S = Major, N = 10.11.2.8 | — | Link Down (vethernet9254)
S = Major, N = 10.11.2.9 | — | Network Interface (ifIndex = 520355904) Down, should be Up (ifEntry.520355904)
S = Major, N = 10.11.2.9 | — | Link Down (Ethernet5/1/2)
S = Major, N = 10.11.2.8 | — | Network Interface (ifIndex = 520355904) Down, should be Up (ifEntry.520355904)
S = Major, N = 10.11.2.8 | — | Link Down (Ethernet5/1/2)
Service Tree Event Overlay Location and Content
SIA events are overlaid on the Service Tree in the Service Availability view.
Table A-13 Service Tree events Observed for UC3 - UCS Blade Failure - CUCM
Location | Summary
... -> CUCM-71-pub | The virtual machine CUCM-71-pub running on host 10.11.3.152 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"ON VM:cisco-10.11.3.152:ESX ON 31551544 (Event_Type=VmDisconnectedEvent)]
CUCxn-CL-C071-1 -> Voice Service | Meta event for Voice Service - C071
... -> CUCM-71-pub | PerformancePollingStopped::Component= cucm-71-pub.customer.com; Error Message String= 06-Jul-2012 14:47:59 EDT,cucm-71-pub.customer.com,192.6.4.123,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped >;
... -> Cluster_Availability --> Internode_Trunks | SDL Link Out Of Service::Component= 192.6.4.124-192.6.4.123; Local Application ID= CCM; Remote Node ID= 1; Unique Link ID= 2:100:1:100; Remote Application IP Address= 192.6.4.123; Local Node ID= 2; Remote Application ID= CCM; Default Event Name= SDL Link Out Of Service; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=SDLLinkOutOfService >;
Next Steps
Step 1 The VM on the host is automatically brought up in another host via HA.
Step 2 The original host is brought back via the following steps:
a. Troubleshoot and resolve the blade issue.
b. Select ESXi Host (10.11.3.152) > Reconfigure for VMware HA.
c. Drag and drop CUCM-71-pub from the host that it moved to back to the ESXi host (10.11.3.152).
d. Clear any alarms on the CUCM VM.
UC4 - UCS Blade Failure - CUCxn
This use case describes the events that Prime Central for HCS receives if a UCS blade fails. This type of incident generates both Root Cause (RC) and Service Impact (SI) events. The CUCxn VM is automatically brought up in another host if HA is enabled on the cluster. If HA is not configured for the cluster, CUCxn nodes stay down until the UCS blade is replaced.
Observed RC-EL Events
When the UCS blade fails, numerous synthetic RCA events are observed, including VC_Host_Avlblty, VC_VM_Avlblty, UCS_Blade_Avlblty, UCS_BladeLinks, and OM_CUCxn_OM_Connectivity. Eventually, there is only one primary synthetic RCA event of UCS_Blade_Avlblty.
Table A-14 Observed Root Cause Events for UC4
Severity | EventTypeID | Summary
Critical | UCS_Blade_Avlblty | Synthetic Event for UCS_Blade_Avlblty group events from 10.11.2.10
Observed SI-EL Events
CUCxn voice mail service is impacted.
Table A-15 Observed Service Events for UC4
Severity | Summary
Critical | Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C071-1 is Marginal. (Flapping Threshold Exceeded: 5 >= 5, over the last 300 s (2012-07-06 13:56:16.000 -> 2012-07-06 14:01:16.000))
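The severity of this event reflects a flapping check: the service state changed at least a threshold number of times within a time window (5 changes within 300 seconds in the event above). The sketch below shows that kind of check in simplified form; the function name and window handling are illustrative assumptions, not the product's logic.

```python
# Illustrative sketch only: detect flapping as N or more state changes
# within a sliding time window (for example, 5 changes within 300 seconds).
def is_flapping(change_timestamps, now, window_s=300, threshold=5):
    """change_timestamps: times (in seconds) at which the service state changed."""
    recent = [t for t in change_timestamps if now - t <= window_s]
    return len(recent) >= threshold

# Five state changes in the last five minutes trips the threshold (5 >= 5).
changes = [10, 70, 130, 190, 250]
print(is_flapping(changes, now=300))  # True
```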
Observed Other-EL Events
Prime Central for HCS does not analyze these events, but they could point to potential root causes for the impacted services.
Note
10.11.2.8, 10.11.2.9, and 10.11.2.10 are the IP addresses of UCS6140 side A, UCS6140 side B, and UCSM, respectively.
Table A-16 Observed Other Events for UC4
Severity (S)/Customer (C)/Node (N) | EventName (EN)/EventTypeId (ET) | Summary
S = Warning, C = C071, N = CUCM-71-pub | EN = KVM_VM_RestartOnAlt_Host_Cisco, ET = VC_VM_Restored | Virtual machine CUCxn-71-pub was restarted on 10.11.3.141 since 10.11.3.152 failed. Message: KVM_VM_RestartOnAlt_Host_Cisco[(Event_Type=N"VmRestartedOnAlternateHostEvent" ON VM:cisco-10.11.3.141:ESX ON 31548751 (Event_Type=VmRestartedOnAlternateHostEvent)]
S = Indeterminate, N = 10.11.2.10 | EN = fltAdaptorUnitAdaptorReachability, ET = default | Adapter 5/2/1 is unreachable (FaultCode:fltAdaptorUnitAdaptorReachability,FaultIndex)
S = Major, N = 10.11.2.8 | — | Network Interface (ifIndex = 503317541) Administratively Down (ifEntry.503317541)
S = Major, N = 10.11.2.8 | — | Network Interface (ifIndex = 486548517) Down, should be Up (ifEntry.486548517)
S = Major, N = 10.11.2.8 | — | Link Down (vethernet9254)
S = Major, N = 10.11.2.8 | — | Network Interface (ifIndex = 486540323) Down, should be Up (ifEntry.486540323)
S = Major, N = 10.11.2.9 | — | Network Interface (ifIndex = 486540322) Down, should be Up (ifEntry.486540322)
S = Major, N = 10.11.2.9 | — | Link Down (vethernet1059)
S = Major, N = 10.11.2.8 | — | Link Down (vethernet1060)
S = Major, N = 10.11.2.9 | — | Link Down (vethernet9253)
S = Major, N = 10.11.2.8 | — | Network Interface (ifIndex = 520355904) Down, should be Up (ifEntry.520355904)
S = Major, N = 10.11.2.8 | — | Link Down (Ethernet5/1/2)
S = Major, N = 10.11.2.9 | — | Network Interface (ifIndex = 503317540) Administratively Down (ifEntry.503317540)
S = Major, N = 10.11.2.9 | — | Network Interface (ifIndex = 486548516) Down, should be Up (ifEntry.486548516)
S = Major, N = 10.11.2.8 | — | Network Interface (ifIndex = 520355904) Down, should be Up (ifEntry.520355904)
S = Major, N = 10.11.2.8 | — | Link Down (Ethernet5/1/2)
Service Tree Event Overlay Location and Content
SIA events are overlaid on the Service Tree in the Service Availability view.
Table A-17 Observed Service Tree events for UC4
Location | Summary
... -> CUCxn-71-pub | PerformancePollingStopped::Component= cucxn-71-pub.customer.com; Error Message String= 06-Jul-2012 13:59:59 EDT,cucxn-71-pub.customer.com,192.6.4.125,Cannot collect data. The device returned no data from a required MIB.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped >;
... -> CUCxn-71-pub | Unresponsive::Component= cucxn-71-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 1 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 4096 MB Memory: Software:UCOS 5.0.0.0-2; DiscoveredFirstAt= 06-22-2012 11:32:41; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.125; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 07-05-2012 18:07:05; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=Unresponsive >;
Next Steps
Step 1 The VM on the host is automatically brought up in another host via HA.
Step 2 The original host is brought back via the following steps:
a. Troubleshoot and resolve the blade issue.
b. Select ESXi Host (10.11.3.152) > Reconfigure for VMware HA.
c. Drag and drop the CUCxn VM from the host that it moved to back to the ESXi host (10.11.3.152).
d. Clear any alarms on the CUCxn VM.
UC5 - Application Cold Failure - CUCM
This use case describes the events that Prime Central for HCS receives if a CUCM server restarts. This type of incident generates both Root Cause (RC) and Service Impact (SI) events.
Observed RC-EL Events
When the CUCM server restarts, numerous synthetic RCA events are observed, including OM_CUCM_Processes, OM_CUCM_TFTP_Processes, OM_CUCM_Endpt_Connectivity, and OM_CUCM_OM_Connectivity.
Eventually, there is only one primary synthetic RCA event of OM_CUCM_NodeRestart.
Table A-18 Observed Root Cause Events for UC5
Severity | EventTypeID | Summary
Warning | OM_CUCM_NodeRestart | DeviceRestarted::Component= cucm-71-pub.customer.com; Default Event Name= DeviceRestarted; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=DeviceRestarted >;
Observed SI-EL Events
CUCM voice service impacts voice mail and presence.
Table A-19 Observed Service Events for UC5
Severity | Summary
Minor | Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C071-1 is Marginal.
Minor | Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C071-1 is Marginal.
Minor | Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C071-1 is Marginal.
Observed Other-EL Events
None.
Service Tree Event Overlay Location and Content
SIA events are overlaid on the Service Tree in the Service Availability view.
Table A-20 Service Tree events Observed for UC5 - Application Cold Failure - CUCM
Location | Summary
... -> Voice Service | Meta event for Voice Service - C071
... -> CUCM-71-pub | PerformancePollingStopped::Component= cucm-71-pub.customer.com; Error Message String= 16-Jul-2012 16:19:54 EDT,cucm-71-pub.customer.com,192.6.4.123,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped >;
... -> Cluster_Availability --> Internode_Trunks | SDL Link Out Of Service::Component= 192.6.4.124-192.6.4.123; Local Application ID= CCM; Remote Node ID= 1; Unique Link ID= 2:100:1:100; Remote Application IP Address= 192.6.4.123; Local Node ID= 2; Remote Application ID= CCM; Default Event Name= SDL Link Out Of Service; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=SDLLinkOutOfService >;
Next Steps
• The system recovers automatically after the restart.
• The event with the EventTypeId of OM_CUCM_NodeRestart automatically clears in 60 minutes.
UC6 - Application Cold Failure - CUCxn
This use case describes the events that Prime Central for HCS receives if a CUCxn server restarts. This type of incident generates both Root Cause (RC) and Service Impact (SI) events.
Observed RC-EL Events
When the CUCxn server restarts, a synthetic RCA event of OM_CUCxn_OM_Connectivity is observed.
Table A-21 Observed Root Cause Events for UC6
Severity | EventTypeID | Summary
Critical | OM_CUCxn_OM_Connectivity | Synthetic Event for OM_CUCxn_OM_Connectivity group events from cucxn-71-pub.customer.com
Observed SI-EL Events
CUCxn voice mail service is impacted.
Table A-22 Observed Service Events for UC6
Severity | Summary
Minor | Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C071-1 is Marginal.
Observed Other-EL Events
Prime Central for HCS does not analyze these events, but they could point to potential root causes for the impacted services.
Table A-23 Observed Other Events for UC6
Severity (S)/Customer (C)/Node (N) | EventName (EN)/EventTypeId (ET) | Summary
S = Warning, C = C071, N = cucxn-71-pub.customer.com | EN = AutoFailbackSucceeded, ET = default | AutoFailbackSucceeded::Component= 192.6.4.125-null; Detail= %1 : PEER_REBOOT
S = Warning, C = C071, N = cucxn-71-pub.customer.com | EN = DeviceRestarted, ET = default, CUST_C071_CLS_CUCXN_CUCxn-CL-C071-1 | DeviceRestarted::Component= cucxn-71-pub.customer.com; Default Event Name= DeviceRestarted; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=DeviceRestarted >;
Service Tree Event Overlay Location and Content
SIA events are overlaid on the Service Tree in the Service Availability view.
Table A-24 Observed Service Tree Events for UC6
Location | Summary
... -> CUCM-71-pub | PerformancePollingStopped::Component= cucxn-71-pub.customer.com; Error Message String= 17-Jul-2012 08:51:50 EDT,cucxn-71-pub.customer.com,192.6.4.125,Cannot collect data. The device returned no data from a required MIB.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped >;
Next Steps
• The system recovers automatically after the restart.
• The event with the EventTypeId of OM_CUCxn_OM_Connectivity clears automatically when the issue is resolved on the server.
• Should there be a different type of OS failure, other recovery steps would be required.
UC7 - Changing the Number of Registered Gateways and Media Devices
This use case describes the events that the Prime Central for HCS dashboard displays if the number of registered gateways and media devices changes in the CUCM cluster. This type of incident generates both Root Cause (RC) and Service Impact (SI) events.
Observed RC-EL Events
Decreasing the number of registered gateways or media devices generates synthetic RCA events for OM_CUCM_Registration and OM_CUCM_Endpt_Connectivity. When the media device registers again, the OM_CUCM_Endpt_Connectivity event is cleared. The raw events for OM_CUCM_Registration are Number Of Registered MediaDevices Decreased and Number Of Registered MediaDevices Increased.
Table A-25 Observed Root Cause Events for UC7
Severity | EventTypeID | Summary
Warning | OM_CUCM_Registration | Synthetic Event for OM_CUCM_Registration group events from CUCM-CL-C070-1
Critical | OM_CUCM_Endpt_Connectivity | Synthetic Event for OM_CUCM_Endpt_Connectivity group events from CUCM-CL-C070-1
Observed SI-EL Events
A change in the number of registered gateways and media devices may not impact the voice mail and presence services, but by default Prime Central for HCS shows an impact on the voice mail and presence services if the voice service is impaired.
Table A-26 Observed Service Events for UC7
Severity | Summary
Minor | Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C070-1 is Marginal.
Minor | Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C070-1 is Marginal.
Minor | Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C070-1 is Marginal.
Observed Other-EL Events
Prime Central for HCS does not analyze these events, but they could point to potential root causes for impacted services. Currently, cluster-level events do not participate in RCA correlation. The raw event mapped to the cluster-level EventTypeId OM_CUCM_Endpt_Connectivity is marked as unknown and does not participate in RCA or SIA; therefore, the OM_CUCM_Endpt_Connectivity raw event shows up in the Other-EL field. The raw event mapped to the cluster-level EventTypeId OM_CUCM_Registration is also marked as unknown and does not participate in RCA, but it does participate in SIA; therefore, the OM_CUCM_Registration raw event does not show up in the Other-EL field.
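The sketch below summarizes the routing rule described above. The function name and participation flags are assumptions used for illustration only and do not reflect the actual Prime Central for HCS event pipeline.

```python
# Illustrative sketch only: a raw event whose EventTypeId participates in
# neither RCA nor SIA is listed under Other-EL; one that participates in
# SIA (even without RCA) is not listed there.
def target_event_list(participates_in_rca, participates_in_sia):
    if not participates_in_rca and not participates_in_sia:
        return "Other-EL"
    return "RC-EL" if participates_in_rca else "SI-EL"

# Cluster-level examples from this use case (participation flags are assumed):
# OM_CUCM_Endpt_Connectivity: no RCA, no SIA -> appears under Other-EL.
# OM_CUCM_Registration: no RCA, participates in SIA -> does not appear there.
print(target_event_list(False, False))  # Other-EL
print(target_event_list(False, True))   # SI-EL
```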
Table A-27 Observed Other Events for UC7
Severity (S)/Customer (C)/Node (N) | EventName (EN)/EventTypeId (ET) | Summary
S = Critical, N = CUCM-CL-C070-1 | EN = EndPointLostContact, ET = OM_CUCM_Endpt_Connectivity | EndPointLostContact::Component= CUCM-CL-C070-1-CFB_2; EndPoint Name= CFB_2; EndPoint IPAddress= 200.1.1.11; EndPoint Status= UnRegistered; EndPoint Type= Conference Bridge; Device Pool= Default; CUCM Node= 192.6.4.116; Timestamp= 2012-07-18 13:36:50.326; Default Event Name= EndPointLostContact; DescriptionURL= <
Service Tree Event Overlay Location and Content
SIA events are overlaid on the Service Tree in the Service Availability view.
Table A-28 Observed Service Tree Events for UC7
Location | Summary
... -> Call Control --> Registration | Number Of Registered MediaDevices Decreased::Component= VE-CUCM-CL-C070-1-RTMTSyslog-[Id#1342633009817]; Detail= Number of registered Media Devices decreased between consecutive polls. Current monitored precanned object has decreased by 1 The alert is generated on Wed Jul 18 13:37:19 EDT 2012 on cluster CUCM-CL-C070-1.][App ID=Cisco AMC Service][Cluster ID=][Node ID=CUCM-70-pub]: RTMT Alert; Default Event Name= Number Of Registered MediaDevices Decreased; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=NumberOfRegisteredMediaDevicesDecreased >;
... -> Voice Service | Meta event for Voice Service - C070
Next Steps
Step 1 Right-click Synthetic RC event > Show Contained Events to display the corresponding raw events.
Step 2 Right-click Raw event > Event Details > Next Steps to display the following recommendation:
Go to CUCM to verify the registration status of the reported end point. Verify that IP connectivity exists between the cluster and the endpoint.
Step 3 To clear this event, go to the CUCM Administration page > Service Parameter screen and set the Run Flag to True for the conference bridge.
UC8 - TFTP Server for UC Services - Critical Processes Failure
This use case describes the events that the Prime Central for HCS dashboard displays if critical processes such as TFTP service fail. In the UC environment, TFTP service is essential for new UC endpoints, which use TFTP to download code and register with CUCM servers.
Observed RC-EL Events
The TFTP process running on the CUCM Publisher system is forced to stop running.
Table A-29 Observed Root Cause Events for UC8
Severity | Summary
Critical | Synthetic Event for OM_CUCM_TFTP_Processes group events from cucm-81-pub.customer.com
Observed SI-EL Events
TFTP service failure does not impact the overall voice service, including the voice mail and presence services. It affects only new endpoints, which become stranded because they are unable to download the image they need to work.
Table A-30 Observed Service Events for UC8
Severity | Summary
Minor | Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C081-1 is Marginal.
Minor | Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C081-1 is Marginal.
Minor | Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C081-1 is Marginal.
Observed Other-EL Events
None.
Service Tree Event Overlay Location and Content
SIA events are overlaid on the Service Tree in the Service Availability view.
Table A-31 Observed Service Tree Events for UC8
Location | Summary
... -> TFTP_App | ServiceDown::Component= VS-cucm-81-pub.customer.com/Cisco Tftp; ProductName= Cisco Tftp; CurrentState= Stopped; Default Event Name= ServiceDown; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=ServiceDown >;
... -> Voice Service | Meta event for Voice Service - C081
Next Steps
Step 1 Right-click Synthetic RC event > Show Contained Events to display the corresponding raw events.
Step 2 Right-click Raw event > Event Details > Next Steps to display the following recommendation:
Identify which services are not running. You can start the service manually from the Administrator Service Control page. To disable monitoring for a specific service, go to the Detailed Device View of the device, select the specific service, and change the managed state to False.
Step 3 Check whether there are any core and service trace files. If they are available, then download them.
UC9 - Detecting and Correlating Customer Voice Quality Degradation
This use case describes the events that the Prime Central for HCS dashboard displays if voice quality degradation is detected using aggregated quality event generation per cluster. This type of incident generates only Service Impact (SI) events.
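As context for the ServiceQualityThresholdCrossed event shown later in this use case, the sketch below shows one way a per-cluster aggregated threshold check could work: the share of endpoints reporting poor quality is compared with a configured percentage. The function name and default threshold are assumptions for illustration, not the product's algorithm.

```python
# Illustrative sketch only: per-cluster aggregated voice-quality threshold check.
def quality_threshold_crossed(impacted_endpoints, registered_phone_count,
                              threshold_percentage=10.0):
    """Return True when the share of impacted endpoints reaches the threshold."""
    if registered_phone_count == 0:
        return False
    impacted_pct = 100.0 * impacted_endpoints / registered_phone_count
    return impacted_pct >= threshold_percentage

# Values matching the example event in Table A-33: 1 impacted endpoint out of
# 1 registered phone against a 10.0% threshold crosses the threshold.
print(quality_threshold_crossed(1, 1, 10.0))  # True
```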
Observed RC-EL Events
None.
Observed SI-EL Events
Table A-32 Observed Service Events for UC9
Severity | Summary
Minor | Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C072-1 is Marginal.
Minor | Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C072-1 is Marginal.
Minor | Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C072-1 is Marginal.
Observed Other-EL Events
None.
Service Tree Event Overlay Location and Content
Table A-33 Observed Service Tree Events for UC9
Location | Summary
... -> Voice Service | Meta event for Voice Service - C072
... -> VoiceQuality | ServiceQualityThresholdCrossed::Component= Device Pool:devicepool3449; Source= Cisco Unified Operations Manager; Impacted Endpoints at the time event was raised= 1; Threshold Percentage at the time event was raised= 10.0; Registered Phone Count at the time event was raised= 1; Default Event Name= ServiceQualityThresholdCrossed; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=ServiceQualityThresholdCrossed >;
Next Steps
Step 1 Go to CUOM/SM and generate a call quality report.
Step 2 Check the network for possible delay/jitter issues.
UC11 - VMware VM Failure - CUCM
This use case describes the events that the Prime Central for HCS dashboard displays if a VM fails abruptly. This type of incident generates both Root Cause (RC) and Service Impact (SI) events.
Observed RC-EL Events
When the VM shuts down, numerous synthetic RCA events are observed, including VC_VM_Avlblty, OM_CUCM_NodeRestart, OM_CUCM_Redundancy, and OM_CUCM_Endpt_Connectivity. The CUCM-C081-pub node generates an OM_CUCM_Redundancy event. This event should be treated as the root cause event for the CUCM publisher node because correlation between publisher and subscriber nodes (sibling correlation) is not currently supported.
Table A-34 Observed Root Cause Events for UC11
Severity | EventTypeID | Summary
Critical | VC_VM_Avlblty | Synthetic Event for VC_VM_Avlblty group events from CUCM-81-sub2
Critical | OM_CUCM_Endpt_Connectivity | Synthetic Event for OM_CUCM_Endpt_Connectivity group events from CUCM-CL-C081-1
Critical | OM_CUCM_Redundancy | Synthetic Event for OM_CUCM_Redundancy group events from cucm-81-tftp.customer.com
Critical | OM_CUCM_Redundancy | Synthetic Event for OM_CUCM_Redundancy group events from cucm-81-pub.customer.com
Critical | OM_CUCM_Redundancy | Synthetic Event for OM_CUCM_Redundancy group events from cucm-81-sub1.customer.com
Observed SI-EL Events
Table A-35 Observed Service Events for UC11
Severity | Summary
Minor | Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C081-1 is Marginal.
Minor | Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C081-1 is Marginal.
Minor | Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C081-1 is Marginal.
Observed Other-EL Events
Prime Central for HCS does not analyze these events, but they could point to potential root causes for impacted services.
Table A-36 Observed Other Events for UC11
Severity (S)/Customer (C)/Node (N) | EventName (EN)/EventTypeId (ET) | Summary
S = Critical, C = C081, N = CUCM-CL-C081-1 | EN = DBReplicationFailure, ET = OM_CUCM_BackupRestore | DBReplicationFailure::Component= VE-CUCM-CL-C081-1; CallManagerList= 192.6.4.195,192.6.4.202,192.6.4.197,192.6.4.196; ReplicationStatus= Replication is bad in the cluster; CustomerName= C081; Default Event Name= DBReplicationFailure; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=DBReplicationFailure >;
S = Critical, C = C081, N = CUCM-CL-C081-1 | EN = EndPointLostContact, ET = OM_CUCM_Endpt_Connectivity | EndPointLostContact::Component= CUCM-CL-C081-1-MTP_6; EndPoint Name= MTP_6; EndPoint IPAddress= 200.1.1.17; EndPoint Status= UnRegistered; EndPoint Type= Media Termination Point; Device Pool= Default; CUCM Node= 192.6.4.195; Timestamp= 2012-06-25 17:38:06.634; Default Event Name= EndPointLostContact; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=EndPointLostContact >;
Service Tree Event Overlay Location and Content
SIA events are overlaid on the Service Tree in the Service Availability view. The following table shows the service tree events observed during testing.
Table A-37 Observed Service Tree Events for UC11
Location | Summary
... -> Cluster_Availability --> Internode_Trunks | SDL Link Out Of Service::Component= 192.6.4.195-192.6.4.197; Local Application ID= CCM; Remote Node ID= 5; Unique Link ID= 1:100:5:100; Remote Application IP Address= 192.6.4.197; Local Node ID= 1; Remote Application ID= CCM; Default Event Name= SDL Link Out Of Service; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=SDLLinkOutOfService >;
... -> CUCM-81-sub2 --> VM Availability | The virtual machine CUCM-81-sub2 running on 10.11.3.147 is offline. Message: KVM_VM_Powered_Off_Cisco_HCM[(Event_Type=N"ON VM:cisco-10.11.3.147:ESX ON 30374781 (Event_Type=VmPoweredOffEvent)]
... -> CUCM-81-pub --> VM Resources | Alarm ''Virtual Machine Disk Latency High'' on CUCM-81-sub2 changed from Green to Gray. Message: KVM_VM_Disk_Latency[(Event_Type=N"AlarmStatusChangedEvent" AND Event_TextLIKEN"Virtual*Machine*Disk*Latency" ON VM:cisco-10.11.3.147:ESX ON 30374777 (Event_Type=AlarmStatusChangedEvent Event_Text=Alarm ''Virtual Machine Disk Latency High'' on CUCM-81-sub2 changed from Green to Gray)]
... -> Cluster_Availability --> Sub: CUCM-81-sub2 | Unresponsive::Component= cucm-81-sub2.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Linux release:2.6.18-194.26.1.el5PAE machine:i686; DiscoveredFirstAt= 06-22-2012 16:43:59; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.197; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-24-2012 18:06:21; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=Unresponsive >;
... -> Cluster_Availability --> Sub: CUCM-81-sub2 | PerformancePollingStopped::Component= cucm-81-sub2.customer.com; Error Message String= 25-Jun-2012 17:39:58 EDT,cucm-81-sub2.customer.com,192.6.4.197,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped >;
... -> Voice Service | Meta event for Voice Service - C081
Next Steps
Step 1 Right-click Synthetic RC event > Show Contained Events to display the corresponding raw events.
Step 2 Right-click the Raw event > Event Details > Next Steps to display the following recommendation:
Go to CUCM to verify the registration status of the reported endpoint.
Step 3 Verify whether IP connectivity exists between the cluster and endpoints.
UC12 - CUCM Clustering Problems
This use case describes the events that the Prime Central for HCS dashboard displays for CUCM clustering issues, such as a server running a different version of software and database replication issues in the cluster. Prime Central for HCS generates Root Cause (RC) and Service Impact (SI) events for such incidents.
Such problems impair the performance of the CUCM cluster's call processing nodes, so immediate attention is needed to fix them.
Observed RC-EL Events
When the CUCM Publisher is brought up with an old version of software while the Subscriber nodes run a newer software version, many synthetic RCA events are observed, including OM_CUCM_Processes, OM_CUCM_NodeRestart, and OM_CUCM_Redundancy, as follows.
Table A-38 Observed Root Cause Events for UC12
Severity | EventTypeId | Summary
Critical | OM_CUCM_Processes | Synthetic Event for OM_CUCM_Processes group events from cucm-70-sub.customer.com
Warning | OM_CUCM_Redundancy | Synthetic Event for OM_CUCM_Redundancy group events from CUCM-CL-C070-1
Warning | OM_CUCM_Registration | Synthetic Event for OM_CUCM_Registration group events from CUCM-CL-C070-1
Critical | OM_CUCM_Redundancy | Synthetic Event for OM_CUCM_Redundancy group events from cucm-70-pub.customer.com
Warning | OM_CUCM_NodeRestart | Synthetic Event for OM_CUCM_NodeRestart group events from cucm-70-pub.customer.com
Observed SI-EL Events
CUCM voice service impacts voice mail and presence. The following table shows the service events observed during testing.
Table A-39 Observed Service Events for UC12
Severity | Summary
Minor | Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C070-1 is Marginal.
Minor | Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C070-1 is Marginal.
Minor | Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C070-1 is Marginal.
Observed Other-EL Events
Prime Central for HCS does not analyze these events, but they could point to potential root causes for impacted services.
Table A-40 Observed Other Events for UC12
Severity (S)/Customer (C)/Node (N) | EventName (EN)/EventTypeId (ET) | Summary
S = Critical, C = C070, N = CUCM-CL-C070-1 | EN = DBReplicationFailure, ET = OM_CUCM_BackupRestore | DBReplicationFailure::Component= VE-CUCM-CL-C070-1; CallManagerList= 192.6.4.116,192.6.4.117; ReplicationStatus= Replication is bad in the cluster; CustomerName= C070; Default Event Name= DBReplicationFailure; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=DBReplicationFailure >;
S = Warning, C = C070, N = CUCM-CL-C070-1 | EN = SystemVersionMismatched, ET = OM_CUCM_Redundancy | SystemVersionMismatched::Component= VE-CUCM-CL-C070-1; NodeVersionInformation= cucm-70-pub.customer.com(8.6.2.20000-2),cucm-70-sub.customer.com(8.6.2.21900-5); CustomerName= C070; Default Event Name= SystemVersionMismatched; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=SystemVersionMismatched >;
Service Tree Event Overlay Location and Content
SIA events are overlaid on the Service Tree in the Service Availability view.
Table A-41 Observed Service Tree events for UC12
Location | Summary
... -> Cluster_Availability --> Internode_Trunks | SDL Link Out Of Service::Component= 192.6.4.117-192.6.4.116; Local Application ID= CCM; Remote Node ID= 1; Unique Link ID= 2:100:1:100; Remote Application IP Address= 192.6.4.116; Local Node ID= 2; Remote Application ID= CCM; Default Event Name= SDL Link Out Of Service; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=SDLLinkOutOfService >;
... -> Cluster_Availability --> Internode_Trunks | SystemVersionMismatched::Component= VE-CUCM-CL-C070-1; NodeVersionInformation= cucm-70-pub.customer.com(8.6.2.20000-2),cucm-70-sub.customer.com(8.6.2.21900-5); CustomerName= C070; Default Event Name= SystemVersionMismatched; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=SystemVersionMismatched >;
... -> Voice Service | Meta event for Voice Service - C070
Next Steps
Step 1 Right-click the Synthetic RC event > Show Contained Events to display the corresponding raw events.
Step 2 Right-click the Raw event > Event Details > Next Steps to display the following recommendation:
Investigate why the remote Communications Manager is not running or whether a network problem exists.
Step 3 Correct the version issue by switching back to the original version.
UC13 - Change in Number of Registered Phones
This use case describes the events that the Prime Central for HCS dashboard displays if the number of registered phones in the cluster drops more than a configured percentage between consecutive polls. Prime Central for HCS generates Root Cause (RC) and Service Impact (SI) events for such incidents.
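To make the trigger condition concrete, the sketch below compares the registered-phone count from two consecutive polls against a configured percentage threshold (the 30% high threshold cited in the RTMT alert later in this use case). The function name and structure are illustrative assumptions, not the monitoring implementation.

```python
# Illustrative sketch only: flag a drop in registered phones between two
# consecutive polls that exceeds a configured percentage threshold.
def registration_drop_exceeded(previous_count, current_count, threshold_pct=30.0):
    """Return True when the phone count dropped by more than threshold_pct."""
    if previous_count <= 0:
        return False
    drop_pct = 100.0 * (previous_count - current_count) / previous_count
    return drop_pct > threshold_pct

# Example: 1000 phones in the previous poll, 650 now -> a 35% drop,
# which exceeds the configured 30% high threshold.
print(registration_drop_exceeded(1000, 650))  # True
```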
Observed RC-EL Events
When the number of registered phones decreases, only one synthetic RCA event, OM_CUCM_Registration, is observed.
Table A-42 Observed Root Cause Events for UC13
Severity | EventTypeID | Summary
Warning | OM_CUCM_Registration | Synthetic Event for OM_CUCM_Registration group events from CUCM-CL-C072-1
Observed SI-EL Events
CUCM voice service impacts presence and voice mail. The following table shows the service events observed during testing.
Table A-43 Observed Service Events for UC13
Severity | Summary
Minor | Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C072-1 is Marginal.
Minor | CUST_C072_CLS_CUCXN_CUCxn-CL-C072-1: Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C072-1 is Marginal.
Minor | CUST_C072_CLS_CUCM_CUCM-CL-C072-1: Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C072-1 is Marginal.
Observed Other-EL Events
None.
Service Tree Event Overlay Location and Content
Table A-44 Observed Service Tree Events for UC13
Location | Summary
... -> Call Control --> Registration | Number Of Registered Phones Dropped::Component= VE-CUCM-CL-C072-1-RTMTSyslog-[Id#1342640053391]; Detail= Number of registered phones in the cluster drop more than configured percentage between consecutive polls. Configured high threshold is 30%.
... -> Call Control --> Registration | PhoneUnregThresholdExceeded::Component= Device Pool:devicepool3449; Unreg Count= 1; Total Count= 1; Threshold In %= 10.0; ClusterName= CUCM-CL-C072-1; Device Pool= devicepool3449; Default Event Name= PhoneUnregThresholdExceeded; DescriptionURL= <
... -> Voice Service | Meta event for Voice Service - C072
Next Steps
Step 1 Right-click the Synthetic RC event > Show Contained Events to display the corresponding raw events.
Step 2 Right-click the Raw event > Event Details > Next Steps to display the following recommendation:
Phone registration status must be monitored for sudden changes. If the registration status changes slightly and readjusts quickly over a short time frame, it could indicate a phone move, addition, or change. A sudden smaller drop in the phone registration counter could indicate a localized outage; for instance, an access switch or a WAN circuit outage or malfunction. A significant drop in registered phone level requires immediate attention from the administrator.
Step 3 Register the phones to clear the event.
UC15 - CUCxn Critical Process Failure
This use case describes the events that the Prime Central for HCS dashboard displays if a critical process fails in CUCxn. Prime Central for HCS generates Root Cause (RC) and Service Impact (SI) events for such incidents.
Observed RC-EL Events
If a critical process is killed in the voice mail service (CUCxn), only one synthetic RCA event, OM_CUCxn_Processes, is generated.
Table A-45 Observed Root Cause Events for UC15
Severity
|
EventTypeID
|
Summary
|
Critical
|
OM_CUCxn_Processes
|
Synthetic Event for OM_CUCxn_Processes group events from cucxn-72-pub.customer.com
|
Observed SI-EL Events
Critical process failures impact voice mail service.
Table A-46 Observed Service Events for UC15
Severity
|
Location
|
Summary
|
Minor
|
CUST_C072_CLS_CUCXN_CUCxn- CL-C072-1
|
Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C072-1 is Marginal.
|
Observed Other-EL Events
None.
Service Tree Event Overlay Location and Content
Table A-47 Observed Service Tree Events for UC15
Location
|
Summary
|
Cluster Availability
->Pub-CUCxn-72-pub
|
ServiceDown::Component= VScucxn- 72-pub.customer.com/ Connection Conversation Manager; ProductName= Connection Conversation Manager; CurrentState= Stopped; Default Event Name= ServiceDown;
|
Next Steps
Step 1
Right-click the Synthetic RC event > Show Contained Events to display the corresponding raw events.
Step 2
Right-click the Raw event > Event Details > Next Steps to display the following recommendation:
Identify which services are not running. You can start the service manually from the
Administrator Service Control page. To disable monitoring for a specific service, go to
the device's Detailed Device View, select the specific service, and change the managed
state to False. Check to see if there are any core files. Download the core files, if any,
as well as service trace files. Events are removed for Unified CM only. You may need to
manually clear these Unified CM events after your upgrade is complete.
Step 3
Log in to CUCxn by typing its IP address, and select Cisco Unity Connection Serviceability > Tools > Service Management > Connection Conversation Manager to start the service.
UC16 - VMware VM Failure - CUCxn
This use case describes the events that the Prime Central for HCS dashboard displays if a VM running CUCxn fails abruptly. Prime Central for HCS generates Root Cause (RC) and Service Impact (SI) events for such incidents.
Observed RC-EL Events
When the VM shuts down, numerous synthetic RCA events are observed, including VC_VM_Avlblty, OM_CUCM_NodeRestart, and OM_CUCM_OM_Connectivity. Eventually, Prime Central for HCS stabilizes to one root cause, VC_VM_Avlblty.
Table A-48 Observed Root Cause Events for UC16
Severity
|
EventTypeID
|
Summary
|
Critical
|
VC_VM_Avlblty
|
Synthetic Event for VC_VM_Avlblty group events from CUCxn-72-pub.
|
Observed SI-EL Events
VM failure impacts voice mail service.
Table A-49 Observed Service Events for UC16
Severity
|
Summary
|
Minor
|
Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C072-1 is Marginal.
|
Observed Other-EL Events
None.
Service Tree Event Overlay Location and Content
SIA events are overlaid on the Service Tree in the Service Availability view.
Table A-50 Observed Service Tree Events for UC16
Location
|
Summary
|
...-> Cluster_Availability -->
PUB:CUCXn-72-pub
|
PerformancePollingStopped::Component= cucxn-72-pub.customer.com; Error Message String= 06- Jul-2012 11:08:43 EDT,cucxn-72- pub.customer.com,192.6.4.132,Cannot collect data. The device returned no data from a required MIB.; Default Event Name= PerformancePollingStopped; DescriptionURL= < [http://150.0.0.52:1741/ CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped] >;
|
...-> Cluster_Availability -->
PUB:CUCXn-72-pub
|
Unresponsive::Component= cucxn-72-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 1 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 4096 MB Memory: Software:UCOS 5.0.0.0-2; DiscoveredFirstAt=
06-22-2012 11:32:48; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.132; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 07-05-2012 18:07:17; Default Event Name=
Unresponsive; DescriptionURL= < [http://150.0.0.52:1741/ CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive] >;
|
...-> CUCxn-72-pub--> VM Availability
|
The virtual machine CUCxn-72-pub running on 10.11.3.148 is offline. Message: KVM_VM_Powered_Off_Cisco_HCM[(Event_Type=N"ON VM:cisco-10.11.3.148:ESX ON 31539282 (Event_Type=VmPoweredOffEvent)]
|
Next Steps
Step 1
Right-click the Synthetic RC event > Show Contained Events to display the corresponding raw events.
Step 2
Right-click the Raw event > Event Details > Next Steps to display the following recommendation:
Check if the device is reachable from Unified Operations Manager.
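As a quick, generic reachability check from any management station (outside Unified Operations Manager), the sketch below simply attempts TCP connections to a few common management ports. The host name is the node from this example, while the port list and function name are illustrative assumptions, not part of any product API.

# Generic reachability probe (illustrative only; not an Operations
# Manager feature). Attempts a TCP connection to a few common ports.
import socket

def probe(host, ports=(22, 80, 443, 8443), timeout=2.0):
    """Return {port: True/False} indicating whether a TCP connection
    to host:port succeeded within the timeout."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results

if __name__ == "__main__":
    # cucxn-72-pub.customer.com is the node in this example; substitute a
    # reachable host when trying the sketch.
    print(probe("cucxn-72-pub.customer.com"))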
UC17 - CUCxn Clustering Problems
This use case describes the events that the Prime Central for HCS dashboard displays for CUCxn clustering issues, such as a server running a different software version or database replication problems in the cluster. Prime Central for HCS generates Root Cause (RC) and Service Impact (SI) events for such incidents.
Observed RC-EL Events
When there is a mismatch between the CUCxn publisher and subscriber software versions, three synthetic RCA events are observed: OM_CUCxn_Redundancy and two OM_CUCxn_Processes events.
Table A-51 Observed Root Cause Events for UC17
Severity
|
EventTypeID
|
Summary
|
Critical
|
OM_CUCxn_Redundancy
|
Synthetic Event for OM_CUCxn_Redundancy group events from cucxn-72-pub.customer.com
|
Critical
|
OM_CUCxn_Processes
|
Synthetic Event for OM_CUCxn_Processes group events from cucxn-72-sub.customer.com
|
Critical
|
OM_CUCxn_Processes
|
ServiceDown::Component= VS-cucxn-72- sub.customer.com/Connection Inbox RSS Feed; ProductName= Connection Inbox RSS Feed; CurrentState= Stopped; Default Event Name= ServiceDown; DescriptionURL= <
|
Observed SI-EL Events
Version mismatch impacts Voice mail service.
Table A-52 Observed Service Events for UC17
Severity
|
Summary
|
Minor
|
CUST_C072_CLS_CUCXN_CUCxn- CL-C072-1 Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C072-1 is Marginal.
|
Observed Other-EL Events
None.
Service Tree Event Overlay Location and Content
Table A-53 Observed Service Tree Events for UC17
Location
|
Summary
|
...->Cluster_Availability --
>Internode_Trunks
|
NoConnectionToPeer::Component= 192.6.4.132-RTMTSyslog; Detail=
%1 : cucxn-72-pub.customer.com AppID : CuSrm ClusterID : NodeID : ;
|
cucxn-72-pub.customer.com
|
ServiceDown::Component= VScucxn- 72-pub.customer.com/ Connection Voice Mail Web Service; ProductName= Connection;
|
cucxn-72-pub.customer.com
|
ServiceDown::Component= VScucxn- 72-sub.customer.com/ Connection Inbox RSS Feed; ProductName= Connection Inbox ;
|
cucxn-72-pub.customer.com
|
ServiceDown::Component= VScucxn- 72-pub.customer.com/ Connection Serviceability; ProductName= Connection ;
|
cucxn-72-pub.customer.com
|
ServiceDown::Component= VScucxn- 72-sub.customer.com/ Connection Administration; ProductName= Connection ;
|
cucxn-72-pub.customer.com
|
ServiceDown::Component= VScucxn- 72-pub.customer.com/ Connection SNMP Agent; ProductName= Connection SNMP ;
|
Next Steps
Step 1
Right-click the Synthetic RC event > Show Contained Events to display the corresponding raw events.
Step 2
Right-click the Raw event > Event Details > Next Steps to display the following recommendation:
Make sure that the secondary server is active and connected to the primary.
Step 3
Correct the version issue by switching back to the original version.
UC18 - CUCM Critical Process Failure
This use case describes the events that the Prime Central for HCS dashboard displays if a critical process fails in CUCM. Prime Central for HCS generates Root Cause (RC) and Service Impact (SI) events for such incidents.
Observed RC-EL Events
When a critical process is killed, two Synthetic RCA events, OM_CUCM_Processes, and OM_CUCM_Redundancy are observed.
Table A-54 Observed Root Cause Events for UC18
Severity
|
EventTypeID
|
Summary
|
Critical
|
OM_CUCM_Processes
|
Synthetic Event for OM_CUCM_Processes group events from cucm-72-pub.customer.com
|
Critical
|
OM_CUCM_Redundancy
|
Synthetic Event for OM_CUCM_Redundancy group events from cucm-72-sub.customer.com
|
Observed SI-EL Events
CUCM voice service impacts presence and voice mail.
Table A-55 Observed Service Events for UC18
Severity
|
Summary
|
Minor
|
Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C072-1 is Marginal.
|
Minor
|
Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C072-1 is Marginal.
|
Minor
|
Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C072-1 is Marginal.
|
Observed Other-EL Events
None.
Service Tree Event Overlay Location and Content
Table A-56 Observed Service Tree Events for UC18
Location
|
Summary
|
...->Cluster_availability --
>PUB:CUCM-72-pub
|
ServiceDown::Component= CCM-cucm-72- pub.customer.com/1; CallManagerName= 192.6.4.130; CallManagerStatus= Stopped; Default Event Name= ServiceDown; DescriptionURL= < ;
|
...->Cluster_availability --
>Internode_Trunks
|
SDL Link Out Of Service::Component= 192.6.4.131-192.6.4.130; Local Application ID= CCM; Remote Node ID= 1; Unique Link ID= 2:100:1:100; Remote Application IP Address= 192.6.4.130; Local Node ID= 2; Remote;
|
Cust_C072
-> VoiceService
|
Meta event for Voice Service - C072.
|
Next Steps
Step 1
Right-click the Synthetic RC event > Show Contained Events to display the corresponding raw events.
Step 2
Right-click the Raw event > Event Details > Next Steps to display the following recommendation:
Identify which services are not running. You can start the service manually from the
Administrator Service Control page. To disable monitoring for a specific service, go to
the device's Detailed Device View, select the specific service, and change the managed
state to False. Check to see if there are any core files. Download the core files, if any,
as well as service trace files. Events are removed for Unified CM only. You may need to
manually clear these Unified CM events after your upgrade is complete.
Step 3
Use Ctrl-C to end the running process.
Step 4
Log in to the CUCM application: type the IP address of CUCM and select Cisco Unified Serviceability > Tools > Feature Services to start the service.
UC19 - UCS Chassis Failure - CUCM
This use case describes the events that Prime Central for HCS receives if the chassis hosting CUCM nodes loses power. This type of incident generates both Root Cause (RC) and Service Impact (SI) events. The CUCM VM is automatically brought up in another host if HA is enabled on the cluster. If HA is not configured for the cluster, CUCM nodes stay down until the chassis is powered on. In the following example, the same chassis hosts all UC VMs for customer 80.
Observed RC-EL Events
When the chassis powers off, numerous synthetic RCA events are observed, including UCS_Chassis_Fault, UCS_Blade_Avlblty, VC_Host_Avlblty, and UCS_BladeLinks. Eventually, the UCS_Chassis_Fault synthetic RCA event remains as the root cause.
Table A-57 Observed Root Cause Events for UC19
Severity
|
EventTypeID
|
Summary
|
Critical
|
UCS_Chassis_Fault
|
Synthetic Event for UCS_Chassis_Fault group events from 10.13.2.1
|
Critical
|
OM_CUCM_OM_Connectivity
|
Synthetic Event for OM_CUCM_OM_Connectivity group events from CUCM-CL-C080-1
|
Critical
|
VC_Host_Avlblty
|
Synthetic Event for VC_Host_Avlblty group events from 10.13.3.31
|
Observed SI-EL Events
CUCM voice service impacts voice mail and presence. In this example, CUCxn and CUP VMs are hosted in the same chassis, and all voice, voice mail, and presence services are affected.
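Note that in this chassis failure the service templates report Bad (Critical), whereas the single-node failures earlier in this appendix (for example, UC16 and UC18) leave the same services Marginal (Minor). The sketch below illustrates that distinction in a deliberately simplified form; it is not the Prime Central for HCS Service Model, and the state names and helper functions are illustrative assumptions. Table A-58 then lists the events observed during testing.

# Simplified illustration only (not the actual Service Model): a cluster
# is Up if all nodes are up, Marginal if some nodes are down but at least
# one is up, and Bad if every node is down. A dependent service (voice
# mail, presence) is at least as impaired as the voice service it uses.
def cluster_state(nodes_up):
    if all(nodes_up):
        return "Up"
    if any(nodes_up):
        return "Marginal"
    return "Bad"

def dependent_state(own_state, voice_state):
    rank = {"Up": 0, "Marginal": 1, "Bad": 2}   # order states by severity
    return own_state if rank[own_state] >= rank[voice_state] else voice_state

# A chassis failure takes every CUCM node down -> voice goes Bad, and the
# voice mail service hosted on the same chassis is Bad as well.
voice = cluster_state([False, False, False])
voicemail = dependent_state(cluster_state([False, False]), voice)
print(voice, voicemail)   # Bad Bad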
Table A-58 Service Events for UC19
Severity
|
Summary
|
Critical
|
Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C080-1 is Bad.
|
Critical
|
Overall Attribute of the Customer_Presence_Service_Template tag of CUP-80-pub is Bad.
|
Critical
|
Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C080-1 is Bad.
|
Observed Other-EL Events
Prime Central for HCS does not analyze these events, but they could point to potential root causes for impacted services. The following table presents a list of events observed during internal testing if HA is not enabled on the cluster.
Note
10.11.2.8, 10.11.2.9, and 10.11.2.10 are the UCS6140 side A, UCS6140 side B, and UCSM IP addresses, respectively. Georedundancytemp-sa is the cluster name in the vCenter containing the C080 customer VMs.
Table A-59 Observed Other Events for UC19 (No HA)
Severity(S)/Customer (C)/Node
(N)
|
EventName (EN)/EventTypeId (ET)/
|
Summary
|
• S = Major
• N= 10.13.2.10
|
• EN = fltAdaptorExtIfLinkDown
• ET = UCS_Adapter
|
Adapter uplink interface 3/4/1/1 link state: unavailable(FaultCode:fltAdaptorExtIfLinkDown,FaultIndex
|
• S = Indeterminate
• N = 10.13.2.8
|
• EN = fltAdaptorUnitAdaptorReachability
• ET = default
|
Adapter 3/1/1 is unreachable (FaultCode:fltAdaptorUnitAdaptorReachability, FaultIndex:3695955)
|
• S = Major
• N = 10.13.2.8
|
• EN = fltEtherSwitchIntFIoSatelliteConnection Absent
• ET = UCS_PortsLinks
|
No link between IOM port 3/1/1 and fabric interconnect A:1/9 (FaultCode:fltEtherSwitchIntFIoSatellite ConnectionAbsent, FaultIndex:3468974)
|
• S = Major
• N = 10.13.2.8
|
• EN = fltDcxVcMgmtVifDown
• ET = UCS_Mgmt_Link
|
IOM 3 / 1 (A) management VIF 3 down, reason None (FaultCode:fltDcxVcMgmtVifDown, FaultIndex:3468977)
|
• S = Indeterminate
• N = 10.13.2.10
|
• EN = fltPortPIoLinkDown
• ET = UCS_Etherne
|
Ether port 10 on fabric interconnect A oper state: link-down, reason: Link failure or notconnected (FaultCode:fltPortPIoLinkDown,FaultIndex).
|
• S = Major
• N = 10.13.2.10
|
• EN = fltEquipmentIOCardUnsupportedConnectivity
• ET = default
|
IOM 3/2 (B) current connectivity does not match discovery policy: unsupported connectivity(FaultCode:fltEquipmentIOCardUnsupportedConnectivity
|
• S = Major
• N = 10.13.2.8
|
• EN = fltEquipmentIOCardUnsupported Connectivity
• ET = default
|
IOM 3/1 (A) current connectivity does not match discovery policy: unsupported-connectivity (FaultCode:fltEquipmentIOCardUnsupported Connectivity,FaultIndex: 3695902).
|
• S = Major
• N = 10.13.2.8
|
• EN = fltLsServerInaccessible
• ET = default
|
Service profile c3b1 cannot be accessed (FaultCode:fltLsServerInaccessible, FaultIndex:3695961)
|
• S = Warning
• C = C080
|
• EN = RTMTDataMissing
• ET = OM_CUCM_OM_Connectivity
|
RTMTDataMissing::Component= VECUCM- CL-C080-1; CallManagerList= 192.6.4.186,192.6.4.193,192.6.4.188,192.6.4.187; ReasonForRTMTDataMissing= Unable to communicate with RTMT on publisher; CustomerName= C080; Default Event Name= RTMTDataMissing; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=RTMTDataMissing>;
|
• S = Warning
• C = C080
|
• EN = RTMTDataMissing
• ET = Default
|
RTMTDataMissing::Component= cucxn-80-pub.customer.com; Name= cucxn-80-pub.customer.com; HostDescription= Hardware:VMware, 1 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 5.0.0.0-2; Reason= Error collecting RTMT data. Error reason: HTTP communication error; Default Event Name= RTMTDataMissing; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=RTMTDataMissing >;
|
Service Tree Event Overlay Location and Content
Table A-60 Observed Service Tree Events for UC19
Location
|
Summary
|
...-> Cluster_Availability --> Node:CUP-80-pub
|
PerformancePollingStopped::Component= cup-80-pub.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT,cup-80- pub.customer.com,192.6.4.191,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;
|
...-> Cluster_Availability-->
Node:CUP-80-pub
|
Unresponsive::Component= cup-80-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.99.1.1.3.28; Description= Hardware:VMware, 4 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 4.0.0.0-44;
DiscoveredFirstAt= 06-28-2012 15:58:29; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.191; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-28-2012 18:06:25; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;
|
...-> Cluster_Availability--> Sub: CUCM-80-sub1
|
Unresponsive::Component= cucm-80-sub1.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Linux release:2.6.18-194.26.1.el5PAE machine:i686; DiscoveredFirstAt= 06-28-2012 15:58:42; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.187; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-28-2012 18:06:34; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;
|
...-> Cluster_Availability-->
Sub: CUCM-80-sub1
|
PerformancePollingStopped::Component= cucm-80-sub1.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT, cucm-80- sub1.customer.com,192.6.4.187,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;
|
...-> Cluster_Availability-->
Sub: CUCM-80-sub2
|
PerformancePollingStopped::Component= cucm-80-sub2.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT,cucm-80- sub2.customer.com,192.6.4.188,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;
|
...-> Cluster_Availability-->
Sub: CUCM-80-sub2
|
Unresponsive::Component= cucm-80-sub2.customer.com;
SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Linux release:2.6.18-194.26.1.el5PAE machine:i686; DiscoveredFirstAt= 06-28-2012 15:58:33; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.188; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-29-2012 12:09:39; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;
|
...-> Cluster_Availability-->
Pub: CUCM-80-pub
|
PerformancePollingStopped::Component= cucm-80-pub.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT,cucm-80- pub.customer.com,192.6.4.186,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;
|
...-> Cluster_Availability-->
Pub: CUCM-80-pub
|
Unresponsive::Component= cucm-80-pub.customer.com;
SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Linux release:2.6.18-194.26.1.el5PAE machine:i686; DiscoveredFirstAt= 06-28-2012 15:58:33; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.186; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-29-2012 12:09:39; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/
|
...-> Cluster_Availability-->
Sub: CUCxn-80-sub
|
Unresponsive::Component= cucxn-80-sub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 1 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 5.0.0.0-2;
DiscoveredFirstAt= 06-28-2012 15:58:33; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.190;
IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-28-2012 18:06:45; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;
|
...-> Cluster_Availability-->
Sub: CUCxn-80-sub
|
PerformancePollingStopped::Component= cucxn-80-sub.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT,cucxn-80- sub.customer.com,192.6.4.190,Cannot collect data. The device returned no data from a required MIB.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;
|
...-> Cluster_Availability-->
Pub: CUCxn-80-pub
|
PerformancePollingStopped::Component= cucxn-80-pub.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT,cucxn-80- pub.customer.com,192.6.4.189,Cannot collect data. The device returned no data from a required MIB.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;
|
...-> Cluster_Availability-->
Pub: CUCxn-80-pub
|
Unresponsive::Component= cucxn-80-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 1 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 5.0.0.0-2; DiscoveredFirstAt= 06-28-2012 15:58:37; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.189; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-28-2012 18:06:38; Default Event Name= Unresponsive; DescriptionURL=< http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;
|
...-> VM Availability
|
The virtual machine CUCxn-80-sub running on host 10.13.3.31 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3- vcent-10.13.3.31:ESX ON 2301156 (Event_Type=VmDisconnectedEvent)]
|
...-> VM Availability
|
The virtual machine CUCM-80-sub1 running on host 10.13.3.34 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3- vcent-10.13.3.34:ESX ON 2301145 (Event_Type=VmDisconnectedEvent)]
|
...-> VM Availability
|
The virtual machine CUCM-80-sub2 running on host 10.13.3.33 is Disconnected. Message KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3- vcent-10.13.3.33:ESX ON 2301136 (Event_Type=VmDisconnectedEvent)]
|
...-> VM Availability
|
The virtual machine CUP-80-pub running on host 10.13.3.31 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3-vcent-10.13.3.31:ESXON 2301155 (Event_Type=VmDisconnectedEvent)]
|
...-> VM Availability
|
The virtual machine CUCM-80-pub running on host 10.13.3.31 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3- vcent-10.13.3.31:ESX ON 2301158 (Event_Type=VmDisconnectedEvent)]
|
...-> VM Availability
|
The virtual machine CUCxn-80-pub running on host 10.13.3.32 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3- vcent-10.13.3.32:ESX ON 2301185 (Event_Type=VmDisconnectedEvent)]
|
...-> VoiceService
|
Meta event for Voice Service - C080
|
Next Steps
Step 1
Cross-launch to the domain manager UCSM to confirm that the chassis is powered off. Power on the chassis to clear the events.
UC20 - UCS Chassis Failure - CUCxn
This use case describes the events that Prime Central for HCS receives if the chassis hosting CUCxn nodes loses power. Prime Central for HCS performs SIA and RCA for this use case. The CUCxn VM is automatically brought up in another host if HA is enabled on the cluster. If HA is not configured for the cluster, CUCxn nodes stay down until the chassis is powered on.
For the following example, the same chassis hosts all UC VMs for customer 80.
Observed RC-EL Events
When the chassis powers off, numerous synthetic RCA events are observed, including UCS_Chassis_Fault, UCS_Blade_Avlblty, VC_Host_Avlblty, and UCS_BladeLinks. Eventually, only one synthetic RCA event, UCS_Chassis_Fault, remains as the root cause.
Table A-61 Observed Root Cause Events for UC20
Severity
|
EventTypeID
|
Summary
|
Critical
|
UCS_Chassis_Fault
|
Synthetic Event for UCS_Chassis_Fault group events from 10.13.2.10
|
Critical
|
OM_CUCM_OM_Connectivity
|
Synthetic Event for OM_CUCM_OM_Connectivity group events from CUCM-CL-C080-1
|
Critical
|
VC_Host_Avlblty
|
Synthetic Event for VC_Host_Avlblty group events from 10.13.3.31
|
Observed SI-EL Events
CUCM voice service impacts voice mail and presence. In this example, CUCM VMs are hosted in the same chassis and all voice, voice mail, and presence services are affected because of this dependency. If only a CUCxn cluster is on the failed chassis, only CUCxn service is affected. The following table presents the list of events observed during internal testing.
Table A-62 Observed Service Events for UC20
Severity
|
Summary
|
Critical
|
Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C080-1 is Bad.
|
Critical
|
Overall Attribute of the Customer_Presence_Service_Template tag of CUP-80-pub is Bad.
|
Critical
|
Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C080-1 is Bad.
|
Observed Other-EL Events
Prime Central for HCS does not analyze these events, but they could point to potential root causes for impacted services. The following table presents the list of events observed during internal testing.
Note
10.11.2.8, 10.11.2.9, and 10.11.2.10 are the UCS6140 side A, UCS6140 side B, and UCSM IP addresses, respectively. georedundancytemp-sa is the cluster name in the vCenter containing the customer C080 VMs.
Table A-63 Observed Other Events for UC20 (No HA)
Severity (S)/Customer (C)/Node
(N)
|
EventName (EN)/EventTypeId (ET)
|
Summary
|
• S = Major
• N = 10.13.2.8
|
• EN = fltAdaptorExtIfLinkDown
• ET = UCS_Adapter
|
Adapter uplink interface 3/1/1/1 link state: unavailable(FaultCode:fltAdaptorExtIfLinkDown,FaultIndex:
|
• S = Indeterminate
• N = 10.13.2.8
|
• EN = fltAdaptorUnitAdaptorReachability
• ET = default
|
Adapter 3/1/1 is unreachable(FaultCode:fltAdaptorUnitAdaptorReachability
|
• S = Major
• N = 10.13.2.10
|
• EN = fltEtherSwitchIntFIoSatelliteConnectionAbsent
• ET = UCS_PortsLinks
|
No link between IOM port 3/1/1 and fabric interconnect A:1/9(FaultCode:fltEtherSwitchIntFIoSatelliteConnectionAbsent
|
• S = Major
• N = 10.13.2.10
|
• EN = fltDcxVcMgmtVifDown
• ET = UCS_Mgmt_Link
|
IOM 3 / 1 (A) management VIF 3 down, reason None(FaultCode:fltDcxVcMgmtVifDown,FaultIndex
|
• S = Major
• N = 10.13.2.10
|
• EN = fltPortPIoLinkDown
• ET = UCS_Etherne
|
Ether port 10 on fabric interconnect A oper state: link-down, reason: Link failure or notconnected (FaultCode:fltPortPIoLinkDown,FaultIndex
|
• S = Major
• N = 10.13.2.8
|
• EN = fltEquipmentIOCardUnsupported Connectivity
• ET = default
|
IOM 3/1 (A) current connectivity does not match discovery policy: unsupported-connectivity (FaultCode:fltEquipmentIOCardUnsupported Connectivity, FaultIndex:3695902)
|
• S = Major
• N = 10.13.2.8
|
• EN = fltEquipmentIOCardUnsupportedConnectivity
• ET = default
|
IOM 3/1 (A) current connectivity does not match discovery policy: unsupported connectivity(FaultCode:fltEquipmentIOCardUnsupportedConnectivity,FaultIndex:3695902)
|
• S = Major
• N = 10.13.2.8
|
• EN = fltLsServerInaccessible
• ET = default
|
Service profile c3b1 cannot be accessed (FaultCode:fltLsServerInaccessible, FaultIndex:3695961)
|
• S = Warning
• C = C080
|
• EN = RTMTDataMissing
• ET = OM_CUCM_OM_Connectivity
|
RTMTDataMissing::Component= VECUCM- CL-C080-1; CallManagerList= 192.6.4.186,192.6.4.193,192.6.4.188,192.6.4.187; ReasonForRTMTDataMissing= Unable to communicate with RTMT on publisher; CustomerName= C080; Default Event Name= RTMTDataMissing; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=RTMTDataMissing >;
|
• S = Warning
• C = C080
|
• EN = RTMTDataMissing
• ET = Default
|
RTMTDataMissing::Component= cucxn-80-pub.customer.com; Name= cucxn-80-pub.customer.com; HostDescription= Hardware:VMware, 1 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 5.0.0.0-2; Reason= Error collecting RTMT data. Error reason: HTTP communication error; Default Event Name=RTMTDataMissing; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=RTMTDataMissing >;
|
The following table presents the list of events observed during internal testing, if HA is enabled on the cluster.
Table A-64 Observed Other Events for UC20 (HA)
Severity (S)/Customer (C)/Node
(N)
|
EventName (EN)/EventTypeId (ET)
|
Summary
|
• S = Warning
• N = geo-redundancytemp- sa
|
• EN = KVM_Cluster_Effective_CPU_Low
• ET = VC_Cluster_Resources
|
The effective CPU amount of the cluster is low on georedundancy- temp-sa. Message: KVM_Cluster_Effective_CPU_Low[(Percent_Effective_AND Percent_Effective_CPU<50) ON tb3-vcenter:hcm-es-itm-m2:VM ON tb3 (Percent_Effective_CPU=29)]
|
• S = Warning
• N = geo-redundancytemp- sa
|
• EN = KVM_Cluster_Effective_Mem_Low
• ET = VC_Cluster_Resources
|
The effective memory of the cluster amount is low on georedundancy- temp-sa. Message: KVM_Cluster_Effective_Mem_Low[(Percent_Effective AND Percent_Effective_Memory<50) ON tb3-vcenter:hcm-es-itm-m2:VM ON tb3 (Percent_Effective_Memory=30)]
|
• S = Warning
• C = C080
• N = CUP-80-pub
|
• EN = KVM_VM_RestartOnAlt_Host_Cisco
• ET = VC_VM_Restored
|
Virtual machine CUP-80-pub was restarted on 10.13.3.12 since 10.13.3.31 failed. Message: KVM_VM_RestartOnAlt_Host_Cisco[(Event_Type=N"VmRestartedOnAlternateHostEvent" ON VM:tb3- vcent-10.13.3.12:ESX ON 2312823 (Event_Type=VmRestartedOnAlternateHostEvent)]
|
• S = Warning
• C = C080
• N = CUCM-80-pub
|
• EN = KVM_VM_RestartOnAlt_Host_Cisco
• ET = VC_VM_Restored
|
Virtual machine CUCM-80- pub was restarted on 10.13.3.15 since 10.13.3.31 failed. Message: KVM_VM_RestartOnAlt_Host_Cisco[(Event_Type=N"VmRestartedOnAlternateHostEvent" ON VM:tb3- vcent-10.13.3.15:ESX ON 2312730 (Event_Type=VmRestartedOnAlternateHostEvent
|
• S = Warning
• C = C080
• N = CUCM-80-sub2
|
• EN = KVM_VM_RestartOnAlt_Host_Cisco
• ET = VC_VM_Restored
|
Virtual machine CUCM-80- sub2 was restarted on 10.13.3.12 since 10.13.3.31 failed. Message: KVM_VM_RestartOnAlt_Host_Cisco[(Event_Type=N"VmRestartedOnAlternateHostEvent" ON VM:tb3- vcent-10.13.3.12:ESX ON 2312744 (Event_Type=VmRestartedOnAlternateHostEvent)]
|
• S = Warning
• C = C080
• N = CUCM-80-sub1
|
• EN = KVM_VM_RestartOnAlt_Host_Cisco
• ET = VC_VM_Restored
|
Virtual machine CUCM-80- sub1 was restarted on 10.13.3.12 since 10.13.3.31 failed. Message: KVM_VM_RestartOnAlt_Host_Cisco[(Event_Type=N"VmRestartedOnAlternateHostEvent" ON VM:tb3- vcent-10.13.3.12:ESX ON 2312743 (Event_Type=VmRestartedOnAlternateHostEvent)]
|
• S = Warning
• C = C080
• N = CUCxn-80-pub
|
• EN = KVM_VM_RestartOnAlt_Host_Cisco
• ET = VC_VM_Restored
|
Virtual machine CUCxn-80- pub was restarted on 10.13.3.15 since 10.13.3.31 failed. Message: KVM_VM_RestartOnAlt_Host_Cisco[(Event_Type=N"VmRestartedOnAlternateHostEvent" ON VM:tb3- vcent-10.13.3.15:ESX ON 2312727 (Event_Type=VmRestartedOnAlternateHostEvent)].
|
Service Tree Event Overlay Location and Content
The following table presents the list of events observed during internal testing.
Table A-65 Observed Service Tree Events for UC20
Location
|
Summary
|
...-> Cluster_Availability -->
Node:CUP-80-pub
|
PerformancePollingStopped::Component= cup-80-pub.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT,cup-80- pub.customer.com,192.6.4.191,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;
|
...-> Cluster_Availability-->
Node:CUP-80-pub
|
Unresponsive::Component= cup-80-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.99.1.1.3.28; Description= Hardware:VMware, 4 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 4.0.0.0-44; DiscoveredFirstAt= 06-28-2012 15:58:29; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.191; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-28-2012 18:06:25; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;
|
...-> Cluster_Availability--> Sub: CUCM-80-sub1
|
Unresponsive::Component= cucm-80-sub1.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Linux release:2.6.18-194.26.1.el5PAE machine:i686; DiscoveredFirstAt= 06-28-2012 15:58:42; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.187; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-28-2012 18:06:34; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;
|
...-> Cluster_Availability--> Sub: CUCM-80-sub1
|
PerformancePollingStopped::Component= cucm-80-sub1.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT, cucm-80- sub1.customer.com,192.6.4.187,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;
|
...-> Cluster_Availability-->
Sub: CUCM-80-sub2
|
PerformancePollingStopped::Component= cucm-80-sub2.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT,cucm-80- sub2.customer.com,192.6.4.188,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;
|
...-> Cluster_Availability--> Sub: CUCM-80-sub2
|
Unresponsive::Component= cucm-80-sub2.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Linux release:2.6.18-194.26.1.el5PAE machine:i686; DiscoveredFirstAt= 06-28-2012 15:58:33; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.188; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-29-2012 12:09:39; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;
|
...-> Cluster_Availability--> Pub: CUCM-80-pub
|
PerformancePollingStopped::Component= cucm-80-pub.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT,cucm-80- pub.customer.com,192.6.4.186,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;
|
...-> Cluster_Availability--> Pub: CUCM-80-pub
|
Unresponsive::Component= cucm-80-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Linux release:2.6.18-194.26.1.el5PAE machine:i686; DiscoveredFirstAt= 06-28-2012 15:58:33;Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.186; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-29-2012 12:09:39; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;
|
...-> Cluster_Availability--> Sub: CUCxn-80-sub
|
Unresponsive::Component= cucxn-80-sub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 1 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 5.0.0.0-2; DiscoveredFirstAt= 06-28-2012 15:58:33; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.190;
IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-28-2012 18:06:45; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;
|
...-> Cluster_Availability--> Sub: CUCxn-80-sub
|
PerformancePollingStopped::Component= cucxn-80-sub.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT,cucxn-80- sub.customer.com,192.6.4.190,Cannot collect data. The device returned no data from a required MIB.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;
|
...-> Cluster_Availability--> Pub: CUCxn-80-pub
|
PerformancePollingStopped::Component= cucxn-80-pub.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT,cucxn-80- pub.customer.com,192.6.4.189,Cannot collect data. The device returned no data from a required MIB.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;
|
...-> Cluster_Availability--> Pub: CUCxn-80-pub
|
Unresponsive::Component= cucxn-80-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 1 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 5.0.0.0-2; DiscoveredFirstAt= 06-28-2012 15:58:37; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.189; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-28-2012 18:06:38; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;
|
...-> VM Availability
|
The virtual machine CUCxn-80-sub running on host 10.13.3.31 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3- vcent-10.13.3.31:ESX ON 2301156 (Event_Type=VmDisconnectedEvent)]
|
...-> VM Availability
|
The virtual machine CUCM-80-sub1 running on host 10.13.3.34 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3- vcent-10.13.3.34:ESX ON 2301145 (Event_Type=VmDisconnectedEvent)]
|
...-> VM Availability
|
The virtual machine CUCM-80-sub2 running on host 10.13.3.33 is Disconnected. Message KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3- vcent-10.13.3.33:ESX ON 2301136 (Event_Type=VmDisconnectedEvent)]
|
...-> VM Availability
|
The virtual machine CUP-80-pub running on host 10.13.3.31 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3-vcent-10.13.3.31:ESXON 2301155 (Event_Type=VmDisconnectedEvent)]
|
...-> VM Availability
|
The virtual machine CUCM-80-pub running on host 10.13.3.31 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3- vcent-10.13.3.31:ESX ON 2301158 (Event_Type=VmDisconnectedEvent)]
|
...-> VM Availability
|
The virtual machine CUCxn-80-pub running on host 10.13.3.32 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3- vcent-10.13.3.32:ESX ON 2301185 (Event_Type=VmDisconnectedEvent)]
|
...-> VoiceService
|
Meta event for Voice Service - C080
|
Next Steps
Step 1
Cross-launch to the domain manager UCSM to confirm that the chassis is powered off. Power on the chassis to clear the events.
UC21 - Insufficient Virtual Memory
This use case describes the events that Prime Central for HCS receives if a CUCM server runs out of virtual memory. This type of incident generates Service Impact (SI) events.
Observed RC-EL Events
None.
Observed SI-EL Events
CUCM voice service impacts voice mail and presence. The following table presents the list of events observed during internal testing.
Table A-66 Observed Service Events for UC21
Severity
|
Summary
|
Minor
|
Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C071-1 is Marginal.
|
Minor
|
Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C071-1 is Marginal.
|
Minor
|
Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C071-1 is Marginal.
|
Observed Other-EL Events
None.
Service Tree Event Overlay Location and Content
SIA events are overlaid on the Service Tree in the Service Availability view. The following table presents the list of events observed during internal testing.
Table A-67 Observed Service Tree Events for UC21
Location
|
Summary
|
...-> Voice Service
|
Meta event for Voice Service - C071
|
...-> Application Resources
|
LowAvailableVirtualMemory::Component= VMEM-cucm-71-pub.customer.com/ Memory; VmPercentageUsed= 84; LowAvailableVirtualMemoryThreshold= 25; Default Event Name=
LowAvailableVirtualMemory; DescriptionURL= < http://150.0.0.52:1741/ CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=LowAvailableVirtualMemory >;
|
...-> VM_Resources
|
The virtual machine guest memory usage is high on CUCM-71-pub. Message: KVM_VM_Guest_Memory_Util_High[(Guest_Util>40 ON VM:cisco-10.11.3.152:ESX ON CUCM-71-pub (Guest_Util=66)]
|
Next Steps
Step 1
Right-click the Synthetic RC event > Show Contained Events to display the corresponding raw events.
Step 2
Right-click the Raw event > Event Details and select Next Steps to display the following recommendation:
Check CUCM Windows Task Manager or the RTMT tool to verify insufficient memory. This event
may be caused by a memory leak. It is important to identify which process is using
excessive memory. After the process is identified, if you suspect a memory leak (for
example, if memory use for a process increases continually, or a process uses more memory
than it should), you may want to contact your support team.
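Reading the two fields in the sample event above (VmPercentageUsed and LowAvailableVirtualMemoryThreshold), the condition appears to be that available virtual memory, 100 minus VmPercentageUsed, has fallen below the configured threshold. The sketch below is that interpretation only, not the product's documented rule.

# Interpretation of the LowAvailableVirtualMemory fields shown above;
# illustrative only, not the product's monitoring code.
def low_virtual_memory(vm_percentage_used, threshold_pct):
    available_pct = 100.0 - vm_percentage_used
    return available_pct < threshold_pct

# Values from the sample event: 84% used, 25% available-memory threshold.
print(low_virtual_memory(84, 25))   # True  -> event raised
print(low_virtual_memory(60, 25))   # False -> 40% still available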
UC22 - CPU Utilization Problems
This use case describes the events that Prime Central for HCS receives if a CUCM server has a heavy load on its CPU. This type of incident generates Service Impact (SI) events.
Observed RC-EL Events
None.
Observed SI-EL Events
CUCM voice service impacts voice mail and presence.
Table A-68 Observed Service Events for UC22
Severity
|
Summary
|
Minor
|
Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C071-1 is Marginal.
|
Minor
|
Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C071-1 is Marginal.
|
Minor
|
Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C071-1 is Marginal.
|
Observed Other-EL Events
None.
Service Tree Event Overlay Location and Content
SIA events are overlaid on the Service Tree in the Service Availability view. The following table presents the list of events observed during internal testing.
Table A-69 Observed Service Tree Events for UC22
Location
|
Summary
|
...-> Application Resources
|
CPUPegging::Component= PROCcucm- 71-pub.customer.com/
_Total; PercentageCPU= 99; TopProcessesDetails= tomcat(5%);RisDC(1%);cmoninit(1%); CallProcessingNodeCpuPeggingThreshold= 90; Default Event Name= CPUPegging; DescriptionURL= < http://150.0.0.52:1741/ CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=CPUPegging >;
|
...-> Voice Service
|
Meta event for Voice Service - C071
|
...-> VM_Resources
|
CPU use high on CUCM-71-pub. Message: KVM_VM_CPU_Util_High[(Utilization>90) ON VM:cisco-10.11.3.152:ESX ON CUCM-71-pub (Utilization=93)]
|
Next Steps
Step 1
Right-click the Synthetic RC event > Show Contained Events to display the corresponding raw events.
Step 2
Right-click the Raw event > Event Details > Next Steps to display the following recommendation:
Check the Communications Manager Windows Task Manager or Real Time Monitoring Tool (RTMT)
to verify CPU high utilization. The most common cause is one or more processes that use
excessive CPU resources. The event has information on which process is using the most CPU.
After the process is identified, you may want to take action, which could include:
* Checking the trace setting for that process; using detailed trace level can take up
excessive CPU resources
* Checking for events, such as Code Yellow, and launching Operations Manager synthetic
tests, such as Dial Tone Test to see if there is any impact on call processing.
* You may want to take more drastic measures, such as stopping nonessential services.
For more information, see
•
http://www.cisco.com/en/US/products/sw/voicesw/ps556/ products_tech_note09186a00808ef0f4.shtml
•
http://www.cisco.com/en/US/products/sw/voicesw/ ps556/products_tech_note09186a00807f32e9.shtml.
For a video tutorial on troubleshooting the CPUPegging event, click the E-Learning button in online help.
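The CPUPegging event in Table A-69 carries both the measured CPU percentage and a TopProcessesDetails string. The sketch below shows one way to evaluate the threshold and pull the per-process percentages out of that string; the parsing and function name are illustrative assumptions, not the product's logic.

# Illustrative only: evaluate the CPUPegging condition and parse the
# TopProcessesDetails string from the sample event.
import re

def cpu_pegging(percentage_cpu, threshold, top_processes):
    processes = [(name, float(pct))
                 for name, pct in re.findall(r"([\w.-]+)\((\d+(?:\.\d+)?)%\)",
                                             top_processes)]
    return percentage_cpu > threshold, processes

pegged, top = cpu_pegging(99, 90, "tomcat(5%);RisDC(1%);cmoninit(1%)")
print(pegged)   # True
print(top)      # [('tomcat', 5.0), ('RisDC', 1.0), ('cmoninit', 1.0)]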
UC23 - Call Throttling Failures (Code Red)
This use case describes the events that Prime Central for HCS receives if a CUCM server has Call Throttling failures in the Code Red range. This type of incident generates Service Impact (SI) events.
Observed RC-EL Events
None.
Observed SI-EL Events
CUCM voice service impacts voice mail and presence.
The following table presents the list of events observed during internal testing.
Table A-70 Observed Service Events for UC23
Severity
|
Summary
|
Minor
|
Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C071-1 is Marginal.
|
Minor
|
Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C071-1 is Marginal.
|
Minor
|
Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C071-1 is Marginal.
|
Observed Other-EL Events
None.
Service Tree Event Overlay Location and Content
SIA events are overlaid on the Service Tree in the Service Availability view. The following table presents the list of events observed during internal testing.
Table A-71 Observed Service Tree Events for UC23
Location
|
Summary
|
...-> Voice Service
|
Meta event for Voice Service - C071
|
...-> Application Resources
|
Code Red::Component= 192.6.4.123- System; Code Yellow Duration= 300 NumberOfCallsRejectedDueToCallThrottling=0 TotalCodeYellowEntry=2
HighPriorityQueueDepth=0 NormalPriorityQueueDepth=0 LowPriorityQueueDepth=0 AppID=Cisco CallManager ClusterID=CUCM-CL-C071-1 NodeID=CUCM-71-pub : Unified CM has entered Code Red condition and will restart; Default Event Name= Code Red; DescriptionURL= < http://150.0.0.52:1741/ CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=CodeRed >;
|
Next Steps
Generally, repeated call throttling events require assistance. CUCM SDI and SDL trace files record call-throttling events and can provide useful information. Your support team may request these trace files for closer examination.
Step 1
Right-click the Synthetic RC event > Show Contained Events to display the corresponding raw events.
Step 2
Right-click the Raw event > Event Details and select Next Steps to display the following recommendation:
When CUCM enters a Code Red state, the CUCM service restarts and produces a memory dump
that may be helpful for analyzing the failure.
Note
Events are cleared automatically after 24 hours, or you can clear the event manually on Unified Operations Manager once you rectify the fault.
UC24 - Call Throttling Failures (Code Yellow)
This use case describes the events that Prime Central for HCS receives if a CUCM server has Call Throttling Failures in the Code Yellow Range. This type of incident generates Service Impact (SI) events.
Observed RC-EL Events
None.
Observed SI-EL Events
CUCM voice service impacts voice mail and presence.
The following table presents the list of events observed during internal testing.
Table A-72 Observed Service Events for UC24
Severity
|
Summary
|
Minor
|
Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C071-1 is Marginal.
|
Observed Other-EL Events
None.
Service Tree Event Overlay Location and Content
SIA events are overlaid on the Service Tree in the Service Availability view. The following table presents the list of events observed during internal testing.
Table A-73 Observed Service Tree Events for UC24
Location
|
Summary
|
...-> Voice Service
|
Meta event for Voice Service - C071
|
...-> Application Resources
|
Code Yellow::Component= 192.6.4.123-System; Exit Latency= 8; Expected Average Delay= 0; Total Code Yellow Entry= 4; Entry Latency= 20; Sample Size= 10; Default Event Name= Code Yellow; DescriptionURL= < http://150.0.0.52:1741/ CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=CodeYellow >;
|
Next Steps
Step 1
Right-click the Synthetic RC event > Show Contained Events to display the corresponding raw events.
Step 2
Right-click the Raw event > Event Details and select Next Steps to display the following recommendation:
While this event generates, check process CPU usage and memory usage. Check for call
bursts and an increased number of registered devices (phones, gateways, and so on)
generated.
Continuously monitor whether CUCM is out of the Code Yellow state. You can launch
synthetic tests, such as the Dial Tone Test, to check for any impact on call processing.
To try to circumvent the possibility of a Code Yellow event, consider the possible causes
of system overload, such as heavy call activity, low CPU availability for CUCM, routing
loops, disk I/O limitations, disk fragmentation, and so on, and investigate those
possibilities.
For more information, see Call Throttling and the Code Yellow State.
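The Code Yellow event in Table A-73 reports an Entry Latency, an Exit Latency, and an Expected Average Delay, which suggests hysteresis-style throttling: enter the state when delay rises above the entry latency, leave it only when delay falls below the exit latency. The sketch below is a simplified illustration of that idea under those assumptions, not the Unified CM implementation.

# Simplified hysteresis illustration based on the event fields above;
# not the Unified CM call-throttling implementation.
def next_state(in_code_yellow, expected_avg_delay_ms,
               entry_latency_ms=20, exit_latency_ms=8):
    if not in_code_yellow:
        return expected_avg_delay_ms > entry_latency_ms   # enter throttling
    return expected_avg_delay_ms > exit_latency_ms        # stay until below exit

state = False
for delay in (5, 25, 15, 10, 6):     # sampled expected average delays (ms)
    state = next_state(state, delay)
    print(delay, "->", "Code Yellow" if state else "normal")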
UC25 - Route List Exhausted
This use case describes the events that the Prime Central for HCS dashboard displays if calls on a Route List fail because no channels are available for call routing. Prime Central for HCS performs SIA for this use case. This event alerts a network operator that calls to a particular destination are failing and demands immediate attention to stop further failures. This can happen for several reasons: for example, a remote IP address is not reachable on a SIP/H.323 trunk, a gateway is not reachable, the call failed at the next call-processing node across an IP trunk or TDM trunk, or a TDM trunk lacked sufficient channels for the call.
Observed RC-EL Events
None.
Observed SI-EL Events
A Route List Exhausted failure may not impact voice mail and presence services, but by default Prime Central for HCS indicates impact on voicemail and presence services if Voice service is impaired.
The following table presents the list of events observed during internal testing.
Table A-74 Observed Service Events for UC25
Severity
|
Summary
|
Minor
|
Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C070-1 is Marginal.
|
Minor
|
Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C070-1 is Marginal.
|
Minor
|
Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C070-1 is Marginal.
|
Observed Other-EL Events
None.
Service Tree Event Overlay Location and Content
SIA events are overlaid on the Service Tree in the Service Availability view. The following table presents the list of events observed during internal testing.
Table A-75 Observed Service Tree Events for UC25
Location
|
Summary
|
...-> Voice Service
|
Meta event for Voice Service - C070
|
...-> Call Control --> Resources
|
CUST_C070_CLS_CUCM_CUCMCL- C070-1 RouteGroups(RG-AGGR); Default Event Name= Route List Exhausted; DescriptionURL= <
|
Next Steps
Step 1
Right-click Raw event > Event Details > Next Steps to display the following recommendation:
Check the RTMT Syslog Viewer for verification and further details. Assess whether
additional resources should be added in the indicated route.
UC26 - Media List Exhausted
This use case describes the events that Prime Central for HCS receives if calls fail because of unavailable media resources. Prime Central for HCS performs SIA for this use case. This event alerts network operators that calls requiring media resources such as Annunciator, Transcoder, Conference Bridge, and Music On Hold are failing.
Observed RC-EL Events
None.
Observed SI-EL Events
Media List Exhausted failures may not impact voice mail and presence services, but by default Prime Central for HCS indicates impact on voice mail and presence services if voice service is impaired. Table A-76 shows the Service Events observed during testing.
Table A-76 Observed Service Events for UC26
Severity
|
Summary
|
Minor
|
Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C070-1 is Marginal.
|
Minor
|
Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C070-1 is Marginal.
|
Minor
|
Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C070-1 is Marginal.
|
Observed Other-EL Events
None.
Service Tree Event Overlay Location and Content
SIA events are overlaid on the Service Tree in the Service Availability view. The following table presents the list of events observed during internal testing.
Table A-77 Observed Service Tree Events for UC26
Location
|
Summary
|
...-> Voice Service
|
Meta event for Voice Service - C070
|
...-> Call Control --> Resources
|
Media List Exhausted::Component= VE-CUCM-CL-C070-1- cucm-70-pub.customer.com-- NULL_LIST; Media Resource Type= Annunciator; Media Resource List Name= NULL_LIST; Default Event Name= Media List Exhausted; DescriptionURL= < http://150.0.0.52:1741/
CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=MediaListExhausted >;
|
Next Steps
Step 1
Right-click the Media List Exhausted event > Event Details > Next Steps to display the following recommendation:
Install additional resources to the indicated media resource list. This event indicates a
network failure or device failure.
UC27 - High Resource Utilization by all Customer Sites
This use case describes the events that Prime Central for HCS receives if high utilization of resources is encountered across all customer sites. Prime Central for HCS performs SIA for this use case.
Observed RC-EL Events
None.
Observed SI-EL Events
High resource utilization may not impact voice mail and presence services, but by default Prime Central for HCS indicates impact on voice mail and presence services if voice service is impaired.
The following table presents the list of events observed during internal testing.
Table A-78 Observed Service Events for UC27
Severity
|
Summary
|
Minor
|
Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C070-1 is Marginal.
|
Minor
|
Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C070-1 is Marginal.
|
Minor
|
Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C070-1 is Marginal.
|
Observed Other-EL Events
None.
Service Tree Event Overlay Location and Content
SIA events are overlaid on the Service Tree in the Service Availability view.
Table A-79 Observed Service Tree Events for UC27
Location
|
Summary
|
...-> Voice Service
|
Meta event for Voice Service - C070.
|
...-> Call Control --> Resources
|
HighResourceUtilization::Component= Transcoder-cucm-70- pub.customer.com; Threshold Value(%)= 10; Violation Value(%)= 20; Port or Resource Type= Transcoder; Default Event Name= HighResourceUtilization; DescriptionURL= < http://150.0.0.52:1741/
CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=HighResourceUtilization >;
|
Next Steps
Step 1
Right-click the HighResourceUtilization event > Event Details > Next Steps to display the following recommendation:
Assess whether you should install additional resources. While this event is generated,
click the event ID to view event details and identify which resource exceeded the
threshold.
Use the performance graph or RTMT (for CUCM) to monitor resource utilization in real time
and over the past 72 hours to verify high utilization and determine whether you need to
install additional resources.
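The HighResourceUtilization event in Table A-79 carries the resource type, the configured threshold, and the violating value, so the underlying comparison is a simple percentage check. The sketch below mirrors those fields for illustration only; it is not the product's monitoring code.

# Illustrative only: the percentage comparison implied by the
# HighResourceUtilization event fields above.
def high_resource_utilization(violation_pct, threshold_pct, resource_type):
    if violation_pct > threshold_pct:
        return ("HighResourceUtilization: %s at %s%% exceeds threshold %s%%"
                % (resource_type, violation_pct, threshold_pct))
    return None

print(high_resource_utilization(20, 10, "Transcoder"))
print(high_resource_utilization(5, 10, "Conference Bridge"))   # None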
UC28 - Memory, CPU, Disk Threshold Exceeded - CUCxn
This use case describes the events that Prime Central for HCS receives if there is a memory, CPU, or disk threshold exceeded issue on Unity Connection (CUCxn). Prime Central for HCS displays Service Impact (SI) events only for such incidents.
Observed RC-EL Events
None.
Observed SI-EL Events
These CUCxn issues affect Voice mail. The following table presents the list of events observed during internal testing.
Table A-80 Observed Service Events for UC28
Severity
|
Summary
|
Minor
|
Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C071-1 is Marginal.
|
Observed Other-EL Events
None.
Service Tree Event Overlay Location and Content
Table A-81 Observed Service Tree Events for UC28
Location | Summary
...-> ApplicationResources | InsufficientFreeMemory::Component= RAM-cucxn-71-pub.customer.com/1; RAMTotalSize= 3920 MB; FreePhysicalMemoryThreshold= 15; UsedRAM= 3488 MB; FreePhysicalMemoryInPercentage= 11 %; Default ;
...-> ApplicationResources | HighUtilization::Component= PSR-cucxn-71-pub.customer.com/0; ProcessorUtilizationThreshold= 90; CpuUtilFiveMin= 99 %; Default Event Name= HighUtilization; DescriptionURL= < ;
...-> ApplicationResources | InsufficientFreeHardDisk::Component= DISK-cucxn-71-pub.customer.com/9; HardDiskTotalSize= 99404 MB; FreeHardDiskThreshold= 15; FreeHardDiskInPercentage= 11 %; HardDiskUsed= 87824 MB; Default ;
Next Steps
Step 1
Right-click Synthetic RC event > Show Contained Events to display the corresponding raw events.
Step 2
Right-click Raw event > Event Details > Next Steps to display the following recommendation:
*Insufficient free memory
On Cisco IOS devices, run show memory to check memory utilization. Sometimes high memory utilization indicates a memory leak. Identify which process is using excessive memory and take action (including restarting the process). On other devices, close any unnecessary applications and stop the services that are not being used or are not required.
*High CPU utilization
Identify the processes using excessive CPU. You may want to take action, which can include restarting the identified process or processes.
*Insufficient free disk space
Uninstall unnecessary applications, delete temporary files to free disk space, and clean up unnecessary files.
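The thresholds reported in Table A-81 (15% free memory and 15% free disk) can be reproduced locally while you investigate. The following is a minimal sketch using only the Python standard library on a Linux host; it is illustrative and not a CUCxn or Prime Central for HCS tool, and the mount point, meminfo path, and threshold are assumptions.

import shutil

THRESHOLD_PCT = 15.0   # mirrors FreePhysicalMemoryThreshold / FreeHardDiskThreshold

def free_disk_percent(path="/"):
    usage = shutil.disk_usage(path)            # (total, used, free) in bytes
    return usage.free / usage.total * 100

def free_memory_percent(meminfo_path="/proc/meminfo"):
    fields = {}
    with open(meminfo_path) as fh:
        for line in fh:
            key, value = line.split(":", 1)
            fields[key] = int(value.strip().split()[0])   # values are in kB
    return fields["MemAvailable"] / fields["MemTotal"] * 100

if free_disk_percent("/") < THRESHOLD_PCT:
    print("InsufficientFreeHardDisk condition: clean up or extend the disk.")
if free_memory_percent() < THRESHOLD_PCT:
    print("InsufficientFreeMemory condition: check for leaking processes.")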
UC29 - Low Number Of Available Licenses - CUCxn
This use case describes the events that Prime Central for HCS displays if the number of available licenses in CUCxn is low. Prime Central for HCS displays Service Impact (SI) events for such incidents.
Observed RC-EL Events
None.
Observed SI-EL Events
Unavailable CUCxn licenses impact the voice mail service.
Table A-82 Observed Service Events for UC29
Severity | Summary
Minor | Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C072-1 is Marginal.
Observed Other-EL Events
None.
Service Tree Event Overlay Location and Content
Table A-83 Observed Service Tree Events for UC29
Location | Summary
...-> VoicemailResources | Subscriber License Violated::Component= 192.6.4.133-System; Detail= [CUCxn-72-sub]: An insufficient license violation has occurred. For details, open the Licensing screens on the Cisco Unity Connection Administration web pages. Tag LicSubscribersMax licenses 10 subscribers, but 502 are being used. Please reduce usage to match the licensed limits or purchase additional licensed functionality.
Next Steps
Step 1
Right-click Synthetic RC event > Show Contained Events to display the corresponding raw events.
Step 2
Right-click Raw event > Event Details > Next Steps to display the recommendation.
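The Detail string shown in Table A-83 carries the licensed and in-use subscriber counts. The following is a minimal sketch that extracts them so the overage is obvious before you reduce usage or purchase additional licenses; the regular expression assumes the wording used in that event summary.

import re

detail = ("Tag LicSubscribersMax licenses 10 subscribers, but 502 are being "
          "used. Please reduce usage to match the licensed limits or "
          "purchase additional licensed functionality.")

match = re.search(r"licenses (\d+) subscribers, but (\d+) are being used", detail)
if match:
    licensed, in_use = map(int, match.groups())
    print(f"Subscriber overage: {in_use - licensed} "
          f"(licensed {licensed}, in use {in_use})")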
UC30 - VM Resources - Memory
This use case describes the events that Prime Central for HCS receives if a VM exceeds its memory threshold. Prime Central for HCS performs SIA for this use case; no RCA is performed.
Observed RC-EL Events
None.
Observed SI-EL Events
A CUCM voice service impact also affects voice mail and presence. In this example, the memory threshold is exceeded on a CUCM VM, so the voice, voice mail, and presence services are all affected because of the service dependency. If the threshold is exceeded on a CUCxn or CUP VM instead, only the corresponding service (voice mail or presence) is impacted.
Table A-84 Observed Service Events for UC30
Severity | Summary
Minor | Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C070-1 is Marginal.
Minor | Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C070-1 is Marginal.
Minor | Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C070-1 is Marginal.
Observed Other-EL Events
None.
Service Tree Event Overlay Location and Content
Table A-85 Observed Service Tree Events for UC30
Location | Summary
...-> VM_Resources | The virtual machine guest memory usage is high on CUCM-70-sub. Message: KVM_VM_Guest_Memory_Util_High[(Guest_Util>ON VM:cisco-10.11.3.145:ESX ON CUCM-70-sub (Guest_Util=76)]
...-> Voice Service | Meta event for Voice Service - C070
Next Steps
Step 1
Check vCenter to confirm the alarm.
Step 2
Add additional memory resources, if required, to rectify the alarm.
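When you compare the alarm with what vCenter reports, it can help to pull the VM name and guest utilization out of the message text shown in Table A-85. The following is a minimal sketch; the regular expression assumes the KVM_VM_Guest_Memory_Util_High message format shown above.

import re

message = ("KVM_VM_Guest_Memory_Util_High[(Guest_Util>ON "
           "VM:cisco-10.11.3.145:ESX ON CUCM-70-sub (Guest_Util=76)]")

vm = re.search(r"ON (\S+) \(Guest_Util=(\d+)\)", message)
if vm:
    name, guest_util = vm.group(1), int(vm.group(2))
    print(f"{name}: guest memory utilization {guest_util}%")   # CUCM-70-sub: 76%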
UC31 - VM Resources - CPU
This use case describes the events that Prime Central for HCS receives if a VM exceeds its CPU utilization threshold. Prime Central for HCS performs SIA only for this use case; no RCA is performed.
Root Cause Events Observed
None.
Service Events Observed
A CUCM voice service impact also affects voice mail and presence. In this example, a CUCM VM exceeds the threshold, so the voice, voice mail, and presence services are all affected because of the service dependency. If the CPU threshold is violated on a CUCxn or CUP VM instead, only the corresponding service (voice mail or presence) is affected.
Table A-86 SI events Observed for UC31 - VM Resources - CPU
Severity | Summary
Minor | Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C071-1 is Marginal.
Minor | Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C071-1 is Marginal.
Minor | Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C071-1 is Marginal.
Other Events Observed
None.
Service Tree Event Overlay Location and Content
Table A-87 Service Tree Events Observed for UC31 - VM Resources - CPU
Location | Summary
...-> VM_Resources | CPU use high on CUCM-71-pub. Message: KVM_VM_CPU_Util_High[(Utilization>5) ON VM:cisco-10.11.3.152:ESX ON CUCM-71-pub (Utilization=10)]
...-> Voice Service | Meta event for Voice Service - C071
Next Steps
Check vCenter to confirm the alarm. Add additional CPU resources to the VM, as needed, to rectify the alarm.
UC32 - VM Resources - Disk usage
This use case describes the events that Prime Central for HCS receives if a VM exceeds its disk usage threshold. Prime Central for HCS performs SIA only for this use case; no RCA is performed.
Root Cause Events Observed
None.
Service Events Observed
A CUCM voice service impact also affects voice mail and presence. In this example, a CUCM VM exceeds the threshold, so the voice, voice mail, and presence services are all affected because of the service dependency. If disk usage exceeds the threshold on a CUCxn or CUP VM instead, only the corresponding service (voice mail or presence) is affected.
Table A-88 SI events Observed for UC32 - VM Resources - Disk usage
Severity | Summary
Minor | Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C070-1 is Marginal.
Minor | Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C070-1 is Marginal.
Minor | Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C070-1 is Marginal.
Other Events Observed
None.
Service Tree Event Overlay Location and Content
Table A-89 Service Tree Events Observed for UC32 - VM Resources - Disk usage
Location | Summary
...-> VM_Resources | The virtual machine disk partition free space is low on CUCM-70-sub. Message: KVM_VM_Disk_Free_Low [(Percent_Free>=0 AND Percent_Free<10) ON VM:cisco-10.11.3.145:ESX ON CUCM-70-sub (Percent_Free=0)]
...-> Voice Service | Meta event for Voice Service - C070
Next Steps
Check vCenter to confirm the alarm, and remove files in the VM to free up disk space.
UC33 - VM Resources - CPU ready time
This use case describes the events that Prime Central for HCS receives if a VM exceeds its CPU ready time threshold. Prime Central for HCS performs SIA only for this use case; no RCA is performed.
Root Cause Events Observed
None.
Service Events Observed
A CUCM voice service impact also affects voice mail and presence. In this example, the event is triggered on a CUCM VM, so the voice, voice mail, and presence services are all affected because of the service dependency. If the event is triggered on a CUCxn or CUP VM instead, only the corresponding service (voice mail or presence) is affected.
Table A-90 Service Events Observed for UC33 - VM Resources - CPU ready time
Severity | Summary
Minor | Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C070-1 is Marginal.
Minor | Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C070-1 is Marginal.
Minor | Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C070-1 is Marginal.
Other Events Observed
None.
Table A-91 Service Tree events Observed for UC33 - VM Resources - CPU ready time
Location | Summary
...-> VM_Resources | The CPU percent ready is high on CUCM-70-sub. Message: KVM_VM_CPU_Ready_High[(Percent_Rdy>5) ON VM:cisco-10.11.3.148:ESX ON CUCM-70-sub (Percent_Rdy=7)]
...-> VM_Resources | The CPU percent ready is high on CUCM-70-pub. Message: KVM_VM_CPU_Ready_High[(Percent_Rdy>5) ON VM:cisco-10.11.3.148:ESX ON CUCM-70-pub (Percent_Rdy=8)]
...-> Voice Service | Meta event for Voice Service - C070
Next Steps
Step 1
Check the vCenter DM to confirm the alarm and make sure the host is not overstressed by the VMs it hosts (see the conversion sketch after these steps).
Step 2
Manually move some VMs to other hosts, if needed.
Step 3
Configure DRS, if possible, to minimize the stress on any particular host in the cluster.
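The alarm text reports CPU ready as a percentage, while the vCenter performance charts report it as a summation in milliseconds per sampling interval. The following is a minimal sketch of the usual conversion; the 20-second interval matches vCenter real-time charts and must be adjusted for historical charts.

def cpu_ready_percent(ready_summation_ms, interval_seconds=20):
    """Convert a CPU ready summation (ms) into a percentage of the interval."""
    return ready_summation_ms / (interval_seconds * 1000.0) * 100.0

# Example: 1400 ms of ready time in a 20 s real-time sample is 7%, which
# matches the Percent_Rdy=7 value shown in Table A-91.
print(f"CPU ready: {cpu_ready_percent(1400):.1f} %")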
UC34 - VM Resources - Disk latency
This use case describes the events that Prime Central for HCS receives if a VM exceeds its disk latency threshold. Prime Central for HCS performs SIA only for this use case; no RCA is performed.
Root Cause Events Observed
None.
Service Events Observed
A CUCM voice service impact also affects voice mail and presence. In this example, the event is triggered on a CUCM VM, so the voice, voice mail, and presence services are all affected because of the service dependency. If the event is triggered on a CUCxn or CUP VM instead, only the corresponding service (voice mail or presence) is affected.
Table A-92 Service Events Observed for UC34 - VM Resources - Disk latency
Severity | Summary
Minor | Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C070-1 is Marginal.
Minor | Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C070-1 is Marginal.
Minor | Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C070-1 is Marginal.
Other Events Observed
None.
Table A-93 Service tree events Observed for UC34 - VM Resources - Disk latency
Location
|
Summary
|
...-> VM_Resources
|
Alarm ''Virtual Machine Disk Latency High'' on CUCM-70-sub changed from Red to Yellow. Message: KVM_VM_Disk_Latency [(Event_Type=N"AlarmStatusChangedEvent" AND Event_TextLIKEN"*Virtual*Machine*Disk*Latency*" ON VM:cisco-10.11.3.148:ESX ON 31448316 (Event_Type=AlarmStatusChangedEvent Event_Text=Alarm ''Virtual Machine Disk Latency High'' on CUCM-70-sub changed from Red to Yellow)]
|
...-> VM_Resources
|
Alarm ''Virtual Machine Disk Latency High'' on CUCM-70-pub changed from Red to Yellow. Message: KVM_VM_Disk_Latency [(Event_Type=N"AlarmStatusChangedEvent" AND Event_TextLIKEN"*Virtual*Machine*Disk*Latency*" ON VM:cisco-10.11.3.148:ESX ON 31448313 (Event_Type=AlarmStatusChangedEvent Event_Text=Alarm ''Virtual Machine Disk Latency High'' on CUCM-70-pub changed from Red to Yellow)]
|
...-> Voice Service
|
Meta event for Voice Service - C070.
|
Next Steps
Step 1
Check the vCenter DM to confirm the alarm and make sure the LUN is not overstressed by the VMs it hosts.
Step 2
Manually move some VMs to another LUN, if needed.
Step 3
Configure Storage DRS, if possible, to minimize the stress on any particular LUN.
UC35 - ASR1K - Chassis Failure
This use case describes the impact on offnet voice service in the event of a chassis failure in the physical ASR1K router.
Root Cause Events Observed
None
Service Events Observed
Offnet voice service events are listed in the following table:
Table A-94 Service Events Observed for UC35 - ASR1K - Chassis Failure
Severity | Summary
Critical | <Customer-Name> offnet voice service is bad (when all ASR1Ks under the CUBE-SP service are down).
Minor | <Customer-Name> offnet voice service is marginal (when one or more, but not all, ASR1Ks under the CUBE-SP service are down).
Other Events Observed
None
Next Steps
Step 1
Check vCenter DM to confirm the alarm and make sure the host is not overstressed by hosting VMs.
Step 2
Manually move some VMs to other hosts, if needed.
Step 3
Configure DRS, if possible, to minimize the stress on any particular hosts among the cluster.
UC36 - ASR1K - Power Supply/Fan Failure
This use case describes the impact on offnet voice service in the event of a power supply or fan failure in the physical ASR1K router.
Root Cause Events Observed
None
Service Events Observed
Offnet voice service events are listed in the following table:
Table A-96 Service Events Observed for UC36 - ASR1K - Power Supply/Fan Failure
Severity | Summary
Minor | <Customer-Name> offnet voice service is marginal.
Other Events Observed
None
UC37 - ASR1K - RP/ES/SPA Failure
This use case describes the impact on offnet voice service in the event of a router card failure that results in interface down events.
Root Cause Events Observed
None
Service Events Observed
Offnet voice service events are listed in the following table:
Table A-98 Service Events Observed for UC37 - ASR1K - RP/ES/SPA Failure
Severity | Summary
Minor | <Customer-Name> offnet voice service is marginal.
Other Events Observed
None
UC38 - SIP Trunk from Leaf to CUBE-SP - Loss of SIP Trunk
This use case describes the impact on offnet voice service when there is a loss of SIP trunk between CUBE-SP and leaf cluster without losing IP connectivity.
Root Cause Events Observed
None
Service Events Observed
Offnet voice service events are listed in the following table:
Table A-100 Service Events Observed for UC38 - SIP Trunk from Leaf to CUBE-SP - Loss of SIP Trunk
Severity | Summary
Critical | <Customer-Name> offnet voice service is bad (when all SIP trunks are out of service).
Minor | <Customer-Name> offnet voice service is marginal (when one or more SIP trunks are out of service).
Other Events Observed
None
UC39 - CUBE-SP Adjacency Status
This use case describes the impact on offnet voice service when adjacency is lost because the CUCM application is down.
Root Cause Events Observed
Root cause events are generated when CUCM is shut down. The following events are displayed as synthetic root cause events:
•
OM_CUCM_OM_Connectivity—Indicates that Cisco Unified Communications Manager is down.
•
OM_CUCM_NodeRestart—Indicates that Cisco Unified Communications Manager node has been restarted.
•
OM_CUCM_Processes—Indicates that Cisco Unified Communications Manager services are down.
Service Events Observed
Offnet voice service events are listed in the following table:
Table A-102 Service Events Observed for UC39 - CUBE-SP Adjacency Status
Severity | Summary
Critical | <Customer-Name> offnet voice service is bad (when all northbound or southbound adjacencies are down).
Minor | <Customer-Name> offnet voice service is marginal (when one or more northbound or southbound adjacencies are down).
Other Events Observed
None
UC40 - Voice Quality Degradation
This use case describes the impact on offnet voice service when MOS-based voice quality alarms of critical or major severity are raised.
Root Cause Events Observed
None
Service Events Observed
Offnet voice service events are listed in the following table:
Table A-104 Service Events Observed for UC40 - Voice Quality Degradation
Severity | Summary
Minor | <Customer-Name> offnet voice service is marginal.
Other Events Observed
None
UC41 - CUBE - SP Security Violation
This use case describes the impact on offnet voice service when CUBE-SP security events take place.
Root Cause Events Observed
None
Service Events Observed
Offnet voice service events are listed in the following table:
Table A-106 Service Events Observed for UC41 - CUBE-SP Security Violation
Severity | Summary
Minor | <Customer-Name> offnet voice service is marginal.
Other Events Observed
None
Table A-107 Service Tree events Observed for UC41 - CUBE-SP Security Violation
Location
|
Summary
|
...-> CUBE-SP Security
|
SourceAlert::Component= hcs-sbc/3/1/3/3.3.3.33/1/1.1.1.112/3/VPNID; SBCServiceName= hcs-sbc; VdbeId= 3; GateId= 1; FlowPairId= 3; LocalAddressType= dns; LocalAddress= 3.3.3.33; LocalPort= 1; RemoteAddressType= ipv4z; RemoteAddress= 1.1.1.112; RemotePort= 3; VpnId= VPNID; AlarmDescription= This is to alert that some unwanted data packets are received by the system from an undesirable IP/port.; Default Event Name= SourceAlert; DescriptionURL= < http://172.23.85.117:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=SourceAlert >;
|
...-> CUBE-SP Security
|
DynamicBlackList::Component= hcs-sbc/globalnew12/0.0.0.23/0; SBCServiceName= hcs-sbc; SubFamily= Blacklist VPN; VpnId= globalnew12; AddressType= ipv4; Address= 0.0.0.23; TransportType= UDP; PortNumber= 0; AlarmDescription= source is added to or removed from the blacklist table; Default Event Name= DynamicBlackList; DescriptionURL= < http://172.23.85.117:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=DynamicBlackList >;
|
UC42 - CUBE-SP Resource Performance Degradation
This use case describes the impact on offnet voice service when CUBE-SP resource performance degrades.
Root Cause Events Observed
None
Service Events Observed
Offnet voice service events are listed in the following table:
Table A-108 Service Events Observed for UC42 - CUBE-SP Resource Performance Degradation
Severity | Summary
Minor | <Customer-Name> offnet voice service is marginal.
Other Events Observed
None
UC43 - CUBE-SP SLA Violation
This use case describes the impact on offnet voice service when a CUBE-SP SLA violation takes place.
Root Cause Events Observed
None
Service Events Observed
Offnet voice service events are listed in the following table:
Table A-110 Service Events Observed for UC43 - CUBE-SP SLA Violation
Severity | Summary
Minor | <Customer-Name> offnet voice service is marginal.
Other Events Observed
None
Table A-111 Service Tree events Observed for UC43 - CUBE-SP SLA Violation
Location
|
Summary
|
...->CUBE-SP Performance
|
SLAViolation::Component= hcs-sbc/unknown/global/call setup; SBCServiceName= hcs-sbc; SLAPolicyAccountName= unknown; SLAPolicyScope= global; SLAPolicyLimit= 700; SLACurrentUsage= 700; SLAViolationEvent= call setup; SLAPolicyRestriction= allowable number of concurrent calls; AlarmDescription= Violation of Service Level Agreement as described in the policy tables; Default Event Name= SLAViolation; DescriptionURL= < http://172.23.85.117:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=SLAViolation >;
|
CP1 - CUCMIP Critical Processes Failure
This use case describes the events that the Prime Central for HCS dashboard displays if a critical process fails in CUCMIP. Prime Central for HCS generates RC and SI events for such incidents.
Observed RC-EL Events
When a critical process such as sipd is down, the CUCMIP server generates a ServiceDown event; CUOM processes it and transmits it to the Prime Central for HCS system.
Table A-112 Observed RC-EL Events for CP1
Severity | EventTypeId | Summary
Critical | OM_CUP_Processes | Synthetic Event for OM_CUP_Processes group events from cup-82-pub.customer.com
Observed SI-EL Events
If a critical process such as sipd is down, it affects the presence and IM features of soft clients such as Cisco Jabber.
Table A-113 Observed SI-EL Events for CP1
Severity | Summary
Minor | Overall Attribute of the Customer_Presence_Service_Template tag of CUP-87-pub is Marginal.
Observed Other-EL Events
None
Correlated Events in Service Tree
The following table shows the events that are overlaid in the Service Tree.
Table A-114 Observed Service Tree Events for CP1
Node | Summary
87-pub.customer.com | ServiceDown::Component= VS-cup-87-pub.customer.com/Cisco SIP Proxy; ProductName= Cisco SIP Proxy; CurrentState= Stopped; Default Event Name= ServiceDown; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=ServiceDown >;
Next Steps
Step 1
Stop the script that kills the sipd process.
Step 2
Go to the UC Serviceability page and start the Cisco SIP Proxy service.
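As an alternative to the Serviceability page, the service can also be checked and started from the platform administration CLI (utils service list / utils service start). The following is a minimal sketch using the third-party paramiko package; the host name and credentials are placeholders, and some platform versions expect an interactive shell rather than a one-shot command, so treat this only as an outline.

import paramiko

HOST, USER, PASSWORD = "cup-82-pub.customer.com", "admin", "changeme"
SERVICE = "Cisco SIP Proxy"

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(HOST, username=USER, password=PASSWORD)
try:
    _, stdout, _ = client.exec_command("utils service list")
    status = stdout.read().decode().replace(" ", "")
    if (SERVICE + "[STOPPED]").replace(" ", "") in status:
        _, stdout, _ = client.exec_command(f"utils service start {SERVICE}")
        print(stdout.read().decode())
    else:
        print(f"{SERVICE} is not reported as stopped.")
finally:
    client.close()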
CP2 - Application Cold Failure - CUCMIP
This use case describes the events that Prime Central for HCS receives if a CUCMIP server restarts. This type of incident generates RC and SI events.
Observed RC-EL Events
When the CUCMIP server restarts, synthetic RCA events of type OM_CUP_OM_Connectivity are observed.
Table A-115 Observed RC-EL Events for CP2
Severity | EventTypeId | Summary
Critical | OM_CUP_OM_Connectivity | Synthetic Event for OM_CUP_OM_Connectivity group events from cup-82-pub.customer.com
Observed SI-EL Events
A CUCMIP restart impacts the presence service. The following table shows the SI-EL events observed during testing.
Table A-116 Observed SI-EL Events for CP2
Severity | Summary
Minor | Overall Attribute of the Customer_Presence_Service_Template tag of CUP-82-pub is Marginal.
Observed Other-EL Events
None.
Correlated Events in Service Tree
The following table shows the events that are correlated in the Service Tree.
Table A-117 Observed Service Tree Events for CP2
Location
|
Summary
|
...-> Cluster_Availability--> Pub: CUP-82-pub
|
RTMTDataMissing::Component= cup-82-pub.customer.com; Name= cup-82-pub.customer.com; HostDescription= Hardware:VMware, 4 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 4.0.0.0-44; Reason= Error collecting RTMT data. Error reason: HTTP communication error; Default Event Name= RTMTDataMissing; DescriptionURL= < http://150.0.0.52:1741/ CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=RTMTDataMissing >;
|
...-> Cluster_Availability--> Pub: CUP-82-pub
|
PerformancePollingStopped::Component= cup-82-pub.customer.com; Error Message String= 16- Oct-2012 15:07:59 EDT,cup-82- pub.customer.com,192.6.4.210,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/ CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;
|
Next Steps
The system restarts automatically, and the events clear within 60 minutes.
CP3 - VMware VM Failure - CUCMIP
This use case describes the events that the Prime Central for HCS dashboard displays if a VM running CUCMIP fails abruptly. Prime Central for HCS generates RC and SI events for such incidents.
Observed RC-EL Events
When the VM shuts down, numerous synthetic RCA events are observed, including VC_VM_Avlblty and OM_CUP_OM_Connectivity. Eventually, Prime Central for HCS stabilizes on one root cause, VC_VM_Avlblty.
Table A-118 Observed RC-EL Events for CP3
Severity | EventTypeId | Summary
Critical | VC_VM_Avlblty | Synthetic Event for VC_VM_Avlblty group events from CUP-82-pub
Observed SI-EL Events
The VM failure impacts the presence service. The following table shows the SI-EL events observed during testing.
Table A-119 Observed SI-EL Events for CP3
Severity | Summary
Minor | Overall attribute of the Customer_Presence_Service_Template tag of CUP-82-pub is Marginal.
Observed Other-EL Events
None.
Service Tree Event Overlay Location and Content
SIA events are overlaid in the service tree view portlet. The following table shows service tree events observed during testing:
Table A-120 Observed Service Tree Events for CP3
Location
|
Summary
|
...-> Cluster_Availability--> Pub: CUP-82-pub
|
RTMTDataMissing::Component= cup-82-pub.customer.com; Name= cup-82-pub.customer.com; HostDescription= Hardware:VMware, 4 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 4.0.0.0-44; Reason= Error collecting RTMT data. Error reason: HTTP communication error; Default Event Name= RTMTDataMissing; DescriptionURL= < http://150.0.0.52:1741/ CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=RTMTDataMissing >;
|
...-> Cluster_Availability--> Pub: CUP-82-pub
|
PerformancePollingStopped::Component= cup-82-pub.customer.com; Error Message String= 16- Oct-2012 13:28:00 EDT,cup-82- pub.customer.com,192.6.4.210,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= <http://150.0.0.52:1741/ CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;
|
...-> Cluster_Availability--> Pub: CUP-82-pub
|
Unresponsive::Component= cup-82-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 4 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 4.0.0.0-44; DiscoveredFirstAt= 10-11-2012 13:53:45; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.210; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 10-16-2012 06:03:55; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/ CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;
|
...-> Cluster_Availability--> Pub: CUP-82-pub
|
The virtual machine CUP-82- pub running on 10.11.3.158 is offline. Message: KVM_VM_Powered_Off_Cisco_HCM [(Event_Type=N"VmPoweredOffEvent") ON VM:tb1-vcent-10.11.3.158:ESX ON 39450127 (Event_Type=VmPoweredOffEvent)]
|
Next Steps
Power on the affected VM.
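Powering the VM on is normally done from the vSphere client, but it can also be scripted. The following is a minimal sketch assuming vCenter access and the third-party pyVmomi package; the vCenter address, credentials, and VM name are placeholders taken from the example above, and certificate checking is disabled only for illustration.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

VCENTER, USER, PASSWORD = "10.11.3.158", "administrator@vsphere.local", "changeme"
VM_NAME = "CUP-82-pub"

ctx = ssl._create_unverified_context()          # lab-only: skip certificate checks
si = SmartConnect(host=VCENTER, user=USER, pwd=PASSWORD, sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next((v for v in view.view if v.name == VM_NAME), None)
    if vm is None:
        raise SystemExit(f"VM {VM_NAME} not found in the vCenter inventory")
    if vm.runtime.powerState != vim.VirtualMachinePowerState.poweredOn:
        task = vm.PowerOnVM_Task()              # asynchronous vSphere task
        print(f"Power-on task submitted for {VM_NAME}: {task.info.state}")
    else:
        print(f"{VM_NAME} is already powered on")
finally:
    Disconnect(si)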
CP4 - CUCMIP VMware ESXi Host Failure
This use case describes the events that Prime Central for HCS receives if the VMware ESXi host fails. This type of incident generates RC and SI events. The CUCMIP VM is automatically brought up in another host if HA is enabled on the cluster. If HA is not configured for the cluster, CUCMIP nodes stay down until the ESXi Host is recovered.
Observed RC-EL Events
When the ESXi host shuts down, numerous synthetic RCA events are observed, including VC_Host_Avlblty, VC_VM_Avlblty, UCS_BladeLinks, and OM_CUP_OM_Connectivity. Eventually, two synthetic RCA events remain outstanding: VC_Host_Avlblty and UCS_BladeLinks. These two are sibling events in the correlation tree. The following table shows the RC-EL events observed during testing.
Table A-121 Observed RC-EL Events for CP4
Severity | EventTypeId | Summary
Major | UCS_BladeLinks | Synthetic Event for UCS_BladeLinks group events from 10.13.2.8
Critical | VC_Host_Avlblty | Synthetic Event for VC_Host_Avlblty group events from 10.13.3.34
Observed SI-EL Events
CUCMIP service is impacted.
Table A-122 Observed SI-EL Events for CP4
Severity | Summary
Critical | Overall attribute of the Customer_Presence_Service_Template tag of CUP-80-pub is Bad.
Observed Other-EL Events
Prime Central for HCS does not analyze these events, but they could point to potential root causes for impacted services.
Table A-123 Observed Other-EL Events for CP4
Severity (S)/Node (N) | Summary
S = Major, N = 10.13.2.8 | Network Interface (ifIndex = 469775808) Down, should be Up (ifEntry.469775808)
S = Major, N = 10.13.2.9 | Network Interface (ifIndex = 469775824) Down, should be Up (ifEntry.469775824)
S = Major, N = 10.13.2.8 | Link Down (server 3/4, VNIC eth0)
S = Major, N = 10.13.2.9 | Link Down (server 3/4, VNIC eth1)
S = Major, N = 10.13.2.8 | Link Down (server 3/4, VHBA fc0)
S = Major, N = 10.13.2.9 | Link Down (server 3/4, VHBA fc1)
S = Minor, N = 10.13.2.8 | Fibre Channel Trunk Interface Down, Port Gracefully Shutdown (fcTrunkIfEntry.503317342.301)
S = Minor, N = 10.13.2.9 | Fibre Channel Trunk Interface Down, Port Gracefully Shutdown (fcTrunkIfEntry.503317343.302)
Service Tree Event Overlay Location and Content
SIA events are overlaid on the service tree view portlet.
Table A-124 Observed Service Tree Events for CP4
Location
|
Summary
|
...-> Cluster_Availability--> Node:CUP-80-pub
|
PerformancePollingStopped::Component= cup-80-pub.customer.com; Error Message String= 23- Oct-2012 15:15:41 EDT,cup-80- pub.customer.com,192.6.6.190,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/ CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;
|
...-> Cluster_Availability--> Node:CUP-80-sub
|
PerformancePollingStopped::Component= cup-80-sub.customer.com; Error Message String= 23- Oct-2012 15:15:41 EDT,cup-80- sub.customer.com,192.6.6.191,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/ CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;
|
...-> Cluster_Availability--> Node:CUP-80-pub
|
Unresponsive::Component= cup-80-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 4 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 4.0.0.0-44; DiscoveredFirstAt= 10-17-2012 18:48:37; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.6.190; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 10-23-2012 06:04:21; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/ CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;
|
...-> Cluster_Availability--> Node:CUP-80-sub
|
Unresponsive::Component= cup-80-sub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 4 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 4.0.0.0-44; DiscoveredFirstAt= 10-17-2012 19:37:09; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.6.191; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 10-23-2012 06:04:05; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/ CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;
|
...-> VM Availability
|
The virtual machine CUP-80- pub running on host 10.13.3.34 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM [(Event_Type=N"VmDisconnectedEvent") ON VM:tb3- vcent-10.13.3.34:ESX ON 2530557 (Event_Type=VmDisconnectedEvent)]
|
...-> VM Availability
|
The virtual machine CUP-80- sub running on host 10.13.3.34 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM [(Event_Type=N"VmDisconnectedEvent") ON VM:tb3- vcent-10.13.3.34:ESX ON 2530557 (Event_Type=VmDisconnectedEvent)]
|
Next Steps
Step 1
The VM on the host is automatically brought up in another host if HA is enabled.
Step 2
Follow the steps below to bring the original host back:
a.
Go to UCSM to bring the host back through the boot server.
b.
Manually power on the CUCMIP VM if HA is not configured and the VM is not configured to restart with the host.
c.
Drag and drop the CUCMIP VM to the original host if HA is enabled.
CP5 - CUCMIP UCS Blade Failure
This use case describes the events that Prime Central for HCS receives if a UCS blade fails. This type of incident generates RC and SI events. The CUCMIP VM is automatically brought up in another host if HA is enabled on the cluster. If HA is not configured for the cluster, CUCMIP nodes stay down until the UCS blade is replaced.
Observed RC-EL Events
Table A-125 Observed RC-EL Events for CP5
Severity | EventTypeId | Summary
Critical | UCS_Blade_Avlblty | Synthetic Event for UCS_Blade_Avlblty group events from 10.13.2.8
Observed SI-EL Events
CUCMIP service is impacted.
Table A-126 Observed SI-EL Events for CP5
Severity | EventTypeId | Summary
Critical | CUST_C080_CLS_CUP_CUP-80-pub | Overall Attribute of the Customer_Presence_Service_Template tag of CUP-80-pub is Bad.
Observed Other-EL Events
Prime Central for HCS does not analyze these events, but they could point to potential root causes for the impacted services.
Table A-127 Observed Other-EL Events for CP5
Severity (S)/Customer (C)/Node (N) | EventName (EN)/EventTypeId (ET) | Summary
S = Indeterminate, N = 10.13.2.8 | EN = fltAdaptorUnitAdaptorReachability, ET = default | Overall Attribute of the Customer_Presence_Service_Template tag of CUP-80-pub is Bad.
S = Major, N = 10.13.2.8 | EN = fltLsServerRemoved, ET = UCS_Blade_ServiceProfile | Service profile c3b4 underlying resource removed (FaultCode:fltLsServerRemoved, FaultIndex:3860463)
S = Major, N = 10.13.2.8 | EN = fltAdaptorExtIfLinkDown, ET = UCS_Adapter | Adapter uplink interface 3/4/1/2 link state: unavailable (FaultCode:fltAdaptorExtIfLinkDown, FaultIndex:3860496)
S = Major, N = 10.13.2.8 | | Network Interface (ifIndex = 469775808) Down, should be Up (ifEntry.469775808)
S = Major, N = 10.13.2.9 | | Network Interface (ifIndex = 469775824) Down, should be Up (ifEntry.469775824)
S = Major, N = 10.13.2.8 | | Link Down (server 3/4, VNIC eth0)
S = Major, N = 10.13.2.9 | | Link Down (server 3/4, VNIC eth1)
S = Major, N = 10.13.2.8 | | Link Down (server 3/4, VHBA fc0)
S = Major, N = 10.13.2.9 | | Link Down (server 3/4, VHBA fc1)
S = Major, N = 10.13.2.8 | | Network Interface (ifIndex = 520224960) Down, should be Up (ifEntry.520224960)
S = Major, N = 10.13.2.9 | | Network Interface (ifIndex = 520224960) Down, should be Up (ifEntry.520224960)
S = Minor, N = 10.13.2.8 | | Fibre Channel Trunk Interface Down, Port Gracefully Shutdown (fcTrunkIfEntry.503317342.301)
S = Minor, N = 10.13.2.9 | | Fibre Channel Trunk Interface Down, Port Gracefully Shutdown (fcTrunkIfEntry.503317343.302)
Service Tree Event Overlay Location and Content
SIA events are overlaid on the service tree view portlet.
Table A-128 Observed Service Tree Events for CP5
Location
|
Summary
|
...-> Cluster_Availability --> Node:CUP-80-pub
|
PerformancePollingStopped::Component= cup-80-pub.customer.com; Error Message String= 26- Oct-2012 16:16:28 EDT,cup-80- pub.customer.com,192.6.6.190,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/ CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;
|
...-> Cluster_Availability --> Node:CUP-80-sub
|
PerformancePollingStopped::Component= cup-80-sub.customer.com; Error Message String= 26- Oct-2012 16:16:28 EDT,cup-80- sub.customer.com,192.6.6.191,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/ CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;
|
...-> Cluster_Availability--> Node:CUP-80-pub
|
Unresponsive::Component= cup-80-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 4 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 4.0.0.0-44; DiscoveredFirstAt= 10-24-2012 19:08:32; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.6.190; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 10-26-2012 06:05:53; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/ CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;
|
...-> Cluster_Availability--> Node:CUP-80-sub
|
Unresponsive::Component= cup-80-sub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 4 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 4.0.0.0-44; DiscoveredFirstAt= 10-24-2012 22:21:22; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.6.191; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 10-26-2012 06:05:47; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/ CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;
|
...-> VM Availability
|
The virtual machine CUP-80- pub running on host 10.13.3.34 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM [(Event_Type=N"VmDisconnectedEvent") ON VM:tb3- vcent-10.13.3.34:ESX ON 2530557 (Event_Type=VmDisconnectedEvent)]
|
...-> VM Availability
|
The virtual machine CUP-80- sub running on host 10.13.3.34 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM [(Event_Type=N"VmDisconnectedEvent") ON VM:tb3- vcent-10.13.3.34:ESX ON 2530558 (Event_Type=VmDisconnectedEvent)]
|
Next Steps
Step 1
The VM on the host is automatically brought up in another host if HA is enabled.
Step 2
Bring the original host back by using the following steps:
a.
Troubleshoot and resolve the blade issue.
b.
Manually power on the CUCMIP VM if HA is not enabled and the VM is not configured to restart with the host.
c.
Drag and drop the CUCMIP VM to the original host if HA is enabled.
CP6 - CUCMIP UCS Chassis Failure
This use case describes the events that Prime Central for HCS receives if the chassis hosting CUCM, CUCxn, and CUCMIP nodes loses power. This type of incident generates RC and SI events. The CUCMIP VM is automatically brought up in another host if HA is enabled on the cluster. If HA is not configured for the cluster, CUCMIP nodes stay down until the chassis is powered on.
In the following example, the same chassis hosts all UC VMs for customer 80.
Observed RC-EL Events
While the chassis powers off and on, numerous synthetic RCA events may be observed, including UCS_Chassis_Fault, UCS_Blade_Avlblty, VC_Host_Avlblty, and UCS_BladeLinks. The UCS_Chassis_Fault synthetic RCA event is the root cause. Additional root cause events were observed during our testing because of the issues outlined in the note following the table.
Table A-129 Observed RC-EL Events for CP6
Severity | EventTypeId | Summary
Critical | UCS_Chassis_Fault | Synthetic Event for UCS_Chassis_Fault group events from 10.13.2.8
Critical | OM_CUCM_OM_Connectivity | Synthetic Event for OM_CUCM_OM_Connectivity group events from CUCM-CL-C080-1
Critical | VC_Host_Avlblty | Synthetic Event for VC_Host_Avlblty group events from 10.13.3.33
Note
Event OM_CUCM_OM_Connectivity shows up as a root cause event because the cluster-level event does not participate in the event correlation dependency tree in the current release. VC_Host_Avlblty shows up as a root cause event because of DDTS CSCuc06575 (Some VC_Host_Avlblty events remained as root cause during chassis failure).
Observed SI-EL Events
CUCM voice service impacts voice mail and presence. In this example, CUCxn and CUCMIP VMs are hosted in the same chassis, and all voice, voice mail, and presence services are affected.
Table A-130 Observed SI-EL Events for CP6
Severity (S)/Customer (C)/Node
(N)
|
EventName (EN)/EventTypeId (ET)
|
Summary
|
• S = Major
• N = 10.13.2.8
|
• EN = fltAdaptorExtIfLinkDown
• ET = UCS_Adapter
|
Adapter uplink interface 3/1/1/2 link state: unavailable (FaultCode:fltAdaptorExtIfLinkDown, FaultIndex:3711164)
|
• S = Indeterminate
• N = 10.13.2.8
|
• EN = fltAdaptorUnitAdaptorReachability
• ET = default
|
Adapter 3/1/1 is unreachable (FaultCode:fltAdaptorUnitAdaptorReachability, FaultIndex:3695955)
|
• S = Major
• N = 10.13.2.8
|
• EN = fltEtherSwitchIntFIoSatelliteConnection Absent
• ET = UCS_PortsLinks
|
No link between IOM port 3/1/1 and fabric interconnect A:1/9 (FaultCode:fltEtherSwitchIntFIoSatelliteConnection Absent, FaultIndex:3468974)
|
• S = Major
• N = 10.13.2.8
|
• EN = fltDcxVcMgmtVifDown
• ET = UCS_Mgmt_Link
|
IOM 3 / 1 (A) management VIF 3 down, reason None (FaultCode:fltDcxVcMgmtVifDown, FaultIndex:3468977)
|
• S = Major
• N = 10.13.2.8
|
• EN = fltPortPIoLinkDown
• ET = UCS_Etherne
|
ether port 10 on fabric interconnect A oper state: link-down, reason: Link failure or notconnected(FaultCode:fltPortPIoLinkDown, FaultIndex:3468975)
|
• S = Major
• N= 10.13.2.8
|
• EN = fltEquipmentIOCardUnsupported Connectivity
• ET = default
|
IOM 3/1 (A) current connectivity does not match discovery policy: unsupported-connectivity (FaultCode: fltEquipmentIOCardUnsupportedConnectivity, FaultIndex:3695902)
|
• S = Major
• N = 10.13.2.8
|
• EN = fltEquipmentIOCardUnsupported Connectivity
• ET = default
|
IOM 3/1 (A) current connectivity does not match discovery policy: unsupported-connectivity (FaultCode: fltEquipmentIOCardUnsupportedConnectivity, FaultIndex:3695902)
|
• S = Major
• N= 10.13.2.8
|
• EN = fltLsServerInaccessible
• ET = default
|
Service profile c3b1 cannot be accessed (FaultCode:fltLsServerInaccessible,
FaultIndex:3695961)
|
• S = Warning
• C = C080
|
• EN = RTMTDataMissing
• ET = OM_CUCM_OM_Connectivity
|
RTMTDataMissing::Component= VECUCM- CL-C080-1; CallManagerList= 192.6.4.186,192.6.4.193,192.6.4.188,192.6.4.187; ReasonForRTMTDataMissing= Unable to communicate with RTMT
on publisher; CustomerName= C080; Default Event Name= RTMTDataMissing; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=RTMTDataMissing >;
|
• S = Warning
• C = C080
|
• EN = RTMTDataMissing
• ET = Default
|
RTMTDataMissing::Component= cucxn-80-pub.customer.com; Name= cucxn-80-pub.customer.com; HostDescription= Hardware:VMware, 1 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 5.0.0.0-2; Reason= Error collecting RTMT data. Error reason: HTTP communication error; Default Event Name= RTMTDataMissing; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=RTMTDataMissing >;
|
For a few instances, the chassis failure generated UCS_Chassis_Avlblty events (EventTypeId) from the fltEquipmentChassisPowerProblem fault, and these were marked as `Unknown' rather than as `Root Cause' or `Symptom' events.
Service Tree Event Overlay Location and Content
Table A-131 Observed Service Tree Events for CP6
Location
|
Summary
|
...-> Cluster_Availability -->
Node:CUP-80-pub
|
PerformancePollingStopped::Component= cup-80-sub.customer.com; Error Message String= 25-Oct-2012 16:16:33 EDT,cup-80- sub.customer.com,192.6.6.191,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;
|
...-> Cluster_Availability-->
Sub: CUCM-80-sub2
|
PerformancePollingStopped::Component= cucm-80-sub2.customer.com; Error Message String= 25-Oct-2012 16:16:33 EDT,cucm-80- sub2.customer.com,192.6.6.192,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;
|
...-> Cluster_Availability-->
Pub: CUCM-80-pub
|
PerformancePollingStopped::Component= cucm-80-pub.customer.com; Error Message String= 25-Oct-2012 16:16:33 EDT,cucm-80- pub.customer.com,192.6.6.186,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;
|
...-> Cluster_Availability-->
Sub: CUCxn-80-sub
|
PerformancePollingStopped::Component= cucxn-80-sub.customer.com; Error Message String= 25-Oct-2012 16:16:33 EDT,cucxn-80- sub.customer.com,192.6.4.190,Cannot collect data. The device returned no data from a required MIB.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;
|
...-> Cluster_Availability-->
Pub: CUCxn-80-pub
|
PerformancePollingStopped::Component= cucxn-80-pub.customer.com; Error Message String= 25-Oct-2012 16:16:33 EDT,cucxn-80- pub.customer.com,192.6.4.189,Cannot collect data. The device returned no data from a required MIB.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;
|
...-> Cluster_Availability--> Node:CUP-80-pub
|
Unresponsive::Component= cup-80-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 4 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144MB Memory: Software:UCOS 4.0.0.0-44; DiscoveredFirstAt= 10-24-2012 19:08:32; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.6.190; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 10-25-2012 06:06:03; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;
|
...-> Cluster_Availability--> Sub: CUCM-80-sub1
|
Unresponsive::Component= cucm-80-sub1.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Linux release:2.6.18-194.26.1.el5PAE machine:i686; DiscoveredFirstAt= 10-17-2012 18:49:30; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.6.187; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 10-24-2012 06:04:52; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;
|
...-> Cluster_Availability--> Sub: CUCM-80-sub2
|
Unresponsive::Component= cucm-80-sub2.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Linux release:2.6.18-194.26.1.el5PAE machine:i686; DiscoveredFirstAt= 10-24-2012 23:13:45; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.6.192; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 10-25-2012 06:02:42; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;
|
...-> Cluster_Availability--> Pub: CUCM-80-pub
|
Unresponsive::Component= cucm-80-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Linux release:2.6.18-194.26.1.el5PAE machine:i686; DiscoveredFirstAt= 10-17-2012 18:49:27; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.6.186; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 10-25-2012 06:03:58; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;
|
...-> Cluster_Availability--> Sub: CUCxn-80-sub
|
Unresponsive::Component= cucxn-80-sub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 1 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 5.0.0.0-2; DiscoveredFirstAt= 10-24-2012 17:05:42; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.190; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 10-25-2012 05:06:03; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;
|
...-> Cluster_Availability--> Pub: CUCxn-80-pub
|
Unresponsive::Component= cucxn-80-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 1 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 5.0.0.0-2; DiscoveredFirstAt= 10-24-2012 19:05:42; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.6.188; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 10-25-2012 06:06:03; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;
|
...-> VM Availability
|
The virtual machine CUCxn-80-sub running on host 10.13.3.31 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM [(Event_Type=N"VmDisconnectedEvent") ON VM:tb3-vcent-10.13.3.31:ESX ON 2301156 (Event_Type=VmDisconnectedEvent)]
|
...-> VM Availability
|
The virtual machine CUCM-80-sub1 running on host 10.13.3.32 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM [(Event_Type=N"VmDisconnectedEvent") ON VM:tb3-vcent-10.13.3.32:ESX ON 2529034 (Event_Type=VmDisconnectedEvent)]
|
...-> VM Availability
|
The virtual machine CUCM-80-sub2 running on host 10.13.3.32 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM [(Event_Type=N"VmDisconnectedEvent") ON VM:tb3-vcent-10.13.3.32:ESX ON 2529036 (Event_Type=VmDisconnectedEvent)]
|
...-> VM Availability
|
The virtual machine CUP-80-pub running on host 10.13.3.34 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM [(Event_Type=N"VmDisconnectedEvent") ON VM:tb3-vcent-10.13.3.34:ESX ON 2529027 (Event_Type=VmDisconnectedEvent)]
|
...-> VM Availability
|
The virtual machine CUCM-80-pub running on host 10.13.3.32 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM [(Event_Type=N"VmDisconnectedEvent") ON VM:tb3-vcent-10.13.3.32:ESX ON 2529032 (Event_Type=VmDisconnectedEvent)]
|
...-> VM Availability
|
The virtual machine CUCxn-80-pub running on host 10.13.3.33 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM [(Event_Type=N"VmDisconnectedEvent") ON VM:tb3-vcent-10.13.3.33:ESX ON 2529018 (Event_Type=VmDisconnectedEvent)]
|
...-> CUP-80-pub-> VoiceService
|
Meta event for CUP Voice Service - C080
|
...-> CUCxn-CL-C080-1-> VoiceService
|
Meta event for CUCxn Voice Service - C080
|
Next Steps
Step 1
Cross-launch to UCSM to confirm that the chassis is powered off.
Step 2
Power on the chassis to clear the events.
CP7 - Application Resources Degradation - CUCMIP
This use case describes the events that are generated if the threshold for available hard disk space is crossed. This type of incident generates RC and SI events.
Observed RC-EL Events
The following table shows RC-EL events observed during testing.
Table A-132 Observed RC-EL Events for CP7
Severity | EventTypeID | Summary
Minor | OM_CUP_App_Resources | Synthetic Event for OM_CUP_App_Resources group events from cup-82-pub.customer.com
Observed SI-EL Events
Table A-133 Observed SI-EL Events for CP7
Severity | Summary
Minor | Overall Attribute of the Customer_Presence_Service_Template tag of CUP-82-pub is Marginal.
Observed Other-EL Events
None.
Service Tree Event Overlay Location and Content
SIA events are overlaid on the service tree view portlet. Table A-134 shows the service tree events observed during testing.
Table A-134 Observed Service Tree Events for CP7
Location | Summary
...-> Cluster_Availability--> Pub: CUP-82-pub | InsufficientFreeHardDisk::Component= DISK-cup-82-pub.customer.com/3; HardDiskTotalSize= 19280 MB; FreeHardDiskThreshold= 15; FreeHardDiskInPercentage= 0 %; HardDiskUsed= 19268 MB; Default Event Name= InsufficientFreeHardDisk; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=InsufficientFreeHardDisk >;
Next Steps
•
Manually remove the dummy file from the CUCMIP server.
•
The events clear automatically; however, some take up to 60 minutes.
•
If a different type of OS failure occurs, other recovery steps are required.
CP11 - IM Resources Exceeded - CUCMIP
This use case describes the events that the Prime Central for HCS dashboard displays if a VM running CUCMIP exceeds the threshold value for the number of TextConferenceRooms opened via the Jabber client. Prime Central for HCS generates RC and SI events for such incidents.
Observed RC-EL Events
When the threshold value is reached, numerous synthetic RCA events are observed, including VC_VM_Avlblty and OM_CUP_IM_Resources.
Table A-135 Observed RC-EL Events for CP11
Severity | EventTypeID | Summary
Critical | OM_CUP_IM_Resources | Synthetic Event for OM_CUP_IM_Resources group events from cup-82-pub.customer.com
Observed SI-EL Events
Table A-136 Observed SI-EL Events for CP11
Severity | Summary
Critical | Overall Attribute of the Customer_Presence_Service_Template tag of CUP-82-pub is Marginal.
Observed Other-EL Events
None.
Service Tree Event Overlay Location and Content
SIA events are overlaid on the service tree view portlet. Table A-137 shows the service tree events observed during testing.
Table A-137 Observed Service Tree Events for CP11
Location | Summary
...-> Cluster_Availability--> Pub: CUP-82-pub | TextConferenceRoomsExceeded::Component= TextConferenceRooms-cup-82-pub.customer.com; Threshold Value= 3; Violation Value= 5; Default Event Name= TextConferenceRoomsExceeded; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=TextConferenceRoomsExceeded >;
Next Steps
Close chat rooms until the number of open chat rooms falls below the threshold value.
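The event fields shown in Table A-137 make it straightforward to work out how many rooms must be closed. The following is a minimal sketch using the example values from that event (threshold 3, violation 5); it is illustrative only.

threshold_value = 3     # configured TextConferenceRooms threshold
violation_value = 5     # rooms currently open, as reported by the event

rooms_to_close = max(0, violation_value - threshold_value + 1)
print(f"Close at least {rooms_to_close} chat room(s) "
      f"to fall below the threshold of {threshold_value}.")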