Table Of Contents
System Troubleshooting
Introduction
System Events and Alarms
SYSTEM (1)
SYSTEM (2)
SYSTEM (3)
SYSTEM (4)
SYSTEM (5)
SYSTEM (6)
SYSTEM (7)
SYSTEM (8)
SYSTEM (9)
SYSTEM (10)
SYSTEM (11)
SYSTEM (12)
SYSTEM (13)
SYSTEM (14)
SYSTEM (15)
Monitoring System Events
Test Report - System (1)
Inter-Process Communication Queue Read Failure - System (2)
Inter-Process Communication Message Allocate Failure - System (3)
Inter-Process Communication Message Send Failure - System (4)
Unexpected Inter-Process Communication Message Received - System (5)
Index List Insert Error - System (6)
Index List Remove Error - System (7)
Thread Creation Failure - System (8)
Timer Start Failure - System (9)
Index Update Registration Error - System (10)
Index Table Add-Entry Error - System (11)
Software Error - System (12)
Multiple Readers and Multiple Writers Maximum Q Depth Reached - System (13)
Multiple Readers and Multiple Writers Queue Reached Low Queue Depth - System (14)
Multiple Readers and Multiple Writers Throttle Queue Depth Reached - System (15)
Troubleshooting System Alarms
Inter-Process Communication Queue Read Failure - System (2)
Inter-Process Communication Message Allocate Failure - System (3)
Inter-Process Communication Message Send Failure - System (4)
Index List Insert Error - System (6)
Index List Remove Error - System (7)
Thread Creation Failure - System (8)
Index Update Registration Error - System (10)
Index Table Add Entry Error - System (11)
Software Error - System (12)
Multiple Readers and Multiple Writers Maximum Q Depth Reached - System (13)
Multiple Readers and Multiple Writers Queue Reached Low Queue Depth - System (14)
Multiple Readers and Multiple Writers Throttle Queue Depth Reached - System (15)
System Troubleshooting
Revised: July 22, 2009, OL-8000-32
Introduction
This chapter provides the information needed to monitor and troubleshoot System events and alarms. This chapter is divided into the following sections:
• System Events and Alarms - Provides a brief overview of each System event and alarm.
• Monitoring System Events - Provides the information needed to monitor and correct System events.
• Troubleshooting System Alarms - Provides the information needed to troubleshoot and correct System alarms.
System Events and Alarms
This section provides a brief overview of the System events and alarms for the Cisco BTS 10200 Softswitch in numerical order. Table 12-1 lists all of the System events and alarms by severity.
Note
Click the System message number in Table 12-1 to display information about the event or alarm.
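As a quick triage aid, the severities listed in the per-message tables of this chapter can be collected into a small lookup. The sketch below is illustrative only: the names `SYSTEM_SEVERITY`, `SEVERITY_ORDER`, `group_by_severity`, and `is_alarm` are hypothetical and are not part of any Cisco tool, and the display ordering of severities is an assumption. The event/alarm split follows this chapter, where System (1), (5), and (9) are handled as events and the remaining messages as alarms.

```python
from collections import defaultdict

# Severity of each System message, transcribed from the tables in this chapter.
SYSTEM_SEVERITY = {
    1: "INFO",
    2: "MINOR",
    3: "MINOR",
    4: "MINOR",
    5: "WARNING",
    6: "MINOR",
    7: "MINOR",
    8: "MAJOR",
    9: "WARNING",
    10: "MINOR",
    11: "MINOR",
    12: "MAJOR",
    13: "CRITICAL",
    14: "MINOR",
    15: "MAJOR",
}

# Display order, most severe first (an assumption made for this sketch).
SEVERITY_ORDER = ["CRITICAL", "MAJOR", "MINOR", "WARNING", "INFO"]


def group_by_severity(severity_by_number):
    """Return {severity: [message numbers]} with numbers in ascending order."""
    groups = defaultdict(list)
    for number in sorted(severity_by_number):
        groups[severity_by_number[number]].append(number)
    return dict(groups)


def is_alarm(number):
    """INFO and WARNING messages are events; MINOR and above are alarms."""
    return SYSTEM_SEVERITY[number] not in ("INFO", "WARNING")


if __name__ == "__main__":
    groups = group_by_severity(SYSTEM_SEVERITY)
    for severity in SEVERITY_ORDER:
        print(severity, groups.get(severity, []))
```

Grouping this way makes it easy to see which messages belong in the alarm troubleshooting section and which only need to be monitored.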
SYSTEM (1)
For additional information, refer to the "Test Report - System (1)" section.
DESCRIPTION | Test Report
SEVERITY | Information (INFO)
THRESHOLD | 100
THROTTLE | 0
PRIMARY CAUSE | This is a test report for the new "SYSTEM" category.
PRIMARY ACTION | No action is required.
SYSTEM (2)
To troubleshoot and correct the cause of the alarm, refer to the "Inter-Process Communication Queue Read Failure - System (2)" section.
DESCRIPTION | Inter-Process Communication Queue Read Failure
SEVERITY | MINOR
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Queue Name - STRING [20]; Location Tag - STRING [30]
PRIMARY CAUSE | There is a problem with inter-process communication (IPC).
PRIMARY ACTION | If the problem persists, contact the Cisco Technical Assistance Center (TAC).
Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
SYSTEM (3)
To troubleshoot and correct the cause of the alarm, refer to the "Inter-Process Communication Message Allocate Failure - System (3)" section.
DESCRIPTION | Inter-Process Communication Message Allocate Failure
SEVERITY | MINOR
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Requested Size - TWO_BYTES; Error Code - FOUR_BYTES; Location Tag - STRING [30]
PRIMARY CAUSE | There is a system error or there is not enough free memory left to allocate a message buffer.
PRIMARY ACTION | If the problem persists, contact Cisco TAC.
Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
SYSTEM (4)
To troubleshoot and correct the cause of the alarm, refer to the "Inter-Process Communication Message Send Failure - System (4)" section.
DESCRIPTION | Inter-Process Communication Message Send Failure
SEVERITY | MINOR
THRESHOLD | 50
THROTTLE | 0
DATAWORDS | Error Code - FOUR_BYTES; Destination Process - FOUR_BYTES; Message Number - FOUR_BYTES; Location Tag - STRING [30]
PRIMARY CAUSE | The process for which the message is intended is not running.
PRIMARY ACTION | Check that all components and processes are running. Attempt to restart any component or process that is not running.
SECONDARY CAUSE | An internal error has occurred.
SECONDARY ACTION | If the problem persists, contact Cisco TAC.
Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
SYSTEM (5)
To monitor and correct the cause of the event, refer to the "Unexpected Inter-Process Communication Message Received - System (5)" section.
DESCRIPTION | Unexpected Inter-Process Communication Message Received
SEVERITY | WARNING
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Source Process Type - ONE_BYTE; Source Thread Type - ONE_BYTE; Message Number - TWO_BYTES; Location Tag - STRING [30]
PRIMARY CAUSE | The process reporting the event is receiving messages it is not expecting.
PRIMARY ACTION | Contact Cisco TAC.
Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
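The DATAWORDS rows in this chapter follow a regular "name - type" notation, with the types ONE_BYTE, TWO_BYTES, FOUR_BYTES, and STRING [n]. The following minimal sketch splits such a row into its fields. It parses only the notation as printed in this chapter; it makes no claim about how the datawords are encoded inside the system, and `parse_datawords`, `FIXED_WIDTHS`, and `FIELD_RE` are hypothetical names introduced for illustration.

```python
import re

# Sizes implied by the fixed-width type names used in the DATAWORDS rows.
FIXED_WIDTHS = {"ONE_BYTE": 1, "TWO_BYTES": 2, "FOUR_BYTES": 4}

# Matches "<name> - <TYPE>", where TYPE is a fixed-width token or "STRING [n]".
# An optional ";" or "," separator between fields is tolerated.
FIELD_RE = re.compile(
    r"(?P<name>.+?) - "
    r"(?P<type>ONE_BYTE|TWO_BYTES|FOUR_BYTES|STRING \[(?P<len>\d+)\])"
    r"\s*[;,]?\s*"
)


def parse_datawords(spec):
    """Split a DATAWORDS string into (name, type, size) tuples.

    For STRING fields, size is the declared length in brackets.
    """
    fields = []
    for match in FIELD_RE.finditer(spec):
        name = match.group("name").strip()
        if match.group("len") is not None:
            fields.append((name, "STRING", int(match.group("len"))))
        else:
            type_name = match.group("type")
            fields.append((name, type_name, FIXED_WIDTHS[type_name]))
    return fields


if __name__ == "__main__":
    spec = (
        "Source Process Type - ONE_BYTE Source Thread Type - ONE_BYTE "
        "Message Number - TWO_BYTES Location Tag - STRING [30]"
    )
    for field in parse_datawords(spec):
        print(field)
```

A parsed field list like this can help when comparing the datawords reported with an event against the definition in the table.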
SYSTEM (6)
To troubleshoot and correct the cause of the alarm, refer to the "Index List Insert Error - System (6)" section.
DESCRIPTION | Index List Insert Error
SEVERITY | MINOR
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | List Name - STRING [20]; Index of Entry Being - FOUR_BYTES; Location Tag - STRING [30]
PRIMARY CAUSE | An internal error has occurred.
PRIMARY ACTION | If the problem persists, contact Cisco TAC.
Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
SYSTEM (7)
To troubleshoot and correct the cause of the alarm, refer to the "Index List Remove Error - System (7)" section.
DESCRIPTION | Index List Remove Error
SEVERITY | MINOR
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | List Name - STRING [20]; Index of Entry Being - FOUR_BYTES; Location Tag - STRING [30]
PRIMARY CAUSE | An internal error has occurred.
PRIMARY ACTION | If the problem persists, contact Cisco TAC.
Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
SYSTEM (8)
To troubleshoot and correct the cause of the alarm, refer to the "Thread Creation Failure - System (8)" section.
DESCRIPTION | Thread Creation Failure
SEVERITY | MAJOR
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Error Code - FOUR_BYTES; Thread Name - STRING [20]; Location Tag - STRING [30]
PRIMARY CAUSE | An internal error has occurred. A process was unable to create one of its threads.
PRIMARY ACTION | Attempt to restart the node on which the error occurred. If the same error occurs, contact Cisco TAC.
Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
SYSTEM (9)
To monitor and correct the cause of the event, refer to the "Timer Start Failure - System (9)" section.
DESCRIPTION | Timer Start Failure
SEVERITY | WARNING
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Timer Type - STRING [20]; Location Tag - STRING [30]
PRIMARY CAUSE | The process was unable to start a platform timer.
PRIMARY ACTION | If the problem persists, contact Cisco TAC.
Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
SYSTEM (10)
To troubleshoot and correct the cause of the alarm, refer to the "Index Update Registration Error - System (10)" section.
DESCRIPTION | Index Update Registration Error
SEVERITY | MINOR
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Error Code - FOUR_BYTES; Table Name - STRING [20]; Location Tag - STRING [30]
PRIMARY CAUSE | An application unsuccessfully requested to be notified of table changes.
PRIMARY ACTION | Contact Cisco TAC.
Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
SYSTEM (11)
To troubleshoot and correct the cause of the alarm, refer to the "Index Table Add Entry Error - System (11)" section.
DESCRIPTION | Index Table Add Entry Error
SEVERITY | MINOR
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Table Name - STRING [20]; Index of Entry Being - FOUR_BYTES; Error Code - FOUR_BYTES; Location Tag - STRING [30]
PRIMARY CAUSE | An internal error has occurred.
PRIMARY ACTION | If the problem persists, contact Cisco TAC.
Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
SYSTEM (12)
To troubleshoot and correct the cause of the alarm, refer to the "Software Error - System (12)" section.
DESCRIPTION | Software Error
SEVERITY | MAJOR
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Context Description - STRING [80]; FileName - STRING [20]; Line Number of Code - TWO_BYTES; Error Specific Information - STRING [80]
PRIMARY CAUSE | A logic path is not handled by the algorithm in the code.
PRIMARY ACTION | Save the trace log from around the time of occurrence and contact Cisco TAC.
Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
SYSTEM (13)
To troubleshoot and correct the cause of the alarm, refer to the "Multiple Readers and Multiple Writers Maximum Q Depth Reached - System (13)" section.
DESCRIPTION | Multiple Readers and Multiple Writers Maximum Queue Depth Reached
SEVERITY | CRITICAL
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | High Mark for Queue Depth - FOUR_BYTES; Low Mark for Queue Depth - FOUR_BYTES
PRIMARY CAUSE | Messages are flooding in from a malfunctioning network element.
PRIMARY ACTION | Check the messages to process.
SECONDARY CAUSE | Resource congestion or slow processing of messages from the queue.
SECONDARY ACTION | Check the process and system resources. A failover may be needed.
SYSTEM (14)
To troubleshoot and correct the cause of the alarm, refer to the "Multiple Readers and Multiple Writers Queue Reached Low Queue Depth - System (14)" section.
DESCRIPTION | Multiple Readers and Multiple Writers Queue Reached Low Queue Depth
SEVERITY | MINOR
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Lower Queue Depth Limit - FOUR_BYTES; Higher Queue Depth Limit - FOUR_BYTES
PRIMARY CAUSE | High rate of messages being received from the network.
PRIMARY ACTION | Check messages to the system.
SECONDARY CAUSE | System or processing thread congestion.
SECONDARY ACTION | Check process and system resources.
SYSTEM (15)
To troubleshoot and correct the cause of the alarm, refer to the "Multiple Readers and Multiple Writers Throttle Queue Depth Reached - System (15)" section.
DESCRIPTION | Multiple Readers and Multiple Writers Throttle Queue Depth Reached
SEVERITY | MAJOR
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Throttle Mark for Queue Depth - FOUR_BYTES; Throttle Clear Mark for Queue De - FOUR_BYTES
PRIMARY CAUSE | Inbound network messages are arriving at a rate much higher than the processing capacity.
PRIMARY ACTION | Determine the cause of the increase in inbound network traffic, and try to control the traffic externally.
SECONDARY CAUSE | Resource congestion is slowing the processing of messages from the queue.
SECONDARY ACTION | Check the platform CPU utilization, IPC queue depths, and overall availability of system resources.
Monitoring System Events
This section provides the information needed to monitor and correct System events. Table 12-2 lists all System events in numerical order and provides a cross-reference to each subsection in this section.
Test Report - System (1)
The Test Report event is for testing the system event category. The event is informational and no further action is required.
Inter-Process Communication Queue Read Failure - System (2)
The Inter-Process Communication Queue Read Failure alarm (minor) indicates that the IPC queue read has failed. To troubleshoot and correct the cause of the Inter-Process Communication Queue Read Failure alarm, refer to the "Inter-Process Communication Queue Read Failure - System (2)" section.
Inter-Process Communication Message Allocate Failure - System (3)
The Inter-Process Communication Message Allocate Failure alarm (minor) indicates that the IPC message allocation has failed. To troubleshoot and correct the cause of the Inter-Process Communication Message Allocate Failure alarm, refer to the "Inter-Process Communication Message Allocate Failure - System (3)" section.
Inter-Process Communication Message Send Failure - System (4)
The Inter-Process Communication Message Send Failure alarm (minor) indicates that the IPC message send has failed. To troubleshoot and correct the cause of the Inter-Process Communication Message Send Failure alarm, refer to the "Inter-Process Communication Message Send Failure - System (4)" section.
Unexpected Inter-Process Communication Message Received - System (5)
The Unexpected Inter-Process Communication Message Received event serves as a warning that an unexpected IPC message was received. The primary cause of the event is that the process reporting the event is receiving messages it is not expecting. To correct the primary cause of the event, contact Cisco TAC. Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
Index List Insert Error - System (6)
The Index List Insert Error alarm (minor) indicates that an error occurred while inserting an entry in the index list. To troubleshoot and correct the cause of the Index List Insert Error alarm, refer to the "Index List Insert Error - System (6)" section.
Index List Remove Error - System (7)
The Index List Remove Error alarm (minor) indicates that an index list remove error has occurred. To troubleshoot and correct the cause of the Index List Remove Error alarm, refer to the "Index List Remove Error - System (7)" section.
Thread Creation Failure - System (8)
The Thread Creation Failure alarm (major) indicates that a thread creation has failed. To troubleshoot and correct the cause of the Thread Creation Failure alarm, refer to the "Thread Creation Failure - System (8)" section.
Timer Start Failure - System (9)
The Timer Start Failure event serves as a warning that a timer start failure has occurred. The primary cause of the event is that the process was unable to start a platform timer. If the problem persists, contact Cisco TAC. Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
Index Update Registration Error - System (10)
The Index Update Registration Error alarm (minor) indicates that an index update registration error has occurred. To troubleshoot and correct the cause of the Index Update Registration Error alarm, refer to the "Index Update Registration Error - System (10)" section.
Index Table Add-Entry Error - System (11)
The Index Table Add Entry Error alarm (minor) indicates that an error occurred while adding an entry in the index table. To troubleshoot and correct the cause of the Index Table Add Entry Error alarm, refer to the "Index Table Add Entry Error - System (11)" section.
Software Error - System (12)
The Software Error alarm (major) indicates that a software error has occurred. To troubleshoot and correct the cause of the Software Error alarm, refer to the "Software Error - System (12)" section.
Multiple Readers and Multiple Writers Maximum Q Depth Reached - System (13)
The Multiple Readers and Multiple Writers Maximum Q Depth Reached alarm (critical) indicates that the multiple readers and multiple writers (MRMW) maximum queue depth has been reached. To troubleshoot and correct the cause of the Multiple Readers and Multiple Writers Maximum Q Depth Reached alarm, refer to the "Multiple Readers and Multiple Writers Maximum Q Depth Reached - System (13)" section.
Multiple Readers and Multiple Writers Queue Reached Low Queue Depth - System (14)
The Multiple Readers and Multiple Writers Queue Reached Low Queue Depth alarm (minor) indicates that the MRMW queue has reached the low queue depth threshold. To troubleshoot and correct the cause of the Multiple Readers and Multiple Writers Queue Reached Low Queue Depth alarm, refer to the "Multiple Readers and Multiple Writers Queue Reached Low Queue Depth - System (14)" section.
Multiple Readers and Multiple Writers Throttle Queue Depth Reached - System (15)
The Multiple Readers and Multiple Writers Throttle Queue Depth Reached alarm (major) indicates that the MRMW queue has reached the throttle depth. To troubleshoot and correct the cause of the Multiple Readers and Multiple Writers Throttle Queue Depth Reached alarm, refer to the "Multiple Readers and Multiple Writers Throttle Queue Depth Reached - System (15)" section.
Troubleshooting System Alarms
This section provides the information needed to troubleshoot and correct System alarms. Table 12-3 lists all System alarms in numerical order and provides a cross-reference to each subsection in this section.
Inter-Process Communication Queue Read Failure - System (2)
The Inter-Process Communication Queue Read Failure alarm (minor) indicates that the IPC queue read has failed. The primary cause of the alarm is a problem with IPC. If the problem persists, contact Cisco TAC. Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
Inter-Process Communication Message Allocate Failure - System (3)
The Inter-Process Communication Message Allocate Failure alarm (minor) indicates that the IPC message allocation has failed. The primary cause of the alarm is that there is a system error, or there is not enough free memory left to allocate a message buffer. To correct the primary cause of the alarm, contact Cisco TAC. Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
Inter-Process Communication Message Send Failure - System (4)
The Inter-Process Communication Message Send Failure alarm (minor) indicates that the IPC message send has failed. The primary cause of the alarm is that the process for which the message is intended is not running. To correct the primary cause of the alarm, check to ensure that all components and processes are running. Attempt to restart any component or process that is not running. The secondary cause of the alarm is that an internal error has occurred. To correct the secondary cause of the alarm, contact Cisco TAC. Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
Index List Insert Error - System (6)
The Index List Insert Error alarm (minor) indicates that an error occurred while inserting an entry in the index list. The primary cause of the alarm is that an internal error has occurred. To correct the primary cause of the alarm, contact Cisco TAC. Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
Index List Remove Error - System (7)
The Index List Remove Error alarm (minor) indicates that an index list remove error has occurred. The primary cause of the alarm is that an internal error has occurred. To correct the primary cause of the alarm, contact Cisco TAC. Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
Thread Creation Failure - System (8)
The Thread Creation Failure alarm (major) indicates that a thread creation has failed. The primary cause of the alarm is that an internal error occurred. A process was unable to create one of its threads. To correct the primary cause of the alarm, attempt to restart the node on which the error occurred. If the same alarm occurs, contact Cisco TAC. Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
Index Update Registration Error - System (10)
The Index Update Registration Error alarm (minor) indicates that an index update registration error has occurred. The primary cause of the alarm is that an application unsuccessfully requested to be notified of table changes. To correct the primary cause of the alarm, contact Cisco TAC. Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
Index Table Add Entry Error - System (11)
The Index Table Add Entry Error alarm (minor) indicates that an error occurred while adding an entry in the index table. The primary cause of the alarm is that an internal error has occurred. To correct the primary cause of the alarm, contact Cisco TAC. Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
Software Error - System (12)
The Software Error alarm (major) indicates that a software error has occurred. The primary cause of the alarm is that a logic path is not handled by the algorithm in the code. To correct the primary cause of the alarm, save the trace log from around the time of occurrence and contact Cisco TAC. Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
Multiple Readers and Multiple Writers Maximum Q Depth Reached - System (13)
The Multiple Readers and Multiple Writers Maximum Q Depth Reached alarm (critical) indicates that the MRMW maximum queue depth has been reached. The primary cause of the alarm is message flooding from a malfunctioning network element. To correct the primary cause of the alarm, check the messages to process. The secondary cause of the alarm is resource congestion or slow processing of messages from the queue. To correct the secondary cause of the alarm, check the process and system resources. The system may need to be failed over.
Multiple Readers and Multiple Writers Queue Reached Low Queue Depth - System (14)
The Multiple Readers and Multiple Writers Queue Reached Low Queue Depth alarm (minor) indicates that the MRMW queue has reached the low queue depth threshold. The primary cause of the alarm is a high rate of messages from the network. To correct the primary cause of the alarm, check the messages to the system. The secondary cause of the alarm is system or processing thread congestion. To correct the secondary cause of the alarm, check process and system resources.
Multiple Readers and Multiple Writers Throttle Queue Depth Reached - System (15)
The Multiple Readers and Multiple Writers Throttle Queue Depth Reached alarm (major) indicates that the MRMW queue has reached the throttle depth. The primary cause of the alarm is that inbound network messages are arriving at a rate much higher than the processing capacity. To correct the primary cause of the alarm, determine the cause of the increase in inbound network traffic, and try to control the traffic externally. The secondary cause of the alarm is resource congestion that slows the processing of messages from the queue. To correct the secondary cause of the alarm, check the platform CPU utilization, IPC queue depths, and overall availability of system resources.
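The System (15) datawords name a throttle mark and a throttle clear mark for queue depth. A pair of marks like this is commonly used for hysteresis, so that a condition is raised at one depth and cleared only after the queue drains to a lower depth. The sketch below illustrates that general technique under stated assumptions: this chapter does not describe how the system actually evaluates the marks, and the class `ThrottleMonitor`, its method `observe`, and the example mark values are hypothetical.

```python
class ThrottleMonitor:
    """Tracks a throttled / not-throttled state from observed queue depths.

    Assumption for this sketch: the condition is raised when the depth
    reaches the throttle mark and cleared when the depth falls to the
    throttle clear mark.
    """

    def __init__(self, throttle_mark, throttle_clear_mark):
        if throttle_clear_mark >= throttle_mark:
            raise ValueError("clear mark must be below the throttle mark")
        self.throttle_mark = throttle_mark
        self.throttle_clear_mark = throttle_clear_mark
        self.throttled = False

    def observe(self, depth):
        """Update the state for a new depth; return 'raise', 'clear', or None."""
        if not self.throttled and depth >= self.throttle_mark:
            self.throttled = True
            return "raise"
        if self.throttled and depth <= self.throttle_clear_mark:
            self.throttled = False
            return "clear"
        return None


if __name__ == "__main__":
    monitor = ThrottleMonitor(throttle_mark=80, throttle_clear_mark=40)
    for depth in [10, 79, 80, 90, 60, 41, 40, 80]:
        print(depth, monitor.observe(depth))
```

Because the clear mark sits below the throttle mark, a queue depth that hovers near the throttle mark does not cause the condition to be raised and cleared repeatedly.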