Table Of Contents
Database Troubleshooting
Introduction
Database Events and Alarms
DATABASE (1)
DATABASE (2)
DATABASE (3)
DATABASE (4)
DATABASE (5)
DATABASE (6)
DATABASE (7)
DATABASE (8)
DATABASE (9)
DATABASE (10)
DATABASE (11)
DATABASE (12)
DATABASE (13)
DATABASE (14)
DATABASE (15)
DATABASE (16)
DATABASE (17)
DATABASE (18)
DATABASE (19)
DATABASE (20)
DATABASE (21)
DATABASE (25)
DATABASE (26)
Monitoring Database Events
Test Report - Database (1)
Database Management Update Failure: Master/Slave Database Out of Sync - Database (2)
There are Errors in Element Management System Database DefError Queue; Contact Database Administrator - Database (3)
Element Management System Database HeartBeat: Replication PUSH Job Broken - Database (4)
Element Management System Database HeartBeat Process Died - Database (5)
Element Management System Database Replication DefTranDest Queue Overloaded - Database (6)
Element Management System Database DefTran Queue is Overloaded - Database (7)
Element Management System Database Tablespace is Out of Free Space - Database (8)
Urgent: Element Management System Database Archive Log Directory is Getting Full - Database (9)
Element Management System Database: Backup Fails - Database (10)
Element Management System Database Alert.log Alerts - Database (11)
Element Management System Database Process Died - Database (12)
Element Management System Database Performance Alert - Database (13)
Table Size Exceeds Minor Threshold Limit - Database (14)
Table Size Exceeds Major Threshold Limit - Database (15)
Table Size Exceeds Critical Threshold Limit - Database (16)
Data Replication Failed - Database (17)
Unexpected Runtime Data Interaction - Database (18)
Daily Database Backup Completed Successfully - Database (19)
Replication Data Flush Timeout During Switchover - Database (20)
Database Statistics Collection Exception - Database (21)
Secure File Transfer Protocol Transfer Failed—Database (25)
File Write Error—Database (26)
Troubleshooting Database Alarms
There Are Errors in Element Management System Database DefError Queue; Contact Database Administrator - Database (3)
Element Management System Database HeartBeat: Replication PUSH Job Broken - Database (4)
Element Management System Database HeartBeat Process Died - Database (5)
Element Management System Database Replication DefTranDest Queue Overloaded - Database (6)
Element Management System Database DefTran Queue is Overloaded - Database (7)
Element Management System Database Tablespace is Out of Free Space - Database (8)
Urgent: Element Management System Database Archive Log Directory is Getting Full - Database (9)
Element Management System Database: Backup Fails - Database (10)
Element Management System Database Alert.log Alerts - Database (11)
Element Management System Database Process Died - Database (12)
Element Management System Database Performance Alert - Database (13)
Table Size Exceeds Minor Threshold Limit - Database (14)
Table Size Exceeds Major Threshold Limit - Database (15)
Table Size Exceeds Critical Threshold Limit - Database (16)
Data Replication Failed - Database (17)
Replication Data Flush Timeout During Switchover - Database (20)
Database Statistics Collection Exception - Database (21)
Secure File Transfer Protocol Transfer Failed—Database (25)
File Write Error—Database (26)
Database Troubleshooting
Revised: July 22, 2009, OL-8000-32
Introduction
This chapter provides the information needed to monitor and troubleshoot Database events and alarms. This chapter is divided into the following sections:
• Database Events and Alarms - Provides a brief overview of each Database event and alarm.
• Monitoring Database Events - Provides the information needed to monitor and correct Database events.
• Troubleshooting Database Alarms - Provides the information needed to troubleshoot and correct Database alarms.
Database Events and Alarms
This section provides a brief overview of the Database events and alarms for the Cisco BTS 10200 Softswitch in numerical order. Table 6-1 lists all Database events and alarms by severity.
Note
Click the Database message number in Table 6-1 to display information about the event or alarm.
DATABASE (1)
For additional information, refer to the "Test Report - Database (1)" section.
DESCRIPTION | Test Report
SEVERITY | Information (INFO)
THRESHOLD | 10000
THROTTLE | 0
DATABASE (2)
To monitor and correct the cause of the event, refer to the "Database Management Update Failure: Master/Slave Database Out of Sync - Database (2)" section.
DESCRIPTION | Database Management Update Failure: Master/Slave Database Out of Sync
SEVERITY | WARNING
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Error Code - TWO_BYTES; Error String - STRING [20]; Provisioning String - STRING [80]
PRIMARY CAUSE | The master database under Oracle control in the Element Management System (EMS) was successfully updated, but the subsequent update of the shared memory tables in the Call Agents (CAs) and/or Feature Servers (FSs) did not complete properly.
PRIMARY ACTION | Perform an audit of the database in question to correct the data stored in shared memory.
SECONDARY ACTION | Use the command line interface (CLI) to show and delete the transaction queue, and to audit and manage the queue.
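The DATAWORDS entry above packs several field definitions into one line, and the same layout recurs for every event and alarm in this chapter. The following is a minimal Python sketch for splitting such a line into fields when post-processing alarm definitions. The names `Dataword` and `parse_datawords` are hypothetical, and treating ONE_BYTE, TWO_BYTES, and FOUR_BYTES as 1, 2, and 4 bytes is an assumption drawn from the type names, not from a documented encoding.

```python
import re
from typing import List, NamedTuple

# Fixed-width type names used in the DATAWORDS entries of this chapter.
# The byte counts are an assumption inferred from the names themselves.
FIXED_SIZES = {"ONE_BYTE": 1, "TWO_BYTES": 2, "FOUR_BYTES": 4}


class Dataword(NamedTuple):
    name: str  # field label, for example "Error Code"
    kind: str  # "STRING" or one of the FIXED_SIZES keys
    size: int  # declared string length, or assumed byte count


# One field looks like "<name> - STRING [<n>]" or "<name> - TWO_BYTES".
# Fields may be separated by whitespace or semicolons.
_FIELD = re.compile(
    r"[\s;]*(?P<name>[^;]+?) - "
    r"(?:STRING \[(?P<length>\d+)\]|(?P<fixed>ONE_BYTE|TWO_BYTES|FOUR_BYTES))"
)


def parse_datawords(spec: str) -> List[Dataword]:
    """Split a DATAWORDS line into (name, kind, size) entries."""
    fields = []
    for match in _FIELD.finditer(spec):
        if match.group("fixed"):
            kind = match.group("fixed")
            size = FIXED_SIZES[kind]
        else:
            kind = "STRING"
            size = int(match.group("length"))
        fields.append(Dataword(match.group("name"), kind, size))
    return fields
```

For the DATABASE (2) entry, `parse_datawords` yields three fields: Error Code (TWO_BYTES), Error String (STRING, length 20), and Provisioning String (STRING, length 80).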
DATABASE (3)
To troubleshoot and correct the cause of the alarm, refer to the "There Are Errors in Element Management System Database DefError Queue; Contact Database Administrator - Database (3)" section.
DESCRIPTION | There Are Errors in Element Management System Database DefError Queue; Contact Database Administrator
SEVERITY | CRITICAL
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Host Name - STRING [30]; Database Name - STRING [10]; Error Count - ONE_BYTE; Time Stamp - STRING [20]
PRIMARY CAUSE | Replication data conflicts.
PRIMARY ACTION | May require manual update on database tables. Call Cisco Support. (Contact Cisco Technical Assistance Center (TAC).)
SECONDARY CAUSE | Update/delete on non-existing data.
TERNARY CAUSE | Unique constraint (primary key) violated.
Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
DATABASE (4)
To troubleshoot and correct the cause of the alarm, refer to the "Element Management System Database HeartBeat: Replication PUSH Job Broken - Database (4)" section.
DESCRIPTION | Element Management System Database HeartBeat: Replication PUSH Job Broken
SEVERITY | CRITICAL
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Host Name - STRING [30]; Local Database - STRING [10]; Remote Database - STRING [10]; Job - STRING [5]; Time Stamp - STRING [20]
PRIMARY CAUSE | Remote database is not accessible.
PRIMARY ACTION | Restart or restore the remote database.
SECONDARY CAUSE | Remote database is down.
SECONDARY ACTION | Restart remote listener process.
TERNARY CAUSE | Remote Oracle listener process died.
TERNARY ACTION | Correct network connection problem.
SUBSEQUENT CAUSE | Network connection is broken.
DATABASE (5)
To troubleshoot and correct the cause of the alarm, refer to the "Element Management System Database HeartBeat Process Died - Database (5)" section.
DESCRIPTION | Element Management System Database HeartBeat Process Died
SEVERITY | CRITICAL
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Host Name - STRING [30]; Database Name - STRING [10]; Time Stamp - STRING [20]
PRIMARY CAUSE | Terminated by the system manager program (SMG) or stopped by the platform.
PRIMARY ACTION | Restart DBHeartBeat by running "dbinit -H -i start" as the oracle user, or by running the platform start command as root.
DATABASE (6)
To troubleshoot and correct the cause of the alarm, refer to the "Element Management System Database Replication DefTranDest Queue Overloaded - Database (6)" section.
DESCRIPTION | Element Management System Database Replication DefTranDest Queue Overloaded
SEVERITY | MAJOR
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Host Name - STRING [30]; Database Name - STRING [10]; Threshold - FOUR_BYTES; Time Stamp - STRING [20]
PRIMARY CAUSE | Replication PUSH job is broken.
PRIMARY ACTION | Correct problems on remote database.
SECONDARY CAUSE | Remote database is not accessible.
SECONDARY ACTION | Make sure db_heart_beat process is up.
TERNARY CAUSE | Database is overloaded.
TERNARY ACTION | Troubleshoot database performance.
DATABASE (7)
To troubleshoot and correct the cause of the alarm, refer to the "Element Management System Database DefTran Queue is Overloaded - Database (7)" section.
DESCRIPTION | Element Management System Database DefTran Queue is Overloaded
SEVERITY | MINOR
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Host Name - STRING [30]; Database Name - STRING [10]; Threshold - TWO_BYTES; Time Stamp - STRING [20]
PRIMARY CAUSE | Replication DefTranDest queue is overloaded.
PRIMARY ACTION | Resume replication activities.
SECONDARY CAUSE | Too many errors in DefError queue.
SECONDARY ACTION | Correct replication errors.
TERNARY CAUSE | Replication PURGE job is broken or overloaded.
TERNARY ACTION | Enable replication PURGE job.
DATABASE (8)
To troubleshoot and correct the cause of the alarm, refer to the "Element Management System Database Tablespace is Out of Free Space - Database (8)" section.
DESCRIPTION | Element Management System Database Tablespace is Out of Free Space
SEVERITY | MAJOR
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Host Name - STRING [30]; Database Name - STRING [10]; Tablespace Name - STRING [30]; Total Free Space - TWO_BYTES; Time Stamp - STRING [20]
PRIMARY CAUSE | Increased data volume or transactions.
PRIMARY ACTION | Add more space to tablespace.
DATABASE (9)
To troubleshoot and correct the cause of the alarm, refer to the "Urgent: Element Management System Database Archive Log Directory is Getting Full - Database (9)" section.
DESCRIPTION | Urgent: Element Management System Database Archive Log Directory is Getting Full
SEVERITY | CRITICAL
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Host Name - STRING [30]; Database Name - STRING [10]; Directory Name - STRING [100]; Free Space - TWO_BYTES; Time Stamp - STRING [20]
PRIMARY CAUSE | Transaction volume increased.
PRIMARY ACTION | Back up and clean up archive log files.
SECONDARY ACTION | Add more space to archive log directory.
DATABASE (10)
To troubleshoot and correct the cause of the alarm, refer to the "Element Management System Database: Backup Fails - Database (10)" section.
DESCRIPTION | Element Management System Database: Backup Fails
SEVERITY | MAJOR
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Host Name - STRING [30]; Database Name - STRING [10]; Message 1 - STRING [200]; Message 2 - STRING [200]; Time Stamp - STRING [20]
PRIMARY CAUSE | System or hardware unstable.
PRIMARY ACTION | Re-start backup process.
DATABASE (11)
To troubleshoot and correct the cause of the alarm, refer to the "Element Management System Database Alert.log Alerts - Database (11)" section.
DESCRIPTION | Element Management System Database Alert.log Alerts
SEVERITY | MAJOR
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Host Name - STRING [30]; Database Name - STRING [10]; Message 1 - STRING [200]; Message 2 - STRING [200]; Time Stamp - STRING [20]
PRIMARY CAUSE | The probable cause of the ORA- error reported in this event is documented in the $ORACLE_HOME/rdbms/mesg/oraus.msg file. Log in to the EMS system as the oracle user (or su - oracle) to view this file. If more information is needed, contact Cisco TAC support, or query the Oracle Metalink library at http://metalink.oracle.com.
PRIMARY ACTION | The corrective action is documented in the $ORACLE_HOME/rdbms/mesg/oraus.msg file. To view the file, log in to the EMS system as root, then su - oracle. After the situation is corrected, the user must manually clear this event. This event does not clear the alarm automatically. If further support is needed, contact Cisco TAC support, or query the Oracle Metalink library at http://metalink.oracle.com. (Contact Cisco TAC.)
Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
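Looking up an ORA- error in the oraus.msg file can be scripted. The following is a minimal Python sketch, assuming the file uses its usual layout: a header line that starts with the zero-padded error number followed by a comma, then comment lines beginning with `//` that carry the cause and action text. The function name `lookup_ora_message` is hypothetical. If the layout on a given EMS differs, the pattern needs adjusting.

```python
import re
from typing import List, Optional


def lookup_ora_message(msg_text: str, code: int) -> Optional[List[str]]:
    """Return the header line and the following '//' lines for one ORA code.

    msg_text is the full text of a message file such as oraus.msg.
    Returns None if no header line for the code is found.
    """
    # Assumed header layout: zero-padded code, a comma, then more fields.
    header = re.compile(r"^0*%d,\s" % code)
    lines = msg_text.splitlines()
    for index, line in enumerate(lines):
        if header.match(line):
            entry = [line]
            # Cause and action text is assumed to follow as '//' lines.
            for follow in lines[index + 1:]:
                if not follow.startswith("//"):
                    break
                entry.append(follow)
            return entry
    return None
```

To use it, read the file named in the table (for example, with `open(path, errors="replace").read()`, where `path` points to $ORACLE_HOME/rdbms/mesg/oraus.msg) and pass the numeric part of the ORA- error reported in the Message datawords.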
DATABASE (12)
To troubleshoot and correct the cause of the alarm, refer to the "Element Management System Database Process Died - Database (12)" section.
DESCRIPTION | Element Management System Database Process Died
SEVERITY | CRITICAL
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Host Name - STRING [30]; Database Name - STRING [10]; Error Source - STRING [40]; Message - STRING [200]; Time Stamp - STRING [20]
PRIMARY CAUSE | Possible Error Source: 1. Process Name, if local process is not running. 2. "Cannot_connect_database" if local database (DB) is unreachable. 3. "Cannot_connect_" if remote DB is unreachable.
PRIMARY ACTION | Re-start process.
SECONDARY ACTION | Contact Cisco Support. (Contact Cisco TAC.)
Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
DATABASE (13)
To troubleshoot and correct the cause of the alarm, refer to the "Element Management System Database Performance Alert - Database (13)" section.
DESCRIPTION | Element Management System Database Performance Alert
SEVERITY | MAJOR
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Host Name - STRING [30]; Database Name - STRING [10]; Stat Event Name - STRING [80]; Value 1 - STRING [50]; Value 2 - FOUR_BYTES; Message - STRING [200]; Time Stamp - STRING [20]
PRIMARY CAUSE | See Stat Event Name dataword.
PRIMARY ACTION | Contact Cisco Support. (Contact Cisco TAC.)
SECONDARY ACTION | Perform database performance tuning.
Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
DATABASE (14)
To troubleshoot and correct the cause of the alarm, refer to the "Table Size Exceeds Minor Threshold Limit - Database (14)" section.
DESCRIPTION | Table Size Exceeds Minor Threshold Limit
SEVERITY | MINOR
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Table Name - STRING [32]
PRIMARY CAUSE | The pre-provisioned size for the stated table is nearing the licensed limit on the number of entries it can hold.
PRIMARY ACTION | Contact Cisco to purchase additional entry space for this particular table. (Contact Cisco TAC.)
Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
DATABASE (15)
To troubleshoot and correct the cause of the alarm, refer to the "Table Size Exceeds Major Threshold Limit - Database (15)" section.
DESCRIPTION | Table Size Exceeds Major Threshold Limit
SEVERITY | MAJOR
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Table Name - STRING [32]
PRIMARY CAUSE | Major threshold limit exceeded.
PRIMARY ACTION | N/A
DATABASE (16)
To troubleshoot and correct the cause of the alarm, refer to the "Table Size Exceeds Critical Threshold Limit - Database (16)" section.
DESCRIPTION | Table Size Exceeds Critical Threshold Limit
SEVERITY | CRITICAL
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Table Name - STRING [32]
PRIMARY CAUSE | Critical threshold limit exceeded.
PRIMARY ACTION | N/A
DATABASE (17)
To troubleshoot and correct the cause of the alarm, refer to the "Data Replication Failed - Database (17)" section.
DESCRIPTION | Data Replication Failed
SEVERITY | MAJOR
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Replication-Stage - STRING [40]; Table Name - STRING [40]; Index - FOUR_BYTES; Table ID - TWO_BYTES
PRIMARY CAUSE | Index out of range.
SECONDARY CAUSE | Record size mismatch.
DATABASE (18)
To monitor and correct the cause of the event, refer to the "Unexpected Runtime Data Interaction - Database (18)" section.
DESCRIPTION | Unexpected Runtime Data Interaction
SEVERITY | WARNING
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Internal/External In - STRING [10]; Table Name - STRING [32]; Table Entry - STRING [10]; Table Field Name - STRING [32]; Descriptive Data 1 - STRING [64]; Descriptive Data 2 - STRING [64]; Descriptive Data 3 - STRING [64]; Descriptive Data 4 - STRING [64]
PRIMARY CAUSE | Unexpected data interaction has been detected at runtime in the call agent or feature server.
PRIMARY ACTION | Collect logs and contact Cisco Support. (Contact Cisco TAC.)
Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
DATABASE (19)
For additional information, refer to the "Daily Database Backup Completed Successfully - Database (19)" section.
DESCRIPTION | Daily Database Backup Completed Successfully
SEVERITY | INFO
THRESHOLD | 0
THROTTLE | 0
DATAWORDS | Host Name - STRING [60]; ORACLE_SID - STRING [30]; Process - STRING [60]; Message 1 - STRING [100]; Message 2 - STRING [100]; Message 3 - STRING [100]
PRIMARY CAUSE | Normal operation.
PRIMARY ACTION | N/A
DATABASE (20)
To troubleshoot and correct the cause of the alarm, refer to the "Replication Data Flush Timeout During Switchover - Database (20)" section.
DESCRIPTION | Replication Data Flush Timeout During Switchover
SEVERITY | MAJOR
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Tables Failed - STRING [20]
PRIMARY CAUSE | Replication module software problem.
PRIMARY ACTION | The database restore procedure must be executed on the side that goes active after the switchover. The alarm should be cleared manually after the recovery action is taken.
DATABASE (21)
To troubleshoot and correct the cause of the alarm, refer to the "Database Statistics Collection Exception - Database (21)" section.
DESCRIPTION | Database Statistics Collection Exception
SEVERITY | MINOR
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Host Name - STRING [30]; Database Name - STRING [10]; Schema Name - STRING [32]; Object Name - STRING [64]; Task Name - STRING [64]; Exception - STRING [256]
PRIMARY CAUSE | Check the messages in the exception field to identify the cause of the error.
PRIMARY ACTION | The corrective action varies and is determined by the type of exception. For more information about the ORA-xxxxx errors, execute the "oerr ora xxxxx" command as the oracle user.
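The Exception dataword can carry more than one ORA-xxxxx code. The following minimal Python sketch extracts each distinct code and builds the matching "oerr ora xxxxx" command line to run as the oracle user. The function name `oerr_commands` is hypothetical, and the sketch assumes codes appear in the five-digit ORA-nnnnn form.

```python
import re
from typing import List


def oerr_commands(exception_text: str) -> List[str]:
    """Build one 'oerr ora <code>' command per distinct ORA-nnnnn code.

    Codes are kept in the order in which they first appear, with the
    digits exactly as reported in the exception text.
    """
    commands = []
    for code in re.findall(r"ORA-(\d{5})", exception_text):
        command = "oerr ora %s" % code
        if command not in commands:
            commands.append(command)
    return commands
```

The sketch only prints or returns the command lines; it does not run them, so the output can be reviewed before anything is executed on the EMS.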
Note
DATABASE (22) through DATABASE (24) are not used.
DATABASE (25)
To troubleshoot and correct the cause of the alarm, refer to the "Secure File Transfer Protocol Transfer Failed—Database (25)" section.
DESCRIPTION | Secure File Transfer Protocol Transfer Failed
SEVERITY | MAJOR
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | FileName - STRING [128]; Error - STRING [50]
PRIMARY CAUSE | Unable to connect between active and standby call agents.
PRIMARY ACTION | Verify communication between the active and standby CAs. On each CA, ping the other node.
SECONDARY CAUSE | Unable to log in to remote host.
SECONDARY ACTION | Verify that secure shell (SSH) keys have been pre-configured for user root on both active and standby call agents.
TERNARY CAUSE | File transfer error.
TERNARY ACTION | Check the Error dataword to see if it gives an indication of the kind of error that occurred. It could be a file-system error on the remote host, or a communication failure between the active and standby call agents.
DATABASE (26)
To troubleshoot and correct the cause of the alarm, refer to the "File Write Error—Database (26)" section.
DESCRIPTION | File Write Error
SEVERITY | MAJOR
THRESHOLD | 100
THROTTLE | 0
DATAWORDS | Path Name - STRING [128]
PRIMARY CAUSE | System error, may be out of file descriptors.
PRIMARY ACTION | Call Cisco TAC technical support. (Contact Cisco TAC.)
Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
Monitoring Database Events
This section provides the information needed to monitor and correct Database events. Table 6-2 lists all Database events in numerical order and provides a cross reference to each subsection in this section.
Test Report - Database (1)
The Test Report is for testing the database event category. The event is informational and no further action is required.
Database Management Update Failure: Master/Slave Database Out of Sync - Database (2)
The Database Management Update Failure: Master/Slave Database Out of Sync event functions as a warning that the master and slave databases are out of sync. The primary cause of the event is that the master database under Oracle control in the EMS was successfully updated, but the subsequent update of the shared memory tables in the Call Agents (CAs) and/or Feature Servers (FSs) did not complete properly. To correct the primary cause of the event, perform an audit of the database in question to correct the data stored in shared memory. Additionally, use the CLI to show and delete the transaction queue, and to audit and manage the queue.
There are Errors in Element Management System Database DefError Queue; Contact Database Administrator - Database (3)
The There are Errors in Element Management System Database DefError Queue; Contact Database Administrator alarm (critical) indicates that there are errors in the EMS database DefError queue. To troubleshoot and correct the cause of the There are Errors in Element Management System Database DefError Queue; Contact Database Administrator alarm, refer to the "There Are Errors in Element Management System Database DefError Queue; Contact Database Administrator - Database (3)" section.
Element Management System Database HeartBeat: Replication PUSH Job Broken - Database (4)
The Element Management System Database HeartBeat: Replication PUSH Job Broken alarm (critical) indicates that the replication PUSH job is broken. To troubleshoot and correct the cause of the Element Management System Database HeartBeat: Replication PUSH Job Broken alarm, refer to the "Element Management System Database HeartBeat: Replication PUSH Job Broken - Database (4)" section.
Element Management System Database HeartBeat Process Died - Database (5)
The Element Management System Database HeartBeat Process Died alarm (critical) indicates that the EMS database heartbeat process has died. To troubleshoot and correct the cause of the Element Management System Database HeartBeat Process Died alarm, refer to the "Element Management System Database HeartBeat Process Died - Database (5)" section.
Element Management System Database Replication DefTranDest Queue Overloaded - Database (6)
The Element Management System Database Replication DefTranDest Queue Overloaded alarm (major) indicates that the EMS database replication DefTranDest queue is overloaded. To troubleshoot and correct the cause of the Element Management System Database Replication DefTranDest Queue Overloaded alarm, refer to the "Element Management System Database Replication DefTranDest Queue Overloaded - Database (6)" section.
Element Management System Database DefTran Queue is Overloaded - Database (7)
The Element Management System Database DefTran Queue is Overloaded alarm (minor) indicates that the EMS database DefTran queue is overloaded. To troubleshoot and correct the cause of the Element Management System Database DefTran Queue is Overloaded alarm, refer to the "Element Management System Database DefTran Queue is Overloaded - Database (7)" section.
Element Management System Database Tablespace is Out of Free Space - Database (8)
The Element Management System Database Tablespace is Out of Free Space alarm (major) indicates that the EMS database table space is out of free space. To troubleshoot and correct the cause of the Element Management System Database Tablespace is Out of Free Space alarm, refer to the "Element Management System Database Tablespace is Out of Free Space - Database (8)" section.
Urgent: Element Management System Database Archive Log Directory is Getting Full - Database (9)
The Urgent: Element Management System Database Archive Log Directory is Getting Full alarm (critical) indicates that the EMS database archive log directory is getting full. To troubleshoot and correct the cause of the Urgent: Element Management System Database Archive Log Directory is Getting Full alarm, refer to the "Urgent: Element Management System Database Archive Log Directory is Getting Full - Database (9)" section.
Element Management System Database: Backup Fails - Database (10)
The Element Management System Database: Backup Fails alarm (major) indicates that the EMS database backup has failed. To troubleshoot and correct the cause of the Element Management System Database: Backup Fails alarm, refer to the "Element Management System Database: Backup Fails - Database (10)" section.
Element Management System Database Alert.log Alerts - Database (11)
The Element Management System Database Alert.log Alerts alarm (major) indicates that the EMS database alerts are being received and logged into the alert log. To troubleshoot and correct the cause of the Element Management System Database Alert.log Alerts alarm, refer to the "Element Management System Database Alert.log Alerts - Database (11)" section.
Element Management System Database Process Died - Database (12)
The Element Management System Database Process Died alarm (critical) indicates that the EMS database process has died. To troubleshoot and correct the cause of the Element Management System Database Process Died alarm, refer to the "Element Management System Database Process Died - Database (12)" section.
Element Management System Database Performance Alert - Database (13)
The Element Management System Database Performance Alert alarm (major) indicates that the EMS database performance has degraded. To troubleshoot and correct the cause of the Element Management System Database Performance Alert alarm, refer to the "Element Management System Database Performance Alert - Database (13)" section.
Table Size Exceeds Minor Threshold Limit - Database (14)
The Table Size Exceeds Minor Threshold Limit alarm (minor) indicates that the table size has exceeded the minor threshold crossing limit. To troubleshoot and correct the cause of the Table Size Exceeds Minor Threshold Limit alarm, refer to the "Table Size Exceeds Minor Threshold Limit - Database (14)" section.
Table Size Exceeds Major Threshold Limit - Database (15)
The Table Size Exceeds Major Threshold Limit alarm (major) indicates that the table size has exceeded the major threshold crossing limit. To troubleshoot and correct the cause of the Table Size Exceeds Major Threshold Limit alarm, refer to the "Table Size Exceeds Major Threshold Limit - Database (15)" section.
Table Size Exceeds Critical Threshold Limit - Database (16)
The Table Size Exceeds Critical Threshold Limit alarm (critical) indicates that the table size has exceeded the critical threshold crossing limit. To troubleshoot and correct the cause of the Table Size Exceeds Critical Threshold Limit alarm, refer to the "Table Size Exceeds Critical Threshold Limit - Database (16)" section.
Data Replication Failed - Database (17)
The Data Replication Failed alarm (major) indicates that the data replication failed. To troubleshoot and correct the cause of the Data Replication Failed alarm, refer to the "Data Replication Failed - Database (17)" section.
Unexpected Runtime Data Interaction - Database (18)
The Unexpected Runtime Data Interaction event functions as a warning that an unexpected runtime data interaction has occurred. The primary cause of the event is that an unexpected data interaction has been detected at runtime in the call agent or feature server. To correct the primary cause of the event, collect the logs and contact Cisco TAC. Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
Daily Database Backup Completed Successfully - Database (19)
The Daily Database Backup Completed Successfully event functions as an informational alert that the daily database backup has completed successfully. The event is informational and no further action is required.
Replication Data Flush Timeout During Switchover - Database (20)
The Replication Data Flush Timeout During Switchover alarm (major) indicates that the replication data flush timed out during a switchover. To troubleshoot and correct the cause of the Replication Data Flush Timeout During Switchover alarm, refer to the "Replication Data Flush Timeout During Switchover - Database (20)" section.
Database Statistics Collection Exception - Database (21)
The Database Statistics Collection Exception alarm (minor) indicates that the database statistics collection process had an exception. To troubleshoot and correct the cause of the Database Statistics Collection Exception alarm, refer to the "Database Statistics Collection Exception - Database (21)" section.
Secure File Transfer Protocol Transfer Failed—Database (25)
The Secure File Transfer Protocol Transfer Failed alarm (major) indicates that a Secure File Transfer Protocol (SFTP) file transfer has failed. To troubleshoot and correct the cause of the Secure File Transfer Protocol Transfer Failed alarm, refer to the "Secure File Transfer Protocol Transfer Failed—Database (25)" section.
File Write Error—Database (26)
The File Write Error alarm (major) indicates that a file write error has occurred. To troubleshoot and correct the cause of the File Write Error alarm, refer to the "File Write Error—Database (26)" section.
Troubleshooting Database Alarms
This section provides the information needed to troubleshoot and correct Database alarms. Table 6-3 lists all Database alarms in numerical order and provides a cross reference to each subsection in this section.
There Are Errors in Element Management System Database DefError Queue; Contact Database Administrator - Database (3)
The There Are Errors in Element Management System Database DefError Queue; Contact Database Administrator alarm (critical) indicates that there are errors in the EMS database DefError queue. The primary cause of the alarm is that replication data conflicts have occurred. The additional causes of the alarm are that an update or delete was requested on non-existing data, or that a unique constraint (primary key) was violated. Correcting the cause of the alarm may require a manual update on database tables. Contact Cisco TAC for assistance. Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
Prior to opening the Cisco TAC service request, take the following steps:
Step 1
Log in to the EMS reported in the "Host Name" dataword, execute the following command, and collect the reported information.
dbadm -r get_defcall_order
Step 2
Check the collected information against any known field notices at the following link:
http://www.cisco.com/en/US/products/hw/vcallcon/ps531/prod_field_notices_list.html
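Step 1 above can be scripted so that the command output is saved to a file for the service request. The following is a minimal Python sketch, not a Cisco-supplied tool. The helper name `collect_output` and the output path shown in the usage note are hypothetical; only the dbadm command line itself comes from this section.

```python
import subprocess
from typing import Sequence


def collect_output(command: Sequence[str], out_path: str, timeout: int = 60) -> int:
    """Run a command, save its stdout and stderr to out_path, return the exit code."""
    result = subprocess.run(
        list(command),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep error text together with normal output
        universal_newlines=True,
        timeout=timeout,
    )
    with open(out_path, "w") as handle:
        # Record the command first so the file is self-describing.
        handle.write("$ " + " ".join(command) + "\n")
        handle.write(result.stdout)
    return result.returncode
```

On the EMS, a call such as `collect_output(["dbadm", "-r", "get_defcall_order"], "/tmp/defcall_order.txt")` would capture the Step 1 output. Run it as a user who is permitted to execute dbadm.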
Element Management System Database HeartBeat: Replication PUSH Job Broken - Database (4)
The Element Management System Database HeartBeat: Replication PUSH Job Broken alarm (critical) indicates that the replication PUSH job is broken. The primary cause of the alarm is that the remote database is down or is not accessible. To correct the primary cause of the alarm, restart or restore the remote database. The secondary cause of the alarm is that the network connection is broken. To correct the secondary cause of the alarm, correct the network connection problem. The ternary cause of the alarm is that the remote Oracle Listener process died. To correct the ternary cause of the alarm, restart the remote Listener process.
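A quick TCP check from the local EMS can help narrow down these causes before any restart is attempted. The following is a minimal Python sketch; the function name `listener_reachable` is hypothetical, and port 1521 is Oracle's default listener port, which is assumed here and may not match the port configured at a given site.

```python
import socket


def listener_reachable(host: str, port: int = 1521, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    The default port is an assumption; pass the site's listener port if it differs.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A False result is consistent with a broken network connection or a dead Listener process, but does not distinguish between them. A True result shows only that something accepts connections on that port, not that the remote database itself is up.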
Element Management System Database HeartBeat Process Died - Database (5)
The Element Management System Database HeartBeat Process Died alarm (critical) indicates that the EMS database heartbeat process has died. The primary cause of the alarm is that the EMS DBHeartBeat process was terminated by SMG, or stopped by the platform. To correct the primary cause of the alarm, restart DBHeartBeat by executing the "dbinit -H -i start" command as the oracle user, or by executing the platform start command as the root user.
Element Management System Database Replication DefTranDest Queue Overloaded - Database (6)
The Element Management System Database Replication DefTranDest Queue Overloaded alarm (major) indicates that the EMS database replication DefTranDest queue is overloaded. The primary cause of the alarm is that the replication PUSH job is broken. To correct the primary cause of the alarm, correct the problems on the remote database. The secondary cause of the alarm is that the remote database is not accessible. To correct the secondary cause of the alarm, verify that the db_heart_beat process is up. The tertiary cause of the alarm is that the database is overloaded. To correct the tertiary cause of the alarm, troubleshoot database performance.
Element Management System Database DefTran Queue is Overloaded - Database (7)
The Element Management System Database DefTran Queue is Overloaded alarm (minor) indicates that the EMS database DefTran queue is overloaded. The primary cause of the alarm is that the replication DefTranDest queue is overloaded. To correct the primary cause of the alarm, resume replication activities. The secondary cause of the alarm is that there are too many errors in the DefError queue. To correct the secondary cause of the alarm, correct the replication errors. The tertiary cause of the alarm is that the replication PURGE job is broken or overloaded. To correct the tertiary cause of the alarm, enable the replication PURGE job.
Element Management System Database Tablespace is Out of Free Space - Database (8)
The Element Management System Database Tablespace is Out of Free Space alarm (major) indicates that the EMS database tablespace is out of free space. The primary cause of the alarm is that data volume or transaction volume has increased. To correct the primary cause of the alarm, add more space to the tablespace.
Urgent: Element Management System Database Archive Log Directory is Getting Full - Database (9)
The Urgent: Element Management System Database Archive Log Directory is Getting Full alarm (critical) indicates that the EMS database archive log directory is getting full. The primary cause of the alarm is that transaction volume has increased. To correct the primary cause of the alarm, back up and clean up the archive log files. Additionally, add more space to the archive log directory.
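Before and after cleaning up the archive log files, it can help to confirm how full the directory's file system is. The following is a minimal sketch that uses only the Python standard library. The function names and the 80 percent warning level are assumptions for illustration; the threshold that the EMS itself uses to raise Database (9) is not stated here, and the archive log directory path must be supplied by the administrator.

```python
import shutil


def percent_used(path):
    """Return the percentage of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def archive_dir_status(path, warn_percent=80.0):
    """Summarize usage for `path`; warn_percent is a hypothetical warning level."""
    pct = percent_used(path)
    state = "WARNING" if pct >= warn_percent else "OK"
    return "{}: {} is {:.1f}% full".format(state, path, pct)
```

Running archive_dir_status against the archive log directory before and after the cleanup shows whether the cleanup freed enough space or whether more space must be added.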
Element Management System Database: Backup Fails - Database (10)
The Element Management System Database: Backup Fails alarm (major) indicates that the EMS database backup has failed. The primary cause of the alarm is that the system or hardware is unstable. To correct the primary cause of the alarm, restart the backup process.
Element Management System Database Alert.log Alerts - Database (11)
The Element Management System Database Alert.log Alerts alarm (major) indicates that EMS database alerts are being received and logged in the alert log. The probable cause of the ORA- error reported in this event, and the corrective action for it, are documented in the $ORACLE_HOME/rdbms/mesg/oraus.msg file. To view the file, log in to the EMS system as root, then su - oracle (or log in directly as the oracle user). This event does not clear the alarm automatically; after the situation is corrected, the user must clear the alarm manually. If more information or further support is needed, contact Cisco TAC or query the Oracle Metalink library at http://metalink.oracle.com. Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.

Note
On rare occasions, the Database (11) alarm may report ORA-01595 and ORA-01594 errors in the alert.log file. These are not errors but Oracle reporting bugs that will be corrected in a future Oracle release. Oracle takes care of the internal processing contention automatically. There is no interruption to normal application operations and no loss of data. ORA-01595 and ORA-01594 errors can occur during concurrent processing situations that have long running transactions and other small transactions.
When the Database (11) alarm is reported with ORA-01595 and ORA-01594 errors, the database administrator should manually clear the alarm from the CLI.
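Looking up an ORA- error in oraus.msg by hand, as described for the Database (11) alarm, can be tedious in a large file. The following is a minimal sketch of a lookup helper. It assumes the usual layout of that file, in which each entry starts with a line of the form number, number, "message text" and is followed by comment lines beginning with // that carry the cause and action. The function name lookup_ora_error is hypothetical, and the layout should be confirmed against the file on the EMS before the output is relied on.

```python
import re

# Assumed entry header layout: 01234, 00000, "message text"
ENTRY_HEADER = re.compile(r'^(\d+),\s*\d+,\s*"(.*)"\s*$')


def lookup_ora_error(lines, code):
    """Return (message, comment_lines) for an ORA error number, or None.

    `lines` is any iterable of text lines (for example an open file);
    `code` is the numeric part of the error, such as 1594 or "01594".
    """
    wanted = int(code)
    found = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = ENTRY_HEADER.match(line)
        if match:
            if found is not None:
                break  # reached the next entry, so stop collecting
            if int(match.group(1)) == wanted:
                found = (match.group(2), [])
        elif found is not None and line.startswith("//"):
            # Comment lines after the header hold the cause and action text.
            found[1].append(line[2:].strip())
    return found
```

As the oracle user, open the oraus.msg file under $ORACLE_HOME/rdbms/mesg and pass the open file object as the lines argument, together with the number taken from the ORA- error in the alarm.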
Element Management System Database Process Died - Database (12)
The Element Management System Database Process Died alarm (critical) indicates that the EMS database process has died. The primary possible causes of the alarm are:
• Process Name, if the local process is not running.
• "Cannot_connect_database" if the local DB is unreachable.
• "Cannot_connect_" if the remote DB is unreachable.
To correct the possible causes of the alarm, restart the process and contact Cisco TAC. Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
Element Management System Database Performance Alert - Database (13)
The Element Management System Database Performance Alert alarm (major) indicates that the EMS database performance has degraded. To identify the primary cause of the alarm, check the "StatEventName" dataword information. To correct the primary cause of the alarm, perform database performance tuning and contact Cisco TAC. Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
Table Size Exceeds Minor Threshold Limit - Database (14)
The Table Size Exceeds Minor Threshold Limit alarm (minor) indicates that the table size has exceeded the minor threshold crossing limit. The primary cause of the alarm is that the pre-provisioned size for the stated table is nearing the licensed limit on the number of entries it can hold. To correct the primary cause of the alarm, contact Cisco TAC to purchase additional entry space for this particular table. Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.
Table Size Exceeds Major Threshold Limit - Database (15)
The Table Size Exceeds Major Threshold Limit alarm (major) indicates that the table size has exceeded the major threshold crossing limit. The primary cause of the alarm is that the major threshold limit was exceeded. No corrective action is required.
Table Size Exceeds Critical Threshold Limit - Database (16)
The Table Size Exceeds Critical Threshold Limit alarm (critical) indicates that the table size has exceeded the critical threshold crossing limit. The primary cause of the alarm is that the critical threshold limit was exceeded. No corrective action is required.
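The three table size alarms, Database (14), (15), and (16), form an escalating series as a table fills toward its licensed limit. The following sketch only illustrates that relationship. The threshold fractions (0.70, 0.80, 0.90) and the function name are hypothetical, not the values or logic that the EMS uses; the actual thresholds are not stated in this section.

```python
def table_size_alarm(entries, licensed_limit, minor=0.70, major=0.80, critical=0.90):
    """Return the most severe table size alarm whose threshold is exceeded, or None.

    The default threshold fractions are illustrative assumptions only.
    """
    if licensed_limit <= 0:
        raise ValueError("licensed_limit must be positive")
    fill = entries / licensed_limit
    if fill > critical:
        return "Database (16)"  # Table Size Exceeds Critical Threshold Limit
    if fill > major:
        return "Database (15)"  # Table Size Exceeds Major Threshold Limit
    if fill > minor:
        return "Database (14)"  # Table Size Exceeds Minor Threshold Limit
    return None
```

A result of "Database (14)" corresponds to the point at which the Database (14) description recommends contacting Cisco TAC about additional entry space, before the table reaches the major and critical levels.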
Data Replication Failed - Database (17)
The Data Replication Failed alarm (major) indicates that the data replication failed. The primary cause of the alarm is that an index is out of range. The secondary cause of the alarm is that a record size mismatch occurred. No corrective action is required.
Replication Data Flush Timeout During Switchover - Database (20)
The Replication Data Flush Timeout During Switchover alarm (major) indicates that the replication data flush timed out during a switchover. The primary cause of the alarm is that a Replication Module software problem has occurred. To correct the primary cause of the alarm, execute a database restore procedure on the side that becomes active after the switchover. Clear the alarm manually after the recovery action is taken.
Database Statistics Collection Exception - Database (21)
The Database Statistics Collection Exception alarm (minor) indicates that the database statistics collection process had an exception. To identify the primary cause of the alarm, check the information listed in the "Exception" dataword field. The corrective action varies and is determined by the type of exception. For more information about the ORA-xxxxx errors, execute an oerr ora xxxxx command as an Oracle user.
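When the "Exception" dataword contains several ORA- errors, the corresponding oerr commands can be derived mechanically. The following is a minimal sketch. The function name oerr_commands is hypothetical, and the sketch only builds the command strings; it does not run them.

```python
import re

# Matches error tags such as ORA-01594 and captures the numeric part.
ORA_CODE = re.compile(r"\bORA-(\d{1,5})\b")


def oerr_commands(exception_text):
    """Return one 'oerr ora <number>' command per distinct ORA- error, in order."""
    commands = []
    for number in ORA_CODE.findall(exception_text):
        command = "oerr ora {}".format(int(number))  # int() drops leading zeros
        if command not in commands:
            commands.append(command)
    return commands
```

For example, oerr_commands("ORA-01594: attempt to wrap") returns ["oerr ora 1594"], which can then be executed as an Oracle user on the EMS.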
Secure File Transfer Protocol Transfer Failed—Database (25)
The Secure File Transfer Protocol Transfer Failed alarm (major) indicates that a secure file transfer has failed. The primary cause of the alarm is that SFTP was unable to connect between the active and standby call agents. To troubleshoot and correct the primary cause of the alarm, verify communication between the primary and secondary call agents (CAs): on each CA, ping the other node. The secondary cause of the alarm is that the system was unable to log in to the remote host. To troubleshoot and correct the secondary cause of the alarm, verify that the SSH keys have been preconfigured for user root on both the active and standby call agents. The tertiary cause of the alarm is that a file transfer error has occurred. To troubleshoot and correct the tertiary cause of the alarm, check the Error dataword to see whether it indicates the kind of error that occurred. It could be a file-system error on the remote host, or a communication failure between the active and standby call agents.
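A successful ping shows only that the other node answers on the network, not that its SSH service accepts connections. As a complementary check, and not a replacement for the ping test described above, the following sketch attempts a TCP connection to the SSH port that SFTP uses. The function name is hypothetical, and port 22 is the SSH default; substitute the configured port if the call agents use a different one.

```python
import socket


def ssh_port_reachable(host, port=22, timeout_s=5.0):
    """Return True if a TCP connection to host:port can be opened, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Run the check from each call agent against the other. A False result while ping succeeds points toward the SSH service or a filtered port rather than general network loss.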
File Write Error—Database (26)
The File Write Error alarm (major) indicates that a file write error has occurred. The primary cause of the alarm is that a system error has occurred and that the system may be out of file descriptors. To troubleshoot and correct the primary cause of the alarm, call Cisco TAC technical support.
Refer to the "Obtaining Documentation and Submitting a Service Request" section on page liii for detailed instructions on contacting Cisco TAC and opening a service request.