Table Of Contents
Fault Management In MPLS Networks
MPLS Related Faults
MPLS Black Hole Found Alarm
Broken LSP Discovered Alarm
Black Hole To Link Down
BGP Related Faults
BGP Neighbor Loss
BGP Process Down
Traffic Engineering Faults
MPLS TE Tunnel Down and TE Tunnel Flapping
Tunnel Reoptimized
Layer 2 VPN Faults
Pseudo Wire (L2 VPN) MPLS Tunnel Down
Alarms Summary
Fault Management In MPLS Networks
This chapter describes the alarms that Cisco ANA detects and reports for BGP, MPLS TE (using RSVP TE), MPLS Black Holes, as well as alarm reports for Layer 2 and Layer 3 VPNs:
•
MPLS Related Faults—Describes the "MPLS black hole found" and "Broken LSP discovered" alarms.
•
BGP Related Faults—Describes the "BGP neighbor loss" alarm.
•
Traffic Engineering Faults—Describes the "MPLS TE tunnel down", "MPLS TE tunnel flapping" and "Tunnel reoptimized" alarms.
•
Layer 2 VPN Faults—Describes the "Pseudo Wire (L2 VPN) MPLS tunnel down" alarm.
•
Alarms Summary—Provides a brief description of the alarms for VPNs, including their severity and the up alarm (clearing alarm) for each.
Cisco ANA supports the following alarms:
•
MPLS Related Faults:
–
MPLS black hole found
–
Broken LSP discovered
•
BGP Related Faults:
–
BGP neighbor loss
•
Traffic Engineering Faults:
–
MPLS TE tunnel down
–
MPLS TE tunnel flapping
–
Tunnel reoptimized
•
Layer 2 VPN Faults:
–
Pseudo Wire (L2 VPN) MPLS tunnel down
The alarms are displayed in the ticket pane of the Cisco ANA NetworkVision window. For more information about the ticket pane, see the Cisco Active Network Abstraction NetworkVision User Guide.
MPLS Related Faults
This section includes descriptions of the following MPLS related faults:
•
MPLS black hole found
•
Broken LSP discovered
Note
The MPLS black hole feature is only supported when the PEs are managed by the system.
MPLS Black Hole Found Alarm
A MPLS "black hole" is defined as an abnormal termination of a MPLS path (LSP) inside a MPLS network. A MPLS "black hole" exists when on a specific interface there are untagged entries destined for a known PE router. It is assumed that a router functions as a PE router if there are services using the MPLS network, such as L3 VPNs or Pseudo Wire (L2 VPN) MPLS Tunnels. Note that the untagged interfaces may exist in the network in normal situations. For example, where the boundary of the MPLS cloud has untagged interfaces this is still considered normal.
The existence of a MPLS "black hole" results in a loss of all the MPLS labels on a packet including the VPN information which lies in the inner MPLS label. So if a packet goes through an untagged interface, the VPN information is lost. The VPN information loss translates directly to VPN sites losing connectivity.
A "MPLS Black Hole Found" alarm is detected actively by the system, namely, service alarms are generated whenever Cisco ANA discovers a MPLS interface that has at least one untagged LSP leading to a known PE router.
Black hole alarms are detected either:
•
When the system is loaded for the first time and performs the initial discovery of the network.
•
Through the ongoing discovery process, which identifies changes in the network.
Broken LSP Discovered Alarm
The "MPLS Black Hole Found" alarm activates a backward flow on the specific untagged entry in order to traverse the full path of the LSPs passing through it. If Cisco ANA locates services (VRFs, Pseudo Wire L2 tunnels) along this path that are using these LSPs a "Broken LSP Discovered" alarm is issued. Such services can only be found on PE routers and they can be found on more than one PE router. The source of the "Broken LSP Discovered" alarm is the PE router on which the service was discovered and in many cases this router is different from the router that issued the "MPLS Black Hole Found" alarm.
"Broken LSP Discovered" alarms are correlated to the "MPLS Black Hole Found" alarm (except in the case of a Black hole alarm due to a link down as described on page 6-4).
The "Broken LSP Discovered" alarm is detected actively by the system, namely, service alarms are generated.
An example of a MPLS black hole scenario is provided below.
In the network described in this example, the shortest path from PE2 to PE3 is PE2<->P2<->PE3. The link between P2 and PE3 is a MPLS link, meaning interfaces on both side of the link are configured as MPLS interfaces. Also assume that for some reason the MPLS configuration is incomplete or incorrect, namely:
•
Only one interface is configured as a MPLS interface.
•
The label distribution protocol is configured differently on both interfaces (protocol mismatch).
In this case the label switching table on P2 and PE3 will have untagged entries for the LSPs between PE2 and PE3. If PE2 and PE3 have VPN services (VRFs, Pseudo Wire tunnels) the outcome will be that the data flow between PE2 and PE3 will be affected.
Figure 6-1 Example of a MPLS Black Hole Scenario
In this case Cisco ANA does the following:
•
Identifies untagged label switching entries on P2 and PE3.
•
Issues "MPLS Black Hole Found" alarms on the interfaces on both sides of the link (since the LSP is unidirectional).
•
Initiates a backward flow starting from the link on the specific untagged entries and identifies the 2 LSPs traversing the link, namely:
–
LSP from PE2 to PE3
–
LSP from PE3 to PE2
•
Issues "Broken LSP Discovered" alarms on both LSPs in PE2 and PE3, which are correlated to the corresponding "MPLS Black Hole Found" alarm.
Note
The clearing alarm does not activate flows to locate the LSPs that were passing through it in order to issue a clearing alarm for Broken LSPs, but rather uses the auto clear functionality. The Gateway periodically reviews the tickets and checks if all the alarms under each ticket are cleared or configured as auto cleared alarms, and whether the Gateway correlation timeout has passed, and in this case the Gateway closes the ticket.
Using this functionality, once the "MPLS Black hole" alarm is cleared, then after a specific time interval (configured Gateway correlation timeout) has passed, the Gateway will be able to close the ticket since all the alarms correlated to "MPLS Black hole" are "Broken LSP" which are configured as auto cleared.
Black Hole To Link Down
In a case where a link down event in a MPLS network has caused an IP reroute and therefore LDP redistribution, a case may arise where new LSPs are now redirected through a non-MPLS segment thereby creating a black hole.
In this case the "Broken LSP Discovered" alarms are issued as described in Broken LSP Discovered Alarm, but all the broken LSPs that are found are correlated to the "Link Down" alarm and not to the "MPLS Black Hole Found" alarm.
BGP Related Faults
Cisco ANA monitors BGP neighbor information and makes correlation and impact analysis information available to users.
This section includes a description of the BGP related faults.
BGP Neighbor Loss
When the VNE's BGP component is polling the status of BGP neighbors (expedite or normal polling), and an entry for a neighbor does not exist any more or its state changed from Established state to any other state, the BGP component will issue an alarm for "BGP Neighbor Loss". This alarm causes the BGP component to issue a Root Cause Analysis (RCA) correlation flow to find its root cause.
If RCA does not find any alarm to correlate to, the VNE will send this alarm to the gateway as a ticket.
If this alarm is configured in the registry to issue "Look For Affected" flow, the following process will be issued:
•
For each BGP link, the BGP component holds a list of PE routers (routers with VRFs) that it can reach.
•
If there is a BGP neighbor loss and the BGP component has no other BGP link to a PE, all VRFs on this device with route entries with that PE as BGP next hop will be true affected.
•
This information will be sent as an update for the previous "BGP Neighbor Loss" alarm.
BGP Process Down
A query checks the status of the BGP process when the VNE's BGP component is polling for the status and configuration of its BGP neighbors (expedite or normal polling).
If the BGP process is not running, the VNE's BGP component will issue an alarm "BGP Process Down".
This alarm will always be a ticket and will not try to correlate to other alarms.
All the "BGP Neighbors Down" alarms issued due to the "BGP Process Down" should correlate to this "BGP Process Down" ticket.
Traffic Engineering Faults
This section includes a description of the following Traffic Engineering related faults:
•
MPLS TE tunnel down
•
MPLS TE tunnel flapping
•
Tunnel reoptimized
MPLS TE Tunnel Down and TE Tunnel Flapping
When a TE tunnel's operational status changes to down and the tunnel is not flapping, the system generates a "Tunnel Down" alarm.
The correlation engine identifies various faults that affect the TE tunnel's status and reports on them as the root cause for the TE "Tunnel Down" alarm, for example, Link down.
Multiple up and down alarms that are generated during a short time interval are suppressed and displayed as a "Tunnel Flapping" alarm (according to the specific flapping configuration).
The "MPLS TE Tunnel Down" and the "TE Tunnel flapping" alarms are detected actively by the system, namely, service alarms are generated.
The system also supports "MPLS TE Tunnel Down" syslogs, which are correlated to the service alarm.
Tunnel Reoptimized
Tunnel reoptimization occurs when a tunnel is up and its route changes but the tunnel continues to remain up. When a TE tunnel is reoptimized to take a different path, the system parses the tunnel reoptimized syslog, if such a syslog is available, and displays this syslog as a ticket.
The "Tunnel Reoptimized" alarm is generated from a syslog message sent by the router.
Layer 2 VPN Faults
This section includes a description of the Layer 2 VPN fault, Pseudo Wire (L2 VPN) MPLS tunnel down.
Pseudo Wire (L2 VPN) MPLS Tunnel Down
A "Pseudo Wire MPLS Tunnel Down" alarm is issued when the pseudo wire link goes down, namely, the pseudo wire tunnel is reported as down from both the devices (based on the status of the tunnel), and the tunnel is not flapping.
The correlation engine identifies various faults that affect the Pseudo Wire tunnel status and reports on them as the root cause for the "Pseudo Wire MPLS Tunnel Down" alarm, for example, Link down.
Cisco ANA traces the LSE path to the edge of the PWE3 tunnel and marks the edges of the tunnel as affected.
The "Pseudo Wire MPLS Tunnel Down" alarm is detected actively by the system, namely, service alarms are generated.
Alarms Summary
The following section describes the alarms that may be displayed in the ticket pane of the Cisco ANA NetworkVision window for VPNs, including their severity and the up alarm for each:
Table 6-1 Alarms Displayed In the Ticket Pane
Alarm
|
Default Severity
|
Description
|
Up Alarm
|
BGP Neighbor Loss
|
Red (critical)
|
The "BGP Neighbor Loss" alarm is generated whenever BGP connectivity is lost to a specific device.
|
BGP Neighbor Found
|
MPLS Black Hole Found
|
Dark blue (information)
|
A "MPLS Black Hole Found" alarm is generated whenever Cisco ANA discovers a MPLS interface that has at least one untagged LSP leading to a known PE router.
|
MPLS Black Hole Cleared
|
Broken LSP Discovered
|
Orange (major)
|
The "MPLS Black Hole Found" alarm activates a backward flow on the specific untagged entry in order to traverse the full path of the LSPs passing through it. The "Broken LSP Discovered" alarm is generated whenever Cisco ANA locates services (VRFs, Pseudo Wire L2 tunnels) along this path that are using these LSPs.
|
N/A
|
MPLS TE Tunnel Down
|
Orange (major)
|
The "MPLS TE Tunnel Down" alarm is generated whenever a TE tunnel's operational status changes to down and the tunnel is not flapping.
|
MPLS TE Tunnel Up
|
MPLS TE Tunnel Flapping
|
Orange (major)
|
The "TE Tunnel flapping" alarm is generated whenever multiple up and down alarms are generated during a short time interval and they are suppressed.
|
Is the last state of the tunnel after it has stopped flapping
|
Pseudo Wire (L2 VPN) MPLS Tunnel Down
|
Yellow (minor)
|
The "Pseudo Wire MPLS Tunnel Down" alarm is generated whenever the pseudo wire link goes down, namely, the pseudo wire tunnel is reported as down from both the devices (based on the status of the tunnel).
|
Layer 2 Tunnel Up
|
Tunnel Reoptimized
|
Dark Blue (information)
|
The "Tunnel Reoptimized" alarm is generated from a syslog message sent by the router whenever a tunnel is up and its route changes but the tunnel continues to remain up.
|
N/A
|
For more information about the ticket pane, see the Cisco Active Network Abstraction NetworkVision User Guide.