Cisco SFS Product Family Element Manager User Guide, Release 2.9.0
InfiniBand Performance Management Tasks

Table Of Contents

InfiniBand Performance Management Tasks

Using the InfiniBand Menu

Enabling and Disabling InfiniBand Port Performance Management

Enabling Performance Management

Disabling Performance Management

Enabling and Managing Port Monitoring

Enabling Port Monitoring

Configuring Port Monitoring

Configuring Port Monitoring Thresholds

Viewing Port Monitoring Errors

Resetting Counters

Resetting Counters on a Hop

Resetting Counters on All Ports on a Node

Resetting Counters on All Ports in a Connection

Resetting All Counters in a Subnet

Monitoring Connections

Creating a Connection to Monitor

Viewing Monitored Connections

Viewing Connection Counters

Viewing Connection Monitor Counters

Testing Connections

Viewing Port Counters of Connections

Viewing InfiniBand Port Counters

Viewing Port Counters

Enabling or Disabling Monitoring a Port

Viewing Cumulative Port Counters


InfiniBand Performance Management Tasks


These topics describe the InfiniBand menu tasks for Element Manager that relate to performance management:

Using the InfiniBand Menu

Enabling and Disabling InfiniBand Port Performance Management

Enabling and Managing Port Monitoring

Resetting Counters

Monitoring Connections

Viewing InfiniBand Port Counters


Note See "InfiniBand Concepts" to familiarize yourself with the InfiniBand technology. For hardware-specific information, consult the relevant hardware documentation.


Using the InfiniBand Menu

The InfiniBand menu has two choices for performing InfiniBand performance management tasks:

Performance Management

Performance Management (tabular format)

This section describes how to use the Performance Management menu option. Most of the tasks can also be performed by choosing the Performance Management (tabular format) menu option, which presents information and configurable options in tables, but is a less user-friendly way to perform your InfiniBand performance management tasks.

Enabling and Disabling InfiniBand Port Performance Management

Use performance management to view InfiniBand port counters, test connectivity between InfiniBand ports, and monitor InfiniBand ports for errors.

These topics describe how to enable and disable InfiniBand port performance management:

Enabling Performance Management

Disabling Performance Management

Enabling Performance Management

To enable InfiniBand-port performance management, follow these steps:


Step 1 From the InfiniBand menu, choose Performance Management.

The Performance Management window opens.

Step 2 Click the subnet of the ports that you want to manage (for instance, fe:80:00:00:00:00:00:00).

The Port Counter Configuration display appears in the right pane of the window.

Step 3 Click the Enable radio button.
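The subnet prefix shown in Step 2 is written as eight colon-separated hexadecimal bytes, which together form a 64-bit value. As a minimal sketch, assuming you need to handle such a prefix in a script of your own, the following Python function converts that notation to an integer. The function name is hypothetical and is not part of Element Manager.

```python
import string

_HEX_DIGITS = set(string.hexdigits)


def parse_subnet_prefix(text: str) -> int:
    """Convert a prefix such as 'fe:80:00:00:00:00:00:00' to a 64-bit integer.

    Hypothetical helper for illustration only.
    """
    parts = text.strip().split(":")
    if len(parts) != 8:
        raise ValueError(f"expected 8 colon-separated bytes, got {len(parts)}")
    value = 0
    for part in parts:
        # Each byte must be exactly two hexadecimal digits.
        if len(part) != 2 or not set(part) <= _HEX_DIGITS:
            raise ValueError(f"invalid byte {part!r} in subnet prefix")
        value = (value << 8) | int(part, 16)
    return value


print(hex(parse_subnet_prefix("fe:80:00:00:00:00:00:00")))
```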


Disabling Performance Management

To disable performance management, follow these steps:


Step 1 From the InfiniBand menu, choose Performance Management.

The Performance Management window opens.

Step 2 Click the subnet of the ports that you want to manage (for instance, fe:80:00:00:00:00:00:00).

The Port Counter Configuration display appears in the right pane of the window.

Step 3 Click the Disable radio button.


Enabling and Managing Port Monitoring

These topics describe how to enable and manage port monitoring:

Enabling Port Monitoring

Configuring Port Monitoring

Configuring Port Monitoring Thresholds

Viewing Port Monitoring Errors

Enabling Port Monitoring

To enable port monitoring, follow these steps:


Step 1 From the InfiniBand menu, choose Performance Management.

The Performance Management window opens.

Step 2 Expand the subnet of the connections that you want to monitor.

Step 3 Select the Port Monitor branch.

Step 4 Click the General tab.

Step 5 From the State drop-down menu, choose Enable.


Note Enable enables port monitoring only for the ports that are configured in the Monitor Port Config table; enableAll enables port monitoring for all ports regardless of whether the port is configured in the Monitor Port Config table or not.


Step 6 Click Apply.


Configuring Port Monitoring

To configure port monitoring, follow these steps:


Step 1 From the InfiniBand menu, choose Performance Management.

The Performance Management window opens.

Step 2 Expand the subnet of the connections that you want to monitor.

The navigation tree expands.

Step 3 Select the Port Monitor branch.

Step 4 Click the General tab.

Step 5 In the Polling Period field, enter an integer value between 1 and 600 to configure the number of seconds between polls.

Step 6 In the Start Delay field, enter an integer value between 1 and 600 to configure the delay between startup and polling.
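Both fields accept only integers from 1 to 600. If you prepare these values in a script before entering them, a range check like the following Python sketch can catch mistakes early. The function name and error messages are assumptions for illustration; only the 1 to 600 range comes from the steps above.

```python
# Range taken from the Polling Period and Start Delay steps above.
FIELD_MIN = 1
FIELD_MAX = 600


def check_range_field(name: str, text: str) -> int:
    """Return the field value as an int, or raise ValueError if it is not
    an integer between 1 and 600. Hypothetical helper, not a product API."""
    try:
        value = int(text.strip())
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {text!r}") from None
    if not FIELD_MIN <= value <= FIELD_MAX:
        raise ValueError(
            f"{name} must be between {FIELD_MIN} and {FIELD_MAX}, got {value}"
        )
    return value


polling_period = check_range_field("Polling Period", "30")
start_delay = check_range_field("Start Delay", "60")
```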


Configuring Port Monitoring Thresholds

To configure port monitoring thresholds, follow these steps:


Step 1 From the InfiniBand menu, choose Performance Management.

The Performance Management window opens.

Step 2 Expand the subnet of the connections that you want to monitor.

The navigation tree expands.

Step 3 Select the Port Monitor branch.

Step 4 Click the Threshold tab.

Step 5 Enter an integer value in the fields where you want to apply a threshold. Enter none in the fields to which you do not want to apply a threshold.

Step 6 Click Apply.
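Each threshold field therefore holds either an integer or the word none. The following Python sketch shows one way to represent that choice in a script, using None for "no threshold". The function name, the case-insensitive match on none, and the rejection of negative values are assumptions for illustration, not documented Element Manager behavior.

```python
from typing import Optional


def parse_threshold(text: str) -> Optional[int]:
    """Return None when the field contains 'none', else the integer threshold.

    Hypothetical helper: case-insensitive matching and the non-negative
    check are assumptions, not documented behavior.
    """
    cleaned = text.strip()
    if cleaned.lower() == "none":
        return None
    value = int(cleaned)  # raises ValueError for non-integer input
    if value < 0:
        raise ValueError(f"threshold must not be negative, got {value}")
    return value
```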


Viewing Port Monitoring Errors

To view port monitoring errors, follow these steps:


Step 1 From the InfiniBand menu, choose Performance Management.

The Performance Management window opens.

Step 2 Expand the subnet of the connections that you want to monitor.

The navigation tree expands.

Step 3 Select the Port Monitor branch.

Step 4 Click the Port Errors tab.

The port errors appear in the tab.


Resetting Counters

These topics describe how to reset counters:

Resetting Counters on a Hop

Resetting Counters on All Ports on a Node

Resetting Counters on All Ports in a Connection

Resetting All Counters in a Subnet

Resetting Counters on a Hop

To reset counters on a hop, follow these steps:


Step 1 From the InfiniBand menu, choose Performance Management.

The Performance Management window opens.

Step 2 Expand the subnet of the connections that you want to monitor.

Step 3 Expand the Connection Counters branch.

Step 4 Expand the connection that includes the hop that you want to clear.

Step 5 Right-click the hop with counters you want to clear, and choose Clear counters on this Hop.


Resetting Counters on All Ports on a Node

To reset counters on all ports of a node, follow these steps:


Step 1 From the InfiniBand menu, choose Performance Management.

The Performance Management window opens.

Step 2 Expand the subnet of the connections that you want to monitor.

Step 3 Expand the Connection Counters branch.

Step 4 Expand the connection that includes the node that you want to clear.

Step 5 Right-click the node with counters you want to clear, and choose Clear counters on this Node.


Resetting Counters on All Ports in a Connection

To reset counters on all ports in a connection, follow these steps:


Step 1 From the InfiniBand menu, choose Performance Management.

The Performance Management window opens.

Step 2 Expand the subnet of the connections that you want to monitor.

Step 3 Expand the Connection Counters branch.

Step 4 Right-click the connection with counters you want to clear, and choose Clear counters on this Connection.


Resetting All Counters in a Subnet

To reset all counters in a subnet, follow these steps:


Step 1 From the InfiniBand menu, choose Performance Management.

The Performance Management window opens.

Step 2 Expand the subnet of the connections that you want to monitor.

Step 3 Right-click the Connection Counters branch, and choose Clear Counters for All Connections.


Monitoring Connections

To monitor connections, you complete tasks such as:

Creating a Connection to Monitor

Viewing Monitored Connections

Viewing Connection Counters

Viewing Connection Monitor Counters

Testing Connections

Viewing Port Counters of Connections

Creating a Connection to Monitor

To create a connection to monitor, follow these steps:


Step 1 From the InfiniBand menu, choose Performance Management.

The Performance Management window opens.

Step 2 Expand the subnet of the connections that you want to monitor.

Step 3 Choose Connection Counters.

The Monitored Connection tab appears in the right pane of the window.

Step 4 Click Add.

The Add Connection window opens.

Step 5 In the Source LID field, enter a source LID.


Note To view available source and destination LIDs, return to the main Element Manager display, click the InfiniBand menu, choose Subnet Management, and then click the SwitchRoute tab. For more information, see the "Viewing and Managing InfiniBand Routes" section.


Step 6 In the Destination LID field, enter a destination LID.

Step 7 Check the Enable Connection Monitoring check box.


Note If this check box is not selected, you can view only counter information and cannot view monitoring information.


Step 8 Click Add.

The connection entry appears under the Monitored Connections tab.
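Source and destination LIDs are 16-bit values, so any LID that you enter must fall between 0 and 65535. The following Python sketch checks a LID before you enter it in the Add Connection window. The function name is hypothetical, and accepting both decimal and 0x-prefixed hexadecimal input is an assumption about how you record LIDs, not a statement about what the window accepts.

```python
def parse_lid(text: str) -> int:
    """Parse a LID written in decimal or 0x-prefixed hex and confirm that it
    fits in 16 bits. Hypothetical helper for illustration only."""
    value = int(text.strip(), 0)  # base 0 accepts '26' and '0x1a'
    if not 0 <= value <= 0xFFFF:
        raise ValueError(f"LID {value} does not fit in 16 bits")
    return value


source_lid = parse_lid("2")
destination_lid = parse_lid("0x1a")
```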


Viewing Monitored Connections

These instructions assume that you have already defined connections to monitor. To view monitored connections, follow these steps:


Step 1 From the InfiniBand menu, choose Performance Management.

The Performance Management window opens.

Step 2 Expand the subnet of the connections that you want to monitor.

The navigation tree expands.

Step 3 Select the Connection Counters branch.

The Monitored Connection tab appears in the right pane of the window. Table 9-1 describes the fields in this pane.

Table 9-1 Monitored Connections Pane Field Descriptions

Field
Description

Subnet Prefix

Subnet prefix of the monitored connection.

Source LID

16-bit source Local ID of the connection.

Destination LID

16-bit destination Local ID of the connection.

Error Status

Displays unknown, exceeded, or notExceeded to indicate if the error value has exceeded the threshold that you configured. To configure thresholds, see the "Configuring Port Monitoring Thresholds" section.

Util Status

Displays unknown, exceeded, or notExceeded to indicate if the utilization value has exceeded the threshold that you configured. To configure thresholds, see the "Configuring Port Monitoring Thresholds" section.
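The three status values can be read as the outcome of comparing a measured value with its configured threshold. The following Python sketch illustrates that reading. The function name, the strict greater-than comparison, and the rule that a missing value or a missing threshold yields unknown are all assumptions for illustration; this guide does not state how Element Manager derives the status.

```python
from typing import Optional


def threshold_status(value: Optional[int], threshold: Optional[int]) -> str:
    """Map a measured value and a threshold to one of the three status strings.

    Assumptions (not documented behavior): 'unknown' when either input is
    missing, and 'exceeded' only when the value is strictly greater.
    """
    if value is None or threshold is None:
        return "unknown"
    return "exceeded" if value > threshold else "notExceeded"
```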



Viewing Connection Counters

Each hop in the display is a port on a node. When connections move through nodes, they enter the node in one hop (GUID A, port a), and exit in another hop (GUID A, port b). Though the GUIDs of subsequent hops may match, the ports do not match. To view connection counters, follow these steps:


Step 1 From the InfiniBand menu, choose Performance Management.

The Performance Management window opens.

Step 2 Expand the subnet of the connections that you want to monitor.

Step 3 Expand the Connection Counters branch.

Step 4 Select the connection with counters that you want to view.

Step 5 Click the Connection Counters tab.

Table 9-2 describes the fields in the display.

Table 9-2 Connection Counters Field Descriptions 

Field
Description

Subnet Prefix

Subnet prefix of the subnet on which each hop resides.

Node Guid

Global unique ID of the node (switch chip, HCA, or TCA) of the next-hop port.

Port Number

Port number (on the appropriate node) of the hop.

Chassis Guid

Global Unique ID (GUID) of the chassis.

Slot Number

Slot of the port.

Ext Port Number

External port number of the port.

Data Is Valid

Displays true or false.

Symbol Errors

Number of symbol errors on the port.

Link Recovery Errors

Number of link recovery errors on the port.

Link Downs

Number of link-down errors on the port.

Received Errors

Number of received errors that the port experienced.

Received Remote Physical Errors

Number of remote physical errors that the port experienced.

Received Switch Relay Errors

Number of switch relay errors that the port experienced.

Transmitted Discards

Number of transmitted discards that occurred on the port.

Transmitted Constraint Errors

Number of Transmitted Constraint errors that the port experienced.

Received Constraint Errors

Number of Received Constraint errors that the port experienced.

Local Link Integrity Errors

Number of local link integrity errors on the port.

Excessive Buffer Overrun Errors

Number of excessive buffer overrun errors on the port.

VL15 Dropped

Number of VL15 drops on the port.

Transmitted Data

Volume of transmitted data on the port.

Received Data

Volume of received data on the port.

Transmitted Packets

Number of transmitted packets on the port.

Received Packets

Number of received packets on the port.
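The hop model described at the start of this section (each hop is a port on a node, and a connection enters and exits a node through different ports) can be pictured as an ordered list of GUID and port pairs. The following Python sketch groups consecutive hops by node GUID to show which ports a connection uses on each node. The type name, function name, and GUID strings are hypothetical placeholders.

```python
from itertools import groupby
from typing import List, NamedTuple, Tuple


class Hop(NamedTuple):
    node_guid: str  # GUID of the node that owns the port
    port: int       # port number on that node


def ports_per_node(hops: List[Hop]) -> List[Tuple[str, List[int]]]:
    """Group consecutive hops that share a node GUID, preserving path order."""
    return [
        (guid, [hop.port for hop in group])
        for guid, group in groupby(hops, key=lambda hop: hop.node_guid)
    ]


# Placeholder GUIDs: the connection leaves one HCA, crosses a switch
# (entering on port 3 and exiting on port 7), and reaches a second HCA.
path = [
    Hop("guid-hca-a", 1),
    Hop("guid-switch", 3),
    Hop("guid-switch", 7),
    Hop("guid-hca-b", 1),
]
for guid, ports in ports_per_node(path):
    print(guid, ports)
```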



Viewing Connection Monitor Counters

To view connection monitor counters, follow these steps:


Step 1 From the InfiniBand menu, choose Performance Management.

The Performance Management window opens.

Step 2 Expand the subnet of the connections that you want to monitor.

Step 3 Expand the Connection Counters branch.

Step 4 Select the connection with counters that you want to view.

Step 5 Click the Connection Monitor Counters tab.

Table 9-3 describes the fields in the tab.

Table 9-3 Connection Monitor Counters Field Descriptions 

Field
Description

Node Guid

Global unique ID of the InfiniBand node of the hop port.

Port Number

Port number of the hop.

Chassis Guid

GUID of the chassis that includes the connection.

Slot Number

Slot number of the port(s) in the connection.

Ext Port Number

External port number of the connection port.

Error Type

Type of error that occurred.



Testing Connections

To test connections, follow these steps:


Step 1 From the InfiniBand menu, choose Performance Management.

The Performance Management window opens.

Step 2 Expand the subnet of the connections that you want to monitor.

Step 3 Expand the Connection Counters branch.

Step 4 Select the connection that you want to test.

Step 5 Click the Test Connection tab.

Step 6 Click Test.


Viewing Port Counters of Connections

To view the port counters of a connection, follow these steps:


Step 1 From the InfiniBand menu, choose Performance Management.

The Performance Management window opens.

Step 2 Expand the subnet of the connections that you want to monitor.

Step 3 Expand the Connection Counters branch.

Step 4 Expand the connection with port counters that you want to view.

Step 5 Select the port (in GUID - port-number format) with counters that you want to view.

Table 9-4 describes the fields in this display.

Table 9-4 Port Counters Field Descriptions 

Field
Description

Subnet Prefix

Subnet prefix of the subnet on which each hop resides.

Node Guid

Global unique ID of the node (switch chip, HCA, or TCA) of the next-hop port.

Port Number

Port number (on the appropriate node) of the hop.

Chassis Guid

GUID of the chassis that includes the connection.

Slot Number

Slot number of the port(s) in the connection.

Ext Port Number

External port number of the connection port.

Symbol Errors

Total number of symbol errors detected on one or more lanes.

Link Recovery Errors

Total number of times the port training state machine has successfully completed the link error recovery process.

Link Downs

Total number of times that the port training state machine has failed the link error recovery process and downed the link.

Received Errors

Total number of packets containing an error that were received on the port. These errors are as follows:

Local physical errors (ICRC, VCRC, FCCRC, and all physical errors that cause entry into the bad state)

Malformed data packet errors (Lver, length, VL)

Malformed link packet errors (operand, length, VL)

Packets discarded due to buffer overrun

Received Remote Physical Errors

Total number of packets marked with the EBP delimiter received on the port.

Received Switch Relay Errors

Total number of packets received on the port that were discarded because they could not be forwarded by the switch relay. Reasons for this are as follows:

DLID mapping.

VL mapping.

Looping (output port = input port).

Transmitted Discards

Total number of outbound packets discarded by the port because the port is down or congested. Reasons for this are as follows:

Output port is in the inactive state.

Packet length has exceeded neighbor MTU.

Switch lifetime limit has been exceeded.

Switch HOQ limit has been exceeded.

Transmitted Constraint Errors

Total number of packets not transmitted from the port for the following reasons:

FilterRawOutbound is true and packet is raw.

PartitionEnforcementOutbound is true and the packet fails the partition key check, the IP version check, or the transport header version check.

Received Constraint Errors

Total number of packets received on the port that are discarded for the following reasons:

FilterRawInbound is true, and packet is raw.

PartitionEnforcementInbound is true and the packet fails the partition key check, the IP version check, or the transport header version check.

Local Link Integrity Errors

Number of times that the frequency of packets containing local physical errors exceeded local_phy_errors.

Excessive Buffer Overrun Errors

Number of times that OverrunErrors consecutive flow control update periods occurred, each with at least one overrun error.

VL15 Dropped

Number of incoming VL15 packets dropped due to resource limitations on port selected by PortSelect.

Transmitted Data

(Optional) Shall be zero if not implemented. Total number of data octets, divided by 4, transmitted on all VLs from the port selected by PortSelect. This includes all octets between (and not including) the start of packet delimiter and VCRC. It excludes all link packets.

Implementations may count data octets in groups larger than four, but the smallest possible group is encouraged. Results are still reported as a multiple of four octets.

Received Data

(Optional) Shall be zero if not implemented. Total number of data octets, divided by 4, received on all VLs from the port selected by PortSelect. This includes all octets between (and not including) the start of packet delimiter and VCRC. It excludes all link packets.

Implementations may count data octets in groups larger than four, but the smallest possible group is encouraged. Results are still reported as a multiple of four octets.

Transmitted Packets

(Optional) Shall be zero if not implemented. Total number of data packets, excluding link packets, transmitted on all VLs from the port selected by PortSelect.

Received Packets

(Optional) Shall be zero if not implemented. Total number of data packets, excluding link packets, received on all VLs from the port selected by PortSelect.
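Because the Transmitted Data and Received Data counters report the number of data octets divided by 4, a displayed value must be multiplied by 4 to obtain octets. The following Python sketch shows that arithmetic; the function and constant names are hypothetical.

```python
OCTETS_PER_COUNT = 4  # the data counters report octets divided by 4


def data_counter_to_octets(counter_value: int) -> int:
    """Convert a Transmitted Data or Received Data counter value to octets."""
    if counter_value < 0:
        raise ValueError("counter values cannot be negative")
    return counter_value * OCTETS_PER_COUNT


# A counter value of 250 corresponds to 250 * 4 = 1000 octets.
print(data_counter_to_octets(250))
```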



Viewing InfiniBand Port Counters

These topics describe how to view InfiniBand port counters:

Viewing Port Counters

Enabling or Disabling Monitoring a Port

Viewing Cumulative Port Counters

Viewing Port Counters

To view port counters, follow these steps:


Step 1 From the InfiniBand menu, choose Performance Management.

The Performance Management window opens.

Step 2 Expand the subnet of the connections that you want to monitor.

Step 3 Expand the Port Counters branch.

Step 4 View port counters using one of the following methods:

Click the GUID with port counters that you want to view; all available port counters appear.

Expand the GUID of the node with port counters that you want to view, and then select the port with counters that you want to view.

Counters appear for that individual port. Table 9-5 describes the fields in the port counters display.

Table 9-5 Port Counters Field Descriptions 

Field
Description

Subnet Prefix

Subnet prefix of the subnet on which each hop resides.

Node Guid

Global unique ID of the node (switch chip, HCA, or TCA) of the next-hop port.

Port Number

Port number (on the appropriate node) of the hop.

Chassis Guid

GUID of the chassis that includes the connection.

Slot Number

Slot number of the port(s) in the connection.

Ext Port Number

External port number of the connection port.

Symbol Errors

Total number of symbol errors detected on one or more lanes.

Link Recovery Errors

Total number of times the port training state machine has successfully completed the link error recovery process.

Link Downs

Total number of times the port training state machine has failed the link error recovery process and downed the link.

Received Errors

Total number of packets containing an error that were received on the port. These errors are as follows:

Local physical errors (ICRC, VCRC, FCCRC, and all physical errors that cause entry into the "bad" state)

Malformed data packet errors (Lver, length, VL)

Malformed link packet errors (operand, length, VL)

Packets discarded due to buffer overrun

Received Remote Physical Errors

Total number of packets marked with the EBP delimiter received on the port.

Received Switch Relay Errors

Total number of packets received on the port that were discarded because they could not be forwarded by the switch relay. Reasons for this are as follows:

DLID mapping.

VL mapping.

Looping (output port = input port).

Transmitted Discards

Total number of outbound packets discarded by the port because the port is down or congested. Reasons for this are as follows:

Output port is in the inactive state.

Packet length has exceeded neighbor MTU.

Switch lifetime limit has been exceeded.

Switch HOQ limit has been exceeded.

Transmitted Constraint Errors

Total number of packets not transmitted from the port for the following reasons:

FilterRawOutbound is true, and packet is raw.

PartitionEnforcementOutbound is true and the packet fails the partition key check, the IP version check, or the transport header version check.

Received Constraint Errors

Total number of packets received on the port that are discarded for the following reasons:

FilterRawInbound is true, and packet is raw.

PartitionEnforcementInbound is true and packet fails partition key check, IP version check, or transport header version check.

Local Link Integrity Errors

Number of times that the frequency of packets containing local physical errors exceeded local_phy_errors.

Excessive Buffer Overrun Errors

Number of times that OverrunErrors consecutive flow control update periods occurred, each with at least one overrun error.

VL15 Dropped

Number of incoming VL15 packets dropped due to resource limitations on port selected by PortSelect.

Transmitted Data

(Optional) Value is zero if not implemented. Total number of data octets, divided by 4, transmitted on all VLs from the port selected by PortSelect. This includes all octets between (and not including) the start-of-packet delimiter and the VCRC. It excludes all link packets.

Implementations may count data octets in groups larger than four, but the smallest possible group is encouraged. Results are still reported as a multiple of four octets.

Received Data

(Optional) Shall be zero if not implemented. Total number of data octets, divided by 4, received on all VLs from the port selected by PortSelect. This includes all octets between (and not including) the start-of-packet delimiter and the VCRC. It excludes all link packets.

Implementations may count data octets in groups larger than four, but the smallest possible group is encouraged. Results are still reported as a multiple of four octets.

Transmitted Packets

(Optional) Shall be zero if not implemented. Total number of data packets, excluding link packets, transmitted on all VLs from the port selected by PortSelect.

Received Packets

(Optional) Shall be zero if not implemented. Total number of data packets, excluding link packets, received on all VLs from the port selected by PortSelect.
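The packet counters are running totals, so a packet rate can be estimated from two readings of the same counter taken a known time apart. The following Python sketch shows that calculation for your own notes or scripts. The function name is hypothetical, and the calculation is an illustration rather than a description of how Element Manager computes any value. A later reading that is lower than the earlier one usually means that the counter was reset between readings, so the sketch rejects it.

```python
def packets_per_second(
    earlier_count: int, later_count: int, interval_seconds: float
) -> float:
    """Estimate an average packet rate from two readings of one counter.

    Hypothetical helper: raises ValueError if the interval is not positive
    or if the counter decreased (for example, after a counter reset).
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if later_count < earlier_count:
        raise ValueError("counter decreased between readings; it may have been reset")
    return (later_count - earlier_count) / interval_seconds


# Two readings of Transmitted Packets taken 10 seconds apart.
print(packets_per_second(1000, 4000, 10))
```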



Enabling or Disabling Monitoring a Port

To enable or disable port monitoring for a specific port, follow these steps:


Step 1 From the InfiniBand menu, choose Performance Management.

The Performance Management window opens.

Step 2 Expand the subnet of the connections that you want to monitor.

Step 3 Expand the Port Counters branch.

Step 4 Expand the GUID of the node that contains the port for which you want to enable or disable monitoring.

Step 5 Right-click the port for which you want to enable or disable monitoring.

Step 6 From the right-click menu, choose Enable Port Monitoring or Disable Port Monitoring.


Viewing Cumulative Port Counters

To view cumulative port counters, follow these steps:


Step 1 From the InfiniBand menu, choose Performance Management.

The Performance Management window opens.

Step 2 Expand the subnet of the connections that you want to monitor.

Step 3 Expand the Port Counters branch.

Step 4 Expand the node of the port with cumulative counters that you want to view.

Step 5 Click the port with cumulative counters that you want to view.

Step 6 Click the Port Cumulative Counters tab.

Table 9-6 describes the fields in the tab.

Table 9-6 Cumulative Port Counters Field Descriptions 

Field
Description

Subnet Prefix

Subnet prefix of the subnet on which each hop resides.

Node Guid

Global unique ID of the node (switch chip, HCA, or TCA) of the next-hop port.

Port Number

Port number (on the appropriate node) of the hop.

Chassis Guid

Global Unique ID (GUID) of the chassis.

Slot Number

Slot of the port.

Ext Port Number

External port number of the port.

Error Status

Displays unknown, exceeded, or notExceeded to indicate if the error value has exceeded the threshold that you configured.

Util Status

Displays unknown, exceeded, or notExceeded to indicate if the utilization value has exceeded the threshold that you configured.

Symbol Errors

Number of symbol errors on the port.

Link Recovery Errors

Number of link recovery errors on the port.

Link Downs

Number of link-down errors on the port.

Received Errors

Number of received errors that the port experienced.

Received Remote Physical Errors

Number of remote physical errors that the port experienced.

Received Switch Relay Errors

Number of switch relay errors that the port experienced.

Transmit Discards

Number of transmitted discards that occurred on the port.

Transmit Constraint Errors

Number of Transmit Constraint errors that the port experienced.

Received Constraint Errors

Number of Received Constraint errors that the port experienced.

Logical Link Integrity Errors

Number of logical link integrity errors on the port.

Excessive Buffer Overrun Errors

Number of excessive buffer overrun errors on the port.

VL15 Dropped

Number of VL15 drops on the port.

Transmit Data

Volume of transmitted data on the port.

Received Data

Volume of received data on the port.

Transmit Packets

Number of transmitted packets on the port.

Received Packets

Number of received packets on the port.

Transmit Rate

Transmit rate of the port.

Received Rate

Receive rate of the port.