Cisco UCS Manager B-Series Troubleshooting Guide
Troubleshooting Server Disk Drive Detection and Monitoring

Support for Disk Drive Monitoring

Disk drive monitoring is supported only on certain blade servers and requires a specific LSI storage controller firmware level.

Supported Cisco UCS Servers

Through Cisco UCS Manager, you can monitor disk drives for the following servers:

  • B200 blade server
  • B230 blade server
  • B250 blade server
  • B440 blade server

Cisco UCS Manager cannot monitor disk drives in any other blade server or rack-mount server.

Storage Controller Firmware Level

The storage controller on a supported server must have LSI 1064E firmware.

Cisco UCS Manager cannot monitor disk drives in servers with a different level of storage controller firmware.

Prerequisites for Disk Drive Monitoring

In addition to the supported servers and storage controller firmware version, you must ensure that the following prerequisites have been met for disk drive monitoring to provide useful status information:

  • The drive must be inserted in the server drive bay.
  • The server must be powered on.
  • The server must have completed discovery.
  • The BIOS POST complete result must be TRUE.
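
The support requirements and prerequisites above can be expressed as a simple checklist. The following Python sketch is illustrative only: `ServerState` and `unmet_requirements` are hypothetical names, not part of Cisco UCS Manager, and the values are assumed to have been collected by hand or by another tool.

```python
from dataclasses import dataclass
from typing import List

# Server models and controller firmware named in this chapter as supported.
SUPPORTED_MODELS = {"B200", "B230", "B250", "B440"}
SUPPORTED_CONTROLLER = "LSI 1064E"


@dataclass
class ServerState:
    """Hypothetical snapshot of a server; not a Cisco UCS Manager object."""
    model: str
    controller_firmware: str
    drive_inserted: bool
    powered_on: bool
    discovery_complete: bool
    post_complete: bool


def unmet_requirements(state: ServerState) -> List[str]:
    """Return the requirements from this chapter that the server does not meet."""
    problems = []
    # Accept both "B-200" and "B200" spellings of the model name.
    if state.model.upper().replace("-", "") not in SUPPORTED_MODELS:
        problems.append("server model is not supported")
    if state.controller_firmware != SUPPORTED_CONTROLLER:
        problems.append("storage controller firmware is not supported")
    if not state.drive_inserted:
        problems.append("drive is not inserted in the drive bay")
    if not state.powered_on:
        problems.append("server is powered off")
    if not state.discovery_complete:
        problems.append("server has not completed discovery")
    if not state.post_complete:
        problems.append("BIOS POST complete is not TRUE")
    return problems


if __name__ == "__main__":
    state = ServerState("B200", "LSI 1064E", True, False, True, True)
    for problem in unmet_requirements(state):
        print(problem)
```

An empty list means every requirement listed in this section is met, so the Operability and Presence values can be expected to carry useful information.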

Viewing the Status of a Disk Drive

Viewing the Status of a Disk Drive in the Cisco UCS Manager GUI

Procedure
    Step 1   In the Navigation pane, click the Equipment tab.
    Step 2   On the Equipment tab, expand Equipment > Chassis > Chassis Number > Servers.
    Step 3   Click the server for which you want to view the status of the disk drive.
    Step 4   In the Work pane, click the Inventory tab.
    Step 5   Click the Storage subtab.
    Step 6   Click the down arrows to expand the Disks bar and view the following fields in the States section for each disk drive:
    Name Description

    Operability field

    The operational state of the disk drive. This can be the following:

    • Operable—The disk drive is operable.
    • Inoperable—The disk drive is inoperable, possibly due to a hardware issue such as bad blocks.
    • N/A—The operability of the disk drive cannot be determined. This could be because the server or firmware does not support disk drive monitoring, or because the server is powered off.

    Note: The Operability field may show an incorrect status for several reasons, such as if the disk is part of a broken RAID set or if the BIOS POST (Power On Self Test) has not completed.

    Presence field

    The presence of the disk drive, and whether it can be detected in the server drive bay, regardless of its operational state. This can be the following:

    • Equipped—A disk drive can be detected in the server drive bay.
    • Missing—No disk drive can be detected in the server drive bay.

    Viewing the Status of a Disk Drive in the Cisco UCS Manager CLI

    Procedure
       Command or Action    Purpose
      Step 1 UCS-A# scope chassis chassis-num 

      Enters chassis mode for the specified chassis.

       
      Step 2 UCS-A /chassis # scope server server-num 

      Enters chassis server mode for the specified server.

       
      Step 3 UCS-A /chassis/server # scope raid-controller raid-contr-id {sas | sata} 

      Enters RAID controller server chassis mode.

       
      Step 4 UCS-A /chassis/server/raid-controller # show local-disk [local-disk-id | detail | expand] 

      Displays the following local disk statistics:

      Name Description

      Operability field

      The operational state of the disk drive. This can be the following:

      • Operable—The disk drive is operable.
      • Inoperable—The disk drive is inoperable, possibly due to a hardware issue such as bad blocks.
      • N/A—The operability of the disk drive cannot be determined. This could be because the server or firmware does not support disk drive monitoring, or because the server is powered off.

      Note: The Operability field may show an incorrect status for several reasons, such as if the disk is part of a broken RAID set or if the BIOS POST (Power On Self Test) has not completed.

      Presence field

      The presence of the disk drive, and whether it can be detected in the server drive bay, regardless of its operational state. This can be the following:

      • Equipped—A disk drive can be detected in the server drive bay.
      • Missing—No disk drive can be detected in the server drive bay.
       

      The following example shows the status of a disk drive:

      UCS-A# scope chassis 1
      UCS-A /chassis # scope server 6
      UCS-A /chassis/server # scope raid-controller 1 sas
      UCS-A /chassis/server/raid-controller # show local-disk 1
      
      Local Disk:
          ID: 1
          Block Size: 512
          Blocks: 60545024
          Size (MB): 29563
          Operability: Operable
          Presence: Equipped
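
If the CLI output is captured as text (for example, from a saved terminal session), the fields can be extracted with a few lines of Python. This is a minimal sketch that assumes the output keeps the `Key: Value` layout shown in the example above; `parse_local_disk` is a hypothetical helper, not a Cisco-provided tool.

```python
def parse_local_disk(output: str) -> dict:
    """Parse 'Key: Value' lines from captured 'show local-disk' output."""
    disk = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition(":")
        # Skip blank lines, CLI prompts (no colon), and the
        # "Local Disk:" header, which has no value after the colon.
        if sep and value.strip():
            disk[key.strip()] = value.strip()
    return disk


# Sample text copied from the example output above.
SAMPLE = """\
Local Disk:
    ID: 1
    Block Size: 512
    Blocks: 60545024
    Size (MB): 29563
    Operability: Operable
    Presence: Equipped
"""

if __name__ == "__main__":
    disk = parse_local_disk(SAMPLE)
    print(disk["Operability"], disk["Presence"])  # prints: Operable Equipped
```

All values are kept as strings. Convert numeric fields such as `Blocks` with `int()` only if you need to do arithmetic on them.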
      

      Interpreting the Status of a Monitored Disk Drive

      Cisco UCS Manager displays the following properties for each monitored disk drive:

      • Operability—The operational state of the disk drive.
      • Presence—The presence of the disk drive, and whether it can be detected in the server drive bay, regardless of its operational state.

      You need to look at both properties to determine the status of the monitored disk drive. The following table shows the likely interpretations of the property values.

      Operability Status: Operable
      Presence Status: Equipped
      Interpretation: No fault condition. The disk drive is in the server and can be used.

      Operability Status: Inoperable
      Presence Status: Equipped
      Interpretation: Fault condition. The disk drive is in the server, but one of the following could be causing an operability problem:

      • The disk drive is unusable due to a hardware issue such as bad blocks.
      • There is a problem with the IPMI link to the storage controller.

      Operability Status: N/A
      Presence Status: Missing
      Interpretation: Fault condition. The server drive bay does not contain a disk drive.

      Operability Status: N/A
      Presence Status: Equipped
      Interpretation: Fault condition. The disk drive is in the server, but one of the following could be causing an operability problem:

      • The server is powered off.
      • The storage controller firmware is the wrong version and does not support disk drive monitoring.
      • The server does not support disk drive monitoring.

      Note: The Operability field may show an incorrect status for several reasons, such as if the disk is part of a broken RAID set or if the BIOS POST (Power On Self Test) has not completed.
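
The interpretation table above amounts to a lookup keyed on the two property values. The following Python sketch transcribes it. The names `STATUS_TABLE` and `interpret_status` are hypothetical, and reporting unlisted combinations as unknown is an assumption of this sketch rather than documented Cisco UCS Manager behavior.

```python
from typing import Optional, Tuple

# (Operability, Presence) -> (is_fault, likely interpretation),
# transcribed from the interpretation table in this section.
STATUS_TABLE = {
    ("Operable", "Equipped"): (
        False,
        "The disk drive is in the server and can be used.",
    ),
    ("Inoperable", "Equipped"): (
        True,
        "Hardware issue such as bad blocks, or a problem with the "
        "IPMI link to the storage controller.",
    ),
    ("N/A", "Missing"): (
        True,
        "The server drive bay does not contain a disk drive.",
    ),
    ("N/A", "Equipped"): (
        True,
        "The server is powered off, or the server or storage controller "
        "firmware does not support disk drive monitoring.",
    ),
}


def interpret_status(operability: str, presence: str) -> Tuple[Optional[bool], str]:
    """Return (is_fault, interpretation) for a pair of property values."""
    key = (operability.strip(), presence.strip())
    # Combinations that the table does not list are reported as unknown.
    return STATUS_TABLE.get(key, (None, "Combination not listed in the table."))


if __name__ == "__main__":
    is_fault, meaning = interpret_status("N/A", "Missing")
    print(is_fault, meaning)
```

Because the Operability field can be incorrect while a RAID set is broken or before BIOS POST completes, treat the result as a starting point for investigation rather than a final diagnosis.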


      HDD Metrics Not Updated in Cisco UCS Manager GUI

      Problem—After hot-swapping, removing, or adding a hard drive, the updated hard disk drive (HDD) metrics do not appear in the Cisco UCS Manager GUI.

      Possible Cause—Cisco UCS Manager gathers HDD metrics only during a system boot. If a hard drive is added or removed after a system boot, the Cisco UCS Manager GUI does not update the HDD metrics.

      Procedure
      Reboot the server.

      Disk Drive Fault Detection Tests Fail

      Problem—The fault LED is illuminated or blinking on the server disk drive, but Cisco UCS Manager does not indicate a disk drive failure.

      Possible Cause—The disk drive fault detection tests failed due to one or more of the following conditions:

      • The disk drive did not fail, and a rebuild is in progress.
      • Drive predictive failure
      • Selected drive failure on Disk 2 of a B200, B230 or B250 blade
      • Selected drive failure on Disk 1 of a B200, B230 or B250 blade
      Procedure
        Step 1   Monitor the fault LEDs of each disk drive in the affected server(s).
        Step 2   If a fault LED on a server turns any color, such as amber, or blinks for no apparent reason, create a technical support file for each affected server and contact Cisco TAC.

        Cisco UCS Manager Reports More Disks in Server than Total Slots Available

        Problem—Cisco UCS Manager reports that a server has more disks than the total disk slots available in the server. For example, Cisco UCS Manager reports three disks for a server with two disk slots as follows:

        RAID Controller 1:
                   Local Disk 1:
                       Product Name: 73GB 6Gb SAS 15K RPM SFF HDD/hot plug/drive sled mounted
                       PID: A03-D073GC2
                       Serial: D3B0P99001R9
                       Presence: Equipped
                   Local Disk 2: 
                       Product Name:
                       Presence: Equipped
                       Size (MB): Unknown
                   Local Disk 5:
                       Product Name: 73GB 6Gb SAS 15K RPM SFF HDD/hot plug/drive sled mounted
                       Serial: D3B0P99001R9
                       HW Rev: 0
                       Size (MB): 70136
        

        Possible Cause—This problem is typically caused by a communication failure between Cisco UCS Manager and the server that reports the inaccurate information.

        Procedure
          Step 1   Upgrade the Cisco UCS domain to the latest release of Cisco UCS software and firmware.
          Step 2   Decommission the server.
          Step 3   Recommission the server.
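
To spot this symptom in captured inventory output, you can count the `Local Disk N:` entries and compare the total with the number of drive slots in the server. The Python sketch below assumes the text layout shown in the example above. The function names are hypothetical, and the slot count must be supplied by the caller because it does not appear in the output.

```python
import re
from typing import List

# Matches entry headers such as "Local Disk 5:" at the start of a line.
DISK_HEADER = re.compile(r"^\s*Local Disk (\d+):", re.MULTILINE)


def reported_disk_ids(output: str) -> List[int]:
    """Return the IDs of all 'Local Disk N:' entries in the captured output."""
    return [int(disk_id) for disk_id in DISK_HEADER.findall(output)]


def exceeds_slot_count(output: str, slot_count: int) -> bool:
    """True if more disks are reported than the server has drive slots."""
    return len(reported_disk_ids(output)) > slot_count


# Abbreviated copy of the example output from this section.
SAMPLE = """\
RAID Controller 1:
    Local Disk 1:
        Presence: Equipped
    Local Disk 2:
        Presence: Equipped
        Size (MB): Unknown
    Local Disk 5:
        Size (MB): 70136
"""

if __name__ == "__main__":
    print(reported_disk_ids(SAMPLE))      # prints: [1, 2, 5]
    print(exceeds_slot_count(SAMPLE, 2))  # prints: True
```

A `True` result only flags the mismatch described in the Problem statement. The corrective steps remain those in the procedure above.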