SAS Controllers and Flashy Lights

Posted on February 2, 2021

Overview

Imagine you have a nice fancy pile of disks in a server’s SAS tray. You don’t have a RAID controller and are using software RAID, perhaps because you want to let ZFS manage it. Whatever the reason, you have it and things are cool. Until the inevitable happens and a disk fails. How the heck are you supposed to know which disk goes where? /dev/sdae, for example, doesn’t tell you what slot! Fortunately, there’s a good chance you can blink the lights via sg_ses, a utility from sg3_utils that talks to the enclosure through the kernel’s SCSI generic (sg) driver using SCSI Enclosure Services (SES).

Software

You will need the “sg3_utils” package. On RHEL derivatives and SUSE it is named sg3_utils, as you might suspect; on Debian derivatives, look for sg3-utils. The lsscsi command used later usually ships in its own lsscsi package. Also, if your storage isn’t exposed through the SCSI generic driver or doesn’t support SCSI Enclosure Services, this won’t work for you.
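
Before going further, it can help to confirm the tools are actually on your PATH. This is a minimal sketch: missing_tools is a hypothetical helper name of my own (it is not part of sg3_utils), and it relies only on the shell's standard command -v.

```shell
# Print the name of every tool in the argument list that is NOT on the PATH.
# Prints nothing when everything is installed.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example: missing_tools sg_ses lsscsi
```

If it prints sg_ses or lsscsi, install the matching package for your distribution before continuing.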

The Process

Figure out which disk is in trouble

Presumably, if you’ve gotten to the point where you need to do all this, you know you have a disk failure and your software RAID and/or SMART is telling you which disk it is. In this document, we’ll assume the disk is /dev/sdae. If you’ve never seen a name like that before: when the kernel runs out of single letters, it appends another one (i.e., sdz is followed by sdaa).
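
To make the naming scheme concrete, here is a small sketch that maps a zero-based disk index to its sd name. The function name sd_name is hypothetical, and the arithmetic is my understanding of how the kernel counts (letters behave like spreadsheet columns), so treat it as an illustration rather than a reference.

```shell
# Convert a zero-based disk index into an sdX-style name.
# 0 -> sda, 25 -> sdz, 26 -> sdaa, and so on.
sd_name() (
  n=$1
  suffix=
  while [ "$n" -ge 0 ]; do
    # Pick the letter for the current (least significant) position.
    letter=$(printf 'abcdefghijklmnopqrstuvwxyz' | cut -c "$(( n % 26 + 1 ))")
    suffix="$letter$suffix"
    # Move to the next position; the -1 is what makes "aa" follow "z".
    n=$(( n / 26 - 1 ))
  done
  printf 'sd%s\n' "$suffix"
)
```

With this counting, sd_name 30 gives sdae, meaning our example disk is the 31st sd device the kernel enumerated.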

Get the SAS address of the disk

There may be other/better ways to get at this bit of information, but I don’t know about them. This works on my system. Do a long listing of /dev/disk/by-path and look for entries that are links to the disk in question. On our example system:

[root@redacted ~]# ls -l /dev/disk/by-path | egrep 'sdae$'
lrwxrwxrwx. 1 root root 10 Jan 25 17:24 pci-0000:03:00.0-sas-0x5000c50041f23b8a-lun-0 -> ../../sdae
lrwxrwxrwx. 1 root root 10 Jan 25 17:24 pci-0000:03:00.0-sas-exp0x500c04f2cc388cbf-phy18-lun-0 -> ../../sdae

“Hey, there’s more than one!” - yes, good eyes… Look at the hex code following “sas-” in each name. The one prefixed with “exp” is the “attached SAS address” and, I believe, is the SAS address of the enclosure (expander) the disk is installed in. You want the other one, which is 0x5000c50041f23b8a in this example. This is the SAS address of the disk itself.
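
If you would rather not pick the address out by eye, the selection can be sketched as a filter. The helper name disk_sas_address is hypothetical, and it assumes link names shaped like the ones in the listing above (…-sas-0x…-lun-N); other udev versions may name these links differently.

```shell
# Read by-path link names (or "ls -l" lines) on stdin and print the SAS
# address of the disk itself. Names using the "sas-exp0x..." form do not
# match the pattern, so the expander address is skipped automatically.
disk_sas_address() {
  sed -n 's/.*-sas-\(0x[0-9a-fA-F]*\)-lun.*/\1/p'
}

# Example: ls -l /dev/disk/by-path | grep 'sdae$' | disk_sas_address
```

On the example system this would print 0x5000c50041f23b8a.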

Look for enclosures

You want to identify /dev/sg* devices that are disk enclosures. Fortunately the lsscsi command can help:

[root@redacted ~]# lsscsi -g | grep -i encl
[0:0:24:0]  enclosu DELL  MD1220  1.06  -  /dev/sg24
[0:0:49:0]  enclosu DELL  MD1220  1.06  -  /dev/sg49

In this example, there are two: /dev/sg24 and /dev/sg49.
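
For scripting, the same listing can be reduced to just the sg device nodes. A minimal sketch, assuming the column layout shown above (type in the second column, sg device last); enclosure_devices is a hypothetical name.

```shell
# Read "lsscsi -g" output on stdin and print the /dev/sg* node of every
# entry whose type column is "enclosu".
enclosure_devices() {
  awk '$2 == "enclosu" { print $NF }'
}

# Example: lsscsi -g | enclosure_devices
```

Matching on the type column rather than grepping for "encl" avoids false hits on a vendor or model string that happens to contain those letters.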

Find the slot name for the SAS address

If you drop the grep from the previous command, you get the full listing, and you may find that the ordering of disks and enclosures on your systems is predictable. I am not assuming that is true, so we do a bit more digging to be sure.

For each enclosure found earlier, you’ll want to run sg_ses --join (short form: -j) against it, changing the device argument as necessary:

[root@redacted ~]# sg_ses -j /dev/sg49

This will print out an absolute ton of information, so you will need to pipe it through a pager or redirect it to a file for your perusal. Search the output for the disk’s SAS address from the previous step. Ours turned up under /dev/sg49 and not /dev/sg24:

Slot 6 [0,6]  Element type: Array device slot
  Enclosure Status:
    Predicted failure=0, Disabled=0, Swap=0, status: Unknown
    OK=0, Reserved device=0, Hot spare=0, Cons check=0
    In crit array=0, In failed array=0, Rebuild/remap=0, R/R abort=0
    App client bypass A=0, Do not remove=0, Enc bypass A=0, Enc bypass B=0
    Ready to insert=0, RMV=0, Ident=0, Report=0
    App client bypass B=0, Fault sensed=0, Fault reqstd=0, Device off=0
    Bypassed A=0, Bypassed B=0, Dev bypassed A=0, Dev bypassed B=0
  Additional Element Status:
    Transport protocol: SAS
    number of phys: 2, not all phys: 0, device slot number: 6
    phy index: 0
      device type: end device
      initiator port for:
      target port for: SSP
      attached SAS address: 0x500c04f2cc388c3f
      SAS address: 0x5000c50041f23b89
      phy identifier: 0x0
    phy index: 1
      device type: end device
      initiator port for:
      target port for: SSP
      attached SAS address: 0x500c04f2cc388cbf
      SAS address: 0x5000c50041f23b8a
      phy identifier: 0x1

This is a whole lot of useful information for sure, but we only care about two things:

  1. SAS address: 0x5000c50041f23b8a
    • this confirms we are looking at the slot our disk of concern is installed into.
  2. Slot 6 [0,6] Element type: Array device slot
    • the first part of this, “Slot 6”, is the “device name” we will be using next.
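
The digging above can be sketched as two small helpers. Both names (enclosures_with_address and slot_for_sas_address) are hypothetical, and the parsing assumes sg_ses -j output formatted like the excerpt above; other sg3_utils versions or enclosures may print it differently.

```shell
# Print each enclosure device whose "sg_ses -j" output lists the given
# SAS address. "attached SAS address" lines are deliberately not matched.
enclosures_with_address() (
  addr=$1
  shift
  for dev in "$@"; do
    if sg_ses -j "$dev" 2>/dev/null |
        grep -q "^[[:space:]]*SAS address: ${addr}[[:space:]]*\$"; then
      printf '%s\n' "$dev"
    fi
  done
)

# Read "sg_ses -j" output on stdin and print the descriptor name
# (e.g. "Slot 6") of the element that reports the given SAS address.
slot_for_sas_address() {
  awk -v addr="$1" '
    /Element type:/ {
      # Remember the most recent element header, minus the "[0,6] ..." tail.
      name = $0
      sub(/^[ \t]+/, "", name)
      sub(/[ \t]*\[[-0-9,]+\].*$/, "", name)
    }
    # Appending "" forces a string comparison of the hex values.
    $1 == "SAS" && $2 == "address:" && ($3 "") == (addr "") { print name; exit }
  '
}

# Example:
#   enclosures_with_address 0x5000c50041f23b8a /dev/sg24 /dev/sg49
#   sg_ses -j /dev/sg49 | slot_for_sas_address 0x5000c50041f23b8a
```

On the example system the first call would report /dev/sg49 and the second would print Slot 6.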

Now for the punchline: ask the enclosure nicely to please flash the LED so you know what this slot in the OS actually maps to in the real world. There are three associated commands: GET, SET, and CLEAR. As with the rest of this writeup, I’m using short arguments. In the examples below, NAME is the device name (i.e., “Slot 6” found earlier) while ENCLOSURE is the enclosure device (i.e., “/dev/sg49”).

  1. sg_ses -D "NAME" -G "ident" ENCLOSURE
    • query the locator LED status (0 = off, 1 = on)
  2. sg_ses -D "NAME" -S "ident" ENCLOSURE
    • turn the locator LED on (solid vs blink is dependent on hardware)
  3. sg_ses -D "NAME" -C "ident" ENCLOSURE
    • turn the locator LED off (normal LED operation)
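
If you do this often, the three commands fold nicely into one wrapper. This is only a convenience sketch: locator_led is a hypothetical name, and it does nothing beyond calling sg_ses with the same -D, -G, -S, and -C options listed above.

```shell
# Usage: locator_led status|on|off SLOT ENCLOSURE
# Maps a friendly action name to the matching sg_ses option and runs it.
locator_led() (
  action=$1
  slot=$2
  enclosure=$3
  case "$action" in
    status) flag=-G ;;  # query the ident bit (0 = off, 1 = on)
    on)     flag=-S ;;  # set the ident bit
    off)    flag=-C ;;  # clear the ident bit
    *)
      echo "usage: locator_led status|on|off SLOT ENCLOSURE" >&2
      exit 2
      ;;
  esac
  sg_ses -D "$slot" "$flag" ident "$enclosure"
)

# Example: locator_led on "Slot 6" /dev/sg49
```

Quoting the slot name matters here, since “Slot 6” contains a space.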

For example, to toggle the locator LED at 10-second intervals (which makes it easier to tell the locator apart from coincidental disk activity), you can use a simple loop like this:

while true; do
  sg_ses -D "Slot 6" -S "ident" /dev/sg49
  sleep 10
  sg_ses -D "Slot 6" -C "ident" /dev/sg49
  sleep 10
  # CTRL-C when done, then run the -C command once more in case the LED was left on
done

Disclaimer/Copyright

The information, views, and opinions published on this website were done so in the author's personal capacity. The information, views, and opinions expressed in this article are the author's own and do not reflect the view of their employer, or any other entity unless explicitly stated otherwise.

All data and information provided on this site is for informational purposes only. This website and its operators make no representations as to accuracy, completeness, currentness, suitability, or validity of any information on this site and will not be liable for any errors, omissions, or delays in this information or any losses, injuries, or damages arising from its display or use. All information is provided on an as-is basis.

All original content on this website is, unless explicitly stated otherwise, licensed under the MIT license. Full license text is available here. Non-original content that is included on this website in whole or in part, linked, or otherwise made available remains under copyright of the original owners.