62

Hot swapping out a failed SATA /dev/sda drive worked fine, but when I went to swap in a new drive, it wasn't recognized:

[root@fs-2 ~]# tail -18 /var/log/messages
May 5 16:54:35 fs-2 kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
May 5 16:54:35 fs-2 kernel: ata1: SError: { PHYRdyChg CommWake }
May 5 16:54:40 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0)
May 5 16:54:45 fs-2 kernel: ata1: device not ready (errno=-16), forcing hardreset
May 5 16:54:45 fs-2 kernel: ata1: soft resetting link
May 5 16:54:50 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0)
May 5 16:54:55 fs-2 kernel: ata1: SRST failed (errno=-16)
May 5 16:54:55 fs-2 kernel: ata1: soft resetting link
May 5 16:55:00 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0)
May 5 16:55:05 fs-2 kernel: ata1: SRST failed (errno=-16)
May 5 16:55:05 fs-2 kernel: ata1: soft resetting link
May 5 16:55:10 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0)
May 5 16:55:40 fs-2 kernel: ata1: SRST failed (errno=-16)
May 5 16:55:40 fs-2 kernel: ata1: limiting SATA link speed to 1.5 Gbps
May 5 16:55:40 fs-2 kernel: ata1: soft resetting link
May 5 16:55:45 fs-2 kernel: ata1: SRST failed (errno=-16)
May 5 16:55:45 fs-2 kernel: ata1: reset failed, giving up
May 5 16:55:45 fs-2 kernel: ata1: EH complete

I tried a couple of things to make the server find the new /dev/sda, such as rescan-scsi-bus.sh, but they didn't work:

[root@fs-2 ~]# echo "---" > /sys/class/scsi_host/host0/scan
-bash: echo: write error: Invalid argument
[root@fs-2 ~]#
[root@fs-2 ~]# /root/rescan-scsi-bus.sh -l
[snip]
0 new device(s) found.
0 device(s) removed.
[root@fs-2 ~]#
[root@fs-2 ~]# ls /dev/sda
ls: /dev/sda: No such file or directory

I ended up rebooting the server. /dev/sda was recognized, I fixed the software RAID, and everything is fine now. But for next time, how can I make Linux recognize a new SATA drive I have hot swapped in without rebooting?

The operating system in question is RHEL5.3:

[root@fs-2 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.3 (Tikanga)

The hard drive is a Seagate Barracuda ES.2 SATA 3.0-Gb/s 500-GB, model ST3500320NS.

Here is the lspci output:

[root@fs-2 ~]# lspci
00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3)
00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0e.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02)
04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)

Update: In perhaps a dozen cases, we've been forced to reboot servers because hot swap hasn't "just worked." Thanks for the answers suggesting I look more closely at the SATA controller. I've included the lspci output for the problematic system above (hostname: fs-2). I could still use some help understanding exactly what isn't supported hardware-wise in terms of hot swap on that system. Please let me know what other output besides lspci might be useful.

The good news is that hot swap "just worked" today on one of our servers (hostname: www-1), which is very rare for us. Here is the lspci output:

[root@www-1 ~]# lspci
00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3)
00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control
00:19.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control
03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02)
04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
09:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 04)
rzr
  • 269

12 Answers

63

If your SATA controller supports hot swap, it should "just work(tm)."

To force a rescan of a SCSI bus (each SATA port shows up as a SCSI bus) and find new drives, use:

echo "0 0 0" >/sys/class/scsi_host/host<n>/scan

In the command above, <n> is the bus number.

24
echo "- - -" >/sys/class/scsi_host/host<n>/scan
       ^ ^
        \_\_______ note spaces between the dashes.
20

In some circumstances, when a drive has failed, Linux won't realise you've physically pulled it from the array. If you have that problem (as I did this morning), you can do the following:

echo 1 > /sys/block/<devnode>/device/delete

e.g., in my case, /dev/sda had failed and I didn't want to reboot the server, so I did:

echo 1 > /sys/block/sda/device/delete

After I did that, the new drive (which had actually been physically added already) was immediately visible.

If it is not visible at this point, you can also do this to force a re-scan:

echo "- - -" > /sys/class/scsi_host/host<n>/scan

The "- - -" is a set of wildcards for channel, ID and LUN respectively, so you can restrict the scan to a subset by specifying numbers instead.
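If you don't know which host<n> the new drive's port belongs to, one option is to send the same wildcard scan to every host. A minimal sketch, assuming a POSIX shell and root privileges; `rescan_all_hosts` and the `SYSFS` variable are names made up here for illustration (`SYSFS` only exists so the loop can be pointed at a scratch directory for a dry run):

```shell
#!/bin/sh
# Hypothetical helper: write the "- - -" wildcard (channel, id, LUN)
# to the scan file of every SCSI host. SYSFS defaults to /sys; pointing
# it elsewhere is only meant for dry runs against a scratch directory.
rescan_all_hosts() {
    sysfs="${SYSFS:-/sys}"
    for scan in "$sysfs"/class/scsi_host/host*/scan; do
        [ -e "$scan" ] || continue          # glob matched nothing
        host=${scan%/scan}                  # strip the trailing /scan
        echo "scanning ${host##*/}"
        echo "- - -" > "$scan" || echo "scan failed on ${host##*/}" >&2
    done
}
```

Calling `rescan_all_hosts` as root touches every SCSI host, so it is blunter than scanning a single port. Hosts that reject the write are reported on stderr and the loop moves on.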

Before you start, you could also:

readlink /sys/block/<devnode>

This shows you the path containing the right host number, which you can then look for in /proc/scsi/scsi to confirm the device has disappeared after removal.
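The host number can also be pulled out of that path mechanically. A rough sketch rather than a tested tool: `scsi_host_of` and `SYSFS` are hypothetical names, and the function resolves the `device` link inside /sys/block/<devnode>, on the assumption that this link exists whether /sys/block/<devnode> itself is a symlink or (as on some older kernels) a plain directory:

```shell
#!/bin/sh
# Hypothetical helper: print the scsi_host name (e.g. "host0") that a
# block device such as "sda" hangs off. SYSFS defaults to /sys.
scsi_host_of() {
    sysfs="${SYSFS:-/sys}"
    # Resolve the device link to its full path under .../devices/...
    path=$(readlink -f "$sysfs/block/$1/device") || return 1
    # Keep only the hostN path component.
    printf '%s\n' "$path" | sed -n 's|.*/\(host[0-9][0-9]*\)/.*|\1|p'
}
```

For example, `scsi_host_of sda` run before pulling the drive gives you the host<n> to use in the scan command afterwards.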

karora
  • 111
  • 1
  • 4
18

I can't believe nobody mentioned AHCI yet... your SATA controller has to be in AHCI mode to enable hot swap. Check this by looking at the driver you are using:

root@peter:~ # find /sys -name sdk
/sys/devices/pci0000:00/0000:00:11.0/ata5/host4/target4:0:0/4:0:0:0/block/sdk
/sys/block/sdk
/sys/class/block/sdk

root@peter:~ # readlink /sys/devices/pci0000:00/0000:00:11.0/driver
../../../bus/pci/drivers/ahci

root@peter:~ # lspci -k | less
[... big long output... search for ahci or your pci address, or use the awk below ...]

root@peter:~ # lspci -k | awk '$1 == "00:11.0" {x=1}; x && /in use/ {print $0; exit}'
    Kernel driver in use: ahci

See how it says "ahci" there.

If it doesn't, then just enable it in your BIOS. Also, some BIOSes, especially on servers or UEFI systems, have a per-disk "Hot Swap = enabled/disabled" setting, which you should also enable if it exists.
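The same check can be done starting from a disk name instead of a PCI address. A sketch under assumptions: `pci_driver_of` and `SYSFS` are made-up names, and the function simply walks up from the disk's sysfs device directory until it finds a `driver` link that points into bus/pci/drivers (the SCSI-level `driver` link on the disk itself is skipped):

```shell
#!/bin/sh
# Hypothetical helper: print the PCI driver (e.g. "ahci") behind a disk.
# SYSFS defaults to /sys.
pci_driver_of() {
    sysfs="${SYSFS:-/sys}"
    dir=$(readlink -f "$sysfs/block/$1/device") || return 1
    # Climb towards the root, looking for a PCI-level driver link.
    while [ -n "$dir" ] && [ "$dir" != "/" ]; do
        if [ -L "$dir/driver" ]; then
            target=$(readlink "$dir/driver")
            case "$target" in
                */bus/pci/drivers/*) echo "${target##*/}"; return 0 ;;
            esac
        fi
        dir=$(dirname "$dir")
    done
    return 1
}
```

With the layout shown above, `pci_driver_of sdk` should print `ahci`. Any other name means the controller is bound to a different driver.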

Peter
  • 3,046
8

How about this (seems to work in Ubuntu):

sudo partprobe

4

In some cases hot swap may need to be enabled in the BIOS of the motherboard, the SATA controller, or both. This depends entirely on the make and model of each, but if you have on-board SATA controllers that should support hot swap, it's worth combing through the motherboard BIOS. SATA cards may or may not have their own BIOS settings; many lower-end cards don't, but server-grade cards typically do.

If I recall correctly, I've needed to do this with a number of Gigabyte motherboards, and perhaps some other makes. I needed it for a hot-swap SATA tray to work; with the feature disabled, removing the drive didn't cause issues, but a new drive wouldn't register until reboot. With the setting enabled it worked as expected: drives placed in the tray were immediately spun up and available to the OS.

STW
  • 1,119
2

Here's why I needed to reboot the computer...

I just hot-swapped my /dev/sdc. I used scsiadd -r 3 0 0 to power the old disk off before pulling it out. After I installed the new disk, it didn't appear as /dev/sdc but as /dev/sdd. After a reboot, the disk would reappear as /dev/sdc again.

So it seems hot swap works OK; it may just be that the /dev/sd* name isn't the same anymore.

Could this be an answer to your problem?
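If the name really is all that changed, the persistent links under /dev/disk/by-id show which sdX a given drive currently has. A small sketch, assuming udev populates /dev/disk/by-id on your distribution; `list_disk_ids` and `DEVROOT` are hypothetical names:

```shell
#!/bin/sh
# Hypothetical helper: print "persistent-id -> kernel name" for every
# link under disk/by-id. DEVROOT defaults to /dev.
list_disk_ids() {
    devroot="${DEVROOT:-/dev}"
    for link in "$devroot"/disk/by-id/*; do
        [ -L "$link" ] || continue           # skip if the glob is empty
        target=$(readlink -f "$link")        # e.g. /dev/sdd
        echo "${link##*/} -> ${target##*/}"
    done
}
```

Running `list_disk_ids` before and after the swap makes it easy to spot a replacement drive that came up under a different letter.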

Peter
  • 21
1

For hotplug to work you must have the acpiphp module loaded.

[root@example ~]# modprobe acpiphp

Obviously, if you want this to work on boot, you will have to configure the module to be loaded at boot time. One way is to create or edit /etc/rc.modules (which is called by rc.sysinit) and add the line:

modprobe acpiphp

If you create this file, remember to chmod +x it, as it is executed directly.

Frankie
  • 449
nox
  • 19
1

The DVD drive on my Fedora 16 machine is connected to a SATA interface. It was locked up and would not open or close. Running partprobe as root got it working again. I reckon it will help on another machine where I have the occasional hot swap problem. Thanks!

1

The Fusion-MPT SAS controller you have is a low end RAID controller. If you're not using it for RAID, it may still be providing an unhelpful layer of obstruction/abstraction.

You may need to poke at the RAID controller with mpt-status or lsiutil to get it to actually scan the bus.

http://hwraid.le-vert.net/wiki/LSIFusionMPT has a nice amount of documentation, but I can't say I've verified it.

aij
  • 203
1

I know this question is old, but I had some success I did not see reported elsewhere. Had similar trouble on a Dell Precision 380 today. Eventually got it to work by doing some combination of the following:

echo "- - -" > /sys/class/scsi_host/host2/scan
echo 1 > /sys/class/scsi_device/2:0:0:0/device/reset
echo 1 > /sys/devices/pci0000:00/0000:00:1f.2/rescan
echo 1 > /sys/devices/pci0000:00/0000:00:1f.2/reset

WARNING: This may disrupt other ATA devices on the system as well. If you have mounted filesystems on those devices, that is likely to end badly. My situation did not care, but yours might.

Exactly which of the above commands are needed, and in what order, is unknown to me at this time. Some commands may need to be repeated. If I had to guess, I would say do in the order shown above, then another scsi_host scan again at the end. I did quite a few more in my explorations.

The first command (scsi_host scan) tells the SCSI midlayer to scan all buses for new/changed devices. The second command tries to reset the SCSI target (disk device). The last two are working with the driver for the AHCI controller itself.

I found the items in question mostly by detailed examination and bold experimentation.

You can match scsi_device nodes to device make and model with (using grep to print the file names in front of the contents):

grep . /sys/class/scsi_device/*/device/model

The first digit of the SCSI device ID should be the scsi_host number. You can then match scsi_host nodes to their device nodes with:

ls -l /sys/class/scsi_host
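The model lookup and the host number can be folded into one listing. A minimal sketch, assuming the sysfs layout described in this answer; `list_scsi_devices` and `SYSFS` are illustrative names, not an existing tool:

```shell
#!/bin/sh
# Hypothetical helper: for every scsi_device node print
# "hostN H:C:T:L model". SYSFS defaults to /sys.
list_scsi_devices() {
    sysfs="${SYSFS:-/sys}"
    for dev in "$sysfs"/class/scsi_device/*; do
        [ -r "$dev/device/model" ] || continue
        hctl=${dev##*/}                              # e.g. 2:0:0:0
        model=$(sed 's/ *$//' "$dev/device/model")   # trim padding
        echo "host${hctl%%:*} $hctl $model"          # first field = host
    done
}
```

The host name in the first column is what goes into the /sys/class/scsi_host/host<n>/scan path.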

I suspect I will never get a chance to refine further, so I wanted to share this info in the hopes of getting others closer. If I do get more info, I will edit this answer to reflect.

Hope this helps.

Ben Scott
  • 380
0

Just picked up a new system with hot-swap. I had drives in two of the three slots, but neither was found using any technique I could find. The solution ended up being: enter setup -> Advanced -> SATA Configuration -> SATA Enable = Enabled. Previously this had been set to Auto.