10

After 3 years in 24x7 service, a 1 TB Seagate Barracuda ES.2 enterprise drive is showing signs of failure: the S.M.A.R.T. reallocated sector count is high.

The Wikipedia article suggests that the drive can still be used for less sensitive purposes, such as scratch storage outside of an array, provided the remapped sectors are left unused.

A workaround which preserves drive speed at the expense of capacity is to create a disk partition over the region which contains the remaps and instruct the operating system not to use that partition.
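The workaround above can be sketched with parted. This is a sketch only, assuming the remapped region is known to lie in the first 10 GiB; the device name /dev/sdX and all offsets are assumptions, and mklabel destroys the existing partition table.

```shell
#!/bin/sh
# Sketch: quarantine a known-bad region behind an unused partition.
# /dev/sdX and the 10 GiB boundary are assumptions -- adjust before use.
DEV=${DEV:-/dev/sdX}

# Boundary in 512-byte sectors, for reference.
QUARANTINE_END=$((10 * 1024 * 1024 * 1024 / 512))   # 10 GiB = 20971520 sectors
echo "quarantine partition ends at sector $QUARANTINE_END"

if [ -b "$DEV" ]; then
    parted -s "$DEV" mklabel gpt
    parted -s "$DEV" mkpart quarantine 1MiB 10GiB   # never mount this one
    parted -s "$DEV" mkpart scratch 10GiB 100%      # use only this one
else
    echo "skipping: $DEV is not a block device"
fi
```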

In order to create such a partition it is necessary to fetch the list of remapped sectors. However, no bad blocks are visible to the operating system; badblocks returns an empty list.

Is there a way to recover the list of reallocated sectors?

Edit: This drive is from an array. We get a few of them failing every year, and just throwing them away seems wasteful. I am thinking of giving the better parts of the platters a second chance.

Here is how the S.M.A.R.T. report looks now.

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda ES.2
Device Model:     ST31000340NS
Serial Number:    **********
Firmware Version: SN05
...
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   056   054   044    Pre-fail  Always       -       164293299
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       14
  5 Reallocated_Sector_Ct   0x0033   005   005   036    Pre-fail  Always   FAILING_NOW 1955
  7 Seek_Error_Rate         0x000f   076   060   030    Pre-fail  Always       -       8677183434
  9 Power_On_Hours          0x0032   072   072   000    Old_age   Always       -       24893
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   037   020    Old_age   Always       -       14
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   097   097   000    Old_age   Always       -       3
190 Airflow_Temperature_Cel 0x0022   050   043   045    Old_age   Always   In_the_past 50 (0 6 50 32)
194 Temperature_Celsius     0x0022   050   057   000    Old_age   Always       -       50 (0 18 0 0)
195 Hardware_ECC_Recovered  0x001a   021   010   000    Old_age   Always       -       164293299
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       21
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       21
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

8 Answers

20

You don't.

You go buy another disk to replace it unless you just really like losing data.

13

I'd like to thank you for the advice and share some of the details that I've got from experiments.

In short, there is no easy way to get the list of reallocated sectors, and even statistical methods of mapping the disk are heavily encumbered by the need to play against the logic of the firmware.

To test the drive I ran badblocks -wv with the default block size and monitored the reallocated sector count throughout. I made several observations.

  1. There was a sharp rise in the number of reallocated sectors when writing to the beginning of the disk; from roughly the first 10 GB up to 700 GB there was then no change. This can be explained by the fact that certain RAID housekeeping data was stored at the beginning of the disk, so the wear in the low-address area was higher than in the rest of the disk.

  2. Then, after a single error, the disk put itself into a blocked mode: every ATA command, even IDENTIFY DEVICE, returned ABRT, even though the reallocated sector count was still positive. To explain this behaviour, as David Schwartz suggested, I assume that the reserve sectors are somehow distributed over the address space of the drive. This means the drive may still have reserve sectors overall, yet one region of it may have run out of sectors to remap. In this situation the firmware simply blocks the drive.

  3. The drive comes out of the blocked mode only after a power cycle. Whereas old drives let software keep track of bad blocks and avoid using them, modern drives do not give you that opportunity: when the firmware decides it cannot cope with the errors, it makes the drive unusable.

  4. By running the normalized value of reallocated sectors down to 02, I conclude that there are 2048 reserve sectors on this drive.

  5. So-called low-level formatting, i.e. writing zeros to every accessible sector to force reallocation of sectors in the less reliable parts of the disk, does not work either: once the drive runs out of reserve sectors, it changes the way it handles errors and becomes far less usable than a traditional drive that does no predictive failure analysis and simply reports the error.
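The monitoring loop behind these observations can be sketched as follows. This is a sketch, assuming /dev/sdX; note that badblocks -wv destroys all data on the drive.

```shell
#!/bin/sh
# Sketch: run a destructive badblocks pass while sampling the raw
# Reallocated_Sector_Ct once a minute. /dev/sdX is an assumption.
DEV=${DEV:-/dev/sdX}
if [ -b "$DEV" ]; then
    badblocks -wv "$DEV" &
    BB_PID=$!
    while kill -0 "$BB_PID" 2>/dev/null; do
        # Field 2 of smartctl -A output is the attribute name,
        # the last field is the raw value.
        RAW=$(smartctl -A "$DEV" | awk '$2 == "Reallocated_Sector_Ct" { print $NF }')
        echo "$(date +%T) reallocated=$RAW"
        sleep 60
    done
else
    echo "skipping: $DEV is not a block device"
fi
```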

3

If you have business data that is worth less than the cost of the drive, then use the drives for that; if not, throw them away or give them to people in the department who understand the risks. You could also contact the manufacturer and see if they offer recycling.

user9517
  • 117,122
3

If the drive is still under warranty, you can return it to the manufacturer via their RMA process for a free replacement, after sanitizing it. (Secure Erase will wipe the entire drive, including reallocated or otherwise inaccessible sectors. I'm quite surprised nobody suggested this.) Otherwise, do what @SpacemanSpiff said and buy a new drive.
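On Linux, a Secure Erase can be issued with hdparm. A sketch, assuming /dev/sdX and a drive whose security state is "not frozen" (a suspend/resume cycle often unfreezes it); the temporary password is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: ATA Secure Erase via hdparm. /dev/sdX and the password are
# assumptions; this irreversibly wipes the drive.
DEV=${DEV:-/dev/sdX}
PASS=p   # temporary security password; cleared by the erase itself
if [ -b "$DEV" ]; then
    hdparm -I "$DEV" | grep -A8 Security      # confirm "not frozen" first
    hdparm --user-master u --security-set-pass "$PASS" "$DEV"
    hdparm --user-master u --security-erase "$PASS" "$DEV"
else
    echo "skipping: $DEV is not a block device"
fi
```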

Michael Hampton
  • 252,907
3

Actually, an Enhanced Secure Erase is better, as it covers the reserved blocks as well.

However: if there are really that many bad sectors, the disk is a paperweight. Ditto if it won't reallocate them or declare them OK. (Pending sectors occur when there's a read issue; most of them are "soft" errors, usually caused by external vibration.)

Stoat
  • 31
2

To find worn-out sectors on a disk, you need to scan the entire disk and note every sector with a slow read speed.

Here’s a step-by-step guide on how to do it in Windows:

  1. Use a GUI tool: it's easier to use a tool with a graphical interface, such as Victoria 5.37. It shows a grid view of slow blocks and a speed graph for the whole drive, and logs each slow sector it encounters.

  2. Prepare the drive: to prevent the drive from locking up, disable the read and write caches, downgrade the interface speed to UDMA-2, and scan the drive in small blocks, such as 256 or 64 sectors. Scanning smaller blocks takes more time but is less likely to lock the drive up.

  3. Mark Slow Sectors:

    For NTFS file systems, use a tool like NTFSmarkbad. Extract the list of slow sectors from the log files into a spreadsheet, then feed that list to NTFSmarkbad to mark the sectors as bad so the OS will not use them.

By following these steps, not only do you know the distribution of worn-out sectors, you can also tell the OS to avoid specific spots individually without partitioning them away.
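The log-to-sector-list conversion can be sketched with standard tools. The log format, partition offset, and cluster size below are all assumptions; check what your scanner actually writes, where your NTFS partition starts, and your actual cluster size.

```shell
#!/bin/sh
# Sketch: convert slow 512-byte LBAs (one per line; the format is an
# assumption) into NTFS cluster numbers, for a partition starting at
# sector 2048 with 4 KiB clusters:
#   cluster = (LBA - partition_start) / sectors_per_cluster
PART_START=2048        # partition start LBA (assumption)
SECT_PER_CLUSTER=8     # 4096-byte clusters / 512-byte sectors

printf '%s\n' 1050624 1050632 2099200 > slow_lbas.txt   # example data
awk -v s="$PART_START" -v c="$SECT_PER_CLUSTER" \
    '{ print int(($1 - s) / c) }' slow_lbas.txt > slow_clusters.txt
cat slow_clusters.txt
```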

2

I've had many drives like that. Low-level format them with the manufacturer's tools, after changing the start position if that's where most of the bad sectors are, and take 5-10% off the drive capacity. With a decent controller and software, the unallocated space will be used as spares. I ran a WD 1800 cut down to 160 GB for 5 years without trouble, until the controller was torched by a bad power supply. I am presently using a Samsung similarly for TV captures, with 100 GB removed from a 2 TB drive; a transport stream has more errors than a drive could ever hope to introduce, so it's not an issue for a while.

Hitachi, Samsung and WD low-level format tools seem to do a good job of remapping. I don't know about Seagate yet, as those drives have either gone into disuse or suffered immediate catastrophic failure.

*Doing these things is a lot easier now with the ultimate boot disk.
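On Linux, a similar capacity cut can be sketched with hdparm's Host Protected Area support. Note that, unlike the start-position trick above, an HPA clips sectors from the end of the drive; /dev/sdX, the total sector count, and the 10% figure are assumptions.

```shell
#!/bin/sh
# Sketch: hide the last 10% of a drive behind a Host Protected Area.
# /dev/sdX is an assumption; hdparm -N with the "p" prefix is permanent.
DEV=${DEV:-/dev/sdX}
TOTAL=3907029168                 # example: total sectors of a 2 TB drive
KEEP=$((TOTAL / 10 * 9))         # keep 90% of the capacity
echo "would keep $KEEP of $TOTAL sectors"
if [ -b "$DEV" ]; then
    hdparm -N "$DEV"             # show current/native max sectors first
    hdparm -N "p$KEEP" "$DEV"    # set the new max permanently
else
    echo "skipping: $DEV is not a block device"
fi
```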

Ty2010
  • 21
1

If you really want to risk your data on this disk (I wouldn't) then use dd to write the disk entirely to zeros.

dd if=/dev/zero of=/dev/sdX bs=1M status=progress

This will cause the drive to reallocate the pending sectors and the whole surface of the disk will be usable. For a while ;-)
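Whether the pending sectors were actually dealt with can then be checked from S.M.A.R.T. A sketch, again assuming /dev/sdX:

```shell
#!/bin/sh
# Sketch: zero-fill, then re-read the pending/reallocated counters.
# /dev/sdX is an assumption; dd wipes the drive completely.
DEV=${DEV:-/dev/sdX}
if [ -b "$DEV" ]; then
    dd if=/dev/zero of="$DEV" bs=1M status=progress
    sync
    # Field 2 is the attribute name, the last field is the raw value.
    smartctl -A "$DEV" |
        awk '$2 ~ /Current_Pending_Sector|Reallocated_Sector_Ct/ { print $2, $NF }'
else
    echo "skipping: $DEV is not a block device"
fi
```

If Current_Pending_Sector has not dropped to 0 after the pass, the drive could not resolve those sectors even with a fresh write.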