I have a 3 TB WD drive that started giving me problems: I was unable to access any file on it, and even an ls on it hung indefinitely.

Here are some details on the disk:

Disk /dev/sda: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Disk model: EZRX-00D8PB0
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt

Device     Start        End    Sectors  Size Type
/dev/sda1   2048 5860532223 5860530176  2.7T Linux filesystem
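As a quick sanity check on the geometry reported above, the figures can be cross-checked with a few lines of arithmetic. This is only a sketch: the variable names are made up for illustration, and the numbers are copied from the fdisk output.

```python
# Values copied from the fdisk output above; names are illustrative only.
TOTAL_SECTORS = 5860533168
LOGICAL_SECTOR = 512      # bytes
PHYSICAL_SECTOR = 4096    # bytes
PART_START = 2048         # first sector of /dev/sda1
PART_END = 5860532223     # last sector of /dev/sda1

# Disk size in bytes and in TiB (2**40 bytes).
total_bytes = TOTAL_SECTORS * LOGICAL_SECTOR
total_tib = total_bytes / 2**40

# Number of sectors in the partition (both ends inclusive).
part_sectors = PART_END - PART_START + 1

# The partition is aligned if its byte offset is a multiple of the physical sector size.
aligned = (PART_START * LOGICAL_SECTOR) % PHYSICAL_SECTOR == 0

print(total_bytes, round(total_tib, 1), part_sectors, aligned)
```

The results agree with what fdisk prints, so the partition table itself looks consistent and the partition starts on a 4096-byte boundary.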

After this problem I ran some tests to find out what is wrong with the drive. As a first step I ran a short self-test with sudo smartctl -t short /dev/sda, which reported the following error:

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     17480         8467144

Then I collected the attributes with sudo smartctl -a /dev/sda, as described in this other post, Understanding smartctl -a output. Below are the attribute table and the five most recent entries of the error log:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       71
  3 Spin_Up_Time            0x0027   174   161   021    Pre-fail  Always       -       6266
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       695
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   077   077   000    Old_age   Always       -       17481
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       457
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       63
193 Load_Cycle_Count        0x0032   179   179   000    Old_age   Always       -       64193
194 Temperature_Celsius     0x0022   122   101   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   197   000    Old_age   Always       -       356
198 Offline_Uncorrectable   0x0030   197   197   000    Old_age   Offline      -       1691
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   196   196   000    Old_age   Offline      -       1691
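To make the table easier to screen, here is a minimal sketch that parses a few of the rows and flags the ones that look suspicious. The watch list (IDs 5, 196, 197, 198) and the rule "raw value above zero" are assumptions chosen for illustration, not a vendor-defined criterion, and the function names are hypothetical.

```python
# Assumed watch list: attributes whose non-zero raw value commonly hints at media problems.
WATCHED_IDS = {5, 196, 197, 198}

# Rows copied verbatim from the attribute table above.
TABLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   197   000    Old_age   Always       -       356
198 Offline_Uncorrectable   0x0030   197   197   000    Old_age   Offline      -       1691
"""

def parse_attributes(text):
    """Return {id: (name, value, thresh, raw)} for each well-formed attribute row."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip headers and anything that is not an attribute row
        rows[int(fields[0])] = (fields[1], int(fields[3]), int(fields[5]), int(fields[9]))
    return rows

def flag_attributes(rows):
    """Flag rows at or below a non-zero threshold, or watched rows with a non-zero raw value."""
    flagged = []
    for attr_id, (name, value, thresh, raw) in sorted(rows.items()):
        below_threshold = thresh > 0 and value <= thresh
        suspicious_raw = attr_id in WATCHED_IDS and raw > 0
        if below_threshold or suspicious_raw:
            flagged.append((attr_id, name, raw))
    return flagged

flagged = flag_attributes(parse_attributes(TABLE))
for attr_id, name, raw in flagged:
    print(f"{attr_id:3d} {name}: raw value {raw}")
```

With the rows above, this flags Current_Pending_Sector (356) and Offline_Uncorrectable (1691), while no normalized value has dropped to its threshold yet.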

SMART Error Log Version: 1
ATA Error Count: 47 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 47 occurred at disk power-on lifetime: 232 hours (9 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  04 61 0a 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  e0 00 0a 00 00 00 00 00      04:00:17.522  STANDBY IMMEDIATE
  ef 03 46 00 00 00 a0 00      04:00:16.815  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      04:00:16.815  IDENTIFY DEVICE

Error 46 occurred at disk power-on lifetime: 232 hours (9 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  04 61 46 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  ef 03 46 00 00 00 a0 00      04:00:16.815  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      04:00:16.815  IDENTIFY DEVICE
  e1 00 0f 00 00 00 00 00      04:00:15.095  IDLE IMMEDIATE
  ef 03 46 00 00 00 a0 00      04:00:14.575  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      04:00:14.575  IDENTIFY DEVICE

Error 45 occurred at disk power-on lifetime: 232 hours (9 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  04 61 0f 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  e1 00 0f 00 00 00 00 00      04:00:15.095  IDLE IMMEDIATE
  ef 03 46 00 00 00 a0 00      04:00:14.575  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      04:00:14.575  IDENTIFY DEVICE

Error 44 occurred at disk power-on lifetime: 232 hours (9 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  04 61 46 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  ef 03 46 00 00 00 a0 00      04:00:14.575  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      04:00:14.575  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      04:00:12.170  SET FEATURES [Set transfer mode]

Error 43 occurred at disk power-on lifetime: 232 hours (9 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  04 61 46 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  ef 03 46 00 00 00 a0 00      04:00:12.170  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      04:00:12.170  IDENTIFY DEVICE
  e1 00 0f 00 00 00 00 00      04:00:10.445  IDLE IMMEDIATE
  ef 03 46 00 00 00 a0 00      04:00:09.925  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      04:00:09.925  IDENTIFY DEVICE

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     17480         8467144

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Then I inspected the LBA_of_first_error (8467144). Following part of this guide, I ran sudo sg_verify --lba=8467144 /dev/sda and got the following output, which seems to confirm a hardware failure:

verify(10):
Fixed format, current; Sense key: Medium Error
Additional sense: Id CRC or ECC error
VERIFY(10) medium or hardware error near lba=0x8132c8
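The hexadecimal address in the sg_verify output is the same sector reported by the self-test. The sketch below checks that and also shows where the sector falls inside /dev/sda1. The 4096-byte filesystem block size is an assumption (the post does not state the filesystem or its block size), and the function name is hypothetical.

```python
SECTOR_SIZE = 512        # logical sector size reported by fdisk
PARTITION_START = 2048   # first sector of /dev/sda1
FS_BLOCK_SIZE = 4096     # ASSUMPTION: filesystem block size, not stated in the post

def locate(lba):
    """Map a whole-disk LBA to a partition-relative sector and filesystem block."""
    offset_sectors = lba - PARTITION_START          # sectors from the start of the partition
    byte_offset = offset_sectors * SECTOR_SIZE      # bytes from the start of the partition
    fs_block, remainder = divmod(byte_offset, FS_BLOCK_SIZE)
    return hex(lba), offset_sectors, fs_block, remainder

hex_lba, offset_sectors, fs_block, remainder = locate(8467144)
print(hex_lba, offset_sectors, fs_block, remainder)
```

Under that block-size assumption, the failing sector is the first sector of filesystem block 1058137 in the partition, which is the number a filesystem-level tool would need in order to find out which file, if any, uses that block.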

As a final step I tried to reassign the block with sudo sg_reassign --address=8467144 /dev/sda, without success:

REASSIGN BLOCKS: Illegal request, Invalid opcode
sg_reassign failed: Illegal request, Invalid opcode

So, to summarize: did I miss any step in this investigation? Is my drive dead, or can it still be used? I cannot tell whether the SMART attribute list shows any serious errors. Can you help me understand whether the drive has further problems?

Timmy