
After a hard power cut and a UPS failure, my ZFS pool is in a state I cannot understand:

$ zpool status -c serial
  pool: storage
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 1 days 10:40:40 with 0 errors on Wed Apr  2 10:40:42 2025
config:
    NAME                                                  STATE     READ WRITE CKSUM                serial
    storage                                               DEGRADED     0     0     0
      raidz2-0                                            DEGRADED     0     0     0
        8844532865098720143                               FAULTED      0     0     0  was /dev/sda1  ZL2AJ3S10000C1111G0H
        scsi-35000c500cafcbb67                            ONLINE       0     0     0  ZL2AJ3S10000C1111G0H
        scsi-35000c500cafc9a63                            ONLINE       0     0     0  ZL2AKR3F0000C1128SV6
        scsi-35000c500cafcb303                            ONLINE       0     0     0  ZL2AKQVX0000C1143G62
        scsi-35000c500cafcff33                            ONLINE       0     0     0  ZL2AKAG10000C11445AW
        scsi-35000c500cafc392b                            ONLINE       0     0     0  ZL2AKCWB0000C1143ARJ
        wwn-0x5000c500cafa8287                            ONLINE       0     0     0  ZL2AHSSL0000C107BQWN
        scsi-35000c500cafbec03                            ONLINE       0     0     0  ZL2AGE6X0000C1122SME
        7647119559265938125                               FAULTED      0     0     0  was /dev/sdi1  ZL2AGE6X0000C1122SME
        scsi-35000c500cafca18b                            ONLINE       0     0     0  ZL2AKR0B0000C1128RNJ
        scsi-35000c500cafc29c3                            ONLINE       0     0     0  ZL2AGDN30000C1140NTP
        scsi-35000c500cafbe293                            ONLINE       0     0     0  ZL2AKDSM0000C11278YB
      raidz2-1                                            DEGRADED     0     0     0
        scsi-SSEAGATE_ST16000NM002G_ZL2AKBXB0000C1126C6X  ONLINE       0     0     0  ZL2AKBXB0000C1126C6X
        1470086598115969130                               UNAVAIL      0     0     0  was /dev/sdy1          20342A6158FC
        wwn-0x5000c500cae0af8b                            ONLINE       0     0     0  ZL29T97Q0000C107188W
        12722321230162544658                              FAULTED      0     0     0  was /dev/sdl1  ZL2AKDSM0000C11278YB
        scsi-35000c500cafc3be7                            ONLINE       0     0     0  ZL2AJJZF0000C1143AH2
        scsi-35000c500cafc611f                            ONLINE       0     0     0  ZL2AKC6Z0000C11438R8
        scsi-35000c500cafcfb97                            ONLINE       0     0     0  ZL2AHY5R0000C11441Z5
        scsi-35000c500cafc8663                            ONLINE       0     0     0  ZL2AKBNX0000C1128RLR
        scsi-35000c500cafc9fa3                            ONLINE       0     0     0  ZL2AKR0Y0000C1128RN1
        scsi-35000c500cafc96b3                            ONLINE       0     0     0  ZL2AKR6T0000C1128SQP
        scsi-35000c500cafc2f23                            ONLINE       0     0     0  ZL2AK1NP0000C1143FXB
        scsi-35000c500cafc4ccf                            ONLINE       0     0     0  ZL2AKCKG0000C1143ETM
    logs
      nvme-INTEL_SSDPED1K375GA_PHKS01530050375AGN         ONLINE       0     0     0    PHKS01530050375AGN
    cache
      sdu                                                 FAULTED      0     0     0  corrupted data  ZL2AKR6T0000C1128SQP
      sdw                                                 FAULTED      0     0     0  corrupted data  ZL2AK1NP0000C1143FXB

There is a lot I don't understand in the above status:

  1. All the FAULTED disks have the same serial number as one of the ONLINE disks
  2. The UNAVAIL disk's serial number belongs to one of the two SSDs that I use for cache, not to one of the HDDs in the pool
  3. The serials of the two FAULTED cache disks are those of two storage HDDs
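Each FAULTED entry, including the two cache devices, reports a serial that also appears on an ONLINE line. One plausible reading (an assumption, not something established in this thread) is that the -c serial column is looked up through each vdev's last-known path, such as /dev/sda1, and after the reboot that kernel name belongs to a different disk. A minimal Python sketch of that cross-check parses pasted zpool status -c serial output and lists every serial claimed by more than one vdev. The regexes, function names and the trimmed SAMPLE are illustrative and assume the column layout shown above.

```python
import re
from collections import defaultdict

# One vdev row of `zpool status -c serial`: name, state, READ/WRITE/CKSUM
# counters, then optional free text (e.g. "was /dev/sda1") and the serial.
VDEV_ROW = re.compile(
    r"^\s+(?P<name>\S+)\s+"
    r"(?P<state>ONLINE|DEGRADED|FAULTED|UNAVAIL|OFFLINE|REMOVED)"
    r"\s+\d+\s+\d+\s+\d+\s*(?P<rest>.*)$"
)
# Assumed serial format: upper-case letters/digits, at least 8 characters.
SERIAL = re.compile(r"[A-Z0-9]{8,}")


def serials_by_vdev(status_text):
    """Return {vdev name: (state, serial)} for rows that end in a serial."""
    result = {}
    for line in status_text.splitlines():
        m = VDEV_ROW.match(line)
        if not m:
            continue  # lines that are not vdev rows (header, "cache", ...)
        tokens = m.group("rest").split()
        # pool and raidz rows carry no serial column, so they are skipped
        if tokens and SERIAL.fullmatch(tokens[-1]):
            result[m.group("name")] = (m.group("state"), tokens[-1])
    return result


def duplicated_serials(status_text):
    """Return {serial: [(vdev, state), ...]} for serials shown more than once."""
    owners = defaultdict(list)
    for name, (state, serial) in serials_by_vdev(status_text).items():
        owners[serial].append((name, state))
    return {s: v for s, v in owners.items() if len(v) > 1}


# Trimmed excerpt of the output in the question.
SAMPLE = """\
    NAME                                STATE     READ WRITE CKSUM  serial
    storage                             DEGRADED     0     0     0
      raidz2-0                          DEGRADED     0     0     0
        8844532865098720143             FAULTED      0     0     0  was /dev/sda1  ZL2AJ3S10000C1111G0H
        scsi-35000c500cafcbb67          ONLINE       0     0     0  ZL2AJ3S10000C1111G0H
        scsi-35000c500cafc9a63          ONLINE       0     0     0  ZL2AKR3F0000C1128SV6
      raidz2-1                          DEGRADED     0     0     0
        1470086598115969130             UNAVAIL      0     0     0  was /dev/sdy1  20342A6158FC
        scsi-35000c500cafc96b3          ONLINE       0     0     0  ZL2AKR6T0000C1128SQP
    cache
      sdu                               FAULTED      0     0     0  corrupted data  ZL2AKR6T0000C1128SQP
"""

if __name__ == "__main__":
    for serial, owners in duplicated_serials(SAMPLE).items():
        print(serial, "is reported for", owners)
```

To check the live pool, pass the full text of zpool status -c serial instead of SAMPLE. The UNAVAIL row's serial is not duplicated, which would also fit this reading if /dev/sdy now happens to be one of the cache SSDs.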

What could have left the pool in this state, and is it recoverable? I thought about trying the procedure described here, but I cannot even tell which disks are faulted. Could the simple procedure described in that post work in my case? I really need help with this; thanks in advance.

ewwhite

1 Answer


You linked to one of my other posts. What operating system is this? To me, the most telling thing here is that the OS seems to have shuffled the device names around.

Please export your pool and re-import it using the stable /dev/disk/by-id names:

zpool export storage
zpool import -d /dev/disk/by-id storage
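Before exporting, the device-name theory can be checked by looking at which disk each old kernel name now points to; ls -l /dev/disk/by-id shows this by eye. A minimal Python sketch of the same lookup, assuming a Linux host where udev populates /dev/disk/by-id. by_id_map and OLD_NODES are hypothetical names, and the node list is just the last-known paths from the status output in the question.

```python
import os


def by_id_map(by_id_dir="/dev/disk/by-id"):
    """Map each kernel device name (e.g. 'sda', 'sda1') to the by-id
    symlinks that currently resolve to it."""
    mapping = {}
    for entry in sorted(os.listdir(by_id_dir)):
        # by-id entries are symlinks such as ../../sda1; resolve them
        target = os.path.realpath(os.path.join(by_id_dir, entry))
        mapping.setdefault(os.path.basename(target), []).append(entry)
    return mapping


# Hypothetical list: last-known paths of the FAULTED/UNAVAIL vdevs plus the
# two cache devices, taken from the zpool status output in the question.
OLD_NODES = ("sda1", "sdi1", "sdl1", "sdy1", "sdu", "sdw")

if __name__ == "__main__" and os.path.isdir("/dev/disk/by-id"):
    links = by_id_map()
    for node in OLD_NODES:
        print(node, "->", links.get(node, ["(no by-id link)"]))
```

If, for example, sda1 now resolves to a by-id name that zpool status lists as ONLINE elsewhere in the pool, that supports the reshuffle explanation rather than multiple failed disks.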
ewwhite