
I have a home server running FreeBSD and ZFS that has worked well for the last 5 years, and on several occasions I have successfully replaced faulty disks.

However, today a minor disaster has happened, and I'm hoping to find a solution.

I have a top-level pool that consists of 3 vdevs, each of which is a raidz1 vdev, so up to 3 disks can fail -- assuming they all belong to different vdevs -- and data integrity remains intact.
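
For reference, a pool with this layout would have been created along these lines (device names and vdev widths are illustrative, not my exact configuration):

    # Three raidz1 vdevs: each tolerates one failed disk, but two
    # failures inside the same vdev take down that vdev and the pool.
    zpool create tank \
        raidz1 gpt/ta1 gpt/ta2 gpt/ta3 gpt/ta4 \
        raidz1 gpt/tb1 gpt/tb2 gpt/tb3 gpt/tb4 \
        raidz1 gpt/tc1 gpt/tc2 gpt/tc3 gpt/tc4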

Yesterday, I noticed quite a few errors being reported by 1 disk in 1 vdev. From past experience, this usually indicates that the disk is about to fail, so I did what I usually do:

  1. Offline the disk: zpool offline tank gpt/ta4
  2. Physically replace the disk
  3. Set up the new disk with gpart, and then zpool replace tank gpt/ta4 (a sketch of this step follows below)
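
Roughly, step 3 looks like this (a sketch from memory; da4 is just whatever device name the new disk shows up as):

    # Assume the new disk appeared as da4.
    gpart create -s gpt da4
    # Reuse the old GPT label so the pool member keeps its gpt/ta4 path.
    gpart add -t freebsd-zfs -l ta4 da4
    # Resilver the vdev onto the new partition.
    zpool replace tank gpt/ta4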

However, this time disaster struck between steps 2 and 3: when I powered on the server after installing the new drive, I smelled something burning, and my HBA indicated that 4 of the drives weren't available! By a stroke of unbelievably bad luck, there must have been a voltage surge, because another drive in the same vdev (gpt/ta2) is now completely dead, and visual inspection reveals that one of the MOSFETs on its PCB is blown.

So now gpt/ta2 is UNAVAIL and gpt/ta4 is OFFLINE; with two members gone, the raidz1 vdev is of course also UNAVAIL.

My questions are: 1) Is there a way to bring gpt/ta4 back online? When I try to issue "zpool online tank gpt/ta4", it tells me that the pool is unavailable, so I can't do so. I can understand why that may be, but gpt/ta4, although experiencing some read errors, was basically still a 'good' member of the raidz1 vdev before I took it offline (zpool status reported no known data errors). Is there any way to achieve this?

2) Failing that, is there a way to at least bring the remainder of my top-level pool (which consists of 3 raidz1 vdevs) online? The other 2 vdevs are perfectly fine.

Please help, I have a lot of precious data on it :-)

Thanks in advance.

1 Answer


Not that it helps you at this juncture, but this is precisely why you'll never see me advising people to use raidz1 -- and, for mirror sets on huge disks, why I often suggest triple mirrors.

It is /unlikely in the extreme/ that any act you can take is going to get tank back online. I must start with that, so as not to raise your hopes.

1: Make sure the disks are safe - even if that means unplugging all of them.

2: Update to the latest version of FreeBSD - you want the latest ZFS bits you can get your hands on.

3: Put the original gpt/ta4 (which is supposedly 'OK' and just experiencing read errors) back in the system, or into a new system with newer ZFS bits (along with all the other disks, if you've removed them), boot it, and run the following in order until one works. Be forewarned: these are not safe, especially the last one, in that in their attempts to recover the pool they're likely to roll back transactions and thus lose recently written data:

  • zpool import -f tank
  • zpool import -fF tank
  • zpool import -fFX tank
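
Incidentally, before forcing anything, it's worth seeing how ZFS currently judges the pool -- a bare 'zpool import' only scans and reports; it changes nothing on disk:

    # Scan attached disks and report importable pools and the state of
    # each vdev, without actually importing anything.
    zpool import
    # If the members live under a non-default path, point the scan there:
    zpool import -d /dev/gpt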

If all 3 fail, you're outside the realm of "simple" recovery. Some Googling for 'importing bad pools', 'zdb', 'zpool import -F', 'zpool import -X', 'zpool import -T' (danger!), and the like, might turn up blogs and information on recovery attempts made by others. But at that point you're on very dangerous, potentially further-data-damaging ground, and rapidly entering the territory of paid recovery services (and not traditional data recovery companies -- they have zero expertise with ZFS and will be of no use to you).

Note: A more precise and 'safer' method would be 'zpool import -o readonly=on -f -T [txg_id] tank'. However, for this to work you'd first need to use zdb on your own to locate a seemingly healthy recent txg_id, and I'm not prepared to explain all of that here. Google will be your friend -- take no action until you've read enough to feel somewhat comfortable with what you're doing, and trust no single source.
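
In rough outline, that hunt looks like this (a sketch only -- the device path is one example member, and 1234567 is a placeholder for the txg you choose):

    # Dump the labels and uberblocks from one pool member; each
    # uberblock entry carries a txg number and a timestamp.
    zdb -lu /dev/gpt/ta1

    # Pick a recent txg from when the pool was presumably healthy, then
    # attempt a read-only import rolled back to that txg.
    zpool import -o readonly=on -f -T 1234567 tank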

Note 2: the 'safest' thing to do would be to immediately contact someone capable of ZFS recovery services.

Note 3: the next 'safest' thing to do would be to put the drives in a known-good system and dd each entire raw drive onto a new disk, giving you, in theory, identical copies of your disks. That means buying a like number of new disks -- preferably of similar or identical size/type to the old ones, though that's not strictly necessary -- and only then attempting any of the above on one set of drives while keeping the other set aside for safe-keeping.
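
On FreeBSD, that per-disk clone would look something like this (device names illustrative):

    # Copy old disk ada1 onto new disk ada9, sector for sector.
    # noerror: keep going past read errors; sync: pad unreadable blocks
    # with zeros so offsets on the copy line up with the original.
    dd if=/dev/ada1 of=/dev/ada9 bs=1m conv=noerror,sync

FreeBSD's recoverdisk(1) is also worth a look for drives with bad sectors -- it retries failed ranges at progressively smaller block sizes.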

Nex7