
On Debian 8.7 I had a ZFS pool (using ZFS on Linux, obviously, not Oracle or Solaris ZFS).

I needed to extend the pool from a mirror on 2 disks to a raidz on 4 disks. I made a backup (only one copy of the data; that was my first mistake).

I thought that zpool destroy would not work until I removed all datasets (volumes), so I ran zfs destroy on them (that was my second mistake).

After that I issued zpool destroy, repartitioned all 4 disks, and then found out that the backup was damaged.

So my recovery adventure began. The first good thing about ZFS is that it can import destroyed pools. After zpool destroy yourPoolName you can run zpool import -D to see the list of destroyed pools. You can then import one with zpool import -D yourPoolName, or, if you have destroyed several pools with the same name, by the pool id shown in the zpool import -D output.
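For reference, the sequence looked roughly like this (the pool name and the numeric id are placeholders):

# list destroyed pools that ZFS can still detect on the disks
zpool import -D
# import a destroyed pool by name (read-only is a good idea during recovery)
zpool import -D -o readonly=on yourPoolName
# or, if several destroyed pools share the same name, import by the id shown above
zpool import -D -o readonly=on 1234567890123456789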

zpool import -D requires the partitions to be in their original place, exact to the sector. I used fdisk to recreate the partitions with the exact original start and end sector numbers, and cfdisk to set the partition type (because it is more user friendly :) ). Then you should run partprobe to make sure the OS knows about the changed partition tables.
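A minimal sketch of that step, assuming /dev/sdb and made-up sector numbers (substitute the original values you recorded):

# recreate the partition with the exact original start sector and size
echo 'start=2048, size=3907027120' | sfdisk /dev/sdb
# set the partition type with cfdisk (or sfdisk --part-type), then
# re-read the partition table so the kernel sees the restored layout
partprobe /dev/sdb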

zpool import -D worked like a charm and I had my pool online and in perfect health again!.. but with the full consequences of zfs destroy: all the data was missing.

ZFS stores changes to files and to the file system as transactions, which are written to disk in transaction groups (TXGs). My further research showed that I had to roll back the last transaction groups.

There are 2 ways to roll back ZFS transaction groups:

  1. a special zpool import with the -T option
  2. the zfs_revert-0.1.py script

First of all you need to find the last good TXG; zpool history -il helped me here.
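Besides the pool history, the uberblocks in the vdev labels also carry TXG numbers, so zdb can help cross-check (the device name is a placeholder):

# show the pool history including internal events and their TXG numbers
zpool history -il
# dump the vdev labels and uberblocks (each uberblock lists its txg)
zdb -ul /dev/sdb1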

For the first way you run something like: zpool import -o readonly=on -D -f -T <LAST-GOOD-TXG> poolName (with additional options if you like: -F, -m, -R). Unfortunately this command only worked with the current TXG. Going back even to the previous TXG failed with error messages like "device is unavailable". It looks like this feature works (or worked) on Solaris only. Pity.
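Spelled out, my attempt looked roughly like this (the TXG number and pool name are placeholders):

# try to rewind the pool to an earlier transaction group, read-only and forced
zpool import -o readonly=on -D -f -T 1234567 yourPoolName
# optional extras: -F (discard the last few transactions), -m (ignore missing
# log devices), -R /mnt/recovery (alternate root so nothing auto-mounts over /)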

I analyzed the code of zfs_revert-0.1.py; it looks clear and promising. I used the tool, but it seems I had to delete too many TXGs, and afterwards zpool import -D was unable to detect the pool anymore.

Currently I have restored one of the older backups, and I have dd dumps of the 2 disks that were mirrored, taken after the zfs destroy and zpool destroy. It looks like we will just live with the data from the older backup and stop the recovery process. Nevertheless, I would be glad to try to recover the data if somebody can suggest what to do in such a situation.

Any further recovery would be done in VMware Workstation, so I will need to find a way to import the zpool inside a VM (the disk IDs will probably change).
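My rough plan for that, sketched under the assumption that the dd images are whole-disk dumps and the pool lived on the first partition of each disk (paths and device names are placeholders; zpool import -d tells ZFS where to search for devices instead of relying on the old disk IDs):

# expose the dd images as block devices, scanning their partition tables
losetup -fP --show /path/to/disk1.img
losetup -fP --show /path/to/disk2.img
# point zpool import at the loop partitions only
mkdir /tmp/zfs-devs
ln -s /dev/loop0p1 /dev/loop1p1 /tmp/zfs-devs/
zpool import -D -d /tmp/zfs-devs -o readonly=on yourPoolName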

Question: What can I try next?

Lessons learned:

  1. Always keep at least 2 copies of your data. When you are manipulating the main storage, you need a backup of the backup.
  2. zfs destroy is unnecessary, and very dangerous, if you are going to do zpool destroy anyway.

Comments: It's obvious that during recovery you should completely stop all writes to the disks where the damaged data was stored.
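One way to enforce that at the block-device level (device names are placeholders for the affected disks):

# mark the disks read-only in the kernel so nothing can write to them
blockdev --setro /dev/sdb
blockdev --setro /dev/sdc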

Useful commands:

zpool import -D
zpool import -o readonly=on -D -f originalPoolName newPoolName
zpool status tank

zpool online dozer c2t11d0
zpool scrub tank

zpool history -il

zpool export tank

zpool import dozer zeepool

Links:

  1. Tools
  2. Information about damaged ZFS
  3. ZFS Import
