21

I have a Linux server with many 2 TB disks, all currently combined into one LVM volume group giving about 10 TB of space. I use all this space for an ext4 partition, and currently have about 8.8 TB of data.

The problem is that I often get errors on my disks, and even if I replace them as soon as errors appear (that is to say, I copy the old disk to a new one with dd, then put the new one in the server), I often end up with about 100 MB of corrupted data. That makes e2fsck go crazy every time, and it often takes a week to get the ext4 filesystem into a sane state again.

So the question is: what would you recommend I use as a filesystem on my LVM? Or what would you recommend I do instead (I don't really need the LVM)?

Profile of my filesystem:

  • many folders of different total sizes (some totalling 2 TB, some totalling 100 MB)
  • almost 200,000 files of different sizes (3/4 of them around 10 MB, 1/4 between 100 MB and 4 GB; I can't currently get more detailed statistics, as my ext4 partition has been completely wrecked for some days)
  • many reads but few writes
  • and I need fault tolerance (I stopped using mdadm RAID because it doesn't like having even ONE error on a whole disk, and I sometimes have failing disks, which I replace as soon as I can, but that means I can end up with corrupted data on my filesystem)

The major problem is failing disks; I can lose some files, but I can't afford to lose everything at the same time.

If I continue to use ext4, I have heard that I would be best off making several smaller filesystems and "merging" them somehow, but I don't know how.

I heard btrfs would be nice, but I can't find any clue as to how it handles losing part of a disk (or a whole disk) when data is NOT replicated (mkfs.btrfs -d single?).

Any advice on the question will be welcome; thanks in advance!

RichVel
  • 3,633

7 Answers

22

It's not a filesystem problem, it's the disks' physical limitations. Here's some data:

SATA drives are commonly specified with an unrecoverable read error (URE) rate of 1 per 10^14 bits read. That means that, on average, one sector per roughly 12 TB read will be unrecoverably lost even when the disks are working fine.

This means that with no RAID you will lose data even if no drive fails - RAID is your only option.

If you choose RAID5 (usable capacity n-1 disks, where n = the number of disks), it's still not enough. With a 10 TB RAID5 consisting of 6 x 2 TB HDDs, you have roughly a 20% chance of one drive failing per year, and once a single disk has failed, UREs mean you have only about a 50% chance of rebuilding the RAID5 successfully and recovering 100% of your data.
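To see where that ~50% figure comes from, here is a back-of-the-envelope check, assuming a URE rate of 1 per 10^14 bits read and that a rebuild has to read the 5 surviving 2 TB disks end to end:

    # Probability of reading 5 x 2 TB (the surviving disks) without hitting a single URE
    awk 'BEGIN {
        bits = 5 * 2e12 * 8                  # ~8e13 bits that must be read during the rebuild
        p_ok = exp(bits * log(1 - 1e-14))    # (1 - 1e-14)^bits
        printf "P(rebuild succeeds) ~ %.2f\n", p_ok
    }'
    # Prints roughly 0.45, i.e. about a coin flip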

Basically, with today's high-capacity disks and their relatively high URE rate, you need RAID6 to be safe against even a single disk failure.
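As a sketch (assuming six whole disks /dev/sdb through /dev/sdg are dedicated to the array; adjust the device names to your setup), a RAID6 array with mdadm could be created like this:

    # Create a 6-disk RAID6 array (two disks' worth of parity, ~8 TB usable with 2 TB disks)
    mdadm --create /dev/md0 --level=6 --raid-devices=6 \
        /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg

    # Put a filesystem on it and mount it
    mkfs.ext4 /dev/md0
    mount /dev/md0 /mnt/storage

    # Watch the initial sync / any later rebuild
    cat /proc/mdstat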

Read this: http://www.zdnet.com/blog/storage/why-raid-5-stops-working-in-2009/162

c2h5oh
  • 1,499
13

Do yourself a favor and use RAID for your disks; it could even be software RAID with mdadm. Also, think about why you "often get errors on your disks" - this is not normal, except when you use cheap desktop-class SATA drives instead of RAID-grade disks.
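As a starting point for that investigation (a sketch, assuming smartmontools is installed and the drive in question is /dev/sda; adjust the device name), the SMART data will usually tell you whether a drive is on its way out:

    # Show the drive's SMART health summary and error counters
    smartctl -H /dev/sda      # overall health assessment
    smartctl -A /dev/sda      # attributes: look at Reallocated_Sector_Ct, Current_Pending_Sector, etc.

    # Kick off a long self-test; check the result later with: smartctl -l selftest /dev/sda
    smartctl -t long /dev/sda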

After that, the filesystem is not that important anymore - ext4 and XFS are both fine choices.

Sven
  • 100,763
8

I've had good luck with ZFS; check to see if it's available on whatever distro you use. Fair warning: it'll probably mean rebuilding your whole system, but it gives really good performance and fault tolerance.
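For example (a sketch, assuming ZFS is installed and six spare 2 TB disks /dev/sdb through /dev/sdg; the pool name "tank" is just a placeholder), a double-parity pool would look like:

    # Create a RAID-Z2 pool (survives two simultaneous disk failures, like RAID6)
    zpool create tank raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg

    # Create a filesystem in the pool and check pool health
    zfs create tank/data
    zpool status tank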

TMN
  • 181
  • 2
8

"I add new disks of greater sizes progressively"

Since you are interested in using LVM, and you want to handle multiple drives, the simple answer would be to just use the mirroring feature that is part of LVM. Simply add all the physical volumes into your volume group. When you are creating a logical volume, pass the --mirrors option. This duplicates your data.
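Something like this (a sketch, assuming two fresh disks /dev/sdb and /dev/sdc and a volume group named vg0; adjust names and sizes to your setup):

    # Prepare the disks as physical volumes and put them in the volume group
    pvcreate /dev/sdb /dev/sdc
    vgcreate vg0 /dev/sdb /dev/sdc     # or: vgextend vg0 /dev/sdb /dev/sdc for an existing VG

    # Create a mirrored logical volume: one extra copy of every extent
    # (older LVM versions may also want --mirrorlog core or a third PV for the mirror log)
    lvcreate --mirrors 1 --size 1T --name data vg0
    mkfs.ext4 /dev/vg0/data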

Another option might be to just set up several RAID1 pairs, then add all the RAID1 volumes as PVs to your VG. Whenever you want to expand your storage, just buy another pair of disks.
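That could look roughly like this (a sketch, assuming a new pair of disks /dev/sdd and /dev/sde and an existing volume group vg0):

    # Build a RAID1 pair from the two new disks
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdd /dev/sde

    # Turn the mirror into an LVM physical volume and grow the volume group with it
    pvcreate /dev/md1
    vgextend vg0 /dev/md1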

Zoredache
  • 133,737
7

You should really be using RAID 5, 6, 10, 50, or 60. Here are some resources to get you started:

background info about RAIDs

howto's & setup

Check out my Delicious bookmarks for additional RAID resources: http://delicious.com/slmingol/raid

slm
  • 8,010
4

If you're really worried about data corruption, I would recommend a checksummed filesystem such as ZFS or btrfs -- though note that btrfs is still considered to be in development and not production-ready.

There is no guarantee that data read (even successfully read) from a disk will be correct. Disk blocks have checksums, but they're simple checksums that don't always catch errors. Newer filesystems like ZFS attach more capable checksums to the data they store and can (and reportedly do) catch and repair data errors not noticed by the hard disk or RAID controller.
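For instance (a sketch, assuming a ZFS pool named "tank" or a btrfs filesystem mounted at /mnt/storage; both names are placeholders), a periodic scrub makes the filesystem re-read everything and verify it against those checksums, repairing from a redundant copy where possible:

    # ZFS: verify (and, with redundancy, repair) every block in the pool
    zpool scrub tank
    zpool status tank             # shows scrub progress and any checksum errors found

    # btrfs equivalent
    btrfs scrub start /mnt/storage
    btrfs scrub status /mnt/storage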

tylerl
  • 15,245
1

As @c2h5oh says, the "unrecoverable" part is critical - it means the disk has already tried, and failed, to re-read the sector.

In my experience, once a disk starts producing unrecoverable read errors (UREs), some data is lost forever, and your only hope is to immediately back up all data using GNU ddrescue, which can retry failing sectors as well as skip unrecoverable ones.
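A typical two-pass rescue looks something like this (a sketch, assuming the failing disk is /dev/sdb and the replacement is /dev/sdc; the map file lets you stop and resume safely):

    # Pass 1: copy everything that reads easily, skipping the slow/bad areas
    ddrescue -f -n /dev/sdb /dev/sdc rescue.map

    # Pass 2: go back and retry the bad areas a few times
    ddrescue -f -d -r3 /dev/sdb /dev/sdc rescue.map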

Assuming you have backups, recent backup runs may well have failed due to the UREs, and will certainly contain some corrupt files, so you will have to piece together a full set of data from various backups of the same filesystem.

The other answers recommending ZFS are worth reading, as its continuous data scrubbing and RAID features will help keep your data safer in future - though still not a substitute for backups, which also protect against user and admin errors.

I would only use LVM if you don't need snapshots - it doesn't integrate so well with RAID, doesn't include data scrubbing / data checksums, and you still need backups, so something like ZFS is probably a better option. See this answer on LVM problems and risks for more.

RichVel
  • 3,633