
I've set up a Linux software RAID level 5 consisting of 4 * 2 TB disks. The array was created with a 64k stripe size and no other configuration parameters. After the initial rebuild I tried to create a filesystem, and this step takes very long (about half an hour or more). I tried to create both an XFS and an ext3 filesystem; both took a long time. With mkfs.ext3 I observed the following behaviour, which might be helpful:

  • Writing the inode tables runs fast until it reaches 1053 (~1 second), then it writes about 50, waits two seconds, and then writes the next 50 (according to the console display).
  • When I try to cancel the operation with Ctrl+C, it hangs for half a minute before it is actually cancelled.

The performance of the individual disks is very good: I've run bonnie++ on each one separately with write / read values of around 95 / 110 MB/s. Even when I run bonnie++ on every drive in parallel, the values only drop by about 10 MB/s. So I'm excluding the hardware and I/O scheduling in general as the problem source.
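
A per-disk run looked roughly like this (the mount points and size are illustrative; -s should be about twice the machine's RAM so the page cache doesn't skew the result):

# hypothetical per-disk test: each drive mounted separately, e.g. /mnt/sdb .. /mnt/sde
bonnie++ -d /mnt/sdb -s 16384 -u nobody    # -d test directory, -s file size in MB, -u user to run as
# repeat for the other drives, or start the runs in parallel to test for contention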

I tried different values for stripe_cache_size and the read-ahead size without success, but I don't think they are that relevant for the filesystem creation operation.
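
Concretely, the tuning attempts were along these lines (the device name and values shown are just examples):

# number of stripe cache entries for the md array (default 256; memory used is
# roughly this many 4k pages per member disk)
echo 8192 > /sys/block/md0/md/stripe_cache_size
# read-ahead of the array, in 512-byte sectors
blockdev --setra 65536 /dev/md0
blockdev --getra /dev/md0    # verify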

The server details:

  • Linux server 2.6.35-27-generic #48-Ubuntu SMP x86_64 GNU/Linux
  • mdadm - v2.6.7.1

Does anyone have a suggestion on how to debug this further?

user9517

4 Answers


I suspect you're running into the typical RAID5 small-write problem. For writes smaller than a full stripe, the array has to read the old data and parity, modify them, and write them back (a read-modify-write) for every chunk touched. If the write covers a whole stripe, it can compute the new parity directly from the data being written and doesn't have to read anything back first.
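
You can confirm the geometry involved with something like this (assuming your array is /dev/md0):

mdadm --detail /dev/md0 | grep -i chunk    # shows the 64k chunk size
cat /proc/mdstat                           # also lists the chunk size per array
# full stripe = chunk size x data disks = 64k x 3 = 192k for 4 disks in RAID5;
# writes smaller than that (or not stripe-aligned) pay the read-modify-write penalty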

malcolmpdx

I agree that it may be related to stripe alignment. In my experience, creating an unaligned XFS filesystem on a 3 * 2 TB RAID-0 takes ~5 minutes, but if it is aligned to the stripe size it takes ~10-15 seconds. Here is a command for aligning XFS to a 256KB stripe size:

mkfs.xfs -l internal,lazy-count=1,sunit=512 -d agsize=64g,sunit=512,swidth=1536 -b size=4096 /dev/vg10/lv00

BTW, the stripe width in my case is 3 units, which will be the same for you with 4 drives in RAID-5, since one drive's worth of capacity goes to parity.

Obviously, this also improves filesystem performance afterwards, so you'd better keep it aligned.
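
For your 64k chunk and 4 drives in RAID-5, the simpler su/sw form should do the same thing (assuming your array is /dev/md0; recent mkfs.xfs versions can usually detect md geometry on their own):

# su = RAID chunk size, sw = number of data disks (4 drives - 1 parity = 3)
mkfs.xfs -d su=64k,sw=3 /dev/md0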

dtoubelis

Your mkfs and subsequent filesystem performance might improve if you specify the stride and stripe width when creating the filesystem. If you are using the default 4k blocks, your stride is 16 (RAID stripe of 64k divided by filesystem block of 4k) and your stripe width is 48 (filesystem stride of 16 multiplied by the 3 data disks in your array).

mkfs.ext3 -E stride=16,stripe-width=48 /dev/your_raid_device
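
Afterwards you can check that the values made it into the superblock (again using your RAID device, e.g. /dev/md0):

tune2fs -l /dev/md0 | grep -i raid
# should report something like "RAID stride: 16" and "RAID stripe width: 48"
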
sciurus

You should really look at the block group size (the -g option of mkfs.ext*). I know the man page says you can ignore it, but my experience very much shows that the man page is badly wrong on this. You should adjust the block group size so that your block groups don't all start on the same disk, but instead rotate evenly across all the disks; it makes a very obvious difference to performance. I wrote an article on how to optimise file system alignment which you may find useful.
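
As a rough illustration of the check for your array (64k chunk, 3 data disks, 4k filesystem blocks; the alternative -g value below is purely an example):

# stripe width in filesystem blocks: (64k chunk / 4k block) * 3 data disks = 48
# the default ext3 block group size is 32768 blocks
echo $(( 32768 % 48 ))    # prints 32 here; 0 would mean every group starts on the same disk
# if it came out 0, pick a group size (a multiple of 8) that is not a multiple of 48, e.g.:
# mkfs.ext3 -g 32760 -E stride=16,stripe-width=48 /dev/md0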