
I've set up a Linux software RAID level 5 consisting of 4 * 2 TB disks. The array was created with a 64k stripe size and no other configuration parameters. After the initial rebuild I tried to create a filesystem, and this step takes very long (about half an hour or more). I tried to create both an XFS and an ext3 filesystem; both took a long time. With mkfs.ext3 I observed the following behaviour, which might be helpful:

  • Writing the inode tables runs fast until it reaches 1053 (~1 second), then it writes about 50, waits two seconds, and then writes the next 50 (according to the console display).
  • When I try to cancel the operation with Ctrl+C, it hangs for half a minute before it is actually cancelled.

The performance of the individual disks is very good: I've run bonnie++ on each one separately with write / read values of around 95 / 110 MB/s. Even when I run bonnie++ on every drive in parallel, the values only drop by about 10 MB/s. So I'm excluding the hardware and I/O scheduling in general as the problem source.
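
A per-disk run looked roughly like this (the mount points and size are illustrative; -s should be about twice the machine's RAM so the page cache doesn't skew the result):

# hypothetical per-disk test: each drive mounted separately, e.g. /mnt/sdb .. /mnt/sde
bonnie++ -d /mnt/sdb -s 16384 -u nobody    # -d test directory, -s file size in MB, -u user to run as
# repeat for the other drives, or start the runs in parallel to test for contention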

I tried different values for stripe_cache_size and the read-ahead size without success, but I don't think they are that relevant for the filesystem creation operation.
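
Concretely, the tuning attempts were along these lines (the device name and values shown are just examples):

# number of stripe cache entries for the md array (default 256; memory used is
# roughly this many 4k pages per member disk)
echo 8192 > /sys/block/md0/md/stripe_cache_size
# read-ahead of the array, in 512-byte sectors
blockdev --setra 65536 /dev/md0
blockdev --getra /dev/md0    # verify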

The server details:

  • Linux server 2.6.35-27-generic #48-Ubuntu SMP x86_64 GNU/Linux
  • mdadm - v2.6.7.1

Does anyone have a suggestion on how to debug this further?

user9517

4 Answers


I suspect you're running into the typical RAID5 small-write problem. For writes smaller than a full stripe, the array has to read the old data and parity, modify them, and write them back (a read-modify-write) for every chunk touched. If the write covers a whole stripe, it can compute the new parity directly from the data being written and doesn't have to read anything back first.
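
You can confirm the geometry involved with something like this (assuming your array is /dev/md0):

mdadm --detail /dev/md0 | grep -i chunk    # shows the 64k chunk size
cat /proc/mdstat                           # also lists the chunk size per array
# full stripe = chunk size x data disks = 64k x 3 = 192k for 4 disks in RAID5;
# writes smaller than that (or not stripe-aligned) pay the read-modify-write penalty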

malcolmpdx

I agree that it may be related to stripe alignment. In my experience, creating an unaligned XFS filesystem on a 3 * 2 TB RAID-0 takes ~5 minutes, but if it is aligned to the stripe size it takes ~10-15 seconds. Here is a command for aligning XFS to a 256KB stripe size:

mkfs.xfs -l internal,lazy-count=1,sunit=512 -d agsize=64g,sunit=512,swidth=1536 -b size=4096 /dev/vg10/lv00

BTW, the stripe width in my case is 3 units, which will be the same for you with 4 drives in RAID-5, since one drive's worth of capacity goes to parity.

Obviously, this also improves filesystem performance afterwards, so you'd better keep it aligned.
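
For your 64k chunk and 4 drives in RAID-5, the simpler su/sw form should do the same thing (assuming your array is /dev/md0; recent mkfs.xfs versions can usually detect md geometry on their own):

# su = RAID chunk size, sw = number of data disks (4 drives - 1 parity = 3)
mkfs.xfs -d su=64k,sw=3 /dev/md0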

dtoubelis

Your mkfs and subsequent filesystem performance might improve if you specify the stride and stripe width when creating the filesystem. If you are using the default 4k blocks, your stride is 16 (RAID stripe of 64k divided by filesystem block of 4k) and your stripe width is 48 (filesystem stride of 16 multiplied by the 3 data disks in your array).

mkfs.ext3 -E stride=16,stripe-width=48 /dev/your_raid_device
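
Afterwards you can check that the values made it into the superblock (again using your RAID device, e.g. /dev/md0):

tune2fs -l /dev/md0 | grep -i raid
# should report something like "RAID stride: 16" and "RAID stripe width: 48"
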
sciurus

You should really look at the block group size (the -g option of mkfs.ext*). I know the man page says you can ignore it, but my experience very much shows that the man page is badly wrong on this. You should adjust the block group size so that your block groups don't all start on the same disk, but instead rotate evenly across all the disks; it makes a very obvious difference to performance. I wrote an article on how to optimise file system alignment which you may find useful.
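
As a rough illustration of the check for your array (64k chunk, 3 data disks, 4k filesystem blocks; the alternative -g value below is purely an example):

# stripe width in filesystem blocks: (64k chunk / 4k block) * 3 data disks = 48
# the default ext3 block group size is 32768 blocks
echo $(( 32768 % 48 ))    # prints 32 here; 0 would mean every group starts on the same disk
# if it came out 0, pick a group size (a multiple of 8) that is not a multiple of 48, e.g.:
# mkfs.ext3 -g 32760 -E stride=16,stripe-width=48 /dev/md0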