
I have four NVMe drives in a RAID 0 configuration.

I am attempting to determine how many IOPS the array is handling.

When I run iostat, it appears that one drive is handling more IO than the other three drives.

Is this an error with the way that iostat collects data, a known issue with mdadm, or have I misconfigured the array?

Usage Details

# iostat

Device             tps    kB_read/s    kB_wrtn/s      kB_read      kB_wrtn
nvme0n1        1669.12     22706.35     13975.13  63422465065  39034761844
nvme3n1         753.28     13228.56     12185.39  36949483692  34035736524
nvme1n1         635.93     13781.47     14014.10  38493855272  39143630456
nvme2n1         744.35     14704.94     14283.13  41073264648  39895068820
md0            4291.15     72863.78     56468.04 203520212237 157724286024

Software RAID Device Details

# mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Fri Feb 19 22:45:06 2021
        Raid Level : raid0
        Array Size : 8001060864 (7630.41 GiB 8193.09 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent
       Update Time : Fri Feb 19 22:45:06 2021
             State : clean
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

        Chunk Size : 512K

Consistency Policy : none

              Name : eth1:0
              UUID : 2e672c70:de98a756:160877d2:d8fe2c94
            Events : 0

Number   Major   Minor   RaidDevice State
   0     259        1        0      active sync   /dev/nvme0n1p1
   1     259        5        1      active sync   /dev/nvme1n1p1
   2     259        7        2      active sync   /dev/nvme2n1p1
   3     259        3        3      active sync   /dev/nvme3n1p1

Block Devices

# lsblk

nvme0n1       259:0    0  1.9T  0 disk
└─nvme0n1p1   259:1    0  1.9T  0 part
  └─md0         9:0    0  7.5T  0 raid0 /mnt/raid0
nvme3n1       259:2    0  1.9T  0 disk
└─nvme3n1p1   259:3    0  1.9T  0 part
  └─md0         9:0    0  7.5T  0 raid0 /mnt/raid0
nvme1n1       259:4    0  1.9T  0 disk
└─nvme1n1p1   259:5    0  1.9T  0 part
  └─md0         9:0    0  7.5T  0 raid0 /mnt/raid0
nvme2n1       259:6    0  1.9T  0 disk
└─nvme2n1p1   259:7    0  1.9T  0 part
  └─md0         9:0    0  7.5T  0 raid0 /mnt/raid0

File System Details

# dumpe2fs -h /dev/md0
dumpe2fs 1.44.5 (15-Dec-2018)
Filesystem volume name:   QuadSSD
Last mounted on:          /mnt/raid0
Filesystem UUID:          8b33fb9d-1f98-44ff-a012-38ac10ffece3
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              250036224
Block count:              2000265216
Reserved block count:     100013260
Free blocks:              1759673576
Free inodes:              249676044
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Reserved GDT blocks:      1024
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         4096
Inode blocks per group:   256
RAID stride:              128
RAID stripe width:        512
Flex block group size:    16
Filesystem created:       Tue Mar  2 22:54:32 2021
Last mount time:          Sun Mar 14 15:55:16 2021
Last write time:          Sun Mar 14 15:55:16 2021
Mount count:              4
Maximum mount count:      -1
Last checked:             Tue Mar  2 22:54:32 2021
Check interval:           0 (<none>)
Lifetime writes:          14 TB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      f8a38f43-4d67-4137-972d-db2f8650ffad
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0x3f3be24d
Journal features:         journal_incompat_revoke journal_64bit journal_checksum_v3
Journal size:             1024M
Journal length:           262144
Journal sequence:         0x06a3502a
Journal start:            154915
Journal checksum type:    crc32c
Journal checksum:         0x963b1ac7

Notes:

  • I also see similar results when running iostat 10 (nvme0n1 consistently shows higher usage than the other drives); extended statistics can be collected as sketched after these notes.
  • The array/drives were never used as a root partition.
  • Some output has been abbreviated; for example, there are other block devices in the system.
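
If useful, per-device utilization can also be sampled with iostat's extended statistics (the 10-second interval below is arbitrary):

# iostat -x 10

The %util and average request size columns there make it easier to tell whether nvme0n1 is genuinely busier or is simply receiving larger or differently aligned requests.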

1 Answer


The apparently higher usage of the first device is probably an artifact of read alignment.

You have a RAID 0 of four drives with a 512K chunk size, so the full stripe is 2 MB and any read aligned to a 2 MB boundary starts on the first device. Both 2 MB and 4 MB are common alignment values for applications (e.g., LVM physical extents are 4 MB by default), so the first drive can appear more stressed than the others.
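
As a quick sanity check of that mapping (just a sketch, using the 512K chunk size and 4 members from your mdadm output; 524288 is 512K in bytes), any offset that is a multiple of the 2 MB stripe resolves to the first member:

# for off in 0 2097152 4194304 6291456; do
>     echo "offset $off -> member $(( (off / 524288) % 4 ))"
> done
offset 0 -> member 0
offset 2097152 -> member 0
offset 4194304 -> member 0
offset 6291456 -> member 0

Whether a given application actually issues reads at those boundaries depends, of course, on the workload.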

For a more in-depth (and more accurate) evaluation, you should observe your drives' behavior during a typical real-world workload (or a reasonable approximation of it done via fio).
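
For example (only a sketch; the file name, size and job parameters below are placeholders to adapt to your actual access pattern), a small random-read fio job watched with iostat will show whether the imbalance persists once reads are no longer stripe-aligned:

# fio --name=randread --filename=/mnt/raid0/fio-test.bin --size=4G \
      --rw=randread --bs=4k --iodepth=32 --ioengine=libaio \
      --direct=1 --runtime=60 --time_based --group_reporting

and, from a second terminal:

# iostat -x 10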

shodanshok