
On my servers, which have HDDs or SSDs, a cron job periodically runs:

/usr/sbin/smartctl --test=short/long /dev/sd1

(once for each disk, with either short or long as the test type)

While the test runs, the script polls the output of /usr/sbin/smartctl -c /dev/sd1, looping until it no longer contains:

[0-9]+% of test remaining.

It then checks whether the test completed without errors:

(   0)  The previous self-test routine completed
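
For reference, the polling logic described above can be sketched roughly as follows. This is a minimal sketch rather than the actual cron script: the function names (selftest_running, selftest_ok, run_selftest) and the 60-second poll interval are made up for illustration, and the patterns assume the smartctl -c wording quoted above.

```shell
#!/bin/sh
# Sketch of the polling logic. The two helpers read `smartctl -c`
# output on stdin, so they can be tried out without a real disk.

# Succeeds while the output still reports a self-test in progress.
selftest_running() {
    grep -Eq '[0-9]+% of test remaining'
}

# Succeeds if the self-test execution status is 0 (completed without error).
selftest_ok() {
    grep -Eq '\( +0\)[[:space:]]+The previous self-test routine completed'
}

# Hypothetical wrapper: start a test, poll until it finishes, report the result.
# Usage: run_selftest /dev/sda short
run_selftest() {
    dev=$1
    kind=${2:-short}
    # smartctl's exit status is a bitmask, so it is not checked here.
    /usr/sbin/smartctl --test="$kind" "$dev" >/dev/null
    while /usr/sbin/smartctl -c "$dev" | selftest_running; do
        sleep 60   # illustrative poll interval
    done
    /usr/sbin/smartctl -c "$dev" | selftest_ok
}
```

run_selftest returns 0 only when the final status line shows code 0, so a cron wrapper can alert on any non-zero return.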

However, it appears that smartctl, as of version 7.0, doesn't yet support self-tests on NVMe drives, as per: https://www.smartmontools.org/wiki/NVMe_Support

It does say that

The smartd daemon tracks health (-H), error count (-l error) and temperature (-W DIFF,INFO,CRIT)

but what actually runs the tests? I'm not sure whether the output of -H and -l error is updated unless short/long tests are run.

I also read about nvme-cli, but I can't find a way to run health tests on disks with it.

Any ideas?

Using CentOS 7 here.

Nuno

2 Answers


SMART self-tests were conceived for mechanical disks. SATA SSDs almost completely mirror the earlier HDD interface-level behavior: they support such self-tests, but do not actually do very much when you run one. NVMe drives dropped such SMART self-test routines entirely.

For flash-based disks, one should really track cell wear, spare block count and reallocated sectors rather than relying on the old self-test routines, which are not supported on NVMe drives.
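
As a rough illustration of tracking those indicators on NVMe, here is a small sketch that parses the text output of nvme smart-log (from nvme-cli). The function name check_nvme_health and the 90% wear limit are my own illustrative choices, and the field names (critical_warning, available_spare, available_spare_threshold, percentage_used, media_errors) are assumed to match what your nvme-cli version prints; check against real output before relying on it.

```shell
#!/bin/sh
# Reads `nvme smart-log` text on stdin, prints one line per problem found,
# and returns non-zero if there was at least one. Field names are assumed.
check_nvme_health() {
    awk -F: '
        {
            key = $1; val = $2
            gsub(/[ \t]/, "", key)      # strip padding around the field name
            gsub(/[ \t%]/, "", val)     # strip padding and the percent sign
            v[key] = val
        }
        END {
            cw = v["critical_warning"]
            if (("critical_warning" in v) && cw != "0" && cw != "0x0") {
                print "critical_warning is set: " cw; bad = 1
            }
            if (("available_spare" in v) && ("available_spare_threshold" in v) &&
                v["available_spare"] + 0 <= v["available_spare_threshold"] + 0) {
                print "available_spare at or below threshold: " v["available_spare"] "%"; bad = 1
            }
            if (("percentage_used" in v) && v["percentage_used"] + 0 >= 90) {
                print "percentage_used is high: " v["percentage_used"] "%"; bad = 1
            }
            if (("media_errors" in v) && v["media_errors"] + 0 != 0) {
                print "media_errors is non-zero: " v["media_errors"]; bad = 1
            }
            exit bad + 0
        }'
}
```

It could be run from cron as, for example, sudo nvme smart-log /dev/nvme0n1 | check_nvme_health, alerting whenever it prints anything.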

shodanshok

Install the NVMe command-line client (nvme-cli):

sudo apt install nvme-cli

Find the drive you want to check and read its SMART log:

nvme list
sudo nvme smart-log /dev/nvme0n1

The same tool also has self-test commands. I believe these correspond to the old short/long tests that smartctl ran.

nvme device-self-test /dev/nvme0 -n 1 -s 1
nvme self-test-log /dev/nvme0n1
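
To mirror the smartctl polling loop from the question, something along these lines might work. It is only a sketch: the function names are made up, the 30-second interval is arbitrary, and it assumes that nvme self-test-log prints a "Current operation" line whose value is 0 when no test is running, which may differ between nvme-cli versions.

```shell
#!/bin/sh
# Reads `nvme self-test-log` text on stdin; succeeds while a test is running.
# Assumes a "Current operation : <value>" line, with 0 meaning idle.
nvme_selftest_running() {
    awk -F: '
        tolower($0) ~ /^[ \t]*current operation/ {
            v = $2
            gsub(/[ \t]/, "", v)
            if (v != "0" && v != "0x0") running = 1
        }
        END { exit running ? 0 : 1 }'
}

# Hypothetical wrapper: start a self-test, wait for it, then print the log.
# Usage: run_nvme_selftest /dev/nvme0 1 1   (controller, namespace ID, test code)
run_nvme_selftest() {
    ctrl=$1
    nsid=${2:-1}
    code=${3:-1}    # 1 = short test, 2 = extended test
    nvme device-self-test "$ctrl" -n "$nsid" -s "$code" || return 2
    while nvme self-test-log "$ctrl" | nvme_selftest_running; do
        sleep 30    # illustrative poll interval
    done
    nvme self-test-log "$ctrl"
}
```

The result of the finished test still has to be read from the newest entry in the printed log; I have not tried to parse that part here.
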
jamboNum5