I have a Windows Server 2022 machine with two 2TB Samsung 990 Pro SSDs. I've had some weird problems with one of them disappearing from time to time. What happens is that every 2 months or so, the disk in question no longer exists: neither diskpart nor Get-PhysicalDisk (in PowerShell) lists it anymore. The only fix at that point is a complete power-down and restart; a simple restart from within the OS is not sufficient.
At first I thought it was an issue with the motherboard, so I got in touch with the manufacturer and -surprise!- they told me to make sure it wasn't a problem with the disk. After some back and forth, I decided to explore a potential issue with the disks, simply to avoid the hassle of replacing the mobo only to still have the problem.
Examining the disks was not so easy, because this is a Server Core installation, so there is no GUI. I was still able to do some analysis, which revealed a shocker: running Microsoft's diskspd showed completely abysmal performance for both disks. Both read and write are just below 50 MiB/s, which is far lower than the specs of the 990 Pro.
So I now have several questions:
- Are the two problems (the disk disappearing from time to time and the slow speed) linked?
- Could the speed problem be caused by the motherboard (it is an ASRock X570S PG Riptide)?
- Could it be that the SSDs are counterfeit? And how can I check this?
- Any suggestions on further analyzing this?
Clarification:
- Server logs: nothing shows up in event viewer
- Age of the drives: they're a year old and haven't been used intensively
- SMART readings: This is the output I got from Samsung DC Toolkit:
Disk Number: 1:c | Model Name: Samsung SSD 990 PRO with Heatsink 2TB | Firmware Version: 0B2QJXG7
| Bytes | Description | Value |
|---|---|---|
| 0 | Critical Warning | 0x00 |
| 2:1 | Composite Temperature | 0x0142 |
| 3 | Available Spare | 0x64 |
| 4 | Available Spare Threshold | 0x0A |
| 5 | Percentage Used | 0x02 |
| 47:32 | Data Units Read | 0x000000000000000000000000011BD521 |
| 63:48 | Data Units Written | 0x000000000000000000000000010D94FB |
| 79:64 | Host Read Commands | 0x0000000000000000000000000DD8604F |
| 95:80 | Host Write Commands | 0x0000000000000000000000001282EACA |
| 111:96 | Controller Busy Time | 0x00000000000000000000000000009963 |
| 127:112 | Power Cycle | 0x00000000000000000000000000000020 |
| 143:128 | Power On Hours | 0x00000000000000000000000000001F93 |
| 159:144 | Unsafe Shutdowns | 0x00000000000000000000000000000014 |
| 175:160 | Media and Data Integrity Errors | 0x00000000000000000000000000000000 |
| 191:176 | Number of Error Information Log Entries | 0x00000000000000000000000000000000 |
| 195:192 | Warning Composite Temperature Time | 0x00040880 |
| 199:196 | Critical Composite Temperature Time | 0x00000000 |
| 201:200 | Temperature Sensor 1 | 0x0142 |
| 203:202 | Temperature Sensor 2 | 0x0149 |
| 205:204 | Temperature Sensor 3 | 0x0000 |
| 207:206 | Temperature Sensor 4 | 0x0000 |
| 209:208 | Temperature Sensor 5 | 0x0000 |
| 211:210 | Temperature Sensor 6 | 0x0000 |
| 213:212 | Temperature Sensor 7 | 0x0000 |
| 215:214 | Temperature Sensor 8 | 0x0000 |
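The raw hex values above are hard to read at a glance, so here is a minimal Python sketch that decodes the non-zero fields. The units are an assumption on my part: I am taking them from the NVMe SMART / Health Information log definition (temperatures in kelvins, data units as 1000 x 512 bytes, busy time and warning temperature time in minutes), and I am assuming Samsung DC Toolkit prints those fields unmodified. The names `RAW`, `decode`, and the helper functions are just illustrative.

```python
# Raw hex values copied from the SMART table above (non-zero fields only).
RAW = {
    "composite_temperature": 0x0142,
    "temperature_sensor_1": 0x0142,
    "temperature_sensor_2": 0x0149,
    "available_spare": 0x64,
    "available_spare_threshold": 0x0A,
    "percentage_used": 0x02,
    "data_units_read": 0x011BD521,
    "data_units_written": 0x010D94FB,
    "controller_busy_time": 0x9963,
    "power_cycles": 0x20,
    "power_on_hours": 0x1F93,
    "unsafe_shutdowns": 0x14,
    "warning_composite_temperature_time": 0x00040880,
}

# ASSUMPTION: units follow the NVMe SMART / Health Information log page.
KELVIN_OFFSET = 273               # temperatures are reported in whole kelvins
BYTES_PER_DATA_UNIT = 1000 * 512  # one "data unit" = 1000 units of 512 bytes


def kelvin_to_celsius(kelvin: int) -> int:
    """Convert a raw temperature field to degrees Celsius."""
    return kelvin - KELVIN_OFFSET


def data_units_to_tb(units: int) -> float:
    """Convert a raw data-unit counter to decimal terabytes."""
    return units * BYTES_PER_DATA_UNIT / 1e12


def decode(raw: dict) -> dict:
    """Turn the raw integer fields into human-readable quantities."""
    return {
        "composite_temperature_c": kelvin_to_celsius(raw["composite_temperature"]),
        "temperature_sensor_1_c": kelvin_to_celsius(raw["temperature_sensor_1"]),
        "temperature_sensor_2_c": kelvin_to_celsius(raw["temperature_sensor_2"]),
        "available_spare_pct": raw["available_spare"],
        "available_spare_threshold_pct": raw["available_spare_threshold"],
        "percentage_used": raw["percentage_used"],
        "data_read_tb": data_units_to_tb(raw["data_units_read"]),
        "data_written_tb": data_units_to_tb(raw["data_units_written"]),
        "controller_busy_minutes": raw["controller_busy_time"],
        "power_cycles": raw["power_cycles"],
        "power_on_hours": raw["power_on_hours"],
        "unsafe_shutdowns": raw["unsafe_shutdowns"],
        "warning_temperature_minutes": raw["warning_composite_temperature_time"],
    }


for name, value in decode(RAW).items():
    print(f"{name}: {value}")
```

If the unit assumptions hold, decoding the values this way makes it easier to compare fields against each other, for example the warning temperature time against the power-on hours, or the unsafe shutdown count against the power cycle count.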