1

I have two machines connected to each other by two ConnectX-7 adapters.

When I run the ib_write_bw test, the average bandwidth starts at 395 Gb/s, which is very good, but it quickly drops to less than 250 Gb/s.

pci status test log

2 Answers

0

The test log shows the CPU frequency decreasing with each line.

You could try setting a more aggressive CPU frequency governor (for example, performance) on both servers to prevent this.
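To check whether the governor is the culprit, you can read it straight from sysfs on both servers while the test runs. Below is a minimal sketch, assuming the standard Linux cpufreq sysfs layout under /sys/devices/system/cpu; the function names read_cpufreq and cpus_not_using are made up for illustration and are not part of any Mellanox tool.

```python
from pathlib import Path


def read_cpufreq(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: (governor, current_mhz)} for CPUs exposing cpufreq.

    Assumes the usual Linux layout: <root>/cpuN/cpufreq/scaling_governor
    and <root>/cpuN/cpufreq/scaling_cur_freq (value in kHz).
    """
    state = {}
    for cpufreq_dir in sorted(Path(sysfs_root).glob("cpu[0-9]*/cpufreq")):
        gov_file = cpufreq_dir / "scaling_governor"
        freq_file = cpufreq_dir / "scaling_cur_freq"
        if not gov_file.is_file():
            continue  # this CPU does not expose a governor
        governor = gov_file.read_text().strip()
        # Convert kHz to MHz; the file can be missing with some drivers.
        mhz = int(freq_file.read_text()) / 1000 if freq_file.is_file() else None
        state[cpufreq_dir.parent.name] = (governor, mhz)
    return state


def cpus_not_using(governor, state):
    """List the CPUs whose active governor differs from the wanted one."""
    return [cpu for cpu, (gov, _) in state.items() if gov != governor]
```

Calling read_cpufreq() a few times during the ib_write_bw run should show whether the frequency really falls, and cpus_not_using("performance", read_cpufreq()) lists the cores that are still on another governor. Changing the governor needs root, for instance with cpupower frequency-set -g performance if the cpupower utility is installed.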

You can use the mlnx_tune utility to generate a report that might help you find the problem (I guess it's installed with Mellanox OFED), for example: mlnx_tune --report

To improve performance on Mellanox hardware more generally, have a look at the Mellanox Support ToolKit, which provides a set of tools for understanding and addressing performance issues.

0

This sounds similar to an issue I was having. I asked and answered it here, if you want to try what worked for me: https://serverfault.com/a/1143987/110252

Evan