
I have several machines with ConnectX-7 InfiniBand cards plugged into an NVIDIA QM9700 switch. I've confirmed 400 Gbit/s NDR at both ends (ibstat on the hosts and the console on the switch). The machines are running Ubuntu 22.04 with the Mellanox 5.8-3.0.7.0 drivers. I've done a lot of testing with ib_write_bw and the most I can get is ~251 Gbit/s. The actual test commands are:

Server side (host_a):

numactl -N 0 -m 0 ib_write_bw -d mlx5_4 -F --report_gbits

Client side (host_b):

numactl -N 0 -m 0 ib_write_bw -d mlx5_4 -F --report_gbits --run_infinitely host_a

The cards are in the correct NUMA domains to match the numactl bindings (see the quick check after the output below), but I've tried other combinations with no luck. Output ends up looking something like this:

---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_4
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 128
 CQ Moderation   : 1
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x54 QPN 0x0058 PSN xxx RKey 0x1820e0 VAddr xxx
 remote address: LID 0x53 QPN 0x0058 PSN xxx RKey 0x1820e0 VAddr xxx
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 65536      2353827          0.00               246.81             0.470754
 65536      2339084          0.00               245.27             0.467815
 65536      2338736          0.00               245.23             0.467746
 65536      2338574          0.00               245.22             0.467713
 65536      2338610          0.00               245.22             0.467720
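
For what it's worth, the NUMA placement can be sanity-checked from sysfs. A quick check along these lines (assuming the standard InfiniBand sysfs layout and the mlx5_4 device name used above):

# NUMA node the HCA is attached to; should match the node given to numactl -N/-m
cat /sys/class/infiniband/mlx5_4/device/numa_node
# list each node's CPUs and memory to cross-check the binding
numactl --hardware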

I know this is probably a long shot, but I'm wondering if anyone has actually achieved 400 Gbit/s over InfiniBand with ib_write_bw and knows what we might have missed.


1 Answer


So the answer ended up being that we needed to set the firmware parameter MAX_ACC_OUT_READ to 128. Once that was set via mlxconfig -y -d mlx5_4 s MAX_ACC_OUT_READ=128 for each card, and the machines were power cycled, throughput jumped from ~250 Gbit/s to ~375 Gbit/s. Not 400, but I'll take it. To do this on every card:

# install the Mellanox OFED stack, which provides mst and mlxconfig
apt-get install mlnx-ofed-all
# start the Mellanox software tools service so the devices show up in mst status
mst start
# set MAX_ACC_OUT_READ=128 on every card (the 'net-ibp' match fits our interface naming)
for i in $(mst status -v | grep 'net-ibp' | awk '{print $3}') ; do mlxconfig -y -d $i s MAX_ACC_OUT_READ=128 ; done
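
To confirm the new value actually stuck after the power cycle, the same device loop can be reused in query mode instead of set mode; a sketch along these lines (the 'net-ibp' match is again specific to our interface naming):

mst start
# query the current firmware configuration and show only the parameter we changed
for i in $(mst status -v | grep 'net-ibp' | awk '{print $3}') ; do mlxconfig -d $i q | grep MAX_ACC_OUT_READ ; done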