For about 18 months I have been running a cluster of Rockchip RK3588-based systems with dual Realtek 8125 ports. The systems have run Ubuntu 22.04, 24.04 and 24.10 with kernels 5.10, 6.1, 6.8, 6.9, 6.11 and now 6.12, some with the mainline r8169 driver, some with the Realtek r8125 driver and some with my rewrite of the Realtek r8125 driver. Some systems are connected to a business-grade 10Gb switch (HPE), some to a consumer-grade 2.5Gb switch (TPlink TL-SG105-M2) and some are directly connected using short cables. All cables are quality CAT8 (they achieve the full 10Gb on 10Gb NICs).
In general, both drivers have been performing very well when measured on the 10Gb switch or direct connect:
- Transmit performance (iperf3, 12 minute stress test): 2.47 Gb/s (MTU 9000), 2.35 Gb/s (MTU 1500) sustained
- Receive performance (iperf3, 12 minute stress test): 2.47 Gb/s (all MTUs)
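As a sanity check on those transmit numbers, here is a rough back-of-the-envelope calculation of the maximum TCP goodput on a 2.5Gb/s link. It is only a sketch under assumptions that are not from the measurements above: standard Ethernet framing without a VLAN tag, IPv4, and TCP with the 12-byte timestamps option (the usual Linux default). The function name is made up for the example.

```python
# Per-frame overhead on the wire in bytes: preamble+SFD (8), Ethernet
# header (14), FCS (4) and inter-frame gap (12). Assumes no VLAN tag.
WIRE_OVERHEAD = 8 + 14 + 4 + 12

IPV4_HEADER = 20
TCP_HEADER = 20
TCP_TIMESTAMPS = 12  # assumption: TCP timestamps option enabled


def tcp_goodput_gbps(mtu, line_rate_gbps=2.5):
    """Theoretical TCP payload rate for full-size frames at the given MTU."""
    payload = mtu - IPV4_HEADER - TCP_HEADER - TCP_TIMESTAMPS
    on_wire = mtu + WIRE_OVERHEAD
    return line_rate_gbps * payload / on_wire


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {tcp_goodput_gbps(mtu):.3f} Gb/s")
```

Under these assumptions the ceiling is roughly 2.35 Gb/s at MTU 1500 and a little over 2.47 Gb/s at MTU 9000, so the transmit figures above look like they are essentially at line rate.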
With the TPlink switch, receive performance degrades over time due to transmit collisions (measured with the iperf3 -R option), to about 1.50 Gb/s after ~3 months. A power cycle is required to restore 2+ Gb/s performance, but that requires pulling the plug (there is no reset button), which for quite a few owners has caused the switch to die!
Functionality-wise, the Realtek 8125 driver offers MSI/MSI-X messaging support, 4 RSS queues and 2 transmit queues, as well as PTP support (although I was never able to get that working correctly) over and above the mainline r8169 driver.
Stability-wise, there have been a few bugs, but under "regular" heavy load (e.g. TCP, docker, kubernetes, NFS, Samba, cluster FS etc.) it has been rock solid:
- on both drivers: wrong number of fragments in corner cases (fixed on r8169 since April 2024)
- on r8169: 8125 chip bug with UDP packets (fixed on r8169, definitely since 6.8)
- on r8125: checksum issues on heavy UDP load (not fixed yet)
- on r8125: kernel hang in certain PTP tests
Some systems have been running 3 months 24/7 without a single failure.
I ended up rewriting the Realtek 8125 driver and fixing some issues: the 8-core RK3588 stresses the 6 TX/RX queues and the ARM64 cache lines, and some memory barriers were missing or not placed correctly. Most importantly, the rewrite reduces system CPU consumption, improves throughput by 20% when all 4 RSS and 2 TX queues are 100% loaded @2.5Gb, and cuts code size by 50%. I am currently trying to fix PTP; I am making progress but it is not fully working yet (trial and error, as Realtek does not provide HW documentation).
Edit: the TPlink switch itself is not the only thing to take into account.
The parameters of iperf3 can also significantly change the results. For example, iperf3 -u -c <ip> yields an abysmal 1Mb/s UDP result (iperf3's default UDP target bitrate is 1 Mbit/s). iperf3 -u -c <ip> -b 0, which allows for unlimited bandwidth, yields a more respectable 1.62Gb/s, but iperf3 -u -c <ip> -b 10GB strangely enough performs at the full 2.5Gb/s (also on the TPlink switch).
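If you want to compare these variants repeatedly, something like the following rough Python sketch could help. It is not what I used for the numbers above. It assumes iperf3 is installed, that an iperf3 server is running on the other side, and that the JSON report (-J) has the bits_per_second value under end.sum or end.sum_received, which is how I understand the output but which you should verify against your iperf3 version. The function names and the 192.0.2.10 address are made up for the example.

```python
import json
import subprocess


def build_iperf3_cmd(server, udp=False, bitrate=None, reverse=False, seconds=10):
    """Build an iperf3 client command line that produces a JSON report."""
    cmd = ["iperf3", "-c", server, "-t", str(seconds), "-J"]
    if udp:
        cmd.append("-u")
    if bitrate is not None:
        cmd += ["-b", str(bitrate)]  # "0" means no bitrate limit
    if reverse:
        cmd.append("-R")  # server sends, client receives
    return cmd


def gbits_per_second(report):
    """Pull the end-of-test rate out of a parsed iperf3 JSON report.

    Assumption: the rate sits under end.sum_received (preferred) or
    end.sum, depending on protocol and iperf3 version.
    """
    end = report["end"]
    for key in ("sum_received", "sum"):
        if key in end:
            return end[key]["bits_per_second"] / 1e9
    raise KeyError("no sum_received or sum section in iperf3 report")


def run_iperf3(server, **options):
    """Run one test against a live iperf3 server and return Gb/s."""
    result = subprocess.run(
        build_iperf3_cmd(server, **options),
        capture_output=True, text=True, check=True,
    )
    return gbits_per_second(json.loads(result.stdout))


# Example (hypothetical address, needs a running iperf3 server):
#   for rate in (None, "0", "10G"):
#       print(rate, run_iperf3("192.0.2.10", udp=True, bitrate=rate))
```

Adding reverse=True gives the equivalent of the -R runs, which is the direction where the TPlink degradation shows up.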