I'm currently testing a VMware NSX network environment and have run into some trouble.
My environment is:
- Management Cluster with 3 Hosts and NSX components on 2 dedicated Hosts
- Compute Cluster with 2 Hosts
- Single 1Gbps Switch
- vSphere version 6.0 and NSX version 6.2
- One dedicated UTP line per host for Management and iSCSI (VLAN-tagged)
- One dedicated UTP line per host for the Transit Network (VM traffic)
- One dedicated UTP line per Management host for the External Network
When VM V on Host H sends data to VM W on Host I over the NSX network, heavy retransmission occurs. I tested the cases below.
Cases with the problem:
- V sends about 20 MB to W in a single session: retransmission at around 19 MB
- V sends about 50 MB to W in a single session: retransmission at 19 MB only
- V sends about 2 MB to W in 30 concurrent sessions: retransmission at random positions
In this condition, I found some packet-order mismatches (possibly the cause of the retransmissions) in the packet dump from H's vmnic (uplink). There, the delayed packets are unique (they did not appear earlier in the dump). But in the dumps from the vDS downlink to VM V, or from the sfw of V, they appear twice (the original packets and the retransmitted packets).
So I think the problem is packets being lost in the sender-side stack, specifically between VM V and Host H's physical NIC.
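The reasoning above (a segment that appears twice at the vNIC/filter but only once, and late, on the uplink was lost somewhere between those two capture points) can be checked mechanically. Below is a minimal Python sketch, assuming each capture has already been reduced to a list of (sequence number, payload length) pairs for the V→W direction of one session. The export step, the helper names `classify` and `lost_between`, and the sample data are all hypothetical, and sequence-number wraparound is ignored.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# One TCP segment as (sequence number, payload length), V -> W direction only.
# Hypothetical representation; exporting it from a pcap is not shown here.
Segment = Tuple[int, int]


def classify(capture: Sequence[Segment]) -> Tuple[List[Segment], List[Segment]]:
    """Return (retransmitted, out_of_order) segments for one capture point.

    retransmitted: the same (seq, len) was already seen earlier in this capture.
    out_of_order:  first sighting, but it starts below the highest byte already
                   seen, i.e. it arrives late to fill a gap.
    Sequence-number wraparound is ignored in this sketch.
    """
    seen = set()
    highest_end = 0
    retransmitted: List[Segment] = []
    out_of_order: List[Segment] = []
    for seg in capture:
        seq, length = seg
        if seg in seen:
            retransmitted.append(seg)
        elif seq < highest_end:
            out_of_order.append(seg)
        seen.add(seg)
        highest_end = max(highest_end, seq + length)
    return retransmitted, out_of_order


def lost_between(upstream: Sequence[Segment], downstream: Sequence[Segment]) -> Counter:
    """Copies seen upstream (vNIC/filter) that never showed up downstream (uplink)."""
    # Counter subtraction keeps only positive counts.
    return Counter(upstream) - Counter(downstream)


# Hypothetical data shaped like the symptom described: segment 2001 is sent,
# disappears inside the host, and is retransmitted.
vnic = [(1, 1000), (1001, 1000), (2001, 1000), (3001, 1000), (2001, 1000)]
uplink = [(1, 1000), (1001, 1000), (3001, 1000), (2001, 1000)]

print(classify(vnic))               # ([(2001, 1000)], [])
print(classify(uplink))             # ([], [(2001, 1000)])
print(lost_between(vnic, uplink))   # Counter({(2001, 1000): 1})
```

Counting copies rather than comparing sets matters here: the retransmitted copy does reach the uplink, so a plain set difference would report nothing missing.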
To split the data path/stack into two segments and check each independently, I ran the same cases with another destination VM, X, on the same Host H. The dump was clean, and there were no retransmissions between VMs on the same host (so I think there is no fault in the vDS itself or anything above it).
Next, I tested the cases below to check whether the problem is related to heavy data traffic, or to heavy filtering and/or encapsulation:
- Same test with Network I/O Control enabled: same problem
- Same test without Network I/O Control: same problem, with some differences
- Same test, but with throughput slowed down by a Network I/O Control limit: same problem
- Same test with TSO disabled on the vNIC of V (e1000 driver): same problem
- Same test with vDS MTU 9000: same problem, plus another question (about the MTU, below)
Some differences I noticed:
When Network I/O Control is enabled, RTT first increases just before the retransmission, and after the retransmission is completed, RTT values return to a stable range.
But when Network I/O Control is disabled, RTT increases again after the retransmission, just as it did at the start.
One more strange thing: although I set the MTU to 9000, the frames on the UTP line carrying the encapsulated VXLAN packets are all under 1600 bytes, so the MTU 9000 setting has no visible effect.
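One possible reading of the sub-1600 frames: the vDS MTU only sets a ceiling, while the frame size is decided by the sender, i.e. the guest OS, which still builds packets for its own MTU (1500 by default). The arithmetic below is a sketch assuming an IPv4 underlay and an untagged inner frame (an 802.1Q tag on either side would add 4 bytes); the function names are illustrative only.

```python
# VXLAN encapsulation overhead, assuming an IPv4 underlay and an untagged
# inner frame. These are assumptions, not details taken from the post.
INNER_ETHERNET = 14  # inner Ethernet header (an inner 802.1Q tag adds 4)
VXLAN_HEADER = 8
OUTER_UDP = 8
OUTER_IPV4 = 20
OUTER_ETHERNET = 14  # outer Ethernet header (a transport VLAN tag adds 4)

# Bytes added on top of the guest's IP packet, as counted against the MTU.
OVERHEAD = INNER_ETHERNET + VXLAN_HEADER + OUTER_UDP + OUTER_IPV4  # 50


def outer_ip_packet(guest_ip_packet: int) -> int:
    """Size of the outer IP packet, which is what the vDS/underlay MTU limits."""
    return guest_ip_packet + OVERHEAD


def outer_frame(guest_ip_packet: int) -> int:
    """On-wire Ethernet frame size (without FCS) as it appears in a capture."""
    return outer_ip_packet(guest_ip_packet) + OUTER_ETHERNET


def max_guest_mtu(underlay_mtu: int) -> int:
    """Largest guest IP packet that still fits inside the underlay MTU."""
    return underlay_mtu - OVERHEAD


# Guest still at a 1500 MTU: frames stay well under 1600 even with vDS MTU 9000.
print(outer_ip_packet(1500), outer_frame(1500))  # 1550 1564
print(max_guest_mtu(9000))                       # 8950
```

Under these assumptions, larger VXLAN frames would only appear if the guest MTU inside V and W were raised as well (up to about 8950 for a 9000-byte underlay).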
I'm stuck. Can I get some help? Thanks.
EDIT ---
If the VMs are on a normal vDS with NSX disabled, everything is fine.
EDIT: Are there any similar issues with Open vSwitch?