
I have an Ubuntu server connected to multiple VLAN networks over a single physical 1 Gbps network port. The network connections are configured via /etc/network/interfaces like this:

auto lo
iface lo inet loopback

auto eno1.395
iface eno1.395 inet dhcp
    vlan-raw-device eno1

auto eno1.453
iface eno1.453 inet static
    address 10.1.2.3
    netmask 255.255.255.0
    vlan-raw-device eno1

auto eno2
iface eno2 inet static
    address 192.168.1.2
    netmask 255.255.0.0

That is, IP addresses are assigned to eno1.395 (technically DHCP, but a static public IP in practice), eno1.453 (static IP) and eno2 (static IP). Interface eno1 itself doesn't have an IP address. Note that this is a pure server and will not route traffic between the networks, but it needs to communicate with other servers in multiple networks. This results in the Ubuntu server applying the following qdisc configuration by default:

$ tc qdisc list
qdisc noqueue 0: dev lo root refcnt 2 
qdisc mq 0: dev eno1 root 
qdisc fq_codel 0: dev eno1 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn 
qdisc mq 0: dev eno2 root 
qdisc fq_codel 0: dev eno2 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn 
qdisc noqueue 0: dev eno1.395 root refcnt 2 
qdisc noqueue 0: dev eno1.453 root refcnt 2 
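
For reference, I can watch whether fq_codel on eno1 is actually dropping or ECN-marking anything while the long transfer is running with the standard statistics output:

# show drops, ECN marks and backlog for the qdiscs on the physical device
tc -s qdisc show dev eno1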

However, it seems that a long file transfer over the eno1.395 network using the cubic congestion control algorithm causes high latency for the bursty traffic in the eno1.453 network. The bursty traffic needs to transfer 0.1–2 MB of data at random intervals with low latency; it requires maybe 50 Mbps on average at most, but short spikes might use 300 Mbps for 10–30 ms.

Is the fq_codel on the eno1 device able to balance the traffic between the VLAN networks eno1.395 and eno1.453? If not, is there some way to configure fq_codel so that it can balance traffic in both networks eno1.395 and eno1.453 at the same time (that is, drop packets in the single hog connection over eno1.395 to reduce the latency in eno1.453)?

I see that e.g. eno1.395 is currently running the noqueue qdisc, but if I add fq_codel at that level, I'm pretty sure it can only balance traffic within that single VLAN.
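
For concreteness, that would presumably look like this (just a sketch; as said, I expect it can only provide fairness within each VLAN separately):

# replace the default noqueue qdisc on the VLAN sub-interfaces only
tc qdisc replace dev eno1.395 root fq_codel
tc qdisc replace dev eno1.453 root fq_codel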

I know that I could configure static limits for the VLANs using tbf, but that would prevent using the whole network connection nearly all the time, and tbf introduces extra latency when traffic exceeds about 5 Mbps, if I've understood correctly. I would much prefer to use the full bandwidth and simply balance the traffic with fq_codel so that the single connection hogging all the bandwidth gets slowed down the most.

The RTT between all the problematic connections is about 0.25 ms. This allows cubic to very effectively take all the available bandwidth for the single long-running connection transferring a lot of data. I guess that using e.g. the vegas congestion control algorithm could be one way to work around this problem, but I'm wondering if this is just a misconfigured situation where fq_codel could work perfectly well if configured properly.
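
For reference, trying vegas would just be the standard congestion control switch, something like this (tcp_vegas is a stock kernel module, but usually not loaded by default):

# load vegas and make it the default congestion control for new connections
modprobe tcp_vegas
sysctl -w net.ipv4.tcp_congestion_control=vegas
# check what is available and what is currently active
sysctl net.ipv4.tcp_available_congestion_control net.ipv4.tcp_congestion_control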

Update 1:

The original wording used the expression "traffic shaping", but it turns out that this expression should be used only for soft-limiting traffic to some predefined throughput. This question is strictly about allowing use of the full bandwidth while avoiding latency for the bursty traffic. As such, I only want fair queuing with Active Queue Management (AQM).

I'm currently thinking that the problem is actually caused by the bursts, which may transmit data even faster than I expected, because when I change the congestion control algorithm to vegas the induced latency goes away and the hog is limited to about 550 Mbps.

I still haven't been able to figure out how to measure the actual bandwidth required by the bursty traffic. The problem is that the bursts are so short (typically well under 30 ms) and this is a production server, which limits the kind of experiments I can do. I'm pretty sure the bursty traffic happens on long-running TCP/IP connections that idle often, so they may have to go through slow start again before sending. I'm currently thinking that the idle period could be long enough for congestion control algorithms such as cubic (and even cdg) on the hog to take over the full bandwidth, and that I'm seeing the induced latency in burst throughput because the burst exits slow start too early when it collides with the hog traffic. I'm pretty sure this is not bufferbloat, but rather different TCP/IP streams getting a different balance than I'd like them to have.

Open questions:

  1. How can I measure the actual throughput of the bursty connections without having to poll e.g. /proc/net/netstat continuously? I have about 450–500 mostly idle TCP/IP connections with bursty traffic that I want to serve with minimal latency. The hog seems to limit the effective throughput these connections can get immediately after an idle period. I think I would need to keep the maximum expected spike reserved for these connections at all times to avoid latency during the period in which the different TCP/IP streams rebalance. (A rough sketch follows after this list.)
  2. How long can a TCP/IP connection stay idle before it must go through slow start again when it resumes sending? Is this adjustable? Is it affected by the hog traffic?
  3. Is it possible to disable tcp_slow_start_after_idle for given VLAN networks only? I think slow start after idle makes sense for connections going to the internet, but disabling it for the VLAN 453 connections (which I know are always local and whose network conditions are stable) would make a lot of sense. I see basically the same question was asked on LKML in 2011: https://lkml.iu.edu/hypermail/linux/kernel/1111.1/02240.html (The sketch after this list touches on this, too.)
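
A rough sketch of what I mean for questions 1 and 3, using only standard tools; the address range is the VLAN 453 subnet from the configuration above, and the sysctl is unfortunately global, which is exactly the limitation I'm asking about:

# per-connection view of the bursty flows (cwnd, rtt and, on newer
# kernels, delivery_rate) without reading /proc/net/netstat
ss -ti dst 10.1.2.0/24

# global knob only; I have not found a per-VLAN or per-route equivalent
sysctl -w net.ipv4.tcp_slow_start_after_idle=0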

Update 2:

I've been thinking about this a bit more, and I think I'm basically looking for the equivalent of Linux cgroup process scheduling for VLAN networking. Cgroups allow grouping processes and specifying that a collection of processes as a whole may take 50% of the CPU when the whole system is bottlenecked by the CPU, but can take up to 100% when the system has idle capacity remaining.

I'd want to allow either VLAN to take 100% of the physical connection if there's no competing traffic, but traffic in the two VLAN networks should get 50% each when both try to move bits at the same time and the link is full. And this redistribution of the physical connection should be instant, or nearly so!
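
As a concrete sketch of that behaviour, something like an HTB tree on the physical device with fq_codel leaves would seem to express it: each VLAN gets a 50% guarantee of an assumed 900 Mbit ceiling but may borrow up to the full ceiling when the other is idle. This is technically a shaper, so it gives up a little headroom below line rate, and the flower vlan_id matching may need adjusting depending on VLAN offload on the NIC:

# root HTB with a ~900 Mbit ceiling; unclassified traffic falls into 1:30
tc qdisc replace dev eno1 root handle 1: htb default 30
tc class add dev eno1 parent 1: classid 1:1 htb rate 900mbit ceil 900mbit
# each VLAN guaranteed half, allowed to borrow up to the full ceiling
tc class add dev eno1 parent 1:1 classid 1:10 htb rate 450mbit ceil 900mbit
tc class add dev eno1 parent 1:1 classid 1:20 htb rate 450mbit ceil 900mbit
tc class add dev eno1 parent 1:1 classid 1:30 htb rate 10mbit ceil 900mbit
# fq_codel inside each class for per-flow fairness and AQM
tc qdisc add dev eno1 parent 1:10 fq_codel
tc qdisc add dev eno1 parent 1:20 fq_codel
tc qdisc add dev eno1 parent 1:30 fq_codel
# classify by VLAN tag
tc filter add dev eno1 parent 1: protocol 802.1Q flower vlan_id 395 classid 1:10
tc filter add dev eno1 parent 1: protocol 802.1Q flower vlan_id 453 classid 1:20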

Considering that the actually available physical capacity can only be detected by monitoring RTT, ECN marks or packet loss, I'm not sure the connection can be shared correctly without some kind of delay while the fair share is measured. I think even with Linux cgroup balancing the minimum delay for rebalancing is 1/HZ seconds. I guess the logical equivalent for networks would be the RTT, or the RTT multiplied by some constant.

Answer:

  1. fq_codel is not a shaper, but a qdisc that does fair queuing and AQM. tbf and htb are shapers, as is cake in bandwidth mode.

  2. Up until this very moment I thought fq_codel peeled off the VLAN headers and would automatically balance the flows across VLANs (and I'm one of the authors!). I suspect the real culprit is your switch or some other bottleneck along this path that is starving the flows you care about.

Simplest suggestion, to try to shift the bottleneck onto your machine:

tc qdisc replace dev eno1 root cake bandwidth 900Mbit # if you have cake

or tbf + fq_codel, if cake is not available (a sketch of that alternative follows).
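
Something along these lines, with the rate a bit below line rate so the queue forms on this box rather than downstream (the burst/latency values are ballpark numbers to tune, and this assumes a kernel where tbf accepts an inner qdisc):

# shape slightly below 1 Gbit/s, then let fq_codel do flow fairness + AQM
tc qdisc replace dev eno1 root handle 1: tbf rate 900mbit burst 300kb latency 20ms
tc qdisc add dev eno1 parent 1:1 handle 10: fq_codel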

  3. Simplest diagnostic: hit your first VLAN with a long-running flow while simultaneously using mtr to measure at which hop your second VLAN is acting up, if you have any hops to measure. flent's test suite is helpful for this.
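
For example, assuming a netperf/flent server is reachable on the second VLAN (the hostname below is a placeholder):

# 60-second rrul test against a host on VLAN 453 while the hog runs on VLAN 395;
# requires netserver running on the target host
flent rrul -l 60 -H host-on-vlan-453 -t "eno1.453 under load"
# in parallel, watch per-hop latency (a single hop if the hosts are L2-adjacent)
mtr -n host-on-vlan-453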