QoS or more bandwidth?

Question

Working in the SP field, I sometimes hear folks say "we do not need QoS, let's just make sure we have enough bandwidth". This has always bothered me, because we currently run a network without any QoS in the core and there are no problems with services (Internet, IPTV, VPN).

The core links (bundled 10Gs) are 50%-70% utilized at peak time.

Correct me if I'm wrong, but I think we are just lucky to carry traffic that is not bursty enough to fill up the links.

My understanding is that, because of the bursty nature of IP traffic, QoS should be implemented on core links regardless of their utilization or, at the latest, once utilization reaches 50%.

EDIT:

Thanks for your comments. Actually, my primary concern is microbursts. As I understand it, a microburst results in packet drops that occur even when there is no noticeable congestion on an interface. I also understand that microbursts happen in every packet network.

I was curious whether there is any "magic" link utilization percentage below which microbursts are harmless (so that we would be safe without QoS). I think I saw 50% as a best practice in some QoS book, but I don't remember the details.

score 4 · Accepted Answer · answered Apr 14 '15 at 07:33

There are at least two questions in your one question :)

"Throwing more bandwidth" at the problem doesn't solve anything; it just hides the problem. In an IP network you should always prepare a baseline QoS architecture to guarantee delivery of critical traffic and to manage other parameters (like jitter or delay) for different classes of traffic. Take a look at the NANOG presentation archives for the extensive material generated by this kind of discussion, with the usual outcome being "yeah, maybe we should do QoS".

Traditionally, QoS is perceived as a "hard thing to do right", as the PHB (per-hop behavior) model for IP networks is indeed quite complex to plan and implement correctly. Historically, it also came with specific hardware requirements, architectures and configuration complexity, which didn't help. But when you look at traditional SPs and their networks, they generally implement at least 3-4 classes of traffic and QoS policies to manage traffic flow within their networks. Over the last couple of years, traditional certification testing has been moving from testing 4 classes and queues to 8-16 for transport networks.

OTOH, not having any QoS in the network usually also means that those saying "QoS is not needed, I don't have it and everything works OK" have no actual means of monitoring how the network behaves and what environment the applications actually get. TCP adapts very well to network conditions, and sometimes problems are not visible to the "naked" eye but become painfully obvious when we dig into the details and bit flows.

As for the second part of your question: nothing can help you fight microbursts apart from having buffers deep enough to accommodate them. That leads almost immediately to things like bufferbloat and additional delay on the path if you simply throw large packet buffers at the problem, and memory fast enough to actually deal with microbursts is neither simple nor cheap. QoS, unfortunately (at least the mechanisms available in the usual toolset of networking gear), doesn't help 'control' microbursts. The good news, however, is that you'll usually find microbursts dangerous or damaging only in HPC and DC environments in general, not in typical transport networks.
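To make the buffer trade-off concrete, here is a toy model of a single egress port. It is only a sketch under stated assumptions: the tick length, the service rate, the buffer sizes and the function name `simulate_port` are all made up for illustration and do not describe any real hardware. Both traffic patterns offer the same 50% average load; only the burstiness differs.

```python
def simulate_port(arrivals_per_tick, service_per_tick, buffer_packets):
    """Toy model of one egress port with a finite FIFO buffer.

    Each tick, arriving packets join the backlog, the link transmits up to
    service_per_tick packets, and whatever does not fit in the buffer is
    dropped. Returns (dropped_packets, max_backlog).
    """
    backlog = 0
    dropped = 0
    max_backlog = 0
    for arriving in arrivals_per_tick:
        backlog = max(0, backlog + arriving - service_per_tick)
        if backlog > buffer_packets:
            dropped += backlog - buffer_packets  # tail drop
            backlog = buffer_packets
        max_backlog = max(max_backlog, backlog)
    return dropped, max_backlog


TICKS = 1000   # assumed: 1000 ticks, e.g. 1 ms each
SERVICE = 10   # assumed: packets the link can send per tick

# Both patterns offer 5000 packets in total, i.e. 50% average utilization.
smooth = [5] * TICKS
bursty = [500 if t % 100 == 0 else 0 for t in range(TICKS)]

for name, pattern in (("smooth", smooth), ("bursty", bursty)):
    for buffer_packets in (100, 490):
        dropped, max_backlog = simulate_port(pattern, SERVICE, buffer_packets)
        print(name, "buffer:", buffer_packets,
              "dropped:", dropped, "max backlog:", max_backlog)
```

In this model the smooth pattern never queues at all, while the bursty pattern loses packets with the 100-packet buffer even though the average load is identical. The 490-packet buffer absorbs every burst, but the last packet of each burst then waits 49 ticks behind the backlog, which is the extra delay mentioned above.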

matteo · Answer 2 · 2016-08-28T08:35:48.220

QoS takes action when there is congestion. So, yes, your teammates might be right when saying that a link used at 50 to 70 percent doesn't need QoS.

First, consider a theoretical link of 1 bit per second with a clock rate of 1 second (meaning there is 1 wire that transmits either 1 or 0 for 1 second, because the destination wouldn't be able to catch the value if the signal were shorter): as long as the traffic we need to send is 1 bit per second, we just put that bit on the wire. No QoS is needed.

But if we receive 2 bits per second from a faster link (a LAN, for instance), these 2 bits need to be forwarded onto the 1 bit/s link, so we need to either queue or drop 1 of the 2 bits we received while forwarding the other. Here QoS should be used to decide which bit must be forwarded first and what to do with the other one (basically, drop or queue).
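The forward-first and queue-or-drop decision can be sketched as a tiny two-class scheduler. This is a minimal illustration, not something from this answer: the class name `TwoClassPort`, the strict-priority rule and the tail-drop policy are assumptions chosen to keep the example short.

```python
from collections import deque


class TwoClassPort:
    """Slow egress port with one critical and one best-effort queue."""

    def __init__(self, queue_limit):
        self.critical = deque()
        self.best_effort = deque()
        self.queue_limit = queue_limit  # max units held per queue (assumed)
        self.dropped = []

    def enqueue(self, unit, critical):
        """Queue the unit in its class, or tail-drop it if that queue is full."""
        queue = self.critical if critical else self.best_effort
        if len(queue) >= self.queue_limit:
            self.dropped.append(unit)
            return False
        queue.append(unit)
        return True

    def dequeue(self):
        """Strict priority: critical traffic always leaves first."""
        if self.critical:
            return self.critical.popleft()
        if self.best_effort:
            return self.best_effort.popleft()
        return None


# Three units arrive while the slow link can only send one at a time.
port = TwoClassPort(queue_limit=1)
port.enqueue("bulk-1", critical=False)
port.enqueue("voice-1", critical=True)
port.enqueue("bulk-2", critical=False)  # best-effort queue is full: dropped
print(port.dequeue(), port.dequeue(), port.dropped)
```

The critical unit is sent first even though it arrived second, the first best-effort unit waits in the queue, and the second best-effort unit is dropped because its queue is already full.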

Second, in a real-world situation, links have a fixed bandwidth and can transmit only at that bandwidth; for instance, a 100M full-duplex Ethernet link can send and receive data at 100Mbps only. If we're connected to an ISP with a 100M Ethernet link but pay for 50Mbps, our link must still send frames/packets at 100Mbps. To achieve 50Mbps we need to do something like transmit at 100Mbps for half a second and then wait another half second without transmitting anything, obtaining an average of 50Mbps over 1 second. In this example, a burst allowance may let us transmit at 100Mbps for 1 full second if we didn't transmit anything in the previous second.
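One common way to model this kind of rate limit with a burst allowance is a token bucket. The sketch below is an assumption-laden illustration rather than a description of what any particular ISP or device does: the class and function names, the 1500-byte packet size and the bucket depth of one second of credit are all hypothetical, picked to match the 100M/50M numbers above.

```python
LINE_RATE_BPS = 100_000_000  # physical Ethernet rate from the example
CIR_BPS = 50_000_000         # contracted rate from the example


class TokenBucket:
    """Credit (in bits) refills at rate_bps, capped at burst_bits."""

    def __init__(self, rate_bps, burst_bits):
        self.rate_bps = rate_bps
        self.burst_bits = burst_bits
        self.tokens = burst_bits  # start full, as after an idle period
        self.last = 0.0

    def allow(self, now, packet_bits):
        """Return True if the packet conforms at time `now` (seconds)."""
        elapsed = now - self.last
        self.tokens = min(self.burst_bits, self.tokens + elapsed * self.rate_bps)
        self.last = now
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True
        return False


def offer_at_line_rate(bucket, start_s, duration_s, packet_bits=12_000):
    """Offer back-to-back packets at line rate; return the conforming bits."""
    packet_time = packet_bits / LINE_RATE_BPS
    sent = 0
    for k in range(int(duration_s / packet_time)):
        if bucket.allow(start_s + k * packet_time, packet_bits):
            sent += packet_bits
    return sent


# Bucket depth of one second of credit, i.e. the link was idle for a second.
bucket = TokenBucket(CIR_BPS, burst_bits=CIR_BPS)
print("first second: ", offer_at_line_rate(bucket, 0.0, 1.0), "bits")
print("second second:", offer_at_line_rate(bucket, 1.0, 1.0), "bits")
```

With a full bucket the sender gets close to 100 Mbit through in the first second, because the saved credit and the ongoing refill add up to the line rate. Once the credit is spent, only about every other packet conforms, which is the 50Mbps average.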

With these concepts in mind, we can understand that a link used at 50% that never bursts above the link capacity will never be congested, and QoS won't come into play. On the other hand, in the real world it's rare to spend a lot of money on a WAN link that is never fully utilised (though it may not be rare in a LAN); also, traffic peaks usually happen in an unforeseeable manner. Consequently, a good plan should take moments of congestion into account, so that critical traffic can flow while non-critical traffic is sacrificed.

QoS is quite complex anyway; if you're interested, I wrote this column: https://www.matteo.site/blog/post/quality-of-service/

score -1 · Answer 3 · answered Apr 14 '15 at 05:02

QoS is complicated. It requires absolutely correct marking across the entire edge of your network and correct trust/queueing inside your network to guarantee operation. Additionally, the different software/hardware platforms operating in your network require different syntax and can affect QoS capabilities and operation. This is not to say that QoS is not a valid solution when you cannot afford to increase link capacity. Also, with jitter-sensitive SLA-based traffic (SIP trunk/hosted VoIP), it is usually worthwhile to configure a priority queue in addition to a 'standard' queue.

Adding bandwidth solves the long-term problem of insufficient capacity without the complications of configuring QoS and monitoring its performance, only to still have to add bandwidth when the link becomes saturated sooner than expected.
