Service discovery and load balancers with high-frequency health checks

Question

I am trying to understand the advantages of service discovery compared to load-balancing within the context of a microservice mesh where many instances/nodes/VMs/containers of many web services are all calling each other in a complicated web-pattern. And my understanding of the situation is this:

- Service discovery -- when implemented properly -- keeps each instance of each service aware of all the valid IPs/addresses of all the services it cares about
- Each service can then employ its own client-side load-balancing to determine which instances of which services it makes a request to
- Traditional service-level load-balancers have the problem where an instance sitting behind them becomes unhealthy or goes offline, and they continue serving traffic to it

Hence let's say we have Microservice A and B, and A needs to make requests to B. In a traditional load balanced setting, A and B would be behind their own load balancers respectively, say, https://a.example.com and https://b.example.com. When any instance of A needs to make a request to B, it has https://b.example.com in its configs as the host to make contact with, and the load balancer there will farm the request out to whatever healthy nodes sit behind it.

The problem here, again, is that a B instance might have gone offline since the last health check to it (from the B balancer) and now that B balancer unknowingly sends a request off to it.

Also, the load balancer still needs to know which instances to route traffic to, which is a configuration that typically needs to be done manually. The solution to this, in my mind, is to have automated registration/de-registration of instances with their load balancer.
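A minimal sketch of that self-registration idea, assuming a purely in-memory backend list. The `BalancerPool` class and the addresses are hypothetical and do not correspond to any real load balancer's API:

```python
class BalancerPool:
    """Hypothetical in-memory stand-in for a load balancer's backend list."""

    def __init__(self):
        self._backends = set()

    def register(self, address):
        # An instance would call this once at startup.
        self._backends.add(address)

    def deregister(self, address):
        # An instance would call this on graceful shutdown;
        # a health checker could call it after a failed check.
        self._backends.discard(address)

    def backends(self):
        # Sorted only so the result is deterministic.
        return sorted(self._backends)


pool = BalancerPool()
pool.register("10.0.0.1:8080")
pool.register("10.0.0.2:8080")
pool.deregister("10.0.0.1:8080")
print(pool.backends())
```

Note that an instance that crashes never gets to call `deregister`, which is why a periodic health check is still needed alongside registration.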

So properly-implemented service discovery swoops in and takes care of this scenario. The service discovery tool (Consul, etc.) is constantly pinging each instance of both service A and B and letting each instance of A know the hosts/IPs/etc. of each healthy instance of B. But now it's up to A to come up with its own client-side balancing solution for which healthy instance of B it connects to.
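One possible shape for that client-side balancing is plain round-robin over whatever healthy instances the registry currently reports. This is only a sketch: the `registry` dict stands in for a real discovery tool, and `RoundRobinClient` and the addresses are made up for illustration.

```python
import itertools


class RoundRobinClient:
    """Picks the next healthy instance each time, cycling through the list."""

    def __init__(self, lookup):
        # lookup: a callable returning the current list of healthy addresses.
        self._lookup = lookup
        self._counter = itertools.count()

    def pick(self):
        healthy = self._lookup()
        if not healthy:
            raise RuntimeError("no healthy instances available")
        return healthy[next(self._counter) % len(healthy)]


# Hypothetical registry contents: healthy instances of service B.
registry = {"b": ["10.0.1.1:8080", "10.0.1.2:8080"]}
client = RoundRobinClient(lambda: registry["b"])

picks = [client.pick() for _ in range(3)]

# The registry drops an instance; the next pick only sees what is left.
registry["b"].remove("10.0.1.1:8080")
after_removal = client.pick()
```

Because the list is re-read on every call, the client stops using an instance as soon as the registry stops reporting it, but no sooner.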

So to begin with, if anything I've said above is incorrect or misguided, please begin by correcting me! Assuming I'm more or less understanding the lay of the land here, I'll move on to my question:

My question

To me, this just seems like a timing issue with balancer health checks. If the balancer can be written/configured to check the health of the instances sitting behind it just as regularly as the service discovery tool (Consul, etc.) would, and if the instances are always registering themselves with their load balancer at startup time, then the balancer will have the same information available to it as you would have in a service discovery paradigm. No?
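The timing argument can be put into numbers with a small sketch. It assumes checks fire at fixed times 0, interval, 2 × interval, and so on, that a single failed check marks the instance unhealthy, and that a failure at the exact instant of a check is missed by that check. The function name is made up for illustration.

```python
def stale_window_ms(check_interval_ms, failed_at_ms):
    """Milliseconds a dead instance is still treated as healthy.

    Assumes fixed-interval checks starting at t=0 and that one failed
    check is enough to mark the instance unhealthy.
    """
    checks_done = failed_at_ms // check_interval_ms
    next_check_ms = (checks_done + 1) * check_interval_ms
    return next_check_ms - failed_at_ms


# Same failure time, two different check frequencies.
slow = stale_window_ms(10_000, 3_200)  # checks every 10 s
fast = stale_window_ms(1_000, 3_200)   # checks every 1 s
print(slow, fast)
```

Under these assumptions a higher check frequency shrinks the window but does not remove it, and the same arithmetic applies whether the checker is a load balancer or a service discovery tool.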

Quasi-believable, pseudo-real world example to work with

Say I have an Order Service (orderws) that has an "order item" endpoint exposed at, say, POST /v1/orders
Let's say all the orderws nodes are sitting behind a load balancer at https://orderws.example.com
Say this orderws calls 2 other web services:
- a Payment WS to make payments and transfer money, paymentws exposing a POST /v1/payments endpoint with all nodes behind load balancer at https://payments.example.com; and
- a Shipping WS (shippingws) that, upon successful payment of an order, results in the ordered item being taken off the shelf at a warehouse and shipped to the customer; the Orders WS calls the POST /v1/shipment endpoint with all Shipping WS nodes behind https://shippingws.example.com

Now then, how would service discovery and load balancing complement each other here? Ideally, if even 1 Payment or Shipping node exists, their load balancer will serve traffic from the load balanced URLs. If no nodes exist behind a balancer, then what would happen (service discovery-wise, between Order WS instances and the Service Registry) with Order WS nodes attempting to handle order requests?

Greg Burghardt · Accepted Answer · 2022-08-05T23:17:16.773

There seems to be a misconception about both service discovery and load balancing. There is no "versus" comparison, because service discovery and load balancing are independent but complementary concepts.

Load balancing a service allows clients to be decoupled from the scalability of those other services. All clients have a single URL to interact with. Cloud environments have automated tools that can add and remove nodes behind a load balancer. This helps enable the scalability promised with micro services. Service discovery has nothing to do with this aspect. Literally, this is why you would put a load balancer in front of a cluster of nodes to scale a micro service.

If load balancing decouples clients from the scalability of a service, then service discovery decouples clients from knowing which URLs can be used to communicate with the other services. Think of service discovery as an index of all the micro services in your ecosystem. The metadata about each service should include the URL of the load balancer in front of that service.
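As a rough sketch of that "index" idea, the registry can be pictured as a lookup from service name to the load balancer URL. The dict and the `discover` function below are illustrative stand-ins rather than any particular tool's API; the URLs are the ones from the question.

```python
# Hypothetical registry contents: service name -> metadata.
SERVICE_REGISTRY = {
    "orderws": {"url": "https://orderws.example.com"},
    "paymentws": {"url": "https://payments.example.com"},
    "shippingws": {"url": "https://shippingws.example.com"},
}


def discover(service_name, registry=SERVICE_REGISTRY):
    """Return the load balancer URL registered for a service."""
    entry = registry.get(service_name)
    if entry is None:
        raise LookupError(f"unknown service: {service_name}")
    return entry["url"]


# The Order service looks up where to send payments instead of hardcoding it.
payments_endpoint = discover("paymentws") + "/v1/payments"
print(payments_endpoint)
```

The client only knows the service name; which nodes sit behind the returned URL remains the load balancer's concern.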

Load balancers allow you to add and remove nodes for a single service without affecting clients. Service discovery allows you to add and remove entire services without deploying config changes to the entire ecosystem. Clients can discover the new endpoints on their own.

So, don't choose one over the other. Choose both. They work well together, because they serve different purposes.

Note that resiliency in micro services is not fully achieved at the load balancer level. Concepts like eventual consistency and simple message queues help make the entire system resilient. The system should be resilient from the perspective of entire services, not individual nodes. This is what Martin Fowler calls "smart endpoints and dumb pipes": the dumb pipes are the message brokers, and the smart endpoints are the services reading from the message queue. The simpler a component is, the less likely it is to fail.

Message queues should be pretty simple, so they do not fail often. The service is complex and more prone to failure. If the service goes down, then messages pile up in the queue. When the service comes back online, then it starts processing the backlog of messages. This is the heart of resiliency in micro services.
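The pile-up-then-catch-up behaviour can be sketched with an in-memory queue. This is only an illustration: a real broker would be a separate, durable process, and the `Worker` class and message shape are invented for the example.

```python
from collections import deque


class Worker:
    """Hypothetical service that reads messages from a shared queue."""

    def __init__(self, queue):
        self.queue = queue
        self.online = True
        self.processed = []

    def drain(self):
        # While offline the worker does nothing; messages stay queued.
        if not self.online:
            return 0
        count = 0
        while self.queue:
            self.processed.append(self.queue.popleft())
            count += 1
        return count


queue = deque()
worker = Worker(queue)

# The service goes down while producers keep publishing.
worker.online = False
for order_id in (1, 2, 3):
    queue.append({"order_id": order_id})
worker.drain()
backlog_while_down = len(queue)

# The service comes back and works through the backlog in order.
worker.online = True
drained = worker.drain()
```

Producers never talk to the worker directly, so they are unaffected by it being offline; the cost is that processing is delayed rather than lost.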
