What is the starting point of a geo-distributed server farm?

Question

Scenario:

A web application that wants geographic high availability (HA) has multiple web servers in different cities around the globe. Each server is connected to a database in the same city. These databases are configured as an active-active DB cluster, meaning each database handles its own read and write requests while syncing its data with the others in the background. For simplicity, let's say data is not sharded (partitioned) and all databases store the same data.

Ideally, a user should be connected to the nearest server. If a server goes down, users in that region should be connected to the next available (again, nearest) server. This means the latest writes in a region that have not yet been replicated will be lost (but that's a DB architecture problem, so let's set it aside for now).

My question is: what is the correct way to direct a user to the intended server?

I can think of two implementations:

1. A load balancer (proxy) will get the initial request and will redirect it to the intended server based on a policy (the most obvious one being the IP address).
   But if I'm correct, the load balancer itself is a fixed server in some part of the globe, which means some users reach it faster than others (which kind of defeats the purpose, at least for the initial request).
2. Use different DNS records for each server, so that a DNS lookup of our domain returns a different answer in each region.
   To my knowledge this depends on the DNS provider and the DNS software that is used, some may support it, some may not.

As a side question: would using Kubernetes make such a scenario easier?

score 3 · Accepted Answer · answered Jan 29 '24 at 18:00

307 redirect vs transparent proxy

A load balancer (proxy) will get the initial request and will redirect it to the intended server

Please understand that you're describing two different things in that sentence.

As far as the client is concerned, a proxy is the server. It listens on ports {80, 443}, accepts GET requests, and sends back web documents. The client never learns about a second hostname, so there is no extra round trip, which keeps the latency clients observe low.

In contrast, a client might send www.example.com a GET, receive a 307 with Location: www-east.example.com, and then begin the process again now that it has learned of a new webserver name. This costs extra WAN round trips: a fresh DNS lookup, a new TCP (and TLS) handshake, and the repeated request.
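To make the redirect path concrete, here is a minimal Python sketch of a redirector that answers every GET with a 307 pointing at a regional hostname. The networks in REGION_BY_NETWORK and the fallback to www-east.example.com are hypothetical placeholders; a real deployment would look the client up in a GeoIP database. Every client pays for a second request to the new hostname, and the regional name ends up in the address bar.

```python
import ipaddress
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical client networks mapped to regional hostnames.
# A real redirector would consult a GeoIP database instead.
REGION_BY_NETWORK = [
    (ipaddress.ip_network("10.1.0.0/16"), "www-east.example.com"),
    (ipaddress.ip_network("10.2.0.0/16"), "www-west.example.com"),
]
DEFAULT_HOST = "www-east.example.com"  # hypothetical fallback


def regional_host(client_ip: str) -> str:
    """Return the regional hostname for a client IP, or DEFAULT_HOST."""
    addr = ipaddress.ip_address(client_ip)
    for network, host in REGION_BY_NETWORK:
        if addr in network:  # always False when mixing IPv4 and IPv6
            return host
    return DEFAULT_HOST


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = regional_host(self.client_address[0])
        # 307 tells the client to repeat the same request at Location,
        # which costs it a new DNS lookup, a new connection and a new request.
        self.send_response(307)
        self.send_header("Location", f"https://{target}{self.path}")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def serve(port: int = 8080) -> None:
    """Run the redirector on all interfaces; blocks forever."""
    HTTPServer(("0.0.0.0", port), RedirectHandler).serve_forever()
```

A transparent proxy would instead fetch the page from the regional server itself and return it, so the client never sees www-east at all.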

bookmarks

For assets like JS files or JPEG images, no one really cares, and the URL can be as ugly as you like, for example starting with "www-west", "www-east", or "www-eur". But the web page URL visible at the top of the browser window has a lot of branding power; its spelling matters a great deal.

It should typically start "www", without mentioning data center geography.

If you mess this up, people will bookmark ugly URLs, and they will post links to them on social media. This will force you to keep supporting ugly URLs long after you have shut down a data center in some geographic region. Don't let ugly URL spellings leak out where users will use them.
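One way to keep regional hostnames out of bookmarks is to build every URL through two helpers, regional hosts for assets and the branded www host for pages, plus a check you could run over generated links or sitemaps. This is a minimal sketch; the hostnames follow the examples above, and the helper names are hypothetical.

```python
from urllib.parse import urlsplit

PAGE_HOST = "www.example.com"  # branded host shown in the address bar
# Regional hosts, acceptable only for assets such as JS files and images.
ASSET_HOSTS = {
    "west": "www-west.example.com",
    "east": "www-east.example.com",
    "eur": "www-eur.example.com",
}


def asset_url(region: str, path: str) -> str:
    """Asset URLs may name a region; users rarely see or bookmark them."""
    return f"https://{ASSET_HOSTS[region]}{path}"


def page_url(path: str) -> str:
    """Page URLs always use the branded www host."""
    return f"https://{PAGE_HOST}{path}"


def leaks_region(url: str) -> bool:
    """True if a URL meant for users (links, sitemaps, shares) names a regional host."""
    return urlsplit(url).hostname in ASSET_HOSTS.values()
```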

DNS and geography

When a browser triggers a DNS lookup, the authoritative nameserver has an opportunity to tailor the A record it sends back, for example based on where the query appears to come from. Amazon AWS (Route 53) and other firms offer this kind of "enhanced" DNS service.

Here is how one high-tech company currently does it:

www.apple.com.                                     CNAME  www.apple.com.edgekey.net.
www.apple.com.edgekey.net.                         CNAME  www.apple.com.edgekey.net.globalredir.akadns.net.
www.apple.com.edgekey.net.globalredir.akadns.net.  CNAME  e6858.dscx.akamaiedge.net.
e6858.dscx.akamaiedge.net.                         A      23.40.25.24

Note that they're contracting with Akamai. They prefer to pay someone who's good at tracking changing WAN latencies rather than make substantial investments to develop that capability in-house.

Here is the approach taken by a different high-tech company:

www.netflix.com.                                                           CNAME  www.dradis.netflix.com.
www.dradis.netflix.com.                                                    CNAME  www.us-west-2.internal.dradis.netflix.com.
www.us-west-2.internal.dradis.netflix.com.                                 CNAME  apiproxy-website-nlb-prod-1-bcf28d21f4bbcf2c.elb.us-west-2.amazonaws.com.
apiproxy-website-nlb-prod-1-bcf28d21f4bbcf2c.elb.us-west-2.amazonaws.com.  A      44.240.158.19
apiproxy-website-nlb-prod-1-bcf28d21f4bbcf2c.elb.us-west-2.amazonaws.com.  A      44.242.13.161
apiproxy-website-nlb-prod-1-bcf28d21f4bbcf2c.elb.us-west-2.amazonaws.com.  A      52.38.7.83

We see three A records, which lets clients automatically fail over, in this case to network 44 if some transient event impacts connectivity to network 52. You can similarly choose to expose multiple addresses in your A record responses.
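The client-side failover described above amounts to trying each published address in turn. Python's socket.create_connection already does this across every address a single hostname resolves to; the hypothetical connect_first below makes the loop explicit over (host, port) pairs so the behavior is easy to see and test. It is a sketch, not how any particular browser implements failover.

```python
import socket
from typing import Iterable, Tuple


def connect_first(endpoints: Iterable[Tuple[str, int]], timeout: float = 2.0) -> socket.socket:
    """Try each (host, port) in order and return the first connected socket."""
    last_error = None
    tried = 0
    for host, port in endpoints:
        tried += 1
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:  # refused, unreachable, or timed out
            last_error = exc  # remember the failure and move on to the next address
    raise ConnectionError(f"all {tried} endpoints failed") from last_error
```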

Imagine the two networks were geographically confined to different continents, or different coasts. Then a nameserver (perhaps run by AWS) could choose to send back just a single A record, corresponding to whichever datacenter it believes is closer. Notice that this plays nicely with the bookmark concern voiced above, since users will bookmark a "www" URL without ever seeing IP details.
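A minimal sketch of the nameserver's side of that decision, assuming the client's location is already known: pick the nearest healthy datacenter and return its single address. Straight-line distance is only a stand-in here; a service like Akamai's tracks measured WAN latencies instead. The datacenter names, coordinates, and addresses (taken from the RFC 5737 documentation ranges) are hypothetical. Marking a datacenter unhealthy sends its users to the next nearest one, which is the failover behavior the question asks for.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Datacenter:
    name: str
    lat: float
    lon: float
    address: str  # IPv4 address published in the A record
    healthy: bool = True


# Hypothetical sites; addresses come from the RFC 5737 documentation ranges.
DATACENTERS = [
    Datacenter("us-west", 45.5, -122.7, "192.0.2.10"),
    Datacenter("us-east", 39.0, -77.5, "198.51.100.10"),
    Datacenter("eu-central", 50.1, 8.7, "203.0.113.10"),
]


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance on a sphere with Earth's mean radius (6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))  # clamp guards rounding


def a_record_for(client_lat: float, client_lon: float, datacenters: Sequence[Datacenter]) -> str:
    """Return the single address of the nearest healthy datacenter."""
    healthy = [dc for dc in datacenters if dc.healthy]
    if not healthy:
        raise LookupError("no healthy datacenter")
    nearest = min(healthy, key=lambda dc: great_circle_km(client_lat, client_lon, dc.lat, dc.lon))
    return nearest.address
```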

PoC

There's no substitute for experience.

Purchase client VMs in multiple geographic locations. SSH into them, and use ping and curl to probe webservers from different locations. Write down your numeric performance observations.
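As a scriptable complement to ping, you can time TCP handshakes to the webserver's port, which also confirms the port is reachable; curl can report its own timings with -w "%{time_connect} %{time_total}\n". A minimal sketch, assuming Python 3 on each client VM; the function names are hypothetical.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port, samples=5, timeout=3.0):
    """Time `samples` TCP handshakes to host:port, returning milliseconds each.

    A handshake takes roughly one network round trip; if `host` is a name,
    the first sample may also include DNS resolution time.
    """
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # handshake complete; close immediately
        results.append((time.perf_counter() - start) * 1000.0)
    return results


def summarize(label, samples_ms):
    """One line per probe target, suitable for writing down and comparing."""
    return (f"{label}: min={min(samples_ms):.1f}ms "
            f"median={statistics.median(samples_ms):.1f}ms "
            f"max={max(samples_ms):.1f}ms")
```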

Purchase test server VMs in multiple geographic locations, plus DNS services. Configure a "www" test page to be served by all webservers. Use your clients to measure its performance.
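For the test page, it helps if each server reports which region answered, so a client probe can confirm where DNS actually sent it. A minimal sketch using only Python's standard library; the REGION environment variable and the X-Served-By header are hypothetical conventions, not anything the answer prescribes.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical convention: start each test VM with REGION set, e.g. REGION=eur.
REGION = os.environ.get("REGION", "unknown")


class ProbePageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = f"hello from {REGION}\n".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        # Hypothetical header so probes can tell which datacenter answered.
        self.send_header("X-Served-By", REGION)
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def serve_test_page(port: int = 8080) -> None:
    """Serve the test page on all interfaces; blocks forever."""
    HTTPServer(("0.0.0.0", port), ProbePageHandler).serve_forever()
```

From a client VM, curl -i against the test hostname then shows which region's X-Served-By value came back.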

Based on this, bring up production webservers.

score 2 · Answer 2 · answered Jan 29 '24 at 10:13

To my knowledge this depends on the DNS provider and the DNS software that is used, some may support it, some may not.

Sure. So if this is an important feature for you, choose a provider or software that supports it.

Note that nothing stops you from combining both of your suggestions for double redundancy: geo-aware DNS so that the "happy path" hits a load balancer in your local geography, and that load balancer can then, if necessary, redirect the request to a different geography when the local application servers are having issues.
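The load balancer's fallback decision could look roughly like this minimal sketch: prefer healthy backends in the local region, otherwise walk a preference list of other regions. The region names, the FALLBACK_ORDER table, and the shape of the health map are hypothetical; whether the chosen remote backend is then reached by proxying or by a 307 redirect is the trade-off discussed in the accepted answer.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical preference lists: local region first, then the next-closest regions.
FALLBACK_ORDER: Dict[str, List[str]] = {
    "us-west": ["us-west", "us-east", "eu-central"],
    "us-east": ["us-east", "us-west", "eu-central"],
    "eu-central": ["eu-central", "us-east", "us-west"],
}


def choose_backend(
    local_region: str, health: Dict[str, Dict[str, bool]]
) -> Optional[Tuple[str, str]]:
    """Return (region, backend) for the first healthy backend, or None.

    `health` maps region -> {backend name: passed its latest health check}.
    """
    for region in FALLBACK_ORDER[local_region]:
        for backend, ok in health.get(region, {}).items():
            if ok:
                return region, backend
    return None
```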

At some point you may decide this kind of undifferentiated heavy lifting is boring to build yourself, so you pay Cloudflare or any of a number of other CDN providers to do it for you.
