16

When is the right time to introduce high availability for a web site?

There are many articles on High Availability options. It's not obvious, however, WHEN the right time is to switch from a single-server to a high-availability configuration.

Please consider my situation:
http://www.postjobfree.com is a 24/7 web site with significant traffic:
http://www.similarweb.com/website/postjobfree.com

Currently I run it on a single server: both the IIS 7.0 web server and SQL Server 2008 run on the same hardware box.

There is occasional (~once per month) ~5 minutes of downtime, usually caused by a reboot required by a Windows Server update. The downtime is usually scheduled and happens at night. Still, it's unpleasant, because Googlebot and some users are still active at night.

Current web site revenue is at ~$8K/month.

I'm considering switching to a two-server configuration (a web farm of 2 web servers and a cluster of 2 SQL Servers hosted on two hardware servers).

Pros:
1) High availability (theoretically no downtime): even if one of the servers goes down, the other server would take over.
2) No data loss: without a SQL cluster, up to one day of data can be lost in case of hardware failure (we do daily backups).

Cons:
1) More effort to set up and maintain such a configuration.
2) Higher hosting cost: instead of ~$600/month it would be about $1,200/month.

What would be your recommendation?

Dennis Gorelik

8 Answers

16

Short answer: when downtime, or the risk of it, costs you more than it would cost you to have high availability.

It is fundamentally an economic decision. As an example: $8K/month in revenue implies that an outage of 2 hours will cost you about $22. If you can configure your system such that you can go from scratch to a fully functional site in 2 hours, then high availability would only gain you $22 of value beyond that.

Put another way, you save money unless/until you have about 54 hours of unpreventable downtime in a given month (the extra ~$600/month of hosting divided by ~$11/hour of revenue).
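
For what it's worth, here is a quick back-of-the-envelope sketch of that break-even calculation, assuming (as an approximation) that lost revenue is simply proportional to downtime, and using the cost figures from the question:

```python
# Back-of-the-envelope break-even for the proposed HA upgrade.
# Assumption: lost revenue is proportional to downtime (an approximation).
monthly_revenue = 8000.0          # $/month, from the question
hours_per_month = 30 * 24         # ~720 hours
revenue_per_hour = monthly_revenue / hours_per_month   # ~$11/hour

extra_ha_cost = 1200.0 - 600.0    # extra hosting cost of the two-server setup, $/month

# Hours of otherwise-unpreventable downtime per month at which HA pays for itself.
break_even_hours = extra_ha_cost / revenue_per_hour
print(f"~${revenue_per_hour:.0f}/hour of revenue; break-even at ~{break_even_hours:.0f} hours of downtime/month")
```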

11

Your stakeholders/business folk (which could be you!) have to decide.

Loss of revenue is easy to quantify; the rest can't be answered here, sorry...

gbn
2

I think most users can handle a bit of scheduled downtime. Consider that eBay has weekly updates on Friday nights, and bids around then sometimes don't work. My (major Australian) bank's online banking has scheduled outages for hours every week. Twitter goes offline all the time. Heroku / EC2 was down for days recently.

I'd keep it in that perspective: if you're really only talking 5 minutes a month, you're doing quite a good job as a sysadmin.

Chris
1

You've already mentioned Google as a factor in terms of indexing, but it may also be worth considering the impact that latency/site responsiveness may have on SEO. It's a black box and all that, so it's difficult to quantify, though for what it's worth, Matt Cutts reckons it's a one-percenter. I'd be more concerned about reputation, as others have stated.

1

Keep in mind that HA, like security, isn't a product, but rather a process.

For example, database replication will only get you to the point where each mirror of the database will be able to continue on its own, but you will also need a strategy for resynchronization after failed components have been replaced.

Consider an ordering system as an example: the customer submits an order, and during processing, the physical system he was talking to fails after storing the order information in its local copy of the database. Impatient, the customer presses "submit" again, and is directed to another server, which accepts the order. If your databases resynchronize by simply replaying the missing INSERT statements on the other side, then the order will be duplicated, which may not be what you want.
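
As an illustration only (not something specific to any particular database), one common hedge against that failure mode is to make the order insert idempotent: the client generates a unique order token when the form is rendered, and a unique constraint rejects any replayed or resubmitted INSERT. A minimal sketch in Python, with SQLite standing in for the real database:

```python
# Illustrative sketch: a client-generated order token plus a UNIQUE constraint
# makes both "customer pressed submit twice" and "replication replayed the INSERT"
# safe. SQLite stands in here for whatever database the site actually uses.
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        order_token TEXT UNIQUE NOT NULL,   -- generated once, client-side
        customer    TEXT NOT NULL,
        item        TEXT NOT NULL
    )
""")

def place_order(order_token: str, customer: str, item: str) -> bool:
    """Store the order; return False if this token was already processed."""
    try:
        conn.execute(
            "INSERT INTO orders (order_token, customer, item) VALUES (?, ?, ?)",
            (order_token, customer, item),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:   # duplicate token: resubmit or replayed statement
        return False

token = str(uuid.uuid4())                            # generated when the order form is rendered
print(place_order(token, "alice", "job posting"))    # True  - order stored
print(place_order(token, "alice", "job posting"))    # False - duplicate, no second row
```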

As @Slartibartfast suggested, it all boils down to an economic decision; however, I'd recommend that you also plan a few years into the future here. If you expect to need a proper HA setup by then, now would be a good time to set aside resources for the preparatory work.

1

While you think about this, I'd suggest you consider setting up a "fail whale" page.

There are plenty of ways to do this, but the AWS combo of Route 53 and S3 works well on my small sites.

I set up the domain with health checks so that on failure, DNS sends users to a static HTML page sitting in S3; it costs next to nothing.

In my experience, having your site say "sorry, things are broken but we are working on it" makes a world of difference to users. A Twitter account where you can communicate with users is even better.

This goes a long way toward mitigating the "loss of reputation" that can be the most significant impact of an outage.

See https://aws.amazon.com/blogs/aws/create-a-backup-website-using-route-53-dns-failover-and-s3-website-hosting/ for a guide on setting it up.

DynDns' social failover (http://dyn.com/managed-dns/social-failover/) is a similar kind of thing.

You could also roll your own: do your own health checks and then script the DNS changes, provided your DNS records have a low TTL and you have some way of manipulating them programmatically.
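
For what it's worth, a minimal sketch of that roll-your-own approach might look like the following, assuming Python with requests and boto3, and Route 53 as the DNS provider; the zone ID, record name, backup target, and health-check URL are placeholders, not anything from the question:

```python
# Minimal "roll your own" failover sketch: probe the site and, if it looks down,
# repoint a low-TTL DNS record at a static backup page.
# Route 53 via boto3 is used only as an example provider; all identifiers are placeholders.
import boto3
import requests

HOSTED_ZONE_ID = "Z0000000EXAMPLE"        # placeholder hosted zone
RECORD_NAME = "www.example.com."          # record to flip (note trailing dot)
BACKUP_TARGET = "backup.example.com."     # e.g. CNAME to an S3 website endpoint
HEALTH_URL = "https://www.example.com/"   # URL to probe

def site_is_healthy() -> bool:
    """Return True if the site answers with HTTP 200 within a few seconds."""
    try:
        return requests.get(HEALTH_URL, timeout=5).status_code == 200
    except requests.RequestException:
        return False

def fail_over_to_backup() -> None:
    """Repoint the record at the backup page; a low TTL makes the switch take effect quickly."""
    route53 = boto3.client("route53")
    route53.change_resource_record_sets(
        HostedZoneId=HOSTED_ZONE_ID,
        ChangeBatch={
            "Comment": "Primary unhealthy - failing over to static backup page",
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": RECORD_NAME,
                    "Type": "CNAME",
                    "TTL": 60,
                    "ResourceRecords": [{"Value": BACKUP_TARGET}],
                },
            }],
        },
    )

if __name__ == "__main__":
    if not site_is_healthy():
        fail_over_to_backup()
```

Run something like this from a cron job on a box that isn't the web server itself, so the monitor doesn't go down with the site.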

Nath
0

Have you considered using something like EC2, which would let you scale flexibly and also negate your cons? It is ultimately an economic decision whether using EC2 is worth it or not, but it is, at the least, an option to consider.

manku
-2

To avoid data loss, you should look into RAID configurations before clusters. You should also configure a failover IP that you can switch from one server to another in case of a disaster, without having to wait for DNS propagation.

yqt