How does a company like Amazon avoid bottlenecks accessing the database layer?

Question

If you imagine a company like Amazon (or any other large e-commerce web application) that is operating an online store at massive scale and only has a limited quantity of physical items in its warehouses, how can they optimize this such that there is no single bottleneck? Of course, they must have a number of databases with replication, and many servers that are handling the load independently. However, if multiple users are being served by separate servers and both try to add the same item to their cart, for which there is only one remaining, there must be some "source of truth" for the quantity left for that item. Wouldn't this mean that at the very least, all users accessing product info for a single item must be querying the same database in serial?

I would like to understand how you can operate a store that large using distributed computing and not create a huge bottleneck on a single DB containing inventory information.

score 27 · Accepted Answer · answered Dec 12 '16 at 10:50

However, if multiple users are being served by separate servers and both try to add the same item to their cart, for which there is only one remaining, there must be some "source of truth" for the quantity left for that item.

Not really. This is not a problem that requires a 100% perfect technical solution, because both error cases have a business solution that is not very expensive:

If you incorrectly tell a user an item is sold out, you lose a sale. If you sell millions of items every day and this happens maybe once or twice a day, it gets lost in the noise.
If you accept an order and while processing it find that you've run out of the item, you just tell the customer so and give them the choice of waiting until you can restock, or cancelling the order. You have one slightly annoyed customer. Again not a huge problem when 99.99% of orders work fine.

In fact, I recently experienced the second case myself, so it's not hypothetical: that is what happens and how Amazon handles it.

It's a concept that applies often when you have a problem that is theoretically very hard to solve (be it in terms of performance, optimization, or whatever): you can often live with a solution that works really well for most cases and accept that it sometimes fails, as long as you can detect and handle the failures when they occur.
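A minimal sketch of that idea, not a description of Amazon's actual system: each web server accepts orders against its own, possibly stale, copy of the stock count, and oversells are only detected later when orders are fulfilled against the real warehouse stock. The names `Storefront`, `place_order` and `fulfil` are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Storefront:
    """One web server holding a possibly stale copy of stock counts."""
    cached_stock: dict
    accepted: list = field(default_factory=list)

    def place_order(self, customer, item):
        # Only the local snapshot is consulted: no shared lock, no central query.
        if self.cached_stock.get(item, 0) <= 0:
            return False  # may wrongly report "sold out", i.e. a lost sale
        self.cached_stock[item] -= 1
        self.accepted.append((customer, item))
        return True


def fulfil(warehouse_stock, orders):
    """Process accepted orders against the real stock and detect oversells."""
    shipped, needs_followup = [], []
    for customer, item in orders:
        if warehouse_stock.get(item, 0) > 0:
            warehouse_stock[item] -= 1
            shipped.append((customer, item))
        else:
            # Business fallback: offer the customer a wait or a cancellation.
            needs_followup.append((customer, item))
    return shipped, needs_followup


# Two servers both believe one book is left, and both accept an order.
server_a = Storefront(cached_stock={"book": 1})
server_b = Storefront(cached_stock={"book": 1})
server_a.place_order("alice", "book")
server_b.place_order("bob", "book")

shipped, needs_followup = fulfil({"book": 1}, server_a.accepted + server_b.accepted)
# shipped == [("alice", "book")], needs_followup == [("bob", "book")]
```

Neither server had to coordinate with the other at order time; the conflict is resolved once, later, for the one customer it affects.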

Ewan · Answer 2 · 2016-12-12T10:38:20.823

I have seen the 'Last Item In Stock' problem solved in the following way:

Update all the stock levels daily and flag each product as high stock, low stock, on order, or out of stock according to threshold levels.

Obviously it's the 'low stock' items that are problematic.

Items with high stock levels

Don't bother checking the stock level. Just place the order.

Items with low stock levels

Warn the user when browsing: 'Last few left!'. When they go to pay, check and decrement the stock level. If it's out of stock, update the item status.

This way you only hit the database for the 'low stock' items and you only do that when the customer is quite far down the process of buying. The cost is that some customers will not be able to complete their purchase.
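The check-and-decrement step for low-stock items could look roughly like this. It is a sketch using Python's built-in `sqlite3` as a stand-in for the real inventory database; the table layout, the `statuses` dict (standing in for the flags set by the daily job) and the `checkout` function are all assumptions made for illustration, and the 'on order' category is left out for brevity.

```python
import sqlite3


def make_inventory(rows):
    """Create an in-memory stand-in for the inventory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER NOT NULL)")
    conn.executemany("INSERT INTO stock (item, qty) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def checkout(conn, statuses, item):
    """Return True if the order can go ahead."""
    status = statuses.get(item, "out")
    if status == "high":
        return True   # high stock: no database access at all
    if status != "low":
        return False  # already flagged as out of stock
    # Low stock: one atomic conditional decrement, so two concurrent
    # buyers can never both take the last unit.
    with conn:
        cur = conn.execute(
            "UPDATE stock SET qty = qty - 1 WHERE item = ? AND qty > 0",
            (item,),
        )
    if cur.rowcount == 0:
        statuses[item] = "out"  # update the cached flag for later shoppers
        return False
    return True


conn = make_inventory([("widget", 1)])
statuses = {"widget": "low", "cable": "high"}
first = checkout(conn, statuses, "widget")   # takes the last unit
second = checkout(conn, statuses, "widget")  # fails and flags the item as out
```

Putting the `qty > 0` condition inside the `UPDATE` makes the check and the decrement a single statement, so there is no gap between reading the level and changing it.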

However, in most cases 'out of stock' really just means you are waiting for another delivery, so you want to accept the order anyway and maybe just pop up a warning or restrict the delivery options. So those customers aren't lost.

During high load times such as sales, you might even turn the stock checking off and just email customers later: 'Sorry, we ran out of X, would you like Y?'

Essentially the aim of any ecommerce platform is to never read from the database. Always serve cached pages and do everything client side.
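As a rough illustration of serving cached pages, here is a small time-to-live cache that only calls the expensive loader (for example, a database query that renders a product page) when an entry is missing or has expired. `TTLCache` and its parameters are hypothetical names, and a production setup would more likely use a CDN or a shared cache than an in-process dict.

```python
import time


class TTLCache:
    """Cache values for a fixed time; reload only when an entry has expired."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # called on a miss, e.g. a database read
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # fresh hit: the loader is not called
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

With a 60-second TTL, a product page that is viewed thousands of times a minute costs about one database read per minute, at the price of showing stock information that may be up to a minute old.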

Michael Durrant · Answer 3 · 2016-12-12T10:28:25.530

A combination of

hashing
sharding
replication
distribution
high fail-over
key-value stores
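To make the hashing and sharding items concrete, here is a minimal sketch of routing an item to one of several inventory shards by hashing its ID, so that requests for different items land on different databases. The function name `shard_for` and the ID format are assumptions for illustration.

```python
import hashlib


def shard_for(item_id, shard_count):
    """Map an item ID to a shard index in the range [0, shard_count)."""
    # A cryptographic hash is used because it gives the same result in every
    # process; Python's built-in hash() is randomized per run for strings.
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Every server computes the same shard for the same item without
# asking a central directory.
shard = shard_for("item-12345", 16)
```

Plain modulo hashing moves most keys when the number of shards changes; consistent hashing is the usual refinement when shards are added or removed often.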

There's no magic, just more and more complex situations. Just like DNS, it is made to scale.

The 'single version of the truth' is part of such systems. Generating a new key becomes a more complex operation than just generating the next number in the sequence. For example, other sequences exist. This is the sort of complexity that distributed database systems can handle, and they do it by making several operations to and from components when making new objects, making them available to others, ensuring that sequences are unique when they need to be, using composite keys, etc.
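One simple way to get unique keys without a single shared counter is a composite key made of a node identifier plus a per-node sequence number. The sketch below assumes each node has already been given a distinct `node_id`; `NodeKeyGenerator` is a hypothetical name, and real systems use more elaborate schemes.

```python
import itertools


class NodeKeyGenerator:
    """Generate keys that are unique across nodes without coordination."""

    def __init__(self, node_id):
        self.node_id = node_id                # assumed unique per node
        self._sequence = itertools.count(1)   # local, per-node sequence

    def next_key(self):
        # Composite key: (which node, which local number).
        return (self.node_id, next(self._sequence))


node_a = NodeKeyGenerator("a")
node_b = NodeKeyGenerator("b")
keys = [node_a.next_key(), node_b.next_key(), node_a.next_key()]
# keys == [("a", 1), ("b", 1), ("a", 2)]
```

The local sequence numbers overlap between nodes, but the composite keys never collide, so no node has to wait on another to create a new object.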

score 2 · Answer 4 · answered Dec 12 '16 at 20:41

In this video, Martin Fowler discusses NoSQL databases:

https://www.youtube.com/watch?v=qI_g07C_Q5I

One of the points (somewhere in there) is that places like Amazon would rather keep 99% of people happy by accepting their order without being able to check "for sure" whether it's actually available, and maybe irritate a very small percentage by having to say "sorry, looks like someone beat you to it."

Which is to say, there's no real handling for the scenario you describe, just that Amazon takes the benefit of the doubt based on the last successful inventory read, and if a concurrent transaction slipped in between - oopsie.

(btw, that's a great video if you're curious about NoSQL)
