I want a trivial example of where MongoDB can scale but a relational database will have trouble

Question

I'm just learning to use MongoDB, and when discussing with other programmers would like a quick example of why NoSQL can be a good choice compared to a traditional RDBMS - however the scenarios I come up with and can find online seem pretty contrived.

E.g. a blog with lots of traffic could be represented relationally, but it will require some performance tuning and joins across tables (assuming full normalization is being used), whereas MongoDB would allow direct retrieval from a single collection to the same effect.

But the response I'm getting from other programmers is "why not just keep it relational and then add some trivial caching later?"

Does anybody have a less contrived example where MongoDB will really shine and a relational db will fall over much quicker? The smaller the project/system the better, because it leaves less room for disagreement.

Something along the lines of the complexity of the blog example would be really useful.

Thanks.

Philipp · Accepted Answer · 2013-10-28T18:05:14.373

First, it scales well.

When a MongoDB database gets too much traffic or grows too big for a single server, you can easily add more servers by creating a sharded cluster, in which each shard can itself be a replica set. It scales almost linearly. This doesn't work nearly as well with most relational databases. Take a look at MySQL's list of limitations when working as a cluster, for example. Most of the entries in that list are no problem for MongoDB (or don't apply).
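To make the idea of spreading data over several servers concrete, here is a minimal sketch of hash-based partitioning in plain Python. It is only an illustration: the function names (`shard_for`, `distribute`) and the modulo scheme are my own assumptions, not MongoDB's actual mechanism, which assigns ranges of shard-key values (chunks) to shards and rebalances them.

```python
import hashlib
from collections import defaultdict


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard index using a stable hash (hypothetical scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def distribute(keys, num_shards):
    """Group keys by the shard they would be stored on."""
    shards = defaultdict(list)
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


# Hypothetical data: 1000 user ids spread over 4 servers.
user_ids = [f"user{i}" for i in range(1000)]
placement = distribute(user_ids, 4)
for shard, keys in sorted(placement.items()):
    print(f"shard {shard}: {len(keys)} documents")
```

Each document lives on exactly one shard, so a lookup by shard key only has to touch one server. That is roughly why adding servers adds capacity.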

Second, it allows heterogeneous data.

Imagine, for example, the product database of a computer hardware store. What properties do products have? All products have a price and a vendor. But CPUs have a clock rate, hard drives and RAM chips have a capacity (and these capacities aren't comparable), monitors have a resolution and so on. How would you design this in a relational database? You would either create a very long productID-property-value table, or you would create a very wide and sparse product table with every property you can imagine, most of them NULL for most products. Neither solution is really elegant. MongoDB can solve this much better because it allows each document in a collection to have a different set of properties.
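A rough sketch of what such a collection could look like, using plain Python dicts to stand in for documents. The product data, field names and the helpers `find` and `with_field` are made up for illustration; this is not a MongoDB driver API.

```python
# Each "document" shares price and vendor but carries its own extra fields.
products = [
    {"name": "CPU X", "price": 199.0, "vendor": "Acme", "clock_rate_ghz": 3.4},
    {"name": "HDD Y", "price": 59.0, "vendor": "Acme", "capacity_gb": 2000},
    {"name": "Monitor Z", "price": 149.0, "vendor": "Globex", "resolution": [1920, 1080]},
]


def find(docs, **criteria):
    """Return documents whose fields equal all the given values."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def with_field(docs, field):
    """Return documents that have the given field at all."""
    return [d for d in docs if field in d]


print([d["name"] for d in find(products, vendor="Acme")])
print([d["name"] for d in with_field(products, "capacity_gb")])
```

No NULL-filled columns and no property-value table are needed: a document simply omits the fields that don't apply to it. In MongoDB the second helper would correspond to a query with the `$exists` operator.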

score 3 · Answer 2 · answered Oct 28 '13 at 19:15

Here is a real-world example of a problem I would have no idea how to solve in a reasonable way with SQL and a relational database alone (maybe that's my fault).

So we have a (common relational) database with about 30,000 products. Nothing big so far. Each of these products has many attributes. There are the common ones like group (cables, antennas, iphone cases... about 80), assortment (somehow similar to groups: car, hifi, mp3, only 15), brand (30).

Then comes the technical data. Each item has many such values, like color, cable length, weight and volume: about 200 value types and thousands of values.

And the most complicated part: many of those products belong either to some car type (or several of them) or to some kind of mobile device. Those come in hierarchies of the form brand (Apple), model (iPad), type (1, 2, 3, 4) and, in some cases, generation. (For cars it's similar, though instead of generation we have build years.)

Problem step One:

We want the number of products for each of those attributes. How many are red? How many are in the cable group? And so on.

This could partially be solved with SQL. It would take a lot of queries and be rather ugly, but I think it's possible. It might be slow, but we could make it even uglier and keep counters in each table, updating them at every change. This is especially difficult for attributes where a product can have multiple values (like "works with iPhone and 12 other mobile phones").

But here comes problem step Two:

When a customer selects one attribute (say he only wants to see products that are red), we want to update all those counters in real time. This means we would either need extremely complicated queries (not likely fast enough anyway) or have to keep counters for all possible combinations of attributes (billions).

When I started on this project, they had given the counter option a try and done this for a very small subset of attributes (group, assortment, brand). The code was ugly, buggy and slow. In addition, they now had a table of counters that was far larger than the table of products.

Using Apache Solr's facets was actually the solution. Flattening the tables into a list of documents (one per product) made it possible to get all this data in real time with far simpler queries.
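To illustrate the idea (not Solr's API or internals), here is a minimal Python sketch of facet counting over flattened product documents. The sample products, field names and the helpers `matches` and `facet_counts` are assumptions made up for this example.

```python
from collections import Counter, defaultdict

# One flat document per product; multi-valued attributes are lists.
products = [
    {"id": 1, "group": "cables", "brand": "acme", "color": "red", "fits": ["iphone", "ipad"]},
    {"id": 2, "group": "cables", "brand": "globex", "color": "black", "fits": ["iphone"]},
    {"id": 3, "group": "cases", "brand": "acme", "color": "red", "fits": ["ipad"]},
]


def as_list(value):
    """Treat single values and lists uniformly."""
    return value if isinstance(value, list) else [value]


def matches(doc, selection):
    """True if the document has every selected attribute value."""
    return all(wanted in as_list(doc.get(field)) for field, wanted in selection.items())


def facet_counts(docs, selection, fields):
    """Count attribute values among the documents matching the selection."""
    counts = defaultdict(Counter)
    for doc in docs:
        if not matches(doc, selection):
            continue
        for field in fields:
            if field in doc:
                counts[field].update(as_list(doc[field]))
    return counts


# The customer picks "red"; all counters are recomputed from the matching documents.
counts = facet_counts(products, {"color": "red"}, ["group", "brand", "fits"])
for field, counter in counts.items():
    print(field, dict(counter))
```

The counters are derived from the matching documents on every request, so no table of precomputed attribute combinations is needed. A search engine does the same thing far more efficiently with inverted indexes.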

score 2 · Answer 3 · answered Oct 28 '13 at 20:47

Any time you think an EAV table is the best way to do things (they are notoriously slow in relational databases and hard to query), you might need a NoSQL database. This is especially true when you have no way of knowing in advance what the fields will be. An example would be storing the details of medical tests. Each new test might have entirely different data that you would need to store. And while you could (in theory) model existing tests (with a lot of time and effort, as there are thousands of them), how would you know what results you might get from tests (and maybe medical equipment) that haven't even been invented yet?

score 0 · Answer 4 · answered Oct 28 '13 at 17:05

The smaller the project/system the better, because it leaves less room for disagreement.

This is hard because NoSQL is only better in large environments. I take it that you mean a Simple Example, and I have a perfect one for you.

Suppose you are making a travel website and you need to let your users travel from any of the 5,170 US airports to any other of the (same) 5,170 US airports...

But here is the kicker: not all flights are direct, so you need to tell the user all the stopover options as well, sometimes with 2 or 3 stopovers. You also need to tell the user all the options over a 5-hour window! And you need to compute this in under 10 seconds while the user is waiting.

This is the relational DB nightmare... In comes NoSQL: flight routes are usually set in stone a few weeks in advance, so you can calculate all the gazillions of possible routes in advance and store them in a simple NoSQL DB cluster...

NoSQL is the clear winner in such a scenario.
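A toy sketch of the precomputation idea in Python, assuming a tiny made-up flight graph and ignoring timetables entirely. The airports, the `direct_flights` data and the `precompute_routes` helper are hypothetical; the resulting dict plays the role of a key-value store keyed by (origin, destination).

```python
from collections import defaultdict

# Hypothetical direct connections (origin -> destinations).
direct_flights = {
    "JFK": ["ORD", "ATL"],
    "ORD": ["DEN"],
    "ATL": ["DEN", "LAX"],
    "DEN": ["LAX"],
    "LAX": [],
}


def precompute_routes(flights, max_stopovers):
    """Enumerate every route with at most max_stopovers intermediate airports."""
    routes = defaultdict(list)

    def extend(path):
        origin, current = path[0], path[-1]
        for nxt in flights.get(current, []):
            if nxt in path:  # never visit the same airport twice
                continue
            new_path = path + [nxt]
            routes[(origin, nxt)].append(new_path)
            # new_path has len(new_path) - 2 stopovers; go deeper only if allowed
            if len(new_path) - 2 < max_stopovers:
                extend(new_path)

    for airport in flights:
        extend([airport])
    return dict(routes)


# Built once, offline; serving a request is then a single key lookup.
route_table = precompute_routes(direct_flights, max_stopovers=2)
for route in route_table[("JFK", "LAX")]:
    print(" -> ".join(route))
```

The expensive graph search happens ahead of time, and the user-facing query becomes a lookup by key, which is the access pattern key-value and document stores handle well.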
