26

The definition is a bit confusing - basically I'm asking if SQL is a subset of the NoSQl family:

kk

I'm asking this because "Not-only" means NoSQL is much larger, but still includes SQL as a part of it.

On the other hand, since we can't do typical sql operations like joins in a NoSQL database, then SQL is not part of nosql!

I'm wondering which holds true?

Vérace
  • 30,923
  • 9
  • 73
  • 85
ERJAN
  • 483
  • 5
  • 7

4 Answers4

49

TL;DR

The main distinction between database systems is not the language used to query the database, but rather the consistency model of the system. The Venn diagram should be two intersecting sets - SQL is not a proper subset of NoSQL, but rather its own data access language which may or may not be complemented by other techniques. SQL would be a proper subset of database query languages.


There is NoSQL, OldSQL and NewSQL.

NoSQL used to mean "Not SQL" but now is (more) likely to mean "Not only SQL" - as many of the providers of NoSQL systems are (trying to) bolt-on/graft SQL interfaces over their Key-Value (KV), Document or Graph stores.

OldSQL systems are essentially the main RDBMS providers' offerings (Oracle, SQL Server, Sybase, PostgreSQL, MySQL...). On page 13 of this presentation (.pdf), Michael Stonebraker purports to show that OldSQL only spends 4% of the time doing "useful" work - also to be found here:

enter image description here

His contention is that one should split OLTP work and OLAP work between different systems - OLTP should be done by shared nothing sharded architectures such as, suprise, surprise, his own system, VoltDB and that OLAP should be done by columnar store (also, see here) type architectures (such as Vertica, in which he also had a role). It is worth noting that Stonebraker, as well as being very successful in the commercial arena is also huge in academia and has won a Turing award - Computer Science's "Nobel prize"!

NewSQL is (IMHO) the most interesting of the systems to spring (figuratively speaking) like the Phoenix reborn from the ashes of OldSQL. Their USP is that they are HTAP systems (Hybid Transactional and Analytical Processing).

These are distributed systems which can support both OLAP and OLTP queries simultaneously due to the data being spread over several nodes - which can be on or in the same rack, data centre or continent and/or be globally distributed between and amongst cloud providers to increase resilience and redundancy - expect to add ~ a '0' to the cost for every fractional '9' you add to your uptime provision.

They use a consensus algorithm (usually Raft or Paxos) to coordinate nodes' data and sharding is transparent - even to the systems' DBA's. The three exemplars of such systems would be CockroachDB, TiDB and YugaByte.

It is interesting to note that, while having a certain level of success, these systems are not (yet?) household names. The "big boys" are fighting back with columnar store and KV offerings bolted/grafted on to their own systems. Of particular interest in this debate is that these systems themselves sit "on top" of a KV store (usually RocksDB - although CockroachDB is developing their own one in Go called Pebble). PostgreSQL also has documents (JSONB) and a KV store.

To answer the question, the real distinguishing feature between systems is not the interface one uses to query the data (which can and does range from direct C language programming (imperative) to SQL (declarative) - and flavours/adaptations thereof), but rather the transaction consistency model.

These models are either ACID (consistent) vs. BASE (available) and the key is where these systems fit with respect to the CAP theorem. Also, KV stores can support some or all ACID transaction characteristics.

OldSQL and NewSQL value consistency above all else ('CP' under the CAP system) - their argument is that, for example, a banking system with inconsistent results is a recipe for disaster (true!).

However the NoSQL aficionados would suggest that not all systems require cast-iron consistency. For example, ordering a book from Amazon (say) using a BASE database ('AP' in CAP terms) - might (incorrect stock level shown) be a few days late, or even cancelled - but the upside is faster queries and easier operation (no consensus to maintain).

This is the crux of the distinction between database systems - 'CP' or 'AP'! CP will always either give you the correct answer (majority of nodes still available) or no answer, whereas AP systems will normally respond (even if only one node is up) but your answer may not have taken other nodes' data updates into account when responding (i.e. network links between them down...).

I hope this is a satisfactory "first-pass" at an answer - it's a large topic and, for db nerds, absolutely fascinating. I would urge you to read around it (Wiki is a good start, but no substitute for primary sources). In particular, I would advise you to look in detail at the underlying architectures of CockroachDB and TiDB and see how they shard and move data from node to node while maintaining consistency.

There are comprehensive lists of various NoSQL systems here and here (best IMHO). There's also a popularity ratings site here (but avoid the comparisons - they just repeat the blurb from the systems' websites - often out-of-date).

Final word, there are "mixed-model" (or multi-model) databases (ArangoDB and OrientDB - both have F/LOSS and Enterprise versions - OrientDB now part of SAP), but there's a reason why they're not household names. ArangoDB doesn't support ANSI SQL, but rather its own flavour - AQL and neither does OrientDB support ISO SQL.

The best multi-model db that I've worked with is PostgreSQL which has KV and document stores as mentioned above. You can use SQL to query these and join with "ordinary" tables. Work is ongoing to add OpenCypher - an open graphical query language standard - to it (see here and here).

Final, final word: PostgreSQL also has a Columnar Store extension and a Time Series extension - both of which are very important (sub)-classes of database system. Both extensions are just that, extensions and not forks - you can use "normal" SQL (i.e. the full range of PostgreSQL's very standards compatible SQL) with these extensions.

So, we can see that while NoSQL is, in some cases, but not all (many systems designers are happy to remain as simple KV stores for example), adding SQL capabilities, SQL is fighting back, adopting and adapting to new and/or more sophisticated data storage and retrieval requirements - some of which I haven't even touched on (GIS systems...). So to say that NoSQL fully encompasses SQL would be an incomplete picture...

As pointed out in comments to the question itself, the SQL standard now tackles non-relational paradigms and this will only increase into the future! Also addressed by other answers - worth reading (1 & 2)!

Vérace
  • 30,923
  • 9
  • 73
  • 85
23

NoSQL was, is, and always will be a vague buzzword, not a well-defined term.

Its origin is related to a historical trend: at the beginning of this century, if you said "database", the assumption would be that you were talking about an SQL-based relational database, and "choosing a database" just meant choosing which implementation of that technology you were using. As part of the "Web 2.0" and "cloud" technology movements, people began to challenge this assumption, and work with completely different database systems - systems that, unusually for the time, featured "no SQL". Wikipedia suggests the first use of "NoSQL" to refer to these was 2009.

The term initially referred mostly to document-based and key-value databases, because that is what a lot of people were working on at the time; it was a useful catch-all in a rapidly evolving field. As more ideas were explored, the term became more confusing than useful - some of the databases it covers have nothing in common other than not being SQL-relational. Worse, some non-relational databases started adding SQL or SQL-like querying languages; and some relational databases started adding features that let you use them in hybrid non-relational ways.

Retro-fitting the "no" to mean "not only" is a cute way to make the term make sense again, but the better idea is probably just not to use it. Its only value is as a reminder that "database" does not have to mean "relational SQL-based database"; using other terms like "data store" can serve that same purpose without centring everything on the mostly-irrelevant question of how things relate to SQL.

IMSoP
  • 387
  • 1
  • 7
8

SQL is a language. NoSQL is an approach to choosing database systems. It makes no sense to compare them.

The origin of "Not Only SQL" lies in the fact that not too long ago, when you had some data to store, the only question was "Oracle or Microsoft?" Nobody ever thought about whether or not a SQL RDBMS would actually be the best solution for storing that data. Using something other than Oracle or SQL Server was simply unthinkable.

The idea behind "Not Only SQL" is exactly what it says: when you choose how to store your data, you consider not only SQL but also other possibilities. The idea is that you analyze your data and choose the storage solution that best fits the data. If your data is object-shaped, you use an object store, if your data is document-shaped, you use a document store, if your data is XML-shaped, you use an XML store, if your data is a graph, you use a graph database, if your data is tree-shaped, you use a tree database, if your data looks like a flat key-value structure, you use a key-value store, if you want to store time series data, you use a time series database, … and if your data is rectangular table-shaped relational data, then by all means use an RDBMS.

Even if your data is relational, SQL is still not the only choice. There are non-SQL relational databases, hyper-relational databases, post-SQL relational databases (e.g. Rel), there are things like NF², and so on. All of those are relational, but none of those are SQL.

There is a myth that NoSQL databases don't support consistency, or don't support transactions. That is wrong. First off, there is no such thing as a "NoSQL database". NoSQL is not a database technology, it is an approach to choosing a database. But even if we look at the databases that are typically called "NoSQL databases", we see that there are actually different ones with many different approaches towards consistency and transactions. E.g. Datomic offers SQL-like ACID guarantees and transactions. There are some that offer even higher guarantees than SQL does. And there are ones that offer relaxed guarantees compared to SQL.

Some offer flexible, configurable guarantees, that allow you to pick and choose between performance and guarantees along various dimensions. Sometimes even different choices for different data.

So, really the idea behind NoSQL is simply that you analyze your requirements and choose the solution that best fits your requirements. That's all there is to it.

And really, that's so far only talking about how data is stored. How the data is queried is really another question altogether. It is perfectly possible to query many "non-SQL" databases with SQL. The performance characteristics may be different, but the functionality won't be. Likewise, you can query "traditional SQL" databases with languages other than SQL. In fact, in many modern systems, there is actually not a lot of SQL. They may be using LINQ, for example, or an ORM. Ruby, Python, Perl, ECMAScript, many other programming languages have internal and external DSLs for database queries, e.g. ARel in Ruby.

Jörg W Mittag
  • 205
  • 1
  • 4
2

SCHEMA NOW OR SCHEMA LATER

NoSQL, imho, is a misnomer.

NoSQL is a schema-on-read data repository. The opposite is RDBMS, which is schema first. RDBMS puts the onus on the designer to come up with an efficient data structure and reduces the burden on the analyst, who has a defined schema and constraints to ensure data integrity. NoSQL, on the other hand, requires minimal design upfront. It puts the burden on the data analyst, who must deal with the data structure's flexibility by imposing their own structure and checks on the data.

SQL itself is a language. SQL can be used to query NoSQL repositories just as it is used to query RDBMS.

Metaphor
  • 906
  • 1
  • 11
  • 30