48

I've been reading/watching a lot of Robert C. Martin content. I've come across him saying SQL is unnecessary because of solid state drives. When I search other sources to back this up I get a bunch of random articles describing the difference of SQL performance between hard drives and solid state drives (which is related but not what I'm trying to research).

Ultimately, I do not understand what he's trying to get at. Is he saying replace SQL with No-SQL technologies? Is he saying store data in files in a file system? Or does he just want people to stop using SQL/Relational Databases because of SQLi attacks? I fear I'm missing the point he's trying to make.

I will provide some links here so you can read straight from his mind:

  1. Bobby Tables
  2. Clean Architecture Lecture

First, he states that SQL should be removed from the system entirely.

The solution. The only solution. Is to eliminate SQL from the system entirely. If there is no SQL engine, then there can be no SQLi attacks.

And although he talks about replacing SQL with an API, I do NOT think he means putting SQL behind an API because of that previous quote and what he says earlier in the article.

Frameworks don’t handle the issue;...

Side note: In saying SQL, I'm pretty sure Robert means most relational databases. Maybe not all but most. In any case, most people are using SQL anyways. so...

If SQL is not being used to persists data, then what are we supposed to use?

Before answering that, I should also note. Robert emphasizes that solid state drives should change the tools that we use to persist data. Søren D. Ptæus's answer points this out.

I must also respond to the, "but data integrity" group. Upon some further research, Robert says we should use transactional databases like datomic. Then CRUD turns into CR (create and read) and SQL transactions go away altogether. Data integrity is of course important.

I can't find a question that encompasses all of this. I guess I'm looking for alternatives that match Robert's guidelines. Datomic is one but is that it? What other options match these guidelines? And do they actually work better with solid state drives?

Thomas Owens
  • 85,641
  • 18
  • 207
  • 307

11 Answers11

75

Bob Martin is clearly exaggerating to make his point more clear. But what is his point?

Does he just want people to stop using SQL/Relational Databases because of SQLi attacks?

To my understanding, in that blog post (your first link) Martin tries to convince people to stop using SQL, but not relational databases. These are two different things.

SQL is an extremely powerful language, and it is standardized to some degree. It allows to create complex queries and commands in a very compact manner, in a readable, understandable, easy to learn fashion. It does not depend on another programming language, so it is usable for most application programmers, no matter if they prefer Java, C, C++, C#, Python, Ruby, JavaScript, Basic, Go, Perl, PHP, or something else.

However, this power comes for a cost: writing safe SQL queries/commands is harder than writing unsafe ones. A safe API should make it easy to create safe queries "by default". Potentially unsafe ones should need more mental or at least more typing effort. That is IMHO why Martin is ranting against SQL in its current form.

The problem is not new, and there are safer APIs than standard SQL to access a relational database. For example, all OR mappers I know are trying to provide such an API (though they are typically designed for other primary goals). Static SQL variants make it hard to create any dynamic queries with unsanitized input data (and that is not a new invention: Embedded SQL, which uses often static SQL, is around 30 years old).

Unfortunately, I am not aware of any API which is as flexible, as standardized, mature, language-independent and also as powerful as SQL, especially dynamic SQL. That's why I have some doubts about Martin's suggestion of "not using SQL" as a realistic way of solving the mentioned problems. So read his article as a thought into the right direction, not as a "best practice" you can blindly follow from tomorrow on.

Doc Brown
  • 218,378
57

Bob Martin's opinion is just that; one man's opinion.

A programmer is expected to understand the system he is writing well enough to exercise reasonable care about its security and performance. That means that, if you're talking to a SQL database, you do what the Bobby Tables website says to do: you sanitize your input data. It means that you put your SQL database on a machine that promises adequate performance. There are very well-known and well-understood ways to do these things, and while they don't guarantee absolute security or ideal performance, neither does anything else.

The assertion that we don't need SQL anymore because we now have SSD's is just specious. SQL wasn't invented because high-speed hard drives didn't exist yet; it was invented because we needed an industry-standard way to express data-retrieval concepts. Relational database systems have many other qualities besides speed and security that make them ideal for business operations; in particular, ACID. Data integrity is at least as important as speed or security, and if you don't have it, then what's the point of securing bad data or retrieving it as quickly as possible?

Before you take one man's hysteria as gospel, I suggest you learn about application and system security and performance on their own terms, not by reading random Internet articles. There's much more to security, performance and robust system design than simply "avoid this technology."

We don't ban kitchen knives because a few hapless individuals manage to accidentally cut their fingers with them.

Robert Harvey
  • 200,592
17

What is he actually saying?

Is he saying replace SQL with No-SQL technologies?

TL;DR: Yes (sort of)

In a more recent talk than the one you linked to on basically the same topic he says: "The database is a detail. Why do we have databases?".

He claims database came to be to make data access from spinning disks easier, but in the future "[...] there will not be disks" thanks to new technology and what he calls "persistent RAM" and that it will be easier to store data using the structures programmers use, such as hashtables or trees.

He goes on to predict that relational databases on a whole will largely disappear due to their new competition:

If I were Oracle, I would be pretty scared because the reason for my existence is evaporating from underneath me.[...] The reason for the database to exist is disappearing.

There will probably be some relational tables that survive, but now there is some healthy competition.

So no, for him it's not only about SQL injection, although he opines SQL is inherently flawed in this regard.


Author's note:

The statements in this post are only quotes to understand Robert C. Martin's view on this topic and do not represent the author's opinion. For a more differentiated point of view, see Robert Harvey's answer.

11

SQL is a detail. Knowledge of a detail should not spread.

As SQL is used in more and more places in your code your code becomes more and more dependent on it.

As you learn more and more SQL tricks you solve more and more problems using SQL. This means that switching to another API to persist involves more than just translating. You have to solve problems you didn't realize you had.

You run into this even switching between Databases. One offers fancy whizzbang feature 5 so you use it in a number of places only to find out fancy whizzbang feature 5 is proprietary and now you have a licensing issue that's going to cost a lot of money. So you do a lot of work digging up everywhere you used feature 5 and solving the problem on your own only to find out later you're also using whizzbang feature 3.

One of the things that makes Java so portable is that certain features of the CPU just aren't available. If they were available I'd use them. And suddenly there are CPU's that my Java code won't work on because they don't have those features. It's the same with database features.

It's all to easy to sacrifice your independence without realizing it. SQL is a choice not a given. If you make the decision to use SQL then make it in one place. Make it in a way that can be unmade.

The fact that SQL has security issues and that we're moving to persistent memory models doesn't mean SQL is doomed. It just drives home the point that it's a choice. If you want to preserve the right to make that choice you have to do the work.


It may be worth noting that the database movement of the 80's and Uncle Bob have a rather nasty history. He had all his problems solved with a flat file system when management forced a database admin into his life. This event pushed him into his stellar consulting career. (He tells this story in one of his early clean books, forget which) He knows how to solve problems without DB's and has little patience for those that act like using them is a given.

He also tells a story about putting off adding a DB to an application until the last minute when a customer demanded it, and added it in a day as an optional feature. My guess is he see's the way most of us use DB's as an addiction. He's trying to show us how to kick the habit.

candied_orange
  • 119,268
5

The quote from your first quote is (emphasis mine),

The solution. The only solution. Is to eliminate SQL from the system entirely. If there is no SQL engine, then there can be no SQLi attacks.

What would replace SQL? An API of course! And NOT an API that uses a textual language. Instead, an API that uses an appropriate set of data structures and function calls to access the necessary data.

The rant is against letting application programmers use SQL.

The suggested fix is to let them use an API instead: which isn't SQL and doesn't allow injection.

IMO, examples of such APIs might include:

  • http://bobby-tables.com/csharp suggests C# programmers can use the ADO.NET API.

    That's not a perfect example because ADO.NET is a wide or deep (i.e. powerful or general-purpose) API, which also allows its users to input raw (or raw-ish) SQL.

  • Some SQL developers or database administrators suggest that a database should be configured such that it only permits access via (a limited number of expertly written) stored procedures, and that application developers shouldn't be allowed to write their own (dangerous) SQL queries

  • Another way to "eliminate SQL from the system" is to put the database (which exposes SQL) on some other system, accessed via a REST API or similar.

So, IMO, the overall solution or system[s] can still use a database (especially given that a database engine implements useful ACID properties, and scales well and so on, it may be foolish to try to do without one, or to write an application-specific one).

The rant's requirements are satisfied if the database's SQL API is hidden from the application developers, behind some other API (e.g. ADO, perhaps an ORM, a Web service, or whatever).

More generally I suppose it means having an application-specific DAL (a "data access layer" or "database abstraction layer"). A DAL insulates the application from details of how and where the data is stored and/or fetched. The DAL may or may not be implemented using an SQL database.

ChrisW
  • 3,427
3

Everyone seems to be answering this question from a security standpoint, or with an SQL lens.

I saw a Robert Martin lecture where he recounts that as programmers, we use many different data structures that are optimal for our specific programs. So, rather than universally storing all data in a tabular structure, we should store our data in hash tables, trees, etc so we can grab the data and jump right in to the program.

I interpreted his message as only saying we should throw out our current assumptions about persistent storage for a moment to consider other future possibilities than the age-old SQL tabular format. SSD is a candidate solution, but not the only one.

Keenan
  • 39
  • 1
2

I want to address only a short statement:

Or does he just want people to stop using SQL/Relational Databases because of SQLi attacks?

No. That is a wrong assumption. We can not say we must stop using cars, because they are responsible of people dying in car accidents. In the same way, SQL/relational databases (or anything else in this context, such as RDBMS) are not responsible for malicious SQL payload an attacker can perform on your web application. I am sure the author did not mean that, because there is a whole SQL injection prevention cheat sheet for this purpose.

2

Actually, he is not to use databases and SQL - fairly explicitly. The first reference is a well know issue, the second reference comes off sounding like a rant. Although, I am interpreting it as to have a good reason to use databases and not to use SQL. From my perspective this is not even reasonable advise.

Unfortunately, the example he is using is a well know example with a well known solution that he then points out. It usually comes about when a programmer doesn't realize what he is doing. For example constructing strings containing SQL like:

    my $sql="select a from b where a=$ui_val;";
    prepare($sql)
    execute($sql)

as opposed to

    my $sql="select a from b where a=?;";
    prepare($sql)
    execute($sql,$ui_val);

This is a DBI perl like example for the ruby on rails code. The rails code that he provides is easy to confuse between the safe and the unsafe. Like many ORMs hides what the SQL is underneath and so often you are dealing with an interface that constructs and executes the sql for you. Doesn't this sound almost like what an API would be doing for you?

My interpretation of the first reference is that he is suggesting that we should replace a well known issue that has a well known solution.

It is also unfortunate that he doesn't mention that if this is done correctly it will make code easier to write and more readable, though if it is done well, it may actually be harder to write and less readable. Additionally, he doesn't mention is that SQL is really very easy to read and does what you would generally expect it to do.

He is partially correct, ultimately we will have an infinitely large and fast memory and an infinitely fast processor. Until we slip out from the current physics that constrains computing, he is not correct.

Yes spinning disk are a thing of the past, and we now use SSDs. Disks work with about ~10 milliseconds per data transfer, SSDs work with ~0.5 milliseconds (500 microsec) data access time. RAM is on the order of 100 nano seconds, processors operate on the oder of 100s of pico seconds. This is the heart of why we need databases. Databases manage the data transfer between either spinning disks or SSDs with main memory. The advent of SSDs haven't eliminated the need for databases.

Robert Baron
  • 1,132
2

Answer

does he just want people to stop using SQL/Relational Databases because of SQLi attacks?

The 'Bobby Tables' article seems to suggest that this, it and of itself is a reason to not use SQL:

The solution. The only solution. Is to eliminate SQL from the system entirely. If there is no SQL engine, then there can be no SQLi attacks.

He might have other reasons that he discusses elsewhere. I wouldn't know because I don't really read much of his stuff.

Digression

This part isn't really an answer, but I think the question of the value of SQL is far more interesting (as do others, apparently.)

I've had a lot of experience using SQL and I think I have a fair understanding of its strengths and weaknesses. My personal feeling is that it's been overused and abused, but that the idea that we should never use it is kind of silly. The idea that we must choose 'SQL always' or 'SQL never' is a false dichotomy.

As far as SQL injection being an argument for not using SQL, that's laughable. This is a well understood problem with a pretty simple solution. The problem with this argument is that SQLi isn't the only vulnerability that exists. If you think that using JSON APIs makes you safe, you are in for a big surprise.

I think every developer should watch this video titled "Friday the 13th: Attacking JSON - Alvaro Muñoz & Oleksandr Mirosh - AppSecUSA 2017"

If you don't have the time or inclination to watch through it, here's the gist: A lot of JSON deserialization libraries have remote code execution vulnerabilities. If you are using XML, you have even more to worry about. Banning SQL from your architecture will not make your system secure.

JimmyJames supports Canada
  • 30,578
  • 3
  • 59
  • 108
2

Martin's problem appears to be with programmers building dynamic SQL directly from user input, something like (forgive me, I'm primarily a C and C++ programmer):

sprintf( query, "select foo from bar where %s;", user_input );

which is absolutely a recipe for heartburn (hence the Bobby Tables strip). Any programmer that puts code like that in a production system deserves a paddlin'.

You can mitigate (if not entirely eliminate) the problem by using prepared statements and properly sanitizing your inputs. If you can hide the SQL behind an API such that programmers aren't directly building query strings, so much the better (which is part of what Martin advocates).

But as to getting rid of SQL entirely, I don't think that's practical or desirable. Relational models are useful, that's why they exist in the first place, and SQL is probably the best interface for working with relational models.

As always, it's a matter of using the right tool for the job. If your shopping cart app doesn't need a full-up relational model, then don't use a relational model (meaning you won't need to use SQL). For the times you do need a relational model, then you're almost certainly going to be working with SQL.

John Bode
  • 11,004
  • 1
  • 33
  • 44
1

The two sources you link convey different messages:

The blog post says that data access logic should not exist as text at runtime, lest it be mixed with untrusted user input. That is, the blog post condemns writing queries by concatenating strings.

The lecture is different. The first difference is in tone: The lecture speculates and calls into question, but does not condemn. He doesn't say that databases are evil, but challenges us to imagine persistence without a database. He argues that in the 30 years since relational databases became widespread many things have changed, and highlights two which might conceivably affect our choice of persistence technology:

  • storage space has increased by a factor of about 1 million - at comparable prices! Consequently, it is less necessary to conserve storage, and in particular, it is no longer necessary to delete. By using append-only storage, concurrency control can be simplified because read locks are unnecessary. This can obviate the need for transactions.
  • access times have fallen, because most datasets now fit in RAM, massively reducing read latency.

Do these changed circumstances change the optimal persistence technology? Interestingly, Uncle Bob doesn't say - presumably because he feels no answer would be correct for all programs. That's why he cautions us to treat our choice of persistence technology as a detail rather than enshrine it into stone tablets and pass it on as received wisdom to our peers.

Do alternatives exist?

Writing data access logic without strings is entirely possible. In Java land, you might use QueryDSL, where queries are described using a type-safe fluent API generated from your database schema. Such a query might look as follows:

JPAQuery<?> query = new JPAQuery<Void>(entityManager);
Customer bob = query.select(customer)
  .from(customer)
  .where(customer.firstName.eq("Bob"))
  .fetchOne();

As you can see, the query logic is not expressed as a String, clearly separating the trusted structure of the query from the untrusted parameters (and of course, QueryDSL never includes the parameters into the query text, but uses prepared statements to separate the query for its parameters at the JDBC level). To achieve SQL injection with QueryDSL, you'd have to write your own parser to parse a string and translate it into a syntax tree, and even if you did that, you probably wouldn't add support for nasty things like select ... into file. In short, QueryDSL makes SQL injection neigh impossible, and also improves programmer productivity and increases refactoring safety. Prevented the biggest risk to web application security, that has existed long enough to spawn running gags, and boosted developer productivity? I dare say that if you still write queries as strings, you're doing it wrong.

As for alternatives to relational data bases, it is curious to know that postgres' multi version concurrency control is exactly that kind of append-only data structure Uncle Bob is talking about, though he was probably thinking more of event stores, and the event sourcing pattern in general, which also fits nicely with the notion of keeping current state in RAM.

meriton
  • 4,338