30

I only heard about Robert Martin today, and it seems like he's a notable figure in the software world, so I don't mean for my title to come across as clickbait or as me putting words in his mouth. This is simply how I interpreted what I heard from him, with my limited experience and understanding.

Today I was watching a video of a talk on software architecture by Robert C. Martin, and in the latter half of the video, databases were the main focus.

From my understanding of what he said, it seemed like he was saying that SSDs will reduce the usefulness of databases (considerably).

To explain how I came to this interpretation:

He discussed how with HDDs/spinning disks, retrieving data is slow. However, these days we use SSDs, he noted. He starts off with "RAM is coming" and then continues by mentioning RAM disks, but then says he can't call it RAM disk, so resorts to just saying RAM. So with RAM, we don't need the indexes, because every byte takes the same time to get. (this paragraph is paraphrased by me)

Suggesting RAM (as in computer memory) as a replacement for DBs, which is how I interpreted his statement, doesn't make sense to me, because it would mean all the records are held in memory for the lifetime of the application (unless you pull them from a file on disk on demand).

So I resorted to assuming that by RAM, he means SSD. In that case, he's saying SSDs reduce the usefulness of databases. He even says "If I was Oracle, I'd be scared. The very foundation of why I exist is evaporating."

From my limited understanding, HDDs have roughly O(n) seek time (I'd think), whereas SSDs are near O(1), meaning any location takes about the same time to reach. So his suggestion was interesting to me, because I'd never thought about it like that. When I was first introduced to databases a few years ago, and a professor was describing their benefits over a regular filesystem, I concluded that the primary role of a database is essentially to be a heavily indexed filesystem (along with optimizations, caching, concurrent access, etc.). Thus, if indexes aren't needed on SSDs, this does kind of make databases less useful.

Regardless, and prefacing that I'm a newb, I find it hard to believe that databases become less useful, as everyone still uses a DB as the primary data store of their application instead of a plain filesystem. It felt as if he was oversimplifying the role of databases.

Note: I did watch till the end to make sure he didn't say something different.

For reference: 42:22 is when the whole database topic comes up, 43:52 is when he starts off with "Why do we even have databases"

This answer does say SSDs speed DBs up considerably. This question asks about how optimization is changed.

To TL;DR my question, does the advent of widespread SSD use in the server market (whether it's upcoming or has happened already) reduce the usefulness of databases?

It seemed like what the presenter was trying to convey was that with SSDs, one can store the data on disk and not have to worry about retrieval being as slow as it was with older HDDs, since SSD seek times are near O(1) (I think). If that is true, a database would hypothetically lose one of its advantages, indexing, because the benefit of indexes for faster seeks would be gone.

Honinbo Shusaku

4 Answers

63

There are some things in a database that should be tweaked when you use SSDs. For instance, in PostgreSQL you can adjust effective_io_concurrency and random_page_cost. However, faster reads and faster random access aren't all that a database provides. It ensures

He's just wrong about indexes. Even if the whole table can be read into RAM, an index is still useful. Don't believe me? Let's do a thought experiment:

  • Imagine you have a table with one indexed column.

      CREATE TABLE foobar ( id text PRIMARY KEY );
    
  • Imagine that there are 500 million rows in that table.

  • Imagine all 500 million rows are concatenated together into a file.

Which is faster?

  1. grep 'keyword' file
  2. SELECT * FROM foobar WHERE id = 'keyword'

It's not just about where the data lives; it's about how you order it and what operations you must perform to find what you're looking for. PostgreSQL supports B-tree, Hash, GiST, SP-GiST, GIN and BRIN indexes (and Bloom through an extension). You'd be foolish to think that all of that math and functionality goes away because you have faster random access.
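The thought experiment can be tried at small scale entirely in memory. The following is a minimal sketch, not a benchmark: it assumes 100,000 synthetic keys rather than 500 million rows, counts comparisons instead of measuring time, and uses a sorted list with binary search as a stand-in for a B-tree-style index. The names `linear_scan` and `sorted_lookup` are hypothetical helpers invented for this illustration.

```python
def linear_scan(rows, key):
    """Check rows one by one, as grep would; return (found, comparisons)."""
    comparisons = 0
    for row in rows:
        comparisons += 1
        if row == key:
            return True, comparisons
    return False, comparisons


def sorted_lookup(sorted_rows, key):
    """Binary search over sorted keys; return (found, comparisons)."""
    lo, hi = 0, len(sorted_rows)
    comparisons = 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_rows[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_rows) and sorted_rows[lo] == key
    return found, comparisons


# Synthetic keys (an assumption for this sketch); zero-padding keeps them sorted.
rows = [f"id-{i:06d}" for i in range(100_000)]
target = rows[-1]  # worst case for the linear scan

_, scan_cost = linear_scan(rows, target)
_, index_cost = sorted_lookup(rows, target)
print(f"scan: {scan_cost} comparisons, sorted lookup: {index_cost} comparisons")
```

Both lookups read from RAM here, so storage speed plays no role; the gap comes purely from how the data is ordered and searched.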

Evan Carroll
12

Based on your post, it appears the clear message is that RDBMS lookup time optimizations are being replaced with hardware which makes IO time negligible.

This is absolutely true. SSDs on database servers combined with plenty of (actual) RAM make IO waits significantly shorter. However, RDBMS indexing and caching are still of value, because even systems with this huge IO boon can and will have IO bottlenecks from poorly performing queries caused by bad indexing. This typically shows up only in high-workload or poorly written applications.

The key value of an RDBMS in general is data consistency, data availability, and data aggregation. Using an Excel spreadsheet, a CSV file, or some other method of keeping a "data base" yields no such guarantees.

SSD doesn't protect you from your primary server becoming unavailable for any reason (network, OS corruption, power loss). SSD doesn't protect you from a bad data modification. SSD doesn't make it faster to run analytics compared to "just having" them.
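As an illustration of the consistency guarantee, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The `accounts` table, the CHECK constraint, and the helpers `make_db`, `transfer` and `balances` are all assumptions made up for this example; the point is only that a failed modification is rolled back as a whole, which a plain file gives you no help with regardless of how fast the storage is.

```python
import sqlite3


def make_db():
    """Create a hypothetical accounts table whose balances may not go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move amount from src to dst atomically; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failing debit leaves a partial change to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = make_db()
print(transfer(conn, "alice", "bob", 30), balances(conn))
print(transfer(conn, "alice", "bob", 500), balances(conn))
```

The second transfer violates the constraint after the credit to the destination has already been applied, and the transaction undoes that credit along with everything else.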

Josh Bonello
10

Uncle Bob was probably talking about in-memory databases such as Redis or GemFire. In these databases, everything really is contained in RAM. The database can start out empty and be filled with short-lived data (being used as a cache), or it can start by loading everything in from disk and then periodically checkpoint changes to disk.

This is becoming more and more popular because RAM is getting cheap, and it becomes feasible to have a terabyte of data stored in an in-memory clustered database. There are a lot of use cases where the speed from having instant access to things makes it valuable to put in RAM rather than even a fast disk like SSD. You can even continue using SQL for some of these if it makes sense.
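The load-then-checkpoint pattern described above can be sketched in a few lines. This is a toy under stated assumptions, not how Redis or GemFire are implemented: the `InMemoryStore` class, its JSON snapshot file, and its method names are hypothetical and exist only to show the idea that reads and writes hit RAM while durability comes from occasional snapshots.

```python
import json
import os
import tempfile


class InMemoryStore:
    """Toy key-value store: all data lives in a dict, snapshots go to disk."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        # On startup, load the last checkpoint if one exists.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.data = json.load(f)

    def put(self, key, value):
        self.data[key] = value  # served from RAM, no disk IO

    def get(self, key, default=None):
        return self.data.get(key, default)

    def checkpoint(self):
        """Write a snapshot to a temp file, then atomically swap it in."""
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)


with tempfile.TemporaryDirectory() as d:
    snapshot = os.path.join(d, "store.json")
    store = InMemoryStore(snapshot)
    store.put("session:1", "alice")
    store.checkpoint()
    store.put("session:2", "bob")  # not checkpointed, lost on restart

    restarted = InMemoryStore(snapshot)
    print(restarted.get("session:1"), restarted.get("session:2"))
```

Anything written after the last checkpoint is lost on restart, which is the trade-off such a design accepts in exchange for serving everything from memory.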

Why should this worry Oracle? Data is growing and it's unlikely that RDBMSes will go away. However, a lot of Oracle's engineering time over the years has gone into making data retrieval on spinning disks really fast. Oracle will need to adapt to a completely different storage tier. They are doing so, with Oracle Database In-Memory, but they're exposed to different competition than in the past. Think of how much time has gone into making sure the query optimizer chooses the right strategies based on the layout of things on disk...

Alan Shutko
8

Community Wiki post collecting answers originally left as question comments


I would say just the opposite. Since read/write speeds are so fast, now you can get a GPU accelerated database (e.g. BlazingDB or Alenka) to crunch numbers even faster. Now you can have even more complex queries run faster. Now queries which people wouldn't even consider running can be run at a reasonable speed. The more complex, and the more data the better off you are - cybernard

While Bob Martin has been around for a long time and his opinions are generally worth listening to (if not agreeing with :-), in this case I think he's diving into the "The Death Of Relational Databases Is Upon Us" crowd (of which I'm an associate member :-). For some things under limited circumstances a somewhat convincing argument can be made that non-relational database technologies can provide an edge. That having been said, however, IMO the relational model, flawed in various and sundry ways as it may be, still provides the best general purpose database model available today. YMMV. - Bob Jarvis

The primary reason that we use databases isn't because disks are slow (indeed, originally, that was cited as a reason not to use databases), but rather because data is complicated. The primary purpose of a database is to enable multiple apps/users to be able to find the correct data and even to be able to simultaneously alter it in a controlled manner. Doing that quickly is only a secondary goal of databases. - RBarryYoung

RDBMS isn't going away anytime soon; they're the best choice for some types of application, and NoSQL (Mongo, etc.) is the best choice for others. Horses for courses. - sh1rts

A database helps organize data. It was not really designed for fast data access in the first place anyway. - JI Xiang

Paul White