3

I'm about to develop an application that will mainly store user-encrypted data. Each piece of data being encrypted thanks to an AES key, itself encrypted with the public RSA key of each granted user.

Given that the volume of non-encrypted data is negligible (probably only users login, some dates and foreign keys), I wonder if "standard DBMS" such as PostgreSQL or MySQL would be a good choice.

Indeed, these DBMS are optimized to handle different types of data (text, integers, dates, ...), indexes, process calculus, aggregations and so on.

In my case, the vast majority of the data I will store would be large blocs of text (or maybe binary data). And most of the required calculus will be processed on the client side, after decryption.

Therefore, I won't either be able to parse my data. So even if I had a huge database, it wouldn't be qualified as "Big data". However, I wonder if MongoDb, MemSQL or whatever wouldn't be a more accurate choice. And if not, what would be the best DBMS and the best way to use it for my purpose ?

In other words, I think each DBMS have to make sacrifices in order to be more efficient on most important features. I also think the case detailed above is not that casual. So I assume there is a lot of features I don't need (or I can't use, such as FULLTEXT search for instance). On the other hand, I may need features that are discarded by "standard DBMS".

Antoine Pinsard
  • 101
  • 1
  • 9

3 Answers3

7

As a rule of thumb: If your data is well structured, well known (in advance) and of a limited size per entry (no mega BLOBs), relational databases are really good at storing it. Even if you don't use the advanced indexing features.

Managing space, especially empty space in data files, is a very hard problem to solve. Relational databases have been dealing with this for over 20 years now - and it is worth making use of them just for that. In addition to this, relational databases get you the benefits of a long history of performance tuning. They run highly optimised native code so you don't have to struggle with poor Java implementations, bad network stacks, overuse of memcpy, garbage collection, coarse locking and all the other pathologies new products (especially the noSQL stuff) tends to have.

To store encrypted data, use the BINARY data types. MSSQL, MySQL and Postgres all support these types. You may not want to do operations directly on these types (though you CAN do binary manipulation if you want to). Also note that it is relatively easy to implement the encryption/decryption in the database itself as all of the above support crypto libraries. You will also benefit from indexing on the key/foreign columns so you can find your data fast. A relational database is an excellent key/value store for small value types - SQL Server will easily get you over 1-10M key/value lookups/sec even on a small box - I expect MySQL and PostGres would deliver results in the same ballpark.

It is also easy to find programmers who can query a relational database efficiently. SQL has been around for a very long time and it is an extraordinarily powerful language. MSSQL will even give you automated parallel execution. Some programmers wont "get" it - but if they don't, chances are they also won't grok parallelism or lambda expressions (two crucial skills of a modern coder).

On top of all of this goodness, you also get powerful backup and monitoring tools for all of the standard relational databases.

So basically, unless you have a REALLY good reason to use NoSQL - just use relational databases.

Thomas Kejser
  • 6,218
  • 2
  • 24
  • 46
1

There isn't enough information in the question to make an informed decision, but here are some basic points. If you want more detail, please explain more about how the data is going to be queried and how large is it expected to grow, and how large are encrypted segments supposed to be and so on.

So, in general - regardless of what else is decided, I would put the "control data" (users login, some dates, etc.) into an RDBMS of your choice. As far as the rest of the data is concerned, there are some considerations:

how is the encrypted info going to be retrieved? Are you going to search for binary strings? have some metadata that will help locate the right value? key value pairs?

If access is by key look-up, a Berkley DB or Mongo with some sort of a local cache (like memcache) would be more than adequate.
If a successful look-up is going to require more "thinking" a relational store may be needed to support the look-up logic.

Mordechai
  • 704
  • 6
  • 6
0

It is not advisable to use a DBMS that is pointing to a file (like Mongo does) for storing text, at least not for text of your length. Each time you would want to access a single string, means I/O, and we all know what that does to performance. So indeed, you should stick to MySQL or even SQL Server (if able to pay for it).

Mainly because each of them have a datatype specialized for this kind of data: TEXT. I heard many times that TEXT is suitable for storing just text, as an alternative to VARCHAR or even VARCHAR(MAX). This is WRONG. TEXT datatype is handled differently by the database engine, and of course optimised for situations like yours.

Further, if the rows in your table are not accessed entirely ie. the other columns are queried frequently BUT your TEXT one (not "SELECT *" kind of queries), you should consider converting the table in 1NF, and reference the huge TEXT columns only when needed.

BUT, if you decide to keep all the data in a single table, an index is mandatory. The last thing you want is a full-scan, that will fill your buffer-cache in no time.

ddaniel
  • 633
  • 1
  • 6
  • 10