
This is for one of my personal projects. I want to store a table of data in a BLOB/TEXT column in the database. I don't want to store the data in its own database table because the schema of this data will often change, and I don't really need to query it.

So I am searching for a data format that would be:

  • Suited for storing a table of variously typed data (dates, timestamps, integers, floats, strings, etc.)
  • Open source and implemented in several programming languages
  • Preferably binary

Thank you very much

5 Answers


The de facto standard is surely CSV. Unless you have a good reason not to, store it this way.

As the format of the data can vary, the header row can be used to describe both each column's purpose (i.e. its name) and its format (i.e. its type), allowing the remaining rows to be simple text, as sketched below.

CSVs are easy to parse and well supported by all major languages and frameworks, so development and maintenance time is kept to a minimum.

If space is an issue, you could go down the route of a custom binary format. But then you have to "roll your own": such formats are harder to parse, more error-prone and not human-readable. The simpler option is to zip the contents. Again, zip libraries are readily available for all major languages and frameworks, saving you time and simplifying maintenance once again.
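As an illustration, here is a minimal sketch of both ideas in Python. The `name:type` header convention and the converter table are assumptions of mine, not part of any CSV standard:

```python
import csv
import gzip
import io
from datetime import date

# Hypothetical convention: each header cell is "name:type".
CONVERTERS = {
    "int": int,
    "float": float,
    "str": str,
    "date": date.fromisoformat,
}

def dump_table(header, rows):
    """Serialize rows to gzipped CSV bytes, ready for a BLOB column."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(f"{name}:{typ}" for name, typ in header)
    writer.writerows(rows)
    return gzip.compress(buf.getvalue().encode("utf-8"))

def load_table(blob):
    """Parse gzipped CSV bytes back into typed Python values."""
    text = gzip.decompress(blob).decode("utf-8")
    reader = csv.reader(io.StringIO(text))
    header = [cell.split(":") for cell in next(reader)]
    return [
        [CONVERTERS[typ](value) for (_, typ), value in zip(header, row)]
        for row in reader
    ]

blob = dump_table(
    [("id", "int"), ("price", "float"), ("shipped", "date")],
    [[1, 9.99, "2015-06-01"], [2, 0.5, "2015-06-02"]],
)
print(load_table(blob))  # [[1, 9.99, datetime.date(2015, 6, 1)], ...]
```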

David Arno

The baseline

JSON (http://json.org) is a dynamic, schema-less format. But it is a text-based format, so it may not be appropriate if you require binary.

Binary formats

BSON (http://bsonspec.org) may be interesting to you. It is, in essence, JSON translated to binary (not exactly, but close enough). It is used by MongoDB to store data, so it is field-proven.

CBOR (http://cbor.io) is another binary JSON-like format. Its advantage (to me, at least) is that it is specified in an RFC (RFC 7049), so it has a defined standard, which may prove useful.

BSON and CBOR are probably your best bets if you need a binary, JSON-like format.
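To make the comparison concrete, here is a minimal sketch encoding the same table of rows in all three formats. It assumes the `bson` module that ships with PyMongo (3.9+ for the top-level `bson.encode`/`bson.decode` helpers) and the third-party `cbor2` package; both package choices are illustrative, not the only options:

```python
import json
from datetime import datetime, timezone

import bson   # ships with PyMongo
import cbor2  # third-party CBOR implementation

rows = [
    {"id": 1, "price": 9.99, "name": "widget",
     "created": datetime(2015, 6, 1, tzinfo=timezone.utc)},
    {"id": 2, "price": 0.5, "name": "gadget",
     "created": datetime(2015, 6, 2, tzinfo=timezone.utc)},
]

# BSON can only encode a document at the top level, so wrap the rows.
bson_bytes = bson.encode({"rows": rows})

# CBOR encodes lists directly; aware datetimes become tagged values.
cbor_bytes = cbor2.dumps(rows)

# JSON needs help with datetimes (here: stringify them).
json_bytes = json.dumps(rows, default=str).encode("utf-8")

print(len(bson_bytes), len(cbor_bytes), len(json_bytes))

# Round-trip: both binary formats restore native types.
assert cbor2.loads(cbor_bytes)[0]["created"].year == 2015
assert bson.decode(bson_bytes)["rows"][0]["id"] == 1
```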

Mael

Document databases are designed for flexibility. If you have varying attributes and a large amount of data, a document database is a good option. Instead of being spread across different tables, the data is stored together in a single document.

MongoDB, for example, is an open-source document database. It stores documents as BSON ("Binary JSON"), a binary serialization of JSON-like documents.
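For example, with the PyMongo driver (the connection URL, database and collection names below are placeholders):

```python
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder URL
tables = client["myproject"]["tables"]  # hypothetical db/collection names

# Documents in one collection need not share a schema, so the
# "table" can change shape freely between versions.
tables.insert_one({
    "name": "measurements",
    "rows": [
        {"taken": datetime(2015, 6, 1, tzinfo=timezone.utc), "value": 1.5},
        {"taken": datetime(2015, 6, 2, tzinfo=timezone.utc), "value": 2.25,
         "unit": "kg"},  # new field: no migration needed
    ],
})

doc = tables.find_one({"name": "measurements"})
print(len(doc["rows"]))
```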

Marcin C

Use JSON/BSON with MariaDB and the TokuDB engine; it will compress the data for you very nicely. This is unlike InnoDB compression, so if you tried compression before and dropped it because it was disappointing in InnoDB, try this one...
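A minimal sketch of that setup, assuming the PyMySQL driver and a MariaDB build with the TokuDB engine enabled; the `ROW_FORMAT` value is a TokuDB-specific compression setting, so verify it against your server's documentation:

```python
import json

import pymysql  # assumed driver; any DB-API driver works the same way

conn = pymysql.connect(host="localhost", user="app",
                       password="secret", database="myproject")
with conn.cursor() as cur:
    # TokuDB selects its compressor via ROW_FORMAT (e.g. TOKUDB_ZLIB).
    cur.execute("""
        CREATE TABLE IF NOT EXISTS blobs (
            id INT AUTO_INCREMENT PRIMARY KEY,
            payload LONGTEXT
        ) ENGINE=TokuDB ROW_FORMAT=TOKUDB_ZLIB
    """)
    rows = [{"id": 1, "price": 9.99}, {"id": 2, "price": 0.5}]
    cur.execute("INSERT INTO blobs (payload) VALUES (%s)",
                (json.dumps(rows),))
conn.commit()
```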

Slawek

Assuming you don't wish to query the data, it would seem logical that the next priority is being able to restore the table as efficiently as possible. Most databases have supported ways of exporting and importing tables to and from a file on disk.

My advice would be to stream the database's supported export format for the table directly into a blob. You can then take that data and ask that same database to "import" it as if it were a normal import file from disk.

The biggest advantage of this approach is that there is literally zero hassle with typing: the database knows best how to save its own type information. No need to reinvent the wheel.

You should be aware that this approach may cause problems if you upgrade your database (not likely, but the possibility is there). So before upgrading your database, check that old exports are still accepted by the new version.

Usually these import files are flat files (columns padded with spaces so that each column value is found at the same position on every line), and therefore they might occupy a lot more space than the table itself. You might consider zipping the contents before inserting them, but I'll leave you to decide whether the time/space tradeoff is worth it in your case.
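As a concrete illustration using SQLite via Python's sqlite3 module (other databases have their own dump/import tooling; the zlib step is the optional compression mentioned above):

```python
import sqlite3
import zlib

# Build a throwaway table to export.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (id INTEGER, price REAL, name TEXT)")
src.executemany("INSERT INTO t VALUES (?, ?, ?)",
                [(1, 9.99, "widget"), (2, 0.5, "gadget")])

# "Export": the database's own dump format, compressed for the BLOB.
dump_sql = "\n".join(src.iterdump())
blob = zlib.compress(dump_sql.encode("utf-8"))

# "Import": hand the dump back to the database; it restores its own types.
dst = sqlite3.connect(":memory:")
dst.executescript(zlib.decompress(blob).decode("utf-8"))
print(dst.execute("SELECT * FROM t").fetchall())
```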

Neil