
We are considering a solution that will store large files in MySQL with InnoDB. The files currently range from 250MB to 1GB and might grow up to 50GB in the future. Here is the plan:

  1. Create two tables similar to FILES(id, name) and FILE_PARTS(id, file_id, sequence INT, data LONGBLOB); a sketch follows this list.
  2. Add a row for each file to FILES and store the large file as 1GB chunks in FILE_PARTS. This theoretically lets us store files of any size.
  3. Whenever we need one of these files, we can fetch its parts and join them together on disk to reconstruct the original file.
  4. We can also cap the amount of data in FILE_PARTS by deleting old rows, keeping the total within roughly 100GB-200GB.
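
For illustration, here is roughly what we have in mind (column types and index choices are not final):

    -- FILES holds one row per logical file.
    CREATE TABLE FILES (
        id   BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(255) NOT NULL
    ) ENGINE=InnoDB;

    -- FILE_PARTS holds the chunks; the key on (file_id, sequence)
    -- lets us fetch the parts of one file in order.
    CREATE TABLE FILE_PARTS (
        id       BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        file_id  BIGINT UNSIGNED NOT NULL,
        sequence INT NOT NULL,
        data     LONGBLOB NOT NULL,
        UNIQUE KEY (file_id, sequence)
    ) ENGINE=InnoDB;

Reassembly would then be a single ordered scan per file, concatenated by the client:

    SELECT data FROM FILE_PARTS WHERE file_id = ? ORDER BY sequence;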

I want to understand what limitations we might run into and what strategies would help us make this work. Should we be doing anything else? Are we missing something? Is there a way to keep the FILE_PARTS table in a file of its own while the rest of the database lives in its own file? How would database backups behave?


Note: We know that storing files in relational databases is generally not recommended, but we have organizational constraints that we have to live with. If you know of reasons why this would absolutely not work (being 5x inefficient is okay with us), please do let me know so that I can convince my folks here.

Amit

1 Answer


(OK, assuming that files bigger than 4GB must be stored in a table...)

Going past 4GB requires chunking of some form, since that is the LONGBLOB limit.

There are many settings that will trip you up: in the MySQL server, mysqldump, MySQL replication, Apache, the network, various timeouts, etc. A 1MB chunk is probably safe in most situations; a 1GB chunk is very likely to hit some limit that is hard to get around. max_allowed_packet, for example, has a hard limit of 1GB, but its practical limit is much lower. Anyway, there is not much difference in code or speed between 200 chunks of 1GB and 200K chunks of 1MB; in fact, smaller chunks might be faster.
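
For example, you can check the server's limit and raise it; the 64MB value below is only an illustration, and the client library must allow at least as much:

    -- Check the current limit.
    SHOW VARIABLES LIKE 'max_allowed_packet';

    -- Raise it for new connections (64MB here is an arbitrary example);
    -- each chunk INSERT, including statement overhead, must fit within it.
    SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;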

When I implemented something similar a decade ago, I decided that 50KB was a good chunk size for each BLOB, and I compressed each chunk separately in client code. My setup involved replication, so I was careful to insert each chunk separately so as not to hog the replication stream.
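
The insert pattern is one small statement per chunk, so each becomes its own small replication event. (The ? placeholders assume a prepared statement; COMPRESS() is shown only as a server-side stand-in for the client-side compression I actually used.)

    INSERT INTO FILE_PARTS (file_id, sequence, data)
    VALUES (?, ?, COMPRESS(?));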

Shoveling 50GB around will take a lot of time. Do not do everything in a single InnoDB transaction. Instead, store the chunks optimistically in separate transactions, and worry about orphaned chunks only if you crash before the final store of the master record.
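
One way to arrange that, assuming the client assigns file_id up front:

    -- Each chunk commits on its own; a crash mid-upload leaves only
    -- orphaned FILE_PARTS rows for a cleanup job to delete later.
    START TRANSACTION;
    INSERT INTO FILE_PARTS (file_id, sequence, data) VALUES (?, ?, ?);
    COMMIT;
    -- ...repeat for every chunk, then store the master record last:
    INSERT INTO FILES (id, name) VALUES (?, ?);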

Most image formats are already compressed, so re-compressing them is a waste.

Yes, your 2 tables are approximately 'correct'. No, I don't see a 5x problem. However, there might be a 2x degradation compared to simply storing each file on the filesystem.

Rick James