
I have a service that serves images to end users at a very high rate over plain HTTP. The images vary between 4 and 64 KB, and there are about 1.3 billion of them in total. The dataset is about 30 TiB in size, and changes (new objects, updates, deletes) make up less than 1% of the requests. The request rate varies between 240 and 9,000 per second and is spread fairly evenly across the dataset, with few objects being especially "hot".

As of now, these images are files on an ext3 filesystem, distributed read-only across a large number of mid-range servers. This poses several problems:

  • Using a filesystem is very inefficient: the metadata overhead is large, the inode/dentry cache is volatile on Linux, and some daemons tend to stat()/readdir() their way through the directory structure, which in my case becomes very expensive.
  • Updating the dataset is very time-consuming and requires remounting between set A and set B.
  • Backup, copying, etc. can only reasonably be done by operating on the block device.

What I would like is a daemon that:

  • speaks HTTP (get, put, delete and perhaps update)
  • stores data in an efficient structure.
  • The index should remain in memory, and considering the amount of objects, the overhead must be small.
  • The software should be able to handle a massive number of connections with little (if any) ramp-up time.
  • The index should be read into memory at startup.
  • Statistics would be nice, but not mandatory.
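Roughly what I have in mind is an append-only data file with a compact in-memory index that can be rebuilt at startup. A minimal sketch (the class and the on-disk record format are hypothetical, not an existing tool):

```python
import os
import struct


class BlobStore:
    """Append-only blob file with a compact in-memory index
    (key -> (offset, length)). The on-disk record format here is
    hypothetical: a 16-byte key and a 4-byte big-endian payload
    length, followed by the payload itself."""

    HEADER = struct.Struct(">16sI")

    def __init__(self, path):
        self.path = path
        self.index = {}
        if os.path.exists(path):
            self._load()
        self.f = open(path, "ab+")

    @staticmethod
    def _norm(key):
        # Keys are fixed at 16 bytes: pad short ones, truncate long ones.
        return key.ljust(16, b"\0")[:16]

    def _load(self):
        # Rebuild the index at startup by scanning record headers,
        # skipping over the payloads.
        with open(self.path, "rb") as f:
            while True:
                pos = f.tell()
                hdr = f.read(self.HEADER.size)
                if len(hdr) < self.HEADER.size:
                    break
                key, length = self.HEADER.unpack(hdr)
                self.index[key] = (pos + self.HEADER.size, length)
                f.seek(length, os.SEEK_CUR)

    def put(self, key, payload):
        self.f.seek(0, os.SEEK_END)
        pos = self.f.tell()
        key = self._norm(key)
        self.f.write(self.HEADER.pack(key, len(payload)))
        self.f.write(payload)
        self.f.flush()
        self.index[key] = (pos + self.HEADER.size, len(payload))

    def get(self, key):
        offset, length = self.index[self._norm(key)]
        self.f.seek(offset)
        return self.f.read(length)

    def delete(self, key):
        # Drop the index entry only; disk space would be reclaimed by
        # a separate compaction pass (not shown).
        self.index.pop(self._norm(key), None)
```

The HTTP layer, compaction and replication are deliberately left out; the point is the index overhead, which here is a fixed key plus an offset and a length per object.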

I have experimented a bit with Riak, Redis, MongoDB, Kyoto and Varnish with persistent storage, but I haven't had the chance to dig in really deep yet.

Tommy

1 Answer


There is no magic solution for your needs. A NoSQL database is not really going to help; you need to make some basic decisions about your application architecture.

and some daemons tend to stat()/readdir() it's way through the directory structure

Moving the data into any sort of database is not going to help unless those daemons shouldn't be reading the data in the first place. Wouldn't it just be simpler to reconfigure them or switch them off?

Without knowing anything about your application (no, that's not an invitation for a detailed specification of requirements), a hybrid approach is probably the way to go, with metadata held in a database while the content itself is maintained on the filesystem (and there are some very specific reasons why a relational database may be a lot more appropriate than a NoSQL db). If it were me, I'd also be looking at distributing the storage rather than just replicating it.
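A minimal sketch of that hybrid approach, using SQLite as a stand-in for whatever relational database you'd actually use, and content-hash directory fan-out so no single directory grows huge (the schema and function names are illustrative):

```python
import hashlib
import os
import sqlite3
import tempfile

# Metadata lives in a relational database; the content itself stays
# on the filesystem, addressed by content hash.
root = tempfile.mkdtemp()
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE images (
                  key  TEXT PRIMARY KEY,
                  sha1 TEXT NOT NULL,
                  size INTEGER NOT NULL)""")

def put(key, payload):
    sha = hashlib.sha1(payload).hexdigest()
    # Fan out into sub-directories by hash prefix, e.g. ab/cd/abcd...
    subdir = os.path.join(root, sha[:2], sha[2:4])
    os.makedirs(subdir, exist_ok=True)
    with open(os.path.join(subdir, sha), "wb") as f:
        f.write(payload)
    db.execute("INSERT OR REPLACE INTO images VALUES (?, ?, ?)",
               (key, sha, len(payload)))

def get(key):
    (sha,) = db.execute("SELECT sha1 FROM images WHERE key = ?",
                        (key,)).fetchone()
    with open(os.path.join(root, sha[:2], sha[2:4], sha), "rb") as f:
        return f.read()
```

A side benefit of content addressing is that identical images are stored once; it also makes rsync-style replication and partial distribution of the tree straightforward.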

The index should remain in memory

If you've got 1.3 billion records, each with, say, 300 bytes of metadata, you'll need close to 400 GB of memory. Most of that will never be accessed, yet it will keep that memory from being available for content caching.
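Back-of-the-envelope arithmetic for sizing such an index (300 bytes per record is the assumption, not a measured figure):

```python
records = 1_300_000_000        # objects in the dataset
bytes_per_record = 300         # assumed metadata per object
total = records * bytes_per_record
print(total)                   # 390_000_000_000 bytes
print(round(total / 2**30))    # roughly 363 GiB
```

Even at a tenth of that per-record overhead, the index alone would still consume tens of gigabytes of RAM.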

the inode/dentry cache is volatile on linux

Have you tried tuning it?
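The relevant knob is vm.vfs_cache_pressure: lower values make the kernel prefer to keep dentry/inode caches around at the expense of the page cache. A small sketch for inspecting it (the function name is mine; changing it requires root, e.g. `sysctl -w vm.vfs_cache_pressure=50`):

```python
from pathlib import Path

def vfs_cache_pressure():
    """Return the current vm.vfs_cache_pressure value on Linux,
    or None on systems without the /proc interface. The kernel
    default is 100; values below that bias reclaim toward keeping
    dentry/inode caches in memory longer."""
    knob = Path("/proc/sys/vm/vfs_cache_pressure")
    return int(knob.read_text()) if knob.exists() else None
```

To persist a lower setting across reboots, add a `vm.vfs_cache_pressure` line to /etc/sysctl.conf.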

symcbean