6

I have around 200 million new objects coming in per day, and a 90-day retention policy, which leaves me with 18 billion records to be stored as key-value pairs.

Both key and value will be strings. It is basically a mapping from the unique identifier for the object in the application to the unique identifier for the object in the actual object storage.

There is an application which loads objects into a Web OS. For each object it loads, it creates a 16-character string key, say DataID. The Web OS itself creates a 40-character string key, say ObjectID. So what I'm trying to do is create a mapping DataID -> ObjectID for 18 billion objects. I don't know the mechanism being used to create the IDs.
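To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes the 200 million figure is per day and that each character takes one byte (plain ASCII IDs); both are assumptions, and the result ignores any per-record overhead of whatever store is chosen.

```python
# Rough sizing of the DataID -> ObjectID mapping.
# Assumptions (not confirmed): 200 million objects arrive per day,
# and each ID character occupies one byte (ASCII).
OBJECTS_PER_DAY = 200_000_000
RETENTION_DAYS = 90
DATA_ID_CHARS = 16
OBJECT_ID_CHARS = 40

records = OBJECTS_PER_DAY * RETENTION_DAYS
raw_bytes = records * (DATA_ID_CHARS + OBJECT_ID_CHARS)

print(f"records: {records:,}")
print(f"raw payload: {raw_bytes / 10**12:.3f} TB (decimal), before any overhead")
```

So the raw keys and values alone come to roughly 1 TB, which matters when considering anything that keeps the whole data set in memory.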

I will have to deal with:

write(key,value)
read(key)
delete(key,value)
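For illustration, the three operations could be pinned down as a small interface like the sketch below. The class name `MappingStore` is hypothetical, the dict is only an in-memory stand-in for a real store, and the meaning given to `delete(key, value)` (remove only when the stored value matches) is an assumption about the intent.

```python
from typing import Dict, Optional


class MappingStore:
    """In-memory stand-in for the DataID -> ObjectID mapping (hypothetical)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        # Insert or overwrite the mapping for this DataID.
        self._data[key] = value

    def read(self, key: str) -> Optional[str]:
        # Return the ObjectID, or None if the DataID is unknown.
        return self._data.get(key)

    def delete(self, key: str, value: str) -> bool:
        # Assumption: delete only if the stored value matches the given one.
        if key in self._data and self._data[key] == value:
            del self._data[key]
            return True
        return False


store = MappingStore()
store.write("DATAID0000000001", "o" * 40)
print(store.read("DATAID0000000001"))
print(store.delete("DATAID0000000001", "o" * 40))
```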

I am looking for ideas for an optimal way to implement this. It should be optimized for reads & writes. Space optimization is secondary.

I know Hadoop/NoSQL is one way to go, and another solution would probably be distributed hash tables, but a few more options would help me decide which is the best solution. A relational database is not an option as we don't have an existing RDBMS in the current environment.
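As a minimal sketch of the partitioning idea behind a distributed hash table, the key can be hashed to pick one of N nodes. This uses plain modulo sharding, not the consistent hashing a real DHT would typically use, so changing the number of shards would remap most keys. The names `shard_for` and `ShardedStore` are hypothetical, and each "node" here is just a local dict.

```python
import hashlib
from typing import Dict, List, Optional


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (SHA-1)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Each shard is a local dict standing in for a remote node."""

    def __init__(self, num_shards: int) -> None:
        self._shards: List[Dict[str, str]] = [{} for _ in range(num_shards)]

    def _shard(self, key: str) -> Dict[str, str]:
        return self._shards[shard_for(key, len(self._shards))]

    def write(self, key: str, value: str) -> None:
        self._shard(key)[key] = value

    def read(self, key: str) -> Optional[str]:
        return self._shard(key).get(key)

    def delete(self, key: str) -> None:
        self._shard(key).pop(key, None)


store = ShardedStore(num_shards=8)
store.write("DATAID0000000001", "o" * 40)
print(shard_for("DATAID0000000001", 8), store.read("DATAID0000000001"))
```

Hashing the key rather than using its raw prefix is a deliberate choice here, since the mechanism that generates the IDs is unknown and they may not be evenly distributed.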

Chaos

2 Answers

6

Try Redis. It's all in memory and dumps data to disk so it can be hot again after a restart. However, you may need to be careful and change the settings if you cannot afford to lose data, as it normally waits a second or two before dumping (or did I remember the default settings wrong?).

Use a hash where the GUID minus its last 6 or 7 bits is the key and the remaining bits are the field: http://redis.io/commands/hmset. Note that having more fields per hash makes it slower, so stick to <=128 as my personal rule of thumb. I recommend 64 or 32, but test with your key length.

The reason I say use a hash is to decrease memory usage. More fields = fewer pointers (and an increase in CPU time).
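A minimal sketch of the bucketing idea, adapted to string IDs: the leading part of the DataID becomes the hash key and the trailing characters become the field. Splitting on characters instead of bits, the names `split_key` and `BucketedStore`, and the `field_chars` parameter are all assumptions for illustration. A dict of dicts stands in for Redis here; with a real client the three operations would correspond to the HSET, HGET and HDEL commands.

```python
from typing import Dict, Optional, Tuple


def split_key(data_id: str, field_chars: int = 1) -> Tuple[str, str]:
    """Split an ID into (hash key, field): prefix and last `field_chars` chars."""
    if not 0 < field_chars < len(data_id):
        raise ValueError("field_chars must be between 1 and len(data_id) - 1")
    return data_id[:-field_chars], data_id[-field_chars:]


class BucketedStore:
    """Dict-of-dicts stand-in for Redis hashes (hypothetical helper)."""

    def __init__(self, field_chars: int = 1) -> None:
        self._field_chars = field_chars
        self._hashes: Dict[str, Dict[str, str]] = {}

    def write(self, data_id: str, object_id: str) -> None:
        bucket, field = split_key(data_id, self._field_chars)
        self._hashes.setdefault(bucket, {})[field] = object_id  # like HSET

    def read(self, data_id: str) -> Optional[str]:
        bucket, field = split_key(data_id, self._field_chars)
        return self._hashes.get(bucket, {}).get(field)  # like HGET

    def delete(self, data_id: str) -> None:
        bucket, field = split_key(data_id, self._field_chars)
        fields = self._hashes.get(bucket)
        if fields is not None:
            fields.pop(field, None)  # like HDEL
            if not fields:
                del self._hashes[bucket]  # Redis drops empty hashes as well


store = BucketedStore(field_chars=1)
store.write("ABCDEF0123456789", "o" * 40)
print(split_key("ABCDEF0123456789"), store.read("ABCDEF0123456789"))
```

How many fields end up in each hash depends on the alphabet of the trailing characters, which is unknown here, so the split point would need to be tuned against real IDs.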

Seki
5

Look at these key-value stores: Berkeley DB Java Edition, or JDBM (JDBM3 is the latest), or MapDB (JDBM successor). Tokyo Cabinet is not native Java but has a Java wrapper.

For an overview see http://en.wikipedia.org/wiki/Dbm.
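To show what the dbm-style interface looks like in practice, here is a small sketch using Python's standard-library `dbm` module rather than any of the Java libraries named above (that substitution is mine, chosen only because it needs no extra dependencies). The file name and the sample IDs are made up, and a single local file like this is not a claim about what will scale to 18 billion records.

```python
import dbm
import os
import tempfile
from typing import Tuple


def roundtrip(directory: str) -> Tuple[bytes, bool]:
    """Write, read and delete one mapping in an on-disk dbm file."""
    path = os.path.join(directory, "dataid_to_objectid")  # hypothetical name
    key = b"DATAID0000000001"  # made-up 16-character DataID
    with dbm.open(path, "c") as db:  # "c": create the file if it is missing
        db[key] = b"o" * 40  # made-up 40-character ObjectID
        stored = db[key]  # values come back as bytes
        del db[key]
        still_present = key in db
    return stored, still_present


with tempfile.TemporaryDirectory() as tmp:
    stored, still_present = roundtrip(tmp)
    print(stored, still_present)
```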