I've been looking at the Wikipedia page for NoSQL, and it lists several variations on the key/value store database, but I can't find any details on what it means by a key/value store in this context. Could someone explain it or link me to an explanation? Also, when would I use such a database?
5 Answers
Are you familiar with the concept of a key/value pair (KVP)? If you know Java or C#, it exists in the language as a map, hash table, dictionary, or KeyValuePair (the last is specific to C#).
The way it works is demonstrated in this little sample chart:
Color Red
Age 18
Size Large
Name Smith
Title The Brown Dog
Each row has a key (left) and a value (right). Notice that the value can be a string, an int, or the like. Most KVP implementations let you store any object on the right, because it's just a value.
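To make the chart concrete, here is a minimal sketch in Python that uses a plain in-memory dict as a stand-in for the store. The names `store`, `get_value` and `put_value` are made up for illustration; a real key/value database exposes its own client API.

```python
# A plain dict stands in for the key/value store (illustrative only).
store = {
    "Color": "Red",
    "Age": 18,          # values do not all have to be the same type
    "Size": "Large",
    "Name": "Smith",
    "Title": "The Brown Dog",
}

def get_value(key, default=None):
    """Look up a value by its exact key; return `default` if the key is absent."""
    return store.get(key, default)

def put_value(key, value):
    """Insert a new pair, or overwrite the value already stored under the key."""
    store[key] = value
```

The only operations are "get by key" and "put by key", which is essentially the whole interface of a simple key/value store.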
Since you'll always have a unique key for the object you want to return, you can query the database for that key and get the result back from whichever node holds the object. This is why it works well for distributed systems, although other things are involved too, such as polling the nodes until the first n of them return matching values.
Now, my example above is very simple, so here's a slightly better version of the KVP:
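The "first n nodes that agree" idea can be sketched roughly as follows. This is only an illustration under simplifying assumptions: each replica is modelled as a local dict, values are hashable, and `quorum_read` is a hypothetical helper, not the API of any particular database.

```python
from collections import Counter

def quorum_read(replicas, key, quorum):
    """Ask replicas in turn; return a value as soon as `quorum` of them agree.

    Returns None if no value reaches the quorum.
    """
    votes = Counter()
    for replica in replicas:
        if key in replica:
            value = replica[key]
            votes[value] += 1
            if votes[value] >= quorum:
                return value
    return None

# Three replicas of the same data; the last one is stale.
replicas = [
    {"user1923_color": "Red"},
    {"user1923_color": "Red"},
    {"user1923_color": "Blue"},
]
```

With a quorum of 2 the read succeeds after the second replica answers; with a quorum of 3 it fails because the replicas disagree.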
user1923_color Red
user1923_age 18
user3371_color Blue
user4344_color Brackish
user1923_height 6' 0"
user3371_age 34
As you can see, the simple key generation scheme is "user", then the user's unique number, an underscore, and the name of the attribute. Again, this is a simple variation, but as long as the part on the left is well defined and consistently formatted, we can pull out the value.
Notice that there's no restriction on the key (OK, there can be some limitations, such as text-only) or on the value (there may be a size restriction). So far the examples have been simple, so let's go a little further:
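A small sketch of that key scheme, assuming the exact format `user<id>_<field>` from the example above. The helper names `make_key` and `parse_key` are hypothetical.

```python
def make_key(user_id, field):
    """Build a key such as 'user1923_color' from a user id and a field name."""
    return f"user{user_id}_{field}"

def parse_key(key):
    """Split a key built by make_key back into (user_id, field)."""
    prefix, field = key.split("_", 1)   # split on the first underscore only
    if not prefix.startswith("user"):
        raise ValueError(f"not a user key: {key!r}")
    return int(prefix[len("user"):]), field

store = {
    make_key(1923, "color"): "Red",
    make_key(1923, "age"): 18,
    make_key(3371, "color"): "Blue",
}
```

As long as every writer and reader builds keys through the same function, the value can always be found again with a single lookup.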
app_setting_width 450
user1923_color Red
user1923_age 18
user3371_color Blue
user4344_color Brackish
user1923_height 6' 0"
user3371_age 34
error_msg_457 There is no file %1 here
error_message_1 There is no user with %1 name
1923_name Jim
user1923_name Jim Smith
user1923_lname Smith
Application_Installed true
log_errors 1
install_path C:\Windows\System32\Restricted
ServerName localhost
test test
test1 test
test123 Brackish
devonly
wonderwoman
value key
You get the idea. All of those would be stored in one massive "table" spread across the distributed nodes (there's math behind it all), and you would just ask the distributed system for the value you need by name.
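One simple version of that "math" is to hash the key and use the result to pick a node. The sketch below assumes a fixed number of nodes, each modelled as a dict; `node_for` and `Cluster` are illustrative names. Real systems often use consistent hashing instead, so that adding a node does not reshuffle most of the keys.

```python
import hashlib

def node_for(key, node_count):
    """Map a key to a node index by hashing it (simple modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count

class Cluster:
    """A toy cluster: every node is an in-memory dict."""

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]

    def put(self, key, value):
        self.nodes[node_for(key, len(self.nodes))][key] = value

    def get(self, key):
        return self.nodes[node_for(key, len(self.nodes))].get(key)
```

The caller never needs to know which node holds a key: the same hash that decided where to write also decides where to read.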
At the very least, that's my understanding of how it all works. I may have a few things wrong, but that's the basics.
Obligatory Wikipedia link: http://en.wikipedia.org/wiki/Associative_array
In SQL terms, a key/value NoSQL database is a single table with two columns: one is the (primary) key, and the other is the value. And that's it; that's all the NoSQL magic.
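The two-column picture can be tried out with Python's built-in `sqlite3` module. This is only a sketch of the analogy; the table name `kv` and the helpers `kv_put` and `kv_get` are made up for the example.

```python
import sqlite3

# An in-memory SQLite database with a single two-column table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")

def kv_put(key, value):
    """Insert the pair, replacing any value already stored under the key."""
    conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))

def kv_get(key):
    """Return the value for the key, or None if the key is not present."""
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else None
```

Every access goes through the primary key, which is the only query pattern a plain key/value store promises to support.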
You would use NoSQL for one main reason: scalability.
If your application needs to handle millions of queries per second, the only way to achieve it is to add more servers. That is very cheap and easy with NoSQL. In contrast, scaling a traditional SQL database is much more complicated.
Only the biggest websites out there actually take advantage of the full NoSQL potential, e.g., Facebook, which has thousands of servers running Cassandra.
I strongly recommend reading this blog post, which compares SQL, NoSQL and ORM:
I assume you have a basic understanding of the NoSQL movement and non-relational database models.
A key/value store is one of the non-relational database models, alongside the graph and document-oriented models.
Key Value stores and the NoSQL movement
In general, SQL managed to deal with specially structured data and allowed highly dynamic queries according to the needs of the department in question.
While there are still no real competitors for SQL in this specific field, the use-case in everyday web applications is a different one. You will not find a highly dynamic range of queries full of outer and inner joins, unions and complex calculations over large tables. You will usually find a very object oriented way of thinking. Especially with adoption of such patterns as MVC, the data in the back-end is usually not being modelled for a database, but for logical integrity which also helps people to be able to cope with understanding huge software-infrastructures. What is being done to put these object-oriented models into relational databases is a large amount of normalization that leads to complex hierarchies of tables and completely steers against the main idea behind object oriented programming. Servers that adhere to the SQL standard also have to implement a large portion of code that is of no use to simple data storage whatsoever and only inflates the memory footprint and security risks, with performance hits as a result.
The fact that SQL allows for arbitrary dynamic queries for complex sets of data is being rendered useless by using an SQL Database only for persistent storage of object oriented data, which is what basically most applications do these days.
This is where Key Value stores come into play.
Key value stores allow the application developer to store schema-less data. This data usually consists of a string which represents the key and the actual data which is considered to be the value in the "key - value" relationship. The data itself is usually some kind of primitive of the programming language (a string, an integer, an array) or an object that is being marshalled by the programming language's bindings to the key value store. This replaces the need for a fixed data model and makes the requirement for properly formatted data less strict.
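As a rough illustration of the marshalling described in that passage, the sketch below serializes an object to JSON bytes before storing it and rebuilds it on the way out. The `User` class, the dict-backed `store`, and the `save_user` and `load_user` helpers are all assumptions made for the example; real client libraries handle this in their own way.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class User:
    name: str
    age: int
    color: str

store = {}  # key -> opaque bytes, standing in for the key/value store

def save_user(key, user):
    """Marshal the object to JSON bytes; the store never inspects the contents."""
    store[key] = json.dumps(asdict(user)).encode("utf-8")

def load_user(key):
    """Fetch the bytes and rebuild the object, or return None if the key is absent."""
    raw = store.get(key)
    if raw is None:
        return None
    return User(**json.loads(raw.decode("utf-8")))
```

The store itself sees only a string key and a blob of bytes; knowing that the blob is a `User` is entirely the application's job.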
They all allow storage of arbitrary data which is being indexed using a single key to allow retrieval. The biggest difference for the "simpler" stores is the way you can (or cannot) authenticate or access different stores (if possible). While the speed advantages in storing and retrieving data might be a reason to consider it over common SQL Databases, another big advantage that emerges when using key-value stores is that the resulting code tends to look clean and simple when compared to embedded SQL strings in your programming language. This is something that people tend to fight with object-relational mapping frameworks such as Hibernate or Active Record. Object-relational mappers basically seem to emulate a key value store by adding a lot of really complex code between an SQL database and an object-oriented programming language.
A whole community of people come together under the "NoSQL" tag and discuss these advantages and also disadvantages of using alternatives to relational database management systems.
This is a somewhat old article, but I found it very useful.
when would I use such a database? Could someone explain or link an explanation to me?
It's more of an architectural decision, and a debatable one. You have to consider lots of factors, such as scalability and performance.
View the slides/articles below and you'll get an idea of when, why, and why not to use a key value store :)
Others have explained this, but I'm going to take a stab anyway.
A key/value database stores data by a primary key, which lets us uniquely identify a record in a bucket. Since all keys are unique, lookups are very fast: fetching a record by its key is often little more than a single disk seek.
The value is just any kind of value. The way the data is stored is opaque to the database itself. When you store data in a key/value store, the database doesn't know or care if it's XML, JSON, text, or an image. In effect, what we're doing in a key/value store is moving the responsibility for understanding how data is stored out of the database and into the applications that retrieve our data. Since you only have a single range of keys to worry about per bucket, it's very easy to spread the keys across many servers and use distributed programming techniques to make it possible for this data to be accessed quickly (every server stores a range of data).
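"Every server stores a range of data" can be sketched with sorted split points and a binary search. The split points `"h"` and `"p"` and the function name `server_for` are arbitrary choices for this illustration, not taken from any real system.

```python
import bisect

# Sorted split points divide the key space into three ranges:
#   server 0: keys below "h"
#   server 1: keys from "h" up to (but not including) "p"
#   server 2: keys from "p" upwards
SPLIT_POINTS = ["h", "p"]

def server_for(key):
    """Return the index of the server responsible for the key's range."""
    return bisect.bisect_right(SPLIT_POINTS, key)
```

Because ranges are contiguous, keys that sort near each other land on the same server, which is the main difference from hash-based placement.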
A drawback of this approach to data is that searching is a very difficult task. You need to either read every record in your bucket o' data or else you need to build secondary indexes yourself.
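The two options in that paragraph, scanning every record or maintaining your own secondary index, might look roughly like this. Everything here (the `records` dict, the `by_color` index, the helper names) is a hypothetical sketch, with plain dicts standing in for buckets in the store.

```python
from collections import defaultdict

records = {}                   # primary data: key -> record
by_color = defaultdict(set)    # secondary index kept by the application: color -> keys

def put_record(key, record):
    """Store the record and keep the secondary index in sync."""
    old = records.get(key)
    if old is not None:
        by_color[old["color"]].discard(key)   # drop the stale index entry
    records[key] = record
    by_color[record["color"]].add(key)

def find_by_color_scan(color):
    """Option 1: read every record in the bucket."""
    return sorted(k for k, r in records.items() if r["color"] == color)

def find_by_color_indexed(color):
    """Option 2: consult the index the application maintains itself."""
    return sorted(by_color.get(color, ()))
```

The index makes the search cheap, but every write now has to update two places, and the database will not keep them consistent for you.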
There are a few reasons you might want to use a key/value database:
- When write performance is your highest priority. Mozilla Test Pilot uses a key/value database to rapidly record data.
- When reads are guaranteed to occur only by primary key.
- When you are working with a flat data model.
- When you are working with a rich, complex data model that can't be modeled in an RDBMS.
There are about as many reasons to use a key/value database as there are to use an RDBMS, and there are just as many arguments to justify one over the other. It's important to take a look at how you're querying your data and understand how that data access pattern guides how you're going to be inserting and storing data.
Just remember that a key/value database is just one type of NoSQL database.
If you have a relational database, then you can easily experiment with this:
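-- varchar2 is Oracle syntax; use varchar on other databases
create table keyvalue (my_key varchar2(255), my_value varchar2(255));
-- the unique index covers my_key alone, so each key maps to exactly one value
create unique index ix_keyvalue on keyvalue (my_key);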
This is how many early databases worked, with the Unix dbm library from 1979 (a forerunner of Berkeley DB) being a good example. Since then, things have advanced (you can have many values per key in any RDBMS). For many applications a key-value store is sufficient (e.g. this is how sendmail stores its aliases). But if you find yourself pre-processing the value in your own code before you can use it, perhaps splitting it on a delimiter or parsing it, or concatenating strings to make your "key", you will probably be better off with an RDBMS and actually storing the data in that structured form.
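Python still ships a dbm-style store in its standard library, which makes it easy to see both the appeal and the warning sign mentioned above. The sketch below uses `dbm.dumb` with made-up alias data in a temporary directory; it illustrates the idea and is not sendmail's actual file format.

```python
import dbm.dumb
import os
import tempfile

def demo_aliases(directory):
    """Write two aliases to a dbm file, then read them back."""
    path = os.path.join(directory, "aliases")
    with dbm.dumb.open(path, "c") as db:       # "c": create the file if needed
        db[b"postmaster"] = b"root"
        db[b"webmaster"] = b"alice,bob"        # several recipients packed into one value
    with dbm.dumb.open(path, "r") as db:       # reopen read-only
        postmaster = db[b"postmaster"]
        # Splitting the value on a delimiter is the pre-processing described above.
        webmaster = db[b"webmaster"].split(b",")
    return postmaster, webmaster

with tempfile.TemporaryDirectory() as tmp:
    postmaster, webmaster = demo_aliases(tmp)
```

The single-valued alias needs no extra work, while the multi-valued one already forces the application to parse the value, which is the point where a relational table starts to look attractive.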