Database suggestion for a social network / knowledge base community?

Question

I am looking into various database types and DBMSs for a new project I want to start in the summer.

I have built systems in MySQL and PostgreSQL; now I want to expand my knowledge and experience with databases.

My project will be a kind of social network / aggregate knowledge thing (I still haven't come up with a term to describe it).

I have been looking at:

Cassandra (uses its own query language): it seems to be good for feature-rich content and high-performance query execution. However, I am not too keen on it because it requires a Java environment, and I would prefer to have nothing to do with Oracle.
MongoDB (a NoSQL DBMS): great scalability, but you lose all the capabilities of the proven SQL language, such as business intelligence queries.
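To make the point about SQL's reporting capabilities concrete, here is a minimal sketch of the kind of ad-hoc aggregate query a relational engine answers declaratively. It uses Python's built-in sqlite3 module purely as a stand-in for MySQL/PostgreSQL, and the users/posts schema is hypothetical. In a document store without joins, the same question typically means denormalising the data up front or writing custom aggregation code.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL/MySQL; table and column names
# are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        created DATE NOT NULL
    );
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.executemany(
    "INSERT INTO posts (user_id, created) VALUES (?, ?)",
    [(1, "2012-02-01"), (1, "2012-02-15"), (2, "2012-02-20")],
)

# An ad-hoc reporting question, answered with a join and GROUP BY:
# how many posts has each user written?
report = conn.execute("""
    SELECT u.name, COUNT(p.id) AS n_posts
    FROM users u LEFT JOIN posts p ON p.user_id = u.id
    GROUP BY u.id, u.name
    ORDER BY n_posts DESC
""").fetchall()
print(report)  # [('ann', 2), ('bob', 1)]
```

The query needs no application code beyond the SQL itself, which is the capability the question is worried about giving up.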

Requirements of the system:

Data: text, dates, times, XML, small ints, blobs
Structure/behaviour: normalised (3NF), non-realtime, relational, scalable, robust
Environment: Unix/Linux, no Java!, preferably written in C
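As an illustration of how these requirements map onto a plain relational schema, here is a minimal sketch of a hypothetical 3NF layout for a social/knowledge site (users, articles, tags and a many-to-many link table). It runs on SQLite through Python's sqlite3 only so it is self-contained; the comments note the native PostgreSQL types (smallint, timestamptz, xml, bytea) you would likely choose instead. All table and column names are assumptions.

```python
import sqlite3

# Hypothetical 3NF schema. SQLite is used only so the sketch runs anywhere;
# in PostgreSQL you would use the native types noted in the comments.
SCHEMA = """
CREATE TABLE users (
    id        INTEGER PRIMARY KEY,
    username  TEXT NOT NULL UNIQUE,
    joined_at TEXT NOT NULL              -- timestamptz in PostgreSQL
);
CREATE TABLE articles (
    id        INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES users(id),
    title     TEXT NOT NULL,
    body_xml  TEXT,                      -- xml in PostgreSQL
    rating    SMALLINT,                  -- small integer
    thumbnail BLOB                       -- bytea in PostgreSQL
);
CREATE TABLE tags (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- Link table: each fact "article X has tag Y" is stored exactly once.
CREATE TABLE article_tags (
    article_id INTEGER NOT NULL REFERENCES articles(id),
    tag_id     INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (article_id, tag_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)

conn.execute("INSERT INTO users VALUES (1, 'ann', '2012-02-29T18:10:00')")
conn.execute(
    "INSERT INTO articles (id, author_id, title, rating, thumbnail) "
    "VALUES (1, 1, 'Hello', 5, ?)",
    (b"\x89PNG",),
)

# Referential integrity: an article by a non-existent user is rejected.
try:
    conn.execute("INSERT INTO articles (author_id, title) VALUES (99, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```

The link table is what keeps tagging in 3NF: no tag list is repeated inside article rows, and the foreign keys stop orphaned rows from creeping in.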

I was wondering if you could point me to any other database systems that I should research.

I have also had a look at object-relational databases. I quite like the idea of them working with PHP objects (PDO), but their performance seems a bit poor.
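Getting objects out of the database does not require an object-relational DBMS: the mapping can live in the client layer on top of an ordinary relational schema (PHP's PDO offers fetch modes such as PDO::FETCH_CLASS for this). Below is a minimal sketch of the same idea in Python, using sqlite3's row_factory hook to turn each row into a dataclass instance; the User class and table are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical domain object; field names match the column names below.
    id: int
    username: str


def user_factory(cursor, row):
    # Build a User from each row, using the cursor's column names as keywords.
    cols = [d[0] for d in cursor.description]
    return User(**dict(zip(cols, row)))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])

# Cursors created after this point return User objects instead of tuples.
conn.row_factory = user_factory
users = conn.execute("SELECT id, username FROM users ORDER BY id").fetchall()
print(users)  # [User(id=1, username='ann'), User(id=2, username='bob')]
```

The storage engine stays a fast, conventional relational one; only the thin fetch layer knows about classes.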

Since there will be DBAs here, any feedback on systems you have operated would be appreciated.

Thanks

score 6 · Answer 1 · answered Feb 29 '12 at 18:10


Consider also that there is no reason why you can't use a relational database for some things and a NoSQL database for others.
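A minimal sketch of that split, assuming a hypothetical design where accounts (which need uniqueness constraints and transactions) live in a relational store and free-form activity feeds live in a schemaless store. Here sqlite3 plays the relational role and a plain dict of JSON strings stands in for a NoSQL document store; AccountStore and FeedStore are illustrative names, not any product's API.

```python
import json
import sqlite3


class AccountStore:
    """Relational side: data that needs constraints and transactions."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE accounts (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)"
        )

    def create(self, email):
        with self.conn:  # commits on success, rolls back on error
            cur = self.conn.execute("INSERT INTO accounts (email) VALUES (?)", (email,))
        return cur.lastrowid


class FeedStore:
    """Schemaless side: a dict of JSON documents standing in for a NoSQL store."""

    def __init__(self):
        self.docs = {}

    def append(self, account_id, event):
        self.docs.setdefault(account_id, []).append(json.dumps(event))

    def feed(self, account_id):
        return [json.loads(d) for d in self.docs.get(account_id, [])]


accounts, feeds = AccountStore(), FeedStore()
uid = accounts.create("ann@example.com")
# Feed events can have any shape; no schema migration is needed.
feeds.append(uid, {"type": "post", "title": "Hello"})
feeds.append(uid, {"type": "like", "target": 42, "extra": ["any", "shape"]})
print(feeds.feed(uid)[1]["type"])  # like
```

The application decides per kind of data which store it goes to; the account id is the only thing shared between them.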


HLGEM


score 4 · Accepted Answer · answered Feb 29 '12 at 05:18

Your abstract requirements scream "PostgreSQL" to me. However, I think it's worth staying abreast of what the bourgeoisie are up to, so here's a list of various stuff you might want to check into.

Free stuff

CouchDB - one of the first NoSQL databases, powerful map/reduce querying system, highly distributed and fault tolerant. One of the better NoSQL contenders.
HyperDex - very new, distributed hash table with search capabilities.
Riak - distributed hash table worthy of some respect.
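To give a feel for CouchDB's map/reduce querying, here is a minimal sketch that emulates a grouped view in plain Python. Real CouchDB views are written as JavaScript map and reduce functions stored in the database and are queried over HTTP; this toy only mirrors the model (map emits key/value pairs per document, reduce combines values per key, similar in spirit to the built-in _sum reducer). The documents are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents, shaped like JSON docs in a CouchDB database.
docs = [
    {"_id": "a1", "type": "article", "tags": ["db", "nosql"]},
    {"_id": "a2", "type": "article", "tags": ["db"]},
    {"_id": "u1", "type": "user", "name": "ann"},
]


def map_fn(doc):
    # Like a CouchDB map function: emit zero or more (key, value) pairs per doc.
    if doc.get("type") == "article":
        for tag in doc["tags"]:
            yield tag, 1


def reduce_fn(values):
    # Like a sum reducer: combine all values emitted for one key.
    return sum(values)


def run_view(docs, map_fn, reduce_fn):
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # Rows come back sorted by key, as CouchDB view results are.
    return {key: reduce_fn(vals) for key, vals in sorted(grouped.items())}


print(run_view(docs, map_fn, reduce_fn))  # {'db': 2, 'nosql': 1}
```

Because each document is mapped independently, the map step parallelises naturally across nodes, which is part of why this model suits distributed stores.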

Weird free stuff

Metakit - more of an embedded database like SQLite but not SQL-based, so more procedural.
FramerD - much like a classic "network" database, very pointer-centric. Perhaps dead?
Magma - Smalltalk OODBMS. Cool but not well documented.

Non-free stuff

AllegroGraph - RDF (graph) database, supports SPARQL. Lisp-flavored.
Caché - a hybrid relational/OO database, originally based on MUMPS (IIRC).
Objectivity - one of the last few really big OODBs. Very powerful, impressive and expensive.
VoltDB - Highly scalable mostly relational database. Supports "most" SQL. Very new. I guess they have a community version too.
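For readers unfamiliar with the RDF model behind AllegroGraph, here is a toy sketch: every fact is a (subject, predicate, object) triple, and a query is a set of patterns with variables that are joined together, roughly like a SPARQL basic graph pattern. This is an illustration of the data model only, not AllegroGraph's API or a SPARQL implementation; the data and function names are hypothetical.

```python
# A toy triple store: every fact is a (subject, predicate, object) tuple, as in RDF.
triples = {
    ("ann", "follows", "bob"),
    ("bob", "follows", "cat"),
    ("ann", "wrote", "post1"),
    ("post1", "taggedWith", "databases"),
}


def match(pattern, store):
    """Yield variable bindings for one pattern; strings starting with '?' are variables."""
    for triple in store:
        binding = {}
        for p, t in zip(pattern, triple):
            if p.startswith("?"):
                if binding.setdefault(p, t) != t:  # same variable must bind consistently
                    break
            elif p != t:
                break
        else:
            yield binding


def query(patterns, store):
    """Join several patterns left to right, carrying bindings forward."""
    results = [{}]
    for pattern in patterns:
        new_results = []
        for b in results:
            # Substitute already-bound variables before matching.
            bound = tuple(b.get(x, x) for x in pattern)
            for m in match(bound, store):
                new_results.append({**b, **m})
        results = new_results
    return results


# "Whom do the people Ann follows follow?"
# Roughly: SELECT ?z WHERE { :ann :follows ?y . ?y :follows ?z }
print(query([("ann", "follows", "?y"), ("?y", "follows", "?z")], triples))
# [{'?y': 'bob', '?z': 'cat'}]
```

Friend-of-a-friend traversals like this are the natural fit for graph stores, although the same question can also be asked of a relational "follows" table with a self-join.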

Conclusion

I have not used any of these things extensively. I have played with most of them a little and always wound up back with PostgreSQL. Looking at your requirements, the only one PostgreSQL doesn't meet out of the box is horizontal (scale-out) scalability. On the other hand, for my purposes it's much easier to throw $4000 of hardware at a single dedicated database machine than to spend $4000 on cloud nodes or low-end machines. And there are ways of achieving scalability with PostgreSQL, such as with EnterpriseDB.
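One common do-it-yourself way to scale a relational setup out is application-level sharding: each user's rows live on one of several database servers, and the application decides which. Below is a minimal sketch of consistent hashing, the key-placement technique used by Dynamo-style stores such as Riak, applied to routing user ids to hypothetical PostgreSQL shards named "pg-a", "pg-b", and so on. It is not a feature of PostgreSQL or EnterpriseDB, just an illustration under those assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring mapping keys (e.g. user ids) to shard names."""

    def __init__(self, nodes, vnodes=100):
        self.ring = []  # sorted list of (hash, node) points
        for node in nodes:
            for i in range(vnodes):  # virtual nodes smooth out the distribution
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()
        self.points = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around at the end.
        i = bisect.bisect(self.points, self._hash(str(key))) % len(self.ring)
        return self.ring[i][1]


ring = HashRing(["pg-a", "pg-b", "pg-c"])
before = {uid: ring.node_for(uid) for uid in range(10_000)}

# Add a fourth shard and count how many users would have to be migrated.
bigger = HashRing(["pg-a", "pg-b", "pg-c", "pg-d"])
moved = sum(before[uid] != bigger.node_for(uid) for uid in range(10_000))
print(moved / 10_000)
```

The design point is what happens when a shard is added: the existing ring points do not change, so only keys whose nearest point now belongs to the new shard move (on average about a quarter of them when going from three shards to four), instead of nearly everything moving as it would with a naive `hash(key) % n_shards`.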

It's great fun to play around with these things on the side, but when it comes time to put valuable, irreproducible production data into something, a bunch of boring attributes like reliability, stability and long-term viability wind up coming to the fore.

Thought experiment for you

Consider this. Imagine you're Mark Zuckerberg, and you have to give up either your codebase or your data. You keep all your development staff either way. Option one: you lose all your code, every line, and even the developers' memories of how they implemented everything, but you keep all your user accounts and all your users' uploaded data. Option two: you lose all the data. You keep all the structures, servers, configuration and setup, but lose every row in every table in every database.

It should be obvious that losing the data would be worse. Why would your users regenerate all of it? Think of all the marketing data lost, which is how Facebook actually makes its money. And there are plenty of entrepreneurs salivating at the chance to get people onto their Facebook clone; all those disenfranchised ex-Facebook users would suddenly be out there considering alternatives. On the other hand, if Facebook lost the codebase, they could rebuild it, probably even better than it is now, and have something online in very short order. Heck, they could probably buy someone else's Facebook clone codebase and load it up with the real data, but you can't just copy their data. If Facebook still has everyone's important data on its servers, the incentive to leave is much lower. Still bad, but much less so. Surprisingly less so.

The irony is that it is much easier to lose all your data in a freak accident than to lose all your code. For most internet companies, though, the data is the company; it is their most valuable asset. And this is a strong reason to consider a traditional, time-tested, old-fashioned, unsexy relational database.
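One of the "boring" properties that protects data in a relational database is atomic transactions: a group of changes either all commit or none do, and declared constraints are enforced by the engine rather than by application code. A minimal sketch with Python's sqlite3 (standing in for PostgreSQL) and a hypothetical points table: the second transfer violates a CHECK constraint halfway through, and the partial update is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE points (username TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO points VALUES (?, ?)", [("ann", 10), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    # Both updates commit together or not at all: `with conn` commits on
    # success and rolls back if any statement raises.
    with conn:
        conn.execute(
            "UPDATE points SET balance = balance + ? WHERE username = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE points SET balance = balance - ? WHERE username = ?", (amount, src)
        )


transfer(conn, "ann", "bob", 4)
try:
    # Would drive ann negative: the CHECK fails after bob was already credited.
    transfer(conn, "ann", "bob", 100)
except sqlite3.IntegrityError:
    pass

print(dict(conn.execute("SELECT username, balance FROM points ORDER BY username")))
# {'ann': 6, 'bob': 4}
```

Bob's provisional +100 disappears with the rollback, so the stored data never reflects a half-finished operation.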

score 0 · Answer 3 · answered Mar 01 '12 at 09:29

Speaking of NoSQL, I just have one thing to add about the Facebook reference:

If you plan to scale very big, I suggest you choose a DB engine that is sysadmin-friendly rather than developer-friendly.

That rules out developer-friendly, super-fast MongoDB, which cannot scale across geographically dispersed sites and has no efficient, easy way to take backups. Although we use MongoDB here, Riak and CouchDB seem to look better on paper for sysadmins (I have no experience with either).
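As one concrete example of the kind of operations-friendly feature worth checking for, here is a minimal sketch of an online backup using SQLite's backup API through Python's sqlite3 module (Connection.backup, Python 3.7+). It copies a database while the source connection stays open and usable. It only illustrates the idea; PostgreSQL's own tools for this are pg_dump (logical) and pg_basebackup (physical), and Riak and CouchDB have their own mechanisms not shown here. The posts table is hypothetical.

```python
import sqlite3

# Source database with some (hypothetical) content.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
src.executemany("INSERT INTO posts (body) VALUES (?)", [("hello",), ("world",)])
src.commit()

# Online backup: copy the live database into another connection.
# In practice the target would be a file, e.g. sqlite3.connect("backup.db").
dst = sqlite3.connect(":memory:")
src.backup(dst)

# The source is still usable after the backup.
src.execute("INSERT INTO posts (body) VALUES ('after backup')")

print(dst.execute("SELECT COUNT(*) FROM posts").fetchone()[0])  # 2
```

The backup reflects the state at the time it was taken; the row inserted afterwards exists only in the source.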