I'm working on the architecture for my upcoming Project Management app (as an example), and I'm seeking clarity on how best to design the MongoDB data layer, specifically with regard to multi-tenancy. The app will have multiple 'sub-apps' (e.g. Calendar, Tasklist, Media, Team, etc.), each of which would map to a Collection in the database (either in a centralized DB or in the Project's own DB).
Note: throughout this question, DB Server == Replica Set.
The Questions
- Should I use one giant, centralized database to store all the application data, or create an individual database for each Project that is created on the system?
- If I choose the individual DB strategy, does that obviate the need to shard the data layer, given that the DBs are 'naturally' dispersed across several servers and the load is therefore 'naturally' spread across them? The application would contain logic that tells it which server holds the data for any given Project.
- Would using individual DBs for each Project give me better performance (given that to find any given document, Mongo would only have to search at most a few thousand docs in the individual Project DB vs. potentially millions in a giant, centralized DB)?
- Is it at all possible to reduce the 32M minimum footprint for a MongoDB database? I've read the documentation on --smallfiles in the manual, but that didn't really answer my question. Is this a hard minimum?
- If any given Project received a large amount of traffic and became a 'noisy neighbour', would the solution just be to spin up a new DB Server and move that Project to the new server? Or would it be a better approach to shard the DB Server that houses the noisy neighbour to increase performance on that server?
- What 'maintenance' concerns would I have with regard to cleaning up space for any given deleted Project, and/or 'shrink-wrapping' each DB so that its footprint is as close as possible to the actual amount of data stored in that Project database?
- What concerns should I be aware of with regard to future changes in the data 'schema' that would have to be 'rolled out' across all the Project DBs? Given that Mongo is 'schemaless', is it correct to assume that if I want to add a new 'field' to any given Collection, I would just do so in the app logic, without having to roll out any updates to the DBs themselves?
- What MongoDB 'tools' would I use to get information about the current 'status' of any given DB Server?
- Are there any limits to the number of DBs that can be housed on any given DB Server that I should be aware of?
- How does the individual DB strategy affect backups? Are there any concerns I should be aware of when backing up (to S3 for disaster recovery) many DBs across many DB Servers?
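To make the 'application logic that picks the server' part concrete, here is a rough sketch of the lookup I have in mind. The registry contents, the host names, and the `mongoUriForProject` helper are all hypothetical; in practice I assume the mapping would be stored in the global database rather than in memory.

```javascript
// Hypothetical registry mapping each Project database to the DB Server
// (replica set) that houses it. Host names are made up for illustration.
const projectRegistry = new Map([
  ['Project01', { hosts: ['db1-a.example.com:27017', 'db1-b.example.com:27017'], replicaSet: 'rs1' }],
  ['Project02', { hosts: ['db2-a.example.com:27017', 'db2-b.example.com:27017'], replicaSet: 'rs2' }],
]);

// Build a standard mongodb:// connection string for one Project database.
function mongoUriForProject(projectDb) {
  const server = projectRegistry.get(projectDb);
  if (!server) {
    throw new Error(`Unknown project database: ${projectDb}`);
  }
  return `mongodb://${server.hosts.join(',')}/${projectDb}?replicaSet=${server.replicaSet}`;
}

console.log(mongoUriForProject('Project01'));
```

Moving a 'noisy neighbour' to another DB Server would then, under this assumption, come down to copying the data and updating one registry entry.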
The App Stack
Ubuntu 12.04 LTS
Nginx
node.js
express.js
MongoDB
Current Working Strategy
My current working strategy is to use one database to store the higher-level, 'global' data like Users, Notifications, Messages, Usage, and Preferences, and then create a new database for each project created on the system.
This seems like the ideal approach for many reasons: security (each DB has its own creds), catastrophic recovery (since if one DB Server goes down the entire app doesn't go down), and performance (I think, since Mongo would have to search far fewer docs to find the one it's looking for).
The application would contain logic that automatically detects available space on any given DB Server and creates the new Project database on the next available DB Server.
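As a minimal sketch of that placement logic (not a finished implementation), something like the following is what I'm picturing. The server list, the free-space numbers, the headroom value, and the `pickServer` name are all assumptions for illustration; real free-space figures would have to come from monitoring each DB Server.

```javascript
// Hypothetical snapshot of free space per DB Server, in megabytes.
const servers = [
  { name: 'db-server-1', freeMb: 40 },
  { name: 'db-server-2', freeMb: 900 },
  { name: 'db-server-3', freeMb: 350 },
];

// Minimum on-disk footprint of a new, empty Project database
// (the 32M figure asked about in the questions above).
const EMPTY_DB_MB = 32;

// Return the first server with room for one more empty database plus
// some headroom for growth, or null if no server qualifies.
function pickServer(serverList, headroomMb = 100) {
  const needed = EMPTY_DB_MB + headroomMb;
  return serverList.find((s) => s.freeMb >= needed) || null;
}

const target = pickServer(servers);
console.log(target ? target.name : 'no server has enough free space');
```

A null result would be the signal to spin up a new DB Server before creating the Project.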
According to this article provided by MongoHQ, this is the 'best' strategy, although it consumes a large amount of storage, especially since each DB takes up 32M even when it's empty. That gets very expensive on a service like MongoHQ if you're offering a 'Freemium' app that gets Techcrunch'd.
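To put a number on that storage concern, here is some back-of-the-envelope arithmetic. It assumes the 32M minimum applies to every Project database and counts only empty databases; the project counts and the `emptyFootprintGb` name are just illustrative.

```javascript
// 32 MB minimum per database, as stated above.
const MIN_DB_FOOTPRINT_MB = 32;

// Storage consumed by empty Project databases alone, in gigabytes
// (1 GB = 1024 MB). Actual document data would come on top of this.
function emptyFootprintGb(projectCount) {
  return (projectCount * MIN_DB_FOOTPRINT_MB) / 1024;
}

for (const n of [100, 1000, 10000]) {
  console.log(`${n} projects: ${emptyFootprintGb(n).toFixed(1)} GB`);
}
```

So 1,000 free-tier Projects would cost roughly 31 GB before a single document is stored.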
So in a scenario where ProjectManager has three projects on the system my data layer would look like so:
ProjectManager
    Users
    Notifications
    Messages
    Usage
    Preferences
Project01
    Calendar
    Tasks
    Media
    Team
Project02
    Calendar
    Tasks
    Media
    Team
Project03
    Calendar
    Tasks
    Media
    Team
Each of the above ProjectXX DBs would be tiny, storing about 2000-3000 documents at most.
Thanks in advance for any insight.