5

I have a database to hold data about projects.

Projects have lots of data associated with them that is segmented into other tables such as project_contacts and project_images. Those are one-to-many relations where 1 project can have many contacts and many images.

However, I need to be able to keep "versions" of projects. So, if a project is updated, the previous data about the project is still present. This is a large issue with many-to-many relationship. For example, there is a categories table, and a project_categories table - many projects can have many categories.

When a project updates, not every piece of data will be updated. For example, if a project gets a new contact, it won't always have a new image, so the images table won't be updated.

One solution I have considered is to have a versions table with an auto-incrementing ID; a project_id column and a time column. Each table will also get a version_id column. Whenever a project is updated, a new row is created in versions. Then, when the tables are updated, the version_id gets entered too. This means that the latest row about a project in project_images might be 3 whereas the latest version in project_contacts might be 27.

Because the version_id is incrementing, it's really simple to get the latest version of a single project, as you can join all the tables by the highest version_id.

Where this idea seems to fail for me is if I want to get the version of a project from a specific date.

Could I use my current solution to address that problem?

purpletree
  • 151
  • 1
  • 3

1 Answers1

1

One potential method is to create a date_time column and a version_id column on an indexing table, essentially what you have described. The difference that you should make though is a new data insert onto all project-related tables whenever version_id is incremented. Something like this psuedoSQL...

insert <current date_time>, (select MAX(version_id) from versions)+1 into versions
insert (select (* - version_id - <data that is being updated>) from project),updating_column=new_data, (select MAX(version_id) from versions) into <project related tables>

This is a rudimentary method of thinking about this, but this may lead you down the right track. The major downside with this concept is that you have a lot of data running around that is essentially redundant but is logically simple to implement.

The other, more storage-smart way to think about this relationship is to keep track of what data gets updated. Your versions table would now have a foreign-key column for all data that can potentially be updated as part of your projects as well as a version_id primary key. Instead of inserting version_id into your project tables, you would insert column_version_id.

The idea would conceptually work like this. When you are querying for all data at a point in time you will first hit the versions table. From there it will join all your project tables in based on column_version_id.

While this concept has better storage there are more moving parts to the logic and is also less flexible with changes in how your project data is managed.

JB-Learner
  • 322
  • 1
  • 4