This question is about possible approaches for time stamping every column of every row in a table.
I have a table with a small number of rows (fewer than 200 and not likely to grow much larger). Each row refers to an individual entity and each column corresponds to piece of data about that entity. The columns will be updated whenever new information becomes available, and I need to record when that happens.
For example, think in terms of a contacts table:
CREATE TABLE `contacts` (
`id` INT NOT NULL AUTO_INCREMENT,
`name` VARCHAR(100),
`address` VARCHAR(100),
`phone_number` VARCHAR(100),
`email_address` VARCHAR(100),
`updated` TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
);
If the phone number is updated this month, we might be inclined to believe that the `email_address` was updated as well, even though it may be severely outdated.
I can think of three approaches:
1. Create an additional timestamp column for each column:
CREATE TABLE `contacts` (
`id` INT NOT NULL AUTO_INCREMENT,
`name` VARCHAR(100),
`name_updated` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
`address` VARCHAR(100),
`address_updated` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
`phone_number` VARCHAR(100),
`phone_number_updated` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
`email_address` VARCHAR(100),
`email_address_updated` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
);
2. Or, I could created a secondary table that contains the same information:
CREATE TABLE `contacts_updated` (
`id` INT,
`name` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
`address` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
`phone_number` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
`email_address` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
CONSTRAINT `fk_contacts_updated_contacts_id`
FOREIGN KEY (id)
REFERENCES contacts (id));
In which case, is there an easy way to duplicate a table and automatically change the column types to timestamps? And use triggers to update the secondary table?
3. Or, I could create an Entity-Attribute-Value table:
CREATE TABLE `contacts_updated` (
`id` INT,
`column_name` VARCHAR(15),
`updated` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
);
...but this seems like it would be more difficult to query the information.
Are there other options that would be better? And I also assume that the answer will be different for a small dataset vs. a large one, so I'd love to know your opinion on both cases.