I am tasked with implementing auditing of data changes on our database. I am aware that there are commercial products for this, but I wanted to know which design would be better. For any given row we will want to know what values were changed, who changed it, and when the change took place.
We have Enterprise Level data (possibly millions of rows), and we are using MS SQL Server Web (ver.13).
A suggested design used histid as the unique primary key on every table; What would normally be the unique primary key would be repeated (once for each version of the row):
-- primary key is histid.
-- the current programAssignmentId is what I am typically searching for.
SELECT
[histid] -- column is primary key.
,[isCurrent]
,[lastupdate] -- column is indexed.
,[programAssignmentId] -- column is indexed.
,[programid] -- foreign key, column is indexed.
,[entityid] -- foreign key, column is indexed.
,[lastUpdatedById] -- foreign key, column is indexed.
,[effectiveDate]
,[orderno]
,[status]
,[deleted]
,[created]
FROM [dbo].[mem_programAssignments]
WHERE [programAssignmentId] = 'a43c2a3e-d3a7-4f40-aeb0-cc0552c62da2'
ORDER BY [programAssignmentId], [lastupdate] DESC
Would yield the following results:
histid isCurrent lastupdate programAssignmentId programid
=================================== ========= ======================= ==================================== ====================================
7330075F-B076-113D-B548CDDD0BF5D4B8 1 2017-08-17 13:37:56.950 a43c2a3e-d3a7-4f40-aeb0-cc0552c62da2 5f4d1469-44a0-49c7-856d-f35bf729e661
73291EC6-F168-E817-EB42EB46A5AFB08C 0 2017-08-17 13:37:11.670 a43c2a3e-d3a7-4f40-aeb0-cc0552c62da2 5f4d1469-44a0-49c7-856d-f35bf729e661
56F73DE0-0137-4CBF-969AEDFE55229E3B 0 2017-04-24 18:49:11.000 a43c2a3e-d3a7-4f40-aeb0-cc0552c62da2 5f4d1469-44a0-49c7-856d-f35bf729e661
A competing design suggested one or more history tables that would record the changes.
SELECT
[id]
,[reference_class_name]
,[reference_table]
,[reference_id]
,[column_name]
,[old_value]
,[new_value]
,[created_at]
,[user_id]
,[user_name]
,[message]
FROM [dbo].[history]
Would output something like:
id reference_class_name reference_table reference_id column_name old_value new_value
== ========================= ====================== ===================================== =========== ========== ===========
24 Models\ProgramAssignement mem_programAssignments b40b2d80-99d3-11e8-a0ad-29cac47b46c7 startDate 2017-01-01 2018-07-11
25 Models\ProgramAssignement mem_programAssignments b40b2d80-99d3-11e8-a0ad-29cac47b46c7 deductible 15000000 30000000
The reference_id in this case would be referencing the id value of the primary key of the table listed in the reference_table column (mem_programAssignments in this case).
The advantage to the second design is it would separate the historical values from the current values, but the obvious drawback would be that the history table(s) could get extremely large.
Is one design better than the other -- or I should say, what are the advantages, disadvantages, or risks with these designs? Are there other design patterns I should consider?