Auditing data changes in SQL Server

Question

I am tasked with implementing auditing of data changes on our database. I am aware that there are commercial products for this, but I wanted to know which design would be better. For any given row we will want to know what values were changed, who changed it, and when the change took place.

We have Enterprise Level data (possibly millions of rows), and we are using MS SQL Server Web (ver.13).

A suggested design used histid as the unique primary key on every table. What would normally be the unique primary key would then be repeated, once for each version of the row:

-- primary key is histid.
-- the current programAssignmentId is what I am  typically searching for.

SELECT 
    [histid] -- column is primary key.
    ,[isCurrent]
    ,[lastupdate] -- column is indexed.
    ,[programAssignmentId] -- column is indexed.
    ,[programid] -- foreign key, column is indexed.
    ,[entityid]  -- foreign key, column is indexed.
    ,[lastUpdatedById] -- foreign key, column is indexed.
    ,[effectiveDate]
    ,[orderno]
    ,[status]
    ,[deleted]
    ,[created]
FROM [dbo].[mem_programAssignments]
WHERE [programAssignmentId] = 'a43c2a3e-d3a7-4f40-aeb0-cc0552c62da2'
ORDER BY [programAssignmentId], [lastupdate] DESC

Would yield the following results:

histid                                isCurrent   lastupdate                programAssignmentId                    programid                           
===================================   =========   =======================   ====================================   ====================================
7330075F-B076-113D-B548CDDD0BF5D4B8   1           2017-08-17 13:37:56.950   a43c2a3e-d3a7-4f40-aeb0-cc0552c62da2   5f4d1469-44a0-49c7-856d-f35bf729e661
73291EC6-F168-E817-EB42EB46A5AFB08C   0           2017-08-17 13:37:11.670   a43c2a3e-d3a7-4f40-aeb0-cc0552c62da2   5f4d1469-44a0-49c7-856d-f35bf729e661
56F73DE0-0137-4CBF-969AEDFE55229E3B   0           2017-04-24 18:49:11.000   a43c2a3e-d3a7-4f40-aeb0-cc0552c62da2   5f4d1469-44a0-49c7-856d-f35bf729e661

A competing design suggested one or more history tables that would record the changes.

SELECT 
   [id]
    ,[reference_class_name]
    ,[reference_table]
    ,[reference_id]
    ,[column_name]
    ,[old_value]
    ,[new_value]
    ,[created_at]
    ,[user_id]
    ,[user_name]
    ,[message]
FROM [dbo].[history]

Would output something like:

id   reference_class_name         reference_table          reference_id                            column_name   old_value    new_value  
==   =========================    ======================   =====================================   ===========   ==========   ===========
24   Models\ProgramAssignement    mem_programAssignments   b40b2d80-99d3-11e8-a0ad-29cac47b46c7    startDate     2017-01-01   2018-07-11 
25   Models\ProgramAssignement    mem_programAssignments   b40b2d80-99d3-11e8-a0ad-29cac47b46c7    deductible    15000000     30000000

The reference_id in this case would be referencing the id value of the primary key of the table listed in the reference_table column (mem_programAssignments in this case).

The advantage to the second design is it would separate the historical values from the current values, but the obvious drawback would be that the history table(s) could get extremely large.

Is one design better than the other -- or I should say, what are the advantages, disadvantages, or risks with these designs? Are there other design patterns I should consider?

David Spillett · Accepted Answer · 2018-09-19T14:45:11.580

For relatively immutable history tracking, system-versioned temporal tables may be the way to go, as you are using a recent enough version of SQL Server (similar support is available in some other databases). They have some gotchas that mean they are not suitable for all circumstances, but if they fit your model they'll make life easier by implementing chunks of it for you "for free" and by allowing you to use time-travelling syntax such as SELECT <stuff> FROM <table> FOR SYSTEM_TIME AS OF <date_time>, or FOR SYSTEM_TIME BETWEEN <date_time> AND <date_time>, instead of querying the history manually.
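As a minimal sketch of what that can look like (the table, column, and history-table names here are made up for illustration, loosely following the question's table), a system-versioned table and two time-travel queries might be written as:

```sql
-- Hypothetical table, loosely modelled on the question's mem_programAssignments.
CREATE TABLE dbo.programAssignment_demo
(
    programAssignmentId UNIQUEIDENTIFIER NOT NULL PRIMARY KEY CLUSTERED,
    programid           UNIQUEIDENTIFIER NOT NULL,
    entityid            UNIQUEIDENTIFIER NOT NULL,
    [status]            INT              NULL,
    -- Period columns are maintained by SQL Server and hold UTC times.
    validFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    validTo   DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (validFrom, validTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.programAssignment_demo_history));
GO

-- The row as it looked at a given moment (expressed in UTC).
SELECT *
FROM dbo.programAssignment_demo
    FOR SYSTEM_TIME AS OF '2017-08-17T13:37:30'
WHERE programAssignmentId = 'a43c2a3e-d3a7-4f40-aeb0-cc0552c62da2';

-- Every version of the row, current and historical, newest first.
SELECT *
FROM dbo.programAssignment_demo
    FOR SYSTEM_TIME ALL
WHERE programAssignmentId = 'a43c2a3e-d3a7-4f40-aeb0-cc0552c62da2'
ORDER BY validFrom DESC;
```

Ordinary queries without a FOR SYSTEM_TIME clause only see the current rows, so existing application code should not need to change.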

For an audit you need more, though: as well as what happened and when, you need to track who made the change. One option is to add a user identifier column to each tracked table, which would then be included in the history too. If the users you are tracking all have their own SQL Server login, you can populate that column by trigger, reading the value from SYSTEM_USER/SUSER_NAME(). In the more common case of an application that always logs into SQL Server using the same application account, you will need to populate that column in your application logic.
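A rough sketch of the trigger option follows. The table, the trigger name, and the session key app_user are hypothetical. Reading SESSION_CONTEXT (available from SQL Server 2016) is an addition of this sketch, shown as one possible way for an application on a shared login to hand its own user name to the database; it is not something the approach above requires.

```sql
-- Hypothetical tracked table with "who" and "when" columns.
CREATE TABLE dbo.demo_account
(
    account_id INT IDENTITY(1, 1) PRIMARY KEY,
    balance    DECIMAL(18, 2) NOT NULL,
    updated_by NVARCHAR(128)  NOT NULL DEFAULT (SUSER_NAME()),
    updated_at DATETIME2      NOT NULL DEFAULT (SYSUTCDATETIME())
);
GO

CREATE TRIGGER dbo.trg_demo_account_stamp
ON dbo.demo_account
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Prefer a user name supplied by the application (assumed session key
    -- 'app_user'); fall back to the SQL Server login otherwise.
    -- With the default RECURSIVE_TRIGGERS OFF setting, this UPDATE does not
    -- fire the trigger again.
    UPDATE a
    SET    updated_by = COALESCE(CAST(SESSION_CONTEXT(N'app_user') AS NVARCHAR(128)),
                                 SUSER_NAME()),
           updated_at = SYSUTCDATETIME()
    FROM   dbo.demo_account AS a
    JOIN   inserted AS i
           ON i.account_id = a.account_id;
END;
GO

-- Application side, once per connection or request (value is illustrative):
EXEC sys.sp_set_session_context @key = N'app_user', @value = N'jsmith';
```

If the tracked table is also system-versioned, the extra UPDATE issued by the trigger would itself be recorded as a version, so in that case it may be simpler to have the application write the column directly.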

Is one design better than the other

Your second suggestion is essentially an extended property bag (where each property is actually property+time), which makes me wary of it. It is potentially flexible, but it has significant issues of its own which, for instance, make the model inefficient to report from. Look up "Entity Attribute Value model" (the more technical term that "property bag" is usually a synonym for), on DBA.SE and elsewhere, for detailed discussions of the pros and cons. It is commonly considered to be an anti-pattern.
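To illustrate the reporting cost, here is a sketch of what it takes to rebuild just two columns of a row "as of" a date from the question's history table. It assumes old_value and new_value are stored as strings and created_at as a date/time type; the @asOf value is arbitrary.

```sql
DECLARE @asOf DATETIME2 = '2018-08-01';

WITH latest AS
(
    -- For each row and column, rank the recorded changes up to @asOf, newest first.
    SELECT h.reference_id,
           h.column_name,
           h.new_value,
           ROW_NUMBER() OVER (PARTITION BY h.reference_id, h.column_name
                              ORDER BY h.created_at DESC, h.id DESC) AS rn
    FROM   dbo.history AS h
    WHERE  h.reference_table = 'mem_programAssignments'
      AND  h.created_at <= @asOf
)
-- Pivot the newest value of each column back into one row per entity.
-- Every column needs its own CASE expression and its own type conversion.
SELECT reference_id,
       MAX(CASE WHEN column_name = 'startDate'
                THEN TRY_CONVERT(DATE, new_value) END)   AS startDate,
       MAX(CASE WHEN column_name = 'deductible'
                THEN TRY_CONVERT(BIGINT, new_value) END) AS deductible
FROM   latest
WHERE  rn = 1
GROUP  BY reference_id;
```

Even this is incomplete: a column that was never changed before @asOf comes back as NULL, so a full report would also have to fall back to old_value from a later change or to the current table.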

isCurrent

If you do roll your own history structure instead of using the built-in support (because the built-ins are unsuitable for some reason, or because you need something more cross-platform compatible, for instance), then I would recommend against keeping all the data in your core tables. Instead, keep the current data on its own with a separate history. This will make querying current data, likely the most common thing your application will do, less error-prone. If you keep all versions¹ in your history structure, you may still have an isCurrent flag in those structures.

¹ SQL Server's temporal tables don't keep the latest version of a row in the history table (unless it has been deleted, depending on if you conceptually consider the last existing version to be the latest or not), only copying the previous version in as it is modified. Some history/audit implementations do hold the latest version of each row as well as the past ones so that the history table contains everything.

Biju jose · Answer 2 · 2018-09-19T06:14:44.017

An easy approach is to store all the data from the deleted and inserted pseudo-tables in a trigger, since it's easy to query and even easier to load back into your table. It's basically a snapshot of your data before and after the change. A typical schema would be something like this.

CREATE TABLE customer (
                            customerid       INT IDENTITY(1, 1) 
                                             PRIMARY KEY,
                            customername     VARCHAR(1000),
                            customeraddress  VARCHAR(1000),
                            phone            INT,
                            email            VARCHAR(255),
                            inserteduserid   INT,
                            inserteddatetime DATETIME,
                            updateduserid    INT,
                            updateddate      DATETIME
                      );

CREATE TABLE customer_audit (
                                  customer_audit_ID    BIGINT IDENTITY(1, 1) 
                                  PRIMARY KEY,
                                  auditdate            DATETIME,
                                  ins_customerid       INT,
                                  ins_customername     VARCHAR(1000),
                                  ins_customeraddress  VARCHAR(1000),
                                  ins_phone            INT,
                                  ins_email            VARCHAR(255),
                                  ins_inserteduserid   INT,
                                  ins_inserteddatetime DATETIME,
                                  ins_updateduserid    INT,
                                  ins_updateddate      DATETIME,
                                  del_customerid       INT,
                                  del_customername     VARCHAR(1000),
                                  del_customeraddress  VARCHAR(1000),
                                  del_phone            INT,
                                  del_email            VARCHAR(255),
                                  del_inserteduserid   INT,
                                  del_inserteddatetime DATETIME,
                                  del_updateduserid    INT,
                                  del_updateddate      DATETIME
                            );
GO

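-- Named trg_customer_audit because a trigger cannot share its name
-- with the customer_audit table in the same schema.
CREATE TRIGGER trg_customer_audit
ON customer
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON; -- suppress extra "rows affected" messages from the trigger

    -- The identity column customer_audit_ID is filled in automatically,
    -- so the select list maps onto the remaining columns in order.
    INSERT INTO customer_audit
    SELECT                GETDATE(),
                          i.customerid,
                          i.customername,
                          i.customeraddress,
                          i.phone,
                          i.email,
                          i.inserteduserid,
                          i.inserteddatetime,
                          i.updateduserid,
                          i.updateddate,
                          d.customerid,
                          d.customername,
                          d.customeraddress,
                          d.phone,
                          d.email,
                          d.inserteduserid,
                          d.inserteddatetime,
                          d.updateduserid,
                          d.updateddate
    FROM                  inserted i
          FULL OUTER JOIN deleted  d
                -- inserts fill only the ins_ columns, deletes only the del_ columns, updates both
                ON i.customerid = d.customerid;
END;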

After this you can easily query for any rows that were inserted, deleted, or updated. The only caveat is that any change to the real table, such as a data type change, must also be made to the audit table. This table can also grow quickly, so it may not be ideal from a performance point of view, but it does the job. A few example questions show why this design is easy to query.

1. Find customers whose phone number changed on the current day

select * from customer_audit
where ins_customerid is not null and del_customerid is not null
and ins_phone<>del_phone
and CONVERT(DATE,auditdate)=CONVERT(DATE,getdate())

2. Previous phone number of each customer (add a filter on ins_customerid for one particular customer)

;WITH CTE 
AS(
select *,row_number() 
OVER(partition by ins_customerid order by auditdate desc) as rnk from customer_audit
where ins_customerid is not null and del_customerid is not null
and ins_phone<>del_phone
)
select ins_customerid, del_phone FROM cte 
where rnk=1

score 1 · Answer 3 · answered Sep 19 '18 at 07:39

Capturing the "who" can be more difficult than the "what" as far as data changes go.

For the "what", implementing temporal tables (https://learn.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-2017) to provide the point-in-time state of data is a simple way of getting this logic in your database without reinventing the wheel. Basically, it records a valid-from and valid-to date for every record. When a record is changed, these values are adjusted accordingly and a new "current" copy of the row is created. This allows you to view a row's entire history at any point in time.

For the "who", you probably need to look at SQL Server Audit (https://learn.microsoft.com/en-us/sql/relational-databases/security/auditing/sql-server-audit-database-engine?view=sql-server-2017), which can provide details of the user, application, etc. making DML changes in your database. Triggers will work but tend to have performance implications, especially on any write-heavy database.
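A minimal sketch of such an audit is shown below. The audit name, specification name, database name, and file path are all placeholders, and the target folder is assumed to exist already and to be writable by the SQL Server service account.

```sql
USE master;
GO

-- Server-level audit object that writes to files in an assumed folder.
CREATE SERVER AUDIT DataChangeAudit
TO FILE (FILEPATH = N'D:\SqlAudit\');
GO
ALTER SERVER AUDIT DataChangeAudit WITH (STATE = ON);
GO

USE YourDatabase; -- placeholder database name
GO

-- Capture DML against one table, for every database user.
CREATE DATABASE AUDIT SPECIFICATION ProgramAssignmentChanges
FOR SERVER AUDIT DataChangeAudit
ADD (INSERT, UPDATE, DELETE ON OBJECT::dbo.mem_programAssignments BY public)
WITH (STATE = ON);
GO

-- Read back who ran what, and when.
SELECT event_time,
       server_principal_name,
       database_name,
       object_name,
       statement
FROM   sys.fn_get_audit_file(N'D:\SqlAudit\*.sqlaudit', DEFAULT, DEFAULT)
ORDER  BY event_time DESC;
```

Note that this records the login that issued each statement, so it only identifies individual people if they connect with their own logins rather than through a shared application account.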

Datetime values in both temporal tables and audit files would help you correlate them, although they may not match exactly, so on a busy system it may be difficult to correlate them precisely.

These options should have less impact on performance than triggers and large history tables and would make future support simpler as they're built on system functionality designed for the purpose.

Both temporal tables and audit are supported in SQL Server Web 2016 (V13).

score 1 · Answer 4 · answered Sep 19 '18 at 17:01

Thanks to the generous help of MDCCL, here is the conclusion I reached. The programAssignment table represents the relationships between the program table and the organization table. The programAssignmentHistory represents a snapshot or "version control" of the programAssignment. Although it is closely related, it is not the same thing and therefore it should be a separate entity in the entity-relationship-diagram (ERD). My ERD would have a similar structure to the StackOverflow posts here and here.

The ERD should then be the reference or starting point of how you lay out the database.

The solution I am adopting is to have a separate history table for every table that we need to be able to audit. So the Users table will have a UserHistory table, the Programs table will have a ProgramHistory table, and so on.
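As a rough sketch of that layout for one table (the column list, the ChangeType codes, and the trigger name are assumptions made for illustration, not the actual schema), the pair of tables and a trigger that fills the history might look like this:

```sql
-- Current data only.
CREATE TABLE dbo.Program
(
    ProgramId INT IDENTITY(1, 1) PRIMARY KEY,
    Name      NVARCHAR(200) NOT NULL,
    [Status]  INT           NOT NULL
);

-- One history row per superseded or deleted version of a Program row.
CREATE TABLE dbo.ProgramHistory
(
    ProgramHistoryId BIGINT IDENTITY(1, 1) PRIMARY KEY,
    ProgramId        INT           NOT NULL,
    Name             NVARCHAR(200) NOT NULL,
    [Status]         INT           NOT NULL,
    ChangeType       CHAR(1)       NOT NULL, -- 'U' = updated, 'D' = deleted
    ChangedBy        NVARCHAR(128) NOT NULL,
    ChangedAt        DATETIME2     NOT NULL
);
GO

CREATE TRIGGER dbo.trg_Program_History
ON dbo.Program
AFTER UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- "deleted" holds the previous version of every affected row.
    -- If "inserted" has rows, the statement was an UPDATE, otherwise a DELETE.
    INSERT INTO dbo.ProgramHistory
           (ProgramId, Name, [Status], ChangeType, ChangedBy, ChangedAt)
    SELECT d.ProgramId,
           d.Name,
           d.[Status],
           CASE WHEN EXISTS (SELECT 1 FROM inserted) THEN 'U' ELSE 'D' END,
           SUSER_NAME(),
           SYSUTCDATETIME()
    FROM   deleted AS d;
END;
GO
```

ChangedBy here records the SQL Server login; with a shared application account the application would have to supply the user some other way.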
