I'm designing a data model for an application which must keep track about the changes of data.
In a first step, my application must support PostgreSQL, but I'd like to add support for other RDBMS (especially Oracle and MS SQL server) in a second step. Therefore, I'd like to choose a portable data model with less usage of proprietary features. (The DDL for the tables may be different from RDBMS vendor to RDBMS vendor. But the SQL queries / statements in the application should be the same for all supported vendors, as far as possible.)
For example, let's say there is a users and a users_versions table. users_versions has a foreign key on users.
An example of the tables could look like:
users
----------------
id | username
----------------
1 | johndoe
2 | sally
users_versions --> references id of user (userid)
---------------------------------------------------------------------------
id | userid | name | street | place | validfrom | validuntil
---------------------------------------------------------------------------
1 | 1 | John Doe | 2nd Fake St | Faketown | 2018-01-04 | 2018-01-05
2 | 1 | John Doe | Real St 23 | Faketown | 2018-01-05 | null
3 | 2 | Sally Wu | Main St 1 | Lake Fake | 2018-04-02 | 2018-04-20
4 | 2 | Sally Wu | Other St 99 | Chicago | 2018-04-20 | null
Most SQL queries will query for the entries currently valid. In the concept example above this woule look like
SELECT *
FROM users_versions uv
INNER JOIN users u ON u.id = uv.userid
WHERE uv.userid = 123 AND uv.validuntil IS NULL;
Some use cases (reporting etc.) will require SELECTing a historic version of data, as well (e.g. what data were valid at 2017-12-31?). But these won't be performance critical in my application.
In the example above, I might create a filtered unique index on validuntil to ensure that there is only 1 entry with unlimited validity at a time:
CREATE UNIQUE INDEX foo
ON users_versions ( userid )
WHERE validuntil IS NULL;
As far as I know, filtered indexes can only be used for query optimization in PostgreSQL and MS SQL but not in Oracle. Moreover, indexing null might be a tricky thing, as well (possible / only in multi-column indexes / not-possible).
Therefore, a different approach for users_versions might be the structure above plus an explicit valid column managed by the application. The most recent entry would get a 1, all historic entries would get an 0. Then I could create two indices, one for query optimization and one for integrity enforcement (only 1 valid entry at a time):
CREATE INDEX optimization
ON users_versions ( userid, valid );
For queries like:
SELECT *
FROM users_versions uv
INNER JOIN users u ON u.id = uv.userid
WHERE uv.userid = 123 AND uv.valid = 1;
And one more index to enforce the current version integrity (e.g. ORACLE version):
-- ORACLE: Entry with null-only columns ignored in indexing:
CREATE UNIQUE INDEX only_one_valid_version_per_user
ON users_versions (
CASE WHEN valid = 1 THEN userid ELSE null END,
CASE WHEN valid = 1 THEN valid ELSE null END
);
Probably this index cannot be used for query optimization, but it should ensure that there can only be 1 valid entry per userid but an unlimited amount of invalid entries (valid=0) for the same userid.
What's your suggestion for a portable design of such history tables which allows performance in usage?
validfrom+validuntil, withvaliduntil(nullable) set tonullin the currently valid entryvalidfrom+validuntil, withvaliduntil(not nullable) set to far future date like2999-12-31in the currently valid entryvalidfrom+validuntil+validflag, withvalidflag managed by the application and used in queries for the currently valid entry- ...?
When INSERTing new versions, my application will always perform two steps:
- Invalidate current version (set
validuntilto current date (plus, optionally, setvalidflag to0)) - Insert new version (
validfromcurrent date, plus, optionally, withvalidflag1)
I don't have the requirement that the database enforces overlap-free time intervals for historic entries. I only must make sure that there is only 1 entry with unlimited validity.
For some very large tables, it might be worth splitting into current and history table: One table only contains the currently valid versions (users_versions_current), one other contains all the historic versions (users_versions_history). Whenever a new versions is inserted, the previous version is inserted with validfrom/validuntil into the ..._history table.
What aspects should I consider? Do you know literature, best practice recommendations etc.?