I run PostgreSQL 9.3.5 on Windows 7, 64-bit.
My data arrives quarterly, in multiple tables (table1, ..., tableN) that are linked, intra-period, by cross-table constraints based on key identifiers. Among other columns, each table has identifiers that persist over time: pfi (persistent feature identifier) and ufi (universal feature identifier).
pfi is unique per table (it's exceedingly rare that table1.pfi = table2.pfi).
ufi is unique across all tables and across all time. It's not a hash of the row data, but you could think of it as such.
Each period, in each table, some new pfi are brought into being and some old pfi are retired. Some pfi change attributes. ufi tracks any change to any attribute for a given pfi (row), so fetching changed (and new) rows for table1 is simply a matter of:
-- 1st query
select a.*
into vm201512.property_d
from vm201512.property a
where not exists (select 1 from vm201412.property where ufi = a.ufi);
This selects all rows which are either new (new pfi) or changed in at least one column.
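An equivalent way to build the same delta table (just a sketch, assuming ufi is never null) is the anti-join written as a LEFT JOIN:

-- sketch: same rows as the 1st query, expressed as a left-join anti-join
create table vm201512.property_d as
select a.*
from vm201512.property a
left join vm201412.property b on b.ufi = a.ufi
where b.ufi is null;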
About 96% of each table is unchanged in every respect from one period to the next. Accordingly, when analysing cross-period changes I build a table that contains only changed and new rows. This reduces the table size from ~3.5m rows to ~225k rows: a big reduction if you subsequently do spatial comparisons with relatively complex polygons and multiple (spatial and non-spatial) JOINs.
The property table has relatively few columns, so I can identify which elements of the data have changed as follows:
-- 2nd query
create table vm201512.property_d_changes as
select pfi,
case when a.view_pfi = b.view_pfi then 0::int else 1::int end as view_pfi,
case when a.status = b.status then 0::int else 1::int end as status,
case when a.property_type = b.property_type then 0::int else 1::int end as property_type,
-- ... more columns
from vm201512.property_d a -- table created with first query
join vm201412.property b using (pfi);
This gives me a nice table where I can determine precisely what changes happened to a changed (not new) row. I can figure out that pfi 123456 had changes to its propnum and its status; I can figure out how many pfi had changes to their view_pfi - that sort of thing.
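Counting how many pfi changed a given column then falls straight out of that flag table; for example (using the _changes table created above):

-- count how many changed pfi touched each column
select sum(view_pfi)      as view_pfi_changes,
       sum(status)        as status_changes,
       sum(property_type) as property_type_changes
       -- ... more columns
from vm201512.property_d_changes;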
Several of the other tables have >50 columns, which makes the CASE statement unwieldy (I realise it only has to be coded once, but what if the data structure changes?).
Question
Given two rows in two different tables, new.table1 and old.table1, where new.table1.pfi = old.table1.pfi and one or more columns differ, is there a parsimonious, elegant PostgreSQL statement to identify the changed columns? Or am I stuck with CASE?
I realise I could write a dynamic function to loop through all columns for a given table, and build the query with CASE statements.
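As a rough sketch of that idea (assuming both periods expose the same column set, and skipping the pfi/ufi key columns), the CASE list could be generated from information_schema and the resulting statement run by hand or via EXECUTE in a PL/pgSQL function:

-- sketch: generate the 2nd query from the catalogue instead of typing it
select 'create table vm201512.property_d_changes as select pfi, '
       || string_agg(
            format('case when a.%I = b.%I then 0::int else 1::int end as %I',
                   column_name, column_name, column_name),
            ', ' order by ordinal_position)
       || ' from vm201512.property_d a join vm201412.property b using (pfi);'
from information_schema.columns
where table_schema = 'vm201512'
  and table_name   = 'property_d'
  and column_name not in ('pfi', 'ufi');
-- note: a.col = b.col treats NULL vs NULL as a change;
-- "a.col is not distinct from b.col" would treat two NULLs as equal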