
The problem

How do I efficiently and idiomatically compare the content of two tables, each in a different DB?

I have 3 DBs (the classic dev, stg and prod). Each DB has a few tables which should be exactly the same regardless of environment, like countries.

I would like to write a script that would compare these tables and alert me when there's a mismatch.

Characteristics

  • There are fewer than 10 tables to compare
  • The tables are not very large - at most ~100K of data
  • The script is executed from a Jenkins CI machine

Current solution

Currently, I am using pg_dump and comparing the files. I wonder if there is a more idiomatic approach.

Adam Matan

1 Answer


For such a small amount of data, getting the whole lot and performing a text comparison, as you are doing, is probably the way I would go too.

For larger data and/or data on remote sites, that could be far too cumbersome, so you might instead want to compare a checksum of the data in each table with something like:

-- DIGEST() requires the pgcrypto extension (CREATE EXTENSION pgcrypto;)
-- the ORDER BY inside STRING_AGG makes the concatenation order deterministic
SELECT ENCODE(
              DIGEST(
                     STRING_AGG(
                                COALESCE(<col1>::text,'NULL')
                             || COALESCE(<col2>::text,'NULL')
                             || COALESCE(<col3>::text,'NULL')
                                ,',' ORDER BY <pk>
                     ),'sha512'
              ),'base64'
       ) AS table_checksum
FROM <table>

That way you are reading all the data but only transmitting the resulting checksum.
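If you wanted to do that comparison from within Postgres itself rather than in the Jenkins script, a rough sketch using dblink might look like the following. This is only an illustration: the connection string, the countries table, and its id/name columns are placeholders, and both databases would need pgcrypto installed for DIGEST():

-- Sketch only: assumes the dblink extension is installed and that the
-- connection string, table and column names are replaced with real ones
SELECT dev.table_checksum AS dev_checksum,
       stg.table_checksum AS stg_checksum,
       dev.table_checksum = stg.table_checksum AS checksums_match
FROM (
       SELECT ENCODE(DIGEST(STRING_AGG(id::text || name, ',' ORDER BY id),
                            'sha512'), 'base64') AS table_checksum
       FROM countries
     ) AS dev
CROSS JOIN dblink(
       'host=stg-db dbname=app user=compare',
       $$SELECT ENCODE(DIGEST(STRING_AGG(id::text || name, ',' ORDER BY id),
                              'sha512'), 'base64')
         FROM countries$$
     ) AS stg(table_checksum text);

The local subquery computes the checksum for the dev copy, the dblink call fetches only the checksum string from staging, and the final column tells you whether they match.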

Also if your databases are on the same machine (unlikely in the case of dev/stg/prod, but...) you could use a cross-database query (see this question) to scan for differences.
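For example, with postgres_fdw you can expose the remote copy of the table locally and diff the two with EXCEPT. This is only a sketch: it assumes a foreign server called stg (with a user mapping) has already been created, and uses countries as the table being compared:

-- Sketch only: assumes postgres_fdw is installed and a foreign server "stg"
-- with a user mapping already exists
CREATE SCHEMA IF NOT EXISTS stg_remote;
IMPORT FOREIGN SCHEMA public LIMIT TO (countries)
    FROM SERVER stg INTO stg_remote;

-- Rows present on one side but missing (or different) on the other
(SELECT * FROM countries            EXCEPT SELECT * FROM stg_remote.countries)
UNION ALL
(SELECT * FROM stg_remote.countries EXCEPT SELECT * FROM countries);

A zero-row result means the two copies are identical; any rows returned are the actual mismatches, which is handier for alerting than a bare checksum difference.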

In all the above cases you are going to be scanning the whole table(s) in both places.

CAVEAT: I'm a SQL Server person generally and I've translated something I've done there to Postgres, so the above syntax may need a little work!

David Spillett