
I manage a few tables with more than 1 billion records. Their ETLs run daily, reading data from flat files and inserting a few thousand records into each of them.

Now I need to keep these same tables in another DB as well, and of course they need to contain the same data. Data will be transferred over a linked server.

The easiest way would be to truncate the destination tables, run an INSERT ... SELECT from source to destination, and then rebuild their indexes. But I don't like the idea of copying the same data every day.
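Something like this is what I mean, assuming a linked server named [SRCSRV] and a table dbo.FactSales with a SaleDate column (all placeholder names, just to illustrate):

    -- Full reload: pull everything from the source over the linked server,
    -- then rebuild indexes on the destination.
    TRUNCATE TABLE dbo.FactSales;

    INSERT INTO dbo.FactSales WITH (TABLOCK) (SaleDate, CustomerID, Amount)
    SELECT SaleDate, CustomerID, Amount
    FROM [SRCSRV].SourceDB.dbo.FactSales;

    ALTER INDEX ALL ON dbo.FactSales REBUILD;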

Another solution I see is to query both the source and destination tables, grouping by the date dimension, and copy only the records for dates that are missing on the destination. I'd then create a monitor that compares the record counts per day and highlights any days that don't match.
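Roughly like this, again with [SRCSRV], dbo.FactSales, and SaleDate as placeholder names:

    -- Copy only the dates that don't exist on the destination yet.
    INSERT INTO dbo.FactSales (SaleDate, CustomerID, Amount)
    SELECT s.SaleDate, s.CustomerID, s.Amount
    FROM [SRCSRV].SourceDB.dbo.FactSales AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.FactSales AS d WHERE d.SaleDate = s.SaleDate);

    -- Monitor: per-day row counts on both sides, reporting only mismatched days.
    SELECT COALESCE(src.SaleDate, dst.SaleDate) AS SaleDate,
           src.cnt AS SourceRows,
           dst.cnt AS DestRows
    FROM (SELECT SaleDate, COUNT(*) AS cnt
          FROM [SRCSRV].SourceDB.dbo.FactSales
          GROUP BY SaleDate) AS src
    FULL OUTER JOIN
         (SELECT SaleDate, COUNT(*) AS cnt
          FROM dbo.FactSales
          GROUP BY SaleDate) AS dst
        ON src.SaleDate = dst.SaleDate
    WHERE ISNULL(src.cnt, 0) <> ISNULL(dst.cnt, 0);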

Is this second solution good enough, or is there a best practice for this situation?


1 Answer


If I had to manage this, I'd look at what, if anything, modifies the data in the tables outside of the ETL process.

If nothing else besides your ETL modifies the data, I would simply update the ETL process to insert the finished data at both locations (and likewise carry out whatever index maintenance you're doing in both places).
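For example, if the load step is T-SQL, it could simply write the same staged rows to both targets; staging.FactSales, [DESTSRV], and DestDB below are placeholder names, not from the question:

    -- Load the same staged rows into the local table and into the copy on the
    -- other server; wrapping both in one transaction would escalate to a
    -- distributed transaction (MSDTC), so they're shown as separate statements.
    INSERT INTO dbo.FactSales (SaleDate, CustomerID, Amount)
    SELECT SaleDate, CustomerID, Amount
    FROM staging.FactSales;

    INSERT INTO [DESTSRV].DestDB.dbo.FactSales (SaleDate, CustomerID, Amount)
    SELECT SaleDate, CustomerID, Amount
    FROM staging.FactSales;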

If something else updates this data, but only on one server, then transactional replication is probably the most lightweight way to get the data to the secondary server. Even if the data isn't being modified outside of the ETL, this wouldn't be a terrible alternative to modifying the ETL process to update two targets, since it sounds like only a relatively small percentage of the data is being inserted daily.
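As a rough outline, setting up a transactional publication with a push subscription looks something like this (placeholder names again; this assumes the Distributor is already configured, and the snapshot/distribution agent jobs are omitted):

    -- Enable the database for publication, publish the table, and add a push
    -- subscription to the secondary server.
    EXEC sp_replicationdboption
        @dbname = N'SourceDB', @optname = N'publish', @value = N'true';

    EXEC sp_addpublication
        @publication = N'FactSales_pub', @status = N'active';

    EXEC sp_addarticle
        @publication = N'FactSales_pub', @article = N'FactSales',
        @source_owner = N'dbo', @source_object = N'FactSales';

    EXEC sp_addsubscription
        @publication = N'FactSales_pub', @subscriber = N'DESTSRV',
        @destination_db = N'DestDB', @subscription_type = N'push';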

If the data is being modified on both servers, then you'll probably want to consider merge replication. How simple this is will largely depend on whether the tables have any identity columns.
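For instance, when both servers insert rows, the identity column generally needs NOT FOR REPLICATION and/or managed identity ranges so the two sides don't hand out the same values; a placeholder table definition to illustrate:

    -- Identity column marked NOT FOR REPLICATION so that rows replicated from the
    -- other server keep their original values; merge replication also adds a
    -- rowguid column to each published table.
    CREATE TABLE dbo.FactSales
    (
        FactSalesID bigint IDENTITY(1,1) NOT FOR REPLICATION NOT NULL PRIMARY KEY,
        SaleDate    date          NOT NULL,
        CustomerID  int           NOT NULL,
        Amount      decimal(18,2) NOT NULL
    );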
