I have set up a Change Data Capture (CDC) pipeline using PeerDB to mirror tables from a PostgreSQL standby read replica to ClickHouse.
• The PostgreSQL database contains terabytes of data.
• The initial snapshot of the existing data needs to be loaded into ClickHouse.
• PeerDB is configured to pull from the standby read replica.
Questions:
- How long will the initial snapshot take? Are there any benchmarks or estimates based on database size?
- Will the initial snapshot affect the standby PostgreSQL server’s performance?
• Since it is a read replica, will PeerDB’s snapshot queries (e.g., COPY, SELECT * FROM) put significant load on it?
• Would it impact replication lag from the primary database?
- Are there any best practices to optimize the initial snapshot process to minimize impact on the standby server?