
We are experimenting with Patroni multi-datacenter HA on PostgreSQL 16 using replication slots. At a high level, to build the remote cluster from scratch we run a base backup while the active cluster is receiving writes.
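
Whether run by hand or by Patroni's basebackup replica method, the initial copy is essentially pg_basebackup. A minimal sketch of a manual run, assuming the active primary is reachable as active.example.com with a replication user repl (both hypothetical names):

# Hypothetical host/user; stream WAL during the copy so writes happening on the
# active cluster while the backup runs are captured as well.
pg_basebackup \
  --host=active.example.com \
  --port=5432 \
  --username=repl \
  --pgdata=/var/lib/postgresql/16/main \
  --wal-method=stream \
  --checkpoint=fast \
  --progress --verbose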

We use a replication slot for the hot standby via Patroni. Here is an example of the active cluster config:


postgresql:
  parameters:
    archive_command: /bin/true
    archive_mode: 'on'
    checkpoint_completion_target: '0.9'
    checkpoint_timeout: 300
    default_statistics_target: 100
    effective_cache_size: 12GB
    effective_io_concurrency: 2
    fsync: 'on'
    full_page_writes: 'on'
    hot_standby: true
    hot_standby_feedback: true
    idle_in_transaction_session_timeout: 10min
    log_min_duration_statement: 10000
    log_min_error_statement: ERROR
    log_min_messages: WARNING
    log_temp_files: 4000
    maintenance_work_mem: 3GB
    max_connections: 512
    max_parallel_maintenance_workers: 4
    max_parallel_workers: 10
    max_parallel_workers_per_gather: 2
    max_prepared_transactions: 1024
    max_replication_slots: 10
    max_slot_wal_keep_size: 20GB
    max_wal_senders: 10
    max_wal_size: 2GB
    max_worker_processes: 10
    min_wal_size: 512MB
    random_page_cost: 2
    shared_buffers: 8GB
    synchronous_commit: remote_write
    track_functions: all
    track_io_timing: 'on'
    wal_buffers: 16MB
    wal_compression: 'on'
    wal_keep_segments: 128
    wal_level: replica
    work_mem: 16MB
  use_pg_rewind: true
  use_slots: true
retry_timeout: 14

What we see in experiments is that once PostgreSQL is started via Patroni after the base backup, it moves to the Patroni leader/running state; we then apply the standby cluster config:


curl -X PATCH http://localhost:8008/config \
     -H "Content-Type: application/json" \
     -d '{
           "standby_cluster": {
             "host": "standby.example.com",
             "port": 5432,
             "primary_slot_name": "standby_slot"
           }
          }'
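
After the PATCH, the applied dynamic configuration and the member state can be checked against the Patroni REST API. A quick sketch, assuming the API is on port 8008 and jq is available (jq is optional, only used for readability):

# Confirm the standby_cluster section landed in the dynamic configuration
curl -s http://localhost:8008/config | jq .standby_cluster

# Check the role/state Patroni reports for this member
curl -s http://localhost:8008/patroni | jq '.role, .state'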

This results in a pg_rewind, followed by the cluster moving to the Standby leader/streaming state most of the time. But at other times it moves to the Standby leader/starting state at the end of the rewind. What we see in the PostgreSQL logs is similar to the below:

2024-08-19 20:02:00 UTC:::@:[762363]:LOG:  contrecord is requested by 93/4C000028
2024-08-19 20:02:00 UTC:::@:[762363]:LOG:  waiting for WAL to become available at 93/4C000040

Further analysis of the LSN in the replication slot indicates that this record was generated after the base backup and was archived by the leader just after the base backup. In the standby cluster, the WAL file associated with the LSN is present in the pg_wal directory, but the database is stuck starting up: it never reaches the streaming state, and the replication slot is not active. Why is the WAL file not being sent to the hot standby, despite being present on the active cluster, so that it can move to the streaming state?
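
For reference, the two sides can be compared directly while the standby is stuck. A diagnostic sketch, assuming the slot is named standby_slot as in the PATCH above:

# On the active (primary) cluster: slot state and how far it has been consumed
sudo -u postgres psql -c "SELECT slot_name, active, restart_lsn, wal_status FROM pg_replication_slots WHERE slot_name = 'standby_slot';"

# On the standby leader: is a WAL receiver running at all, and up to which LSN?
sudo -u postgres psql -c "SELECT status, flushed_lsn, slot_name, conninfo FROM pg_stat_wal_receiver;"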

Ganesh

2 Answers


The key issue turned out to be the intermediate state transition after bootstrap: the database first transitions to the leader state, and only then is the standby leader config applied in Patroni. This triggers a pg_rewind, during which timeline convergence and redo seem to have issues. We modified the steps:

  1. Stop postgres

  2. patronictl remove cluster

  3. Clean up DCS config

  4. Apply the standby cluster Patroni config so the bootstrap goes through without an intermediate transition to the leader state (steps 1–3 are sketched below, followed by the bootstrap configuration)
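
A rough sketch of steps 1–3, assuming an etcd DCS, a Patroni config at /etc/patroni/postgres1.yml, and a cluster named pg-dc2 (all hypothetical names, adjust to your setup):

# 1. Stop Patroni, which also stops the managed PostgreSQL instance
systemctl stop patroni

# 2. Remove the cluster from the DCS (patronictl prompts to confirm the cluster name)
patronictl -c /etc/patroni/postgres1.yml remove pg-dc2

# 3. Clean up any leftover DCS keys for the cluster (etcd v3 shown, default /service namespace)
etcdctl del --prefix /service/pg-dc2

Step 4 then applies the standby-cluster bootstrap configuration below, so the node bootstraps directly as a standby leader: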

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    synchronous_mode: false
    slots:
      repslot:
        type: 'physical'
    standby_cluster:
      create_replica_methods:
      - basebackup
      host: 192.168.6.60
      port: 5432
      primary_slot_name: repslot
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: 'replica'
        hot_standby: 'on'
        synchronous_commit: 'local'
        wal_keep_segments: 960
        wal_compression: 'on'
        archive_mode: 'on'
        archive_command: '/bin/true'
        fsync: 'on'

PATH=/usr/lib/postgresql/16/bin:$PATH ./patroni.py postgres1.yml

2024-08-26 09:27:40,341 INFO: Selected new etcd server http://localhost:2379
2024-08-26 09:27:40,344 INFO: No PostgreSQL configuration items changed, nothing to reload.
2024-08-26 09:27:40,347 INFO: Lock owner: None; I am postgresql1
2024-08-26 09:27:40,350 INFO: trying to bootstrap a new standby leader
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/4000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_479100"
pg_basebackup: write-ahead log end point: 0/4000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed
2024-08-26 09:27:45,056 INFO: replica has been created using basebackup
2024-08-26 09:27:45,058 INFO: bootstrapped clone from remote member postgresql://127.0.0.1:5432
2024-08-26 09:27:45,242 INFO: postmaster pid=479106
localhost:5433 - no response
2024-08-26 09:27:45.263 CEST [479106] LOG: starting PostgreSQL 16.4 (Ubuntu 16.4-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2024-08-26 09:27:45.263 CEST [479106] LOG: listening on IPv4 address "127.0.0.1", port 5433
2024-08-26 09:27:45.277 CEST [479106] LOG: listening on Unix socket "../.s.PGSQL.5433"
2024-08-26 09:27:45.294 CEST [479110] LOG: database system was interrupted; last known up at 2024-08-26 09:27:44 CEST
2024-08-26 09:27:45.715 CEST [479110] LOG: entering standby mode
2024-08-26 09:27:45.715 CEST [479110] LOG: starting backup recovery with redo LSN 0/4000028, checkpoint LSN 0/4000060, on timeline ID 1
2024-08-26 09:27:45.738 CEST [479110] LOG: redo starts at 0/4000028
2024-08-26 09:27:45.746 CEST [479110] LOG: completed backup recovery with redo LSN 0/4000028 and end LSN 0/4000100
2024-08-26 09:27:45.746 CEST [479110] LOG: consistent recovery state reached at 0/4000100
2024-08-26 09:27:45.746 CEST [479106] LOG: database system is ready to accept read-only connections
2024-08-26 09:27:45.766 CEST [479113] LOG: started streaming WAL from primary at 0/5000000 on timeline 1
localhost:5433 - accepting connections
localhost:5433 - accepting connections
2024-08-26 09:27:46,271 WARNING: Could not activate Linux watchdog device: Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'
2024-08-26 09:27:46,271 INFO: establishing a new patroni heartbeat connection to postgres
2024-08-26 09:27:46,292 INFO: initialized a new cluster
2024-08-26 09:27:56,288 INFO: no action. I am (postgresql1), the standby leader with the lock

The above seems to fix the issue.
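
One way to confirm, assuming the same postgres1.yml as above and patronictl on the PATH: the member should report the Standby Leader role and the WAL receiver should be streaming.

patronictl -c postgres1.yml list
sudo -u postgres psql -p 5433 -c "SELECT status, flushed_lsn FROM pg_stat_wal_receiver;"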

Ganesh

There are several potential causes for this issue, so it would be good to have more information or log file contents. For example:

Did you check the privileges/owner of the WAL files or directory?

$ ls -lah /path/to/wal/file

What is the status of the replication slot on the primary server?

$ sudo -u postgres psql -c 'SELECT * FROM pg_replication_slots;'

Did you check network connectivity between the primary and standby servers? Is it working correctly?
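
A quick connectivity sketch from the standby host, assuming the primary is reachable as primary.example.com with a replication user repl (both hypothetical names):

$ pg_isready -h primary.example.com -p 5432
$ psql -h primary.example.com -p 5432 -U repl -d postgres -c 'SELECT 1;'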

Are checkpoint_timeout and wal_keep_size configured correctly to retain the WAL files?

sudo -u postgres psql -c 'SHOW checkpoint_timeout;'
sudo -u postgres psql -c 'SHOW wal_keep_size;'
Severalnines