
I have a host machine running a PostgreSQL Docker container that listens on and exposes port 5432 on the host. I have a set of scripts that upload a large set of records from a CSV into the Postgres container using PySpark and psycopg2. After I changed the host's IP address from the 192.168.1.0/24 range to the 10.0.0.1/24 range, the import script runs until 28,038 records and then times out. I've created a new PostgreSQL container and the same thing happens. I also destroyed the bridge network ls-network that was attached to the container, and the behavior persists.

Any thoughts on where I could start troubleshooting this issue? Other than destroying the network and re-creating it, I have no idea why this behavior is occurring. (FYI: the same scripts and data run perfectly fine on an EC2 instance, and they were working correctly prior to the IP migration of the host.)


1 Answer


PostgreSQL has a statement_timeout setting that defines how long a query is allowed to run before it's canceled. If the import process is hitting this timeout, it could be causing the connection to drop. Check the current setting with:

SHOW statement_timeout;

You can increase this value (it is given in milliseconds) or set it to 0 to disable the timeout entirely.
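
As a minimal sketch, you could disable the timeout for the import session from psycopg2 itself (the connection parameters here are placeholders; substitute your own host, database, and credentials):

import psycopg2

# Placeholder connection details for illustration only.
conn = psycopg2.connect(host="10.0.0.1", port=5432,
                        dbname="imports", user="postgres", password="secret")
with conn.cursor() as cur:
    # 0 disables the per-statement timeout for this session only;
    # ALTER DATABASE ... SET statement_timeout = 0; would persist it.
    cur.execute("SET statement_timeout = 0;")
conn.commit()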

Autocommit mode: if you're doing large transactions, you might also want to enable autocommit so that psycopg2 doesn't have to handle one very large commit at the end:

conn.autocommit = True
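
As a sketch of how that might look for a bulk load, you could combine autocommit with psycopg2's execute_batch so each round trip stays small (the table name, columns, and connection details below are hypothetical):

import psycopg2
from psycopg2.extras import execute_batch

conn = psycopg2.connect(host="10.0.0.1", port=5432,
                        dbname="imports", user="postgres", password="secret")
conn.autocommit = True  # each statement is committed as it completes

rows = [("a", 1), ("b", 2)]  # in practice, a chunk of the CSV / Spark partition
with conn.cursor() as cur:
    # Hypothetical table "records"; page_size controls rows per round trip.
    execute_batch(cur,
                  "INSERT INTO records (name, value) VALUES (%s, %s)",
                  rows,
                  page_size=1000)
conn.close()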

Finally, consider increasing the memory and CPU limits in your Docker configuration.
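
For example, if you start the container with docker run, the --memory and --cpus flags set those limits (the values below are placeholders to tune for your workload):

docker run --memory=4g --cpus=2 -p 5432:5432 --name pg postgres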