I'm trying to dump a table with 50M records to a file, and my goal is to reduce the time this takes. I usually use COPY metrics TO 'metrics.csv' DELIMITER ',' CSV; which can take around an hour in the best cases. I'm also interested in exporting the data in some plain format (avoiding pg_dump directory output).
One idea is to access the table through a condition or cursor that splits it into equally sized pieces, so that, for example, two COPY queries can run at the same time and roughly halve the total time.
Example:
COPY (SELECT * FROM metrics WHERE id < 25000000) TO 'metrics_1.csv' DELIMITER ',' CSV;
COPY (SELECT * FROM metrics WHERE id >= 25000000) TO 'metrics_2.csv' DELIMITER ',' CSV;
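
For reference, here's a minimal sketch of how I imagine driving the two range COPYs concurrently from the client side. It assumes psycopg2 is available, the table is metrics with an integer id column, and the connection string is a placeholder to be adapted:

import psycopg2
from concurrent.futures import ThreadPoolExecutor

DSN = "dbname=mydb user=postgres"  # hypothetical connection string, adjust as needed

RANGES = [
    ("metrics_1.csv", "SELECT * FROM metrics WHERE id < 25000000"),
    ("metrics_2.csv", "SELECT * FROM metrics WHERE id >= 25000000"),
]

def dump_range(path, query):
    # Each worker uses its own connection; COPY ... TO STDOUT streams the
    # rows to a client-side file instead of writing on the server.
    conn = psycopg2.connect(DSN)
    try:
        with conn.cursor() as cur, open(path, "w") as f:
            cur.copy_expert(f"COPY ({query}) TO STDOUT WITH (FORMAT csv)", f)
    finally:
        conn.close()

with ThreadPoolExecutor(max_workers=len(RANGES)) as pool:
    futures = [pool.submit(dump_range, path, query) for path, query in RANGES]
    for fut in futures:
        fut.result()  # surface any errors from the workers

Using COPY ... TO STDOUT also means the files land on the client and don't require the superuser (or file-writing) privileges that server-side COPY TO 'file' needs.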
Could partial indexes created on those conditions help?
Is there a good way to implement these partial COPY dumps of the table? Is there any other solution to dump this table faster?
PostgreSQL 11 / 100 GB RAM / 20 cores.
After some parallelization of COPY, I/O doesn't seem to be the bottleneck.
