
Hi, on an EC2 server I'm using the following to gzip SQL dumped from another server:

mysqldump -h $HOST -u $UNAME -p$PWORD --single-transaction $1 | gzip -5 > $2_`date +%d`.sql.gz

At the moment the SQL data is 560 MB, and here is the output from "free":

             total       used       free     shared    buffers     cached
Mem:       2049568    1731356     318212        360     144328     529472
-/+ buffers/cache:    1057556     992012
Swap:            0          0          0

I was wondering how this would work if I had 1 GB or 2 GB of SQL data. Does it gzip the data while it is being received, to minimize RAM usage? Or does it fetch the whole SQL dump first and then gzip it?

1 Answer


Other than the dump on the remote system, this command uses surprisingly little memory. mysqldump pages the data files into memory as needed. Indexes are unlikely to be used, so not all of the blocks in the data files need to be read. Besides the extra I/O to read blocks in from disk, there may be additional I/O as those blocks displace others in the buffer pool. Because this happens on the other system, the local impact is only a small amount of memory for network data buffers, plus the small amount mysqldump needs to construct its output.

mysqldump -h $HOST -u $UNAME -p$PWORD --single-transaction $1 
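
To see that the local side of the pipeline really does run in a small, bounded amount of memory, here is a sketch (Linux-specific, with mysqldump replaced by a text generator, since no server is available here) that runs the compression stage under a hard virtual-memory cap far smaller than the input:

```shell
#!/bin/sh
# Sketch: feed 100 MB of dump-like text through gzip -5 while capping gzip's
# virtual memory at 50 MB. If gzip needed to hold the whole stream, this
# would fail; because it streams in small chunks, it succeeds easily.
yes "INSERT INTO t VALUES (1,'abcdefgh');" \
  | head -c 104857600 \
  | sh -c 'ulimit -v 51200; exec gzip -5' \
  > capped.sql.gz
ls -l capped.sql.gz
```

The cap (51200 kB) and file name are illustrative only; gzip's real working set is on the order of a megabyte regardless of input size.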

On Linux/Unix platforms the pipe uses only enough memory to block-buffer the data on its way to gzip. gzip buffers some data so that it can optimize the compression; higher compression levels may buffer more data and will require more CPU. The output of mysqldump is highly compressible, as it contains long repeated strings.

| gzip -5
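
The level/CPU trade-off can be sketched with generated, repetitive dump-like input (the file name is illustrative):

```shell
#!/bin/sh
# Sketch: compare two gzip levels on repetitive, dump-like input.
# Higher levels search harder for back-references: smaller output, more CPU.
yes "INSERT INTO t VALUES (1,'abcdefgh');" | head -c 10485760 > in.sql
gzip -1 -c in.sql | wc -c   # fast, larger
gzip -9 -c in.sql | wc -c   # slower, smaller (or equal)
rm in.sql
```

On real dump data the gap between levels is usually modest; -5, as used above, is a reasonable middle ground.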

The redirection again block-buffers data before writing it to disk. Much of this memory shows up as buffers. Modern file systems may keep several seconds' worth of data in memory before flushing it to disk.

> $2_`date +%d`.sql.gz
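
One way to watch this write-behind caching on Linux is the Dirty counter in /proc/meminfo, which counts data the kernel has accepted but not yet flushed (a sketch; exact numbers vary, and other writers on a busy box add noise):

```shell
#!/bin/sh
# Sketch: observe not-yet-flushed ("dirty") pages before and after a write.
grep '^Dirty' /proc/meminfo          # baseline
dd if=/dev/zero of=big.tmp bs=1M count=50 2>/dev/null
grep '^Dirty' /proc/meminfo          # typically jumps after the write
sync                                 # force the flush the kernel would do anyway
grep '^Dirty' /proc/meminfo          # back near the baseline
rm big.tmp
```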

Data buffers remain in the pool after they have been written, and these blocks are quickly reclaimed if the system needs free memory. I've used systems where it took several days to fill memory to the point where data buffers needed to be reclaimed. While it appears you are using slightly over half of the available memory excluding buffers and cache, the buffers and cache themselves contribute significantly to performance.
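
The "-/+ buffers/cache" line in your free output is exactly this arithmetic; using its numbers (in kB):

```shell
#!/bin/sh
# The numbers from the question's `free` output, in kB.
free_kb=318212
buffers_kb=144328
cached_kb=529472
# Memory free right now plus memory the kernel can reclaim on demand:
echo $((free_kb + buffers_kb + cached_kb))   # 992012, the "-/+ buffers/cache" free column
```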

If this were all happening on one server, it might force blocks from the MySQL data files out of the buffer pool. Those blocks would have been served from memory rather than disk the next time they were read, so the dump might slow MySQL performance until they are replaced. As it is, that eviction happens on the remote system, where actively read blocks may be forced out as the data is read into memory.

BillThor