10

I'm trying to transfer about 100k files totaling 90 GB. Right now I'm using the rsync daemon, but it's slow at 3.4 MB/s and I need to do this a number of times. I'm wondering what options I have that would max out a 100 Mbit connection over the internet and be very reliable.

MDMarra
  • 101,323

7 Answers

11

Have you considered Sneakernet? With large data sets overnight shipping is often going to be faster and cheaper than transferring via the Internet.
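
As a rough sanity check for this particular data set: 90 GB at a saturated 100 Mbit/s (about 12 MB/s) works out to roughly two hours, while at the 3.4 MB/s you're seeing now it's closer to 7.5 hours, so weigh that against shipping turnaround and handling time.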

ceejayoz
  • 33,432
11

How? Or TL;DR

The fastest method I've found is a combination of tar, mbuffer and ssh.

E.g.:

tar zcf - bigfile.m4p | mbuffer -s 1K -m 512M | ssh otherhost "tar zxf -"

Using this I've achieved sustained local network transfers over 950 Mb/s on 1Gb links. Replace the paths in each tar command to be appropriate for what you're transferring.
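
For a whole directory tree of many small files (as in the question), a minimal sketch with placeholder paths and host would be the following (compression dropped here; add z back to both tar commands if your data compresses well):

tar cf - /path/to/files | mbuffer -s 128k -m 512M | ssh user@otherhost "tar xf - -C /destination"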

Why? mbuffer!

The biggest bottleneck in transferring large files over a network is, by far, disk I/O. The answer to that is mbuffer or buffer. They are largely similar, but mbuffer has some advantages. The default buffer size is 2MB for mbuffer and 1MB for buffer. Larger buffers are more likely to never be empty. Choosing a block size which is the lowest common multiple of the native block sizes on the source and destination filesystems will give the best performance.

Buffering is the thing that makes all the difference! Use it if you have it! If you don't have it, get it! Using (m)?buffer plus anything is better than anything by itself. It is almost literally a panacea for slow network file transfers.

If you're transferring multiple files, use tar to "lump" them together into a single data stream. If it's a single file, you can use cat or I/O redirection. The overhead of tar vs. cat is statistically insignificant, so I always use tar (or zfs send where I can) unless it's already a tarball. Neither of these is guaranteed to give you metadata (and in particular cat will not). If you want metadata, I'll leave that as an exercise for you.
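
For the single-file case with cat, a hedged sketch (file name, host and destination are placeholders):

cat bigfile.iso | mbuffer -s 128k -m 512M | ssh user@otherhost "cat > /destination/bigfile.iso"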

Finally, using ssh as the transport mechanism is both secure and carries very little overhead. Again, the overhead of ssh vs. nc is statistically insignificant.
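
If you do want to try nc on a trusted network, a rough sketch is to start a listener on the receiver and pipe into it from the sender (flag syntax varies between netcat implementations; port and paths are placeholders):

nc -l 9000 | tar xf - -C /destination   # on the receiver (some netcats want -l -p 9000)
tar cf - /path/to/files | mbuffer -s 128k -m 512M | nc otherhost 9000   # on the sender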

bahamat
  • 6,433
8

You mention "rsync," so I assume you are using Linux:

Why don't you create a tar or tar.gz file? The network transfer time for one big file is faster than for many small ones. You could even compress it if you wish...

Tar with no compression:

On the source server:

tar -cf file.tar /path/to/files/

Then on the receiving end:

cd /path/to/files/
tar -xf /path/to/file.tar

Tar with compression:

On the source server:

tar -czf file.tar.gz /path/to/files/

Then on the receiving end:

cd /path/to/files/
tar -xzf /path/to/file.tar.gz

You would simply use rsync to do the actual transfer of the (tar|tar.gz) files.
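
For that transfer step, a hedged example with placeholder paths that keeps rsync's ability to resume an interrupted copy:

rsync -av --partial --progress /path/to/file.tar.gz user@remotehost:/path/to/destination/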

Soviero
  • 4,426
5

You could try the tar and ssh trick described here:

tar cvzf - /wwwdata | ssh root@192.168.1.201 "dd of=/backup/wwwdata.tar.gz"

This should be rewritable to the following:

tar cvzf - /wwwdata | ssh root@192.168.1.201 "tar xvf -"

You'd lose the --partial feature of rsync in the process, though. If the files don't change very frequently, living with a slow initial rsync could be highly worthwhile, as it will go much faster in the future.
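
After that initial copy, the follow-up syncs could be plain rsync over ssh, e.g. (assuming the data ends up extracted under /backup/wwwdata on the far side):

rsync -az --partial /wwwdata/ root@192.168.1.201:/backup/wwwdata/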

warren
  • 19,297
2

You can use various compression options of rsync.

-z, --compress              compress file data during the transfer
     --compress-level=NUM    explicitly set compression level
     --skip-compress=LIST    skip compressing files with suffix in LIST

The compression ratio for already-compressed binary files is very low, so you can skip those files using --skip-compress, e.g. ISO images, already-archived and compressed tarballs, etc.
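
For example, a hedged invocation that compresses the transfer but skips suffixes that are usually already compressed (paths and host are placeholders):

rsync -avz --compress-level=6 --skip-compress=gz/zip/bz2/xz/iso/jpg/png/mp4 /path/to/files/ user@remotehost:/destination/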

0

My best solution, after a fair bit of testing, is using rsync (to tell me which files differ between the two directories) to tar to zstd (compression level 2 works for me) to mbuffer (to max out the network on my 1 Gb LAN), and then the reverse of that on the receiving side.

On receiving side:

mbuffer -s 128k -m 1G -I8080 | tar -vx -C /zos25/z --use-compress-program=zstdmt

On sending side: (run cmd from /zos25/z)

rsync --info=name --out-format="%n" -ainAXEtp /zos25/z/ root@ipaddress:/zos25/z | tar -I 'zstdmt -2' -cvf - -T - | mbuffer -s 128k -m 1G -O ipaddress:8080

I ran ssh-keygen on both sides and copied keys so the ssh didn't ask for a password.
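
If it helps, one common way to do that (key type and host are just examples):

ssh-keygen -t ed25519
ssh-copy-id root@ipaddress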

I have 4-way laptops at either end, so CPU is not an issue; the network is the bottleneck. I get 70 MB/s with data that compresses by about 80%, so it's pretty efficient. I would recommend you play with the zstdmt -2 value to see whether more or less compression affects throughput. A faster network could cope with less compression (zstdmt -1).

Lunar
  • 103
-5

I'm a big fan of SFTP. I use SFTP to transfer media from my main computer to my server, and I get good speeds over the LAN.

SFTP is reliable; I'd give it a shot, as it's easy to set up and it could be faster in some cases.
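
For example, a recursive upload with OpenSSH's sftp client (paths and host are placeholders; put -r needs a reasonably recent OpenSSH):

echo "put -r /local/media /remote/media" | sftp user@server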

Tillman32
  • 137