rsync: the one copy command to rule them all

rsync is a command line utility available for UNIX and OSX that copies files. And it does it well. In fact, I have it set as an alias for the regular cp command in my ~/.bashrc:

alias cp='rsync -ae ssh'

This means that whenever I type cp file other_file it gets translated to rsync -ae ssh file other_file.

Basic usage

In its most basic form, rsync behaves just like cp. However, it comes with a huge list of command line arguments (you should go and read man rsync if you get the chance) that unlock neat features. Here are my favorites:

rsync --progress -h

With these flags, rsync shows progress information during the transfer (-h prints the numbers in a human-readable format), so you get an estimate of when the transfer of a big file will finish.
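To try these flags without a remote machine, a small local sketch might look like the following. The scratch directory and the file name big_file.dat are made up for this illustration.

```shell
# Scratch directory and file name are made up for this illustration.
demo=$(mktemp -d)
mkdir "$demo/dst"

# Create a test file of a few megabytes.
seq 1 1000000 > "$demo/big_file.dat"

# Copy it while showing progress; tee keeps a copy of the output in a log.
rsync --progress -h "$demo/big_file.dat" "$demo/dst/" | tee "$demo/progress.log"
```

A local copy of a file this small finishes almost instantly, so the progress line is far more informative on a slow link or with a much larger file.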

rsync -e ssh

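With this, rsync will use SSH to copy files across machines! For example (with user@remote.example.com standing in for your own login and host):

rsync -e ssh files_to_copy user@remote.example.com:where/you/want/it/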

or if you have configured your SSH properly (see a previous post) it simply becomes:

rsync -e ssh files_to_copy remote:where/you/want/it/

Of course, it can also download files from a remote machine:

rsync -e ssh remote:files_to_copy ./where/you/want/it/

A nice feature of rsync is that it skips files that already exist unchanged at the destination (by default it compares file size and modification time), so it will not bother to re-copy them. This means you can usually be lazy and just copy whole directories.
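One way to watch this happen is to run the same copy twice and count the transferred files. The sketch below is local only and uses made-up file names; the -i (--itemize-changes) flag is not covered elsewhere in this post and is used here just to make rsync report what it does.

```shell
# Scratch directory and file names are made up for this illustration.
demo=$(mktemp -d)
mkdir "$demo/src" "$demo/dst"
echo "first" > "$demo/src/a.txt"
echo "second" > "$demo/src/b.txt"

# -i prints one line per item rsync updates;
# lines starting with ">f" are files that were transferred.
first_run=$(rsync -ai "$demo/src/" "$demo/dst/" | grep -c '^>f')
second_run=$(rsync -ai "$demo/src/" "$demo/dst/" | grep -c '^>f')

echo "files transferred on first run:  $first_run"
echo "files transferred on second run: $second_run"
rm -rf "$demo"
```

The -a flag matters here: it preserves modification times, which is what lets the second run recognise the files as unchanged.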

Another neat flag is:

rsync --append huge_file.dat some/other/place/huge_file.dat

The --append flag will make rsync detect whether part of the file has been copied earlier and start where it left off. This is great when downloading large files over a shaky wifi connection. More than once I’ve started a download during a meeting/lecture/talk, closed my laptop when the talk was over, and later resumed the download in my office.
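The resume behaviour can be tried locally by faking an interrupted transfer. This is only a sketch: the scratch directory is made up, and the "interruption" is simulated by copying the first part of the file by hand.

```shell
# Scratch directory is made up for this illustration.
demo=$(mktemp -d)
mkdir "$demo/dst"
seq 1 200000 > "$demo/huge_file.dat"

# Simulate an interrupted transfer: only the first 500000 bytes arrived.
head -c 500000 "$demo/huge_file.dat" > "$demo/dst/huge_file.dat"

# Resume: rsync treats the existing bytes as already copied and sends the rest.
rsync --append "$demo/huge_file.dat" "$demo/dst/huge_file.dat"

# cmp -s is silent and succeeds only if both files are byte-for-byte equal.
if cmp -s "$demo/huge_file.dat" "$demo/dst/huge_file.dat"; then
    resumed=yes
else
    resumed=no
fi
echo "destination matches source: $resumed"
rm -rf "$demo"
```

Note that --append trusts the part that is already at the destination. If that part might be damaged, rsync also offers --append-verify, which checks the existing data as part of the transfer.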

And finally, this is an important one as well:

rsync -a -e ssh files_to_copy remote:where/you/want/it

The -a flag (archive mode) makes rsync recurse into directories and do its best to preserve the owner, group, permissions, and modification times of the copied files. This is almost always what you want.
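A quick local sketch of the difference, looking only at modification times: the same file is copied once without and once with -a. The scratch directory and the file name notes.txt are made up for this illustration.

```shell
# Scratch directory and file name are made up for this illustration.
demo=$(mktemp -d)
mkdir "$demo/plain" "$demo/archive"
echo "some notes" > "$demo/notes.txt"

# Give the source an old modification time (1 January 2020, 00:00).
touch -t 202001010000 "$demo/notes.txt"

rsync    "$demo/notes.txt" "$demo/plain/"
rsync -a "$demo/notes.txt" "$demo/archive/"

# [ x -nt y ] is true when x has a newer modification time than y.
if [ "$demo/plain/notes.txt" -nt "$demo/notes.txt" ]; then
    plain_time=new
else
    plain_time=kept
fi
if [ "$demo/archive/notes.txt" -nt "$demo/notes.txt" ]; then
    archive_time=new
else
    archive_time=kept
fi
echo "without -a: modification time $plain_time"
echo "with -a:    modification time $archive_time"
rm -rf "$demo"
```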

A caveat

By default, rsync will detect whether the destination file already exists. When you are transferring the file over a network, rsync will attempt to be clever and only transmit the differences between the source and destination file. This can speed things up tremendously. However, when transferring large (multi-GB) files over a very fast network connection, computing the differences between the files becomes a performance bottleneck and you will notice very low transfer speeds (only several MB/s). In this case, tell rsync not to bother being clever and to just transfer the entire file with the -W flag:

rsync -W -e ssh big_file.dat remote:where/you/want/it/
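To see what the difference computation buys you, and what -W switches off, the --stats flag (not covered elsewhere in this post) reports how much data rsync matched against the existing destination file. The sketch below is local only and uses a made-up scratch directory. Because rsync already defaults to whole-file mode for local copies, the clever mode is requested explicitly with --no-whole-file.

```shell
# Scratch directory is made up for this illustration.
demo=$(mktemp -d)
mkdir "$demo/delta" "$demo/whole"
seq 1 200000 > "$demo/big_file.dat"

# Both destinations start with an older copy of the file.
cp "$demo/big_file.dat" "$demo/delta/"
cp "$demo/big_file.dat" "$demo/whole/"

# Change the source slightly.
echo "one more line" >> "$demo/big_file.dat"

# Local copies default to whole-file mode, so the delta transfer
# has to be requested explicitly with --no-whole-file.
delta_stats=$(rsync --no-whole-file --stats "$demo/big_file.dat" "$demo/delta/")
whole_stats=$(rsync -W --stats "$demo/big_file.dat" "$demo/whole/")

echo "delta transfer: $(echo "$delta_stats" | grep 'Matched data')"
echo "whole file:     $(echo "$whole_stats" | grep 'Matched data')"
rm -rf "$demo"
```

"Matched data" is the part rsync reused from the destination instead of sending it again. With -W nothing is matched, which costs bandwidth but skips the checksum work that becomes the bottleneck on fast links.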

Set it as an alias

Nobody would expect you to actually remember and type out the above commands. Just set an alias for some scenarios that occur often by appending the following to your ~/.bashrc:

alias cp='rsync -ae ssh'
alias cpv='rsync -vhae ssh --progress'
alias cpa='rsync -vhae ssh --progress --append'

With this in place, you can just treat rsync as an improved version of the cp command:

$ cpa remote:/data/big_recording.fif .
receiving incremental file list
  1,138,341,699 100%   81.11MB/s    0:00:13 (xfr#1, to-chk=0/1)
sent 30 bytes  received 1,138,480,768 bytes  84,331,910.96 bytes/sec
total size is 1,138,341,699  speedup is 1.00

Would you like to know more?