The goal is to transfer the maximum amount of data with the minimum overhead. The more data you pack into fewer files, the faster the transfer will be. Use tar to bundle many small files into a single archive before transferring. Compression can also speed up a transfer if the data is not already compressed and compresses significantly.
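As a rough illustration of the packing advice, the sketch below bundles a directory of small files into one gzip-compressed tar archive using Python's standard tarfile module. The function name pack_directory and the sample file names are made up for this example; in practice the tar command line does the same job.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_directory(src_dir: str, archive_path: str, compress: bool = True) -> int:
    """Pack every file under src_dir into one tar archive; return the file count."""
    mode = "w:gz" if compress else "w"  # gzip only helps if the data is compressible
    count = 0
    with tarfile.open(archive_path, mode) as tar:
        for path in sorted(Path(src_dir).rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir so the archive unpacks cleanly.
                tar.add(path, arcname=path.relative_to(src_dir).as_posix())
                count += 1
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "small_files"  # hypothetical sample data
        src.mkdir()
        for i in range(100):
            (src / f"file_{i:03d}.txt").write_text("sample data\n" * 50)
        archive = Path(tmp) / "bundle.tar.gz"
        n = pack_directory(str(src), str(archive))
        print(f"packed {n} files into one archive of {archive.stat().st_size} bytes")
```

The transfer tool then moves one file instead of a hundred, which avoids the per-file overhead described above.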
Globus Online/GridFTP - fastest for streaming data transfer - requires IT assistance
bbcp - fastest, most convenient single-node method for transferring data from SLAC - Linux only
bbcp does not encrypt the data stream; it uses ssh for authentication only.
bbcp file user@remotesite:/destination
bbcp -w 2M -s 10 -c 1 file user@remotesite:/destination
(window size 2M, 10 streams, compression value 1)
Note: bbcp is very slow at copying deep directory trees of small files. Use tar or the named pipes option "-N io":
bbcp -w 2M -s 10 -N io 'tar -cv -O /w2' remotehost:'tar -C /nffs/w2 -xf -'
bbcp requires a range of at least 8 TCP ports to be open through the firewall.
FDT (Fast Data Transfer), from Caltech. See: http://monalisa.cern.ch/FDT/
FDT is an application for efficient data transfers that is capable of reading and writing at disk speed over wide area networks (with standard TCP). It is written in Java, runs on all major platforms, and is easy to use. It does not encrypt the data stream.
FDT can be used to stream a large set of files across the network, so that a large dataset composed of thousands of files can be sent or received at full speed, without the network transfer restarting between files. May be slower than bbcp.
Secure Copy mode:
In this mode the server will be started on the remote system automatically by the local FDT client using SSH.
java -jar fdt.jar -P 8 -ss 2M -r /home/localuser/local/data remoteuser@remote_address:/home/remoteusers/destination_dir
By default FDT will use port 54321 on the remote host which must be open. The -P option specifies how many parallel streams to use. The default is 4. The -ss option specifies the window size. The -r option (recursive mode) is to copy the entire directory and all of its children. These options can be adjusted to provide better performance.
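To see why the stream count and window size matter, recall that a single TCP stream can deliver at most one window of data per round trip. The sketch below works through that bound. It assumes "2M" means 2 MiB and borrows the 53 ms round-trip time and 10 Gbps capacity quoted for the Berkeley to Argonne tests in this document. The function names are illustrative, and real transfers will fall below these ceilings.

```python
def max_stream_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for one TCP stream: one window delivered per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


def bandwidth_delay_product_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep a link of the given capacity full."""
    return link_bps * rtt_seconds / 8


if __name__ == "__main__":
    window = 2 * 1024 * 1024  # assumption: "2M" means 2 MiB
    rtt = 0.053               # 53 ms round-trip time
    per_stream = max_stream_throughput_mbps(window, rtt)
    bdp_mb = bandwidth_delay_product_bytes(10e9, rtt) / 1e6
    print(f"one stream:    at most {per_stream:.0f} Mbps")
    print(f"eight streams: at most {8 * per_stream:.0f} Mbps")
    print(f"in-flight data needed to fill 10 Gbps: {bdp_mb:.2f} MB")
```

Under these assumptions a single 2M-window stream tops out at a few hundred Mbps, which is why parallel streams or a larger window are needed to approach the link capacity.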
Users have reported about 2x the speed of standard scp. Transferring 10GB (318 x 32MB files) from Sector 23 to the University of Michigan:
scp: 16m14s
FDT: 8m50s
bbcp: 8m12s
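The timings above can be turned into throughput and speedup figures with a little arithmetic. The sketch below takes the quoted numbers at face value (318 files of 32MB each, so 10176 MB in total). The helper names are made up for the example.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert a string such as '16m14s' to a number of seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", duration)
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)


TOTAL_MB = 318 * 32  # 318 files of 32MB each, as quoted
TIMINGS = {"scp": "16m14s", "FDT": "8m50s", "bbcp": "8m12s"}


def throughput_mb_per_s(total_mb: float, duration: str) -> float:
    """Average transfer rate in MB/s for the given elapsed time."""
    return total_mb / to_seconds(duration)


if __name__ == "__main__":
    scp_seconds = to_seconds(TIMINGS["scp"])
    for tool, duration in TIMINGS.items():
        rate = throughput_mb_per_s(TOTAL_MB, duration)
        speedup = scp_seconds / to_seconds(duration)
        print(f"{tool:5s} {rate:5.1f} MB/s  {speedup:.2f}x scp")
```

This puts FDT at roughly 1.8x and bbcp at roughly 2x the scp rate, consistent with the "2x" reports.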
For Windows:
java -jar fdt.jar -pull -c winHost -d /localdir -r -fl fileList
On Windows, use the -fl option to supply a file that lists the names of the files to transfer.
Sample Results: disk-to-disk testing from Berkeley, CA to Argonne, IL (near Chicago). RTT = 53 ms, network capacity = 10Gbps, RAID = 4 disks, RAID Level-0. Note that getting more than 1 Gbps (125 MB/s) disk to disk requires RAID.
scp 140 Mbps (17.5 MB/s)
HPN patched scp, 1 disk 760 Mbps (95 MB/s)
HPN patched scp, RAID disk 1.2 Gbps (150 MB/s)
GridFTP, 1 stream, 1 disk 760 Mbps (95 MB/s)
GridFTP, 1 stream, RAID disk 1.4 Gbps (175 MB/s)
GridFTP, 4 streams, RAID disk 5.4 Gbps (675 MB/s)
GridFTP, 8 streams, RAID disk 6.6 Gbps (825 MB/s)
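The two units in these results are related by a factor of 8 (bits per byte). As a small sanity check, the sketch below recomputes the MB/s column from the Mbps column. The dictionary labels are shortened from the rows above.

```python
def mbps_to_mbytes_per_s(mbps: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8


# Throughput in Mbps, taken from the sample results
RESULTS_MBPS = {
    "scp": 140,
    "HPN scp, 1 disk": 760,
    "HPN scp, RAID": 1200,
    "GridFTP, 1 stream, 1 disk": 760,
    "GridFTP, 1 stream, RAID": 1400,
    "GridFTP, 4 streams, RAID": 5400,
    "GridFTP, 8 streams, RAID": 6600,
}

if __name__ == "__main__":
    for label, mbps in RESULTS_MBPS.items():
        print(f"{label:28s} {mbps:5d} Mbps = {mbps_to_mbytes_per_s(mbps):6.1f} MB/s")
```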
Say NO to scp:
scp, sftp and rsync perform poorly on a WAN. scp is 10x slower than single-stream GridFTP, and 50x slower than parallel GridFTP.
Berkeley, CA to Argonne, IL (near Chicago).
RTT = 53 ms, network capacity = 10Gbps.
scp 140 Mbps
HPN patched scp 1.2 Gbps
GridFTP, 1 stream 1.4 Gbps
GridFTP, 4 streams 5.4 Gbps
GridFTP, 8 streams 6.6 Gbps
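The "10x" and "50x" figures can be checked against the numbers listed here. The short sketch below divides each GridFTP rate by the scp rate. The helper name is illustrative.

```python
SCP_MBPS = 140
GRIDFTP_MBPS = {"1 stream": 1400, "4 streams": 5400, "8 streams": 6600}


def speed_ratio(baseline_mbps: float, other_mbps: float) -> float:
    """How many times faster other_mbps is than baseline_mbps."""
    return other_mbps / baseline_mbps


if __name__ == "__main__":
    for streams, mbps in GRIDFTP_MBPS.items():
        print(f"GridFTP, {streams}: {speed_ratio(SCP_MBPS, mbps):.1f}x scp")
```

Single-stream GridFTP comes out at exactly 10x, and the 8-stream case at about 47x, which the text rounds to 50x.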
HPN Patch for SSH/SCP from Pittsburgh Supercomputing Center:
Patch set designed to remove a networking bottleneck in the base OpenSSH code. Significant performance increase. https://www.psc.edu/
To see if your end-to-end path is jumbo clean:
ping -M do -s 8972 ip_address
Header math: 20 bytes IP + 8 bytes ICMP + 8972 bytes payload = 9000 bytes
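The same header math gives the ping payload size for any MTU. The sketch below is a minimal helper, assuming an IPv4 header without options (20 bytes) and the standard 8-byte ICMP echo header. The function name is made up for the example.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def ping_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet of size mtu."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError(f"MTU {mtu} is smaller than the IP and ICMP headers")
    return payload


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: ping -M do -s {ping_payload_for_mtu(mtu)} ip_address")
```

For a 9000-byte jumbo MTU this reproduces the 8972-byte payload used above. On Linux, the -M do option forbids fragmentation, so the ping only succeeds if every hop accepts the full-size packet.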