How to transfer large data via network

This article explores a variety of efficient data transfer methods, including using tools such as bbcp, lftp, and FastDataTransfer to improve transfer speeds across wide area networks, and discusses the importance of compression, parallel transfers, and avoiding redundant transfers.

1. Executive Summary

If you have to transfer data, transfer only that which is necessary. If you unavoidably have TBs to transfer regularly, consider having your institution set up a GridFTP node.

If GridFTP is not available, a very easy user-side transfer approach is using a Globus Online endpoint. While the Globus technology is free, subscription support provides more functionality, but also the possibility of service interruption. Depending on cost relative to Globus, Aspera may be very effective as well, providing extremely fast data transfer, albeit requiring a licensed server. The fastest, easiest, user-mode, node-to-node method (that remains free) to move data for Linux and MacOSX is with bbcp. Note that it is quite sensitive to tuning, which may limit its ease for naive users. An exception is for extremely large directory trees, for which bbcp is inefficient due to the time required for building the directory tree. In that case, rsync may be an easier choice, although bbcp offers a named-pipe option which can use an external app to do the recursive operation. lftp is a less sophisticated, but more widely available alternative to bbcp.

For first-time transfers of multi-GB directory trees containing 10,000s of files, the use of tar & netcat seems to be the fastest way to move the data. tnc is a Perl wrapper (see below) that helps in this regard.

If you use Windows, fdt is Java-based and will run there as well.

Note that bbcp and the similar bbftp can require considerable tuning to extract maximum bandwidth. If these applications do not work at expected rates, ESNet’s Guide to Bulk Data Transfer over a WAN is an excellent summary of the deeper network issues. (Thanks to Rob Wells for the link change info.)

And everyone should know how to use rsync, which is available on most *nix systems and should be the default fallback for most data transfers. Parallel wrappers for rsync exist which can speed up large transfers, especially over WANs. Read more about this below.

2. What Data Where

2.1. qdirstat

The elegant, Qt-only, open source qdirstat (the latest iteration of the original, beautiful, but dependency-ridden kdirstat) and its ports to MacOSX (Disk Inventory X) and Windows (WinDirStat) are quick ways to visualize what's taking up space on your disk, so you can either exclude the unwanted data from the copy or delete it to make more space. All of these are fully native GUI applications that show disk space utilization by file type and directory structure.
[qdirstat screenshot]

2.2. gt5

Unlike qdirstat above which requires graphics mode, gt5 (Linux only, altho Win10 now supports Ubuntu Linux utils) is a very slick, simple, fast terminal app which allows you to recursively identify large dirs and cursor your way thru them.

3. Problems with moving data across WANs

This may not be the information you’re looking for, but it helps to form the mental picture of what’s happening to your data as it flies across the wires. If you already know the diffs between TCP and UDP and how and why ping times are important, please feel free to skip down to the more immediately useful bits.

We all need to transfer data, and the amount of that data is increasing as the world gets more digital.

The usual methods of transferring data (scp, and http/ftp utilities such as curl or wget) work fine when your data is in the MB or even GB range, but when you have very large collections of data there are some tricks that are worth mentioning, especially if you are transferring them across Wide Area Networks (WANs).

3.1. Packet latency

ping times are a measure of the round-trip time (RTT) it takes for a packet to reach a destination and have an acknowledgement return. For example, I’m writing this on a laptop in Irvine CA. The ping time to my home wireless router is about 1ms. Because my ISP is Cox, a ping to a computer down the hill at UCI (moo) is more than 10x that at about 15ms, since according to traceroute, that ping has to travel thru 15 devices to LA and back.

And a ping across the country to Nova Scotia (7200 roundtrip miles by road) takes more than 100x as long, about 108ms, across at least 20 devices. Since that time would allow an unimpeded photon to travel about 16,000 miles, the rest of the delay is due to cable and device delays, which effectively slow the communications to about 40% of the speed of light.
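If you want to check these numbers for your own path, ping and traceroute (standard on Linux and MacOSX) will give you the RTT and the hop count; the hostname below is a placeholder:

# average RTT over 5 packets
ping -c 5 remotehost.example.org

# list the devices the packets traverse on the way
traceroute remotehost.example.org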

3.2. TCP

The RTT above is important because the TCP protocol works by verifying the arrival of each packet, which requires a network round trip for each packet sent. When the ping times increase, obviously the number of packets that can be verified per time period decreases so TCP works slower over greater distances. This is bad. Not to belabor the point, but if you send a serial stream of TCP packets (FTP, rsync or almost any of the protocols mentioned in this doc), the rate at which you can send them, receive verification, and send another decreases as the ping time increases. This constraint is known as the Bandwidth-delay product and is a major component of why that shiny new 100Gb network switch yields such lousy performance over long distances.

There are a few ways to bypass or improve on this problem. You can compress your data before sending it, in effect sending more data in the same packet. You can also try increasing the packet size, known as the Maximum Transmission Unit (MTU). The problem with the latter is that many commodity Internet links set MTUs fairly low (1500 bytes), altho high speed devices or academic networks often allow the use of jumbo frames, which are MTUs with a payload of up to 9000 bytes. LAN MTUs can be as high as 64KB, allowing much more efficient transfer.
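To make the bandwidth-delay product concrete, here is a hedged back-of-the-envelope sketch using the ~108ms RTT above and an assumed 1Gb/s link; the jumbo-frame check uses the Linux ping syntax and a placeholder hostname:

# BDP = bandwidth x RTT = data that must be 'in flight' to keep the pipe full
# (1,000,000,000 bits/s / 8) * 0.108 s = ~13.5 MB
echo "(1000000000/8) * 0.108 / 1048576" | bc -l    # => ~12.9 MiB

# will a 9000-byte jumbo frame survive the path without fragmenting?
# (8972 = 9000 - 20-byte IP header - 8-byte ICMP header)
ping -M do -s 8972 -c 3 remotehost.example.org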

3.3. UDP

You can also skip the TCP protocol entirely and use the UDP protocol. This is a much less reliable mechanism for data transmission since it does not verify packet delivery or order. However, the underlying hardware for modern networks has gotten so reliable that UDP is again gaining use by wrapping an unreliable protocol with sideband TCP integrity checking so that very large (64K and larger) packets can be sent very quickly. GridFTP and its Globus variants Connect/MultiConnect, Aspera, Signiant, RBUDP, Tsunami, and other data transfer mechanisms use (or can use) this approach. However none of those are very easy to set up and use on an ad hoc basis (and Aspera & Signiant are not free). As a side note, Google has released its QUIC (Quick UDP Internet Connections) code for testing here, altho it currently seems alpha stage.

3.4. Parallel TCP

The last way of addressing the inherent limitation of long distance data transfer is to send multiple streams of TCP packets simultaneously. This parallel transfer of data is increasingly being used by a variety of applications or wrappers of existing applications. Examples include Google’s parallel composite transfers using gsutil and the much easier rclone (see also Using rclone to push data to your Google Drive).

NB: gsutil and rclone operate only with relatively blobby object filesystems on both ends; the rsync family works with POSIX filesystems and can thus do real syncing operations. Here’s a more extensive description of the differences.

bbcp, parsyncfp, and fpsync (part of the fpart pkg) all use this mechanism. The best one depends on your network, your endpoints, and what control you have over those endpoints. rclone was designed to transfer data to the Amazon, Google, and other clouds and supports many of those authentication protocols; parsyncfp and fpsync are essentially data-balancing parallel rsyncs, which assume a shell account on both ends of the network and the ability to set up ssh keys; bbcp can adjust the number of TCP streams and packet window sizes to increase bandwidth considerably.

Note that both rclone and gsutil (and its parent toolkit boto) can handle cloud authentication protocols, whereas both rsync/parsyncfp and bbcp use ssh to authenticate connections. Also, bbcp does not compress or encrypt its data stream(s) unless requested via flag or a pipe to an external program.
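As a hedged sketch of the parallel approach with rclone (the remote name and paths are placeholders, and the remote must already have been set up with rclone config):

# copy a local tree to a pre-configured cloud remote using 8 parallel transfers
rclone copy /data/projects myremote:backups/projects --transfers 8 --progress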

Note

A note about transferring Zillions Of Tiny (ZOT) files

Altho much big data is showing up in very large files (10s or 1000s of GB each), there is a lot of traffic in small files, often generated by naive users who are creating many (100K to 1,000K) such files in a single analytical run. (Trinity, I’m looking at you.)

It’s worth a few words about the size and number of files. A file on a disk is characterized not only by its contents but by the file descriptor itself. Each file requires the lookup and examination of an inode structure to find out where the disk blocks of that file are kept. Obviously if you have 1GB of data in 1 file, it will be accessible much more quickly than if you have to look up 1 million files of 1000 bytes each. This has implications when you’re transferring data on an active system. You usually want to transfer the maximum data with the minimum overhead, so if your files are large, it will transfer more rapidly. Here’s an example.

A Mail dir on my laptop contains 95MB of information in 32,304 files and dirs. It takes 12s to move to a remote server over 1GbE when being copied file by file. It takes about 3s to store all the files and dirs in an uncompressed tar file but then takes only 5s for the single file that contains all that data to transfer to the same server over the same connection. This difference is accentuated as the number of files increases and the network hop-count increases.

The more data you can pack into fewer files, the faster your transfer will be. Obviously if it’s a few files over a private, fast, direct-attached filesystem, it won’t be significant, but when you’re moving ZOT files over a Wide Area Network or even across networked filesystems, it can make a huge difference.
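A minimal sketch of the pack-first approach described above (hostnames and paths are placeholders):

# pack the many-file tree into a single archive, move that one file, unpack remotely
tar -cf Mail.tar Mail/
scp Mail.tar user@server.example.org:/tmp/
ssh user@server.example.org 'tar -xf /tmp/Mail.tar -C /home/user/ && rm /tmp/Mail.tar'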

4. Compression & Encryption

Whether to compress and/or encrypt your data in transit depends on the cost of doing so. For a modern desktop or laptop computer, the CPU(s) are usually not doing much of anything, so the cost incurred in doing the compression/encryption is generally not even noticed. However on an otherwise loaded machine, it can be significant, so it depends on what has to be done at the same time. Compression can reduce the amount of data that needs to be transmitted considerably if the data is of a type that is compressible (text, XML, uncompressed images and music); however, increasingly such data is already compressed on the disk (in the form of jpeg or mp3 compression), and compressing already compressed data yields little improvement. Some compression utilities try to detect already-compressed data and skip it, so there’s often no penalty in requesting compression, but some utilities (like the popular Linux archiver tar) will not detect it correctly and waste lots of time trying.

As an extreme example, here’s the timing of making a tar archive of a large directory that consists of mostly already compressed data, using compression or not.

Using compression:

$ time tar -czpf /bduc/data.tar.gz /data
tar: Removing leading `/' from member names

real    201m38.540s
user    95m32.114s
sys     7m13.807s

tar file = 84,284,016,900 bytes

NOT using compression:

$ time tar -cpf /bduc/data.tar /data
tar: Removing leading `/' from member names

real    127m13.404s
user    0m43.579s
sys     5m35.437s

tar file = 86,237,952,000 bytes

It took more than 74 minutes (about 58%) longer using compression, which saved only about 2GB of storage (a 2.3% decrease in size). YMMV.

Note

Parallel compression/decompression

There are now parallel compression/decompression routines that will, for large files, help substantially, by using all the available CPU cores to do the compression.

From the same author as gzip comes pigz/unpigz (probably already in your repository), which is a near-drop-in replacement for gzip/gunzip. There is also a parallel bzip2 engine called pbzip2 that is a near-drop-in replacement for bzip2. For very large jobs there is also an MPI-capable bzip2 utility. The pigz compression accelerates on a per-file basis, so compressing ZOT files will not give you much of a speedup, but if you pass large files thru pigz, you’ll get close-to-perfect scaling.
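For example, here is a hedged sketch of swapping pigz into a tar pipeline (the core count, paths, and filenames are placeholders):

# compress with 8 cores while archiving
tar -cf - /data | pigz -p 8 > data.tar.gz

# decompress on the receiving side
unpigz -c data.tar.gz | tar -xf -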

Similarly, there is a computational cost to encrypting and decrypting a text, but less so than with compression. scp and sftp use ssh to do the underlying encryption and it does a very good job, but like other single-TCP-stream utilities such as curl and wget, it will only be able to push so much thru a connection.

5. Avoiding data transfer

The most efficient way to transfer data is not to transfer it at all. There are a number of utilities that can be used to assist in NOT transferring data. Some of them are listed below.

6. rsync

rsync, from the fertile mind of Andrew (samba) Tridgell, is an application that will synchronize 2 directory trees, transferring only blocks which are different. rsync deserves its own section - it’s one of the most elegant utilities you’ll find in computer science.

The open source rsync is included by default with almost all Linux and MacOSX distributions. Versions of rsync exist for Windows as well, via Cygwin, DeltaCopy, and others.

Note

rsync vs bbcp

bbcp can act similarly to rsync but will only checksum entire files, not blocks, so for sub-GB transfers, rsync is probably a better choice in general. For very large files or directory trees, bbcp may be a better choice due to its multi-stream protocol and therefore better bandwidth utilization.

Note also that rsync is often used with ssh as the remote shell protocol. If this is the case and you’re using it to transfer large amounts of data, note that there is an old, known ssh bug with the static flow control buffers that cripples it for large data transfers. There is a well-maintained patch for ssh that addresses this at the High Performance SSH/SCP page. This is well worth checking if you use rsync or scp for large transfers.

For example, if you had recently added some songs to your 120 GB MP3 collection and you wanted to refresh the collection to your backup machine, instead of sending the entire collection over the network, rsync would detect and send only the new songs.

The first time rsync is used to transfer a directory tree, there will be no speedup.

$ rsync -av ~/FF moo:~
building file list ... done
FF/
FF/6vxd7_10_2.pdf
FF/Advanced_Networking_SDSC_Feb_1_minutes_HJM_fw.doc
FF/Amazon Logitech $30 MIR MX Revolution mouse.pdf
FF/Atbatt.com_receipt.gif
FF/BAG_bicycle_advisory_group.letter.doc
FF/BAG_bicycle_advisory_group.letter.odt
 ...

sent 355001628 bytes  received 10070 bytes  11270212.63 bytes/sec
total size is 354923169  speedup is 1.00

but a few minutes later after adding danish_wind_industry.html to the FF directory

$ rsync -av ~/FF moo:~
building file list ... done
FF/
FF/danish_wind_industry.html

sent 63294 bytes  received 48 bytes  126684.00 bytes/sec
total size is 354971578  speedup is 5604.05

So the synchronization has a speedup of 5600-fold relative to the initial transfer.

Even more efficiently, if you had a huge database to back up and you had recently modified it so that most of the bits were identical, rsync would send only the blocks that contained the differences.

Here’s a modest example using a small binary database file:

$ rsync -av mlocate.db moo:~
building file list ... done
mlocate.db

sent 13580195 bytes  received 42 bytes  9053491.33 bytes/sec
total size is 13578416  speedup is 1.00

After the transfer, I update the database and rsync it again:

$ rsync -av mlocate.db moo:~
building file list ... done
mlocate.db

sent 632641 bytes  received 22182 bytes  1309646.00 bytes/sec
total size is 13614982  speedup is 20.79

There are many utilities based on rsync that are used to synchronize data on 2 sides of a connection by only transmitting the differences. The backup utility BackupPC is one.

6.1. Parallel rsyncs

There are a few parallel wrappers for rsync which can tremendously increase the speed at which large, deep directory trees are transferred, especially over WANs. I’ll describe them separately below.

6.1.1. parsyncfp

parsyncfp can often increase the speed of a transfer by parallelizing it. Especially if you are running into Long Fat Network (LFN) problems (long RTTs, suboptimal TCP windows), using parsyncfp can ameliorate some of the inefficiency. As well, if there is an imbalance in the disk speed or network, you can use parsyncfp to optimize the transfer while still limiting the system load on the transmitting host and network (it will suspend rsync processes if the load goes too high).

It uses Ganael LaPlanche’s fpart utility (see immediately below) to chunk files together so that transfers can start immediately without waiting for a complete recursive descent of the directory tree. On multi-TB dirs, this cataloging can take hours and even days. It is otherwise similar to its parent parsync (now deprecated; please don’t use it), but hasn’t been completely ported to the Mac.
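A hedged usage sketch (flag spellings vary between parsyncfp versions, so check parsyncfp --help; the host and paths are placeholders):

# transfer 2 dirs under /data with 8 parallel rsyncs; parsyncfp will suspend
# the rsyncs if the load on the sending host climbs too high
parsyncfp --NP=8 --startdir=/data dir1 dir2 user@remotehost.example.org:/backups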

6.1.2. fpsync

Part of the fpart package mentioned above and below, fpsync is a shell script that’s similar to parsyncfp, but less complicated. It leverages Ganael’s elegant fpart utility to enormously speed up the transfer of large dir trees. fpart and fpsync run on Linux and BSD-based systems, including the Mac.
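A hedged fpsync sketch (the paths and tuning numbers are placeholders; see the fpsync man page for the full option list):

# run 4 concurrent rsync jobs, each handling at most 2000 files or ~100GB per pass
fpsync -n 4 -f 2000 -s $((100*1024*1024*1024)) /data/src/ user@remotehost.example.org:/data/dst/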

Note

File Partitioning Utilities

For this kind of load-balancing, 2 utilities should be noted:

  • fpart, a file partitioning tool, collects file info and divides it into N chunkfiles, based on a number of criteria. The author of fpart, Ganael LaPlanche, has written a very good article, PARALLÉLISEZ FILE TRANSFERS, describing many of the problems (and some good solutions) about large-scale data transfer. This article is in French, but Google does a decent job in translating.

  • kdirstat-cache-writer, included with and used by the fabulous kdirstat, which is a directory recursion tool that gathers size info about all the files in a tree. This was used in the first version of the above-mentioned parsyncfp to balance the transfer load, until I switched to the fpart partitioner, above. The new, pure-Qt version of kdirstat, called qdirstat, uses a near-identical utility called qdirstat-cache-writer, included in the above qdirstat source tree.

6.2. More rsync examples

Command to rsync data from UCI’s HPC cluster to a remote backup server.

Here we transfer the dir tacg-4.6.0-src to user happy’s account on the server circus.tent.uci.edu in the dir ~/HPC-backups. In the examples below, we have to enter a password; with passwordless ssh keys set up, that prompt disappears.

# first time:

$ rsync -av tacg-4.6.0-src happy@circus.tent.uci.edu:~/HPC-backups
happy@circus.tent.uci.edu's password: [xxxxxxxxxx]
sending incremental file list
tacg-4.6.0-src/
tacg-4.6.0-src/AUTHORS
tacg-4.6.0-src/COPYING
 ...
tacg-4.6.0-src/tacgi4/tacgi4.pl.in
tacg-4.6.0-src/test/
tacg-4.6.0-src/test/testtacg.pl
sent 2668172 bytes  received 1613 bytes  410736.15 bytes/sec
total size is 2662985  speedup is 1.00

# note the speedup = 1
# second time:

$ rsync -av tacg-4.6.0-src happy@circus.tent.nac.uci.edu:~/HPC-backups
happy@circus.tent.nac.uci.edu's password: [xxxxxxxxxx]
sending incremental file list

sent 1376 bytes  received 18 bytes  398.29 bytes/sec
total size is 2662985  speedup is 1910.32

# note the speedup = 1910X 1st one.

6.2.1. and also …

Here I modify the command as follows:

# the following 'touch' command freshens the date on all C source files in that dir
$ touch tacg-4.6.0-src/*.c

# generate a datestamp, so a second log doesn't overwrite the previous one
$ DD=`date +"%T_%F" | sed 's/:/./g'`

# !! VERY IMPORTANT !!  The following command DELETES ALL THE FILES in the local (HPC-side) dir tree
# (tho it does leave the tree structure behind).  If you don't want to delete the local files,
# don't include the option '--remove-source-files'

$ rsync -avz --remove-source-files tacg-4.6.0-src  happy@circus.tent.uci.edu:~/HPC-backups \
2> backup_logs/rsync_${DD}.log &

In the above example, there was no output to the screen. The output was captured by the bash redirection:

 ... 2> backup_logs/rsync_${DD}.log

so it now resides in the backup_logs file. (Strictly speaking, 2> captures STDERR; to capture rsync's normal file listing on STDOUT as well, use &> instead.)

$ cat backup_logs/rsync_12.46.58_2014-04-08.log
sending incremental file list
tacg-4.6.0-src/Cutting.c
tacg-4.6.0-src/GelLadSumFrgSits.c
...
tacg-4.6.0-src/seqio.c
tacg-4.6.0-src/tacg.c

sent 1966 bytes  received 10232 bytes  2710.67 bytes/sec
total size is 2662985  speedup is 218.31
Note

MacOSX

rsync is included with MacOSX as well, but because of the Mac’s twisted history of using the AppleSingle/AppleDouble file format (remember those Resource fork problems?), the version of rsync (2.6.9) shipped with OSX versions up to Leopard will not handle older Mac-native files correctly. However, rsync version 3.x will apparently do the conversions correctly.

7. BitTorrent Sync

(placeholder/reminder)

8. Unison

Unison is a slightly different take on transmitting only changes. It uses a bi-directional sync algorithm to unify filesystems across a network. Native versions exist for Windows as well as Linux/Unix, and it is usually available from the standard Linux repositories.

From a Ubuntu or Debian machine, to install it would require:

$ sudo apt-get install unison
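Once it is installed on both ends, a minimal two-way sync looks like this (the paths and host are placeholders; note unison’s double slash before an absolute remote path):

$ unison /home/user/projects ssh://user@remotehost.example.org//home/user/projects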

9. Fast Data Transfer Utilities

9.1. bbcp

bbcp seems to be a very similar utility to bbftp below, with the exception that it does not require a remote server running. In this behavior, it’s much more like scp in that data transfer requires only user-executable copies (preferably the same version) on both sides of the connection. Short of access to a GridFTP site, bbcp appears to be the fastest, most convenient single-node method for transferring data.

Note

bbcp does not encrypt the data stream

Unless you use an external encryption utility via bbcp’s named pipes option, bbcp does not encrypt the data stream. It uses ssh to set up the authentication but not to encrypt the data stream. You can use a utility like ccrypt to encrypt/decrypt the network stream. Thanks to Dennis Yang for pointing this out.

The author, Andrew Hanushevsky, has made a number of precompiled binaries available, as well as access to the bbcp git tree (git clone http://www.slac.stanford.edu/~abh/bbcp/bbcp.git). Somebody at Caltech has written up a very nice bbcp HOWTO.

The code compiled & installed easily, with one manual intervention:

curl http://www.slac.stanford.edu/~abh/bbcp/bbcp.tgz |tar -xzf -
cd bbcp
make

# edit Makefile to change line 18 to: LIBZ       =  /usr/lib/libz.a
make
# there is no *install* stanza in the distributed 'Makefile'
cp bin/your_arch/bbcp ~/bin   # if that's where you store your personal bins.
hash -r   # or 'rehash' if using cshrc
# bbcp now ready to use.

bbcp can act very much like scp for simple usage:

$ time bbcp  file.633M   user@remotehost.subnet.uci.edu:/high/perf/raid/file
real    0m9.023s

The file transferred in under 10s for a 633MB file, giving >63MB/s on a Gb net. Note that this is over our very fast internal campus backbone. That’s pretty good, but the transfer rate is sensitive to a number of things and can be tuned considerably. If you look at all the bbcp options, it’s obvious that bbcp was written to handle lots of exceptions.

If you increase the number of streams (-s) from the default 4 (as above), you can squeeze a bit more bandwidth from it as well:

$ bbcp -P 10 -w 2M -s 10 file.4.2G hjm@remotehost.subnet.uci.edu:/userdata/hjm/
bbcp: Creating /userdata/hjm/file.4.2G
bbcp: At 081210 12:48:18 copy 20% complete; 89998.2 KB/s
bbcp: At 081210 12:48:28 copy 41% complete; 89910.4 KB/s
bbcp: At 081210 12:48:38 copy 61% complete; 89802.5 KB/s
bbcp: At 081210 12:48:48 copy 80% complete; 88499.3 KB/s
bbcp: At 081210 12:48:58 copy 96% complete; 84571.9 KB/s

or almost 85MB/s for 4.2GB which is very good sustained transfer.

Even traversing the CENIC net from UCI to SDSC is fairly good:

$ time bbcp -P 2 -w 2M -s 10 file.633M   user@machine.sdsc.edu:~/test.file

bbcp: Source I/O buffers (61440K) > 25% of available free memory (200268K); copy may be slow
bbcp: Creating ./test.file
bbcp: At 081205 14:24:28 copy 3% complete; 23009.8 KB/s
bbcp: At 081205 14:24:30 copy 11% complete; 22767.8 KB/s
bbcp: At 081205 14:24:32 copy 20% complete; 25707.1 KB/s
bbcp: At 081205 14:24:34 copy 33% complete; 29374.4 KB/s
bbcp: At 081205 14:24:36 copy 41% complete; 28721.4 KB/s
bbcp: At 081205 14:24:38 copy 52% complete; 29320.0 KB/s
bbcp: At 081205 14:24:40 copy 61% complete; 29318.4 KB/s
bbcp: At 081205 14:24:42 copy 72% complete; 29824.6 KB/s
bbcp: At 081205 14:24:44 copy 81% complete; 29467.3 KB/s
bbcp: At 081205 14:24:46 copy 89% complete; 29225.5 KB/s
bbcp: At 081205 14:24:48 copy 96% complete; 28454.3 KB/s

real    0m26.965s

or almost 30MB/s.

When making the above test, I noticed the disks to and from which the data was being written can have a large effect on the transfer rate. If the data is not (or cannot be) cached in RAM, the transfer will eventually require the data to be read from or written to the disk. Depending on the storage system, this may slow the eventual transfer if the disk I/O cannot keep up with the the network. On the systems that I used in the example above, I saw this effect when I transferred the data to the /home partition (on a slow IDE disk - see below) rather than the higher performance RAID system that I used above.

$ time bbcp -P 2  file.633M  user@remotehost.subnet.uci.edu:/home/user/nother.big.file
bbcp: Creating /home/user/nother.big.file
bbcp: At 081205 13:59:57 copy 19% complete; 76545.0 KB/s
bbcp: At 081205 13:59:59 copy 43% complete; 75107.7 KB/s
bbcp: At 081205 14:00:01 copy 58% complete; 64599.1 KB/s
bbcp: At 081205 14:00:03 copy 59% complete; 48997.5 KB/s
bbcp: At 081205 14:00:05 copy 61% complete; 39994.1 KB/s
bbcp: At 081205 14:00:07 copy 64% complete; 34459.0 KB/s
bbcp: At 081205 14:00:09 copy 66% complete; 30397.3 KB/s
bbcp: At 081205 14:00:11 copy 69% complete; 27536.1 KB/s
bbcp: At 081205 14:00:13 copy 71% complete; 25206.3 KB/s
bbcp: At 081205 14:00:15 copy 72% complete; 23011.2 KB/s
bbcp: At 081205 14:00:17 copy 74% complete; 21472.9 KB/s
bbcp: At 081205 14:00:19 copy 77% complete; 20206.7 KB/s
bbcp: At 081205 14:00:21 copy 79% complete; 19188.7 KB/s
bbcp: At 081205 14:00:23 copy 81% complete; 18376.6 KB/s
bbcp: At 081205 14:00:25 copy 83% complete; 17447.1 KB/s
bbcp: At 081205 14:00:27 copy 84% complete; 16572.5 KB/s
bbcp: At 081205 14:00:29 copy 86% complete; 15929.9 KB/s
bbcp: At 081205 14:00:31 copy 88% complete; 15449.6 KB/s
bbcp: At 081205 14:00:33 copy 91% complete; 15039.3 KB/s
bbcp: At 081205 14:00:35 copy 93% complete; 14616.6 KB/s
bbcp: At 081205 14:00:37 copy 95% complete; 14278.2 KB/s
bbcp: At 081205 14:00:39 copy 98% complete; 13982.9 KB/s

real    0m46.103s

You can see how the transfer rate decays as it approaches the write capacity of the /home disk.

bbcp can recursively copy directories with the -r flag. Like rsync, it first has to build a file list to send to the receiver, but unlike rsync, it doesn’t tell you that it’s doing that, so unless you use the -D (debug) flag, it looks like it has just hung. The time required to build the file list is of course proportional to the complexity of the recursive directory scan. It can also do incremental copies like rsync with the -a -k flags, which also allow it to recover from failed transfers.

Note that bbcp is very slow at copying deep directory trees of small files. If you need to copy such trees, you should first tar up the trees and use bbcp to copy the tarball. Such an approach will increase the transfer speed enormously.

The most recent version of bbcp can use the -N named pipes option to use external programs or pipes to feed the network stream. This allows you to specify an external program such as tar to provide the data stream for bbcp. Like this:

bbcp -P 2 -w 2M -s 10  -N io 'tar -cv -O /w2 ' remotehost:'tar -C /nffs/w2 -xf - '

The above command uses bbcp’s named pipe option for both input and output (-N io) to take tar’s output from STDOUT (tar’s -O option), and uses the above-described options to stream tar’s output via bbcp to the remotehost, where tar is invoked to decompose the bytestream and write it to the new location (-C /nffs/w2).

NB: the original bbcp help page on this option has (as of May 09, 2013) a typo or two. The above example is correct and works.

NB: I have occasionally seen this error when using bbcp:

time bbcp -P 10 -w 2M -s 8 root@bduc-login.nacs.uci.edu:/home/testing.tar.gz .
bbcp: Accept timed out on port 5031
bbcp: Unable to allocate more than 0 of 8 data streams.
Killed by signal 15.

If you get this error, add the "-z" option to your command line (right after bbcp), i.e.:

time bbcp -z -P 10 -w 2M -s 8 root@bduc-login.nacs.uci.edu:/home/testing.tar.gz .
#  .......^^

9.2. bbftp

bbftp is a modification of the FTP protocol that enables you to open multiple simultaneous TCP streams to transfer data. It therefore allows you to sometimes bypass per-TCP restrictions that result from badly configured intervening machines.

In order to use it, you’ll need a bbftp client and server. Most places that receive large amounts of data (SDSC, NCAR, other supercomputer centers, Teragrid nodes) will already have a bbftp server running, but you can also compile and run the server yourself.

The more usual case is to run only the client. It builds very easily on Linux with just the typical curl/untar, cd, ./configure, make, make install dance:

$ curl http://doc.in2p3.fr/bbftp/dist/bbftp-client-3.2.0.tar.gz |tar -xzvf -
$ cd bbftp-client-3.2.0/bbftpc/
$ ./configure --prefix=/usr/local
$ make -j3
$ sudo make install

Using bbftp is more complicated than the usual ftp client because it has its own syntax:

To send data to a server:

$ bbftp -s -e 'put file.154M  /gpfs/mangalam/big.file' -u mangalam -p 10 -V tg-login1.sdsc.teragrid.org
Password:
>> COMMAND : put file.154M /gpfs/mangalam/big.file
<< OK
160923648 bytes send in 7.32 secs (2.15e+04 Kbytes/sec or 168 Mbits/s)


the arguments mean:
-s  use ssh encryption
-e  'local command'
-E  'remote command' (not used above, but often used to cd on the remote system)
-u  'user_login'
-p  # use # parallel TCP streams
-V  be verbose

The data was sent at 21MB/s to SDSC thru 10 parallel TCP streams (but well below the peak bandwidth of about 120MB/s on a Gb network)

To get data from a server:

$ bbftp -s -e 'get /gpfs/mangalam/big.file from.sdsc' -u mangalam -p 10 -V tg-login1.sdsc.teragrid.org
Password:
>> COMMAND : get /gpfs/mangalam/big.file from.sdsc
<< OK
160923648 bytes got in 3.46 secs (4.54e+04 Kbytes/sec or 354 Mbits/s)

I was able to get the data at 45MB/s, about half of the theoretical maximum.

As a comparison, because the remote receiver is running an old (2.4) kernel which does not handle dynamic TCP window scaling, scp is only able to manage 2.2MB/s to this server:

$ scp  file.154M mangalam@tg-login1.sdsc.teragrid.org:/gpfs/mangalam/junk
Password:
file.154M                                  100%  153MB   2.2MB/s   01:10

9.3. lftp

lftp is a simple but capable FTP replacement that can use multiple TCP streams like bbcp, resulting in better performance than vanilla FTP or other single stream mechanisms like scp. One restriction is that the multi-stream approach only works in get mode, so if you’re trying to upload data (put mode), it works only as well as a single stream approach. It will also do mirroring so if you’re trying to mirror an entire website or file tree, it can do that, much like the wget -m -p <website_head>.

In my testing over a 1Gb connection, lftp was about 5%-10% slower than bbcp on getting data (same number of streams, with the cache cleared each time) and noticeably slower on sending data. Both bbcp and lftp appear to transfer to the local cache; when transferring files smaller than the free RAM, they will spend several seconds after the transfer is supposedly complete syncing the data to disk.

# Getting a file over 4 streams
lftp -e 'pget -n 4 sftp://someone@host:/path/to/file'
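lftp’s mirror command can also pull an entire tree with several files in flight at once; a hedged sketch (the URL and paths are placeholders):

# mirror a remote tree, transferring up to 4 files in parallel, then exit
lftp -c 'open sftp://someone@host; mirror --parallel=4 /remote/dir /local/dir'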

9.4. Fast Data Transfer (fdt)

Fast Data Transfer is an application for moving data quickly, written in Java, so it can theoretically run on any platform. The performance results on its web page are very impressive, but in local tests it was slower than bbcp. The startup time for Java, its failure to work in scp mode (it couldn’t find fdt.jar even tho it was in the CLASSPATH), and the need to explicitly start the receiving FDT server (not hard - see below, but another step) argue somewhat against it.

Starting the server is easy; it starts by default in server mode:

java -jar ./fdt.jar
# usual Java verbosity omitted

The client uses the same jarfile but a different syntax:

java -jar ./fdt.jar -ss 1M -P 10 -c remotehost.domain.uci.edu  ~/file.633M  -d /userdata/hjm

# where
# -ss 1M  ..... sets the TCP SO_SND_BUFFER size to 1 MB
# -P 10 ....... uses 10 parallel streams (default is 1)
# -c host ..... defines the remote host
# -d dir ...... sets the remote dir

The speed is certainly impressive, much faster than scp:

# scp done over the same net, about the same time

$ scp file.4.2G  remotehost.domain.uci.edu:~
hjm@remotehost's password: ***********
 file.4.2G                   100% 4271MB  25.3MB/s   02:49
                                          ^^^^^^^^
# using the default 1 stream:
$ java -jar fdt.jar -c remotehost.domain.uci.edu ../file.4.2G -d /userdata/hjm/
(transferred in 86s for *53MB/s*)

# with 10 streams and a larger buffer:
$ java -jar fdt.jar -P 10 -bs 1M -c remotehost.domain.uci.edu ../file.4.2G -d /userdata/hjm/
(transferred in 68s for *66MB/s* with 10 streams)

But fdt is slower than bbcp. The following test was done at about the same time between the same hosts:

bbcp -P 10 -w 2M -s 10 file.4.2G hjm@remotehost.domain.uci.edu:/userdata/hjm/
bbcp: Creating /userdata/hjm/file.4.2G
bbcp: At 081210 12:48:18 copy 20% complete; 89998.2 KB/s
bbcp: At 081210 12:48:28 copy 41% complete; 89910.4 KB/s
bbcp: At 081210 12:48:38 copy 61% complete; 89802.5 KB/s
bbcp: At 081210 12:48:48 copy 80% complete; 88499.3 KB/s
bbcp: At 081210 12:48:58 copy 96% complete; 84571.9 KB/s

9.5. Globus Online, Globus Connect & Globus Connect MultiUser

These are fairly new (mid-2011) approaches that claim to provide easy access to GridFTP-like speeds, reliable transfers, and No IT required, using the Globus Toolkit infrastructure, which is an enormous and enormously complex set of APIs for authenticating users and distributing data around the world. Globus Connect and its more ambitious Globus Connect MultiUser sibling are attempts to make using the Globus mechanicals less horrific for users. In this it largely succeeds from the users' POV, at least for those who are already part of a Globus/Grid node, who have specific requirements to transfer TBs of data on a regular basis, and who have the endpoints set up for them. Otherwise it’s somewhat clunky, since you have to explicitly set up endpoints beforehand, and too complicated to set up unless you’re Linux-enhanced (ie. you do ssh public key exchange and globus MyProxy configs in your sleep).

The latest iteration of this technology is a web interface that once set up allows you to initiate and monitor large data transfers between defined endpoints fairly easily. The process to install the software to your own system and add yourself to the system is fairly straightforward. Just follow the instructions for the different platforms.

The problem with this approach is that it’s a large amount of work for a small amount of advantage relative to bbcp. However, the Multiuser version allows all the users of a server or cluster to take advantage of this protocol with no additional effort, a better tradeoff between effort expended and advantages conferred.

The instructions for installing the Multiuser version are a little more elaborate. Herewith, their own devilish details for a sysadmin setting up the Globus Connect MultiUser (Linux-only so far).

The process for setting it up on your endpoint is described on the site, but it may be worthwhile describing the general overview which can be confusing. UCLA’s IDRE also has a setup description (Thanks, Prakashan.)

Snarky Point of Contention: The documentation overuses the word seamlessly which all computer users realize is a contraction for seamlessly if nothing goes wrong and your setup is exactly like mine and monkeys fly out my butt. YMMV.

Using the Globus Connect system requires you to:

  • Register a username with Globus Online. This ID will be used to identify you to the Globus system. It is not related to your username on any hosts you may want to use as endpoints.

  • Register connection endpoints that you will want to send to or receive from. You must of course have a user account on these machines to use them and it helps if you have admin privs on these machines to install the necessary software (see next point). You will have to name your endpoints a combination of your Globus ID and a machine name. It doesn’t have to be the hostname of the client, but that will help to identify it later. You will also have to generate a machine ID string that looks like d9g89270-74ab-4382-beb1-d2882628952a. This ID will have to be used to start the globusconnect process on the client before you can start a transfer. See the Linux section (for example) of the main page.

  • Install the necessary software on the endpoint (client) machines. There are different packages for different clients. You (or your sysadmin) must install the repository info, and then the software itself. This is semi-automated via platform-specific apps see the Globus Connect Downloads in the link above. There are 60-plus packages that make up a Globus client; thank god it’s done automatically. If you want to do it manually, the process for doing so is described here, but I’d recommend trying the automatic installation first.

  • Start the Globus Connect process on the client via the downloaded client software. On Linux, it is provided in the globusconnect-latest.tgz, which unpacks to provide both 32bit and 64bit clients, as well as the top-level bash script globusconnect to start the relevant version. Running globusconnect-X.x/globusconnect will enable the clients to see each other and now, finally you can…

  • Start a Data Transfer by opening the previous link and identifying the nodes you want to transfer between. After that, it’s as easy as using a graphical FTP client. Populate the panes with the directories you want to transfer and click on the directional arrow to initiate the transfer.

I’ve gotten 40-50MB/s between UCI and the Broad Institute depending on time of day, system load, and phase of moon.

9.6. GridFTP

If you and your colleagues have to transfer data in the range of multiple GBs and you have to do it regularly, it’s probably worth setting up a GridFTP site. GridFTP is also based on the Globus toolkit and as such shares many of its advantages and frustrations. However, most of the frustrations are on the admin side, so once it’s set up, it becomes fairly easy for users. Because it allows multipoint, multi-stream TCP connections, it can transfer data at multiple GB/s. However, it’s beyond the scope of this simple doc to describe its setup and use, so if this sounds useful, bother your local network guru/sysadmin.

9.7. netcat

netcat (aka nc) is installed by default on most Linux and MacOSX systems. It provides a way of opening TCP or UDP network connections between nodes, acting as an open pipe thru which you can send any data as fast as the connection will allow, imposing no additional protocol load on the transfer. Because of its widespread availability and its speed, it can be used to transmit data between 2 points relatively quickly, especially if the data doesn’t need to be encrypted or compressed (or if it already is).

However, to use netcat, you have to have login privs on both ends of the connection and you need to explicitly set up a listener that waits for a connection request on a specific port from the receiver. This is less convenient to do than simply initiating an scp or rsync connection from one end, but may be worth the effort if the size of the data transfer is very large. To monitor the transfer, you also have to use something like pv (pipeviewer); netcat itself is quite laconic.

How it works: On one end (the sending end, in this case), you need to set up a listening port:

[send_host]: $ pv -pet honkin.big.file | nc -q 1 -l 1234 <enter>

This sends the honkin.big.file thru pv -pet which will display progress, ETA, and time taken. The command will hang, listening (-l) for a connection from the other end. The -q 1 option tells the sender to wait 1s after getting the EOF and then quit.

On the receiving end, you connect to the nc listener

[receive_host] $ nc sender.net.uci.edu 1234 |pv -b > honkin.big.file <enter>

(note: no -p to indicate port on the receiving side). The -b option to pv shows only bytes received.

Once the receive_host command is initiated, the transfer starts, as can be seen by the pv output on the sending side and the bytecount on the receiving side. When it finishes, both sides terminate the connection 1s after getting the EOF.

This arrangement is slightly arcane, but supports the unix tools philosophy which allows you to chain various small tools together to perform a task. While the above example shows the case for a single large file, it can also be modified only slightly to do recursive transfers, using tar, shown here recursively copying the local sge directory to the remote host.

9.7.1. tar and netcat

The combination of these 2 crusty relics from the stone age of Unix is remarkably effective for moving data if you don’t need encryption. Since they impose very little protocol overhead on the data, the transfer can run at close to wire speed for large files. Compression can be added with the tar options of -z (gzip) or -j (bzip2).

The setup is not as trivial as with rsync, scp, or bbcp, since it requires commands to be issued at both ends of the connection, but for large transfers, the speed payoff is non-trivial. For example, using a single rsync on a 10Gb private connection, we were getting only about 30MB/s, mostly because of many tiny files. Using tar/netcat, the average speed went up to about 100MB/s. And using multiple tar/netcat combinations to move specific subdirs, we were able to get an average of 500GB/hr, still not great (~14% of theoretical max), but about 5x better than rsync alone.

Note that you can set up the listener on either side. In this example, I’ve set the listener to the receiving side.

In the following example, the receiver is 10.255.78.10; the sender is 10.255.78.2.

First start the listener waiting on port 12378, which will accept the byte-stream and untar it, decompressing as it comes in.

[receive_host] $ nc -l  -p  port_#  | tar -xzf -
#eg
               $ nc -l  -p  12378   | tar -xzf -

# when the command is issued, the prompt hangs, waiting for the sender to start

Then kick off the transfer on the sending side.

[send_host]: $ tar -czvf - dir_target   | nc -s  sender       receiver      port_#
# eg
             $ tar -czvf - fmri_classic | nc -s  10.255.78.2  10.255.78.10  12378

In this case, I’ve added the verbose flag (-v) to the tar command on the sender side so using pv is redundant. It also uses tar’s built-in compression flag (-z) to compress as it transmits. Depending on the bandwidth available to you and the CPUs of the hosts, this may actually slow transmission. As noted above, it’s most effective on bandwidth-limited channels.

You could also bundle the 2 together in a script, using ssh to execute the remote command.
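A minimal sketch of such a bundle, assuming passwordless ssh keys and the OpenBSD nc on both ends (the hostname, port, and paths are placeholders):

PORT=12378
# start the remote listener/unpacker in the background via ssh ...
ssh user@receiver.example.org "nc -l $PORT | tar -C /dest/dir -xf -" &
sleep 2     # give the listener a moment to come up
# ... then stream the local tree into it; -N closes the socket on EOF
# (older/traditional netcats use '-q 1' instead of '-N')
tar -cf - dir_target | nc -N receiver.example.org $PORT
wait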

Oh look! I have…

9.7.2. tnc

From the contraction of tar n netcat. This is not a novel idea (see Varun Patil’s script), but this wrapping tries to be more flexible and comprehensive than most other implementations. Let me know how to make it better.

tnc is a useful tool to move a lot of data over a network if:

  • the data does not need to be encrypted.

  • you have ssh & shell access on both sides of the connection

  • you set up passwordless shared ssh keys (it will try to set them up if you don’t have them)

  • you have a lot of files and or deep dir trees that need to be moved.

  • the data is not partially on the other side of the connection. ie this is a first time data movement. (If the data just needs to be updated, see rsync).

  • you are moving data for backup (tnc will tar and transfer data (optionally compressing it) in one operation, leaving it as such on the other end, unless you request unpacking).

If your use case meets these criteria, it works quite well on most distributions of Linux although there are some versions of some utilities that will cause hiccups. In particular, it will work about 2-10x as fast as scp, depending on network bandwidth, # of files, etc.

The following is the --help output from tnc, until I write up something better. Get tnc here, chmod +x it, and off you go. Note the warnings.

tnc is a small Perl utility to simplify the use of netcat to transfer
files  between hosts.  It automates a number of the setup commands
that make this very efficient protocol so awful to set up.  The
commandline is meant to ape 'scp', but it can be  much faster than
scp, depending on the connection and types of files.  Typically it's
2x - 6x faster.

tnc WILL attempt to set up 2048bit RSA ssh keys for you if they don't
exist on the  the local host which will require entering the remote
login password 2x to set up  the keys.  If you want to avoid this,
use the  '--askpass' option, which will not set up ssh-keys, but WILL
use established  ssh keys if they already exist. tnc does not work
reliably without shared ssh  keys.

Unless you use the '--unpack' option, files to be transferred (even
1) are concatenated to a tar file which is sent and then left as a
tar file on the other end of  the connection.  ie the endpoint for
both push and pull data transfers will  be *tar files* (compressed
with 'xz' if you use the '--compress' option.)

It can be used to both push and pull data from a remote connection,
using only the  static IP end.

WARNINGS:
- tnc DOES NOT ENCRYPT, nor does it compress data without being
asked,  so it's only meant for NON-SENSITIVE data that can be sent in
the clear.  Remote  connections are initiated via ssh, so the actual
connection setup is encrypted. Because it uses netcat's open ports it
also performs SHA  checksums on the data exchanged to make sure that
the data stream has not  been poisoned or corrupted.

- There are at least 3 versions of netcat in use: nc6, nc.openbsd,
and nc.traditional.  tnc uses the OpenBSD version, the default on
recent  versions of Ubuntu, CentOS, MacOSX and others, but not on
Debian.  On Debian,  you'll have to install it explicitly and symlink
it to '/etc/alternatives/nc'.

- tnc uses the piped subprocess function of bash (tee >(shasum)) to
do  inline SHA hash checking, so if your system lacks a recent bash,
it may fail. Note that identical SHA hashes only verify that the same
bytes exist on both ends.  Premature failure can truncate data, so
the SUMMARY now includes the  # of bytes that were sent over the
network as well.

It supports the following options
--askpass(off) .. use passwords, not ssh keys; don't try to set up
                   ssh keys. If functional ssh keys DO exist, they
                   will be used.
--compress(off) . invokes the '-J' option to tar which uses the 'xz'
                   compressor.
                  Helps significantly on low bandwidth connections
                   (wifi), but generally doesn't help on GbE
                   networks.  Depends on the compressibility of the
                   data on 100Mb connections, but generally not.
                   Endpoint names will be suffixed with '.xz' if this
                   option is used, so don't pre-suffix the name. If
                   you do, 'name.xz' will be changed to 'name.xz.xz'
--port(12345) ... sets the local and remote PORTs to this #.
--help .......... emits this text
--unpack(off) ... unpacks the transferred files at their endpoint.
                   You specify the dir under which it's supposed to
                   be unpacked as the remote target. If there is a
                   writable dir of that name, the stream will be NOT
                   be unpacked into it unless the '--unpack' option
                   is used.
--quiet(off) .... tries to be very silent
--debug(off) .... very verbose about what's going on.

Simple usage:

Pushing data TO a remote server:
 --------------------------------
(sources can be a mixture of files and dirs)
tnc dir1  file1  dir2 file2   user@remotehost:/path/to/remote.tar
 or
tnc --unpack file1 file2 file3  user@remotehost:/path/to/unpack/dir
 or
tnc --compress geo*.dat  user@remotehost:/path/to/remote.tar
(which will leave the remote archive as '/path/to/remote.tar.xz')

Pulling data FROM a remote server:
 ----------------------------------
                     vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
tnc  user@remotehost:'~/dir1/  /path/to/file*  ~/dir2/' \
 /local/path/to/tarball

Note the single quotes surrounding the remote file spec.  You can
use this to specify discontinuous files and dirs to pull (and you
must use single quotes to specify the data to pull).

The dynamic display during a transfer is from 'pv' and shows:

 MB    time     instant
sent   elapsed  bandwidth
 ------------------------------
 116MB 0:00:10 [11.3MB/s]

NB: tnc requires a static (or at least identifiable) IP on only
one end of the connection.  If you are at home, connecting thru
a wireless router, and obtaining your IP address dynamically,
tnc should work as well, since the work is initiated from the
(static) IP #.

9.8. Aspera ascp

Updated Jan 06, 2015 After a licensing problem with the existing Broad Institute Globus system stopped us from using the browser-based Globus system to finish the transfer, we still had to transfer about 30TB of data from the Broad Inst to UCI.

We tried using the above mentioned bbcp from there to here, but since I controlled only one side of the connection, I couldn’t tune the transfer to provide more than about 1-3MB/s. The Broad people mentioned that the data was available via Aspera, and I grouchily agreed to try it again (see first review below) via the Linux commandline client ascp that had proved so trying previously.

This time, tho (I had full docs for the client and some experience), it started up and immediately provided a consistent 30-40MB/s over the same network path over which bbcp was providing about 1/10th the bandwidth. A tremendous, startling improvement over the default bbcp parameters. Whether it’s due to autotuning of window size and streams, or some other internal magic, I can’t say. And I also can’t say whether (with control of both endpoints) I could have coerced bbcp to attain a similar bandwidth, since I have seen bbcp do longhaul bandwidths of this magnitude. But seeing ascp immediately provide this magnitude of bandwidth, with no user-side tuning, is very impressive indeed.

I’ll leave the previous notation in place to show how much ascp has improved.

first post, 2013 Aspera is a commercial company (recently bought by IBM) whose aim is to monetize large scale data transfer across networks. I have no experience with their Windows and Mac clients, which may be very good, but their default Linux client, starting about 3 years ago, is not. Or to be more specific: it can work well, but it may well require a lot of tweaking and adjustment to work well. God knows, I had to. However, I was eventually able to transfer 15TB across the UC with the Linux client, which after the aforementioned tweaking worked ok.

Compared to the above-mentioned bbcp, or the consumer-skinned and smoothed Globus Online, the Aspera Linux client is still crude, difficult to use, poorly documented, and fails repeatedly. I posted some comments to the blog linked above, but here are my suggestions about its use when you can’t use an alternative approach.

When I say that the Linux client is poorly documented, it doesn’t mean that there is no documentation. It means that it is hard to find (no helpful links returned by the search service on their support page), the Documentation (click ascp Usage) is no better than most free software, and the Examples (click ascp General Examples) are fairly sparse. Additionally, the customer support databases are behind a firewall and are therefore beyond the reach of google. This is certainly Aspera’s right, but it means self-help is essentially impossible.

I will say that opening a support ticket brought rapid (<1 hour from filing the ticket to a human response), and knowledgeable assistance. (Thanks, Bill!). ascp will fill your syslog with a ton of event logs which, if sent to Aspera, will probably allow them to debug the problem.

A non-Aspera employee advised me to use ascp’s parallel copy to speed up the transfer but that was apparently a mistake since that option (tho it appears to work) is usually only useful when the client node is CPU-bound (mine wasn’t) and neither increases overall copy speeds, nor allows you to restart failed transfers (and one of the parallel copies would fail every few minutes, possibly due to the timing issue noted below.) Since I was not able to get the parallel approach to work reliably for more than 30 min at a time, the best approach was to start a single serial process with the Linux ascp client after carefully reading the above blog post and correcting the command as per your needs.

Also, ascp can be very sensitive to timing issues so a modification may have to be made to the configuration file:

cat /root/.aspera/connect/etc/aspera.conf     # the corrected version

<?xml version='1.0' encoding='UTF-8'?>
<CONF version="2">
  <default>
    <transfer>
      <protocol_options>
        <rtt_autocorrect>true</rtt_autocorrect>
      </protocol_options>
    </transfer>
    <file_system>
      <storage_rc>
        <adaptive>
          true
        </adaptive>
      </storage_rc>
    </file_system>
  </default>
</CONF>

So, once that issue was settled, and I stopped trying to parallel copy, the following command worked reliably, maintaining the copy for at least a couple of days until the transfer finished. The transfer was not magically faster than what I would have seen via bbcp, and was considerably slower than a GridFTP transfer, but it did work.

/path/to/ascp  -QT -l 500M -k1 user@remote.source.org:/remote/path /local/path

where:

  • -QT: -Q enables the fair transfer policy, -T disables encryption.

  • -l 500M sets the target transfer rate at 500Mbits/s. This depends on what your connection to the Internet allows and especially what other operations are happening with the interface. ascp seems to be very sensitive to this option and may well crash if it is exceeded.

  • -k1 enables resuming partially transferred files, where the options are (From docs:)

    • 0: Always retransfer the entire file.

    • 1: Check file attributes and resume if the current and original attributes match. (This is probably good enough for most ppl and is MUCH faster than -k2)

    • 2: Check file attributes and do a sparse file checksum; resume if the current and original attributes/checksums match.

    • 3: Check file attributes and do a full file checksum; resume if the current and original attributes/checksums match.

Again, read the docs carefully since the error messages are unhelpful.
