A Fast Transfer Method for rsync

珂玥c

Last modified 2024-06-19 16:56:18

Tags: operations, ubuntu, linux

First published 2024-06-19 16:54:29

Article link: https://blog.youkuaiyun.com/m0_59029800/article/details/139806923

Project scenario:

Because of a server change, about 1 TB of files had to be synced to the new server quickly.

Problem description:

Running sudo find . | wc -l inside /shared recursively lists every file and directory under it; the count came to 560839.
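
The single total above mixes files and directories. As a small sketch, the two can be counted separately; count_entries is a hypothetical helper name introduced here, and the counts are off if a file name contains a newline.

```shell
#!/usr/bin/env bash
# Count entries of one type under a path (f = regular files, d = directories).
# count_entries is a hypothetical helper, not part of find or rsync.
count_entries() {
    find "$1" -type "$2" | wc -l | tr -d ' '
}

# Example: report both counts for the current directory.
echo "files: $(count_entries . f)"
echo "directories: $(count_entries . d)"
```

The file count matters most here, since rsync's per-file overhead dominates when there are many small files.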

With that many entries and very little time, a plain rsync command turned out to be too slow to finish the job on schedule.

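After some searching I found that rsync does not sync concurrently. When terabytes of data have to be copied, a single rsync process becomes a serious bottleneck and cannot saturate the IO of the storage devices. I tried the parameters below, taken from those references, but none of them made much difference.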

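Multiple processes (as the references suggested): use --parallel or -P to enable multi-process transfer. This advice is wrong: rsync has no --parallel option, and -P is shorthand for --partial --progress, which keeps partially transferred files and shows progress but adds no parallelism.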

rsync -avP --delete source/ destination/
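Compress data: the -z option compresses data during the transfer, which reduces the amount sent over the network. It helps mainly on slow or bandwidth-limited links and costs CPU time on both ends.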

rsync -avz --delete source/ destination/
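Incremental transfer: over the network rsync uses its delta-transfer algorithm by default and sends only the changed parts of a file rather than the whole file. Make sure this has not been disabled (for example with -W/--whole-file). For copies between two local paths rsync defaults to whole-file copying.
Fewer checks (as the references suggested): use -I to reduce the checking rsync does. In fact -I (--ignore-times) turns off the quick check that skips files whose size and modification time already match, so every file is updated again. It does not reduce verification and usually makes repeated syncs slower.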

rsync -avI --delete source/ destination/
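
To make the effect of -I concrete: by default rsync skips a file when its size and modification time match on both sides. The sketch below is only a rough imitation of that idea in shell, not rsync's actual implementation, and looks_unchanged is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Rough imitation of rsync's default quick check: two files are treated as
# unchanged when their sizes match and neither is newer than the other.
# looks_unchanged is a hypothetical helper, not an rsync command.
looks_unchanged() {
    size_a=$(wc -c < "$1") || return 1
    size_b=$(wc -c < "$2") || return 1
    [ "$size_a" -eq "$size_b" ] && [ ! "$1" -nt "$2" ] && [ ! "$2" -nt "$1" ]
}

work=$(mktemp -d)
echo "hello" > "$work/src.txt"
cp -p "$work/src.txt" "$work/dst.txt"      # -p keeps the modification time
looks_unchanged "$work/src.txt" "$work/dst.txt" && echo "skip (looks unchanged)"
echo "more" >> "$work/src.txt"             # the size now differs
looks_unchanged "$work/src.txt" "$work/dst.txt" || echo "transfer (size or mtime differs)"
rm -rf "$work"
```

With -I this shortcut is switched off, so files that would have been skipped are processed again.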

Solution:

Use a speed-up script

Long-winded, heavily commented version

#!/usr/bin/env bash

# Define source, target, maxdepth and cd to source
source="/test1"
target="/test2"
depth=3
cd "${source}" || exit 1

# Set the maximum number of concurrent rsync threads
# (strictly speaking these are processes, not threads; the original script calls them threads)

maxthreads=5

# How long to wait before checking the number of rsync threads again

sleeptime=5

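# Find all folders in the source directory within the maxdepth level
# Note: in a local test, depth=1 and depth=3 gave the same end result, which confused me at first.
# It is expected: each rsync copies its subfolder recursively, and the command after the loop
# syncs the files above the maxdepth level, so depth only changes how the work is split up.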

# find lists every directory down to ${depth} levels; while read assigns one path per line to dir
find . -maxdepth "${depth}" -type d | while IFS= read -r dir
do
       # Make sure to ignore the parent folder:
       # awk -F'/' splits the path on '/' and NF is the number of fields, so "./a/b/c" gives 4.
       # With -gt (greater than) ${depth}, only directories exactly ${depth} levels deep pass.
       if [ "$(echo "${dir}" | awk -F'/' '{print NF}')" -gt "${depth}" ]
       then
           # Strip leading dot slash; the output of sed is assigned to subfolder
           subfolder=$(echo "${dir}" | sed 's@^\./@@g')
           # Check whether ${target}/${subfolder} already exists
           if [ ! -d "${target}/${subfolder}" ]
           then
               # Create destination folder
               mkdir -p "${target}/${subfolder}"
           fi
           # Make sure the number of rsync threads running is below the threshold:
           # list processes, keep lines matching the whole word rsync (grep -w), print the last
           # field (the destination path), deduplicate, and count the lines. One transfer shows up
           # as several rsync processes, hence the deduplication. Wait while the count is >= ${maxthreads}.
           while [ "$(ps -ef | grep -w '[r]sync' | awk '{print $NF}' | sort -nr | uniq | wc -l)" -ge "${maxthreads}" ]
           do
               echo "Sleeping ${sleeptime} seconds"
               sleep "${sleeptime}"
           done
           # Run rsync in background for the current subfolder and move on to the next one.
           # nohup alone would write the output to nohup.out in the source path; the redirections
           # to /dev/null discard it instead, so nothing is printed to the screen.
           nohup rsync -avP "${source}/${subfolder}/" "${target}/${subfolder}/" </dev/null >/dev/null 2>&1 &
       fi
done
# while read VAR; do ...; done < file reads its input one line at a time: each line is assigned
# to VAR, the loop body processes it, and the loop ends after the last line. Here the input
# comes from the find pipe rather than from a file.

# Find all files above the maxdepth level and rsync them as well
find . -maxdepth ${depth} -type f -print0 | rsync -avP --files-from=- --from0 ./ "${target}/"
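
The directory filter in the loop is easy to misread, so here is a small sketch of what the awk test computes. path_fields and at_depth are hypothetical helper names used only for illustration. Because find already stops at -maxdepth, "NF greater than depth" ends up selecting only the directories exactly depth levels below the source.

```shell
#!/usr/bin/env bash
# Count the '/'-separated fields of a path, as the awk call in the script does.
path_fields() {
    printf '%s\n' "$1" | awk -F'/' '{print NF}'
}

# Succeed only when the path passes the script's test: NF greater than depth.
at_depth() {
    [ "$(path_fields "$1")" -gt "$2" ]
}

depth=3
for dir in . ./a ./a/b ./a/b/c; do
    if at_depth "$dir" "$depth"; then
        echo "sync: ${dir#./}"   # ${dir#./} strips the leading ./ like the sed call
    else
        echo "skip: ${dir}"
    fi
done
```

Only ./a/b/c passes with depth=3: it splits into four fields (".", "a", "b", "c"), while the shallower paths have three or fewer.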

Simple version

#!/usr/bin/env bash

# Define source, target, maxdepth and cd to source
source="/bj-data"
target="/sh-data"
depth=3
cd "${source}" || exit 1

# Set the maximum number of concurrent rsync threads
maxthreads=30
# How long to wait before checking the number of rsync threads again
sleeptime=5

# Find all folders in the source directory within the maxdepth level
find . -maxdepth "${depth}" -type d | while IFS= read -r dir
do
       # Make sure to ignore the parent folder
       if [ "$(echo "${dir}" | awk -F'/' '{print NF}')" -gt "${depth}" ]
       then
           # Strip leading dot slash
           subfolder=$(echo "${dir}" | sed 's@^\./@@g')
           if [ ! -d "${target}/${subfolder}" ]
           then
               # Create destination folder
               mkdir -p "${target}/${subfolder}"
           fi
           # Make sure the number of rsync threads running is below the threshold
           while [ "$(ps -ef | grep -w '[r]sync' | awk '{print $NF}' | sort -nr | uniq | wc -l)" -ge "${maxthreads}" ]
           do
               echo "Sleeping ${sleeptime} seconds"
               sleep "${sleeptime}"
           done
           # Run rsync in background for the current subfolder and move on to the next one
           # (without the trailing & each rsync would run in the foreground, one at a time)
           nohup rsync -avP "${source}/${subfolder}/" "${target}/${subfolder}/" </dev/null >/dev/null 2>&1 &
       fi
done

# Find all files above the maxdepth level and rsync them as well
find . -maxdepth ${depth} -type f -print0 | rsync -avP --files-from=- --from0 ./ "${target}/"
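
As an alternative to polling ps, the same split can be sketched with xargs -P, which caps the number of concurrent jobs itself and returns only after all of them have finished. This is a different technique from the script above, not the original author's method. It assumes GNU find and xargs, an absolute destination path, and parallel_rsync is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Sketch: run one rsync per directory found exactly ${depth} levels below the
# source, at most ${jobs} at a time, then copy the files above that level.
# parallel_rsync is a hypothetical helper; the destination must be an absolute path.
parallel_rsync() {
    local src="$1" dst="$2" depth="$3" jobs="$4"
    mkdir -p "$dst" || return 1
    (
        cd "$src" || exit 1
        # xargs -P limits concurrency; each job creates its target directory first.
        find . -mindepth "$depth" -maxdepth "$depth" -type d -print0 |
            xargs -0 -P "$jobs" -I{} sh -c \
                'mkdir -p "$2/$1" && rsync -a "$1/" "$2/$1/"' sh {} "$dst"
        # Files at or above ${depth} are not covered by the per-directory runs.
        find . -maxdepth "$depth" -type f -print0 |
            rsync -a --files-from=- --from0 ./ "$dst/"
    )
}

# Usage (paths taken from the simple version above):
# parallel_rsync /bj-data /sh-data 3 30
```

Like the original script, this does not copy empty directories that sit above the chosen depth, so a final plain rsync pass over the whole tree is a reasonable way to catch anything left over.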