Linux: Splitting and Merging Large Files, and Deleting the First/Last n Lines

Latest recommended article published 2025-11-26 10:36:10

License: CC 4.0 BY-SA

Tags: #Shell #Split #Cat #dd

1. Splitting files with split

[houbu@opentsdb1 temp]$ split --help
Usage: split [OPTION]... [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
size is 1000 lines, and default PREFIX is 'x'.  With no INPUT, or when INPUT
is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   generate suffixes of length N (default 2)
      --additional-suffix=SUFFIX  append an additional SUFFIX to file names
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of lines per output file
  -d, --numeric-suffixes[=FROM]  use numeric suffixes instead of alphabetic;
                                   FROM changes the start value (default 0)
  -e, --elide-empty-files  do not generate empty output files with '-n'
      --filter=COMMAND    write to shell COMMAND; file name is $FILE
  -l, --lines=NUMBER      put NUMBER lines per output file
  -n, --number=CHUNKS     generate CHUNKS output files; see explanation below
  -u, --unbuffered        immediately copy input to output with '-n r/...'
      --verbose           print a diagnostic just before each
                            output file is opened
      --help     display this help and exit
      --version  output version information and exit

SIZE is an integer and optional unit (example: 10M is 10*1024*1024).  Units
are K, M, G, T, P, E, Z, Y (powers of 1024) or KB, MB, ... (powers of 1000).

CHUNKS may be:
N       split into N files based on size of input
K/N     output Kth of N to stdout
l/N     split into N files without splitting lines
l/K/N   output Kth of N to stdout without splitting lines
r/N     like 'l' but use round robin distribution
r/K/N   likewise but only output Kth of N to stdout

GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
Report split translation bugs to <http://translationproject.org/team/zh_CN.html>
For complete documentation, run: info coreutils 'split invocation'
[houbu@opentsdb1 temp]$ 
[houbu@opentsdb1 temp]$ split -a 3 -b 1k -d nohup.out split_ 
[houbu@opentsdb1 temp]$ ll
total 336749072
-rwxrwxr-x 1 houbu houbu          283 Aug 21 18:07 analysis_report.sh
-rwxrwxr-x 1 houbu houbu         1107 Aug 23 15:41 copy_big_file.sh
-rw-rw-r-- 1 houbu houbu  19384602624 Aug 23 16:05 nav_test.jtl
-rw-rw-r-- 1 houbu houbu 325446337189 Aug 23 11:48 nav_test.jtl_
-rw------- 1 houbu houbu         8850 Aug 23 16:04 nohup.out
-rw-rw-r-- 1 houbu houbu         1024 Aug 23 16:05 split_000
-rw-rw-r-- 1 houbu houbu         1024 Aug 23 16:05 split_001
-rw-rw-r-- 1 houbu houbu         1024 Aug 23 16:05 split_002
-rw-rw-r-- 1 houbu houbu         1024 Aug 23 16:05 split_003
-rw-rw-r-- 1 houbu houbu         1024 Aug 23 16:05 split_004
-rw-rw-r-- 1 houbu houbu         1024 Aug 23 16:05 split_005
-rw-rw-r-- 1 houbu houbu         1024 Aug 23 16:05 split_006
-rw-rw-r-- 1 houbu houbu         1024 Aug 23 16:05 split_007
-rw-rw-r-- 1 houbu houbu          658 Aug 23 16:05 split_008
-rwxrwxr-x 1 houbu houbu          113 Aug 21 16:54 unzip_file.sh
[houbu@opentsdb1 temp]$
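
The session above can be reproduced on a small scale. The following is a minimal sketch, assuming GNU coreutils; the file name sample.log and the prefix part_ are made up for illustration. It splits a 2500-byte file into 1 KiB pieces with three-digit numeric suffixes, using the same options as above.

```shell
#!/bin/bash
# Sketch only: sample.log and part_ are hypothetical names, not from the article.
work_dir=$(mktemp -d)
trap 'rm -rf "$work_dir"' EXIT
cd "$work_dir" || exit 1

# Build a 2500-byte sample file
head -c 2500 /dev/zero | tr '\0' 'x' > sample.log

# -a 3: three-character suffix, -b 1k: 1024 bytes per piece, -d: numeric suffixes
split -a 3 -b 1k -d sample.log part_

piece_count=$(ls part_* | wc -l)
last_size=$(stat -c%s part_002)
echo "pieces: $piece_count, last piece: $last_size bytes"
# prints "pieces: 3, last piece: 452 bytes" (2500 = 1024 + 1024 + 452)
```

As with nohup.out above, only the last piece is shorter than the requested size.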

Merging files

[houbu@opentsdb1 temp]$ cat split_* > new.temp
[houbu@opentsdb1 temp]$ ll -h
total 323G
-rwxrwxr-x 1 houbu houbu  283 Aug 21 18:07 analysis_report.sh
-rwxrwxr-x 1 houbu houbu 1.1K Aug 23 15:41 copy_big_file.sh
-rw-rw-r-- 1 houbu houbu  20G Aug 23 16:07 nav_test.jtl
-rw-rw-r-- 1 houbu houbu 304G Aug 23 11:48 nav_test.jtl_
-rw-rw-r-- 1 houbu houbu 8.7K Aug 23 16:07 new.temp
-rw------- 1 houbu houbu 8.9K Aug 23 16:06 nohup.out
-rw-rw-r-- 1 houbu houbu 1.0K Aug 23 16:05 split_000
-rw-rw-r-- 1 houbu houbu 1.0K Aug 23 16:05 split_001
-rw-rw-r-- 1 houbu houbu 1.0K Aug 23 16:05 split_002
-rw-rw-r-- 1 houbu houbu 1.0K Aug 23 16:05 split_003
-rw-rw-r-- 1 houbu houbu 1.0K Aug 23 16:05 split_004
-rw-rw-r-- 1 houbu houbu 1.0K Aug 23 16:05 split_005
-rw-rw-r-- 1 houbu houbu 1.0K Aug 23 16:05 split_006
-rw-rw-r-- 1 houbu houbu 1.0K Aug 23 16:05 split_007
-rw-rw-r-- 1 houbu houbu  658 Aug 23 16:05 split_008
-rwxrwxr-x 1 houbu houbu  113 Aug 21 16:54 unzip_file.sh
[houbu@opentsdb1 temp]$
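
Merging with cat relies on the shell expanding the glob in sorted order, which matches the piece order as long as the suffixes have a fixed width. A minimal sketch of a round trip with a byte-for-byte check follows, assuming GNU coreutils; original.txt, piece_ and merged.txt are hypothetical names.

```shell
#!/bin/bash
# Sketch only: all file names here are made up for illustration.
work_dir=$(mktemp -d)
trap 'rm -rf "$work_dir"' EXIT
cd "$work_dir" || exit 1

seq 1 5000 > original.txt
split -a 3 -b 1k -d original.txt piece_

# piece_000, piece_001, ... are concatenated in glob (sorted) order.
# The output name must not match the glob, or cat would read its own output.
cat piece_* > merged.txt

# cmp -s is silent and returns 0 only when both files are identical
if cmp -s original.txt merged.txt; then
    merge_status=identical
else
    merge_status=different
fi
echo "merge result: $merge_status"
```

Comparing the merged file with the source is only meaningful when the source is no longer being written to; in the listing above, nohup.out was still growing, which is why it is slightly larger than new.temp.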

Deleting the last n lines

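#!/bin/bash

in_file=nav_test.jtl_
out_file=nav_test.jtl

# Number of bytes to keep: the file size minus the size of the last 11 lines
#total_size=$(stat -c%s nohup.out)
total_size=$(( $(stat -c%s "$in_file") - $(tail -n 11 "$in_file" | wc -c) ))

# Block size in bytes
#block_size=1024
block_size=$((1024 * 1024 * 1024))

# Copy the first block. If fewer than block_size bytes are to be kept,
# copy only total_size bytes so that the trailing lines are not included.
copy_size=$(( total_size < block_size ? total_size : block_size ))
dd if="$in_file" of="$out_file" bs="$block_size" count="$copy_size" iflag=count_bytes

# Copy the remaining content in a loop
current=1
while [ "$copy_size" -lt "$total_size" ]
do
        var_size=$((total_size - copy_size))

        if [ "$var_size" -lt "$block_size" ]
        then
                # Tail block: skip= and count= are byte counts here, which avoids bs=1
                echo "dd if=$in_file of=$out_file skip=$copy_size bs=$block_size count=$var_size iflag=skip_bytes,count_bytes conv=notrunc oflag=append"
                dd if="$in_file" of="$out_file" skip="$copy_size" bs="$block_size" count="$var_size" iflag=skip_bytes,count_bytes conv=notrunc oflag=append

                copy_size=$((copy_size + var_size))
        else
                # Full block: skip= counts blocks of block_size bytes
                echo "dd if=$in_file of=$out_file skip=$current bs=$block_size count=1 conv=notrunc oflag=append"
                dd if="$in_file" of="$out_file" skip="$current" bs="$block_size" count=1 conv=notrunc oflag=append

                copy_size=$((copy_size + block_size))
        fi

        current=$((current + 1))

        echo "copy_size:$copy_size"
done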

The shell script above only removes the last 11 lines. To remove a different number of lines, adjust the argument of tail -n in

total_size=$(($(stat -c%s $in_file)-$(tail -n 11 $in_file|wc -c)))
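
The byte arithmetic can be checked on a small file. The sketch below is not the author's block loop: it assumes GNU dd with iflag=count_bytes and status=none, and copies the kept range in a single dd call. The names input.txt, output.txt, n and keep_bytes are made up for illustration.

```shell
#!/bin/bash
# Sketch only: single-call variant for checking the arithmetic, hypothetical names.
work_dir=$(mktemp -d)
trap 'rm -rf "$work_dir"' EXIT
cd "$work_dir" || exit 1

seq 1 100 > input.txt   # 100 lines: "1" .. "100"
n=11                    # number of trailing lines to drop

# Bytes to keep = file size - size of the last n lines
keep_bytes=$(( $(stat -c%s input.txt) - $(tail -n "$n" input.txt | wc -c) ))

# iflag=count_bytes makes count= a byte count, so bs can stay large
dd if=input.txt of=output.txt bs=1M count="$keep_bytes" iflag=count_bytes status=none

line_count=$(wc -l < output.txt)
last_line=$(tail -n 1 output.txt)
echo "kept $line_count lines, last line: $last_line"
# prints "kept 89 lines, last line: 89"
```

Because tail only reads the end of the file, computing the size of the last n lines stays cheap even for very large inputs.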

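To delete the first n lines instead, change the following statement

# Copy the first block
dd if=$in_file of=$out_file bs=$block_size count=1
to

# Size in bytes of the first 10 lines
skip_size=$(head -n 10 $in_file | wc -c)

# iflag=skip_bytes makes skip= a byte offset, so a large bs can still be used
dd if=$in_file of=$out_file skip=$skip_size bs=$block_size count=1 iflag=skip_bytes

In the loop, every later block must then be read from byte offset skip_size + copy_size (again with iflag=skip_bytes), and total_size must be reduced by skip_size. Otherwise the loop, which computes its offsets from the start of the file, copies the skip_size bytes starting at offset block_size a second time.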
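
A small-scale check of the head-based offset is sketched below. It assumes GNU dd with iflag=skip_bytes and status=none, and copies everything after the skipped bytes in one call rather than block by block; input.txt, output.txt and n are hypothetical names.

```shell
#!/bin/bash
# Sketch only: drops the first n lines of a small test file, hypothetical names.
work_dir=$(mktemp -d)
trap 'rm -rf "$work_dir"' EXIT
cd "$work_dir" || exit 1

seq 1 100 > input.txt   # 100 lines: "1" .. "100"
n=10                    # number of leading lines to drop

# Size in bytes of the first n lines
skip_size=$(head -n "$n" input.txt | wc -c)

# iflag=skip_bytes makes skip= a byte offset into the input
dd if=input.txt of=output.txt bs=1M skip="$skip_size" iflag=skip_bytes status=none

first_line=$(head -n 1 output.txt)
line_count=$(wc -l < output.txt)
echo "first line: $first_line, lines kept: $line_count"
# prints "first line: 11, lines kept: 90"
```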

Things to note:

seek=N skip N obs-sized blocks at start of output
skip=N skip N ibs-sized blocks at start of input

count=N copy only N input blocks
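
The difference between the three operands is easiest to see on a tiny file. The following sketch assumes GNU dd (for status=none); in.bin and out.bin are made-up names, and bs=4 is chosen so that each block is one group of four letters.

```shell
#!/bin/bash
# Sketch only: illustrates skip=, seek= and count= with 4-byte blocks.
work_dir=$(mktemp -d)
trap 'rm -rf "$work_dir"' EXIT
cd "$work_dir" || exit 1

printf 'AAAABBBBCCCCDDDD' > in.bin

# skip=1: skip one 4-byte block of the input; count=2: copy two blocks
middle=$(dd if=in.bin bs=4 skip=1 count=2 status=none)
echo "$middle"
# prints "BBBBCCCC"

# seek=1: skip one 4-byte block of the output before writing;
# conv=notrunc keeps the rest of the existing output file
printf '................' > out.bin
dd if=in.bin of=out.bin bs=4 skip=3 seek=1 count=1 conv=notrunc status=none
patched=$(cat out.bin)
echo "$patched"
# prints "....DDDD........"
```

Both skip= and seek= are measured in blocks unless iflag=skip_bytes or oflag=seek_bytes is given, which is why the script above needs bs=1 or the byte flags whenever an offset is not a multiple of the block size.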

[houbu@opentsdb1 temp]$ dd --help
Usage: dd [OPERAND]...
  or:  dd OPTION
Copy a file, converting and formatting according to the operands.

  bs=BYTES        read and write up to BYTES bytes at a time
  cbs=BYTES       convert BYTES bytes at a time
  conv=CONVS      convert the file as per the comma separated symbol list
  count=N         copy only N input blocks
  ibs=BYTES       read up to BYTES bytes at a time (default: 512)
  if=FILE         read from FILE instead of stdin
  iflag=FLAGS     read as per the comma separated symbol list
  obs=BYTES       write BYTES bytes at a time (default: 512)
  of=FILE         write to FILE instead of stdout
  oflag=FLAGS     write as per the comma separated symbol list
  seek=N          skip N obs-sized blocks at start of output
  skip=N          skip N ibs-sized blocks at start of input
  status=LEVEL    The LEVEL of information to print to stderr;
                  'none' suppresses everything but error messages,
                  'noxfer' suppresses the final transfer statistics,
                  'progress' shows periodic transfer statistics

N and BYTES may be followed by the following multiplicative suffixes:
c =1, w =2, b =512, kB =1000, K =1024, MB =1000*1000, M =1024*1024, xM =M
GB =1000*1000*1000, G =1024*1024*1024, and so on for T, P, E, Z, Y.

Each CONV symbol may be:

  ascii     from EBCDIC to ASCII
  ebcdic    from ASCII to EBCDIC
  ibm       from ASCII to alternate EBCDIC
  block     pad newline-terminated records with spaces to cbs-size
  unblock   replace trailing spaces in cbs-size records with newline
  lcase     change upper case to lower case
  ucase     change lower case to upper case
  sparse    try to seek rather than write the output for NUL input blocks
  swab      swap every pair of input bytes
  sync      pad every input block with NULs to ibs-size; when used
            with block or unblock, pad with spaces rather than NULs
  excl		fail if the output file already exists
  nocreat	do not create the output file
  notrunc	do not truncate the output file
  noerror	continue after read errors
  fdatasync	physically write output file data before finishing
  fsync	likewise, but also write metadata

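Each FLAG symbol may be:

  append	append mode (makes sense only for output; conv=notrunc suggested)
  direct	use direct I/O for data
  directory	fail unless a directory
  dsync		use synchronized I/O for data
  sync		likewise, but also for metadata
  fullblock	accumulate full blocks of input (iflag only)
  nonblock	use non-blocking I/O
  noatime	do not update access time
  nocache	discard cached data
  noctty	do not assign controlling terminal from file
  nofollow	do not follow symlinks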
  count_bytes  treat 'count=N' as a byte count (iflag only)
  skip_bytes  treat 'skip=N' as a byte count (iflag only)
  seek_bytes  treat 'seek=N' as a byte count (oflag only)

Sending a USR1 signal to a running 'dd' process makes it
print I/O statistics to standard error and then resume copying.

  $ dd if=/dev/zero of=/dev/null& pid=$!
  $ kill -USR1 $pid; sleep 1; kill $pid
  18335302+0 records in
  18335302+0 records out
  9387674624 bytes (9.4 GB) copied, 34.6279 seconds, 271 MB/s

Options are:

      --help     display this help and exit
      --version  output version information and exit

GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
Report dd translation bugs to <http://translationproject.org/team/zh_CN.html>
For complete documentation, run: info coreutils 'dd invocation'
[houbu@opentsdb1 temp]$