1. Splitting files with split
[houbu@opentsdb1 temp]$ split --help
Usage: split [OPTION]... [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
size is 1000 lines, and default PREFIX is 'x'. With no INPUT, or when INPUT
is -, read standard input.
Mandatory arguments to long options are mandatory for short options too.
-a, --suffix-length=N generate suffixes of length N (default 2)
--additional-suffix=SUFFIX append an additional SUFFIX to file names
-b, --bytes=SIZE put SIZE bytes per output file
-C, --line-bytes=SIZE put at most SIZE bytes of lines per output file
-d, --numeric-suffixes[=FROM] use numeric suffixes instead of alphabetic;
FROM changes the start value (default 0)
-e, --elide-empty-files do not generate empty output files with '-n'
--filter=COMMAND write to shell COMMAND; file name is $FILE
-l, --lines=NUMBER put NUMBER lines per output file
-n, --number=CHUNKS generate CHUNKS output files; see explanation below
-u, --unbuffered immediately copy input to output with '-n r/...'
--verbose print a diagnostic just before each output file is opened
--help display this help and exit
--version output version information and exit
SIZE is an integer and optional unit (example: 10M is 10*1024*1024). Units
are K, M, G, T, P, E, Z, Y (powers of 1024) or KB, MB, ... (powers of 1000).
CHUNKS may be:
N split into N files based on size of input
K/N output Kth of N to stdout
l/N split into N files without splitting lines
l/K/N output Kth of N to stdout without splitting lines
r/N like 'l' but use round robin distribution
r/K/N likewise but only output Kth of N to stdout
GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
Report split translation bugs to <http://translationproject.org/team/zh_CN.html>
For complete documentation, run: info coreutils 'split invocation'
[houbu@opentsdb1 temp]$
[houbu@opentsdb1 temp]$ split -a 3 -b 1k -d nohup.out split_
[houbu@opentsdb1 temp]$ ll
total 336749072
-rwxrwxr-x 1 houbu houbu 283 Aug 21 18:07 analysis_report.sh
-rwxrwxr-x 1 houbu houbu 1107 Aug 23 15:41 copy_big_file.sh
-rw-rw-r-- 1 houbu houbu 19384602624 Aug 23 16:05 nav_test.jtl
-rw-rw-r-- 1 houbu houbu 325446337189 Aug 23 11:48 nav_test.jtl_
-rw------- 1 houbu houbu 8850 Aug 23 16:04 nohup.out
-rw-rw-r-- 1 houbu houbu 1024 Aug 23 16:05 split_000
-rw-rw-r-- 1 houbu houbu 1024 Aug 23 16:05 split_001
-rw-rw-r-- 1 houbu houbu 1024 Aug 23 16:05 split_002
-rw-rw-r-- 1 houbu houbu 1024 Aug 23 16:05 split_003
-rw-rw-r-- 1 houbu houbu 1024 Aug 23 16:05 split_004
-rw-rw-r-- 1 houbu houbu 1024 Aug 23 16:05 split_005
-rw-rw-r-- 1 houbu houbu 1024 Aug 23 16:05 split_006
-rw-rw-r-- 1 houbu houbu 1024 Aug 23 16:05 split_007
-rw-rw-r-- 1 houbu houbu 658 Aug 23 16:05 split_008
-rwxrwxr-x 1 houbu houbu 113 Aug 21 16:54 unzip_file.sh
[houbu@opentsdb1 temp]$
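The command above combines three options: -b 1k caps each piece at 1024 bytes, -d switches to numeric suffixes, and -a 3 makes each suffix three digits wide. The following is a minimal sketch of the same call on a throwaway file; the scratch directory and the name sample.bin are assumptions made for illustration only.

```shell
#!/bin/bash
# Work in a scratch directory so no real data is touched (hypothetical paths).
work_dir=$(mktemp -d)
cd "$work_dir" || exit 1

# Build a 2500-byte sample file.
head -c 2500 /dev/zero > sample.bin

# Same options as above: 1 KiB pieces, numeric 3-digit suffixes, prefix split_.
split -a 3 -b 1k -d sample.bin split_

# 2500 bytes give two full 1024-byte pieces and one 452-byte remainder.
piece_count=$(ls split_* | wc -l)
last_size=$(stat -c%s split_002)
echo "pieces: $piece_count, last piece: $last_size bytes"
```

The same arithmetic explains the listing above: 8850 bytes of nohup.out become eight 1024-byte pieces plus a final piece of 658 bytes.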
Merging the pieces
[houbu@opentsdb1 temp]$ cat split_* > new.temp
[houbu@opentsdb1 temp]$ ll -h
total 323G
-rwxrwxr-x 1 houbu houbu 283 Aug 21 18:07 analysis_report.sh
-rwxrwxr-x 1 houbu houbu 1.1K Aug 23 15:41 copy_big_file.sh
-rw-rw-r-- 1 houbu houbu 20G Aug 23 16:07 nav_test.jtl
-rw-rw-r-- 1 houbu houbu 304G Aug 23 11:48 nav_test.jtl_
-rw-rw-r-- 1 houbu houbu 8.7K Aug 23 16:07 new.temp
-rw------- 1 houbu houbu 8.9K Aug 23 16:06 nohup.out
-rw-rw-r-- 1 houbu houbu 1.0K Aug 23 16:05 split_000
-rw-rw-r-- 1 houbu houbu 1.0K Aug 23 16:05 split_001
-rw-rw-r-- 1 houbu houbu 1.0K Aug 23 16:05 split_002
-rw-rw-r-- 1 houbu houbu 1.0K Aug 23 16:05 split_003
-rw-rw-r-- 1 houbu houbu 1.0K Aug 23 16:05 split_004
-rw-rw-r-- 1 houbu houbu 1.0K Aug 23 16:05 split_005
-rw-rw-r-- 1 houbu houbu 1.0K Aug 23 16:05 split_006
-rw-rw-r-- 1 houbu houbu 1.0K Aug 23 16:05 split_007
-rw-rw-r-- 1 houbu houbu 658 Aug 23 16:05 split_008
-rwxrwxr-x 1 houbu houbu 113 Aug 21 16:54 unzip_file.sh
[houbu@opentsdb1 temp]$
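Merging with cat split_* works because the shell expands the glob in sorted order and the fixed-width suffixes keep 000, 001, ... in sequence. In the listing above, new.temp (8.7K) is smaller than nohup.out (8.9K), presumably because nohup.out kept growing after the split. A rough way to confirm that a round trip is lossless is to compare the merged file with a static original, sketched here with a hypothetical sample file:

```shell
#!/bin/bash
# Scratch directory and file names are assumptions for illustration.
work_dir=$(mktemp -d)
cd "$work_dir" || exit 1

# Sample input: 5000 numbered lines standing in for nohup.out.
seq 1 5000 > original.txt

# Split into 1 KiB pieces, then merge them back in glob (sorted) order.
split -a 3 -b 1k -d original.txt split_
cat split_* > merged.txt

# cmp -s prints nothing and returns 0 only when both files are byte-identical.
if cmp -s original.txt merged.txt; then
    merge_ok=yes
else
    merge_ok=no
fi
echo "round trip identical: $merge_ok"
```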
Deleting the last n lines
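#!/bin/bash
in_file=nav_test.jtl_
out_file=nav_test.jtl
# Bytes to keep: total file size minus the byte length of the last 11 lines
#total_size=$(stat -c%s nohup.out)
total_size=$(( $(stat -c%s "$in_file") - $(tail -n 11 "$in_file" | wc -c) ))
# Block size in bytes (1 GiB)
#block_size=1024
block_size=$((1024 * 1024 * 1024))
# Copy the first block, capped at total_size so that a file smaller than
# one block is trimmed too (count is a byte value because of count_bytes)
first_size=$(( total_size < block_size ? total_size : block_size ))
dd if="$in_file" of="$out_file" bs="$block_size" count="$first_size" iflag=count_bytes,fullblock
# Copy the remaining content in a loop
current=1
copy_size=$first_size
while [ "$copy_size" -lt "$total_size" ]
do
    var_size=$((total_size - copy_size))
    if [ "$var_size" -lt "$block_size" ]
    then
        # Tail block: skip and count are byte values here, which avoids a slow bs=1 copy
        echo "dd if=$in_file of=$out_file skip=$copy_size count=$var_size bs=$block_size iflag=skip_bytes,count_bytes,fullblock conv=notrunc oflag=append"
        dd if="$in_file" of="$out_file" skip="$copy_size" count="$var_size" bs="$block_size" iflag=skip_bytes,count_bytes,fullblock conv=notrunc oflag=append
        copy_size=$((copy_size + var_size))
    else
        # Full block: skip is counted in blocks of block_size bytes
        echo "dd if=$in_file of=$out_file skip=$current bs=$block_size count=1 iflag=fullblock conv=notrunc oflag=append"
        dd if="$in_file" of="$out_file" skip="$current" bs="$block_size" count=1 iflag=fullblock conv=notrunc oflag=append
        copy_size=$((copy_size + block_size))
    fi
    current=$((current + 1))
    echo "copy_size:$copy_size"
done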
The script above is hard-coded to delete the last 11 lines. To delete a different number of lines, change the argument of tail -n in the total_size assignment.
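The byte arithmetic behind total_size can also be tried in isolation. The sketch below swaps the dd loop for a single head -c call, so it is not the script above, only an illustration of the same calculation; drop_last_lines and the sample file are hypothetical names.

```shell
#!/bin/bash
# drop_last_lines FILE N OUT: copy FILE to OUT without its last N lines.
# Hypothetical helper, not part of the script above.
drop_last_lines() {
    local in_file=$1 n=$2 out_file=$3
    local total_size tail_size
    total_size=$(stat -c%s "$in_file")
    tail_size=$(tail -n "$n" "$in_file" | wc -c)
    # Keep only the leading bytes that precede the last N lines.
    head -c "$((total_size - tail_size))" "$in_file" > "$out_file"
}

# Demo on a 20-line sample file: dropping 11 lines leaves lines 1 to 9.
work_dir=$(mktemp -d)
seq 1 20 > "$work_dir/in.txt"
drop_last_lines "$work_dir/in.txt" 11 "$work_dir/out.txt"
kept_lines=$(wc -l < "$work_dir/out.txt")
last_line=$(tail -n 1 "$work_dir/out.txt")
echo "kept $kept_lines lines, last is $last_line"
```

tail reads only the end of a regular file, so computing tail_size stays cheap even for very large inputs.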
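To delete the first n lines instead, replace the dd statement that copies the first block with:
skip_size=$(head -n 10 "$in_file" | wc -c)
dd if="$in_file" of="$out_file" skip="$skip_size" bs="$block_size" count="$block_size" iflag=skip_bytes,count_bytes,fullblock
Here skip and count are byte values (iflag=skip_bytes,count_bytes), so the first 10 lines are skipped and one block's worth of data is copied. The loop that follows still computes its offsets from the start of the input, so its skip values and its end condition must be shifted by skip_size as well; otherwise the skip_size bytes that follow the first block are copied twice.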
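When only the leading lines need to go, a single dd call with a byte offset may be enough, with no loop at all. This is a sketch under the assumption that GNU dd with iflag=skip_bytes is available; drop_first_lines and the sample file are hypothetical names.

```shell
#!/bin/bash
# drop_first_lines FILE N OUT: copy FILE to OUT without its first N lines.
# Hypothetical helper; relies on GNU dd for iflag=skip_bytes and status=none.
drop_first_lines() {
    local in_file=$1 n=$2 out_file=$3
    local skip_size
    # Byte length of the first N lines.
    skip_size=$(head -n "$n" "$in_file" | wc -c)
    # skip= is a byte offset because of skip_bytes; bs only sets the I/O buffer size.
    dd if="$in_file" of="$out_file" bs=1M skip="$skip_size" iflag=skip_bytes status=none
}

# Demo on a 20-line sample file: dropping 10 lines leaves lines 11 to 20.
work_dir=$(mktemp -d)
seq 1 20 > "$work_dir/in.txt"
drop_first_lines "$work_dir/in.txt" 10 "$work_dir/out.txt"
first_line=$(head -n 1 "$work_dir/out.txt")
line_count=$(wc -l < "$work_dir/out.txt")
echo "first line is $first_line, $line_count lines remain"
```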
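Points to note: skip, seek, and count are measured in blocks (of ibs or obs bytes), not in bytes, unless the skip_bytes, seek_bytes, or count_bytes flag is set: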
seek=N skip N obs-sized blocks at start of output
skip=N skip N ibs-sized blocks at start of input
count=N copy only N input blocks
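The difference between block units and byte units is easy to get wrong, so here is a small sketch that runs the same skip and count values both ways. The sample file alpha.txt is an assumption for illustration, and the byte-unit flags require GNU dd.

```shell
#!/bin/bash
work_dir=$(mktemp -d)
printf 'abcdefghijklmnopqrstuvwxyz' > "$work_dir/alpha.txt"

# Block units: skip=2 with bs=4 skips 2 blocks (8 bytes), count=1 copies one 4-byte block.
by_blocks=$(dd if="$work_dir/alpha.txt" bs=4 skip=2 count=1 status=none)

# Byte units: with skip_bytes and count_bytes the same numbers mean
# "skip 2 bytes, then copy 1 byte".
by_bytes=$(dd if="$work_dir/alpha.txt" bs=4 skip=2 count=1 iflag=skip_bytes,count_bytes status=none)

echo "blocks: $by_blocks  bytes: $by_bytes"
# prints "blocks: ijkl  bytes: c"
```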
[houbu@opentsdb1 temp]$ dd --help
Usage: dd [OPERAND]...
  or:  dd OPTION
Copy a file, converting and formatting according to the operands.
bs=BYTES read and write up to BYTES bytes at a time
cbs=BYTES convert BYTES bytes at a time
conv=CONVS convert the file as per the comma separated symbol list
count=N copy only N input blocks
ibs=BYTES read up to BYTES bytes at a time (default: 512)
if=FILE read from FILE instead of stdin
iflag=FLAGS read as per the comma separated symbol list
obs=BYTES write BYTES bytes at a time (default: 512)
of=FILE write to FILE instead of stdout
oflag=FLAGS write as per the comma separated symbol list
seek=N skip N obs-sized blocks at start of output
skip=N skip N ibs-sized blocks at start of input
status=LEVEL The LEVEL of information to print to stderr;
'none' suppresses everything but error messages,
'noxfer' suppresses the final transfer statistics,
'progress' shows periodic transfer statistics
N and BYTES may be followed by the following multiplicative suffixes:
c =1, w =2, b =512, kB =1000, K =1024, MB =1000*1000, M =1024*1024, xM =M
GB =1000*1000*1000, G =1024*1024*1024, and so on for T, P, E, Z, Y.
Each CONV symbol may be:
ascii from EBCDIC to ASCII
ebcdic from ASCII to EBCDIC
ibm from ASCII to alternate EBCDIC
block pad newline-terminated records with spaces to cbs-size
unblock replace trailing spaces in cbs-size records with newline
lcase change upper case to lower case
ucase change lower case to upper case
sparse try to seek rather than write the output for NUL input blocks
swab swap every pair of input bytes
sync pad every input block with NULs to ibs-size; when used
with block or unblock, pad with spaces rather than NULs
excl fail if the output file already exists
nocreat do not create the output file
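notrunc do not truncate the output file
noerror continue after read errors
fdatasync physically write output file data before finishing
fsync likewise, but also write metadata
Each FLAG symbol may be:
append append mode (makes sense only for output; conv=notrunc suggested)
direct use direct I/O for data
directory fail unless a directory
dsync use synchronized I/O for data
sync likewise, but also for metadata
fullblock accumulate full blocks of input (iflag only)
nonblock use non-blocking I/O
noatime do not update access time
nocache discard cached data
noctty do not assign controlling terminal from file
nofollow do not follow symlinks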
count_bytes treat 'count=N' as a byte count (iflag only)
skip_bytes treat 'skip=N' as a byte count (iflag only)
seek_bytes treat 'seek=N' as a byte count (oflag only)
Sending a USR1 signal to a running 'dd' process makes it
print I/O statistics to standard error and then resume copying.
$ dd if=/dev/zero of=/dev/null& pid=$!
$ kill -USR1 $pid; sleep 1; kill $pid
18335302+0 records in
18335302+0 records out
9387674624 bytes (9.4 GB) copied, 34.6279 seconds, 271 MB/s
Options are:
--help display this help and exit
--version output version information and exit
GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
Report dd translation bugs to <http://translationproject.org/team/zh_CN.html>
For complete documentation, run: info coreutils 'dd invocation'
[houbu@opentsdb1 temp]$
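The copy script earlier in this section pairs conv=notrunc with oflag=append on every dd call after the first. The help text only says that conv=notrunc is "suggested" with append; without it dd truncates the output file when opening it. A minimal sketch of appending with both flags, using hypothetical file names:

```shell
#!/bin/bash
work_dir=$(mktemp -d)
printf 'first\n' > "$work_dir/out.txt"
printf 'second\n' > "$work_dir/more.txt"

# notrunc keeps the existing content of out.txt; append writes at its end.
dd if="$work_dir/more.txt" of="$work_dir/out.txt" conv=notrunc oflag=append status=none

out_lines=$(wc -l < "$work_dir/out.txt")
out_first=$(head -n 1 "$work_dir/out.txt")
out_last=$(tail -n 1 "$work_dir/out.txt")
echo "$out_lines lines: $out_first, $out_last"
```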