Analysis of the tar file format

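An analysis of the tar-1.28 source shows that differences between tar files stem from the metadata tar stores, such as each file's modification time, UID, and GID. Every archived file or directory is preceded by a tar header holding the file name, permissions, UID/GID, and so on, and any difference in these fields produces a different tar file. During creation, the header (file name, permissions, owner, etc.) is written first, followed by the data.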


1. Problem

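During testing, we created tar file A from the original data directory with "tar -cvf", then copied the data directory to another location and created tar file B from the copy, again with tar -cvf. Files A and B turned out to differ at regular intervals. These differences are clearly tied to the tar file format, so the analysis below uses the tar-1.28 source to explain where they come from.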
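The "differs at intervals" observation can be checked by comparing the two archives in units of 512 bytes, the tar block size. The helper below is only a sketch: the name `differing_blocks` and its signature are hypothetical (not part of tar), and it assumes both archives have already been read into memory and have the same length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCKSIZE 512  /* tar archives are sequences of 512-byte blocks */

/*
 * Compare two archive images of equal length block by block.
 * Stores the indexes of up to max_out differing blocks in out[] and
 * returns the total number of differing blocks.
 * Hypothetical helper for illustration; not taken from the tar source.
 */
size_t differing_blocks(const unsigned char *a, const unsigned char *b,
                        size_t len, size_t *out, size_t max_out)
{
    assert(len % BLOCKSIZE == 0);
    size_t count = 0;
    for (size_t off = 0; off < len; off += BLOCKSIZE) {
        if (memcmp(a + off, b + off, BLOCKSIZE) != 0) {
            if (count < max_out)
                out[count] = off / BLOCKSIZE;  /* record the block index */
            count++;
        }
    }
    return count;
}
```

If only isolated blocks are reported while the blocks between them are identical, that pattern is consistent with differing headers and identical file data.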

2. Conclusion

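Although the file contents are identical, each file's modification time, UID, GID, user name, group name, permissions, and the tar header format information are all stored in the tar header. If any one of these differs, the resulting tar files differ. In the generated tar file, a tar header is inserted in front of every archived directory and file. Several header formats exist; the default is GNU_FORMAT. The data structure is as follows: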

struct posix_header
{                       /* byte offset */
  char name[100];       /*   0 */
  char mode[8];         /* 100 */
  char uid[8];          /* 108 */
  char gid[8];          /* 116 */
  char size[12];        /* 124 */
  char mtime[12];       /* 136 */
  char chksum[8];       /* 148 */
  char typeflag;        /* 156 */
  char linkname[100];   /* 157 */
  char magic[6];        /* 257 */
  char version[2];      /* 263 */
  char uname[32];       /* 265 */
  char gname[32];       /* 297 */
  char devmajor[8];     /* 329 */
  char devminor[8];     /* 337 */
  char prefix[155];     /* 345 */
                        /* 500 */
};
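To see which metadata field makes two headers differ, the fields can be compared one by one. The sketch below repeats the structure with the same field sizes and checks a few byte offsets at compile time. The `DIFF_*` flags and `header_metadata_diff` are hypothetical names introduced here for illustration; they do not exist in tar.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof */
#include <string.h>   /* memcmp */

/* Same field sizes as struct posix_header in the listing. */
struct posix_header {
    char name[100];  char mode[8];     char uid[8];      char gid[8];
    char size[12];   char mtime[12];   char chksum[8];   char typeflag;
    char linkname[100]; char magic[6]; char version[2];
    char uname[32];  char gname[32];
    char devmajor[8]; char devminor[8]; char prefix[155];
};

/* The compiler confirms the byte offsets given in the comments. */
static_assert(offsetof(struct posix_header, mtime) == 136, "mtime offset");
static_assert(offsetof(struct posix_header, chksum) == 148, "chksum offset");
static_assert(offsetof(struct posix_header, prefix) == 345, "prefix offset");
static_assert(sizeof(struct posix_header) == 500, "fields occupy 500 bytes");

/* Hypothetical flags naming the metadata fields that can differ. */
enum {
    DIFF_MODE = 1, DIFF_UID = 2, DIFF_GID = 4,
    DIFF_MTIME = 8, DIFF_UNAME = 16, DIFF_GNAME = 32
};

#define FIELD_DIFFERS(a, b, f) \
    (memcmp((a)->f, (b)->f, sizeof((a)->f)) != 0)

/* Return a bit mask of the metadata fields in which a and b differ.
 * chksum is not compared: it changes whenever any other field changes. */
unsigned header_metadata_diff(const struct posix_header *a,
                              const struct posix_header *b)
{
    unsigned mask = 0;
    if (FIELD_DIFFERS(a, b, mode))  mask |= DIFF_MODE;
    if (FIELD_DIFFERS(a, b, uid))   mask |= DIFF_UID;
    if (FIELD_DIFFERS(a, b, gid))   mask |= DIFF_GID;
    if (FIELD_DIFFERS(a, b, mtime)) mask |= DIFF_MTIME;
    if (FIELD_DIFFERS(a, b, uname)) mask |= DIFF_UNAME;
    if (FIELD_DIFFERS(a, b, gname)) mask |= DIFF_GNAME;
    return mask;
}
```

For two headers of the same file taken from archives A and B, a result of `DIFF_MTIME` would mean that only the modification time changed between the original and the copy.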

If folder A contains several files, the tar file for folder A is laid out as shown in the figure.

[Figure: Tar2.png]
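The layout can also be expressed as arithmetic: each entry is one 512-byte header followed by its data rounded up to a whole number of 512-byte blocks. The following is a minimal sketch under that assumption; `data_blocks` and `next_header_offset` are hypothetical helper names, and the end-of-archive padding is not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCKSIZE 512  /* size of one tar block */

/* Number of 512-byte data blocks that follow the header of an entry
 * whose data is `size` bytes long (0 for a directory). */
size_t data_blocks(size_t size)
{
    return (size + BLOCKSIZE - 1) / BLOCKSIZE;
}

/* Offset of the next header, given the offset of the current header
 * and the size of the current entry's data. */
size_t next_header_offset(size_t header_offset, size_t size)
{
    assert(header_offset % BLOCKSIZE == 0);  /* headers are block-aligned */
    return header_offset + BLOCKSIZE + data_blocks(size) * BLOCKSIZE;
}
```

With the test files used later in this article, manpage.mv (3902 bytes) occupies 8 data blocks, manpage.cp (7209 bytes) occupies 15, and manpage.fdisk (12220 bytes) occupies 24.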

3. Testing and code analysis

The code is from tar-1.28. The key steps in creating a tar file are as follows:

1. Create the test directory and test files:

# ll test_dir
total 24
drwxr-xr-x 3 root root 4096 Sep 17 10:26 ./
drwxr-xr-x 4 1000 1000 4096 Sep 17 10:33 ../
-rw-r--r-- 1 root root 7209 Sep 17 10:25 manpage.cp
-rw-r--r-- 1 root root 3902 Sep 17 10:26 manpage.mv
drwxr-xr-x 2 root root 4096 Sep 17 10:26 test_subdir/

[root@slave1 /root/meng-test/tar/tar-1.28/src]
# ll test_dir/test_subdir/
total 20
drwxr-xr-x 2 root root 4096 Sep 17 10:26 ./
drwxr-xr-x 3 root root 4096 Sep 17 10:26 ../
-rw-r--r-- 1 root root 12220 Sep 17 10:26 manpage.fdisk

2. Debug the command "tar -cvf test_dir.tar test_dir" under GDB.
   Set a breakpoint at create_archive(), the function that creates the tar file.

# gdb ./tar
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/meng-test/tar/tar-1.28/src/tar...done.
(gdb) set args -cvf test_dir.tar test_dir
(gdb) break create_archive
Breakpoint 1 at 0x40d440: file create.c, line 1327.
(gdb) r
Starting program: /root/meng-test/tar/tar-1.28/src/tar -cvf test_dir.tar test_dir
[Thread debugging using libthread_db enabled]

Breakpoint 1, create_archive () at create.c:1327
1327 {
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6.x86_64 libacl-2.2.49-6.el6.x86_64 libattr-2.4.44-7.el6.x86_64 libselinux-2.0.94-5.3.el6.x86_64
(gdb) bt
#0 create_archive () at create.c:1327
#1 0x000000000042460c in main (argc=<value optimized out>, argv=<value optimized out>) at tar.c:2779
(gdb)

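3. After entering create_archive, execution reaches the while loop that writes the tar file. The first entry to be written is the directory test_dir.
  dump_file0 is the function that archives a single entry; if the entry is a directory, its contents are written to the tar file recursively.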

{
    const char *name;
    while ((name = name_next (1)) != NULL)
    if (!excluded_name (name, NULL))
        dump_file (0, name, name);
}
(gdb) bt
#0 dump_file0 (st=0x7fffffffd510, name=0x66e100 "test_dir", p=0x66e100 "test_dir") at create.c:164
#1 0x000000000040d3f2 in dump_file (parent=0x0, name=0x66e100 "test_dir", fullname=0x66e100 "test_dir") at create.c:195
#2 0x000000000040d4ad in create_archive () at create.c:1407
#3 0x000000000042460c in main (argc=<value optimized out>, argv=<value optimized out>) at tar.c:2779

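4. After entering dump_file0, first look at union block. This 512-byte block is what ends up being written in front of every archived file and directory.
   The default header type is oldgnu_header.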

union block
{
  char buffer[BLOCKSIZE];
  struct posix_header header;
  struct star_header star_header;
  struct oldgnu_header oldgnu_header;
  struct sparse_header sparse_header;
  struct star_in_header star_in_header;
  struct star_ext_header star_ext_header;
};
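The point of the union is that the same 512 bytes can be viewed either as a raw buffer or as structured header fields. The sketch below uses a reduced union with only two of the views and assumes BLOCKSIZE is 512, as stated in the text. `set_member_name` and `typeflag_from_buffer` are hypothetical helpers, not functions from tar.

```c
#include <assert.h>   /* static_assert */
#include <string.h>   /* memset, strncpy */

#define BLOCKSIZE 512

struct posix_header {
    char name[100];  char mode[8];     char uid[8];      char gid[8];
    char size[12];   char mtime[12];   char chksum[8];   char typeflag;
    char linkname[100]; char magic[6]; char version[2];
    char uname[32];  char gname[32];
    char devmajor[8]; char devminor[8]; char prefix[155];
};

/* Reduced version of union block: only the raw and POSIX views. */
union block {
    char buffer[BLOCKSIZE];
    struct posix_header header;
};

/* The 500 bytes of header fields are padded out to one full block. */
static_assert(sizeof(union block) == BLOCKSIZE, "one block is 512 bytes");

/* Zero the block, then fill the name through the structured view. */
void set_member_name(union block *blk, const char *name)
{
    memset(blk->buffer, 0, BLOCKSIZE);
    strncpy(blk->header.name, name, sizeof blk->header.name);
}

/* Read the type flag through the raw view: it sits at byte offset 156. */
char typeflag_from_buffer(const union block *blk)
{
    return blk->buffer[156];
}
```

Writing through one view and reading through the other yields the same bytes, which is why GDB can print both a buffer and a header for the same block.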

5. Because the first entry to archive is the directory test_dir, the code path is dump_file0->dump_dir->dump_dir0.

(gdb) s
dump_dir0 (st=0x5418f0f5, name=0x66e100 "test_dir", p=0x66e100 "test_dir") at create.c:1105
1105	  bool top_level = ! st->parent;
(gdb) bt
#0  dump_dir0 (st=0x5418f0f5, name=0x66e100 "test_dir", p=0x66e100 "test_dir") at create.c:1105
#1  dump_dir (st=0x5418f0f5, name=0x66e100 "test_dir", p=0x66e100 "test_dir") at create.c:1309
#2  dump_file0 (st=0x5418f0f5, name=0x66e100 "test_dir", p=0x66e100 "test_dir") at create.c:1753
#3  0x000000000040d3f2 in dump_file (parent=0x0, name=0x66e100 "test_dir", fullname=0x66e100 "test_dir") at create.c:1955
#4  0x000000000040d4ad in create_archive () at create.c:1407
#5  0x000000000042460c in main (argc=<value optimized out>, argv=<value optimized out>) at tar.c:2779
(gdb)

6. dump_dir0 calls start_header to initialize the header for the directory test_dir. start_header and finish_header are used as a pair.

(gdb) bt
#0 write_header_name (st=0x7fffffffd510) at create.c:721
#1 start_header (st=0x7fffffffd510) at create.c:744
#2  0x000000000040c657 in dump_dir0 (st=0x5418f0f5, name=0x66e100 "test_dir", p=0x66e100 "test_dir") at create.c:1112
#3  dump_dir (st=0x5418f0f5, name=0x66e100 "test_dir", p=0x66e100 "test_dir") at create.c:1309
#4  dump_file0 (st=0x5418f0f5, name=0x66e100 "test_dir", p=0x66e100 "test_dir") at create.c:1753
#5  0x000000000040d3f2 in dump_file (parent=0x0, name=0x66e100 "test_dir", fullname=0x66e100 "test_dir") at create.c:1955
#6  0x000000000040d4ad in create_archive () at create.c:1407
#7  0x000000000042460c in main (argc=<value optimized out>, argv=<value optimized out>) at tar.c:2779
(gdb) s
727		   < strlen (st->file_name))
(gdb)
726	  else if (NAME_FIELD_SIZE - (archive_format == OLDGNU_FORMAT)
(gdb) p archive_format
$8 = GNU_FORMAT
(gdb)

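7. start_header calls the following to write data into the header:

-write_header_name    -->  writes the file name into the header, here test_dir
-MODE_TO_CHARS        -->  writes the permission bits, here 755
-UID_TO_CHARS         -->  writes the owner's UID, here 0
-GID_TO_CHARS         -->  writes the owning group's GID, here 0
-OFF_TO_CHARS         -->  writes the size of the entry's data (0 for a directory)
-TIME_TO_CHARS        -->  writes the file's modification time
-writes the MAGIC string into the header, i.e. "ustar"
-UNAME_TO_CHARS       -->  writes the owner's user name, here root
-GNAME_TO_CHARS       -->  writes the owning group's name, here root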

(gdb) p (struct star_header)*header
$34 = {name = "test_dir/", '\000' <repeats 90 times>, mode = "0000755", uid = "0000000", gid = "0000000", size = '0' <repeats 11 times>, mtime = "12406170515", chksum = "\000\000\000\000\000\000\000",
typeflag = 48 '0', linkname = '\000' <repeats 99 times>, magic = "ustar ", version = " ", uname = "root", '\000' <repeats 27 times>, gname = "root", '\000' <repeats 27 times>,
devmajor = "\000\000\000\000\000\000\000", devminor = "\000\000\000\000\000\000\000", prefix = '\000' <repeats 130 times>, atime = '\000' <repeats 11 times>, ctime = '\000' <repeats 11 times>}
(gdb) 
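The numeric fields in the dump are zero-padded octal text terminated by a NUL byte. The sketch below shows that encoding and its inverse. It is not the code behind tar's *_TO_CHARS macros, which handle more cases; `to_octal_field` and `from_octal_field` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write `value` as zero-padded octal digits into field[0..size-2] and
 * terminate with NUL. Returns 0 on success, -1 if the value needs more
 * digits than the field can hold. */
int to_octal_field(unsigned long long value, char *field, size_t size)
{
    assert(size >= 2);
    memset(field, '0', size - 1);
    field[size - 1] = '\0';
    for (size_t i = size - 1; i > 0 && value != 0; value >>= 3)
        field[--i] = (char)('0' + (value & 7));  /* lowest digit first */
    return value == 0 ? 0 : -1;
}

/* Parse an octal field: skip leading spaces, then read digits until the
 * first non-octal character or the end of the field. */
unsigned long long from_octal_field(const char *field, size_t size)
{
    unsigned long long value = 0;
    size_t i = 0;
    while (i < size && field[i] == ' ')
        i++;
    for (; i < size && field[i] >= '0' && field[i] <= '7'; i++)
        value = value * 8 + (unsigned long long)(field[i] - '0');
    return value;
}
```

Encoding mode 0755 into an 8-byte field gives "0000755", matching the dump. Decoding the mtime "12406170515" gives 1410920781 seconds since the epoch, which is 02:26:21 UTC on 17 September 2014 and agrees with the "Sep 17 10:26" shown in the directory listing if the machine's clock was at UTC+8.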

8. Once the header fields have been written, finish_header is called to fill in the checksum field, and the header is then written to the tar file.
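The checksum explains why chksum is still all zero bytes in the dumps taken before finish_header runs. In the tar format, the checksum is the sum of all 512 header bytes with the chksum field itself counted as eight spaces, stored as six octal digits followed by a NUL and a space. The following is a minimal sketch of that rule, not the tar-1.28 implementation; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define BLOCKSIZE     512
#define CHKSUM_OFFSET 148  /* byte offset of chksum in the header */
#define CHKSUM_SIZE   8

/* Sum every byte of the header block, treating the chksum field as if
 * it contained eight space characters. */
unsigned header_checksum(const unsigned char block[BLOCKSIZE])
{
    unsigned sum = 0;
    for (size_t i = 0; i < BLOCKSIZE; i++) {
        int in_chksum = i >= CHKSUM_OFFSET
                        && i < CHKSUM_OFFSET + CHKSUM_SIZE;
        sum += in_chksum ? (unsigned)' ' : block[i];
    }
    return sum;
}

/* Store the checksum as six octal digits, a NUL, and a space. */
void store_checksum(unsigned char block[BLOCKSIZE])
{
    char digits[12];
    snprintf(digits, sizeof digits, "%06o", header_checksum(block));
    memcpy(block + CHKSUM_OFFSET, digits, 7);  /* 6 digits plus NUL */
    block[CHKSUM_OFFSET + 7] = ' ';
}
```

Because the sum covers mode, uid, gid, mtime, uname, and gname, a change in any of these fields also changes the chksum field, so two headers that differ in one metadata field differ in at least two places.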

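9. Control returns to dump_dir, which moves on to the next entry, manpage.mv. As with the directory, start_header and finish_header are called to write a similar header.
Once the header is complete, the difference from a directory is that dump_regular_file is called to write the contents of manpage.mv into the archive.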

(gdb) bt
#0  dump_regular_file (fd=9, st=0x7fffffffd1c0) at create.c:1034
#1  0x000000000040ccbf in dump_file0 (st=0x5418f0e6, name=0x66f1a0 "manpage.mv", p=0x66f800 "test_dir/manpage.mv") at create.c:1769
#2  0x000000000040d3f2 in dump_file (parent=0x7fffffffd510, name=0x66f1a0 "manpage.mv", fullname=0x66f800 "test_dir/manpage.mv") at create.c:1955
#3  0x000000000040d2b8 in dump_dir0 (st=0x5418f0f5, name=0x66e100 "test_dir", p=0x66e100 "test_dir") at create.c:1216
#4  dump_dir (st=0x5418f0f5, name=0x66e100 "test_dir", p=0x66e100 "test_dir") at create.c:1309
#5  dump_file0 (st=0x5418f0f5, name=0x66e100 "test_dir", p=0x66e100 "test_dir") at create.c:1753
#6  0x000000000040d3f2 in dump_file (parent=0x0, name=0x66e100 "test_dir", fullname=0x66e100 "test_dir") at create.c:1955
#7  0x000000000040d4ad in create_archive () at create.c:1407
#8  0x000000000042460c in main (argc=<value optimized out>, argv=<value optimized out>) at tar.c:2779
(gdb)
(gdb) p *header
$47 = {
  buffer = "test_dir/manpage.mv", '\000' <repeats 81 times>, "0000644\000\060\060\060\060\060\060\060\000\060\060\060\060\060\060\060\000\060\060\060\060\060\060\060\067\064\067\066\000\061\062\064\060\066\061\067\060\064\067\066\000\000\000\000\000\000\000\000\000\060", '\000' <repeats 100 times>, "ustar  \000root", '\000' <repeats 28 times>, "root", '\000' <repeats 210 times>, header = {
    name = "test_dir/manpage.mv", '\000' <repeats 80 times>, mode = "0000644", uid = "0000000", gid = "0000000", size = "00000007476", mtime = "12406170476", chksum = "\000\000\000\000\000\000\000", 
    typeflag = 48 '0', linkname = '\000' <repeats 99 times>, magic = "ustar ", version = " ", uname = "root", '\000' <repeats 27 times>, gname = "root", '\000' <repeats 27 times>, 
    devmajor = "\000\000\000\000\000\000\000", devminor = "\000\000\000\000\000\000\000", prefix = '\000' <repeats 154 times>},

10. The remaining directories and files are written by repeating the steps above.


