COFF (Common Object File Format) is the object file format used on Windows.
Both PE (Portable Executable), today's Windows executable format, and the ELF format on Linux are derived from it.
Below we walk through the COFF format using an example.
First, a small test program:
$ cat form.c
int g_init_var = 0x1234;
int g_uninit_var;
extern int Shared;
extern int foo(int I, char *str);
int main(){
static int s_init_var = 0x1235;
static int s_uninit_var;
int loc_init_var = 1;
int loc_uninit_var;
int ret = foo(g_init_var + g_uninit_var + s_init_var + s_uninit_var + loc_init_var + loc_uninit_var + Shared, "xiang");
return ret;
}
Compile it into a COFF object file:
$ clang form.c -c
$ file form.o
form.o: Intel amd64 COFF object file, not stripped, 6 sections, symbol offset=0x1cc, 22 symbols, created Mon Aug 22 08:06:06 2022, 1st section name ".text"
Viewing the section headers
Use llvm-objdump to view the section headers (-h is an alias for --section-headers):
$ llvm-objdump -h form.o
form.o: file format coff-x86-64
Sections:
Idx Name Size VMA Type
0 .text 0000004c 0000000000000000 TEXT
1 .data 00000008 0000000000000000 DATA
2 .bss 00000008 0000000000000000 BSS
3 .xdata 00000008 0000000000000000 DATA
4 .pdata 0000000c 0000000000000000 DATA
5 .llvm_addrsig 00000006 0000000000000000
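llvm-objdump's output here is fairly terse (an extra option might show more, but I didn't find one).
So we use GNU objdump, which also prints each section's offset in the file (File off):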
$ objdump -h form.o
form.o: file format pe-x86-64
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000053 0000000000000000 0000000000000000 0000012c 2**4
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
1 .data 00000008 0000000000000000 0000000000000000 000001c5 2**2
CONTENTS, ALLOC, LOAD, DATA
2 .bss 00000008 0000000000000000 0000000000000000 00000000 2**2
ALLOC
3 .xdata 00000008 0000000000000000 0000000000000000 000001cd 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .rdata 00000006 0000000000000000 0000000000000000 000001d5 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA, LINK_ONCE_DISCARD (COMDAT ??_C@_05NGABEPOA@xiang?$AA@ 10)
5 .pdata 0000000c 0000000000000000 0000000000000000 000001db 2**2
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
6 .llvm_addrsig 00000006 0000000000000000 0000000000000000 00000205 2**0
CONTENTS, READONLY, EXCLUDE, NOREAD
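Note: .bss shows a file offset (File off) of 0, as if it sat at the very start of the file.
[Figure 1: section layout]
Why would .bss start at offset 0? As we will see later, the file actually begins with the file header, whose first bytes are the machine-type magic number, so .bss cannot really be there.
The explanation is that .bss holds only uninitialized data, which takes up no space in the object file. The PE/COFF specification says PointerToRawData should be zero for a section that contains only uninitialized data. The memory is reserved later, when the linked image is loaded.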
Viewing section contents
Use llvm-objdump to dump the contents of every section (-s is an alias for --full-contents, which displays the content of each section):
$ llvm-objdump -s form.o
form.o: file format coff-x86-64
Contents of section .text:
0000 4883ec38 c7442434 00000000 c7442430 H..8.D$4.....D$0
0010 01000000 8b0d0000 0000030d 00000000 ................
0020 030d0000 0000030d 00000000 034c2430 .............L$0
0030 034c242c 030d0000 0000488d 15000000 .L$,......H.....
0040 00e80000 00008944 24288b44 24284883 .......D$(.D$(H.
0050 c438c3 .8.
Contents of section .data:
0000 34120000 35120000 4...5...
(These are the variables int g_init_var = 0x1234; and static int s_init_var = 0x1235;, stored little-endian.)
Contents of section .bss:
<skipping contents of bss section at [0000, 0008)>
(8 bytes are reserved for int g_uninit_var; and static int s_uninit_var;, but nothing is stored in the file.)
Contents of section .xdata:
0000 01040100 04620000 .....b..
Contents of section .rdata:
0000 7869616e 6700 xiang. // the string literal "xiang" is placed in the .rdata section
Contents of section .pdata:
0000 00000000 53000000 00000000 ....S.......
Contents of section .llvm_addrsig: // address-significance table (decoded in the Addrsig list at the end of the llvm-readobj output)
0000 16111314 1215
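As far as I can tell from LLVM's description of this section, .llvm_addrsig is a sequence of ULEB128-encoded symbol-table indices.

Each byte here is below 0x80, so each byte is one complete index: 0x16 0x11 0x13 0x14 0x12 0x15 = 22 17 19 20 18 21. This matches the Addrsig list that llvm-readobj prints at the end.

Below is a minimal C sketch of a ULEB128 decoder. `decode_uleb128_list` is a hypothetical helper of mine, not an LLVM function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode consecutive ULEB128 values from buf[0..len) into out[0..cap).
 * Returns how many values were decoded, or (size_t)-1 if a value is
 * truncated or longer than 5 bytes (enough for any 32-bit index).
 * decode_uleb128_list is a hypothetical helper, not an LLVM API. */
static size_t decode_uleb128_list(const uint8_t *buf, size_t len,
                                  uint32_t *out, size_t cap)
{
    size_t count = 0;
    size_t i = 0;
    assert(buf != NULL || len == 0);
    while (i < len && count < cap) {
        uint32_t value = 0;
        unsigned shift = 0;
        uint8_t byte;
        do {
            if (i >= len || shift > 28)
                return (size_t)-1;                     /* truncated or too long */
            byte = buf[i++];
            value |= (uint32_t)(byte & 0x7F) << shift; /* low 7 bits are payload */
            shift += 7;
        } while (byte & 0x80);                         /* high bit: more bytes follow */
        out[count++] = value;
    }
    return count;
}
```

Multi-byte values set the high bit on every byte except the last. For example, E5 8E 26 decodes to 624485.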
Two of the sections, .xdata and .pdata, were not clear to me at first.
According to https://docs.microsoft.com/en-us/windows/win32/debug/pe-format, both hold exception-handling information:
.pdata Exception information IMAGE_SCN_CNT_INITIALIZED_DATA | IMAGE_SCN_MEM_READ
.xdata Exception information (free format) IMAGE_SCN_CNT_INITIALIZED_DATA | IMAGE_SCN_MEM_READ
Disassembling the contents of .text gives the following instructions
(llvm-objdump -d / --disassemble: disassemble all executable sections found in the input files):
$ llvm-objdump -d form.o
form.o: file format coff-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: 48 83 ec 38 (lowest address on the left) subq $56, %rsp
4: c7 44 24 34 00 00 00 00 movl $0, 52(%rsp)
c: c7 44 24 30 01 00 00 00 movl $1, 48(%rsp)
14: 8b 0d 00 00 00 00 movl (%rip), %ecx # 0x1a <main+0x1a>
1a: 03 0d 00 00 00 00 addl (%rip), %ecx # 0x20 <main+0x20>
20: 03 0d 00 00 00 00 addl (%rip), %ecx # 0x26 <main+0x26>
26: 03 0d 00 00 00 00 addl (%rip), %ecx # 0x2c <main+0x2c>
2c: 03 4c 24 30 addl 48(%rsp), %ecx
30: 03 4c 24 2c addl 44(%rsp), %ecx
34: 03 0d 00 00 00 00 addl (%rip), %ecx # 0x3a <main+0x3a>
3a: 48 8d 15 00 00 00 00 leaq (%rip), %rdx # 0x41 <main+0x41>
41: e8 00 00 00 00 callq 0x46 <main+0x46>
46: 89 44 24 28 movl %eax, 40(%rsp)
4a: 8b 44 24 28 movl 40(%rsp), %eax
4e: 48 83 c4 38 addq $56, %rsp
52: c3 retq
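Listing everything
The section layout diagram shows that some regions of the file are not contiguous, so next we use
llvm-readobj -a to see everything this COFF file contains: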
(llvm-readobj -a / --all
Equivalent to setting:
--file-header, --program-headers, --section-headers,
--symbols, --relocations, --dynamic-table, --notes,
--version-info, --unwind, --section-groups and --histogram)
$ llvm-readobj -a form.o
File: form.o
// llvm-project\llvm\include\llvm\Object\COFF.h
// COFFObjectFile::COFFObjectFile(MemoryBufferRef Object)
// Object.Identifier = {Data "C:\\Work\\tests\\form.o" Length=20 }
Format: COFF-x86-64
Arch: x86_64
AddressSize: 64bit
// Magic = Object.Buffer.Data (cast to unsigned char in the Visual Studio debugger to inspect it)
// case Magic[0] == 0x64: // x86-64 or ARM64 Windows.
// if (Magic[1] == char(0x86) || Magic[1] == char(0xaa))
// return file_magic::coff_object;
ImageFileHeader {
Machine: IMAGE_FILE_MACHINE_AMD64 (0x8664)
SectionCount: 7
TimeDateStamp: 2022-08-22 08:57:34 (0x630344FE)
PointerToSymbolTable: 0x20B
SymbolCount: 25
StringTableSize: 104
OptionalHeaderSize: 0
Characteristics [ (0x0)
]
}
// llvm/include/llvm/Object/COFF.h
// struct coff_file_header { // cur_offset = 0 size = 20
// support::ulittle16_t Machine; // (unsigned short)*(short*)(mb.Buffer.Data) --> 34404(0x8664)
// support::ulittle16_t NumberOfSections; // (unsigned short)*(short*)(mb.Buffer.Data + 2) --> 7
// support::ulittle32_t TimeDateStamp; // (unsigned int)*(int*)(mb.Buffer.Data + 4) --> 0x630344FE
// support::ulittle32_t PointerToSymbolTable;// (unsigned int)*(int*)(mb.Buffer.Data + 8) --> 0x20B
// support::ulittle32_t NumberOfSymbols; // (unsigned int)*(int*)(mb.Buffer.Data + 12) --> 25
// support::ulittle16_t SizeOfOptionalHeader;// (unsigned short)*(unsigned short*)(mb.Buffer.Data + 16) --> 0
// support::ulittle16_t Characteristics; // (unsigned short)*(unsigned short*)(mb.Buffer.Data + 18) --> 0
//
// bool isImportLibrary() const { return NumberOfSections == 0xffff; }
// };
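To make the offsets in the comments above concrete, here is a minimal C sketch that reads the same 20-byte header. It decodes the fields byte by byte as little-endian, rather than casting pointers as in the debugger session, so it also works on big-endian hosts or unaligned buffers. The struct and function names are my own, not LLVM's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* The 20-byte COFF file header, decoded field by field. */
struct coff_file_header {
    uint16_t machine;                 /* offset 0: 0x8664 = AMD64 */
    uint16_t number_of_sections;      /* offset 2 */
    uint32_t time_date_stamp;         /* offset 4 */
    uint32_t pointer_to_symbol_table; /* offset 8 */
    uint32_t number_of_symbols;       /* offset 12: slots, incl. aux records */
    uint16_t size_of_optional_header; /* offset 16: 0 for object files */
    uint16_t characteristics;         /* offset 18 */
};

static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 0 on success, -1 if the buffer is too short. */
static int parse_coff_file_header(const uint8_t *buf, size_t len,
                                  struct coff_file_header *h)
{
    assert(h != NULL);
    if (buf == NULL || len < 20)
        return -1;
    h->machine = le16(buf);
    h->number_of_sections = le16(buf + 2);
    h->time_date_stamp = le32(buf + 4);
    h->pointer_to_symbol_table = le32(buf + 8);
    h->number_of_symbols = le32(buf + 12);
    h->size_of_optional_header = le16(buf + 16);
    h->characteristics = le16(buf + 18);
    return 0;
}

/* Read the header from a file on disk, e.g. "form.o". */
static int read_coff_file_header(const char *path, struct coff_file_header *h)
{
    uint8_t buf[20];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    return parse_coff_file_header(buf, n, h);
}
```

For form.o this should give Machine 0x8664, 7 sections, symbol table at 0x20B and 25 symbol slots, matching the ImageFileHeader above. The same le32 helper also decodes the .data bytes 34 12 00 00 as 0x1234.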
Sections [ // section descriptor table, also called the section header table or simply the section table
// struct coff_section { // Size = 40
// char Name[COFF::NameSize (8)]; // section name
// support::ulittle32_t VirtualSize; // size of the section once loaded into memory
// support::ulittle32_t VirtualAddress; // virtual address of the section once loaded into memory
// support::ulittle32_t SizeOfRawData; // size of the raw data (size in the file)
// support::ulittle32_t PointerToRawData; // file offset of the raw data
// support::ulittle32_t PointerToRelocations; // file offset of this section's relocation table
// support::ulittle32_t PointerToLinenumbers; // file offset of this section's line-number table (debug)
// support::ulittle16_t NumberOfRelocations; // number of entries in the relocation table
// support::ulittle16_t NumberOfLinenumbers; // number of entries in the line-number table
// support::ulittle32_t Characteristics; // flags such as readable, executable, etc.
Section {
Number: 1
Name: .text (2E 74 65 78 74 00 00 00)
VirtualSize: 0x0
VirtualAddress: 0x0
RawDataSize: 83
PointerToRawData: 0x12C
PointerToRelocations: 0x17F
PointerToLineNumbers: 0x0
RelocationCount: 7
LineNumberCount: 0
Characteristics [ (0x60500020)
IMAGE_SCN_ALIGN_16BYTES (0x500000) // alignment
IMAGE_SCN_CNT_CODE (0x20) // contains code
IMAGE_SCN_MEM_EXECUTE (0x20000000) // executable in memory
IMAGE_SCN_MEM_READ (0x40000000) // readable in memory
]
}
// struct coff_section { // cur_offset = 20 Size = 40
// char Name[COFF::NameSize (8)]; // section name (mb.Buffer.Data + 20 ~ 27) = ".text"
// VirtualSize; // size once loaded into memory (mb.Buffer.Data + 28) = 0 (zero in an object file; the linker assigns it)
// VirtualAddress; // virtual address once loaded (mb.Buffer.Data + 32) = 0 (zero in an object file; the linker assigns it)
// SizeOfRawData; // raw size of the section (size in the file) (mb.Buffer.Data + 36) = 83
// PointerToRawData; // file offset of the raw data (mb.Buffer.Data + 40) = 0x12C (300 = 20 + 7 * 40, right after the section table)
// PointerToRelocations; // file offset of the relocation table (mb.Buffer.Data + 44) = 0x17F (383)
// PointerToLinenumbers; // file offset of the line-number table (mb.Buffer.Data + 48) = 0 (none; not a debug build)
// NumberOfRelocations; // number of relocation entries (mb.Buffer.Data + 52) = 7
// NumberOfLinenumbers; // number of line-number entries (mb.Buffer.Data + 54) = 0
// Characteristics; // flags such as readable, executable, etc. (mb.Buffer.Data + 56) = 0x60500020
Section { // cur_offset = 60 Size = 40
Number: 2
Name: .data (2E 64 61 74 61 00 00 00) // section name (mb.Buffer.Data + 60 ~ 67) = ".data"
VirtualSize: 0x0
VirtualAddress: 0x0
RawDataSize: 8
PointerToRawData: 0x1C5
PointerToRelocations: 0x0
PointerToLineNumbers: 0x0
RelocationCount: 0
LineNumberCount: 0
Characteristics [ (0xC0300040)
IMAGE_SCN_ALIGN_4BYTES (0x300000)
IMAGE_SCN_CNT_INITIALIZED_DATA (0x40) // the section contains initialized data
IMAGE_SCN_MEM_READ (0x40000000)
IMAGE_SCN_MEM_WRITE (0x80000000)
]
}
Section {
Number: 3
Name: .bss (2E 62 73 73 00 00 00 00)
VirtualSize: 0x0
VirtualAddress: 0x0
RawDataSize: 8
PointerToRawData: 0x0
PointerToRelocations: 0x0
PointerToLineNumbers: 0x0
RelocationCount: 0
LineNumberCount: 0
Characteristics [ (0xC0300080)
IMAGE_SCN_ALIGN_4BYTES (0x300000)
IMAGE_SCN_CNT_UNINITIALIZED_DATA (0x80) // the section contains uninitialized data
IMAGE_SCN_MEM_READ (0x40000000)
IMAGE_SCN_MEM_WRITE (0x80000000)
]
}
Section {
Number: 4
Name: .xdata (2E 78 64 61 74 61 00 00)
VirtualSize: 0x0
VirtualAddress: 0x0
RawDataSize: 8
PointerToRawData: 0x1CD
PointerToRelocations: 0x0
PointerToLineNumbers: 0x0
RelocationCount: 0
LineNumberCount: 0
Characteristics [ (0x40300040)
IMAGE_SCN_ALIGN_4BYTES (0x300000)
IMAGE_SCN_CNT_INITIALIZED_DATA (0x40)
IMAGE_SCN_MEM_READ (0x40000000) // so this section holds initialized, read-only data
]
}
Section {
Number: 5
Name: .rdata (2E 72 64 61 74 61 00 00) // read-only data; here, mainly the string constant
VirtualSize: 0x0
VirtualAddress: 0x0
RawDataSize: 6
PointerToRawData: 0x1D5
PointerToRelocations: 0x0
PointerToLineNumbers: 0x0
RelocationCount: 0
LineNumberCount: 0
Characteristics [ (0x40101040)
IMAGE_SCN_ALIGN_1BYTES (0x100000)
IMAGE_SCN_CNT_INITIALIZED_DATA (0x40)
IMAGE_SCN_LNK_COMDAT (0x1000) // marks this as a COMDAT section
IMAGE_SCN_MEM_READ (0x40000000)
]
}
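// Compared with .xdata, .rdata mainly adds the IMAGE_SCN_LNK_COMDAT flag.
// A COMDAT section may be defined in several object files. Its purpose is to merge logical blocks of code and data that are duplicated across compiled modules: the linker keeps one copy according to the Selection field of the section's aux symbol (here Any (0x2)).
// COMDAT plays a central role in compiling and linking C++ virtual function tables and templates.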
Section {
Number: 6
Name: .pdata (2E 70 64 61 74 61 00 00)
VirtualSize: 0x0
VirtualAddress: 0x0
RawDataSize: 12
PointerToRawData: 0x1DB
PointerToRelocations: 0x1E7
PointerToLineNumbers: 0x0
RelocationCount: 3
LineNumberCount: 0
Characteristics [ (0x40300040)
IMAGE_SCN_ALIGN_4BYTES (0x300000)
IMAGE_SCN_CNT_INITIALIZED_DATA (0x40)
IMAGE_SCN_MEM_READ (0x40000000)
]
}
Section {
Number: 7
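Name: .llvm_addrsig (2F 36 32 00 00 00 00 00) // "/62": the 13-character name does not fit in 8 bytes, so it is stored at offset 62 of the string table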
VirtualSize: 0x0
VirtualAddress: 0x0
RawDataSize: 6
PointerToRawData: 0x205
PointerToRelocations: 0x0
PointerToLineNumbers: 0x0
RelocationCount: 0
LineNumberCount: 0
Characteristics [ (0x100800)
IMAGE_SCN_ALIGN_1BYTES (0x100000)
IMAGE_SCN_LNK_REMOVE (0x800) // the linker removes this section (it does not go into the image)
]
}
]
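Object files have no optional header, so the section table starts right after the 20-byte file header and header i sits at offset 20 + 40 * i. With 7 sections, raw data can begin at 20 + 7 * 40 = 300 = 0x12C, exactly where .text's PointerToRawData points.

The sketch below parses one section header and decodes two details seen above:

- the IMAGE_SCN_ALIGN_* bits (bits 20-23: value n means 2^(n-1) bytes);
- the "/62" long-name form.

All names are my own; the layout follows struct coff_section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COFF_FILE_HEADER_SIZE    20
#define COFF_SECTION_HEADER_SIZE 40

struct coff_section_header {
    char name[9];                   /* 8 raw bytes + terminating NUL */
    uint32_t virtual_size, virtual_address;
    uint32_t size_of_raw_data, pointer_to_raw_data;
    uint32_t pointer_to_relocations, pointer_to_linenumbers;
    uint16_t number_of_relocations, number_of_linenumbers;
    uint32_t characteristics;
};

static uint16_t get16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }

static uint32_t get32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Parse section header i (0-based); assumes SizeOfOptionalHeader == 0. */
static void parse_section_header(const uint8_t *file, unsigned i,
                                 struct coff_section_header *s)
{
    assert(file != NULL && s != NULL);
    const uint8_t *p = file + COFF_FILE_HEADER_SIZE +
                       (size_t)COFF_SECTION_HEADER_SIZE * i;
    memcpy(s->name, p, 8);
    s->name[8] = '\0';
    s->virtual_size = get32(p + 8);
    s->virtual_address = get32(p + 12);
    s->size_of_raw_data = get32(p + 16);
    s->pointer_to_raw_data = get32(p + 20);
    s->pointer_to_relocations = get32(p + 24);
    s->pointer_to_linenumbers = get32(p + 28);
    s->number_of_relocations = get16(p + 32);
    s->number_of_linenumbers = get16(p + 34);
    s->characteristics = get32(p + 36);
}

/* Alignment in bytes from IMAGE_SCN_ALIGN_*, or 0 if none is set. */
static uint32_t section_alignment(uint32_t characteristics)
{
    uint32_t n = (characteristics >> 20) & 0xF;
    return n ? (uint32_t)1 << (n - 1) : 0;
}

/* A name of the form "/digits" is a decimal offset into the string
 * table; returns that offset, or -1 if the name is stored inline. */
static long long_name_offset(const char *name)
{
    long off = 0;
    assert(name != NULL);
    if (name[0] != '/')
        return -1;
    for (const char *c = name + 1; *c >= '0' && *c <= '9'; ++c)
        off = off * 10 + (*c - '0');
    return off;
}
```

Note that .text asks for 16-byte alignment, yet 0x12C is not a multiple of 16. The alignment applies when the linker places the section in the image, not to the section's position in the object file.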
Relocations [
// struct relocation { // size 10
// uint32_t VirtualAddress; // offset, within the section, of the field to patch
// uint32_t SymbolTableIndex;
// uint16_t Type;
// };
Section (1) .text { // cur_offset = 0x17F (383) Size = 70 (7 entries x 10 bytes)
0x16 IMAGE_REL_AMD64_REL32 g_init_var (17) // IMAGE_REL_AMD64_REL32 = 4
0x1C IMAGE_REL_AMD64_REL32 g_uninit_var (18)
0x22 IMAGE_REL_AMD64_REL32 main.s_init_var (19)
0x28 IMAGE_REL_AMD64_REL32 main.s_uninit_var (20)
0x36 IMAGE_REL_AMD64_REL32 Shared (21)
0x3D IMAGE_REL_AMD64_REL32 ??_C@_05NGABEPOA@xiang?$AA@ (10)
0x42 IMAGE_REL_AMD64_REL32 foo (22)
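// Comparing with the file offsets shows that struct relocation is stored without alignment padding (to save space): each entry is exactly 10 bytes.
// For example, when debugging uint16_t Type, IMAGE_REL_AMD64_REL32 (4) turns out to be stored at offset 391:
// ? (unsigned char)*(char*)(mb.Buffer.Data + 391) --> '\x4'
// The neighboring bytes:
// address: 383 384 385 386 387 388 389 390 391 392 393
// value: 0x16 0 0 0 17 0 0 0 4 0 0x1c
// ? The Symbols list below shows only 17 symbols, so how is SymbolTableIndex counted?
// The index counts 18-byte symbol-table slots, and every aux record (AuxSymbolCount) takes a slot of its own:
// 17 symbols + 8 aux records = 25 = SymbolCount, which is why g_init_var has index 17 (see the SymTableId comments below).
// VirtualAddress: comparing with the disassembly, the first entry 0x16 is where g_init_var's 4-byte displacement sits in the code:
// 14: 8b 0d [00 00 00 00] movl (%rip), %ecx # 0x1a <main+0x1a>
//           ^ 0x16
// REL32 is relative to the byte after the field (0x16 + 4 = 0x1a), so until the linker patches the zero displacement the disassembler shows the target as the next instruction, 0x1a.
}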
Section (6) .pdata {
0x0 IMAGE_REL_AMD64_ADDR32NB .text (0)
0x4 IMAGE_REL_AMD64_ADDR32NB .text (0)
0x8 IMAGE_REL_AMD64_ADDR32NB .xdata (6)
}
]
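The 10-byte, unpadded relocation entries can be decoded as shown below. For IMAGE_REL_AMD64_REL32 the linker stores, in the field, the target address minus the address of the byte following the field, plus whatever addend the compiler left there (0 in form.o).

This is a minimal sketch: the names are my own, and the final addresses in the usage below are hypothetical, since form.o is never linked in this article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COFF_RELOC_SIZE       10     /* 4 + 4 + 2 bytes, no padding */
#define IMAGE_REL_AMD64_REL32 0x0004

struct coff_reloc {
    uint32_t virtual_address;     /* offset of the field within the section */
    uint32_t symbol_table_index;  /* slot index, counting aux records */
    uint16_t type;
};

static uint32_t u32_at(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode entry i of a relocation table that starts at `table`. */
static struct coff_reloc parse_reloc(const uint8_t *table, unsigned i)
{
    struct coff_reloc r;
    assert(table != NULL);
    const uint8_t *p = table + (size_t)COFF_RELOC_SIZE * i;
    r.virtual_address = u32_at(p);
    r.symbol_table_index = u32_at(p + 4);
    r.type = (uint16_t)(p[8] | p[9] << 8);   /* Type sits at byte 8, unaligned */
    return r;
}

/* Value written into a REL32 field: target - (field address + 4) + addend.
 * Computed in 64 bits; a real linker would report an overflow if the
 * result does not fit in 32 bits instead of truncating it. */
static int32_t rel32_value(uint32_t target, uint32_t field_addr, int32_t addend)
{
    int64_t v = (int64_t)target - ((int64_t)field_addr + 4) + addend;
    return (int32_t)v;
}
```

For example, suppose (hypothetically) that .text were placed at 0x1000 and g_init_var at 0x3000. The field at .text+0x16 would then receive 0x3000 - 0x101A, which makes the movl at 0x14 read g_init_var relative to the next instruction at 0x101A.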
UnwindInformation [
RuntimeFunction {
StartAddress: main (0x0)
EndAddress: main +0x53 (0x4)
UnwindInfoAddress: .xdata (0x8)
UnwindInfo {
Version: 1
Flags [ (0x0)
]
PrologSize: 4
FrameRegister: -
FrameOffset: -
UnwindCodeCount: 1
UnwindCodes [
0x04: ALLOC_SMALL size=56
]
}
}
]
// template <typename SectionNumberType>
// struct coff_symbol { // size 18 (coff_symbol16) or 20 (coff_symbol32, used by /bigobj object files)
// union {
// char ShortName[COFF::NameSize];
// StringTableOffset Offset;
// } Name;
// support::ulittle32_t Value;
// SectionNumberType SectionNumber;
// support::ulittle16_t Type;
// uint8_t StorageClass;
// uint8_t NumberOfAuxSymbols;
// };
//
// using coff_symbol16 = coff_symbol<support::ulittle16_t>;
// using coff_symbol32 = coff_symbol<support::ulittle32_t>;
Symbols [
Symbol { // SymTableId 0
Name: .text
Value: 0
Section: .text (1)
BaseType: Null (0x0)
ComplexType: Null (0x0)
StorageClass: Static (0x3)
AuxSymbolCount: 1 // SymTableId 1
AuxSectionDef {
Length: 83
RelocationCount: 7
LineNumberCount: 0
Checksum: 0x7A5AE5D4
Number: 1
Selection: 0x0
}
}
Symbol { // SymTableId 2
Name: .data
Value: 0
Section: .data (2)
BaseType: Null (0x0)
ComplexType: Null (0x0)
StorageClass: Static (0x3)
AuxSymbolCount: 1 // SymTableId 3
AuxSectionDef {
Length: 8
RelocationCount: 0
LineNumberCount: 0
Checksum: 0x2991AFED
Number: 2
Selection: 0x0
}
}
Symbol { // SymTableId 4
Name: .bss
Value: 0
Section: .bss (3)
BaseType: Null (0x0)
ComplexType: Null (0x0)
StorageClass: Static (0x3)
AuxSymbolCount: 1 // SymTableId 5
AuxSectionDef {
Length: 8
RelocationCount: 0
LineNumberCount: 0
Checksum: 0x0
Number: 3
Selection: 0x0
}
}
Symbol { // SymTableId 6
Name: .xdata
Value: 0
Section: .xdata (4)
BaseType: Null (0x0)
ComplexType: Null (0x0)
StorageClass: Static (0x3)
AuxSymbolCount: 1 // SymTableId 7
AuxSectionDef {
Length: 8
RelocationCount: 0
LineNumberCount: 0
Checksum: 0x37887F31
Number: 4
Selection: 0x0
}
}
Symbol { // SymTableId 8
Name: .rdata
Value: 0
Section: .rdata (5)
BaseType: Null (0x0)
ComplexType: Null (0x0)
StorageClass: Static (0x3)
AuxSymbolCount: 1 // SymTableId 9
AuxSectionDef {
Length: 6
RelocationCount: 0
LineNumberCount: 0
Checksum: 0x983C11BC
Number: 5
Selection: Any (0x2)
}
}
Symbol { // SymTableId 10
Name: ??_C@_05NGABEPOA@xiang?$AA@
Value: 0
Section: .rdata (5)
BaseType: Null (0x0)
ComplexType: Null (0x0)
StorageClass: External (0x2)
AuxSymbolCount: 0
}
Symbol { // SymTableId 11
Name: .pdata
Value: 0
Section: .pdata (6)
BaseType: Null (0x0)
ComplexType: Null (0x0)
StorageClass: Static (0x3)
AuxSymbolCount: 1 // SymTableId 12
AuxSectionDef {
Length: 12
RelocationCount: 3
LineNumberCount: 0
Checksum: 0xDBA9F425
Number: 6
Selection: 0x0
}
}
Symbol { // SymTableId 13
Name: .llvm_addrsig
Value: 0
Section: .llvm_addrsig (7)
BaseType: Null (0x0)
ComplexType: Null (0x0)
StorageClass: Static (0x3)
AuxSymbolCount: 1 // SymTableId 14
AuxSectionDef {
Length: 6
RelocationCount: 0
LineNumberCount: 0
Checksum: 0xC4A53851
Number: 7
Selection: 0x0
}
}
Symbol { // SymTableId 15
Name: @feat.00
Value: 0
Section: IMAGE_SYM_ABSOLUTE (-1)
BaseType: Null (0x0)
ComplexType: Null (0x0)
StorageClass: Static (0x3)
AuxSymbolCount: 0
}
Symbol { // SymTableId 16
Name: main
Value: 0
Section: .text (1)
BaseType: Null (0x0)
ComplexType: Function (0x2)
StorageClass: External (0x2)
AuxSymbolCount: 0
}
Symbol { // SymTableId 17
Name: g_init_var
Value: 0
Section: .data (2)
BaseType: Null (0x0)
ComplexType: Null (0x0)
StorageClass: External (0x2)
AuxSymbolCount: 0
}
Symbol { // SymTableId 18
Name: g_uninit_var
Value: 4
Section: .bss (3)
BaseType: Null (0x0)
ComplexType: Null (0x0)
StorageClass: External (0x2)
AuxSymbolCount: 0
}
Symbol { // SymTableId 19
Name: main.s_init_var
Value: 4
Section: .data (2)
BaseType: Null (0x0)
ComplexType: Null (0x0)
StorageClass: Static (0x3)
AuxSymbolCount: 0
}
Symbol { // SymTableId 20
Name: main.s_uninit_var
Value: 0
Section: .bss (3)
BaseType: Null (0x0)
ComplexType: Null (0x0)
StorageClass: Static (0x3)
AuxSymbolCount: 0
}
Symbol { // SymTableId 21
Name: Shared
Value: 0
Section: IMAGE_SYM_UNDEFINED (0)
BaseType: Null (0x0)
ComplexType: Null (0x0)
StorageClass: External (0x2)
AuxSymbolCount: 0
}
Symbol { // SymTableId 22
Name: foo
Value: 0
Section: IMAGE_SYM_UNDEFINED (0)
BaseType: Null (0x0)
ComplexType: Null (0x0)
StorageClass: External (0x2)
AuxSymbolCount: 0
}
Symbol { // SymTableId 23
Name: .file
Value: 0
Section: IMAGE_SYM_DEBUG (-2)
BaseType: Null (0x0)
ComplexType: Null (0x0)
StorageClass: File (0x67)
AuxSymbolCount: 1 // SymTableId 24
AuxFileRecord {
FileName: form.c
}
}
]
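The question raised under Relocations (how SymbolTableIndex relates to only 17 listed symbols) can be checked with a small sketch. It assigns slot indices from each symbol's AuxSymbolCount and locates the string table, which directly follows the last 18-byte slot. The helper names are my own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COFF_SYMBOL_SIZE 18   /* coff_symbol16; /bigobj files use 20 */

/* File offset of symbol-table slot `slot`; aux records occupy slots too. */
static uint32_t symbol_slot_offset(uint32_t symtab, uint32_t slot)
{
    return symtab + COFF_SYMBOL_SIZE * slot;
}

/* The string table starts right after the last slot; its first 4 bytes
 * hold its total size (StringTableSize). */
static uint32_t string_table_offset(uint32_t symtab, uint32_t nslots)
{
    return symtab + COFF_SYMBOL_SIZE * nslots;
}

/* Given the AuxSymbolCount of each primary symbol in file order, store
 * in slot[k] the slot index (the SymbolTableIndex that relocations use)
 * of the k-th primary symbol. Returns the total number of slots. */
static uint32_t assign_slots(const uint8_t *aux_counts, size_t n, uint32_t *slot)
{
    uint32_t next = 0;
    assert(aux_counts != NULL || n == 0);
    for (size_t k = 0; k < n; ++k) {
        slot[k] = next;
        next += 1u + aux_counts[k];   /* the symbol itself plus its aux records */
    }
    return next;
}
```

With the AuxSymbolCount values listed above, the 17 symbols take 25 slots, matching SymbolCount. g_init_var lands on slot 17 and foo on slot 22, just as the relocation entries say. The string table then starts at 0x20B + 25 * 18 = 0x3CD.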
Addrsig [ // address-significance table (symbol indices; these are the .llvm_addrsig bytes 16 11 13 14 12 15)
Sym: foo (22)
Sym: g_init_var (17)
Sym: main.s_init_var (19)
Sym: main.s_uninit_var (20)
Sym: g_uninit_var (18)
Sym: Shared (21)
]
Section layout
Finally, we redraw the section layout diagram and fill in the earlier gaps:
[Figure 2: section layout]
[Figure 3: section layout]
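The gaps in the first diagram are the relocation tables. The sketch below lays the regions out back to back, using only sizes taken from the headers above. .bss is left out because it has no raw data.

The assumption here is that the object file has no padding between regions. Under that assumption, every computed offset matches a PointerToRawData or PointerToRelocations value above, and the predicted file size is 0x435 (1077) bytes. I have not checked that total against the actual form.o.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct region {
    const char *name;
    uint32_t size;
    uint32_t offset;   /* filled in by layout_regions */
};

/* Place regions contiguously from file offset 0; return the end offset. */
static uint32_t layout_regions(struct region *r, size_t n)
{
    uint32_t off = 0;
    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        r[i].offset = off;
        off += r[i].size;
    }
    return off;
}

/* Print one line per region: offset, name, size. */
static void print_layout(const struct region *r, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        printf("0x%03X  %-28s %u bytes\n",
               (unsigned)r[i].offset, r[i].name, (unsigned)r[i].size);
}
```

Feeding it the file header (20), the section table (7 x 40), and then each section's raw data and relocation table in file order reproduces the offsets 0x12C, 0x17F, 0x1C5, ... up to the symbol table at 0x20B and the string table at 0x3CD.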