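ELF Overview

Before We Begin

I decided to study the ELF file format in depth after reading the book 《程序员的自我修养》. Like many people, I picked up most of what I knew in my early days of programming straight from books or the web, and much of it stayed confusing because I had never actually seen the things being described. In the same way, a good share of the problems we run into while programming come down to gaps in our knowledge of compiling and linking.

So what do we gain from studying the ELF file format? The most obvious benefit is that we get to see the final form our programs take, and analyzing that form deepens our understanding of the programs themselves.

In each article of this series I start from the official documentation, following every quoted passage with my own rendering of it, and then write programs with the libelf library that print the actual information, to reinforce what was covered.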

 

Introduction

This chapter describes the object file format, called ELF (Executable and Linking Format). There are three main types of object files.

This article gives a broad description of ELF files. Overall, ELF files fall into three types.

  • A relocatable file holds code and data suitable for linking with other object files to create an executable or a shared object file. 

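Relocatable file. A relocatable file holds the code and data that the linker needs; during linking, the linker combines several relocatable files to produce an executable file or a shared library.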

  • An executable file holds a program suitable for execution; the file specifies how exec(BA_OS) creates a program's process image.

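Executable file. As the name suggests, an executable file holds the program that will ultimately be run. Its contents tell the operating system how to create the process image.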

  • A shared object file holds code and data suitable for linking in two contexts. First, the link editor may process it with other relocatable and shared object files to create another object file. Second, the dynamic linker combines it with an executable file and other shared objects to create a process image. 

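Shared library (shared object file). Its code and data serve two purposes. First, the static linker (for example GNU ld) can combine it with relocatable files to produce another object file, which may be an executable or a new shared library. Second, the dynamic linker (for example GNU ld-linux.so) combines it with an executable file and other shared libraries to create the final process image.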

Created by the assembler and link editor, object files are binary representations of programs intended to be executed directly on a processor. Programs that require other abstract machines, such as shell scripts, are excluded.

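Object files, as produced by the assembler and the link editor, are the binary representation of a program and are meant to be executed directly on a processor. Programs that need some other abstract machine (an interpreter) in order to run, such as the familiar shell scripts, are outside the scope of this discussion.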

After the introductory material, this chapter focuses on the file format and how it pertains to building programs. Chapter 5 also describes parts of the object file, concentrating on the information necessary to execute a program.

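The rest of the discussion of ELF files comes in two parts. The first covers the file format with an emphasis on the information used to build the final executable. The second also covers the file format, but with an emphasis on the information needed to execute the program.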

 

File Format

Object files participate in program linking (building a program) and program execution (running a program). For convenience and efficiency, the object file format provides parallel views of a file's contents, reflecting the differing needs of those activities. Figure 1-1 shows an object file's organization.

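Object files take part both in linking (building the final program) and in execution (running the program under the operating system). For convenience and efficiency, the format therefore offers two parallel views of the same contents, each suited to one of these activities. Figure 1-1 shows an object file from both angles: the linking view and the execution view.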

An ELF header resides at the beginning and holds a ``road map'' describing the file's organization. Sections hold the bulk of object file information for the linking view: instructions, data, symbol table, relocation information, and so on. Descriptions of special sections appear later in the chapter. Chapter 5 discusses segments and the program execution view of the file.

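The ELF header sits at the start of the file and acts like a road map describing how the whole file is organized. Sections hold most of the information used for linking: instructions, data, the symbol table, relocation information, and so on. The later parts of this document first describe the structure of sections and the role of the more important ones, then turn to segments, which carry the information needed at run time.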


A program header table tells the system how to create a process image. Files used to build a process image (execute a program) must have a program header table; relocatable files do not need one. A section header table contains information describing the file's sections. Every section has an entry in the table; each entry gives information such as the section name, the section size, and so on. Files used during linking must have a section header table; other object files may or may not have one.

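The program header table tells the operating system how to create a process image. Executable files and shared libraries must contain one; relocatable files do not need one, because they are used for linking and are never run directly.

The section header table describes all of the sections in the file. It is made up of entries, one per section, each recording attributes of that section such as its name and its size. Files used during linking (relocatable files and shared libraries) must contain a section header table; other object files may or may not have one.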

 

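NOTE: Although the figure shows the program header table immediately after the ELF header, and the section header table following the sections, actual files may differ. Moreover, sections and segments have no specified order. Only the ELF header has a fixed position in the file.

Note: although Figure 1-1 shows the program header table directly after the ELF header and the section header table after all the sections, real files need not follow this layout. Sections and segments have no prescribed order either. Only the ELF header has a fixed position, at the very start of the file.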

 

Data Representation

As described here, the object file format supports various processors with 8-bit bytes and either 32-bit or 64-bit architectures. Nevertheless, it is intended to be extensible to larger (or smaller) architectures. Object files therefore represent some control data with a machine-independent format, making it possible to identify object files and interpret their contents in a common way. Remaining data in an object file use the encoding of the target processor, regardless of the machine on which the file was created.

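The ELF format as described here supports processors with 8-bit bytes and 32-bit or 64-bit architectures, and it is designed to be extensible to others. Some of the control data in an ELF file is therefore machine-independent and is parsed the same way on every platform. The rest of the file uses the encoding of the target processor (its byte order, for example): it depends on the target platform, not on the platform that produced the file. In embedded development, for instance, a cross-compilation toolchain on an x86 host produces ELF files for an ARM target.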

All data structures that the object file format defines follow the ``natural'' size and alignment guidelines for the relevant class. If necessary, data structures contain explicit padding to ensure 8-byte alignment for 8-byte objects, 4-byte alignment for 4-byte objects, to force structure sizes to a multiple of 4 or 8, and so forth. Data also have suitable alignment from the beginning of the file. Thus, for example, a structure containing an Elf32_Addr member will be aligned on a 4-byte boundary within the file.

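All structures follow the natural size and alignment rules of the relevant class. Where necessary, padding is inserted explicitly so that 4-byte members are 4-byte aligned and 8-byte members are 8-byte aligned, and so that the size of the whole structure is a multiple of 4 or 8. Each structure is also suitably aligned relative to the start of the file. For example, a structure containing an Elf32_Addr member is aligned on a 4-byte boundary within the file.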


For portability reasons, ELF uses no bit-fields.

For portability, the ELF data structures use no bit-fields.

 

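Summary:

(1) ELF files fall into three main types: relocatable files (for example .o object files on Linux), executable files, and shared libraries.

(2) The contents of an ELF file can be seen from two views, a linking view and an execution view. Sections are what the linker uses (here meaning the static linker, such as GNU ld); segments are what the operating system consults to create the final process image.

(3) The ELF data structures are designed to work across processors and platforms: control data uses a machine-independent format, the remaining data uses the target processor's encoding, and structures carry explicit padding to meet alignment requirements.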

 

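The small program below prints the "machine-independent control information" mentioned above, which is simply the unsigned char array at the very start of the ELF header (the ELF header structure is covered in detail in a later article). Why is this array machine-independent? Because a single byte has no byte order, so it is parsed the same way on every platform.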

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <elf.h>
#include <sys/mman.h>
#include <errno.h>
#include <error.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>

/* Names for the EI_OSABI byte; indices not listed here are NULL. */
static const char *osabi[] = {
	[0] = "UNIX System V ABI",
	[1] = "HP-UX",
	[2] = "NetBSD",
	[3] = "Object uses GNU ELF extensions",
	[6] = "Sun Solaris",
	[7] = "IBM AIX",
	[8] = "SGI Irix",
	[9] = "FreeBSD",
	[10] = "Compaq TRU64 UNIX",
	[11] = "Novell Modesto",
	[12] = "OpenBSD",
	[64] = "ARM EABI",
	[97] = "ARM",
};

/* Look up an OS ABI name without reading past the end of the table. */
static const char *osabi_name(unsigned char id)
{
	if (id < sizeof(osabi) / sizeof(osabi[0]) && osabi[id] != NULL)
		return osabi[id];
	return "unknown";
}

int main(int argc, const char *argv[])
{
	int fd;
	unsigned char *elf_ident;	/* unsigned, so bytes >= 0x80 do not sign-extend */
	struct stat file_status;
	int i;

	if (argc != 2) {
		error(EXIT_FAILURE, 0, "Usage: %s file-name", argv[0]);
	}

	if ((fd = open(argv[1], O_RDONLY)) < 0) {
		error(EXIT_FAILURE, errno, "open %s failed", argv[1]);
	}

	if (fstat(fd, &file_status) < 0) {
		error(EXIT_FAILURE, errno, "get file %s info err", argv[1]);
	}

	/* e_ident is EI_NIDENT bytes long; a shorter file cannot be ELF */
	if (file_status.st_size < EI_NIDENT) {
		error(EXIT_FAILURE, 0, "%s is too small to be an ELF file", argv[1]);
	}

	if ((elf_ident = mmap(NULL, (size_t)file_status.st_size,
				PROT_READ, MAP_PRIVATE, fd, 0)) == MAP_FAILED) {
		error(EXIT_FAILURE, errno, "mmap file %s err", argv[1]);
	}

	/* dump the whole e_ident array, one byte at a time */
	(void)fprintf(stdout, "Magic: ");
	for (i = 0; i < EI_NIDENT; i++) {
		(void)fprintf(stdout, "%02x ", elf_ident[i]);
	}
	(void)fprintf(stdout, "\n");

	/* the first SELFMAG bytes must be "\177ELF" */
	if (memcmp(elf_ident, ELFMAG, SELFMAG) != 0) {
		error(EXIT_FAILURE, 0, "Invalid file format");
	}

	(void)fprintf(stdout, "EI_CLASS is %s\n",
				elf_ident[EI_CLASS] == ELFCLASS32 ? "32-bit" : "64-bit");
	(void)fprintf(stdout, "EI_DATA is %s\n",
				elf_ident[EI_DATA] == ELFDATA2LSB ? "little-endian" : "big-endian");
	(void)fprintf(stdout, "EI_VERSION is %d\n", elf_ident[EI_VERSION]);
	(void)fprintf(stdout, "EI_OSABI is %s\n", osabi_name(elf_ident[EI_OSABI]));
	(void)fprintf(stdout, "EI_ABIVERSION is %d\n", elf_ident[EI_ABIVERSION]);

	munmap(elf_ident, (size_t)file_status.st_size);
	close(fd);

	exit(EXIT_SUCCESS);
}

The following program retrieves the same information using the libelf library:

#include <stdio.h>
#include <stdlib.h>
#include <err.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#include "libelf.h"
#include "gelf.h"

int main(int argc, const char *argv[])
{
	int i, fd;
	Elf *pelf = NULL;
	const char *elf_ident;

	if (argc != 2)
		errx(EXIT_FAILURE, "usage: %s file-name", argv[0]);

	// libelf requires the version handshake before any other call
	if (elf_version(EV_CURRENT) == EV_NONE)
		errx(EXIT_FAILURE, "ELF library initialization "
			"failed: %s", elf_errmsg(-1));

	// err() (not errx()) so that the errno text is reported as well
	if ((fd = open(argv[1], O_RDONLY)) < 0)
		err(EXIT_FAILURE, "open \"%s\" failed", argv[1]);

	if (!(pelf = elf_begin(fd, ELF_C_READ, NULL)))
		errx(EXIT_FAILURE, "elf_begin() failed: %s", elf_errmsg(-1));

	if (elf_kind(pelf) != ELF_K_ELF)
		errx(EXIT_FAILURE, "\"%s\" is not an ELF object.", argv[1]);

	// get the ELF class
	if ((i = gelf_getclass(pelf)) == ELFCLASSNONE)
		errx(EXIT_FAILURE, "getclass() failed: %s.", elf_errmsg(-1));
	// print the ELF class
	printf("%s: %d-bit ELF object\n", argv[1],
								(i == ELFCLASS32) ? 32 : 64);
	// get e_ident
	if ((elf_ident = elf_getident(pelf, NULL)) == NULL)
		errx(EXIT_FAILURE, "getident() failed: %s.", elf_errmsg(-1));
	// print e_ident as raw bytes
	printf("Magic: ");
	for (i = 0; i <= EI_ABIVERSION; i++) {
		printf("0x%02x ", (unsigned char)elf_ident[i]);
	}
	printf("\n");
	// print e_ident again, decoded field by field
	for (i = 0; i <= EI_ABIVERSION; i++) {
		switch (i) {
			case EI_MAG0:
				printf("0x%02x ", (unsigned char)elf_ident[i]);
				break;
			case EI_MAG1 ... EI_MAG3:	// case ranges are a GNU C extension
				printf("%c ", elf_ident[i]);
				break;
			case EI_CLASS:
				printf("%d-bit ", elf_ident[i] == ELFCLASS32 ? 32 : 64);
				break;
			case EI_DATA:
				printf("%s ", elf_ident[i] == ELFDATA2LSB ? "LSB" : "MSB");
				break;
			case EI_VERSION:
				printf("%s ", elf_ident[i] == EV_CURRENT ? "EV_CURRENT" : "EV_NONE");
				break;
			case EI_OSABI:
				printf("%s ", elf_ident[i] == ELFOSABI_NONE ? "ELFOSABI_NONE" : "other-OSABI");
				break;
			case EI_ABIVERSION:
				// print the real byte instead of a hard-coded 0
				printf("%d ", (unsigned char)elf_ident[i]);
				break;
		}
	}
	printf("\n");

	elf_end(pelf);
	close(fd);
	exit(EXIT_SUCCESS);
}

libelf source code on GitHub: https://github.com/astrotycoon/elftoolchain_libelf

