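A Beginner's Guide to PLT, GOT, and the ELF Relocation Flow
Introduction
This article does not cover assembly basics. It walks through the PLT and GOT from a beginner's point of view in order to explain lazy dynamic binding.
x64 ELF Relocation
First compile the code below as an x64 binary with gcc -g main.c -o main. PIE is enabled by default.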
#include <stdio.h>
int main()
{
    printf("hello world!");
    return 0;
}
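Start the debugger with gdb main and follow the first call to printf.
Because we compiled with -g, the binary carries debug symbols, so the l command in gdb lists the source.
Set a breakpoint on line 4 with b 4, then type r to run to the breakpoint.
gdb now shows where RIP is. Single-step with si until you reach 0x55555555514c <main+19> call printf@plt <printf@plt>; the fourth si enters printf@plt.
Because of lazy binding, and because printf has never been called before, the jmp here does not yet go to the real address of printf. To understand this properly, let's work out where the jump actually lands. You could just press si and let it jump, but then you would not see why the same assembly instruction ends up at a different address on later calls. Beginners can follow the calculation with me step by step. First check whether this jmp is encoded as 0xe9 (a relative jump) or as 0xff 0x25 (an indirect jump through memory). Use x/16xb 0x555555555030 to dump the 16 bytes at the current RIP. (The x command reuses the format from its previous invocation, so a plain x/16b may print in decimal.)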
pwndbg> x/16xb 0x555555555030
0x555555555030 <printf@plt>: 0xff 0x25 0xe2 0x2f 0x00 0x00 0x68 0x00
0x555555555038 <printf@plt+8>: 0x00 0x00 0x00 0xe9 0xe0 0xff 0xff 0xff
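The opcode is 0xff 0x25 and the displacement is 0xe2 0x2f 0x00 0x00, which is 0x2fe2 in little-endian order. The operand is therefore addressed relative to rip (RIP-relative addressing in 64-bit mode), and the computed address must be dereferenced to obtain the jump target. GDB shows the instruction as jmp qword ptr [rip + 0x2fe2]. If you add 0x2fe2 to the address of the jmp itself you get the wrong address, because in x64 RIP-relative addressing rip already points past the current instruction. The base is the address of the next instruction. For details, see the article "REX (Register EXtension) Prefix".
0x555555555030 + 0x6 (the instruction 0xff 0x25 0xe2 0x2f 0x00 0x00 is 6 bytes long) = 0x555555555036
0x555555555036 + 0x2fe2 = 0x555555558018
Remember this address: later its contents will become the real address of printf.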
Now read the contents of 0x555555558018.
pwndbg> x/gx 0x555555558018
0x555555558018 <printf@got.plt>: 0x0000555555555036
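Interesting: what is stored there right now is simply the address of the instruction after the jmp (see the figure).
Before stepping any further, some theory. As the figure above shows, leaving the jmp aside, what follows is: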
push 0;
#got[0]: load address of this ELF's dynamic segment (.dynamic)
#got[1]: address of this ELF's link_map structure
push qword ptr [rip + 0x2fe2] <_GLOBAL_OFFSET_TABLE_+8>
jmp qword ptr [rip + 0x2fe4] <_dl_runtime_resolve_xsave>
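gdb helpfully annotates what these addresses are. Once the two arguments are pushed, the function that actually gets called is _dl_runtime_resolve_xsave, so the 0 is the relocation index.
Why is the index 0? Run readelf -r main to find out.
.rela.plt contains a single entry, printf@GLIBC_2.2.5 + 0. If your code called other functions such as puts, they would appear there too. You can think of reloc_index as an index into .rela.plt. I have a prebuilt example of that, so just take a look.
In that program, the stub for printf pushes 1. _dl_runtime_resolve_xsave eventually calls _dl_fixup, which rewrites printf@got.plt.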
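x86 ELF Relocation
This time we disable PIE, because randomized addresses make the sections harder to follow.
gcc -m32 -no-pie -g main.c -o main32
Note the address of .got.plt. With PIE off the address is fixed, and it is 0x804bff4.
- got[0]: load address of the .dynamic segment
- got[1]: address of the link_map structure
- got[2]: address of the _dl_runtime_resolve function
- the remaining entries are function addresses
readelf -r shows printf as the second entry, at address 0x804c004.
Let's check the arithmetic ourselves:
0x804bff4 + 0x4 + 0x4 + 0x4 = 0x804C000, skipping got[0] (.dynamic), got[1] (link_map), and got[2] (_dl_runtime_resolve)
0x804C000 is the first function slot, so the second slot, 0x804c004, is indeed printf.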
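Debugging in gdb follows the same steps as on x64 (see the figure).
The difference is that the resolver being called is _dl_runtime_resolve rather than _dl_runtime_resolve_xsave. The second push, of _GLOBAL_OFFSET_TABLE_+4, is the link_map.
Here the stub does push 8, not an index like 0, 1, 2 as we saw on x64. That looks odd, since articles online say this value is an index. They are right in spirit: what is passed here is the byte offset derived from the index, as the source code below shows.
The relocation entry (Elf32_Rel) happens to be exactly 8 bytes on x86, so my guess is that push 8 comes from printf being index 1, giving 0x1 * 0x8. To test the guess, let's look at the assembly for puts.
The guess holds: 0x2 * 0x8 = 0x10. The rest of the process is much the same, so let's move on.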
Source Code
Structures
link_map
Just skim the structure for now; the details come later.
struct link_map
{
/* These first few members are part of the protocol with the debugger.
This is the same format used in SVR4. */
ElfW(Addr) l_addr; /* Difference between the address in the ELF
file and the addresses in memory. */
char *l_name; /* Absolute file name object was found in. */
ElfW(Dyn) *l_ld; /* Dynamic section of the shared object. */
struct link_map *l_next, *l_prev; /* Chain of loaded objects. */
/* All following members are internal to the dynamic linker.
They may change without notice. */
/* This is an element which is only ever different from a pointer to
the very same copy of this type for ld.so when it is used in more
than one namespace. */
struct link_map *l_real;
/* Number of the namespace this link map belongs to. */
//Lmid_t is long int: 4 bytes on 32-bit x86, 8 bytes on x86-64
Lmid_t l_ns;
struct libname_list *l_libname;
/* Indexed pointers to dynamic section.
[0,DT_NUM) are indexed by the processor-independent tags.
[DT_NUM,DT_NUM+DT_THISPROCNUM) are indexed by the tag minus DT_LOPROC.
[DT_NUM+DT_THISPROCNUM,DT_NUM+DT_THISPROCNUM+DT_VERSIONTAGNUM) are
indexed by DT_VERSIONTAGIDX(tagvalue).
[DT_NUM+DT_THISPROCNUM+DT_VERSIONTAGNUM,
DT_NUM+DT_THISPROCNUM+DT_VERSIONTAGNUM+DT_EXTRANUM) are indexed by
DT_EXTRATAGIDX(tagvalue).
[DT_NUM+DT_THISPROCNUM+DT_VERSIONTAGNUM+DT_EXTRANUM,
DT_NUM+DT_THISPROCNUM+DT_VERSIONTAGNUM+DT_EXTRANUM+DT_VALNUM) are
indexed by DT_VALTAGIDX(tagvalue) and
[DT_NUM+DT_THISPROCNUM+DT_VERSIONTAGNUM+DT_EXTRANUM+DT_VALNUM,
DT_NUM+DT_THISPROCNUM+DT_VERSIONTAGNUM+DT_EXTRANUM+DT_VALNUM+DT_ADDRNUM)
are indexed by DT_ADDRTAGIDX(tagvalue), see <elf.h>. */
ElfW(Dyn) *l_info[DT_NUM + DT_THISPROCNUM + DT_VERSIONTAGNUM
+ DT_EXTRANUM + DT_VALNUM + DT_ADDRNUM];
const ElfW(Phdr) *l_phdr; /* Pointer to program header table in core. */
ElfW(Addr) l_entry; /* Entry point location. */
ElfW(Half) l_phnum; /* Number of program header entries. */
ElfW(Half) l_ldnum; /* Number of dynamic segment entries. */
/* Array of DT_NEEDED dependencies and their dependencies, in
dependency order for symbol lookup (with and without
duplicates). There is no entry before the dependencies have
been loaded. */
struct r_scope_elem l_searchlist;
/* We need a special searchlist to process objects marked with
DT_SYMBOLIC. */
struct r_scope_elem l_symbolic_searchlist;
/* Dependent object that first caused this object to be loaded. */
struct link_map *l_loader;
/* Array with version names. */
struct r_found_version *l_versions;
unsigned int l_nversions;
/* Symbol hash table. */
Elf_Symndx l_nbuckets;
Elf32_Word l_gnu_bitmask_idxbits;
Elf32_Word l_gnu_shift;
const ElfW(Addr) *l_gnu_bitmask;
union
{
const Elf32_Word *l_gnu_buckets;
const Elf_Symndx *l_chain;
};
union
{
const Elf32_Word *l_gnu_chain_zero;
const Elf_Symndx *l_buckets;
};
unsigned int l_direct_opencount; /* Reference count for dlopen/dlclose. */
enum /* Where this object came from. */
{
lt_executable, /* The main executable program. */
lt_library, /* Library needed by main executable. */
lt_loaded /* Extra run-time loaded shared object. */
} l_type:2;
unsigned int l_relocated:1; /* Nonzero if object's relocations done. */
unsigned int l_init_called:1; /* Nonzero if DT_INIT function called. */
unsigned int l_global:1; /* Nonzero if object in _dl_global_scope. */
unsigned int l_reserved:2; /* Reserved for internal use. */
unsigned int l_phdr_allocated:1; /* Nonzero if the data structure pointed
to by `l_phdr' is allocated. */
unsigned int l_soname_added:1; /* Nonzero if the SONAME is for sure in
the l_libname list. */
unsigned int l_faked:1; /* Nonzero if this is a faked descriptor
without associated file. */
unsigned int l_need_tls_init:1; /* Nonzero if GL(dl_init_static_tls)
should be called on this link map
when relocation finishes. */
unsigned int l_auditing:1; /* Nonzero if the DSO is used in auditing. */
unsigned int l_audit_any_plt:1; /* Nonzero if at least one audit module
is interested in the PLT interception.*/
unsigned int l_removed:1; /* Nozero if the object cannot be used anymore
since it is removed. */
unsigned int l_contiguous:1; /* Nonzero if inter-segment holes are
mprotected or if no holes are present at
all. */
unsigned int l_symbolic_in_local_scope:1; /* Nonzero if l_local_scope
during LD_TRACE_PRELINKING=1
contains any DT_SYMBOLIC
libraries. */
unsigned int l_free_initfini:1; /* Nonzero if l_initfini can be
freed, ie. not allocated with
the dummy malloc in ld.so. */
/* Collected information about own RPATH directories. */
struct r_search_path_struct l_rpath_dirs;
/* Collected results of relocation while profiling. */
struct reloc_result
{
DL_FIXUP_VALUE_TYPE addr;
struct link_map *bound;
unsigned int boundndx;
uint32_t enterexit;
unsigned int flags;
} *l_reloc_result;
/* Pointer to the version information if available. */
ElfW(Versym) *l_versyms;
/* String specifying the path where this object was found. */
const char *l_origin;
/* Start and finish of memory map for this object. l_map_start
need not be the same as l_addr. */
ElfW(Addr) l_map_start, l_map_end;
/* End of the executable part of the mapping. */
ElfW(Addr) l_text_end;
/* Default array for 'l_scope'. */
struct r_scope_elem *l_scope_mem[4];
/* Size of array allocated for 'l_scope'. */
size_t l_scope_max;
/* This is an array defining the lookup scope for this link map.
There are initially at most three different scope lists. */
struct r_scope_elem **l_scope;
/* A similar array, this time only with the local scope. This is
used occasionally. */
struct r_scope_elem *l_local_scope[2];
/* This information is kept to check for sure whether a shared
object is the same as one already loaded. */
dev_t l_dev;
ino64_t l_ino;
/* Collected information about own RUNPATH directories. */
struct r_search_path_struct l_runpath_dirs;
/* List of object in order of the init and fini calls. */
struct link_map **l_initfini;
/* List of the dependencies introduced through symbol binding. */
struct link_map_reldeps
{
unsigned int act;
struct link_map *list[];
} *l_reldeps;
unsigned int l_reldepsmax;
/* Nonzero if the DSO is used. */
unsigned int l_used;
/* Various flag words. */
ElfW(Word) l_feature_1;
ElfW(Word) l_flags_1;
ElfW(Word) l_flags;
/* Temporarily used in `dl_close'. */
int l_idx;
struct link_map_machine l_mach;
struct
{
const ElfW(Sym) *sym;
int type_class;
struct link_map *value;
const ElfW(Sym) *ret;
} l_lookup_cache;
/* Thread-local storage related info. */
/* Start of the initialization image. */
void *l_tls_initimage;
/* Size of the initialization image. */
size_t l_tls_initimage_size;
/* Size of the TLS block. */
size_t l_tls_blocksize;
/* Alignment requirement of the TLS block. */
size_t l_tls_align;
/* Offset of first byte module alignment. */
size_t l_tls_firstbyte_offset;
#ifndef NO_TLS_OFFSET
# define NO_TLS_OFFSET 0
#endif
#ifndef FORCED_DYNAMIC_TLS_OFFSET
# if NO_TLS_OFFSET == 0
# define FORCED_DYNAMIC_TLS_OFFSET -1
# elif NO_TLS_OFFSET == -1
# define FORCED_DYNAMIC_TLS_OFFSET -2
# else
# error "FORCED_DYNAMIC_TLS_OFFSET is not defined"
# endif
#endif
/* For objects present at startup time: offset in the static TLS block. */
ptrdiff_t l_tls_offset;
/* Index of the module in the dtv array. */
size_t l_tls_modid;
/* Number of thread_local objects constructed by this DSO. */
size_t l_tls_dtor_count;
/* Information used to change permission after the relocations are
done. */
ElfW(Addr) l_relro_addr;
size_t l_relro_size;
unsigned long long int l_serial;
/* Audit information. This array apparent must be the last in the
structure. Never add something after it. */
struct auditstate
{
uintptr_t cookie;
unsigned int bindflags;
} l_audit[0];
};
.dynamic
This is a two-level table: for example, the DT_SYMTAB entry below points to the .dynsym table.
typedef struct
{
Elf32_Sword d_tag; /* Dynamic entry type */
union
{
Elf32_Word d_val; /* Integer value */
Elf32_Addr d_ptr; /* Address value */
} d_un;
} Elf32_Dyn;
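.dynsym
This table is reached through .dynamic[DT_SYMTAB]. Among other things, each entry records the offset of the symbol's name in .dynstr. See my comments in the code further down for details.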
typedef struct
{
Elf32_Word st_name; /* Symbol name (string tbl index) */
Elf32_Addr st_value; /* Symbol value */
Elf32_Word st_size; /* Symbol size */
unsigned char st_info; /* Symbol type and binding */
unsigned char st_other; /* Symbol visibility */
Elf32_Section st_shndx; /* Section index */
} Elf32_Sym;
.dynstr
The symbol name string table.
.rel.plt
typedef struct
{
Elf32_Addr r_offset; /* Address */
Elf32_Word r_info; /* Relocation type and symbol index */
} Elf32_Rel;
_dl_fixup
This is the glibc source from the version I debugged (/elf/dl-runtime.c), with some comments I added. You can also download glibc yourself.
#ifndef reloc_offset
# define reloc_offset reloc_arg
# define reloc_index reloc_arg / sizeof (PLTREL)
#endif
DL_FIXUP_VALUE_TYPE
__attribute ((noinline)) ARCH_FIXUP_ATTRIBUTE
_dl_fixup (
# ifdef ELF_MACHINE_RUNTIME_FIXUP_ARGS
ELF_MACHINE_RUNTIME_FIXUP_ARGS,
# endif
struct link_map *l, ElfW(Word) reloc_arg)
{
//D_PTR (l, l_info[DT_SYMTAB]) yields .dynamic[DT_SYMTAB].d_ptr,
//which is .dynsym
const ElfW(Sym) *const symtab
= (const void *) D_PTR (l, l_info[DT_SYMTAB]);
//.dynstr
const char *strtab = (const void *) D_PTR (l, l_info[DT_STRTAB]);
//D_PTR (l, l_info[DT_JMPREL]) is .rel.plt
//adding reloc_offset to the .rel.plt address yields the matching entry (Elf32_Rel)
//in this version reloc_arg is passed directly as a byte offset, not as an array index
const PLTREL *const reloc
= (const void *) (D_PTR (l, l_info[DT_JMPREL]) + reloc_offset);
const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)];
//l->l_addr is the load base
//l->l_addr + reloc->r_offset = address of the GOT entry to patch
void *const rel_addr = (void *)(l->l_addr + reloc->r_offset);
lookup_t result;
DL_FIXUP_VALUE_TYPE value;
/* Sanity check that we're really looking at a PLT relocation. */
assert (ELFW(R_TYPE)(reloc->r_info) == ELF_MACHINE_JMP_SLOT);
/* Look up the target symbol. If the normal lookup rules are not
used don't look in the global scope. */
if (__builtin_expect (ELFW(ST_VISIBILITY) (sym->st_other), 0) == 0)
{
const struct r_found_version *version = NULL;
if (l->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
{
const ElfW(Half) *vernum =
(const void *) D_PTR (l, l_info[VERSYMIDX (DT_VERSYM)]);
ElfW(Half) ndx = vernum[ELFW(R_SYM) (reloc->r_info)] & 0x7fff;
version = &l->l_versions[ndx];
if (version->hash == 0)
version = NULL;
}
/* We need to keep the scope around so do some locking. This is
not necessary for objects which cannot be unloaded or when
we are not using any threads (yet). */
int flags = DL_LOOKUP_ADD_DEPENDENCY;
if (!RTLD_SINGLE_THREAD_P)
{
THREAD_GSCOPE_SET_FLAG ();
flags |= DL_LOOKUP_GSCOPE_LOCK;
}
#ifdef RTLD_ENABLE_FOREIGN_CALL
RTLD_ENABLE_FOREIGN_CALL;
#endif
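//strtab + sym->st_name (offset of the name in the string table) = the function's name string
//look the symbol up by that name to find the object that defines it (here libc)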
result = _dl_lookup_symbol_x (strtab + sym->st_name, l, &sym, l->l_scope,
version, ELF_RTYPE_CLASS_PLT, flags, NULL);
/* We are done with the global scope. */
if (!RTLD_SINGLE_THREAD_P)
THREAD_GSCOPE_RESET_FLAG ();
#ifdef RTLD_FINALIZE_FOREIGN_CALL
RTLD_FINALIZE_FOREIGN_CALL;
#endif
/* Currently result contains the base load address (or link map)
of the object that defines sym. Now add in the symbol
offset. */
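//result->l_addr + sym->st_value gives the real address of the symbol
//_dl_lookup_symbol_x updates sym to the defining symbol, so st_value here is its offset inside that object (the undefined entry in our own .dynsym has st_value 0)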
value = DL_FIXUP_MAKE_VALUE (result,
sym ? (LOOKUP_VALUE_ADDRESS (result)
+ sym->st_value) : 0);
}
else
{
/* We already found the symbol. The module (and therefore its load
address) is also known. */
value = DL_FIXUP_MAKE_VALUE (l, l->l_addr + sym->st_value);
result = l;
}
/* And now perhaps the relocation addend. */
value = elf_machine_plt_value (l, reloc, value);
if (sym != NULL
&& __builtin_expect (ELFW(ST_TYPE) (sym->st_info) == STT_GNU_IFUNC, 0))
value = elf_ifunc_invoke (DL_FIXUP_VALUE_ADDR (value));
/* Finally, fix up the plt itself. */
if (__glibc_unlikely (GLRO(dl_bind_not)))
return value;
return elf_machine_fixup_plt (l, result, reloc, rel_addr, value);
}
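Walking Through the Resolution
From the source above, the first step is to go through .dynamic to get .dynsym = .dynamic[DT_SYMTAB].d_ptr. Besides GOT[0], .dynamic can also be reached through link_map->l_info, whose entries point into it. To follow the source, we start from link_map.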
const ElfW(Sym) *const symtab = (const void *) D_PTR (l, l_info[DT_SYMTAB]);
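Let's look at this in gdb.
As noted above, got[1] holds the address of the link_map structure, and gdb shows the address directly (push dword ptr [_GLOBAL_OFFSET_TABLE_+4] <0x804bff8>), so there is nothing to compute. Read the contents to see the link_map. Note that we need the value stored at [_GLOBAL_OFFSET_TABLE_+4], which is why the * dereference is used.
l_addr is 0 here, which is typical for a non-PIE executable, and l_ld and l_info lead to the same .dynamic data. Many articles online simply use l_ld.
l_info then gives us DT_SYMTAB and the other entries. From the IDA figure above, the four Elf32_Dyn structures for DT_STRTAB, DT_SYMTAB, DT_PLTGOT, and DT_JMPREL sit at indices 8, 9, 13, and 16, and correspond to .dynstr, .dynsym, .got.plt, and .rel.plt.
Now focus on .rel.plt. For readability, here is the relevant source again: adding reloc_offset to the address of .rel.plt yields the matching Elf32_Rel entry.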
const uintptr_t pltgot = (uintptr_t) D_PTR (l, l_info[DT_PLTGOT]);
//D_PTR (l, l_info[DT_JMPREL]) is .rel.plt
//adding reloc_offset to the .rel.plt address yields the matching entry (Elf32_Rel)
const PLTREL *const reloc
= (const void *) (D_PTR (l, l_info[DT_JMPREL]) + reloc_offset);
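From the x86 analysis above, the values in push 0x8 and push 0x10 are byte offsets, not array indices. Reading the entries in GDB makes this clear.
They correspond to printf and puts. Each entry has two members, r_offset and r_info.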
Now look at the next source fragment.
//use the symbol index stored in the Elf32_Rel to get the Elf32_Sym
//symtab = .dynsym
const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)];
//l->l_addr is the load base
//l->l_addr + reloc->r_offset = address of the GOT entry to patch
void *const rel_addr = (void *)(l->l_addr + reloc->r_offset);
result = _dl_lookup_symbol_x (strtab + sym->st_name, l, &sym, l->l_scope,
version, ELF_RTYPE_CLASS_PLT, flags, NULL);
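r_info gives the index into .dynsym. Each entry in .dynsym is an Elf32_Sym, which is 16 bytes.
Take one of the entries above as an example, with sym = symtab[reloc->r_info>>8]:
.dynsym = 0x0804820c
0x207>>8 = 2
This gives sym->st_name = 0x27, with DT_STRTAB = 0x804826c.
_dl_lookup_symbol_x then looks the symbol up by name to obtain its address, and the process is complete.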