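Preface
I once happened to run malloc(0x5000), and when I debugged the program with gdb I found that the chunk had been placed in the TLS region, which puzzled me (: In the end I read the malloc and mmap source code, and it looked like the gdb plugin might be at fault. Funny.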
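Update
The gdb plugin was not wrong after all. With some free time recently, I audited mmap again from start to finish, built a kernel, and did some simple debugging on it. It turns out that VMAs with identical attributes are merged in order to reduce the memory spent on vm_area_struct. The conditions for merging are fairly strict, though: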
/*
* Given a mapping request (addr,end,vm_flags,file,pgoff), figure out
* whether that can be merged with its predecessor or its successor.
* Or both (it neatly fills a hole).
*
* In most cases - when called for mmap, brk or mremap - [addr,end) is
* certain not to be mapped by the time vma_merge is called; but when
* called for mprotect, it is certain to be already mapped (either at
* an offset within prev, or at the start of next), and the flags of
* this area are about to be changed to vm_flags - and the no-change
* case has already been eliminated.
*
* The following mprotect cases have to be considered, where AAAA is
* the area passed down from mprotect_fixup, never extending beyond one
* vma, PPPPPP is the prev vma specified, and NNNNNN the next vma after:
*
*     AAAA             AAAA                AAAA          AAAA
*    PPPPPPNNNNNN    PPPPPPNNNNNN    PPPPPPNNNNNN    PPPPNNNNXXXX
*    cannot merge    might become    might become    might become
*                    PPNNNNNNNNNN    PPPPPPPPPPNN    PPPPPPPPPPPP 6 or
*    mmap, brk or    case 4 below    case 5 below    PPPPPPPPXXXX 7 or
*    mremap move:                                    PPPPXXXXXXXX 8
*        AAAA
*    PPPP    NNNN    PPPPPPPPPPPP    PPPPPPPPNNNN    PPPPNNNNNNNN
*    might become    case 1 below    case 2 below    case 3 below
*
* It is important for case 8 that the vma NNNN overlapping the
* region AAAA is never going to extended over XXXX. Instead XXXX must
* be extended in region AAAA and NNNN must be removed. This way in
* all cases where vma_merge succeeds, the moment vma_adjust drops the
* rmap_locks, the properties of the merged vma will be already
* correct for the whole merged range. Some of those properties like
* vm_page_prot/vm_flags may be accessed by rmap_walks and they must
* be correct for the whole merged range immediately after the
* rmap_locks are released. Otherwise if XXXX would be removed and
* NNNN would be extended over the XXXX range, remove_migration_ptes
* or other rmap walkers (if working on addresses beyond the "end"
* parameter) may establish ptes with the wrong permissions of NNNN
* instead of the right permissions of XXXX.
*/
struct vm_area_struct *vma_merge(struct mm_struct *mm,
			struct vm_area_struct *prev, unsigned long addr,
			unsigned long end, unsigned long vm_flags,
			struct anon_vma *anon_vma, struct file *file,
			pgoff_t pgoff, struct mempolicy *policy,
			struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
{
	pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
	struct vm_area_struct *area, *next;
	int err;

	/*
	 * We later require that vma->vm_flags == vm_flags,
	 * so this tests vma->vm_flags & VM_SPECIAL, too.
	 */
	if (vm_flags & VM_SPECIAL)
		return NULL;

	if (prev)
		next = prev->vm_next;
	else
		next = mm->mmap;
	area = next;
	if (area && area->vm_end == end) /* cases 6, 7, 8 */
		next = next->vm_next;

	/* verify some invariant that must be enforced by the caller */
	VM_WARN_ON(prev && addr <= prev->vm_start);
	VM_WARN_ON(area && end > area->vm_end);
	VM_WARN_ON(addr >= end);

	/*
	 * Can it merge with the predecessor?
	 */
	if (prev && prev->vm_end == addr &&
			mpol_equal(vma_policy(prev), policy) &&
			can_vma_merge_after(prev, vm_flags,
					    anon_vma, file, pgoff,
					    vm_userfaultfd_ctx)) {
		/*
		 * OK, it can. Can we now merge in the successor as well?
		 */
		if (next && end == next->vm_start &&
				mpol_equal(policy, vma_policy(next)) &&
				can_vma_merge_before(next, vm_flags,
						     anon_vma, file,
						     pgoff+pglen,
						     vm_userfaultfd_ctx) &&
				is_mergeable_anon_vma(prev->anon_vma,
						      next->anon_vma, NULL)) {
			/* cases 1, 6 */
			err = __vma_adjust(prev, prev->vm_start,
					 next->vm_end, prev->vm_pgoff, NULL,
					 prev);
		} else /* cases 2, 5, 7 */
			err = __vma_adjust(prev, prev->vm_start,
					 end, prev->vm_pgoff, NULL, prev);
		if (err)
			return NULL;
		khugepaged_enter_vma_merge(prev, vm_flags);
		return prev;
	}

	/*
	 * Can this new request be merged in front of next?
	 */
	if (next && end == next->vm_start &&
			mpol_equal(policy, vma_policy(next)) &&
			can_vma_merge_before(next, vm_flags,
					     anon_vma, file, pgoff+pglen,
					     vm_userfaultfd_ctx)) {
		if (prev && addr < prev->vm_end) /* case 4 */
			err = __vma_adjust(prev, prev->vm_start,
					 addr, prev->vm_pgoff, NULL, next);
		else { /* cases 3, 8 */
			err = __vma_adjust(area, addr, next->vm_end,
					 next->vm_pgoff - pglen, NULL, next);
			/*
			 * In case 3 area is already equal to next and
			 * this is a noop, but in case 8 "area" has
			 * been removed and next was expanded over it.
			 */
			area = next;
		}
		if (err)
			return NULL;
		khugepaged_enter_vma_merge(area, vm_flags);
		return area;
	}

	return NULL;
}
Reproducing the scenario
#include <stdio.h>
#include <stdlib.h>
int main() {
    malloc(0x5000);
    return 0;
}
Debugging:
wtf
That shouldn't happen. How did the allocation end up in the tls region?
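Investigating the problem
Reading the malloc source shows that the request exceeds mmap_threshold, so it ends up being served by mmap as a private anonymous mapping (everyone should be familiar with this part), which means it should not have landed in the tls region. After talking it over with henry, we suspected that the behavior of mmap itself was involved, so I went straight in and did a quick review of the mmap source, and concluded that the plugin was probably at fault.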
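One way to cross-check the label a gdb plugin puts on a region is to ask the kernel directly through /proc/self/maps. The following is a minimal sketch, assuming Linux; mapping_of is a hypothetical helper name, not an existing API. It copies out the maps line whose address range covers a given pointer.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Copy the /proc/self/maps line that covers addr into out (at most n bytes).
 * Returns 0 on success, -1 if no mapping covers addr or the file
 * cannot be opened.
 */
int mapping_of(const void *addr, char *out, size_t n)
{
	unsigned long a = (unsigned long)addr, start, end;
	char line[512];
	int ret = -1;
	FILE *f = fopen("/proc/self/maps", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "%lx-%lx", &start, &end) == 2 &&
		    start <= a && a < end) {
			snprintf(out, n, "%s", line);
			ret = 0;
			break;
		}
	}
	fclose(f);
	return ret;
}
```

Passing the pointer returned by malloc to mapping_of and printing the result shows the boundaries and permissions of the VMA the kernel actually placed it in, independent of how a debugger plugin names that region.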
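I won't write up a walkthrough of the mmap source. I didn't take notes, it is open source, and you can read it yourself when you need to; besides, it was almost 3 a.m. by the time I finished reading it. So I'll just summarize how mmap allocates virtual memory. In the kernel, each process has a red-black tree that organizes its VMAs (each represented by a struct vm_area_struct) by address, and free ranges are found as the gaps between them. On my ubuntu 22.04 machine, the file and anonymous mapping area grows from high addresses toward low addresses, so mmap scans from high to low when it looks for a free range of virtual memory.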
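Once you know this, everything can be explained. A quick debugging session without running anything, just to look at the memory layout (with ASLR turned off to make debugging easier):
You can see that there is 0x17e000 and 0x11000 of space before and after tls. Let's verify: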
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
int main() {
    void *p0 = malloc(0x17e000-0x1000);
    void *p1 = mmap(0, 0x11000, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
    printf("p0: %p\np1: %p\n", p0, p1);
    return 0;
}
On my machine mmap_threshold = 128kb, so malloc(0x11000-0x1000) would be served through brk; that is why I call mmap directly here.
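Whether a particular request was served by brk or by mmap can be checked at run time. The following is a minimal sketch, assuming glibc on x86-64; chunk_is_mmapped is a hypothetical helper that peeks at the size word glibc stores just before the returned pointer, where bit 0x2 (IS_MMAPPED in malloc.c) marks mmap-backed chunks. It relies on allocator internals and is only meant for experiments.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * glibc-specific assumption: the chunk's size word sits immediately before
 * the user pointer, and bit 0x2 of it (IS_MMAPPED) is set for chunks that
 * were obtained directly from mmap. Returns 1 for such chunks, 0 otherwise.
 */
int chunk_is_mmapped(const void *p)
{
	size_t size_field;

	/* memcpy avoids dereferencing a misaligned or aliased pointer. */
	memcpy(&size_field, (const char *)p - sizeof(size_t),
	       sizeof(size_field));
	return (size_field & 0x2) != 0;
}
```

With the default 128 KiB threshold, chunk_is_mmapped(malloc(0x11000-0x1000)) is expected to return 0, and chunk_is_mmapped(malloc(0x17e000-0x1000)) is expected to return 1.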
As for why the malloc request subtracts 0x1000, please audit the sysmalloc source yourself (most of you are probably familiar with it).
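For readers who want the arithmetic without opening the source, the following is a minimal sketch based on my reading of sysmalloc in glibc 2.35 on x86-64. The constants (8-byte size word, 16-byte alignment, 4 KiB pages) and the helper names are assumptions made for this illustration. The request is first padded to a chunk size, then one more size word is added, and the total is rounded up to a whole page before it is handed to mmap.

```c
#include <stddef.h>

/* Assumed x86-64 glibc constants; not taken from the original post. */
#define SKETCH_SIZE_SZ    8UL       /* sizeof(size_t) */
#define SKETCH_ALIGN_MASK 15UL      /* MALLOC_ALIGNMENT - 1 */
#define SKETCH_PAGE       0x1000UL  /* page size */

/* Chunk size for a request, in the spirit of request2size (large requests only). */
static unsigned long request_to_chunk_size(unsigned long req)
{
	return (req + SKETCH_SIZE_SZ + SKETCH_ALIGN_MASK) & ~SKETCH_ALIGN_MASK;
}

/* Length passed to mmap: chunk size plus one size word, rounded up to a page. */
unsigned long mmap_len_for_request(unsigned long req)
{
	unsigned long nb = request_to_chunk_size(req);

	return (nb + SKETCH_SIZE_SZ + SKETCH_PAGE - 1) & ~(SKETCH_PAGE - 1);
}
```

Under these assumptions a request of 0x17e000-0x1000 bytes maps exactly 0x17e000 bytes, while a request of 0x17e000 bytes would need 0x17f000 and no longer fit into a 0x17e000 gap.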
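The debugging results are as follows:
You can see that the 0x11000 region from mmap is exactly the region below tls, and the 0x17e000 region from malloc is exactly the block in front of tls. The gdb plugin seems to treat these as TLS. Of course, they may simply be reserved for tls; I did not debug the kernel, I only read the mmap code statically. I also shifted the tls region by allocating some thread-local variables, and the result was the same: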
#include <stdio.h>
#include <sys/mman.h>
__thread char p0[0x17e000+0x1000];
int main() {
    void *p1 = mmap(0, 0x193000, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
    printf("p0: %p\np1: %p\n", (void *)p0, p1);
    return 0;
}
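Debugging results:
So when a malloc request exceeds 0x17e000-0x1000, a region is mapped in front of libc. This is what CTF players mean when they say that malloc-ing a large chunk makes mmap place the chunk in front of libc. Most write-ups online say to allocate 0x200000, but you can work the number out yourself with a little arithmetic. Let's verify: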
#include <stdio.h>
#include <stdlib.h>
int main() {
    void *p0 = malloc(0x17e000);
    printf("p0: %p\n", p0);
    return 0;
}
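The debugging results are as follows:
Everything matches.
An interesting demo
Here is a demo for you. If you're interested, you can look into the reason behind it yourself.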
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
/* Print the [stack] and ld.so lines of /proc/self/maps. */
int show() {
    FILE *maps_file;
    char line[256];
    char pathname[256];
    maps_file = fopen("/proc/self/maps", "r");
    if (maps_file == NULL) {
        perror("Failed to open /proc/self/maps");
        return 1;
    }
    while (fgets(line, sizeof(line), maps_file)) {
        /* %255s keeps the copy within pathname */
        if (sscanf(line, "%*x-%*x %*s %*x %*x:%*x %*u %255s", pathname) == 1) {
            if (strcmp(pathname, "[stack]") == 0 || strcmp(pathname, "/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2") == 0) {
                printf("%s", line);
            }
        }
    }
    fclose(maps_file);
    return 0;
}
int main() {
    uint64_t addr;
    show();
    puts("==========================================");
    /* Read the target address as hex; SCNx64 matches uint64_t */
    if (scanf("%" SCNx64, &addr) != 1)
        return 1;
    /* Write to the user-supplied address, then show the layout again */
    *(uint64_t*)addr = 0xdeadbeef;
    puts("==========================================");
    show();
    return 0;
}
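The output is as follows:
Summary
The mmap source is fairly complex and I only read the parts I was interested in; if you're curious, go and read the source yourself. I had planned to set up an environment and debug mmap, but gave up in the end. Too tired, off to sleep (: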
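Of course, this hardly counts as a technique at all. I wrote it down mainly to answer some questions I had when I first started learning. I remember reading online that malloc has to allocate 0x200000 to guarantee that the chunk ends up in front of libc, yet I never asked myself why, nor did I think about how mmap allocates virtual memory underneath.
That said, working these questions out now is fine too, since I have some background by this point. Had I tried to read the mmap source at the very beginning, I'm sure I would have given up long ago.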