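1. Overview
This article describes the layout of the address space and how it is established.
2. Address Space
The memory initialization flow on Linux x86_64 is shown in the figure below:
This article mainly covers the part highlighted in green in that figure.
2.1 Physical Address Space Layout
Before setup_arch runs, the kernel's boot code obtains the physical memory layout recorded by the BIOS through the e820 interface (BIOS interrupt int 0x15 with EAX=0xE820). e820__memory_setup prints this information. Below is the output on my CentOS guest with 8G of memory running under VMware: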
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ebff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009ec00-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000dc000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bfecffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000bfed0000-0x00000000bfefefff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x00000000bfeff000-0x00000000bfefffff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000bff00000-0x00000000bfffffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000f0000000-0x00000000f7ffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fffe0000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000023fffffff] usable
- The ranges marked usable above are therefore the memory (RAM) available to the system. They are listed again below, in four parts:
[mem 0x0000000000000000-0x000000000009ebff] usable 635K
[mem 0x0000000000100000-0x00000000bfecffff] usable 3143488K (2.998G)
[mem 0x00000000bff00000-0x00000000bfffffff] usable 1024K
[mem 0x0000000100000000-0x000000023fffffff] usable 5242880K (5G)
Total: about 7.999G of memory
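The per-range sizes and the total can be checked with a few lines of C. This is only a sketch: the struct and function names are made up for illustration, and the ranges are copied by hand from the log above (the kernel prints inclusive end addresses).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: one inclusive [start, end] range from the log. */
struct usable_range {
    uint64_t start;
    uint64_t end;   /* inclusive, as printed by the kernel */
};

/* The four "usable" entries copied from the e820 output above. */
const struct usable_range usable_ranges[] = {
    { 0x0000000000000000ULL, 0x000000000009ebffULL },
    { 0x0000000000100000ULL, 0x00000000bfecffffULL },
    { 0x00000000bff00000ULL, 0x00000000bfffffffULL },
    { 0x0000000100000000ULL, 0x000000023fffffffULL },
};

/* Size of one inclusive range in KiB. */
uint64_t range_kib(const struct usable_range *r)
{
    assert(r->end >= r->start);
    return (r->end - r->start + 1) / 1024;
}

/* Sum of all usable ranges in KiB. */
uint64_t total_usable_kib(void)
{
    uint64_t total = 0;
    size_t n = sizeof(usable_ranges) / sizeof(usable_ranges[0]);

    for (size_t i = 0; i < n; i++)
        total += range_kib(&usable_ranges[i]);
    return total;
}
```

The total is 8388027K, which is 581K short of a full 8G (8388608K); that is where the 7.999G figure comes from.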
The type field in the output above is printed by the following code:
static void __init e820_print_type(enum e820_type type)
{
switch (type) {
case E820_TYPE_RAM: /* Fall through: */
case E820_TYPE_RESERVED_KERN: pr_cont("usable"); break;
case E820_TYPE_RESERVED: pr_cont("reserved"); break;
case E820_TYPE_ACPI: pr_cont("ACPI data"); break;
case E820_TYPE_NVS: pr_cont("ACPI NVS"); break;
case E820_TYPE_UNUSABLE: pr_cont("unusable"); break;
case E820_TYPE_PMEM: /* Fall through: */
case E820_TYPE_PRAM: pr_cont("persistent (type %u)", type); break;
default: pr_cont("type %u", type); break;
}
}
cat /proc/iomem shows the usage of the physical address space in more detail. I have marked the usable ranges from above (they appear as System RAM):
[root@localhost a]# cat /proc/iomem
00000000-00000fff : reserved
00001000-0009ebff : System RAM //usable
0009ec00-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000c7fff : Video ROM
000ca000-000cafff : Adapter ROM
000cb000-000cbfff : Adapter ROM
000cc000-000ccfff : Adapter ROM
000d0000-000d3fff : PCI Bus 0000:00
000d4000-000d7fff : PCI Bus 0000:00
000d8000-000dbfff : PCI Bus 0000:00
000dc000-000fffff : reserved
000f0000-000fffff : System ROM
00100000-bfecffff : System RAM //usable
01000000-016b93ee : Kernel code /* kernel image */
016b93ef-01b281bf : Kernel data
01cea000-01fe7fff : Kernel bss
27000000-310fffff : Crash kernel
bfed0000-bfefefff : ACPI Tables
bfeff000-bfefffff : ACPI Non-volatile Storage
bff00000-bfffffff : System RAM //usable
c0000000-febfffff : PCI Bus 0000:00
c0000000-c0007fff : 0000:00:0f.0
c0008000-c000bfff : 0000:00:10.0
e5b00000-e5bfffff : PCI Bus 0000:22
...
e8000000-efffffff : 0000:00:0f.0
e8000000-efffffff : vmwgfx probe
f0000000-f7ffffff : PCI MMCONFIG 0000 [bus 00-7f]
f0000000-f7ffffff : reserved
f0000000-f7ffffff : pnp 00:06
fb500000-fb5fffff : PCI Bus 0000:22
...
fec00000-fec003ff : IOAPIC 0
fed00000-fed003ff : HPET 0
fed00000-fed003ff : pnp 00:04
fee00000-fee00fff : Local APIC
fee00000-fee00fff : reserved
fffe0000-ffffffff : reserved
100000000-23fffffff : System RAM //usable
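We now understand the layout of the physical address space. The whole physical address space spans 0 to 0x23fffffff, which is 9G in size. An important point is that the range of the physical address space and the range of RAM are not necessarily the same: in the case above there is 8G of RAM, split into four parts that are scattered across the physical address space. Accesses to the physical address space are routed by address decoding: each controller in the hardware recognizes the address ranges it owns. If an address falls within the DRAM controller's range, RAM is accessed; if it falls within another controller's range, that controller generates the bus timing needed to access the corresponding device. This is where MMIO comes from.
Linux manages physical RAM pages with the buddy allocator. The amount it manages is less than the full 8G, because some RAM is reserved for other purposes, such as the kernel image. The system also needs to allocate memory before the buddy allocator is initialized, and at that stage memory is managed by memblock, a lightweight memory manager. This article does not go into memblock in detail. In short, memblock tracks two types of regions, memory and reserved. reserved is a subset of memory and marks the memory that the system has set aside and that does not take part in allocation. The kernel provides the following two pairs of interfaces to add/remove regions of the memory and reserved types: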
memblock_add / memblock_remove
memblock_reserve / memblock_free
Now let us look at the initialization flow. The first thing setup_arch does is reserve the kernel's own memory, from _text to __bss_stop:
memblock_reserve(__pa_symbol(_text), (unsigned long)__bss_stop - (unsigned long)_text);
Note that this reserved range includes the kernel page tables:
cat /usr/src/kernels/3.10.0-693.21.1.el7.x86_64/System.map | grep _text
ffffffff81000000 T _text
cat /usr/src/kernels/3.10.0-693.21.1.el7.x86_64/System.map | grep init_level4_pgt
ffffffff81a02000 D init_level4_pgt
cat /usr/src/kernels/3.10.0-693.21.1.el7.x86_64/System.map | grep __bss_stop
ffffffff82000000 B __bss_stop
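The claim can be checked by converting the three symbols to physical addresses. The sketch below uses a simplified stand-in for __pa_symbol (virtual address minus __START_KERNEL_map plus phys_base) and assumes phys_base is 0, i.e. the kernel was not relocated. The function names are illustrative, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define START_KERNEL_MAP 0xffffffff80000000ULL  /* __START_KERNEL_map */

/* Simplified stand-in for __pa_symbol(); phys_base is assumed by the caller. */
uint64_t pa_symbol(uint64_t vaddr, uint64_t phys_base)
{
    assert(vaddr >= START_KERNEL_MAP);
    return vaddr - START_KERNEL_MAP + phys_base;
}

/* True if addr lies inside a reservation of `size` bytes starting at `base`. */
bool in_reservation(uint64_t addr, uint64_t base, uint64_t size)
{
    return addr >= base && addr - base < size;
}
```

With the System.map values above, _text maps to physical 0x1000000, the reservation is 0x1000000 bytes long, and init_level4_pgt (physical 0x1a02000) falls inside it.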
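The rest of the flow contains further memblock_reserve/memblock_add calls, whose purpose is to bring memory under memblock management. Logging can be enabled by adding memblock=debug to the kernel command line in grub, but it is more convenient to read the following files: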
cat /sys/kernel/debug/memblock/memory
cat /sys/kernel/debug/memblock/reserved
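TIPS: CentOS does not provide these two files by default. They are built only when CONFIG_DEBUG_FS is enabled and CONFIG_ARCH_DISCARD_MEMBLOCK is not set, so the kernel configuration has to be changed and the kernel rebuilt.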
cat /sys/kernel/debug/memblock/memory
0: 0x0000000000001000..0x000000000009dfff
1: 0x0000000000100000..0x00000000bfecffff
2: 0x00000000bff00000..0x00000000bfffffff
3: 0x0000000100000000..0x000000023fffffff
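The output above shows the memblock regions of type memory. They correspond to the RAM (usable) ranges listed at the beginning, and they are added by the following function: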
void __init e820__memblock_setup(void)
{
int i;
u64 end;
for (i = 0; i < e820_table->nr_entries; i++) {
struct e820_entry *entry = &e820_table->entries[i];
end = entry->addr + entry->size;
if (end != (resource_size_t)end)
continue;
if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
continue;
memblock_add(entry->addr, entry->size);
}
}
The memblock_reserve calls, in contrast, are scattered throughout the code. The result can be compared against the output of cat /proc/iomem:
[root@localhost a]# cat /sys/kernel/debug/memblock/reserved
0: 0x0000000000000000..0x000000000000ffff
1: 0x0000000000098000..0x000000000009dfff
2: 0x000000000009ec00..0x00000000000fffff
3: 0x0000000001000000..0x0000000001ff5fff
4: 0x0000000027000000..0x00000000310fffff
5: 0x000000003146f000..0x0000000034a2ffff
6: 0x00000000bbed0000..0x00000000bfecffff
7: 0x00000000bfff8000..0x00000000bfffffff
8: 0x0000000235600000..0x0000000235620fff
9: 0x0000000235640000..0x0000000235660fff
10: 0x0000000235680000..0x00000002356a0fff
11: 0x00000002356c0000..0x00000002356e0fff
12: 0x0000000235700000..0x0000000235720fff
13: 0x0000000235740000..0x0000000235760fff
14: 0x0000000235780000..0x00000002357a0fff
15: 0x00000002357c0000..0x00000002357e0fff
16: 0x0000000235800000..0x0000000235820fff
17: 0x0000000235840000..0x0000000235860fff
18: 0x0000000235880000..0x00000002358a0fff
19: 0x00000002358c0000..0x00000002358e0fff
20: 0x0000000235900000..0x0000000235920fff
21: 0x0000000235940000..0x0000000235960fff
22: 0x0000000235980000..0x00000002359a0fff
23: 0x00000002359c0000..0x00000002359e0fff
24: 0x0000000235a00000..0x0000000235a20fff
25: 0x0000000235a40000..0x0000000235a60fff
26: 0x0000000235a80000..0x0000000235aa0fff
27: 0x0000000235ac0000..0x0000000235ae0fff
28: 0x0000000235b00000..0x0000000235b20fff
29: 0x0000000235b40000..0x0000000235b60fff
30: 0x0000000235b80000..0x0000000235ba0fff
31: 0x0000000235bc0000..0x0000000235be0fff
32: 0x0000000235c00000..0x0000000235c20fff
33: 0x0000000235c40000..0x0000000235c60fff
34: 0x0000000235c80000..0x0000000235ca0fff
35: 0x0000000235cc0000..0x0000000235ce0fff
36: 0x0000000235d00000..0x0000000235d20fff
37: 0x0000000235d40000..0x0000000235d60fff
38: 0x0000000235d80000..0x0000000235da0fff
39: 0x0000000235dc0000..0x0000000235de0fff
40: 0x0000000235e00000..0x0000000235e20fff
41: 0x0000000235e40000..0x0000000235e60fff
42: 0x0000000235e80000..0x0000000235ea0fff
43: 0x0000000235ec0000..0x0000000235ee0fff
44: 0x0000000235f00000..0x0000000235f20fff
45: 0x0000000235f40000..0x0000000235f60fff
46: 0x0000000235f80000..0x0000000235fa0fff
47: 0x0000000235fc0000..0x0000000235fe0fff
48: 0x0000000236000000..0x0000000236020fff
49: 0x0000000236040000..0x0000000236060fff
50: 0x0000000236080000..0x00000002360a0fff
51: 0x00000002360c0000..0x00000002360e0fff
52: 0x0000000236100000..0x0000000236120fff
53: 0x0000000236140000..0x0000000236160fff
54: 0x0000000236180000..0x00000002361a0fff
55: 0x00000002361c0000..0x00000002361e0fff
56: 0x0000000236200000..0x0000000236220fff
57: 0x0000000236240000..0x0000000236260fff
58: 0x0000000236280000..0x00000002362a0fff
59: 0x00000002362c0000..0x00000002362e0fff
60: 0x0000000236300000..0x0000000236320fff
61: 0x0000000236340000..0x0000000236360fff
62: 0x0000000236380000..0x00000002363a0fff
63: 0x00000002363c0000..0x00000002363e0fff
64: 0x0000000236400000..0x0000000236420fff
65: 0x0000000236440000..0x0000000236460fff
66: 0x0000000236480000..0x00000002364a0fff
67: 0x00000002364c0000..0x00000002364e0fff
68: 0x0000000236500000..0x0000000236520fff
69: 0x0000000236540000..0x0000000236560fff
70: 0x0000000236580000..0x00000002365a0fff
71: 0x00000002365c0000..0x00000002365e0fff
72: 0x0000000236600000..0x0000000236620fff
73: 0x0000000236640000..0x0000000236660fff
74: 0x0000000236680000..0x00000002366a0fff
75: 0x00000002366c0000..0x00000002366e0fff
76: 0x0000000236700000..0x0000000236720fff
77: 0x0000000236740000..0x0000000236760fff
78: 0x0000000236780000..0x00000002367a0fff
79: 0x00000002367c0000..0x00000002367e0fff
80: 0x0000000236800000..0x0000000236820fff
81: 0x0000000236840000..0x0000000236860fff
82: 0x0000000236880000..0x00000002368a0fff
83: 0x00000002368c0000..0x00000002368e0fff
84: 0x0000000236900000..0x0000000236920fff
85: 0x0000000236940000..0x0000000236960fff
86: 0x0000000236980000..0x00000002369a0fff
87: 0x00000002369c0000..0x00000002369e0fff
88: 0x0000000236a00000..0x0000000236a20fff
89: 0x0000000236a40000..0x0000000236a60fff
90: 0x0000000236a80000..0x0000000236aa0fff
91: 0x0000000236ac0000..0x0000000236ae0fff
92: 0x0000000236b00000..0x0000000236b20fff
93: 0x0000000236b40000..0x0000000236b60fff
94: 0x0000000236b80000..0x0000000236ba0fff
95: 0x0000000236bc0000..0x0000000236be0fff
96: 0x0000000236c00000..0x0000000236c20fff
97: 0x0000000236c40000..0x0000000236c60fff
98: 0x0000000236c80000..0x0000000236ca0fff
99: 0x0000000236cc0000..0x0000000236ce0fff
100: 0x0000000236d00000..0x0000000236d20fff
101: 0x0000000236d40000..0x0000000236d60fff
102: 0x0000000236d80000..0x0000000236da0fff
103: 0x0000000236dc0000..0x0000000236de0fff
104: 0x0000000236e00000..0x0000000236e20fff
105: 0x0000000236e40000..0x0000000236e60fff
106: 0x0000000236e80000..0x0000000236ea0fff
107: 0x0000000236ec0000..0x0000000236ee0fff
108: 0x0000000236f00000..0x0000000236f20fff
109: 0x0000000236f40000..0x0000000236f60fff
110: 0x0000000236f80000..0x0000000236fa0fff
111: 0x0000000236fc0000..0x0000000236fe0fff
112: 0x0000000237000000..0x0000000237020fff
113: 0x0000000237040000..0x0000000237060fff
114: 0x0000000237080000..0x00000002370a0fff
115: 0x00000002370c0000..0x00000002370e0fff
116: 0x0000000237100000..0x0000000237120fff
117: 0x0000000237140000..0x0000000237160fff
118: 0x0000000237180000..0x00000002371a0fff
119: 0x00000002371c0000..0x00000002371e0fff
120: 0x0000000237200000..0x0000000237220fff
121: 0x0000000237240000..0x0000000237260fff
122: 0x0000000237280000..0x00000002372a0fff
123: 0x00000002372c0000..0x00000002372e0fff
124: 0x0000000237300000..0x0000000237320fff
125: 0x0000000237340000..0x0000000237360fff
126: 0x0000000237380000..0x00000002373a0fff
127: 0x00000002373c0000..0x00000002373e0fff
128: 0x0000000237400000..0x0000000237420fff
129: 0x0000000237440000..0x0000000237460fff
130: 0x0000000237480000..0x00000002374a0fff
131: 0x00000002374c0000..0x00000002374e0fff
132: 0x0000000237500000..0x0000000237520fff
133: 0x0000000237540000..0x0000000237560fff
134: 0x0000000237580000..0x00000002375a0fff
135: 0x00000002375c0000..0x00000002375e0fff
136: 0x0000000237600000..0x000000023f5fffff
137: 0x000000023f7d3000..0x000000023f7d4fff
138: 0x000000023fbd5800..0x000000023fbd5fff
139: 0x000000023ff1f000..0x000000023ff88fff
140: 0x000000023ff89300..0x000000023ff8935f
141: 0x000000023ff89380..0x000000023ff893df
142: 0x000000023ff89400..0x000000023ff8951f
143: 0x000000023ff89540..0x000000023ff89b47
144: 0x000000023ff89b80..0x000000023ff89b87
145: 0x000000023ff8a640..0x000000023ff8aa93
146: 0x000000023ff8aac0..0x000000023ff8aad3
147: 0x000000023ff8ab00..0x000000023ff8bc53
148: 0x000000023ff8bc80..0x000000023ff8bd13
149: 0x000000023ff8bd40..0x000000023ff8bd5f
150: 0x000000023ff8bd80..0x000000023ff8bd9f
151: 0x000000023ff8bdc0..0x000000023ff8bddf
152: 0x000000023ff8be00..0x000000023ff8be67
153: 0x000000023ff8be80..0x000000023ff8bee7
154: 0x000000023ff8bf00..0x000000023ff8bf67
155: 0x000000023ff8bf80..0x000000023ff8bfe7
156: 0x000000023ff8c000..0x000000023ff8d067
157: 0x000000023ff8d080..0x000000023ff8d0e7
158: 0x000000023ff8d100..0x000000023ff8d167
159: 0x000000023ff8d180..0x000000023ff8d1e7
160: 0x000000023ff8d200..0x000000023ff8d267
161: 0x000000023ff8d280..0x000000023ff8d2e7
162: 0x000000023ff8d300..0x000000023ff8d367
163: 0x000000023ff8d380..0x000000023ff8d3e7
164: 0x000000023ff8d400..0x000000023ff8d6d7
165: 0x000000023ff8d700..0x000000023ff8d742
166: 0x000000023ff8d780..0x000000023ff8d7c0
167: 0x000000023ff8d800..0x000000023fffdfff
168: 0x000000023fffe000..0x000000023fffffff
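2.2 Kernel Image Layout
Before continuing with the initialization flow, we need to look at where key pieces such as the kernel image sit in the physical address space. This was in fact already marked above: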
00100000-bfecffff : System RAM
01000000-016b93ee : Kernel code /* kernel image */
016b93ef-01b281bf : Kernel data
01cea000-01fe7fff : Kernel bss
27000000-310fffff : Crash kernel
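We also look at the virtual address layout of the kernel image. Note that at this stage the part of the kernel page tables that maps the kernel image has already been set up, so here we only analyze how the two layouts correspond, and the linear mapping that is about to be established.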
[arch/x86/kernel/vmlinux.lds.S]
. = __START_KERNEL;
.text : AT(ADDR(.text) - LOAD_OFFSET) {
_text = .;
_stext = .;
/* bootstrapping code */
HEAD_TEXT
TEXT_TEXT...
/* End of text section */
_etext = .;
} :text = 0x9090
The linker script assigns virtual addresses to the key sections of the kernel image. First consider the start of the kernel image, __START_KERNEL:
[arch/x86/include/asm/page_types.h]
#define __PHYSICAL_START ALIGN(CONFIG_PHYSICAL_START, \
CONFIG_PHYSICAL_ALIGN)
#define __START_KERNEL (__START_KERNEL_map + __PHYSICAL_START)
[arch/x86/include/asm/page_64_types.h]
/*
* Set __PAGE_OFFSET to the most negative possible address +
* PGDIR_SIZE*16 (pgd slot 272). The gap is to allow a space for a
* hypervisor to fit. Choosing 16 slots here is arbitrary, but it's
* what Xen requires.
*/
#define __PAGE_OFFSET_BASE_L5 _AC(0xff10000000000000, UL)
#define __PAGE_OFFSET_BASE_L4 _AC(0xffff880000000000, UL)
#ifdef CONFIG_DYNAMIC_MEMORY_LAYOUT
#define __PAGE_OFFSET page_offset_base
#else
#define __PAGE_OFFSET __PAGE_OFFSET_BASE_L4
#endif /* CONFIG_DYNAMIC_MEMORY_LAYOUT */
#define __START_KERNEL_map _AC(0xffffffff80000000, UL)
Leaving __PHYSICAL_START aside, the kernel image starts at __START_KERNEL_map, which is 0xffffffff80000000. In practice there is always an offset on top of this address:
cat /usr/src/kernels/3.10.0-693.21.1.el7.x86_64/.config | grep CONFIG_PHYSICAL_START
CONFIG_PHYSICAL_START=0x1000000
cat /usr/src/kernels/3.10.0-693.21.1.el7.x86_64/.config | grep CONFIG_PHYSICAL_ALIGN
CONFIG_PHYSICAL_ALIGN=0x200000
The kernel therefore starts at 0xffffffff81000000. In this test environment the symbols are as follows (they can be looked up in System.map, located at /usr/src/kernels/3.10.0-693.21.1.el7.x86_64/System.map):
ffffffff81000000 T _text
ffffffff816c72ff T _etext
ffffffff81b3b4c0 D _edata
ffffffff81a1e518 D _brk_end
ffffffff81d04000 B __bss_start
ffffffff82000000 B __bss_stop
ffffffff82429000 B _end
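The start address derived above can be reproduced from the two config values. The following is a minimal sketch that mirrors the __PHYSICAL_START and __START_KERNEL macros with ordinary functions; align_up is a local stand-in for the kernel's ALIGN macro, and the function names are chosen here for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define START_KERNEL_MAP      0xffffffff80000000ULL  /* __START_KERNEL_map */
#define CONFIG_PHYSICAL_START 0x1000000ULL           /* from .config above */
#define CONFIG_PHYSICAL_ALIGN 0x200000ULL            /* from .config above */

/* Round x up to a multiple of a; a must be a power of two. */
uint64_t align_up(uint64_t x, uint64_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Mirrors __PHYSICAL_START. */
uint64_t physical_start(void)
{
    return align_up(CONFIG_PHYSICAL_START, CONFIG_PHYSICAL_ALIGN);
}

/* Mirrors __START_KERNEL. */
uint64_t start_kernel(void)
{
    return START_KERNEL_MAP + physical_start();
}
```

Since 0x1000000 is already a multiple of 0x200000, the alignment step leaves it unchanged and the result is 0xffffffff81000000.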
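As can be seen, the address we derived matches _text. Note also that the kernel's starting physical address is 0x1000000, so the kernel image part of the kernel page tables is in fact a linear mapping as well. Once the kernel takes over the machine it has to enable the MMU, and before it can see the kernel image through the MMU it must first build the kernel page tables. The kernel page tables are built in several stages, and the top-level table is designated by swapper_pg_dir: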
[mm/init-mm.c]
#define swapper_pg_dir init_top_pgt
struct mm_struct init_mm = {
.mm_rb = RB_ROOT,
.pgd = swapper_pg_dir,
.mm_users = ATOMIC_INIT(2),
.mm_count = ATOMIC_INIT(1),
.mmap_sem = __RWSEM_INITIALIZER(init_mm.mmap_sem),
.page_table_lock = __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock),
.mmlist = LIST_HEAD_INIT(init_mm.mmlist),
.user_ns = &init_user_ns,
INIT_MM_CONTEXT(init_mm)
};
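At this point we can infer the page table layout (this part is set up in arch/x86/kernel/head_64.S):
The corresponding virtual address, built from PGD index 511 and PUD index 510 and sign-extended to 64 bits:
- (511 << 39) + (510 << 30) = 0xffffffff80000000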
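The arithmetic can be sketched in C, assuming 4-level paging with 9 index bits per level and 48-bit canonical addresses. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Index into one page-table level: 9 bits starting at `shift`
 * (39 = PGD, 30 = PUD, 21 = PMD, 12 = PTE). */
unsigned table_index(uint64_t vaddr, unsigned shift)
{
    return (unsigned)((vaddr >> shift) & 0x1ff);
}

/* Rebuild a canonical virtual address from PGD and PUD indices,
 * with all lower bits zero. */
uint64_t canonical_from_indices(uint64_t pgd, uint64_t pud)
{
    uint64_t addr;

    assert(pgd < 512 && pud < 512);
    addr = (pgd << 39) | (pud << 30);
    if (addr & (1ULL << 47))            /* bit 47 set: sign-extend bits 63..48 */
        addr |= 0xffff000000000000ULL;
    return addr;
}
```

Going the other way, _text at 0xffffffff81000000 decomposes into PGD index 511, PUD index 510 and PMD index 8, i.e. the ninth 2MB slot of the kernel mapping.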
With this in mind, we move on to cleanup_highmap:
/*
* The head.S code sets up the kernel high mapping:
*
* from __START_KERNEL_map to __START_KERNEL_map + size (== _end-_text)
*
* phys_base holds the negative offset to the kernel, which is added
* to the compile time generated pmds. This results in invalid pmds up
* to the point where we hit the physaddr 0 mapping.
*
* We limit the mappings to the region from _text to _brk_end. _brk_end
* is rounded up to the 2MB boundary. This catches the invalid pmds as
* well, as they are located before _text:
*/
void __init cleanup_highmap(void)
{
unsigned long vaddr = __START_KERNEL_map;
unsigned long vaddr_end = __START_KERNEL_map + KERNEL_IMAGE_SIZE;
unsigned long end = roundup((unsigned long)_brk_end, PMD_SIZE) - 1;
pmd_t *pmd = level2_kernel_pgt;
if (max_pfn_mapped)
vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);
for (; vaddr + PMD_SIZE - 1 < vaddr_end; pmd++, vaddr += PMD_SIZE) {
if (pmd_none(*pmd))
continue;
if (vaddr < (unsigned long) _text || vaddr > end)
set_pmd(pmd, __pmd(0));
}
}
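- As the comment explains, this code narrows the original kernel image mapping (_text to _end) to the range _text to _brk_end, by clearing the PMD entries that fall outside the new range.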
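The keep-or-clear decision in the loop can be isolated into a small predicate. This is a sketch of the condition only, not of the page table walk; the function names are made up, and the caller supplies the _text and _brk_end values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PMD_SIZE 0x200000ULL  /* each PMD entry maps 2MB */

/* Round x up to the next 2MB boundary. */
uint64_t round_up_pmd(uint64_t x)
{
    return (x + PMD_SIZE - 1) & ~(PMD_SIZE - 1);
}

/* True if the PMD that maps `vaddr` survives cleanup_highmap,
 * i.e. it is not below _text and not above the rounded-up _brk_end. */
bool pmd_kept(uint64_t vaddr, uint64_t text, uint64_t brk_end)
{
    uint64_t end;

    assert(text <= brk_end);
    end = round_up_pmd(brk_end) - 1;
    return !(vaddr < text || vaddr > end);
}
```

For example, with _text at 0xffffffff81000000 and a hypothetical _brk_end value of 0xffffffff82429000 (an assumed number, not taken from the test machine), the PMD at 0xffffffff80e00000 is cleared, the PMDs from 0xffffffff81000000 through 0xffffffff82400000 are kept, and the PMD at 0xffffffff82600000 is cleared.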
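2.3 Virtual Address Space
With virtual memory enabled, seeing only the kernel image is not enough. The kernel page tables must also map all of physical memory, so that later page or slab allocations do not cause page faults and device MMIO can be used directly. Documentation/x86/x86_64/mm.txt gives the virtual address map: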
Virtual memory map with 4 level page tables:
0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
hole caused by [47:63] sign extension
ffff800000000000 - ffff87ffffffffff (=43 bits) guard hole, reserved for hypervisor
ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
ffffc80000000000 - ffffc8ffffffffff (=40 bits) hole
ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space
ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole
ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
... unused hole ...
ffffec0000000000 - fffffbffffffffff (=44 bits) kasan shadow memory (16TB)
... unused hole ...
vaddr_end for KASLR
fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping
fffffe8000000000 - fffffeffffffffff (=39 bits) LDT remap for PTI
ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
... unused hole ...
ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
... unused hole ...
ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0
ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space
[fixmap start] - ffffffffff5fffff kernel-internal fixmap range
ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
The line ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0 agrees with our analysis. Here we focus on:
ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
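This is the linear (direct) mapping region. When CONFIG_DYNAMIC_MEMORY_LAYOUT is not considered, ffff880000000000 corresponds to PAGE_OFFSET, which can be pictured as in the figure below (PGD index 272, sign-extended to 64 bits):
- (272 << 39) + (0 << 30) = 0xffff880000000000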
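In the direct mapping, a physical address and its virtual address differ by the constant PAGE_OFFSET. The sketch below assumes the fixed 4-level value 0xffff880000000000 (no CONFIG_DYNAMIC_MEMORY_LAYOUT); the function names are illustrative stand-ins for what __va and __pa do for this region.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET 0xffff880000000000ULL  /* __PAGE_OFFSET_BASE_L4 */

/* Physical address -> virtual address in the direct mapping. */
uint64_t direct_map_va(uint64_t paddr)
{
    return paddr + PAGE_OFFSET;
}

/* Virtual address in the direct mapping -> physical address. */
uint64_t direct_map_pa(uint64_t vaddr)
{
    assert(vaddr >= PAGE_OFFSET);
    return vaddr - PAGE_OFFSET;
}

/* PGD slot that covers a virtual address. */
unsigned pgd_index_of(uint64_t vaddr)
{
    return (unsigned)((vaddr >> 39) & 0x1ff);
}
```

On the machine above, the last byte of RAM (physical 0x23fffffff) appears at 0xffff88023fffffff. The kernel image at physical 0x1000000 is therefore reachable at two virtual addresses: 0xffff880001000000 through the direct mapping and 0xffffffff81000000 through the kernel text mapping.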
For a more general view of the virtual address space, see the figure below, taken from [1].
Finally, init_mem_mapping establishes the linear mapping:
[arch/x86/mm/init.c]
void __init init_mem_mapping(void)
{
unsigned long end;
probe_page_size_mask();
#ifdef CONFIG_X86_64
end = max_pfn << PAGE_SHIFT;
#else
end = max_low_pfn << PAGE_SHIFT;
#endif
init_memory_mapping(0, ISA_END_ADDRESS);
init_trampoline();
if (memblock_bottom_up()) {
unsigned long kernel_end = __pa_symbol(_end);
memory_map_bottom_up(kernel_end, end);
memory_map_bottom_up(ISA_END_ADDRESS, kernel_end);
} else {
memory_map_top_down(ISA_END_ADDRESS, end);
}
#ifdef CONFIG_X86_64
if (max_pfn > max_low_pfn) {
/* can we preseve max_low_pfn ?*/
max_low_pfn = max_pfn;
}
#else
early_ioremap_page_table_range_init();
#endif
load_cr3(swapper_pg_dir);
__flush_tlb_all();
early_memtest(0, max_pfn_mapped << PAGE_SHIFT);
}
- init_memory_mapping creates the page tables for the linear mapping (offset by PAGE_OFFSET)
- load_cr3(swapper_pg_dir)
- __flush_tlb_all
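Finally, the figure below shows the layout of the physical and virtual address spaces and the mapping between them:
3. References
[1] Professional Linux Kernel Architecture (Chinese edition: 深入Linux内核架构)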