Linux Memory Management (Part 1): Address Space

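This article describes the layout of the address space on Linux x86_64 and how it is established. It covers three things. First, the physical address space layout, obtained from the memory map the BIOS reports via e820. Second, the kernel image layout: its virtual addresses and how they correspond to the kernel page tables. Third, the virtual address space, where the kernel builds page tables and a linear mapping for all of physical memory.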


Contents

1. Overview

2. Address Space

2.1 Physical Address Space Layout

2.2 Kernel Image Layout

2.3 Virtual Address Space

3. References


1. Overview

This article describes the layout of the address space and how it is set up.

2. Address Space

The memory initialization flow on Linux x86_64 is shown in the figure below:

This article covers the part of that figure highlighted in green.

2.1 Physical Address Space Layout

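Before setup_arch runs, the kernel has already obtained the physical memory map stored by the BIOS. It does this through the BIOS E820 call (int 0x15 with EAX=0xE820), which the real-mode setup code issues. e820__memory_setup prints this information. Below is the output on a CentOS VM with 8G of RAM running under VMware: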

[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ebff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009ec00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000dc000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bfecffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bfed0000-0x00000000bfefefff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000bfeff000-0x00000000bfefffff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bff00000-0x00000000bfffffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000f0000000-0x00000000f7ffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffe0000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000023fffffff] usable
  • The ranges marked usable are this system's RAM. There are four of them:

[mem 0x0000000000000000-0x000000000009ebff] usable    635K
[mem 0x0000000000100000-0x00000000bfecffff] usable    3143488K (2.998G)
[mem 0x00000000bff00000-0x00000000bfffffff] usable    1024K
[mem 0x0000000100000000-0x000000023fffffff] usable    5242880K (5G)

In total, there is about 7.999G of RAM.

The type field in this output is printed according to the following code:

static void __init e820_print_type(enum e820_type type)
{
	switch (type) {
	case E820_TYPE_RAM:		/* Fall through: */
	case E820_TYPE_RESERVED_KERN:	pr_cont("usable");			break;
	case E820_TYPE_RESERVED:	pr_cont("reserved");			break;
	case E820_TYPE_ACPI:		pr_cont("ACPI data");			break;
	case E820_TYPE_NVS:		pr_cont("ACPI NVS");			break;
	case E820_TYPE_UNUSABLE:	pr_cont("unusable");			break;
	case E820_TYPE_PMEM:		/* Fall through: */
	case E820_TYPE_PRAM:		pr_cont("persistent (type %u)", type);	break;
	default:			pr_cont("type %u", type);		break;
	}
}

cat /proc/iomem shows physical memory usage in more detail. In the listing below I have annotated the usable ranges from above; they appear as System RAM:

[root@localhost a]# cat /proc/iomem 
00000000-00000fff : reserved             
00001000-0009ebff : System RAM           //usable
0009ec00-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000c7fff : Video ROM
000ca000-000cafff : Adapter ROM
000cb000-000cbfff : Adapter ROM
000cc000-000ccfff : Adapter ROM
000d0000-000d3fff : PCI Bus 0000:00
000d4000-000d7fff : PCI Bus 0000:00
000d8000-000dbfff : PCI Bus 0000:00
000dc000-000fffff : reserved
  000f0000-000fffff : System ROM
00100000-bfecffff : System RAM           //usable
  01000000-016b93ee : Kernel code      /* kernel image */
  016b93ef-01b281bf : Kernel data
  01cea000-01fe7fff : Kernel bss
  27000000-310fffff : Crash kernel
bfed0000-bfefefff : ACPI Tables
bfeff000-bfefffff : ACPI Non-volatile Storage
bff00000-bfffffff : System RAM           //usable
c0000000-febfffff : PCI Bus 0000:00
  c0000000-c0007fff : 0000:00:0f.0
  c0008000-c000bfff : 0000:00:10.0
  e5b00000-e5bfffff : PCI Bus 0000:22
  ...
  e8000000-efffffff : 0000:00:0f.0
    e8000000-efffffff : vmwgfx probe
  f0000000-f7ffffff : PCI MMCONFIG 0000 [bus 00-7f]
    f0000000-f7ffffff : reserved
      f0000000-f7ffffff : pnp 00:06
  fb500000-fb5fffff : PCI Bus 0000:22
  ...

  fec00000-fec003ff : IOAPIC 0
fed00000-fed003ff : HPET 0
  fed00000-fed003ff : pnp 00:04
fee00000-fee00fff : Local APIC
  fee00000-fee00fff : reserved
fffe0000-ffffffff : reserved
100000000-23fffffff : System RAM     //usable

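We now know the physical address space layout. The whole physical address space spans 0 to 0x23fffffff, which is 9G. A key point is that the physical address space and the RAM ranges are not necessarily the same thing. In the case above, there is 8G of RAM, split into 4 parts that are spread across the physical address space.

Accesses to physical addresses are resolved by address decoding: each address is recognized by some controller in the hardware. If the address falls in the DRAM controller's range, RAM is accessed. If it falls in another controller's range, that controller generates the bus cycles needed to access the corresponding device. This is where MMIO comes from.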

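Linux manages physical RAM pages with the buddy allocator. It does not manage the full 8G, though, because some RAM is reserved for other purposes, such as the kernel image. The system also needs to allocate memory before the buddy allocator is initialized. At that stage, memory is managed by memblock, a lightweight memory management module. This article does not go into memblock in detail.

In short, memblock tracks two types of regions: memory and reserved. reserved marks the ranges the system has set aside, which are not handed over to memory management. These ranges mostly lie within memory, although they do not have to. For example, entry 2 of the reserved list below (0x9ec00-0xfffff) is not part of memory. The kernel provides two pairs of interfaces to add and remove memory and reserved regions: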

  • memblock_add / memblock_remove
  • memblock_reserve / memblock_free

Now to the initialization flow. Early in setup_arch, the kernel reserves its own image, from _text to __bss_stop:

   memblock_reserve(__pa_symbol(_text), (unsigned long)__bss_stop - (unsigned long)_text); 

Note that this reserved range includes the kernel page table, since init_level4_pgt lies between _text and __bss_stop:

cat /usr/src/kernels/3.10.0-693.21.1.el7.x86_64/System.map | grep _text
ffffffff81000000 T _text
cat /usr/src/kernels/3.10.0-693.21.1.el7.x86_64/System.map | grep init_level4_pgt
ffffffff81a02000 D init_level4_pgt
cat /usr/src/kernels/3.10.0-693.21.1.el7.x86_64/System.map | grep __bss_stop
ffffffff82000000 B __bss_stop

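Later in the flow, memblock_add and memblock_reserve are called to bring memory under memblock management. Adding memblock=debug to the kernel command line in GRUB turns on logging for these calls. A more convenient way to inspect the result is: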

  • cat /sys/kernel/debug/memblock/memory
  • cat /sys/kernel/debug/memblock/reserved

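TIPS: CentOS does not provide these two files by default. They are only created when memblock data is kept after boot. That requires CONFIG_ARCH_DISCARD_MEMBLOCK to be disabled rather than enabled (x86 selects it by default), so the kernel has to be rebuilt.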

   cat /sys/kernel/debug/memblock/memory 
   0: 0x0000000000001000..0x000000000009dfff
   1: 0x0000000000100000..0x00000000bfecffff
   2: 0x00000000bff00000..0x00000000bfffffff
   3: 0x0000000100000000..0x000000023fffffff

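The output above lists the memory-type memblock regions. They correspond to the usable RAM ranges from the beginning of this section. The only differences are at page boundaries in the first range, which starts at 0x1000 and ends at 0x9dfff. These regions are added by the following function: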

void __init e820__memblock_setup(void)
{
	int i;
	u64 end;

	for (i = 0; i < e820_table->nr_entries; i++) {
		struct e820_entry *entry = &e820_table->entries[i];

		end = entry->addr + entry->size;
		if (end != (resource_size_t)end)
			continue;

		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
			continue;

		memblock_add(entry->addr, entry->size);
	}
}

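memblock_reserve calls, on the other hand, are scattered throughout the code. The reserved list below can be compared with the /proc/iomem output above: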

[root@localhost a]# cat /sys/kernel/debug/memblock/reserved 
   0: 0x0000000000000000..0x000000000000ffff
   1: 0x0000000000098000..0x000000000009dfff
   2: 0x000000000009ec00..0x00000000000fffff
   3: 0x0000000001000000..0x0000000001ff5fff
   4: 0x0000000027000000..0x00000000310fffff
   5: 0x000000003146f000..0x0000000034a2ffff
   6: 0x00000000bbed0000..0x00000000bfecffff
   7: 0x00000000bfff8000..0x00000000bfffffff
   8: 0x0000000235600000..0x0000000235620fff
   9: 0x0000000235640000..0x0000000235660fff
  10: 0x0000000235680000..0x00000002356a0fff
  11: 0x00000002356c0000..0x00000002356e0fff
  12: 0x0000000235700000..0x0000000235720fff
  13: 0x0000000235740000..0x0000000235760fff
  14: 0x0000000235780000..0x00000002357a0fff
  15: 0x00000002357c0000..0x00000002357e0fff
  16: 0x0000000235800000..0x0000000235820fff
  17: 0x0000000235840000..0x0000000235860fff
  18: 0x0000000235880000..0x00000002358a0fff
  19: 0x00000002358c0000..0x00000002358e0fff
  20: 0x0000000235900000..0x0000000235920fff
  21: 0x0000000235940000..0x0000000235960fff
  22: 0x0000000235980000..0x00000002359a0fff
  23: 0x00000002359c0000..0x00000002359e0fff
  24: 0x0000000235a00000..0x0000000235a20fff
  25: 0x0000000235a40000..0x0000000235a60fff
  26: 0x0000000235a80000..0x0000000235aa0fff
  27: 0x0000000235ac0000..0x0000000235ae0fff
  28: 0x0000000235b00000..0x0000000235b20fff
  29: 0x0000000235b40000..0x0000000235b60fff
  30: 0x0000000235b80000..0x0000000235ba0fff
  31: 0x0000000235bc0000..0x0000000235be0fff
  32: 0x0000000235c00000..0x0000000235c20fff
  33: 0x0000000235c40000..0x0000000235c60fff
  34: 0x0000000235c80000..0x0000000235ca0fff
  35: 0x0000000235cc0000..0x0000000235ce0fff
  36: 0x0000000235d00000..0x0000000235d20fff
  37: 0x0000000235d40000..0x0000000235d60fff
  38: 0x0000000235d80000..0x0000000235da0fff
  39: 0x0000000235dc0000..0x0000000235de0fff
  40: 0x0000000235e00000..0x0000000235e20fff
  41: 0x0000000235e40000..0x0000000235e60fff
  42: 0x0000000235e80000..0x0000000235ea0fff
  43: 0x0000000235ec0000..0x0000000235ee0fff
  44: 0x0000000235f00000..0x0000000235f20fff
  45: 0x0000000235f40000..0x0000000235f60fff
  46: 0x0000000235f80000..0x0000000235fa0fff
  47: 0x0000000235fc0000..0x0000000235fe0fff
  48: 0x0000000236000000..0x0000000236020fff
  49: 0x0000000236040000..0x0000000236060fff
  50: 0x0000000236080000..0x00000002360a0fff
  51: 0x00000002360c0000..0x00000002360e0fff
  52: 0x0000000236100000..0x0000000236120fff
  53: 0x0000000236140000..0x0000000236160fff
  54: 0x0000000236180000..0x00000002361a0fff
  55: 0x00000002361c0000..0x00000002361e0fff
  56: 0x0000000236200000..0x0000000236220fff
  57: 0x0000000236240000..0x0000000236260fff
  58: 0x0000000236280000..0x00000002362a0fff
  59: 0x00000002362c0000..0x00000002362e0fff
  60: 0x0000000236300000..0x0000000236320fff
  61: 0x0000000236340000..0x0000000236360fff
  62: 0x0000000236380000..0x00000002363a0fff
  63: 0x00000002363c0000..0x00000002363e0fff
  64: 0x0000000236400000..0x0000000236420fff
  65: 0x0000000236440000..0x0000000236460fff
  66: 0x0000000236480000..0x00000002364a0fff
  67: 0x00000002364c0000..0x00000002364e0fff
  68: 0x0000000236500000..0x0000000236520fff
  69: 0x0000000236540000..0x0000000236560fff
  70: 0x0000000236580000..0x00000002365a0fff
  71: 0x00000002365c0000..0x00000002365e0fff
  72: 0x0000000236600000..0x0000000236620fff
  73: 0x0000000236640000..0x0000000236660fff
  74: 0x0000000236680000..0x00000002366a0fff
  75: 0x00000002366c0000..0x00000002366e0fff
  76: 0x0000000236700000..0x0000000236720fff
  77: 0x0000000236740000..0x0000000236760fff
  78: 0x0000000236780000..0x00000002367a0fff
  79: 0x00000002367c0000..0x00000002367e0fff
  80: 0x0000000236800000..0x0000000236820fff
  81: 0x0000000236840000..0x0000000236860fff
  82: 0x0000000236880000..0x00000002368a0fff
  83: 0x00000002368c0000..0x00000002368e0fff
  84: 0x0000000236900000..0x0000000236920fff
  85: 0x0000000236940000..0x0000000236960fff
  86: 0x0000000236980000..0x00000002369a0fff
  87: 0x00000002369c0000..0x00000002369e0fff
  88: 0x0000000236a00000..0x0000000236a20fff
  89: 0x0000000236a40000..0x0000000236a60fff
  90: 0x0000000236a80000..0x0000000236aa0fff
  91: 0x0000000236ac0000..0x0000000236ae0fff
  92: 0x0000000236b00000..0x0000000236b20fff
  93: 0x0000000236b40000..0x0000000236b60fff
  94: 0x0000000236b80000..0x0000000236ba0fff
  95: 0x0000000236bc0000..0x0000000236be0fff
  96: 0x0000000236c00000..0x0000000236c20fff
  97: 0x0000000236c40000..0x0000000236c60fff
  98: 0x0000000236c80000..0x0000000236ca0fff
  99: 0x0000000236cc0000..0x0000000236ce0fff
 100: 0x0000000236d00000..0x0000000236d20fff
 101: 0x0000000236d40000..0x0000000236d60fff
 102: 0x0000000236d80000..0x0000000236da0fff
 103: 0x0000000236dc0000..0x0000000236de0fff
 104: 0x0000000236e00000..0x0000000236e20fff
 105: 0x0000000236e40000..0x0000000236e60fff
 106: 0x0000000236e80000..0x0000000236ea0fff
 107: 0x0000000236ec0000..0x0000000236ee0fff
 108: 0x0000000236f00000..0x0000000236f20fff
 109: 0x0000000236f40000..0x0000000236f60fff
 110: 0x0000000236f80000..0x0000000236fa0fff
 111: 0x0000000236fc0000..0x0000000236fe0fff
 112: 0x0000000237000000..0x0000000237020fff
 113: 0x0000000237040000..0x0000000237060fff
 114: 0x0000000237080000..0x00000002370a0fff
 115: 0x00000002370c0000..0x00000002370e0fff
 116: 0x0000000237100000..0x0000000237120fff
 117: 0x0000000237140000..0x0000000237160fff
 118: 0x0000000237180000..0x00000002371a0fff
 119: 0x00000002371c0000..0x00000002371e0fff
 120: 0x0000000237200000..0x0000000237220fff
 121: 0x0000000237240000..0x0000000237260fff
 122: 0x0000000237280000..0x00000002372a0fff
 123: 0x00000002372c0000..0x00000002372e0fff
 124: 0x0000000237300000..0x0000000237320fff
 125: 0x0000000237340000..0x0000000237360fff
 126: 0x0000000237380000..0x00000002373a0fff
 127: 0x00000002373c0000..0x00000002373e0fff
 128: 0x0000000237400000..0x0000000237420fff
 129: 0x0000000237440000..0x0000000237460fff
 130: 0x0000000237480000..0x00000002374a0fff
 131: 0x00000002374c0000..0x00000002374e0fff
 132: 0x0000000237500000..0x0000000237520fff
 133: 0x0000000237540000..0x0000000237560fff
 134: 0x0000000237580000..0x00000002375a0fff
 135: 0x00000002375c0000..0x00000002375e0fff
 136: 0x0000000237600000..0x000000023f5fffff
 137: 0x000000023f7d3000..0x000000023f7d4fff
 138: 0x000000023fbd5800..0x000000023fbd5fff
 139: 0x000000023ff1f000..0x000000023ff88fff
 140: 0x000000023ff89300..0x000000023ff8935f
 141: 0x000000023ff89380..0x000000023ff893df
 142: 0x000000023ff89400..0x000000023ff8951f
 143: 0x000000023ff89540..0x000000023ff89b47
 144: 0x000000023ff89b80..0x000000023ff89b87
 145: 0x000000023ff8a640..0x000000023ff8aa93
 146: 0x000000023ff8aac0..0x000000023ff8aad3
 147: 0x000000023ff8ab00..0x000000023ff8bc53
 148: 0x000000023ff8bc80..0x000000023ff8bd13
 149: 0x000000023ff8bd40..0x000000023ff8bd5f
 150: 0x000000023ff8bd80..0x000000023ff8bd9f
 151: 0x000000023ff8bdc0..0x000000023ff8bddf
 152: 0x000000023ff8be00..0x000000023ff8be67
 153: 0x000000023ff8be80..0x000000023ff8bee7
 154: 0x000000023ff8bf00..0x000000023ff8bf67
 155: 0x000000023ff8bf80..0x000000023ff8bfe7
 156: 0x000000023ff8c000..0x000000023ff8d067
 157: 0x000000023ff8d080..0x000000023ff8d0e7
 158: 0x000000023ff8d100..0x000000023ff8d167
 159: 0x000000023ff8d180..0x000000023ff8d1e7
 160: 0x000000023ff8d200..0x000000023ff8d267
 161: 0x000000023ff8d280..0x000000023ff8d2e7
 162: 0x000000023ff8d300..0x000000023ff8d367
 163: 0x000000023ff8d380..0x000000023ff8d3e7
 164: 0x000000023ff8d400..0x000000023ff8d6d7
 165: 0x000000023ff8d700..0x000000023ff8d742
 166: 0x000000023ff8d780..0x000000023ff8d7c0
 167: 0x000000023ff8d800..0x000000023fffdfff
 168: 0x000000023fffe000..0x000000023fffffff

2.2 Kernel Image Layout

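Before continuing with the initialization flow, it is worth looking at where key pieces such as the kernel image sit in physical memory. This was already annotated above: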

00100000-bfecffff : System RAM          
  01000000-016b93ee : Kernel code      /* kernel image */
  016b93ef-01b281bf : Kernel data
  01cea000-01fe7fff : Kernel bss
  27000000-310fffff : Crash kernel

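Let us also look at the virtual address layout of the kernel image. Note that at this stage, the part of the kernel page table that covers the kernel image has already been built. Here we only analyze how the two correspond, along with the linear mapping that is about to be built.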

[arch/x86/kernel/vmlinux.lds.S]

. = __START_KERNEL;

.text :  AT(ADDR(.text) - LOAD_OFFSET) {
        _text = .;
        _stext = .;
        /* bootstrapping code */
        HEAD_TEXT
        TEXT_TEXT 

        ...

        /* End of text section */
        _etext = .;
    } :text = 0x9090

The linker script assigns virtual addresses to the key sections of the kernel image. First, consider the image's starting point, __START_KERNEL:

[arch/x86/include/asm/page_types.h]

#define __PHYSICAL_START	ALIGN(CONFIG_PHYSICAL_START, \
				      CONFIG_PHYSICAL_ALIGN)
#define __START_KERNEL		(__START_KERNEL_map + __PHYSICAL_START)

[arch/x86/include/asm/page_64_types.h]

/*
 * Set __PAGE_OFFSET to the most negative possible address +
 * PGDIR_SIZE*16 (pgd slot 272).  The gap is to allow a space for a
 * hypervisor to fit.  Choosing 16 slots here is arbitrary, but it's
 * what Xen requires.
 */
#define __PAGE_OFFSET_BASE_L5	_AC(0xff10000000000000, UL)
#define __PAGE_OFFSET_BASE_L4	_AC(0xffff880000000000, UL)

#ifdef CONFIG_DYNAMIC_MEMORY_LAYOUT
#define __PAGE_OFFSET           page_offset_base
#else
#define __PAGE_OFFSET           __PAGE_OFFSET_BASE_L4
#endif /* CONFIG_DYNAMIC_MEMORY_LAYOUT */

#define __START_KERNEL_map	_AC(0xffffffff80000000, UL)

So, ignoring __PHYSICAL_START, the kernel image would start at __START_KERNEL_map, 0xffffffff80000000. In practice, there is always an offset from this address:

cat /usr/src/kernels/3.10.0-693.21.1.el7.x86_64/.config | grep CONFIG_PHYSICAL_START
CONFIG_PHYSICAL_START=0x1000000

cat /usr/src/kernels/3.10.0-693.21.1.el7.x86_64/.config | grep CONFIG_PHYSICAL_ALIGN
CONFIG_PHYSICAL_ALIGN=0x200000

Therefore, in this test environment, the kernel starts at 0xffffffff81000000. The relevant symbols can be looked up in System.map, located at /usr/src/kernels/3.10.0-693.21.1.el7.x86_64/System.map:

ffffffff81000000 T _text
ffffffff816c72ff T _etext

ffffffff81b3b4c0 D _edata

ffffffff81a1e518 D _brk_end

ffffffff81d04000 B __bss_start
ffffffff82000000 B __bss_stop

ffffffff82429000 B _end

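The address we derived matches _text. Note also that the kernel's physical start address is 0x1000000. So the kernel image part of the kernel page table is itself a linear mapping, offset by __START_KERNEL_map. Before the kernel can run at these virtual addresses, page tables that map its image must be in place. Kernel page tables are built in several stages, and the top-level table is given by swapper_pg_dir: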

[mm/init-mm.c]

#define swapper_pg_dir init_top_pgt
struct mm_struct init_mm = {
	.mm_rb		= RB_ROOT,
	.pgd		= swapper_pg_dir,
	.mm_users	= ATOMIC_INIT(2),
	.mm_count	= ATOMIC_INIT(1),
	.mmap_sem	= __RWSEM_INITIALIZER(init_mm.mmap_sem),
	.page_table_lock =  __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock),
	.mmlist		= LIST_HEAD_INIT(init_mm.mmlist),
	.user_ns	= &init_user_ns,
	INIT_MM_CONTEXT(init_mm)
};

From this, the page table layout at this point can be inferred (it is built in arch/x86/kernel/head_64.S):

The corresponding virtual address:

  • (511 << 39) + (510 << 30) = 0xffff80000000. Bit 47 is set, so this sign-extends to 0xffffffff80000000.

With this in mind, let us continue with cleanup_highmap:

/*
 * The head.S code sets up the kernel high mapping:
 *
 *   from __START_KERNEL_map to __START_KERNEL_map + size (== _end-_text)
 *
 * phys_base holds the negative offset to the kernel, which is added
 * to the compile time generated pmds. This results in invalid pmds up
 * to the point where we hit the physaddr 0 mapping.
 *
 * We limit the mappings to the region from _text to _brk_end.  _brk_end
 * is rounded up to the 2MB boundary. This catches the invalid pmds as
 * well, as they are located before _text:
 */
void __init cleanup_highmap(void)
{
	unsigned long vaddr = __START_KERNEL_map;
	unsigned long vaddr_end = __START_KERNEL_map + KERNEL_IMAGE_SIZE;
	unsigned long end = roundup((unsigned long)_brk_end, PMD_SIZE) - 1;
	pmd_t *pmd = level2_kernel_pgt;

	if (max_pfn_mapped)
		vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);

	for (; vaddr + PMD_SIZE - 1 < vaddr_end; pmd++, vaddr += PMD_SIZE) {
		if (pmd_none(*pmd))
			continue;
		if (vaddr < (unsigned long) _text || vaddr > end)
			set_pmd(pmd, __pmd(0));
	}
}
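  • As the comment says, head.S maps the kernel high mapping starting at __START_KERNEL_map (physical 0). This code trims that mapping to the range _text to _brk_end (rounded up to 2MB) by clearing every PMD entry outside it.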

2.3 Virtual Address Space

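With paging enabled, seeing only the kernel image is not enough. The kernel also needs page tables that cover all of physical RAM, so that pages or slab objects allocated later can be accessed without page faults. Device MMIO is not covered by this mapping; it is mapped on demand with ioremap into the vmalloc/ioremap area. Documentation/x86/x86_64/mm.txt gives the virtual address map: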

Virtual memory map with 4 level page tables:

0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
hole caused by [47:63] sign extension
ffff800000000000 - ffff87ffffffffff (=43 bits) guard hole, reserved for hypervisor
ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
ffffc80000000000 - ffffc8ffffffffff (=40 bits) hole
ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space
ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole
ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
... unused hole ...
ffffec0000000000 - fffffbffffffffff (=44 bits) kasan shadow memory (16TB)
... unused hole ...
				    vaddr_end for KASLR
fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping
fffffe8000000000 - fffffeffffffffff (=39 bits) LDT remap for PTI
ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
... unused hole ...
ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
... unused hole ...
ffffffff80000000 - ffffffff9fffffff (=512 MB)  kernel text mapping, from phys 0
ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space
[fixmap start]   - ffffffffff5fffff kernel-internal fixmap range
ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole

As can be seen, the entry "ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0" agrees with our analysis. Here we focus on:

ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory

This is the linear mapping area. Without CONFIG_DYNAMIC_MEMORY_LAYOUT, ffff880000000000 is PAGE_OFFSET. It can be represented by the following figure:

  • (272 << 39) + (0 << 30) = 0x880000000000, which sign-extends to 0xffff880000000000

For a more general view of the virtual address space, see the figure below, taken from [1].

Finally, init_mem_mapping builds the linear mapping:

[arch/x86/mm/init.c]

void __init init_mem_mapping(void)
{
	unsigned long end;

	probe_page_size_mask();

#ifdef CONFIG_X86_64
	end = max_pfn << PAGE_SHIFT;
#else
	end = max_low_pfn << PAGE_SHIFT;
#endif

	init_memory_mapping(0, ISA_END_ADDRESS);

	init_trampoline();

	if (memblock_bottom_up()) {
		unsigned long kernel_end = __pa_symbol(_end);
		memory_map_bottom_up(kernel_end, end);
		memory_map_bottom_up(ISA_END_ADDRESS, kernel_end);
	} else {
		memory_map_top_down(ISA_END_ADDRESS, end);
	}

#ifdef CONFIG_X86_64
	if (max_pfn > max_low_pfn) {
		/* can we preseve max_low_pfn ?*/
		max_low_pfn = max_pfn;
	}
#else
	early_ioremap_page_table_range_init();
#endif

	load_cr3(swapper_pg_dir);
	__flush_tlb_all();

	early_memtest(0, max_pfn_mapped << PAGE_SHIFT);
}
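  • init_memory_mapping creates the page tables for the linear mapping (virtual = physical + PAGE_OFFSET)
  • load_cr3(swapper_pg_dir) switches to the kernel page table
  • __flush_tlb_all flushes the TLB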

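At this point, the layout of physical and virtual addresses, and the mapping between them, can be represented by the following figure: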

3. References

[1] Wolfgang Mauerer, Professional Linux Kernel Architecture (Chinese edition: 深入Linux内核架构)
