Getting the e820 table
The boot log contains the e820 information. It is printed by e820_print_map(), which is called from setup_memory_map().
dmesg | grep e820
This produces:
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009d7ff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009d800-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000ba5b1fff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000ba5b2000-0x00000000ba5b8fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000ba5b9000-0x00000000bad8dfff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000bad8e000-0x00000000bafb5fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000bafb6000-0x00000000ca8a1fff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000ca8a2000-0x00000000ca939fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000ca93a000-0x00000000ca977fff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000ca978000-0x00000000caa3efff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000caa3f000-0x00000000caffefff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000cafff000-0x00000000caffffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000cb800000-0x00000000cf9fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000f8000000-0x00000000fbffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed03fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000022f5fffff] usable
Note that only the ranges marked usable are available to the kernel as RAM.
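To make the "only usable ranges count" point concrete, the following is a minimal sketch that parses BIOS-e820 lines and sums the usable bytes. The helper names (parse_e820, usable_bytes) and the regular expression are illustrative assumptions, not part of any kernel tooling; the sample lines are the first three entries of the log above.

```python
import re

# Matches lines such as:
#   [ 0.000000] BIOS-e820: [mem 0x...-0x...] usable
# The pattern is an assumption based on the log format shown above.
E820_LINE = re.compile(
    r"BIOS-e820: \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\] (.+)$"
)


def parse_e820(log_text):
    """Return a list of (start, end, type) tuples; end is inclusive."""
    entries = []
    for line in log_text.splitlines():
        m = E820_LINE.search(line.strip())
        if m:
            entries.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return entries


def usable_bytes(entries):
    """Sum the sizes of all ranges whose type is 'usable'."""
    return sum(end - start + 1 for start, end, kind in entries if kind == "usable")


SAMPLE = """\
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009d7ff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009d800-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
"""

if __name__ == "__main__":
    entries = parse_e820(SAMPLE)
    print(f"{len(entries)} entries, {usable_bytes(entries):#x} usable bytes")
```

On a live system the same functions could be fed the text of "dmesg | grep e820" instead of SAMPLE.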
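Getting memblock information
memblock information is not printed by default, although a few memblock-related messages may show up in the log anyway. To see the complete memblock state, enable memblock_debug by adding "memblock=debug" to the kernel command line.
dmesg | grep -A 15 "MEMBLOCK configuration"
Note that on x86 the configuration is dumped twice.
The first dump comes from e820__memblock_setup().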
[ 0.000000] MEMBLOCK configuration:
[ 0.000000] memory size = 0x00000001f9c4e800 reserved size = 0x0000000014aab27c
[ 0.000000] memory.cnt = 0x7
[ 0.000000] memory[0x0] [0x0000000000001000-0x000000000009cfff], 0x000000000009c000 bytes flags: 0x0
[ 0.000000] memory[0x1] [0x0000000000100000-0x00000000ba5b1fff], 0x00000000ba4b2000 bytes flags: 0x0
[ 0.000000] memory[0x2] [0x00000000ba5b9000-0x00000000bad8dfff], 0x00000000007d5000 bytes flags: 0x0
[ 0.000000] memory[0x3] [0x00000000bafb6000-0x00000000ca8a1fff], 0x000000000f8ec000 bytes flags: 0x0
[ 0.000000] memory[0x4] [0x00000000ca93a000-0x00000000ca977fff], 0x000000000003e000 bytes flags: 0x0
[ 0.000000] memory[0x5] [0x00000000cafff000-0x00000000caffffff], 0x0000000000001000 bytes flags: 0x0
[ 0.000000] memory[0x6] [0x0000000100000000-0x000000022f5fffff], 0x000000012f600000 bytes flags: 0x0
[ 0.000000] reserved.cnt = 0x4
[ 0.000000] reserved[0x0] [0x00000000000fd450-0x00000000000fd6bb], 0x000000000000026c bytes flags: 0x0
[ 0.000000] reserved[0x1] [0x00000000000fd740-0x00000000000fd74f], 0x0000000000000010 bytes flags: 0x0
[ 0.000000] reserved[0x2] [0x0000000010f18000-0x0000000024783fff], 0x000000001386c000 bytes flags: 0x0
[ 0.000000] reserved[0x3] [0x000000002f000000-0x000000003023efff], 0x000000000123f000 bytes flags: 0x0
[ 0.000000] memblock_reserve: [0x000000000009d800-0x00000000000fffff] reserve_bios_regions+0x56/0x58
The second dump comes from numa_register_memblks(). The most important difference is that the memblock regions now carry NUMA node information.
[ 0.000000] MEMBLOCK configuration:
[ 0.000000] memory size = 0x00000001f9c4e800 reserved size = 0x0000000014b29800
[ 0.000000] memory.cnt = 0x7
[ 0.000000] memory[0x0] [0x0000000000001000-0x000000000009cfff], 0x000000000009c000 bytes on node 0 flags: 0x0
[ 0.000000] memory[0x1] [0x0000000000100000-0x00000000ba5b1fff], 0x00000000ba4b2000 bytes on node 0 flags: 0x0
[ 0.000000] memory[0x2] [0x00000000ba5b9000-0x00000000bad8dfff], 0x00000000007d5000 bytes on node 0 flags: 0x0
[ 0.000000] memory[0x3] [0x00000000bafb6000-0x00000000ca8a1fff], 0x000000000f8ec000 bytes on node 0 flags: 0x0
[ 0.000000] memory[0x4] [0x00000000ca93a000-0x00000000ca977fff], 0x000000000003e000 bytes on node 0 flags: 0x0
[ 0.000000] memory[0x5] [0x00000000cafff000-0x00000000caffffff], 0x0000000000001000 bytes on node 0 flags: 0x0
[ 0.000000] memory[0x6] [0x0000000100000000-0x000000022f5fffff], 0x000000012f600000 bytes on node 0 flags: 0x0
[ 0.000000] reserved.cnt = 0x7
[ 0.000000] reserved[0x0] [0x0000000000000000-0x000000000000ffff], 0x0000000000010000 bytes on node 0 flags: 0x0
[ 0.000000] reserved[0x1] [0x0000000000097000-0x000000000009cfff], 0x0000000000006000 bytes on node 0 flags: 0x0
[ 0.000000] reserved[0x2] [0x000000000009d800-0x00000000000fffff], 0x0000000000062800 bytes on node 0 flags: 0x0
[ 0.000000] reserved[0x3] [0x0000000010f18000-0x0000000024783fff], 0x000000001386c000 bytes on node 0 flags: 0x0
[ 0.000000] reserved[0x4] [0x000000002f000000-0x000000003023efff], 0x000000000123f000 bytes on node 0 flags: 0x0
...
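As a sanity check on how to read these dumps, the sketch below sums the reserved regions of the first dump and compares the result with the "reserved size" field printed on the summary line. The function name check_dump and the regular expressions are assumptions made for illustration; the sample text is copied from the first dump above.

```python
import re

# Patterns are assumptions derived from the dump format shown above.
REGION = re.compile(
    r"(memory|reserved)\[0x[0-9a-f]+\]\s+"
    r"\[(0x[0-9a-f]+)-(0x[0-9a-f]+)\],\s+(0x[0-9a-f]+) bytes"
)
TOTALS = re.compile(r"memory size = (0x[0-9a-f]+)\s+reserved size = (0x[0-9a-f]+)")


def check_dump(dump_text):
    """Return (reported_reserved, summed_reserved) for one MEMBLOCK dump."""
    reported = None
    summed = 0
    for line in dump_text.splitlines():
        totals = TOTALS.search(line)
        if totals:
            reported = int(totals.group(2), 16)
            continue
        region = REGION.search(line)
        if not region:
            continue
        start, end, size = (int(region.group(i), 16) for i in (2, 3, 4))
        # Each region line should be self-consistent: the end is inclusive.
        if size != end - start + 1:
            raise ValueError(f"inconsistent region line: {line!r}")
        if region.group(1) == "reserved":
            summed += size
    return reported, summed


SAMPLE = """\
[ 0.000000] memory size = 0x00000001f9c4e800 reserved size = 0x0000000014aab27c
[ 0.000000] reserved.cnt = 0x4
[ 0.000000] reserved[0x0] [0x00000000000fd450-0x00000000000fd6bb], 0x000000000000026c bytes flags: 0x0
[ 0.000000] reserved[0x1] [0x00000000000fd740-0x00000000000fd74f], 0x0000000000000010 bytes flags: 0x0
[ 0.000000] reserved[0x2] [0x0000000010f18000-0x0000000024783fff], 0x000000001386c000 bytes flags: 0x0
[ 0.000000] reserved[0x3] [0x000000002f000000-0x000000003023efff], 0x000000000123f000 bytes flags: 0x0
"""

if __name__ == "__main__":
    reported, summed = check_dump(SAMPLE)
    print(f"reported {reported:#x}, summed {summed:#x}")
```

The second dump is truncated in this note, so the same check cannot be repeated for it here.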
Getting the memory range of each zone
The boot log shows how the zones are laid out on the system. This information is printed in free_area_init_nodes().
dmesg | grep -A 4 "Zone ranges:"
The result:
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000000001000-0x0000000000ffffff]
[ 0.000000] DMA32 [mem 0x0000000001000000-0x00000000ffffffff]
[ 0.000000] Normal [mem 0x0000000100000000-0x000000022f5fffff]
[ 0.000000] Movable zone start for each node
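Getting the memory distribution across nodes
The boot log also shows which memory ranges belong to each node. This information is printed in free_area_init_nodes() as well.
dmesg | grep -A 8 "node ranges"
The result: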
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009cfff]
[ 0.000000] node 0: [mem 0x0000000000100000-0x00000000ba5b1fff]
[ 0.000000] node 0: [mem 0x00000000ba5b9000-0x00000000bad8dfff]
[ 0.000000] node 0: [mem 0x00000000bafb6000-0x00000000ca8a1fff]
[ 0.000000] node 0: [mem 0x00000000ca93a000-0x00000000ca977fff]
[ 0.000000] node 0: [mem 0x00000000cafff000-0x00000000caffffff]
[ 0.000000] node 0: [mem 0x0000000100000000-0x000000022f5fffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x000000022f5fffff]
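Notice that the zone ranges and the node ranges do not match.
This is because the zone information only describes boundaries, while the node information lists the physical memory that is actually usable.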
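The difference between a zone's boundaries and the memory actually present inside it can be worked out by intersecting the two lists. The sketch below does this for the values printed above; the names ZONES, NODE0_RANGES and bytes_in_zone are illustrative only, and this is not how the kernel itself computes its per-zone page counts.

```python
# Zone boundaries and node 0 memory ranges, copied from the logs above.
# All ranges are (start, end) with an inclusive end address.
ZONES = {
    "DMA": (0x1000, 0xFFFFFF),
    "DMA32": (0x1000000, 0xFFFFFFFF),
    "Normal": (0x100000000, 0x22F5FFFFF),
}
NODE0_RANGES = [
    (0x1000, 0x9CFFF),
    (0x100000, 0xBA5B1FFF),
    (0xBA5B9000, 0xBAD8DFFF),
    (0xBAFB6000, 0xCA8A1FFF),
    (0xCA93A000, 0xCA977FFF),
    (0xCAFFF000, 0xCAFFFFFF),
    (0x100000000, 0x22F5FFFFF),
]


def bytes_in_zone(zone, ranges):
    """Return how many bytes of the given ranges fall inside the zone."""
    zone_start, zone_end = zone
    total = 0
    for start, end in ranges:
        lo, hi = max(start, zone_start), min(end, zone_end)
        if lo <= hi:  # the range overlaps the zone
            total += hi - lo + 1
    return total


if __name__ == "__main__":
    for name, zone in ZONES.items():
        span = zone[1] - zone[0] + 1
        present = bytes_in_zone(zone, NODE0_RANGES)
        print(f"{name}: spans {span:#x} bytes, {present:#x} bytes present")
```

For the DMA zone, for example, the boundary spans 0xfff000 bytes but only 0xf9c000 bytes are backed by memory, because of the hole between 0x9d000 and 0xfffff.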
Getting the zonelist order
The kernel currently does not seem to print this anywhere, so I wrote a small patch:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 560eafe8234d..3eb3a00a0dd2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5050,6 +5050,28 @@ static void set_zonelist_order(void)
 		current_zonelist_order = user_zonelist_order;
 }
+static void dump_zonelist(pg_data_t *pgdat)
+{
+	int i;
+	struct zonelist *zonelist;
+
+	pr_info("FALLBACK ZONELIST of node[%d]\n", pgdat->node_id);
+
+	zonelist = &pgdat->node_zonelists[ZONELIST_FALLBACK];
+	for (i = 0; zonelist->_zonerefs[i].zone != NULL; i++) {
+		struct zone *z = zonelist->_zonerefs[i].zone;
+		pr_info("Node[%d]: %s\n", z->zone_pgdat->node_id, z->name);
+	}
+
+	pr_info("NOFALLBACK ZONELIST of node[%d]\n", pgdat->node_id);
+
+	zonelist = &pgdat->node_zonelists[ZONELIST_NOFALLBACK];
+	for (i = 0; zonelist->_zonerefs[i].zone != NULL; i++) {
+		struct zone *z = zonelist->_zonerefs[i].zone;
+		pr_info("Node[%d]: %s\n", z->zone_pgdat->node_id, z->name);
+	}
+}
+
 static void build_zonelists(pg_data_t *pgdat)
 {
 	int i, node, load;
@@ -5202,12 +5224,14 @@ static int __build_all_zonelists(void *data)
 	if (self && !node_online(self->node_id)) {
 		build_zonelists(self);
+		dump_zonelist(self);
 	}
 	for_each_online_node(nid) {
 		pg_data_t *pgdat = NODE_DATA(nid);
 		build_zonelists(pgdat);
+		dump_zonelist(pgdat);
 	}
 	/*
After rebuilding and installing the kernel, you will see:
[ 0.000000] FALLBACK ZONELIST of node[0]
[ 0.000000] Node[0]: DMA32
[ 0.000000] Node[0]: DMA
[ 0.000000] Node[1]: Normal
[ 0.000000] Node[1]: DMA32
[ 0.000000] NOFALLBACK ZONELIST of node[0]
[ 0.000000] Node[0]: DMA32
[ 0.000000] Node[0]: DMA
[ 0.000000] Node[1]: Normal
[ 0.000000] Node[1]: DMA32
[ 0.000000] FALLBACK ZONELIST of node[1]
[ 0.000000] Node[1]: Normal
[ 0.000000] Node[1]: DMA32
[ 0.000000] Node[0]: DMA32
[ 0.000000] Node[0]: DMA
[ 0.000000] NOFALLBACK ZONELIST of node[1]
[ 0.000000] Node[1]: Normal
[ 0.000000] Node[1]: DMA32
[ 0.000000] Node[0]: DMA32
[ 0.000000] Node[0]: DMA
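This system has two nodes, so a zonelist is printed for each of them. In this output the NOFALLBACK section is identical to the FALLBACK section, apparently because it was captured with both loops reading node_zonelists[ZONELIST_FALLBACK]; a node's ZONELIST_NOFALLBACK list should contain only that node's own zones.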
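The FALLBACK ordering in this output follows a simple pattern: the local node's zones from highest to lowest, then the other node's zones. The toy model below reproduces that pattern. It is only a sketch: the function name fallback_order and the distance table are hypothetical, and the real kernel picks the next node with additional heuristics that are not modeled here.

```python
def fallback_order(local, zones_by_node, distance):
    """Return a node-ordered fallback list as (node, zone) pairs.

    zones_by_node maps a node id to its populated zones, lowest zone first.
    distance[a][b] is a NUMA distance; closer nodes are tried earlier.
    """
    nodes = sorted(zones_by_node, key=lambda n: (distance[local][n], n))
    order = []
    for node in nodes:
        # Within a node, the highest zone is tried first.
        for zone in reversed(zones_by_node[node]):
            order.append((node, zone))
    return order


# Populated zones per node, read off the output above.
ZONES_BY_NODE = {0: ["DMA", "DMA32"], 1: ["DMA32", "Normal"]}
# Hypothetical distance table; the real values are not shown in this note.
DISTANCE = {0: {0: 10, 1: 20}, 1: {0: 20, 1: 10}}

if __name__ == "__main__":
    for nid in sorted(ZONES_BY_NODE):
        print(f"FALLBACK ZONELIST of node[{nid}]")
        for node, zone in fallback_order(nid, ZONES_BY_NODE, DISTANCE):
            print(f"Node[{node}]: {zone}")
```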