Memory – Part 2: Understanding Process memory



From Virtual to Physical

In the previous article, we introduced a way to classify the memory claimed by a process. We used four quadrants built on two axes: private/shared and anonymous/file-backed. We also evoked the complexity of the sharing mechanism and the fact that all memory is ultimately claimed from the kernel.

Everything we talked about was virtual. It was all about the reservation of memory addresses, but a reserved address is not always immediately mapped to physical memory by the kernel. Most of the time, the kernel delays the actual allocation of physical memory until the time of the first access (or the time of the first write in some cases)… and even then, this is done with the granularity of a page (commonly 4KiB). Moreover, some pages may be swapped out after being allocated: they get written to disk in order to make room in RAM for other pages.

As a consequence, knowing the actual amount of physical memory used by a process (known as the resident memory of the process) is really hard… and the sole component of the system that actually knows about it is the kernel (it’s even one of its jobs). Fortunately, the kernel exposes some interfaces that let you retrieve statistics about the system or a specific process. This article dives into the tools provided by the Linux ecosystem to analyze the memory patterns of processes.

On Linux, those data are exposed through the /proc file-system and more specifically by the content of /proc/[pid]/. These directories (one per process) contain some pseudo-files that are API entry points to retrieve information directly from the kernel. The content of the /proc directory is detailed in proc(5) manual page (this content changes from one Linux version to another).

The human front-ends for those API calls are tools such as the procps suite (ps, top, pmap…). These tools display the data retrieved from the kernel with little to no modification. As a consequence, they are good entry points to understand how the kernel classifies the memory. In this article we will analyze the memory-related outputs of top and pmap.

top: process statistics

top is a widely known (and used) tool that allows monitoring the system. It displays one line per process with various columns that may contain CPU related, memory related, or more general information.

When running top, you can switch to the memory view by pressing g followed by 3. In that view, you will find, among others, the following columns: %MEM, VIRT, SWAP, RES, CODE, DATA and SHR. With the exception of SWAP, all these data are extracted from the file /proc/[pid]/statm that exposes some memory-related statistics. That file contains 7 numerical fields: size (mapped to VIRT), resident (mapped to RES), share (mapped to SHR), text (mapped to CODE), lib (always 0 on Linux 2.6+), data (mapped to DATA) and dt (always 0 on Linux 2.6+, mapped to nDrt).
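As a concrete illustration, here is a minimal sketch (not from the original article) that reads /proc/self/statm and prints the fields top relies on. The field order follows proc(5), and since the values are counted in pages they are converted using the page size returned by sysconf:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* /proc/[pid]/statm: size resident share text lib data dt (in pages) */
        unsigned long size, resident, share, text, lib, data, dt;
        long page = sysconf(_SC_PAGESIZE);
        FILE *f = fopen("/proc/self/statm", "r");

        if (!f || fscanf(f, "%lu %lu %lu %lu %lu %lu %lu",
                         &size, &resident, &share, &text, &lib, &data, &dt) != 7) {
            perror("statm");
            return 1;
        }
        fclose(f);

        printf("VIRT: %lu KiB\n", size * page / 1024);     /* size     -> VIRT */
        printf("RES:  %lu KiB\n", resident * page / 1024); /* resident -> RES  */
        printf("SHR:  %lu KiB\n", share * page / 1024);    /* share    -> SHR  */
        printf("CODE: %lu KiB\n", text * page / 1024);     /* text     -> CODE */
        printf("DATA: %lu KiB\n", data * page / 1024);     /* data     -> DATA */
        return 0;
    }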

Trivial columns

As you may have guessed, some of these columns are trivial to understand. VIRT is the total size of the virtual address space that has been reserved by the process so far. CODE is the size of the executable code of the binary executed by the process. RES is the resident set size, that is, the amount of physical memory the kernel considers assigned to the process. As a direct consequence, %MEM is strictly proportional to RES.

The resident set size is computed by the kernel as the sum of two counters. The first one contains the number of anonymous resident pages (MM_ANONPAGES), the second one the number of file-backed resident pages (MM_FILEPAGES). Some pages may be considered as resident for more than one process at once, so the sum of the RES of all processes may be larger than the amount of RAM effectively used, or even larger than the amount of RAM available on the system.

Shared memory

SHR is the amount of resident sharable memory of the process. If you remember well the classification we made in the previous article, you may suppose this includes all the resident memory of the right column. However as already discussed, some private memory may be shared too. So, in order to understand the actual meaning of that column, we must dig a bit deeper in the kernel.

The SHR column is filled with the shared field of /proc/[pid]/statm, which itself is the value of the MM_FILEPAGES counter of the kernel, one of the two components of the resident size. This just means that this column contains the amount of file-backed resident memory (thus including quadrants 3 and 4).

That’s pretty cool… however remember quadrant 2: shared anonymous memory does exist… the previous definition only includes file-backed memory… and running the following test program shows that the shared anonymous memory is taken into account in the SHR column:

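The original listing is not reproduced here; the following is a minimal sketch of such a test, assuming a 50MiB shared anonymous mapping whose pages are all touched in a loop before the program pauses:

    #define _DEFAULT_SOURCE
    #include <stddef.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define SIZE (50UL * 1024 * 1024) /* 50MiB */

    int main(void)
    {
        /* Quadrant 2: shared + anonymous. */
        char *mem = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* Touch every page so the kernel actually allocates physical memory. */
        for (size_t i = 0; i < SIZE; i += 4096) {
            mem[i] = 42;
        }

        /* Leave time to inspect the process with top and pmap. */
        pause();
        return 0;
    }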
top indicates 50m in both the RES and the SHR columns [1]

This is due to a subtlety of the Linux kernel. On Linux, a shared anonymous map is actually file-based. The kernel creates a file in a tmpfs (an instance of /dev/zero). The file is immediately unlinked so it cannot be accessed by any other processes unless they inherited the map (by forking). This is quite clever since the sharing is done through the file layer the same way it’s done for shared file-backed mappings (quadrant 4).

One last point: since private file-backed pages that are modified don’t get synced back to disk, they are not considered file-backed anymore (they are transferred from the MM_FILEPAGES counter to MM_ANONPAGES). As a consequence, they are no longer accounted for in SHR.

Note that the man page of top is wrong since it states that SHR may contain non-resident memory: “the amount of shared memory available to a task, not all of which is typically resident. It simply reflects memory that could be potentially shared with other processes.”

Data

The meaning of the DATA column is quite opaque. The documentation of top states “Data + Stack”… which does not really help since it does not define “Data”. Thus we’ll need to dig once again into the kernel.

That field is computed by the kernel as the difference between two variables: total_vm, which is the same as VIRT, and shared_vm. shared_vm is somewhat similar to SHR in that it shares the definition of shareable memory, but instead of only accounting for the resident part, it contains the sum of all addressed file-backed memory. Moreover, the count is done at the mapping level, not the page level, thus shared_vm does not have the same subtlety as SHR for modified private file-backed memory. As a consequence, shared_vm is the sum of quadrants 2, 3 and 4. This means that the difference between total_vm and shared_vm is exactly the content of quadrant 1.

The DATA column contains the amount of reserved private anonymous memory. By definition, the private anonymous memory is the memory that is specific to the program and that holds its data. It can only be shared by forking, in a copy-on-write fashion. It includes (but is not limited to) the stacks and the heap [2]. This column does not tell us how much memory is actually used by the program; it just tells us that the program has reserved some amount of memory. That memory may be left untouched for a long time.
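A quick way to observe this, sketched below under the assumption of a 1GiB reservation: a private anonymous mapping inflates VIRT and DATA immediately, while RES only grows once pages are actually touched.

    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define SIZE (1UL << 30) /* 1GiB, reserved but never touched */

    int main(void)
    {
        /* Quadrant 1: private + anonymous. Accounted in DATA (and VIRT). */
        void *mem = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* DATA and VIRT now include the whole gibibyte; RES is still tiny. */
        pause();
        return 0;
    }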

A typical example of a meaningless DATA value is what happens when an x86_64 program compiled with Address Sanitizer is launched. ASan works by reserving 16TiB of virtual memory, but only uses 1 byte of those terabytes per 8-byte word of memory actually allocated by the process. As a consequence, top reports a DATA value in the terabytes while RES remains modest.

Note that the man page of top is once again wrong since it states that DATA is “the amount of physical memory devoted to other than executable code, also known as the ‘data resident set’ size or DRS”; and we just saw that DATA has no link at all with resident memory.

Swap

SWAP is somehow different from the other ones. That column is supposed to contain the amount of memory of the process that gets swapped out by the kernel. First of all, the content of that column totally depends on both the version of Linux and the version of top you are running. Prior to Linux 2.6.34, the kernel didn’t expose any per-process statistics about the number of pages that were swapped out. Prior to top 3.3.0, top displayed totally meaningless information here (but that was in accordance with the man page). However, if you use Linux 2.6.34 or later with top 3.3.0 or later, that count is actually the number of pages that were swapped out.

If your top is too old, the SWAP column is filled with the difference between the VIRT and the RES column. This is totally meaningless because that difference effectively contains the amount of memory that has been swapped out, but it also includes the file-backed pages that get unloaded or the pages that are reserved but untouched (and thus have not been actually allocated yet). Some old Linux distributions still have a top with that buggy SWAP value, among them stands the still widely used RHEL5.

If your top is up-to-date but your kernel is too old, the column will always contain 0, which is not really helpful.

If both your kernel and your top are up-to-date, then the column will contain the value of the VmSwap field of the file /proc/[pid]/status. That field is maintained by the kernel as a counter that gets incremented each time a page is swapped out and decremented each time a page gets swapped in. As a consequence it is accurate and provides you with an important piece of information: basically, if that value is non-zero, your system is under memory pressure and the memory of your process cannot fit in RAM.
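For reference, here is a minimal sketch (not from the original article) that extracts the VmSwap line from /proc/self/status; the kernel reports the value in kB:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char line[256];
        FILE *f = fopen("/proc/self/status", "r");

        if (!f) {
            perror("status");
            return 1;
        }
        /* Scan the status file for the "VmSwap:" line (absent before 2.6.34). */
        while (fgets(line, sizeof(line), f)) {
            if (strncmp(line, "VmSwap:", 7) == 0) {
                fputs(line, stdout);
                break;
            }
        }
        fclose(f);
        return 0;
    }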

The man page describes SWAP as “the non-resident portion of a task’s address space”, which is what was implemented prior to top 3.3.0, but has nothing to do with the actual amount of memory that has been swapped out. On earlier versions of top, the man page properly explains what is displayed; however, the SWAP naming is not appropriate.

pmap: detailed mapping

pmap is another kind of tool. It goes deeper than top by displaying information about each separate mapping of the process. A mapping, in that view, is a range of contiguous pages having the same backend (anonymous or file) and the same access modes.

For each mapping, pmap shows the properties just listed, as well as the size of the mapping, the number of resident pages and the number of dirty pages. Dirty pages are pages that have been written to but have not been synced back to the underlying file yet. As a consequence, the number of dirty pages is only meaningful for mappings with write-back, that is, shared file-backed mappings (quadrant 4).

The source of pmap data can be found in two human-readable files: /proc/[pid]/maps and /proc/[pid]/smaps. While the first file is a simple list of mappings, the second one is a more detailed version with a paragraph per mapping. smaps is available since Linux 2.6.14, which is old enough to be present on all popular distributions.

pmap usage is simple:

  • pmap [pid]: displays the content of /proc/[pid]/maps, minus the inode and device columns.
  • pmap -x [pid]: enriches the output with some pieces of information from /proc/[pid]/smaps (RSS and Dirty).
  • since pmap 3.3.4 there are -X and -XX options to display even more data, but these are Linux-specific (moreover, this seems to be a bit buggy with recent kernel versions).

Basic content

The pmap utility finds its inspiration in a similar command on Solaris and mimics its behavior. Here is the output of pmap and the content of /proc/[pid]/maps for the small program given as example for shared anonymous memory testing:
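The original listings are not reproduced here, but as an illustrative sketch (addresses, sizes and inode numbers are placeholders, not real output), the raw content of /proc/[pid]/maps looks roughly like this:

    00400000-00401000 r-xp 00000000 fd:01 1050091            /home/user/a.out
    00600000-00601000 rw-p 00000000 fd:01 1050091            /home/user/a.out
    01c3a000-01c5b000 rw-p 00000000 00:00 0                  [heap]
    7f3c4d000000-7f3c50200000 rw-s 00000000 00:04 43910      /dev/zero (deleted)
    7f3c50200000-7f3c503b0000 r-xp 00000000 fd:01 523105     /lib/x86_64-linux-gnu/libc-2.19.so
    7ffd1a2b0000-7ffd1a2d1000 rw-p 00000000 00:00 0          [stack]
    7ffd1a3fe000-7ffd1a400000 r-xp 00000000 00:00 0          [vdso]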

There are a few interesting points in that output. First of all, pmap‘s choice is to provide the size of the mappings instead of the ranges of addresses and to add the sum of those sizes at the end. This sum is the VIRT size of top: the sum of all the reserved ranges of addresses in the virtual address space.

Each map is associated with a set of modes:

  • r: if set, the map is readable
  • w: if set, the map is writable
  • x: if set, the map contains executable code
  • s: if set, the map is shared (right column in our previous classification). You can notice that pmap only has the s flag, while the kernel exposes two different flags for shared (s) and private (p) memory.
  • R: if set, the map has no swap space reserved (MAP_NORESERVE flag of mmap); this means that we can get a segmentation fault by accessing that memory if it has not already been mapped to physical memory and the system is out of physical memory.

The first three flags can be manipulated using the mprotect(2) system call and can be set directly in the mmap call.
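As a small sketch (an illustration, not taken from the original article), the first three mode bits correspond directly to the PROT_* constants given to mmap(2) and later changed with mprotect(2):

    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        long page = sysconf(_SC_PAGESIZE);

        /* One page, initially mapped readable and writable. */
        void *mem = mmap(NULL, page, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* Drop the write permission: the same mapping is now read-only. */
        if (mprotect(mem, page, PROT_READ) != 0) {
            perror("mprotect");
            return 1;
        }
        return 0;
    }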

The last column is the source of the data. In our example, we can notice that pmap does not keep the kernel-specific details. It has three categories of memory: anon, stack and file-backed (with the path of the file, and the (deleted) tag if the mapped file has been unlinked). In addition to these categories, the kernel has vdso, vsyscall and heap. It’s quite a shame that pmap didn’t keep the heap mark since it’s important for programmers (but that is probably in order to stay compatible with its Solaris counterpart).

Concerning that last column, we also see that executable files and shared libraries are mapped privately (but this was already spoiled by the previous article) and that different parts of the same file are mapped differently (some parts are even mapped more than once). This is because executable files contain different sections: text, data, rodata, bss… each has a different meaning and is mapped differently. We will cover those sections in the next post.

Last (but not least), we can see that our shared anonymous memory is actually implemented as a shared mapping of an unlinked copy of /dev/zero.

Extended content

The output of pmap -x contains two additional columns:

The first one is RSS, it tells us how much the mapping contributes to the resident set size (and ultimately provides a sum that is the total memory consumption of the process). As we can see some mappings are only partially mapped in physical memory. The biggest one (our manual mmap) is totally allocated because we touched every single page.

The second new column is Dirty and contains the number of pages that differ from their source. For shared file-backed mappings, dirty pages can be written back to the underlying file if the kernel feels it has to make some room in RAM or that there are too many dirty pages; in that case the page is marked clean again. For all the remaining quadrants, since the backend is either anonymous (so there is no disk-based backend) or private (so changes are not visible to other processes), unloading the dirty pages requires writing them to the swap file [3].

This is only a subset of what the kernel actually exposes. A lot more information is present in the smaps file (which makes it a bit too verbose to be readable), as you can see in the following snippet:
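The original snippet is not reproduced here; as an illustration of the smaps format, a single-mapping entry looks roughly like this (field names as documented in proc(5), values are placeholders loosely matching the 50MiB example above):

    7f3c4d000000-7f3c50200000 rw-s 00000000 00:04 43910      /dev/zero (deleted)
    Size:              51200 kB
    Rss:               51200 kB
    Pss:               51200 kB
    Shared_Clean:          0 kB
    Shared_Dirty:          0 kB
    Private_Clean:         0 kB
    Private_Dirty:     51200 kB
    Referenced:        51200 kB
    Anonymous:             0 kB
    Swap:                  0 kB
    KernelPageSize:        4 kB
    MMUPageSize:           4 kB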

Adding pmap‘s output to a bug report is often a good idea.

More?

As you can see, understanding the output of top and other tools requires some knowledge of the operating system you are running. Even if top is available on various systems, each version is specific to the system it runs on. For example, on OS X, you will not find the RES, DATA, SHR… columns, but instead some named RPRVT, RSHRD, RSIZE, VPRVT, VSIZE (note that those names are somehow a bit less opaque than the Linux ones). If you want to dive a bit deeper into Linux memory management, you can read the mm/ directory of the source tree, or go through Understanding the Linux Kernel.

Because this post is quite long, here is a short summary:

  • top‘s man page cannot be trusted.
  • top‘s RES gives you the actual physical RAM used by your process.
  • top‘s VIRT gives you the actual virtual memory reserved by your process.
  • top‘s DATA gives you the amount of virtual private anonymous memory reserved by your process. That memory may or may not be mapped to physical RAM. It corresponds to the amount of memory intended to store process specific data (not shared).
  • top‘s SHR gives you the subset of resident memory that is file-backed (including shared anonymous memory). It represents the amount of resident memory that may be used by other processes.
  • top‘s SWAP column can only be trusted with recent versions of top (3.3.0) and Linux (2.6.34) and is meaningless in all other cases.
  • if you want details about the memory usage of your process, use pmap.

Memory Types

If you are a fan of htop, note that its content is exactly the same as top’s. Its man page is also wrong, or at least sometimes unclear.

Next: Allocating memory

In these two articles we have covered subjects that are (we do hope) of interest to both developers and system administrators. In the following articles we will start going deeper into how memory is managed by the developer.


  1. You can try running the same program with no loop (or with a shorter loop) and see how the kernel delays the loading of the pages in physical memory. 

  2. But we will see later that it only partially contains the data segment of the loaded executables 

  3. Clean private file-backed pages can also be unloaded, therefore the next time they get loaded their content may change depending on whether some writes occurred in the file. This is documented in the mmap man page through the sentence: “It is unspecified whether changes made to the file after the mmap() call are visible in the mapped region.”
