The role of the Linux page cache is to speed up access to files on disk. Memory mapped files are read a page at a time and these pages are stored in the page cache. Figure 3.6 shows that the page cache consists of the page_hash_table, a vector of pointers to mem_map_t data structures. (See include/linux/pagemap.h) Each file in Linux is identified by a VFS inode data structure (described in the file system chapter) and each VFS inode is unique and fully describes one and only one file. The index into the page hash table is derived from the file's VFS inode and the offset into the file.
Whenever a page is read from a memory mapped file, for example when it needs to be brought back into memory during demand paging, the page is read through the page cache. If the page is present in the cache, a pointer to the mem_map_t data structure representing it is returned to the page fault handling code. Otherwise the page must be brought into memory from the file system that holds the image: Linux allocates a physical page and reads the page from the file on disk. If possible, Linux will also initiate a read of the next page in the file. This single page read ahead means that if the process is accessing the pages in the file serially, the next page will already be waiting in memory.
Over time the page cache grows as images are read and executed. Pages are removed from the cache when they are no longer needed, for example when an image is no longer being used by any process. As Linux uses memory it can start to run low on physical pages. In this case Linux will reduce the size of the page cache.
3.8 Swapping Out and Discarding Pages
When physical memory becomes scarce the Linux memory management subsystem must attempt to free physical pages. This task falls to the kernel swap daemon (kswapd). (See kswapd() in mm/vmscan.c) The kernel swap daemon is a special type of process, a kernel thread.
Kernel threads are processes that have no virtual memory; instead they run in kernel mode in the physical address space. The kernel swap daemon is slightly misnamed in that it does more than merely swap pages out to the system's swap files. Its role is to make sure that there are enough free pages in the system to keep the memory management system operating efficiently.
The kernel swap daemon (kswapd) is started by the kernel init process at startup time and sits waiting for the kernel swap timer to periodically expire. Every time the timer expires, the swap daemon checks whether the number of free pages in the system is getting too low. It uses two variables, free_pages_high and free_pages_low, to decide if it should free some pages. So long as the number of free pages in the system remains above free_pages_high, the kernel swap daemon does nothing; it sleeps again until its timer next expires. For the purposes of this check the kernel swap daemon takes into account the number of pages currently being written out to the swap file. It keeps a count of these in nr_async_pages; this is incremented each time a page is queued waiting to be written out to the swap file and decremented when the write to the swap device has completed. free_pages_low and free_pages_high are set at system startup time and are related to the number of physical pages in the system. If the number of free pages in the system has fallen below free_pages_high or, worse still, free_pages_low, the kernel swap daemon will try three ways to reduce the number of physical pages being used by the system:
- Reducing the size of the buffer and page caches,
- Swapping out System V shared memory pages,
- Swapping out and discarding pages.
If the number of free pages in the system has fallen below free_pages_low, the kernel swap daemon will try to free 6 pages before it next runs. Otherwise it will try to free 3 pages. Each of the above methods is tried in turn until enough pages have been freed.
The kernel swap daemon remembers which method it was using the last time that it attempted to free physical pages. Each time it runs it will start trying to free pages using this last successful method. After it has freed sufficient pages, the swap daemon sleeps again until its timer expires. If the reason that the kernel swap daemon freed pages was that the number of free pages in the system had fallen below free_pages_low, it only sleeps for half its usual time. Once the number of free pages is more than free_pages_low the kernel swap daemon goes back to sleeping longer between checks.
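
The wake-up decision described above can be sketched in a few lines of C. This is only an illustration, not the code in mm/vmscan.c: the struct mem_state type and the functions pages_to_free() and next_sleep_ticks() are hypothetical names, and the sketch assumes that pages counted in nr_async_pages are treated as if they were already free.

```c
#include <assert.h>

/* Hypothetical snapshot of the counters named in the text. */
struct mem_state {
    int nr_free_pages;    /* physical pages currently free */
    int nr_async_pages;   /* pages queued for writing to the swap file */
    int free_pages_low;   /* lower threshold, set at startup */
    int free_pages_high;  /* upper threshold, set at startup */
};

/* Assumption: pages already on their way to swap count as nearly free. */
static int available_pages(const struct mem_state *m)
{
    return m->nr_free_pages + m->nr_async_pages;
}

/* Number of pages the daemon tries to free on this wake-up:
 * 0 above the high mark, 6 below the low mark, 3 in between. */
int pages_to_free(const struct mem_state *m)
{
    assert(m->free_pages_low <= m->free_pages_high);
    if (available_pages(m) > m->free_pages_high)
        return 0;
    if (available_pages(m) < m->free_pages_low)
        return 6;
    return 3;
}

/* The daemon sleeps for half its usual time while below the low mark. */
int next_sleep_ticks(const struct mem_state *m, int usual_ticks)
{
    if (available_pages(m) < m->free_pages_low)
        return usual_ticks / 2;
    return usual_ticks;
}
```

For example, with free_pages_low = 20 and free_pages_high = 30, a system with 25 available pages would ask for 3 pages, and one with 10 available pages would ask for 6 and halve its sleep.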
3.8.1 Reducing the Size of the Page and Buffer Caches
The pages held in the page and buffer caches are good candidates for being freed into the free_area vector. The Page Cache, which contains pages of memory mapped files, may contain unnecessary pages that are filling up the system's memory. Likewise the Buffer Cache, which contains buffers read from or being written to physical devices, may also contain unneeded buffers. When the physical pages in the system start to run out, discarding pages from these caches is relatively easy as it requires no writing to physical devices (unlike swapping pages out of memory). Discarding these pages has few harmful side effects other than making access to physical devices and memory mapped files slower. Moreover, if the discarding of pages from these caches is done fairly, all processes will suffer equally. (See shrink_mmap() in mm/filemap.c)
Every time the kernel swap daemon tries to shrink these caches it examines a block of pages in the mem_map page vector to see if any can be discarded from physical memory. The block of pages examined is larger if the kernel swap daemon is swapping intensively; that is, if the number of free pages in the system has fallen dangerously low. The blocks of pages are examined in a cyclical manner; a different block of pages is examined each time an attempt is made to shrink the memory map. This is known as the clock algorithm as, rather like the minute hand of a clock, the whole mem_map page vector is examined a few pages at a time.
Each page being examined is checked to see if it is cached in either the page cache or the buffer cache. Note that shared pages are not considered for discarding at this time and that a page cannot be in both caches at the same time. If the page is not in either cache then the next page in the mem_map page vector is examined.
Pages are cached in the buffer cache (or rather the buffers within the pages are cached) to make buffer allocation and deallocation more efficient. The memory map shrinking code tries to free the buffers that are contained within the page being examined. If all the buffers are freed, then the pages that contain them are also freed. If the examined page is in the Linux page cache, it is removed from the page cache and freed. (See free_buffer() in fs/buffer.c)
When enough pages have been freed on this attempt, the kernel swap daemon will wait until the next time it is periodically woken. As none of the freed pages were part of any process's virtual memory (they were cached pages), no page tables need updating. If not enough cached pages were discarded, the swap daemon will try to swap out some shared pages.
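
The clock-style scan over mem_map can be sketched as follows. This is a simplified model rather than the kernel's own code: struct page_desc is a hypothetical stand-in for mem_map_t, shrink_block() is an invented name, and freeing a page is reduced to clearing its cache tag.

```c
#include <assert.h>
#include <stddef.h>

enum cache_kind { NOT_CACHED, IN_PAGE_CACHE, IN_BUFFER_CACHE };

/* Hypothetical, much reduced stand-in for a mem_map_t entry. */
struct page_desc {
    enum cache_kind cache;  /* a page is in at most one cache */
    int shared;             /* shared pages are skipped by this pass */
};

/* Examine up to `block` pages starting at *clock_hand, wrapping around
 * the end of the vector, and stop early once `wanted` pages are freed.
 * The hand is left where the scan stopped, so the next call examines
 * a different block. Returns the number of pages freed. */
int shrink_block(struct page_desc *map, size_t npages,
                 size_t *clock_hand, size_t block, int wanted)
{
    int freed = 0;

    assert(npages > 0 && *clock_hand < npages);
    for (size_t i = 0; i < block && freed < wanted; i++) {
        struct page_desc *p = &map[*clock_hand];

        *clock_hand = (*clock_hand + 1) % npages;
        if (p->cache == NOT_CACHED || p->shared)
            continue;               /* not a candidate, try the next page */
        p->cache = NOT_CACHED;      /* dropped from its cache and freed */
        freed++;
    }
    return freed;
}
```

A caller under heavy memory pressure would simply pass a larger `block`, which matches the text's remark that more pages are examined when free memory is dangerously low.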
3.8.2 Swapping Out System V Shared Memory Pages
System V shared memory is an inter-process communication mechanism which allows two or more processes to share virtual memory in order to pass information amongst themselves. How processes share memory in this way is described in more detail in the IPC chapter. For now it is enough to say that each area of System V shared memory is described by a shmid_ds data structure. This contains a pointer to a list of vm_area_struct data structures, one for each process sharing this area of virtual memory. The vm_area_struct data structures describe where in each process's virtual memory this area of System V shared memory goes. The vm_area_struct data structures for this System V shared memory are linked together using the vm_next_shared and vm_prev_shared pointers. Each shmid_ds data structure also contains a list of page table entries, each of which describes the physical page that a shared virtual page maps to.
The kernel swap daemon also uses a clock algorithm when swapping out System V shared memory pages. Each time it runs it remembers which page of which shared virtual memory area it last swapped out. It does this by keeping two indices: the first is an index into the set of shmid_ds data structures, the second into the list of page table entries for this area of System V shared memory. This makes sure that it fairly victimizes the areas of System V shared memory. (See shm_swap() in ipc/shm.c)
As the physical page frame number for a given virtual page of System V shared memory is contained in the page tables of all of the processes sharing this area of virtual memory, the kernel swap daemon must modify all of these page tables to show that the page is no longer in memory but is now held in the swap file. For each shared page it is swapping out, the kernel swap daemon finds the page table entry in each of the sharing processes' page tables (by following a pointer from each vm_area_struct data structure).
If a process's page table entry for this page of System V shared memory is valid, the daemon converts it into an invalid but swapped out page table entry and reduces this (shared) page's count of users by one. The format of a swapped out System V shared page table entry contains an index into the set of shmid_ds data structures and an index into the page table entries for this area of System V shared memory.
If the page's count is zero after the page tables of the sharing processes have all been modified, the shared page can be written out to the swap file. The page table entry in the list pointed at by the shmid_ds data structure for this area of System V shared memory is replaced by a swapped out page table entry. A swapped out page table entry is invalid but contains an index into the set of open swap files and the offset in that file where the swapped out page can be found. This information will be used when the page has to be brought back into physical memory.
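
To make the idea of an "invalid but informative" page table entry concrete, the sketch below packs a swap file index and a page offset into a 32-bit value whose valid bit is clear. The bit layout (bit 0 as the valid bit, 7 bits of swap file index, 24 bits of offset) and the names swp_make(), swp_file() and swp_offset() are assumptions chosen for illustration; the real format is processor specific.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_VALID 0x1u   /* assumed position of the hardware valid bit */

/* Build a swapped out entry: valid bit clear, swap file index in
 * bits 1..7, page offset within that swap file in bits 8..31. */
static inline uint32_t swp_make(unsigned file, uint32_t offset)
{
    assert(file < 128u && offset < (1u << 24));
    return ((uint32_t)file << 1) | (offset << 8);
}

static inline int pte_is_valid(uint32_t pte)
{
    return (pte & PTE_VALID) != 0;
}

/* Which of the open swap files holds the page. */
static inline unsigned swp_file(uint32_t pte)
{
    return (pte >> 1) & 0x7fu;
}

/* Where in that swap file the page is held. */
static inline uint32_t swp_offset(uint32_t pte)
{
    return pte >> 8;
}
```

With this layout, swap file 0 at offset 0 would encode to zero, so a real implementation would need to reserve that slot or otherwise distinguish it from an empty entry; the sketch leaves that out.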
3.8.3 Swapping Out and Discarding Pages
The swap daemon looks at each process in the system in turn to see if it is a good candidate for swapping. Good candidates are processes that can be swapped (some cannot) and that have one or more pages which can be swapped or discarded from memory. Pages are swapped out of physical memory into the system's swap files only if the data in them cannot be retrieved another way. (See swap_out() in mm/vmscan.c)
A lot of the contents of an executable image come from the image's file and can easily be re-read from that file. For example, the executable instructions of an image will never be modified by the image and so will never be written to the swap file. These pages can simply be discarded; when they are again referenced by the process, they will be brought back into memory from the executable image.
Once the process to swap has been located, the swap daemon looks through all of its virtual memory regions looking for areas which are not shared or locked. Linux does not swap out all of the swappable pages of the process that it has selected; instead it removes only a small number of pages. Pages cannot be swapped or discarded if they are locked in memory. (See swap_out_vma() in mm/vmscan.c)
The Linux swap algorithm uses page aging. Each page has a counter (held in the mem_map_t data structure) that gives the kernel swap daemon some idea of whether or not a page is worth swapping. Pages age when they are unused and rejuvenate on access; the swap daemon only swaps out old pages. By default, a page is given an initial age of 3 when it is first allocated. Each time it is touched, its age is increased by 3 to a maximum of 20. Every time the kernel swap daemon runs it ages pages, decrementing their age by 1. These default actions can be changed and for this reason they (and other swap related information) are stored in the swap_control data structure. If the page is old (age = 0), the swap daemon will process it further.
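
The aging rules lend themselves to a small sketch. The default values (initial age 3, advance 3, decline 1, maximum 20) come from the text; the struct swap_control_sketch type, its field names and the three functions are hypothetical and only loosely modelled on the idea of a swap_control structure.

```c
#include <assert.h>

/* Hypothetical container for the tunable aging parameters. */
struct swap_control_sketch {
    int initial_age;  /* age given to a newly allocated page */
    int age_advance;  /* added each time the page is touched */
    int age_decline;  /* subtracted each time the swap daemon runs */
    int max_age;      /* upper bound on a page's age */
};

static const struct swap_control_sketch swap_defaults = { 3, 3, 1, 20 };

/* The page was accessed: it rejuvenates, up to the maximum age. */
int touch_page(int age, const struct swap_control_sketch *sc)
{
    age += sc->age_advance;
    return age > sc->max_age ? sc->max_age : age;
}

/* The swap daemon ran: the page grows older, down to zero. */
int age_page(int age, const struct swap_control_sketch *sc)
{
    age -= sc->age_decline;
    return age < 0 ? 0 : age;
}

/* Only pages whose age has reached zero are considered for swapping. */
int page_is_old(int age)
{
    return age == 0;
}
```

Under these defaults a freshly allocated page that is never touched becomes a swap candidate after three runs of the daemon, while a heavily used page at age 20 survives twenty runs without access.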
Dirty pages are pages which can be swapped out. Linux uses an architecture specific bit in the PTE to describe pages this way (see Figure 3.2). However, not all dirty pages are necessarily written to the swap file. Every virtual memory region of a process may have its own swap operation (pointed at by the vm_ops pointer in the vm_area_struct); if it does, that method is used. Otherwise, the swap daemon will allocate a page in the swap file and write the page out to that device.
The page's page table entry is replaced by one which is marked as invalid but which contains information about where the page is in the swap file. This is an offset into the swap file where the page is held and an indication of which swap file is being used. Whatever the swap method used, the original physical page is made free by putting it back into the free_area. Clean (or rather not dirty) pages can be discarded and put back into the free_area for re-use.
If enough of the swappable process's pages have been swapped out or discarded, the swap daemon will again sleep. The next time it wakes it will consider the next process in the system. In this way, the swap daemon nibbles away at each process's physical pages until the system is again in balance. This is much fairer than swapping out whole processes.
3.9 The Swap Cache
When swapping pages out to the swap files, Linux avoids writing pages if it does not have to. There are times when a page is both in a swap file and in physical memory. This happens when a page that was swapped out of memory was then brought back into memory when it was again accessed by a process. So long as the page in memory is not written to, the copy in the swap file remains valid.
Linux uses the swap cache to track these pages. The swap cache is a list of page table entries, one per physical page in the system. Each is a page table entry for a swapped out page and describes which swap file the page is being held in together with its location in the swap file.
If a swap cache entry is non-zero, it represents a page which is being held in a swap file and which has not been modified. If the page is subsequently modified (by being written to), its entry is removed from the swap cache.
When Linux needs to swap a physical page out to a swap file it consults the swap cache and, if there is a valid entry for this page, it does not need to write the page out to the swap file. This is because the page in memory has not been modified since it was last read from the swap file.
The entries in the swap cache are page table entries for swapped out pages. They are marked as invalid but contain information which allows Linux to find the right swap file and the right page within that swap file.
3.10 Swapping Pages In
The dirty pages saved in the swap files may be needed again, for example when an application writes to an area of virtual memory whose contents are held in a swapped out physical page. Accessing a page of virtual memory that is not held in physical memory causes a page fault to occur. The page fault is the processor signalling the operating system that it cannot translate a virtual address into a physical one. In this case this is because the page table entry describing this page of virtual memory was marked as invalid when the page was swapped out. The processor cannot handle the virtual to physical address translation and so hands control back to the operating system, describing as it does so the virtual address that faulted and the reason for the fault. The format of this information and how the processor passes control to the operating system is processor specific.
The processor specific page fault handling code must locate the vm_area_struct data structure that describes the area of virtual memory that contains the faulting virtual address. It does this by searching the vm_area_struct data structures for this process until it finds the one containing the faulting virtual address.
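
The search for the region containing a faulting address can be illustrated with a simple sorted list. This sketch uses a hypothetical struct vma_sketch and function find_vma_sketch(); it assumes each region covers the half-open range from vm_start up to but not including vm_end, and it uses a linear walk for brevity, whereas the kernel arranges these structures so that the lookup is faster.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced view of a vm_area_struct. */
struct vma_sketch {
    unsigned long vm_start;      /* first address in the region */
    unsigned long vm_end;        /* first address past the region */
    struct vma_sketch *vm_next;  /* next region, sorted by address */
};

/* Return the region containing addr, or NULL if the address falls
 * outside every region (an invalid access). */
struct vma_sketch *find_vma_sketch(struct vma_sketch *head, unsigned long addr)
{
    for (struct vma_sketch *v = head; v != NULL; v = v->vm_next) {
        if (addr < v->vm_start)
            return NULL;         /* list is sorted: addr lies in a gap */
        if (addr < v->vm_end)
            return v;            /* vm_start <= addr < vm_end */
    }
    return NULL;
}
```

A NULL result corresponds to a fault on an address that is not part of any valid area of the process's virtual memory, which is handled differently from a fault on a swapped out page.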
Locating the vm_area_struct is very time critical, and a process's vm_area_struct data structures are arranged so as to make this search take as little time as possible. (See do_page_fault() in arch/i386/mm/fault.c)
Having carried out the appropriate processor specific actions and found that the faulting virtual address is for a valid area of virtual memory, the page fault processing becomes generic and applicable to all processors that Linux runs on.
The generic page fault handling code looks for the page table entry for the faulting virtual address. If the page table entry it finds is for a swapped out page, Linux must swap the page back into physical memory. The format of the page table entry for a swapped out page is processor specific but all processors mark these pages as invalid and put the information necessary to locate the page within the swap file into the page table entry. Linux needs this information in order to bring the page back into physical memory. (See do_no_page() in mm/memory.c)
At this point, Linux knows the faulting virtual address and has a page table entry containing information about where this page has been swapped to. The vm_area_struct data structure may contain a pointer to a routine which will swap any page of the area of virtual memory that it describes back into physical memory. This is its swapin operation. If there is a swapin operation for this area of virtual memory then Linux will use it. This is, in fact, how swapped out System V shared memory pages are handled; they require special handling because the format of a swapped out System V shared page is a little different from that of an ordinary swapped out page. There may not be a swapin operation, in which case Linux will assume that this is an ordinary page that does not need to be specially handled.
It allocates a free physical page and reads the swapped out page back from the swap file.
The information telling it where in the swap file the page is held (and which swap file) is taken from the invalid page table entry. (See do_swap_page() in mm/memory.c; shm_swap_in() in ipc/shm.c; swap_in() in mm/page_alloc.c)
If the access that caused the page fault was not a write access then the page is left in the swap cache and its page table entry is not marked as writable. If the page is subsequently written to, another page fault will occur and, at that point, the page is marked as dirty and its entry is removed from the swap cache. If the page is not written to and it needs to be swapped out again, Linux can avoid the write of the page to its swap file because the page is already in the swap file.
If the access that caused the page to be brought in from the swap file was a write operation, this page is removed from the swap cache and its page table entry is marked as both dirty and writable.
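
The interplay between swapping in and the swap cache can be sketched as below. This is a toy model under stated assumptions: the swap cache is a small fixed array indexed by physical page frame number, a zero entry means "no valid copy in swap", and the names struct pte_sketch, swap_in_sketch(), write_fault_sketch() and needs_swap_write() are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NR_FRAMES 8   /* toy number of physical pages */

/* One entry per physical page; non-zero means the page has an
 * unmodified copy in a swap file at the location the entry encodes. */
static uint32_t swap_cache[NR_FRAMES];

/* Only the two page table entry flags that matter here. */
struct pte_sketch {
    int writable;
    int dirty;
};

/* A page has just been read back from swap into `frame`. */
void swap_in_sketch(unsigned frame, uint32_t swap_entry,
                    int write_access, struct pte_sketch *pte)
{
    assert(frame < NR_FRAMES && swap_entry != 0);
    if (write_access) {
        swap_cache[frame] = 0;          /* swap copy is about to go stale */
        pte->writable = 1;
        pte->dirty = 1;
    } else {
        swap_cache[frame] = swap_entry; /* swap copy stays valid */
        pte->writable = 0;              /* a later write will fault */
        pte->dirty = 0;
    }
}

/* A later write to a page that was swapped in read-only. */
void write_fault_sketch(unsigned frame, struct pte_sketch *pte)
{
    assert(frame < NR_FRAMES);
    swap_cache[frame] = 0;
    pte->writable = 1;
    pte->dirty = 1;
}

/* When swapping this page out again, must it be written to swap? */
int needs_swap_write(unsigned frame)
{
    assert(frame < NR_FRAMES);
    return swap_cache[frame] == 0;
}
```

The sketch only models pages that are destined for the swap file; clean pages backed by an executable image are simply discarded, as described earlier, and never consult the swap cache.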