Chapter 9-02

Please indicate the source: http://blog.youkuaiyun.com/gaoxiangnumber1.

9.6 Address Translation

l  Address translation is a mapping between the elements of an N-element virtual address space (VAS) and an M-element physical address space (PAS),

MAP: VAS → PAS ∪ {∅}

where MAP(A) = A′ if the data at virtual address A is present at physical address A′ in PAS, and MAP(A) = ∅ if the data at virtual address A is not in physical memory.

l  A control register in the CPU, the page table base register (PTBR), points to the current page table.

l  The n-bit virtual address has two components: a p-bit virtual page offset (VPO) and an (n − p)-bit virtual page number (VPN).

l  The MMU uses the VPN to select the appropriate PTE. For example, VPN 0 selects PTE 0, VPN 1 selects PTE 1, and so on.

l  The corresponding physical address is the concatenation of the physical page number (PPN) from the page table entry and the VPO from the virtual address. Since the physical and virtual pages are both P bytes, the physical page offset (PPO) is identical to the VPO.
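
The split-and-concatenate scheme above can be sketched with a few bit operations. This is a minimal sketch, assuming the 64-byte pages (p = 6) of the small example system used later in the chapter; the function names are made up for illustration.

```python
P_BITS = 6  # page offset width: page size = 2**6 = 64 bytes

def split_va(va):
    """Split a virtual address into (VPN, VPO)."""
    return va >> P_BITS, va & ((1 << P_BITS) - 1)

def make_pa(ppn, vpo):
    """Concatenate the PPN with the VPO (== PPO) to form a physical address."""
    return (ppn << P_BITS) | vpo

vpn, vpo = split_va(0x03D4)   # -> VPN 0x0F, VPO 0x14
pa = make_pa(0x0D, vpo)       # suppose the PTE maps this VPN to PPN 0x0D
```

Because the PPO is identical to the VPO, no arithmetic is needed on the offset; only the page number changes during translation.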

l  The steps that the CPU hardware performs when there is a page hit.

²  Step 1: The processor generates a virtual address and sends it to the MMU.

²  Step 2: The MMU generates the PTE address and requests it from the cache/main memory.

²  Step 3: The cache/main memory returns the PTE to the MMU.

²  Step 4: The MMU constructs the physical address and sends it to cache/main memory.

²  Step 5: The cache/main memory returns the requested data word to the processor.
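
The page-hit path above can be modeled as one small function. This is a toy sketch, assuming the page table is a Python dict mapping VPN → (valid, PPN); the real PTE fetch in Steps 2-3 goes through the cache/main memory.

```python
P_BITS = 6  # 64-byte pages, as in the example system later in the chapter

def translate(page_table, va):
    vpn, vpo = va >> P_BITS, va & ((1 << P_BITS) - 1)
    valid, ppn = page_table[vpn]           # Steps 2-3: fetch the PTE
    if not valid:
        raise RuntimeError("page fault")   # handled by the kernel, not here
    return (ppn << P_BITS) | vpo           # Step 4: construct the PA

pt = {0x0F: (True, 0x0D)}
print(hex(translate(pt, 0x03D4)))  # prints 0x354
```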

l  Unlike a page hit, which is handled entirely by hardware, handling a page fault requires cooperation between hardware and the operating system kernel.

²  Steps 1 to 3: The same as Steps 1 to 3 of the page hit case.

²  Step 4: The valid bit in the PTE is zero, so the MMU triggers an exception, which transfers control in the CPU to a page fault exception handler in the operating system kernel.

²  Step 5: The fault handler identifies a victim page in physical memory, and if that page has been modified, pages it out to disk.

²  Step 6: The fault handler pages in the new page and updates the PTE in memory.

²  Step 7: The fault handler returns to the original process, causing the faulting instruction to be restarted. The CPU resends the offending virtual address to the MMU.
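
Steps 4-7 of the fault handler can be sketched as a toy routine. Here page_table maps VPN → [valid, PPN], and pick_victim and page_in are made-up stand-ins for the kernel's replacement policy and disk I/O, not a real interface.

```python
def handle_fault(page_table, vpn, pick_victim, page_in):
    victim = pick_victim()            # Step 5: choose a victim page
    page_table[victim][0] = False     # the victim is no longer resident
    ppn = page_in(vpn)                # Step 6: page in the new page...
    page_table[vpn] = [True, ppn]     # ...and update the PTE
    # Step 7: on return, the CPU restarts the faulting instruction.

# VPN 0x0F is on disk; the handler evicts VPN 0x02 and reuses its frame.
pt = {0x02: [True, 0x05], 0x0F: [False, None]}
handle_fault(pt, 0x0F, pick_victim=lambda: 0x02, page_in=lambda vpn: 0x05)
```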

l  Because the virtual page is now cached in physical memory, there is a hit, and after the MMU performs the page-hit steps described above, the main memory returns the requested word to the processor.

9.6.1 Integrating Caches and VM

l  In any system that uses both virtual memory and SRAM caches, there is the issue of whether to use virtual or physical addresses to access the SRAM cache.

l  Most systems choose physical addressing. With physical addressing, it is straightforward for multiple processes to have blocks in the cache at the same time and to share blocks from the same virtual pages. And the cache does not have to deal with protection issues because access rights are checked as part of the address translation process.

l  The main idea is that the address translation occurs before the cache lookup. Notice that page table entries can be cached, just like any other data words.

9.6.2 Speeding up Address Translation with a TLB

l  Every time the CPU generates a virtual address, the MMU must refer to a PTE in order to translate the virtual address into a physical address. In the worst case, this requires an additional fetch from memory, at a cost of tens to hundreds of cycles. If the PTE happens to be cached in L1, then the cost goes down to one or two cycles.

l  Many systems try to eliminate this cost by including a small cache of PTEs in the MMU called a translation lookaside buffer (TLB).

l  A TLB is a small, virtually addressed cache where each line holds a block consisting of a single PTE. A TLB usually has a high degree of associativity.

l  The index and tag fields that are used for set selection and line matching respectively are extracted from the virtual page number in the virtual address. If the TLB has T = 2^t sets, then the TLB index (TLBI) consists of the t least significant bits of the VPN, and the TLB tag (TLBT) consists of the remaining bits in the VPN.
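
The TLBI/TLBT split can be sketched directly from that definition. This assumes t = 2 (four sets), matching the small example system later in the chapter; the function name is made up for illustration.

```python
T_BITS = 2  # t = 2 -> T = 2**2 = 4 TLB sets

def tlb_fields(vpn):
    tlbi = vpn & ((1 << T_BITS) - 1)  # t least significant bits of the VPN
    tlbt = vpn >> T_BITS              # remaining high-order bits of the VPN
    return tlbi, tlbt

tlbi, tlbt = tlb_fields(0x0F)  # -> index 0x3, tag 0x03
```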

l  When there is a TLB hit, all of the address translation steps are performed inside the on-chip MMU, and thus are fast.

²  Step 1: The CPU generates a virtual address.

²  Steps 2 and 3: The MMU fetches the appropriate PTE from the TLB.

²  Step 4: The MMU translates the virtual address to a physical address and sends it to the cache/main memory.

²  Step 5: The cache/main memory returns the requested data word to the CPU.

l  When there is a TLB miss, the MMU must fetch the PTE from the L1 cache. The newly fetched PTE is stored in the TLB, possibly overwriting an existing entry.

9.6.3 Multi-Level Page Tables

l  We have assumed that the system uses a single page table to do address translation. If we had a 32-bit address space, 4 KB pages, and a 4-byte PTE, then we would need a 4 MB page table resident in memory at all times (2^32 / 2^12 = 2^20 pages, and 2^20 PTEs × 4 B = 4 MB), even if the application referenced only a small chunk of the virtual address space. The problem is compounded for systems with 64-bit address spaces.
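
The arithmetic above can be checked in a couple of lines:

```python
# Single-level page table for a 32-bit address space with 4 KB pages.
NUM_PAGES = 2**32 // 2**12       # 2**20 pages of 4 KB each
TABLE_BYTES = NUM_PAGES * 4      # one 4-byte PTE per page
assert NUM_PAGES == 2**20
assert TABLE_BYTES == 4 * 2**20  # 4 MB, resident in memory at all times
```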

l  The common approach for compacting the page table is to use a hierarchy of page tables instead.

l  Consider a 32-bit virtual address space partitioned into 4 KB pages, with page table entries that are 4 bytes each. Suppose that at this point in time the virtual address space has the following form: The first 2K pages of memory are allocated for code and data, the next 6K pages are unallocated, the next 1023 pages are also unallocated, and the next page is allocated for the user stack. Figure 9.17 shows how we might construct a two-level page table hierarchy for this virtual address space.

l  Each PTE in the level-1 table is responsible for mapping a 4 MB chunk of the virtual address space, where each chunk consists of 1024 contiguous pages. For example, PTE 0 maps the first chunk, PTE 1 the next chunk, and so on.

l  Given that the address space is 4 GB, 1024 PTEs are sufficient to cover the entire space. If every page in chunk i is unallocated, then level 1 PTE i is null. For example, chunks 2–7 are unallocated. However, if at least one page in chunk i is allocated, then level 1 PTE i points to the base of a level 2 page table. For example, all or portions of chunks 0, 1, and 8 are allocated, so their level 1 PTEs point to level 2 page tables.

l  Each PTE in a level 2 page table is responsible for mapping a page (4KB) of virtual memory.

l  With 4-byte PTEs, each level 1 and level 2 page table is 4 KB, which is the same size as a page.

l  This scheme reduces memory requirements in two ways.

1. If a PTE in the level 1 table is null, then the corresponding level 2 page table does not need to exist. This represents a significant potential savings, since most of the 4 GB virtual address space for a typical program is unallocated.

2. Only the level 1 table needs to be in main memory at all times. The level 2 page tables can be created and paged in and out by the VM system as they are needed, which reduces pressure on main memory. Only the most heavily used level 2 page tables need to be cached in main memory.

l  The virtual address is partitioned into k VPNs and a VPO. As with a single-level hierarchy, the PPO is identical to the VPO. Each VPN i, 1 ≤ i ≤ k, is an index into a page table at level i. Each PTE in a level-j table, 1 ≤ j ≤ k − 1, points to the base of some page table at level j + 1. Each PTE in a level-k table contains either the PPN of some physical page or the address of a disk block.
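
The k-level partitioning can be sketched as bit manipulation. The address 0x00403012 and the 10 + 10 + 12 bit split are illustrative values chosen to match the two-level example (1024-entry tables, 4 KB pages); they do not come from the text.

```python
def split_multilevel(va, k, bits_per_level, p_bits):
    """Split a VA into k per-level VPNs plus the VPO."""
    vpo = va & ((1 << p_bits) - 1)
    va >>= p_bits
    vpns = []
    for _ in range(k):
        vpns.append(va & ((1 << bits_per_level) - 1))
        va >>= bits_per_level
    vpns.reverse()        # vpns[0] indexes the level 1 table
    return vpns, vpo

vpns, vpo = split_multilevel(0x00403012, k=2, bits_per_level=10, p_bits=12)
# vpns == [1, 3]: level 1 PTE 1 selects a level 2 table, whose PTE 3
# maps the page; vpo == 0x012 is carried through unchanged as the PPO.
```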

l  To construct the physical address, the MMU must access k PTEs before it can determine the PPN. Accessing k PTEs may seem slow. However, the TLB comes to the rescue here by caching PTEs from the page tables at the different levels. In practice, address translation with multi-level page tables is not significantly slower than with single-level page tables.

9.6.4 Putting It Together: End-to-end Address Translation

l  In this section, we put it all together with a concrete example of end-to-end address translation on a small system with a TLB and L1 d-cache. We make the following assumptions:

²  The memory is byte addressable.

²  Memory accesses are to 1-byte words (not 4-byte words).

²  Virtual addresses are 14 bits wide (n = 14).

²  Physical addresses are 12 bits wide (m = 12).

²  The page size is 64 bytes (P = 64 = 2^6).

²  The TLB is four-way set associative with 16 total entries.

²  The L1 d-cache is physically addressed and direct mapped, with a 4-byte line size and 16 total sets.

²  Since each page is 2^6 = 64 bytes, the low-order 6 bits of the virtual and physical addresses serve as the VPO and PPO respectively. The high-order 8 bits of the virtual address serve as the VPN. The high-order 6 bits of the physical address serve as the PPN.

TLB

²  The TLB is virtually addressed using the bits of the VPN. Since the TLB has four sets, the 2 low-order bits of the VPN serve as the set index (TLBI). The remaining 6 high-order bits serve as the tag (TLBT) that distinguishes the different VPNs that might map to the same TLB set.

Page table

²  The page table is a single-level design with a total of 2^8 = 256 page table entries (PTEs). However, we are only interested in the first sixteen of these.

²  We have labeled each PTE with the VPN that indexes it, but these VPNs are not part of the page table and not stored in memory. The PPN of each invalid PTE is denoted with a dash to reinforce the idea that whatever bit values might happen to be stored there are not meaningful.

Cache

²  The direct-mapped cache is addressed by the fields in the physical address.
Since each block is 4 bytes, the low-order 2 bits of the physical address serve as the block offset (CO).
Since there are 16 sets, the next 4 bits serve as the set index (CI).
The remaining 6 bits serve as the tag (CT).
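
All of the field extractions for this small system follow directly from the stated widths (6-bit VPO/PPO, 8-bit VPN, 2-bit TLBI, 6-bit TLBT, 2-bit CO, 4-bit CI, 6-bit CT). A minimal sketch, with made-up function names:

```python
def va_fields(va):
    """14-bit virtual address -> (VPN, VPO, TLBI, TLBT)."""
    vpo = va & 0x3F          # low-order 6 bits
    vpn = va >> 6            # high-order 8 bits
    tlbi = vpn & 0x3         # low-order 2 bits of the VPN
    tlbt = vpn >> 2          # remaining 6 bits of the VPN
    return vpn, vpo, tlbi, tlbt

def pa_fields(pa):
    """12-bit physical address -> (CO, CI, CT)."""
    co = pa & 0x3            # block offset: low-order 2 bits
    ci = (pa >> 2) & 0xF     # set index: next 4 bits
    ct = pa >> 6             # tag: remaining 6 bits
    return co, ci, ct
```

For example, va_fields(0x03D4) yields VPN 0x0F and VPO 0x14, and pa_fields(0x354) yields CO 0x0, CI 0x5, CT 0x0D, matching the walkthrough that follows.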

l  Suppose the CPU executes a load instruction that reads the byte at address 0x03d4. (Recall that our CPU reads 1-byte words rather than 4-byte words.)

l  To begin, the MMU extracts the VPN (0x0F) from the virtual address and checks with the TLB to see if it has cached a copy of PTE 0x0F.

l  The MMU extracts the TLB index (0x3) and the TLB tag (0x03) from the VPN; the TLB hits on a valid match in the second entry of Set 0x3 and returns the cached PPN (0x0D) to the MMU.

l  The MMU forms the physical address (0x354) by concatenating the PPN (0x0D) with the VPO (0x14).

l  Next, the MMU sends the physical address to the cache, which extracts the cache offset CO (0x0), the cache set index CI (0x5), and the cache tag CT (0x0D) from the physical address.

l  Since the tag in Set 0x5 matches CT, the cache detects a hit, reads out the data byte (0x36) at offset CO, and returns it to the MMU, which then passes it back to the CPU.

l  Other paths through the translation process are also possible. For example, if the TLB misses, then the MMU must fetch the PPN from a PTE in the page table. If the resulting PTE is invalid, then there is a page fault and the kernel must page in the appropriate page and rerun the load instruction. Another possibility is that the PTE is valid, but the necessary memory block misses in the cache.
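
The TLB-hit/cache-hit path of the worked example can be checked end to end with toy dicts. The TLB maps VPN → PPN and the cache is keyed by (set index, tag); only the byte 0x36 at offset 0 comes from the text, and the other three bytes in the block are made-up filler.

```python
tlb = {0x0F: 0x0D}                               # cached PTE for VPN 0x0F
cache = {(0x5, 0x0D): [0x36, 0x72, 0xF0, 0x1D]}  # bytes past index 0 invented

def load_byte(va):
    vpn, vpo = va >> 6, va & 0x3F
    pa = (tlb[vpn] << 6) | vpo                   # TLB hit: translate in the MMU
    co, ci, ct = pa & 0x3, (pa >> 2) & 0xF, pa >> 6
    return cache[(ci, ct)][co]                   # cache hit: read the byte

assert load_byte(0x03D4) == 0x36
```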

