Memory-mapped I/O and port-mapped I/O

Article link: https://blog.youkuaiyun.com/CHALLENG_EVERYTHING/article/details/144568345

ref: https://en.wikipedia.org/wiki/Memory-mapped_I/O_and_port-mapped_I/O

From Wikipedia, the free encyclopedia

Memory-mapped I/O (MMIO) and port-mapped I/O (PMIO) are two complementary methods of performing input/output (I/O) between the central processing unit (CPU) and peripheral devices in a computer (often mediating access via chipset). An alternative approach is using dedicated I/O processors, commonly known as channels on mainframe computers, which execute their own instructions.

MMIO and PMIO are two complementary ways of performing I/O between the CPU and peripheral devices in a computer. A third approach uses dedicated I/O processors, which on mainframes are called channels.

Memory-mapped I/O uses the same address space to address both main memory[a] and I/O devices.[1] The memory and registers of the I/O devices are mapped to (associated with) address values, so a memory address may refer to either a portion of physical RAM or to memory and registers of the I/O device. Thus, the CPU instructions used to access the memory (e.g. MOV ...) can also be used for accessing devices. Each I/O device either monitors the CPU’s address bus and responds to any CPU access of an address assigned to that device, connecting the system bus to the desired device’s hardware register, or uses a dedicated bus.

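In memory-mapped I/O, main memory and I/O devices share one address space: device memory and registers are assigned addresses, so a given address may refer either to physical RAM or to a device. The CPU instructions that access memory (mov, for example) therefore also access devices. Each device watches the address bus and responds when the CPU accesses an address assigned to it, connecting its hardware register to the system bus (data and control lines), or else it uses a dedicated bus.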
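
The shared address space can be illustrated with a small simulation. This is only a sketch: the `ToyBus` class, the 32 KiB RAM size and the device register at 0x8000 are assumptions made for the example, not part of any real system.

```python
class ToyBus:
    """Toy single address space: RAM and one device register share addresses."""

    def __init__(self, ram_size, device_base):
        self.ram = bytearray(ram_size)      # addresses 0 .. ram_size - 1
        self.device_base = device_base      # hypothetical device register address
        self.device_register = 0

    def store(self, address, value):
        # The same "store" operation reaches RAM or the device,
        # depending only on which address is used.
        if address == self.device_base:
            self.device_register = value & 0xFF
        elif address < len(self.ram):
            self.ram[address] = value & 0xFF
        else:
            raise ValueError(f"nothing mapped at {address:#06x}")

    def load(self, address):
        if address == self.device_base:
            return self.device_register
        if address < len(self.ram):
            return self.ram[address]
        raise ValueError(f"nothing mapped at {address:#06x}")


bus = ToyBus(ram_size=0x8000, device_base=0x8000)
bus.store(0x0010, 0x2A)   # lands in RAM
bus.store(0x8000, 0x07)   # lands in the device register
```

Both calls use the same `store` operation; only the address decides whether RAM or the device receives the value.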

To accommodate the I/O devices, some areas of the address bus used by the CPU must be reserved for I/O and must not be available for normal physical memory; the range of addresses used for I/O devices is determined by the hardware. The reservation may be permanent, or temporary (as achieved via bank switching). An example of the latter is found in the Commodore 64, which uses a form of memory mapping to cause RAM or I/O hardware to appear in the 0xD000–0xDFFF range.

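To make room for I/O devices, part of the CPU's address space is reserved for them and cannot be used for ordinary physical memory. The hardware determines which address range is reserved. The reservation can be permanent or temporary (via bank switching); on the Commodore 64, memory mapping makes either RAM or I/O hardware appear in the 0xD000–0xDFFF range.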
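
A temporary reservation through bank switching might be modelled as follows. This is a toy sketch: the `io_visible` flag is a hypothetical stand-in for whatever mechanism selects the bank, and it does not reproduce how the Commodore 64 actually does it.

```python
class BankedWindow:
    """Toy model: one address window shows either RAM or I/O registers."""

    WINDOW_START = 0xD000
    WINDOW_END = 0xDFFF

    def __init__(self):
        size = self.WINDOW_END - self.WINDOW_START + 1
        self.ram = bytearray(size)
        self.io = bytearray(size)
        self.io_visible = True   # hypothetical bank-select flag

    def _bank(self):
        return self.io if self.io_visible else self.ram

    def _check(self, address):
        if not self.WINDOW_START <= address <= self.WINDOW_END:
            raise ValueError("address outside the banked window")

    def write(self, address, value):
        self._check(address)
        self._bank()[address - self.WINDOW_START] = value & 0xFF

    def read(self, address):
        self._check(address)
        return self._bank()[address - self.WINDOW_START]


window = BankedWindow()
window.write(0xD020, 0x0E)      # goes to the I/O bank
window.io_visible = False       # switch banks: RAM now appears in the window
window.write(0xD020, 0x55)      # same address, now goes to RAM
```

The two writes use the same address but end up in different storage, which is the essence of a temporary reservation.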

In a sense, an I/O port is simply a register on an I/O device.

Port-mapped I/O often uses a special class of CPU instructions designed specifically for performing I/O, such as the in and out instructions found on microprocessors based on the x86 architecture. Different forms of these two instructions can copy one, two or four bytes (outb, outw and outl, respectively) between the EAX register or one of that register’s subdivisions on the CPU and a specified I/O port address which is assigned to an I/O device. I/O devices have a separate address space from general memory, either accomplished by an extra “I/O” pin on the CPU’s physical interface, or an entire bus dedicated to I/O. Because the address space for I/O is isolated from that for main memory, this is sometimes referred to as isolated I/O.[2] On the x86 architecture, index/data pair is often used for port-mapped I/O.[3]

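Port-mapped I/O typically relies on special CPU instructions for I/O, such as in and out on x86 CPUs. These instructions come in several forms that copy one, two or four bytes between a CPU register and the port address of a given I/O device. I/O devices have an address space separate from general memory, reached through an extra I/O pin on the CPU or a bus dedicated to I/O, which is why this scheme is also called isolated I/O.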
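
The separate I/O address space and the index/data pair can be sketched in a simulation. Everything here is hypothetical: `PortSpace`, `IndexedDevice` and the port numbers 0x70 and 0x71 are chosen only for illustration, and `outb`/`inb` are plain Python methods that merely borrow the names of the instructions.

```python
class PortSpace:
    """Toy I/O address space, kept separate from the memory address space."""

    def __init__(self):
        self.handlers = {}   # port number -> (read_fn, write_fn)

    def attach(self, port, read_fn, write_fn):
        self.handlers[port] = (read_fn, write_fn)

    def outb(self, port, value):
        self.handlers[port][1](value & 0xFF)

    def inb(self, port):
        return self.handlers[port][0]() & 0xFF


class IndexedDevice:
    """Hypothetical device exposing 16 registers through an index/data pair."""

    def __init__(self):
        self.registers = [0] * 16
        self.index = 0

    def write_index(self, value):
        self.index = value % len(self.registers)

    def read_index(self):
        return self.index

    def write_data(self, value):
        self.registers[self.index] = value

    def read_data(self):
        return self.registers[self.index]


INDEX_PORT, DATA_PORT = 0x70, 0x71   # port numbers chosen only for illustration
ports = PortSpace()
device = IndexedDevice()
ports.attach(INDEX_PORT, device.read_index, device.write_index)
ports.attach(DATA_PORT, device.read_data, device.write_data)

memory = bytearray(0x100)     # memory address 0x70 is unrelated to port 0x70
ports.outb(INDEX_PORT, 5)     # select register 5
ports.outb(DATA_PORT, 0xAB)   # write it through the data port
```

Two ports are enough to reach all sixteen registers, and writing to port 0x70 leaves memory address 0x70 untouched because the two address spaces are isolated.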

Overview

Different CPU-to-device communication methods, such as memory mapping, do not affect the direct memory access (DMA) for a device, because, by definition, DMA is a memory-to-device communication method that bypasses the CPU.

The way the CPU talks to a device (memory mapping, for instance) does not affect direct memory access (DMA), because DMA is by definition communication between memory and the device that bypasses the CPU.

Hardware interrupts are another communication method between the CPU and peripheral devices, however, for a number of reasons, interrupts are always treated separately. An interrupt is device-initiated, as opposed to the methods mentioned above, which are CPU-initiated. It is also unidirectional, as information flows only from device to CPU. Lastly, each interrupt line carries only one bit of information with a fixed meaning, namely “an event that requires attention has occurred in a device on this interrupt line”.

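Hardware interrupts are another way for the CPU and peripherals to communicate, but they are treated separately for several reasons. An interrupt is initiated by the device, unlike the CPU-initiated methods above. Its information flows in one direction only, from device to CPU. And each interrupt line carries a single bit, which signals that an event needing attention has occurred in a device on that line.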

I/O operations can slow memory access if the address and data buses are shared. This is because the peripheral device is usually much slower than main memory. In some architectures, port-mapped I/O operates via a dedicated I/O bus, alleviating the problem.

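When the address and data buses are shared, I/O operations slow memory access because peripherals are much slower than main memory. On some architectures, port-mapped I/O uses a dedicated I/O bus to ease the problem.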

One merit of memory-mapped I/O is that, by discarding the extra complexity that port I/O brings, a CPU requires less internal logic and is thus cheaper, faster, easier to build, consumes less power and can be physically smaller; this follows the basic tenets of reduced instruction set computing, and is also advantageous in embedded systems. The other advantage is that, because regular memory instructions are used to address devices, all of the CPU’s addressing modes are available for the I/O as well as the memory, and instructions that perform an ALU operation directly on a memory operand (loading an operand from a memory location, storing the result to a memory location, or both) can be used with I/O device registers as well. In contrast, port-mapped I/O instructions are often very limited, often providing only for simple load-and-store operations between CPU registers and I/O ports, so that, for example, to add a constant to a port-mapped device register would require three instructions: read the port to a CPU register, add the constant to the CPU register, and write the result back to the port.

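One advantage of memory-mapped I/O is that, without the complexity of port I/O, the CPU needs less internal logic, which makes it cheaper, easier to build, lower in power consumption and physically smaller. This matches the basic principle of reduced instruction set computing and is a benefit in embedded systems. A second advantage is that, because ordinary memory instructions address the devices, every CPU addressing mode is available for I/O just as for memory, and instructions that perform an ALU operation directly on a memory operand also work on device registers. Port-mapped I/O instructions, by contrast, are limited to simple loads and stores between CPU registers and I/O ports. Adding a constant to a port-mapped device register, for example, takes three instructions: (1) read the port into a CPU register, (2) add the constant to that register, (3) write the result back to the port.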
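
The difference in instruction count can be mirrored step by step. This is only an analogy in a high-level language: the port number 0x40, the register address 0x8000 and the initial value 10 are assumptions made for the example.

```python
ADDEND = 3

# Port-mapped: three separate steps, mirroring three instructions.
port_space = {0x40: 10}              # hypothetical port 0x40 holding the value 10
accumulator = port_space[0x40]       # 1. read the port into a CPU register
accumulator += ADDEND                # 2. add the constant to the register
port_space[0x40] = accumulator       # 3. write the result back to the port

# Memory-mapped: one read-modify-write on a memory operand,
# comparable to a single "add constant to memory" instruction.
address_space = bytearray(0x10000)
DEVICE_REGISTER = 0x8000             # hypothetical memory-mapped register address
address_space[DEVICE_REGISTER] = 10
address_space[DEVICE_REGISTER] += ADDEND
```

Both paths end with the same register value; the memory-mapped one expresses it as a single operation on the memory operand.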

As 16-bit processors have become obsolete and replaced with 32-bit and 64-bit in general use, reserving ranges of memory address space for I/O is less of a problem, as the memory address space of the processor is usually much larger than the required space for all memory and I/O devices in a system. Therefore, it has become more frequently practical to take advantage of the benefits of memory-mapped I/O. However, even with address space being no longer a major concern, neither I/O mapping method is universally superior to the other, and there will be cases where using port-mapped I/O is still preferable.

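Now that 16-bit processors are obsolete in general use and have given way to 32-bit and 64-bit processors, reserving address space for I/O is much less of a problem: the addressable space is usually far larger than what all memory and I/O devices in a system need. Memory-mapped I/O has become more common as a result. Even so, neither mapping method is superior in every case, and port-mapped I/O is still the better choice in some situations.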

x86

Memory-mapped I/O is preferred in IA-32 and x86-64 based architectures because the instructions that perform port-based I/O are limited to one register: EAX, AX, and AL are the only registers that data can be moved into or out of, and either a byte-sized immediate value in the instruction or a value in register DX determines which port is the source or destination port of the transfer.[4][5] Since any general-purpose register can send or receive data to or from memory and memory-mapped I/O devices, memory-mapped I/O uses fewer instructions and can run faster than port I/O. AMD did not extend the port I/O instructions when defining the x86-64 architecture to support 64-bit ports, so 64-bit transfers cannot be performed using port I/O.[6]

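IA-32 and x86-64 architectures favour memory-mapped I/O because port I/O instructions are restricted to one register: data can only move through EAX, AX or AL, and the source or destination port is given either by a byte-sized immediate in the instruction or by the value in DX. Any general-purpose register can exchange data with memory and memory-mapped devices, so memory-mapped I/O needs fewer instructions and can run faster than port I/O. AMD did not extend the port I/O instructions to 64-bit ports when it defined x86-64, so 64-bit transfers cannot use port I/O.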
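
These operand restrictions can be written down as a small helper. It is an illustrative sketch, not an assembler: `describe_out` is a hypothetical function that only returns a textual form of the instruction, and it picks the immediate form whenever the port number fits in one byte.

```python
def describe_out(port, width_bytes):
    """Return a textual form of an x86 OUT for the given port and width.

    Illustrative helper only; it does not assemble or execute anything.
    """
    data_register = {1: "AL", 2: "AX", 4: "EAX"}.get(width_bytes)
    if data_register is None:
        raise ValueError("port I/O moves 1, 2 or 4 bytes only")
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port number must fit in DX (16 bits)")
    if port <= 0xFF:
        # The port fits in the byte-sized immediate of the instruction.
        return f"OUT {port:#04x}, {data_register}"
    # Otherwise the port number has to be placed in DX first.
    return f"MOV DX, {port:#06x} ; OUT DX, {data_register}"
```

For example, `describe_out(0x80, 1)` returns `"OUT 0x80, AL"`, while `describe_out(0x3F8, 4)` returns `"MOV DX, 0x03f8 ; OUT DX, EAX"`. A width of 8 bytes raises `ValueError`, reflecting the absence of 64-bit port transfers.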

On newer Intel platforms beginning with 2008 5 series, I/O devices on the chipset directly communicate via a dedicated Direct Media Interface (DMI) bus.[b][7]

Since 2008, on newer Intel platforms the chipset on the motherboard (southbridge/northbridge) exchanges data with the CPU over a dedicated Direct Media Interface (DMI) bus.

Memory barriers

Since the caches mediate accesses to memory addresses, data written to different addresses may reach the peripherals’ memory or registers out of the program order, i.e. if software writes data to an address and then writes data to another address, the cache write buffer does not guarantee that the data will reach the peripherals in that order.[8] Any program that does not include cache-flushing instructions after each write in the sequence may see unintended IO effects if a cache system optimizes the write order. Writes to memory can often be reordered to reduce redundancy or to make better use of memory access cycles without changing the final state of what got stored; whereas, the same optimizations might completely change the meaning and effect of writes to memory-mapped I/O regions.
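Because the cache system mediates access to memory addresses, writes to different addresses may not reach a peripheral's memory or registers in program order. If a program writes to one address and then to another, the data is not guaranteed to arrive at the peripherals in that order. A program that does not issue cache-flushing instructions after its writes may therefore see unintended I/O effects when the cache system optimizes the write order. Reordering writes to memory often reduces redundancy or uses memory cycles better without changing what is finally stored, but the same optimization can completely change the meaning and effect of writes to memory-mapped I/O regions.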
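
A toy write buffer shows why an optimization that is harmless for RAM is not harmless for a device. This is a simplified sketch: `coalesce` and `LoggingDevice` are hypothetical, and real caches and write buffers behave in more complicated ways.

```python
class LoggingDevice:
    """Hypothetical device whose behaviour depends on every write it sees."""

    def __init__(self):
        self.seen = []

    def write(self, address, value):
        self.seen.append((address, value))


def coalesce(writes):
    """Keep only the last write to each address, as a write buffer might."""
    last = {}
    for address, value in writes:
        last.pop(address, None)   # re-insert so order follows the final write
        last[address] = value
    return list(last.items())


program_order = [(0xA000, 1), (0xA001, 9), (0xA000, 2)]

ram = {}
for address, value in coalesce(program_order):
    ram[address] = value          # RAM: final contents are unchanged

device = LoggingDevice()
for address, value in coalesce(program_order):
    device.write(address, value)  # device: one write vanished, order changed
```

The RAM ends up with the same contents as it would in program order, but the device sees two writes instead of three, and in a different order.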

Lack of foresight in the choice of memory-mapped I/O regions led to many of the RAM-capacity barriers in older generations of computers. Designers rarely expected machines to grow to make full use of an architecture’s theoretical RAM capacity, and thus often used some of the high-order bits of the address-space as selectors for memory-mapped I/O functions. For example, the 640 KB barrier in the IBM PC and derivatives is due to reserving the region between 640 and 1024 KB (the 64 KB segments numbered 10 through 15 when counting from 0) for the Upper Memory Area. This choice initially made little impact, but it eventually limited the total amount of RAM available within the 20-bit available address space. The 3 GB barrier and PCI hole are similar manifestations of this with 32-bit address spaces, exacerbated by details of the x86 boot process and MMU design. 64-bit architectures often technically have similar issues, but these only rarely have practical consequences.

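Poorly considered choices of memory-mapped I/O regions caused many of the RAM-capacity barriers in older computers. Designers rarely expected machines to grow enough to use an architecture's full theoretical RAM capacity, so they often used the high-order part of the address space for memory-mapped I/O. The IBM PC, for example, has a 640 KB barrier because the region from 640 KB to 1024 KB is reserved for the Upper Memory Area (which includes I/O and BIOS). The choice had little impact at first, but it eventually limited the RAM available within the 20-bit address space. The 3 GB barrier and the PCI hole are similar problems in 32-bit address spaces. 64-bit architectures technically have similar issues, but these rarely have practical consequences.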
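
The arithmetic behind the 640 KB barrier can be checked directly. This is a worked calculation under the stated figures (20 address bits, 64 KB segments); the variable names are chosen for the example, and segments are counted from 0.

```python
KIB = 1024

address_bits = 20
address_space = 2 ** address_bits          # 1 MiB reachable with 20 address bits
conventional_limit = 640 * KIB             # RAM below the reserved region
upper_memory_area = address_space - conventional_limit

segment_size = 64 * KIB
first_reserved_segment = conventional_limit // segment_size     # counting from 0
last_reserved_segment = (address_space - 1) // segment_size
```

The reserved region starts at address 0xA0000 and spans 384 KB, which is six 64 KB segments (numbers 10 through 15).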

Examples

Address range (hexadecimal)	Size	Device
0000–7FFF	32 KiB	RAM
8000–80FF	256 bytes	General-purpose I/O
9000–90FF	256 bytes	Sound controller
A000–A7FF	2 KiB	Video controller/text-mapped display RAM
C000–FFFF	16 KiB	ROM

A simple system built around an 8-bit microprocessor might provide 16-bit address lines, allowing it to address up to 64 kibibytes (KiB) of memory. On such a system, the first 32 KiB of address space may be allotted to random access memory (RAM), another 16 KiB to read-only memory (ROM) and the remainder to a variety of other devices such as timers, counters, video display chips, sound generating devices, etc.
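A simple system built around an 8-bit microprocessor might provide 16 address lines and so address up to 64 KiB of memory. On such a system the first 32 KiB of address space could go to RAM (random-access memory); of the remaining 32 KiB, 16 KiB could go to ROM (read-only memory) and the rest to various devices (timers, counters, video display chips, sound devices and so on).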

The hardware of the system is arranged so that devices on the address bus will only respond to particular addresses which are intended for them, while all other addresses are ignored. This is the job of the address decoding circuitry, and that establishes the memory map of the system. As a result, the system’s memory map may look like the table above. This memory map contains gaps, which is also quite common in actual system architectures.
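The hardware is arranged so that each device on the address bus responds only to the addresses assigned to it and ignores all others. This is the job of the address decoding circuitry, which establishes the system's memory map. The resulting map may look like the table above; it contains gaps, which is common in real system architectures.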
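
The example memory map can be expressed as a lookup table. This is a minimal sketch: the ranges come from the table above, while the `decode` function is a hypothetical stand-in for the address decoding circuitry.

```python
MEMORY_MAP = [
    (0x0000, 0x7FFF, "RAM"),
    (0x8000, 0x80FF, "General-purpose I/O"),
    (0x9000, 0x90FF, "Sound controller"),
    (0xA000, 0xA7FF, "Video controller/text-mapped display RAM"),
    (0xC000, 0xFFFF, "ROM"),
]


def decode(address):
    """Return (device, offset) for an address, or None if it falls in a gap."""
    for start, end, device in MEMORY_MAP:
        if start <= address <= end:
            return device, address - start
    return None
```

For instance, `decode(0x9010)` yields `("Sound controller", 0x10)`, whereas `decode(0x8100)` and `decode(0xB000)` return `None` because those addresses lie in gaps of the map.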

Assuming the fourth register of the video controller sets the background colour of the screen, the CPU can set this colour by writing a value to the memory location A003 using its standard memory write instruction. Using the same method, graphs can be displayed on a screen by writing character values into a special area of RAM within the video controller. Prior to cheap RAM that enabled bit-mapped displays, this character cell method was a popular technique for computer video displays (see Text user interface).
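Assuming the fourth register of the video controller sets the screen's background colour, the CPU changes that colour by writing a value to address A003 (inside the video controller's range in the table above) with a standard memory write instruction. In the same way, characters can be shown on screen by writing character values into the memory region associated with the video controller. Before cheap RAM made bit-mapped displays practical, this character cell method was a common display technique (see Text user interface).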
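
The write to A003 might look like this in a simulation. This is a sketch under assumptions: `VideoController` is hypothetical, the colour value 0x06 is arbitrary, and the start of the character cells at A400 is invented for the example because the text does not say where they begin.

```python
VIDEO_BASE = 0xA000
BACKGROUND_COLOUR_REGISTER = 3     # fourth register, counting from 0


class VideoController:
    """Hypothetical controller occupying A000-A7FF in the example map."""

    def __init__(self):
        self.cells = bytearray(0x800)   # 2 KiB window: registers and text cells

    def write(self, address, value):
        self.cells[address - VIDEO_BASE] = value & 0xFF

    @property
    def background_colour(self):
        return self.cells[BACKGROUND_COLOUR_REGISTER]


video = VideoController()
video.write(0xA003, 0x06)               # ordinary store to address A003
TEXT_AREA = 0xA400                      # assumed start of the character cells
for position, character in enumerate(b"HI"):
    video.write(TEXT_AREA + position, character)
```

The same `write` call sets the background colour register and places characters in the text area; only the address differs.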

Basic types of address decoding

Address decoding types, in which a device may decode addresses completely or incompletely, include the following:
Depending on whether a device decodes addresses completely, address decoding falls into the following types:

Complete (exhaustive) decoding
1:1 mapping of unique addresses to one hardware register (physical memory location). Involves checking every line of the address bus.
A unique address maps 1:1 to one hardware register of the device. The decoder examines every line of the address bus, that is, it decodes the full address value.
Incomplete (partial) decoding
n:1 mapping of n unique addresses to one hardware register. Partial decoding allows a memory location to have more than one address, allowing the programmer to reference a memory location using n different addresses. It may also be done to simplify the decoding hardware by using simpler and often cheaper logic that examines only some address lines, when not all of the CPU’s address space is needed. Commonly, the decoding itself is programmable, so the system can reconfigure its own memory map as required, though this is a newer development and generally in conflict with the intent of being cheaper.
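n unique addresses map n:1 to the same hardware register. Partial decoding lets one memory location have several addresses, so the programmer can reach it through any of them. When not all of the CPU's address space is needed, examining only some address lines simplifies the decoding circuit (simpler and cheaper logic). The decoder itself is often programmable, so the system can reconfigure its memory map as required, although this is a newer development and generally conflicts with the goal of being cheaper.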
Linear decoding
Address lines are used directly without any decoding logic. This is done with devices such as RAMs and ROMs that have a sequence of address inputs, and with peripheral chips that have a similar sequence of inputs for addressing a bank of registers. Linear addressing is rarely used alone (only when there are few devices on the bus, as using purely linear addressing for more than one device usually wastes a lot of address space) but instead is combined with one of the other methods to select a device or group of devices within which the linear addressing selects a single register or memory location.
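The address lines are used directly, with no decoding logic. This works with devices such as RAMs and ROMs that have a sequence of address inputs, and with peripheral chips that have similar inputs for addressing a bank of registers. Linear addressing is rarely used alone (only when few devices are on the bus, because purely linear addressing for more than one device usually wastes a lot of address space). It is normally combined with another method that selects a device or group of devices, within which the linear addressing selects a single register or memory location.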
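
The three decoding types can be compared in a short sketch. The function names, the 16 address lines, the register address 0x8000 and the mask 0xF000 are assumptions made for the example; real decoders are built from logic gates rather than code.

```python
ADDRESS_LINES = 16


def complete_select(address, register_address):
    """Complete decoding: every address line must match."""
    return address == register_address


def partial_select(address, register_address, checked_mask):
    """Partial decoding: only the lines set in checked_mask are compared."""
    return (address & checked_mask) == (register_address & checked_mask)


def linear_offset(address, low_lines):
    """Linear decoding: the low address lines index straight into the chip."""
    return address & ((1 << low_lines) - 1)


REGISTER = 0x8000
MASK = 0xF000   # hypothetical decoder that looks at the top four lines only
aliases = [a for a in range(1 << ADDRESS_LINES)
           if partial_select(a, REGISTER, MASK)]
```

With only the top four lines checked, `aliases` covers 0x8000 through 0x8FFF, so 4096 addresses select the same register. Complete decoding accepts 0x8000 alone, and `linear_offset(0xA003, 11)` gives 3, the register index inside a 2 KiB chip.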

Port I/O via device drivers

In Windows-based computers, memory can also be accessed via specific drivers such as DOLLx8KD which gives I/O access in 8-, 16- and 32-bit on most Windows platforms starting from Windows 95 up to Windows 7. Installing I/O port drivers will ensure memory access by activating the drivers with simple DLL calls allowing port I/O and when not needed, the driver can be closed to prevent unauthorized access to the I/O ports.
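On Windows, specific drivers such as DOLLx8KD provide 8-, 16- and 32-bit I/O access on most Windows platforms from Windows 95 to Windows 7. Once an I/O port driver is installed, simple DLL calls activate it and enable port I/O; closing the driver when it is not needed prevents unauthorized access to the I/O ports.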

Linux provides the pcimem utility to allow reading from and writing to MMIO addresses. The Linux kernel also allows tracing MMIO access from kernel modules (drivers) using the kernel’s mmiotrace debug facility. To enable this, the Linux kernel should be compiled with the corresponding option enabled. mmiotrace is used for debugging closed-source device drivers.
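Linux provides the pcimem utility for reading and writing MMIO addresses. The Linux kernel can also trace MMIO accesses made by kernel modules (drivers) through its mmiotrace debug facility, but this requires building the kernel with the corresponding option enabled.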