1.5 Caches Matter
An important lesson from this simple example is that a system spends a lot of time moving information from one place to another. The machine instructions in the hello program are originally stored on disk. When the program is loaded, they are copied to main memory. As the processor runs the program, instructions are copied from main memory into the processor. Similarly, the data string "hello, world\n", originally on disk, is copied to main memory and then copied from main memory to the display device. From a programmer's perspective, much of this copying is overhead that slows down the "real work" of the program. Thus, a major goal for system designers is to make these copy operations run as fast as possible.
Because of physical laws, larger storage devices are slower than smaller storage devices. And faster devices are more expensive to build than their slower counterparts. For example, the disk drive on a typical system might be 1,000 times larger than the main memory, but it might take the processor 10,000,000 times longer to read a word from disk than from memory.
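To make the 10,000,000x figure concrete, assume an illustrative main-memory read time of roughly 100 ns (this number is our assumption, not the text's): reading the same word from disk would then take about 100 ns × 10,000,000 = 1 second, an eternity by processor standards.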
Similarly, a typical register file stores only a few hundred bytes of information, as opposed to billions of bytes in the main memory. However, the processor can read data from the register file almost 100 times faster than from memory. Even more troublesome, as semiconductor technology progresses over the years, this processor-memory gap continues to increase. It is easier and cheaper to make processors run faster than it is to make main memory run faster.
To deal with the processor-memory gap, system designers include smaller, faster storage devices called cache memories (or simply caches) that serve as temporary staging areas for information that the processor is likely to need in the near future. An L1 cache on the processor chip holds tens of thousands of bytes and can be accessed nearly as fast as the register file. A larger L2 cache with hundreds of thousands to millions of bytes is connected to the processor by a special bus. It might take 5 times longer for the processor to access the L2 cache than the L1 cache, but this is still 5 to 10 times faster than accessing the main memory. The L1 and L2 caches are implemented with a hardware technology known as static random access memory (SRAM). Newer and more powerful systems even have three levels of cache: L1, L2, and L3. The idea behind caching is that a system can get the effect of both a very large memory and a very fast one by exploiting locality, the tendency for programs to access data and code in localized regions. By setting up caches to hold data that are likely to be accessed often, we can perform most memory operations using the fast caches.
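To see locality pay off in practice, consider the following sketch (our illustration, not an example from the book). It sums the same two-dimensional array twice: first in row-major order, the order in which C stores the elements in memory, and then in column-major order, which strides N*8 bytes between successive reads. The size N = 4096 is an arbitrary choice, made just large enough that the array cannot fit in any cache level.

    /* locality.c - a minimal sketch of the effect of locality.
     * Both loop nests sum the same N x N array of doubles. */
    #include <stdio.h>
    #include <time.h>

    #define N 4096              /* 4096*4096*8 bytes = 128 MiB */

    static double a[N][N];      /* static storage: zero-initialized */

    int main(void) {
        double s1 = 0.0, s2 = 0.0;

        clock_t t0 = clock();
        for (int i = 0; i < N; i++)      /* row-major order: consecutive */
            for (int j = 0; j < N; j++)  /* elements share cache blocks  */
                s1 += a[i][j];

        clock_t t1 = clock();
        for (int j = 0; j < N; j++)      /* column order: successive     */
            for (int i = 0; i < N; i++)  /* reads are N*8 bytes apart,   */
                s2 += a[i][j];           /* so most miss in the caches   */

        clock_t t2 = clock();
        printf("row-wise %.2fs, column-wise %.2fs (sums %.0f %.0f)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC,
               (double)(t2 - t1) / CLOCKS_PER_SEC, s1, s2);
        return 0;
    }

Both loop nests perform exactly the same arithmetic, yet on a typical machine the row-wise loop runs several times faster; the difference comes entirely from how many reads are served by the caches rather than main memory.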
One of the most important lessons in this book is that application programmers who are aware of cache memories can exploit them to improve the performance of their programs by an order of magnitude. You will learn more about these important devices and how to exploit them in Chapter 6.