Posted CAS and Additive Latency (AL)


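This article discusses how DRAM memory controllers use Posted CAS and additive latency (AL) to avoid command bus conflicts: by issuing the read command early and postponing it internally, bubbles in the data stream are eliminated, which improves memory bandwidth.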

For DRAM designers and those who work on memory controllers, it goes without saying that the command bus can carry only one command at a time. The same holds for the time-multiplexed address bus, but at this point we are only concerned with the command bus. Essentially, there are four command lines that are important: RAS, CAS, Chip Select (CS) and Write Enable (WE). Leaving CS out of the picture for now (since it only selects the physical bank out of all DIMMs within the system), each combination of high and low levels on the RAS, CAS and WE lines encodes one command: bank activate (ACT), read, write, precharge or refresh, to name those that matter for the following. Typical command codes on the three lines would be, e.g., 101, 110 or 010, in RAS, CAS, WE order. Keep in mind that we are talking about physical lines from separate pins on the controller to separate pins on the DRAM.
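As a rough illustration of the RAS/CAS/WE matrix, the sketch below decodes the three example codes. The truth table is an assumption taken from the usual active-low SDR/DDR SDRAM convention (1 = high, 0 = low, CS low), not from this article, and it is not exhaustive; the `decode` and `conflicting_lines` helpers are hypothetical names used only for this example.

```python
# Assumed truth table for (RAS#, CAS#, WE#) with CS# low; 1 = high, 0 = low.
# This follows the common SDR/DDR SDRAM convention and is not exhaustive.
COMMANDS = {
    (0, 1, 1): "ACT",
    (1, 0, 1): "READ",
    (1, 0, 0): "WRITE",
    (0, 1, 0): "PRECHARGE",
    (0, 0, 1): "REFRESH",
    (0, 0, 0): "MODE REGISTER SET",
    (1, 1, 0): "BURST TERMINATE",  # SDR/DDR I parts
    (1, 1, 1): "NOP",
}

# Reverse lookup: command name -> line levels.
LEVELS = {name: levels for levels, name in COMMANDS.items()}


def decode(ras: int, cas: int, we: int) -> str:
    """Return the command encoded by the three line levels."""
    return COMMANDS[(ras, cas, we)]


def conflicting_lines(cmd_a: str, cmd_b: str) -> list:
    """List the lines that would have to be high and low at once."""
    names = ("RAS", "CAS", "WE")
    pairs = zip(names, LEVELS[cmd_a], LEVELS[cmd_b])
    return [name for name, a, b in pairs if a != b]


if __name__ == "__main__":
    for code in ("101", "110", "010"):
        ras, cas, we = (int(bit) for bit in code)
        print(code, "->", decode(ras, cas, we))
    print("ACT vs READ conflict on:", conflicting_lines("ACT", "READ"))
```

Under this assumed table, an ACT and a READ on the same clock would disagree on both the RAS and the CAS line, which is why the two cannot share a cycle.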

It is important to understand that only one command can be issued at any time because each of the three command lines can only be either high or low. Any two commands issued on the same clock would cause bus contention, that is, a collision on the command bus (since at least one line would need to be high and low at the same time). For example, in bank interleave mode, a bank activate command to a second or third internal bank on the DRAM chip can be issued after the specified Row-To-Row Delay (tRRD). At the same time, because of the pre-defined RAS-To-CAS delay (tRCD), a read command to a previously activated bank is already scheduled. If the two commands coincide on the same clock, they collide on the command bus. Consequently, the next bank activate command has to be pushed out by one cycle.

The bank activate is the first step in every memory access and, therefore, all subsequent steps, like the read command to the same bank, will be pushed out by the same one clock. Whatever ground has been lost on the bank activate cannot be regained by a faster CAS latency (it is impossible to change CAS latency on demand) and, therefore, there will be a gap or bubble in the data stream, which manifests itself as a memory bandwidth performance hit. We don't like performance hits, do we?

Conventional command issuing (top) compared to Posted CAS mode (bottom)

Out of the four internal banks accessible in bank interleave mode, three are shown in green, blue and purple (commands and resulting data). The clock traces refer to the I/O buffers, not to the core. Act: Bank activate command; Read: Read Command; P-Rd: Posted Read / Posted CAS; D: data output (1 bit/pin, 1 quadword / bus width).

Conventional Operation: Bank activate commands to internal banks are given with a Row-to-Row delay (tRRD) of 2T. In this particular case, tRCD, i.e., the delay until a read command can be given, equals 4T. That means that the first read command (four cycles after the first bank activate) falls onto the same clock as the third bank activate; in other words, the two commands collide on the bus. Consequently, one of the commands needs to be shifted by one clock. Since tRCD is defined in the BIOS setup, whereas bank activate commands are issued whenever a memory access is started, it is the bank activate that will be postponed. Bank activate, however, is the one command at the beginning of every access and, therefore, all subsequent commands to the same bank will be delayed by one cycle as well. This causes a bubble in the data stream. CAS latency equals four cycles here (20 or 15 ns for DDR400 or DDR533, respectively).
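The conventional case can be replayed with a small scheduling sketch. It is a toy model under stated assumptions, not a controller implementation: one command per clock, reads fixed at tRCD after their activate, and a colliding activate slipping to the next free cycle. The function name `conventional_schedule` is made up for this example.

```python
def conventional_schedule(n_banks: int, t_rrd: int, t_rcd: int) -> dict:
    """Toy model of conventional command issue on a one-command-per-clock bus.

    Returns {cycle: command}. Reads are pinned to ACT + tRCD; an ACT that
    would land on an occupied cycle is pushed out until the bus is free.
    """
    bus = {}
    earliest_act = 0
    for bank in range(n_banks):
        act = earliest_act
        while act in bus:          # collision: postpone the bank activate
            act += 1
        bus[act] = f"ACT{bank}"

        read = act + t_rcd         # read follows its own activate by tRCD
        if read in bus:
            raise RuntimeError(f"unexpected collision at cycle {read}")
        bus[read] = f"RD{bank}"

        earliest_act = act + t_rrd  # next activate allowed after tRRD
    return bus


if __name__ == "__main__":
    schedule = conventional_schedule(n_banks=3, t_rrd=2, t_rcd=4)
    for cycle in sorted(schedule):
        print(f"T{cycle}: {schedule[cycle]}")
```

With tRRD = 2 and tRCD = 4, the third activate wants cycle 4, finds the first read there, and slips to cycle 5. The reads then land on cycles 4, 6 and 9 instead of 4, 6 and 8, which is the one-cycle bubble described above.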

Posted CAS Operation: Bank activate and read commands (CAS) to the same bank are issued by the controller back-to-back on consecutive cycles. In this case, all activate commands fall on even cycles whereas the read commands are always on odd cycles. Internally, the read commands (CAS commands) are held and then executed after a predefined additive latency (AL) as a postponed read (P-Rd) or Posted CAS. Since the Posted CAS does not require any external command when it finally executes, the bus is free to carry a new activate command on the same clock. In summary, instead of waiting a normal tRCD before issuing the read command, the controller issues it after a single cycle, and the DRAM adds the internal delay (hence the name additive latency, AL) to make up the equivalent of the RAS-To-CAS delay, with no need for an additional read command on the bus. This avoids bus collisions.

The solution to this problem is to issue the commands in the form of bundles, that is, a read command is issued on the cycle immediately after the bank activate command that it belongs to. A command buffer on the DRAM chip holds the command and schedules it internally without any further input from the command bus. This means that the command bus is free to activate another bank. This mode of operation, using an early issued but internally postponed read (or CAS) command, is called Posted CAS, where the delay or additive latency (AL) is specified by the mode register set (MRS) during initialization of the DRAM chip.

The consequence is that bank activate and read commands that belong together can be issued on consecutive clock cycles and immediately thereafter free up the bus for the next frame information structure (oops, that was serial ATA). The net effect of Posted CAS and AL is that there will be no command bus collisions and, thus, no bubbles in the data stream.
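A matching sketch for Posted CAS, again a toy model rather than a real controller: each activate is followed on the next clock by its posted read, and the DRAM executes that read internally AL cycles later. The choice AL = tRCD - 1 is an assumption made here so that one cycle plus AL equals tRCD; the function name `posted_cas_schedule` is hypothetical.

```python
def posted_cas_schedule(n_banks: int, t_rrd: int, t_rcd: int):
    """Toy model of Posted CAS command issue.

    Returns (bus, internal_reads): bus maps cycle -> external command,
    internal_reads maps bank -> cycle at which the DRAM executes the read.
    Assumes AL = tRCD - 1, so that 1 cycle + AL equals tRCD.
    """
    al = t_rcd - 1
    bus = {}
    internal_reads = {}
    for bank in range(n_banks):
        act = bank * t_rrd         # activates are never postponed
        posted = act + 1           # read issued right behind its activate
        for cycle in (act, posted):
            if cycle in bus:
                raise ValueError(f"command bus collision at cycle {cycle}")
        bus[act] = f"ACT{bank}"
        bus[posted] = f"P-RD{bank}"
        # The posted read is held on-chip and needs no bus cycle later on.
        internal_reads[bank] = posted + al
    return bus, internal_reads


if __name__ == "__main__":
    bus, reads = posted_cas_schedule(n_banks=3, t_rrd=2, t_rcd=4)
    for cycle in sorted(bus):
        print(f"T{cycle}: {bus[cycle]}")
    print("internal read cycles:", reads)
```

For tRRD = 2 and tRCD = 4, the activates occupy cycles 0, 2 and 4, the posted reads occupy 1, 3 and 5, and the internal reads execute on cycles 4, 6 and 8. The third activate shares cycle 4 with the first internal read without any conflict, because only the activate is on the bus.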

Update

Posted CAS and Additive Latency are optional features that are supported by the DRAM devices but do not necessarily have to be used by any given controller. IBM's memory controllers are apparently using Posted CAS and AL; likewise, the features appear to be used in graphics cards. However, to the best of our knowledge, none of Intel, nVidia, ATI or AMD is using Posted CAS / AL on mainstream chipsets for the PC-Workstation platform. In this case, the AL feature is simply set to "0" and a conventional Read command is given after tRCD has been satisfied.

Variable Write Latency

Conventional SDRAM, including DDR I, uses random accesses, as the name implies. This means that the controller is free to write to any location within the physical memory space, which, in most cases, means that it will write to whichever page is open and to the column address closest to the (CAS) strobe. The result is a write latency of 1T, as opposed to read or CAS-Latency values of 2, 2.5 or 3. DDR2 changes this: the write latency is the Read Latency (RL) minus 1T.

That means that at CAS-4 and AL-3, for a combined read latency of RL=7, the write latency will be 6T. This sounds somewhat worse than it is, especially compared to the 1T in DDR I, but one needs to consider that, just like a read command, a write command will be issued early and will be using Posted CAS. That is, the write command abides by the same rules as the read command, only that the "Write Enable" signal is a logical "true" in this case. Effectively, therefore, the CAS latency is the important timing parameter to determine write latency: subtracting the three cycles of AL that the posted command spends waiting inside the DRAM from the nominal 6T leaves an effective write latency of 3T in the above example. This is only 3 times as long as the equivalent latency in DDR I. It will be very interesting to look at integrated graphics using UMA and DDR2 but it appears as if interesting is spelled u g l y.
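The latency arithmetic from this section can be written out in a few lines. This is only a restatement of the relations given in the text (RL = AL + CL, write latency = RL - 1T); the function name `ddr2_latencies` and the "effective" figure, counted from the end of the AL wait, are labels chosen for this example.

```python
def ddr2_latencies(cl: int, al: int):
    """Return (read latency, write latency, effective write latency) in clocks.

    RL = AL + CL and WL = RL - 1, as stated in the text. The effective
    write latency excludes the AL cycles the posted command waits on-chip.
    """
    rl = al + cl
    wl = rl - 1
    effective_wl = wl - al         # equals CL - 1
    return rl, wl, effective_wl


if __name__ == "__main__":
    rl, wl, eff = ddr2_latencies(cl=4, al=3)
    print(f"RL={rl}T, WL={wl}T, effective WL={eff}T")
```

For CAS-4 and AL-3 this gives RL = 7T, a nominal write latency of 6T and an effective write latency of 3T. With AL set to 0, as on controllers that do not use Posted CAS, the nominal and effective write latencies coincide at CL - 1.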