Posted CAS and Additive Latency (AL)

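This article looks at how DRAM memory controllers use Posted CAS and additive latency (AL) to avoid command bus conflicts. By issuing the read command early and postponing it internally, bubbles in the data stream are eliminated and memory bandwidth improves.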

For DRAM designers and those who work on memory controllers, it goes without saying that the command bus can carry only one command at a time. The same holds for the time-multiplexed address bus, but at this point we are only concerned with the command bus. Essentially, four command lines matter: RAS, CAS, Chip Select (CS) and Write Enable (WE). Leaving CS out of the picture for now (since it only selects the physical bank out of all DIMMs within the system), each combination of high and low levels on the RAS, CAS and WE lines encodes one command: bank activate (ACT), read, write, precharge or refresh, to name the ones that matter for what follows. Typical command signals on the three lines would be, e.g., 101, 110 or 010 in RAS, CAS, WE order. Keep in mind that we are talking about physical lines running from separate pins on the controller to separate pins on the DRAM.
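To make the "one command per clock" point concrete, here is a minimal Python sketch. It assumes the usual SDRAM encoding in which the lines are active-low and CS is asserted; the table is not taken from this article, it leaves out mode register set and other codes, and the helper names `decode` and `conflicting_lines` are made up for illustration.

```python
# Assumed SDRAM-style encoding with CS asserted; 1 = high, 0 = low.
# Tuple order is (RAS, CAS, WE). Not exhaustive.
COMMANDS = {
    (0, 1, 1): "ACT",        # bank activate
    (1, 0, 1): "READ",
    (1, 0, 0): "WRITE",
    (0, 1, 0): "PRECHARGE",
    (0, 0, 1): "REFRESH",
    (1, 1, 1): "NOP",
}
ENCODING = {name: code for code, name in COMMANDS.items()}


def decode(ras, cas, we):
    """Return the command name for one set of line levels."""
    return COMMANDS.get((ras, cas, we), "OTHER")


def conflicting_lines(cmd_a, cmd_b):
    """List the lines that two commands would drive to opposite levels."""
    lines = ("RAS", "CAS", "WE")
    pairs = zip(lines, ENCODING[cmd_a], ENCODING[cmd_b])
    return [line for line, a, b in pairs if a != b]


print(decode(1, 0, 1))
print(conflicting_lines("ACT", "READ"))
```

Any two different commands disagree on at least one line, which is why they cannot share a clock.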

It is important to understand that only one command can be issued at any time because each of the three command lines can only be either high or low. Any two commands issued on the same clock would cause bus contention, a so-called collision (since at least one line would need to be high and low at the same time). For example, in bank interleave mode, a bank activate command to a second or third internal bank on the DRAM chip can be issued after the specified Row-to-Row Delay (tRRD). Meanwhile, a read command to the first bank is already scheduled for a fixed clock because of the pre-defined RAS-to-CAS delay (tRCD). If the two commands fall on the same clock, they conflict or collide with each other on the command bus. Consequently, the next bank activate command has to be pushed out by one cycle.

The bank activate is the first step in every memory access and, therefore, all subsequent steps, like a read command to the same bank, will be pushed out by the same one clock. Whatever ground has been lost on the bank activate cannot be regained by a faster CAS latency (it is impossible to change CAS latency on demand) and, therefore, there will be a gap or bubble in the data stream, which shows up as a memory bandwidth performance hit. We don't like performance hits, do we?

Conventional command issuing (top) compared to Posted CAS mode (bottom)

Out of the four internal banks accessible in bank interleave mode, three are shown in green, blue and purple (commands and resulting data). The clock traces refer to the I/O buffers, not to the core. Act: Bank activate command; Read: Read Command; P-Rd: Posted Read / Posted CAS; D: data output (1 bit/pin, 1 quadword / bus width).

Conventional Operation: Bank activate commands to internal banks are given with a Row-to-Row delay (tRRD) of 2T. In this particular case, tRCD, i.e., the delay until a read command can be given, equals 4T. That means that the first read command (four cycles after the first bank activate) falls onto the same clock as the third bank activate; in other words, the commands conflict with each other or collide on the bus. Consequently, one of the commands needs to be shifted by one clock. Since tRCD is defined in the BIOS setup, whereas bank activate commands are issued whenever a memory access is started, it is the bank activate that will be postponed. Bank activate, however, is the one command at the beginning of every access and, therefore, all subsequent commands to the same bank will be delayed by one cycle as well. This causes a bubble in the data stream. CAS latency equals four cycles here (20 or 15 ns for DDR400 or DDR533, respectively).
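The collision described for conventional operation can be reproduced with a small Python sketch. It is only a toy model of a single command bus using the numbers above (tRRD = 2T, tRCD = 4T, three banks); the function name `conventional_schedule` and the rule "the bank activate yields to an already scheduled read" are illustrative assumptions, not a description of any real controller.

```python
def conventional_schedule(num_banks=3, t_rrd=2, t_rcd=4):
    """Place ACT and READ commands on one command bus.

    Returns a dict mapping clock cycle -> command label.
    """
    bus = {}
    earliest_act = 0
    for bank in range(num_banks):
        act = earliest_act
        # A read that is already scheduled keeps its slot; the ACT waits.
        while act in bus:
            act += 1
        bus[act] = f"ACT{bank}"
        # The read follows its own ACT after the fixed RAS-to-CAS delay.
        read = act + t_rcd
        assert read not in bus
        bus[read] = f"READ{bank}"
        earliest_act = act + t_rrd
    return bus


for cycle, command in sorted(conventional_schedule().items()):
    print(cycle, command)
```

In this model the third activate slips from clock 4 to clock 5, so the reads land on clocks 4, 6 and 9 instead of being evenly spaced. That uneven spacing is the bubble.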

Posted CAS Operation: Bank activate and read commands (CAS) to the same bank are issued by the controller back-to-back on consecutive cycles. In this case, all activate commands fall on even cycles whereas the read commands always fall on odd cycles. Internally, the read commands (CAS commands) are held and then executed after a predefined additive latency (AL) as a posted read (P-Rd) or Posted CAS. Since the Posted CAS does not require any external command at that point, the bus is free to carry a new activate command on the same clock. In summary, instead of a normal tRCD, we have a single-cycle delay for the read command, to which we add the internal delay (hence the name additive latency, AL) to get the equivalent of a RAS-to-CAS delay, with no need for an additional read command. This avoids the bus collision.

The solution to this problem is to issue the commands in the form of bundles, that is, a read command is issued on the cycle immediately after the bank activate command that it belongs to. A command buffer on the DRAM chip holds the read command and schedules it internally without any further input from the command bus. This means that the command bus is free to activate another bank. This mode of operation, using an early issued but internally postponed read (or CAS) command, is called Posted CAS. The delay, or additive latency (AL), is specified through the mode register set (MRS) during initialization of the DRAM chip.
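As a rough illustration of the bundled issue order, the following Python sketch models the command bus for three banks with tRRD = 2T. It assumes AL is programmed to tRCD minus one clock, so that the posted read executes internally exactly tRCD after its activate; that choice, the function name `posted_cas_schedule` and the labels are assumptions made for this example.

```python
def posted_cas_schedule(num_banks=3, t_rrd=2, t_rcd=4):
    """Model ACT plus posted READ bundles on one command bus.

    Returns (bus, internal_reads):
      bus            maps clock cycle -> external command label
      internal_reads maps bank -> clock on which the DRAM executes the read
    """
    additive_latency = t_rcd - 1  # assumption: 1T + AL equals tRCD
    bus = {}
    internal_reads = {}
    for bank in range(num_banks):
        act = bank * t_rrd
        posted = act + 1  # read is posted on the very next clock
        assert act not in bus and posted not in bus  # no bus collision
        bus[act] = f"ACT{bank}"
        bus[posted] = f"P-RD{bank}"
        # The DRAM holds the read and executes it AL clocks later.
        internal_reads[bank] = posted + additive_latency
    return bus, internal_reads


bus, internal_reads = posted_cas_schedule()
print(sorted(bus.items()))
print(internal_reads)
```

With these numbers the external commands occupy clocks 0 through 5 without overlap, and the internal reads execute on clocks 4, 6 and 8, evenly spaced at tRRD.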

The consequence is that bank activate and read commands that belong together can be issued on consecutive clock cycles and immediately thereafter free up the bus for the next frame information structure (oops, that was Serial ATA), that is, for the next command. The net effect of Posted CAS and AL is that there will be no command bus collisions and, thus, no bubbles in the data stream.

Update

Posted CAS and Additive Latency are optional features that are supported by the DRAM devices but do not necessarily have to be used by any given controller. IBM's memory controllers are apparently using Posted CAS and AL, and the features also appear to be used in graphics cards. However, to the best of our knowledge, none of Intel, nVidia, ATI or AMD is using Posted CAS / AL on mainstream chipsets for the PC/workstation platform. In this case, the AL feature is simply set to "0" and a conventional read command is given after tRCD has been satisfied.

Variable Write Latency

Conventional SDRAM, including DDR I, uses random accesses, as the name implies. This means that the controller is free to write to any location within the physical memory space, which, in most cases, means that it will write to whichever page is open and to the column address closest to the (CAS) strobe. The result is a write latency of 1T, as opposed to read or CAS latency values of 2, 2.5 or 3. In DDR2 this changes: the write latency (WL) is the read latency (RL) minus 1T.

That means that at CAS-4 and AL-3, for a combined read latency of RL = 7, the write latency will be 6T. This sounds somewhat worse than it is, especially compared to the 1T in DDR I, but one needs to consider that, just like a read command, a write command will be issued early using Posted CAS. That is, the write command abides by the same rules as the read command, except that the "Write Enable" signal is a logical "true" in this case. Effectively, therefore, the CAS latency is the timing parameter that determines write latency: once the AL of 3T is discounted, the write latency in the above example is 3T, i.e., the CAS latency minus 1T. This is only 3 times as long as the equivalent latency in DDR I. It will be very interesting to look at integrated graphics using UMA and DDR2, but it appears as if interesting is spelled u g l y.
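The arithmetic in this example can be written out as a short Python sketch. It simply encodes the relations stated above (RL = AL + CL, WL = RL minus 1T, and the effective write latency once AL is discounted); the function name `ddr2_latencies` and the dictionary keys are made up for illustration.

```python
def ddr2_latencies(cas_latency, additive_latency):
    """Return DDR2 read/write latencies in clock cycles (T)."""
    read_latency = additive_latency + cas_latency
    write_latency = read_latency - 1
    # Write latency as seen from the internally posted command,
    # i.e., with the additive latency taken out again.
    effective_write_latency = write_latency - additive_latency
    return {
        "RL": read_latency,
        "WL": write_latency,
        "effective_WL": effective_write_latency,
    }


print(ddr2_latencies(cas_latency=4, additive_latency=3))
```

With AL set to 0, as on controllers that do not use Posted CAS, the same function gives RL = 4 and WL = 3 for CAS-4.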
