FPGA clock resources

A good article, cited from: https://www.allaboutcircuits.com/technical-articles/clock-management-clock-resources-of-fpgas/

This article will review the common FPGA resources that allow us to efficiently generate and distribute clock signals throughout a system.

There are many circumstances in which we need to generate new clock signals and efficiently distribute them throughout a system. Modern FPGAs have dedicated clock management blocks that allow us to perform these two tasks.

In this article, we’ll first briefly review the issues of efficient clock distribution and new clock generation. Then, we’ll see that these two design requirements can be easily handled by the clock management blocks of modern FPGAs.

Efficient Clock Distribution

Even in a small digital design, the clock signals may be distributed to hundreds of clocked elements throughout the system. These high-fanout clock signals are responsible for synchronizing different subsystems or components of the system. That’s why we need to pay careful attention to the delay introduced by different paths of the clock network.

As an example, consider distributing a clock signal from node A to nodes B and C as shown in Figure 1 below. In this figure, the clock signal enters the FPGA and goes through a buffer (shown as a triangle in the figure) and then reaches a register at node B. This path has an inherent delay which is shown by Δb in the waveforms. To distribute the clock from A to C, we could have used a PCB trace without going through the FPGA but, in the figure, the clock distributed in the FPGA is used for the “Other Device on Board”. The delay of this path is represented by Δc. Since Δb and Δc are not equal in general, the register at node B and the device at node C will see slightly different clocks; this fixed time offset between identical clocks that are supposed to have aligned edges is called clock skew. We can see, then, that one serious problem in digital design is distributing high-fanout clock signals across a system with the minimum possible clock skew.
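To make the definition concrete, here is a minimal sketch of the skew calculation in Python. The two path delays are made-up values chosen for illustration; they are not taken from Figure 1 or from any datasheet.

```python
def clock_skew(delay_b_ns: float, delay_c_ns: float) -> float:
    """Return the skew between two clock arrival points, in nanoseconds.

    A positive result means node B sees the clock edge later than node C.
    """
    return delay_b_ns - delay_c_ns


# Hypothetical path delays from node A (illustrative values only).
delta_b = 1.8  # ns, A -> register at node B
delta_c = 2.5  # ns, A -> device at node C

skew = clock_skew(delta_b, delta_c)
print(f"skew = {skew:+.2f} ns")
```

With these assumed numbers, node B sees each edge slightly before node C, and that fixed offset is the skew the rest of the article tries to remove.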

 

Figure 1. The delay in distributing the clock signal. Image courtesy of Xilinx.

 

Generating New Clock Signals

A desired feature of an FPGA can be the ability to modify a given clock signal to generate new clocks based on the requirements of the system. For example, assume you’ve designed a board with an FPGA. Suppose that a 50 MHz clock is sufficient for the board but, in the FPGA, you need a 200 MHz clock to perform a particular algorithm. If you could somehow generate a 200 MHz clock from the 50 MHz clock on the board, you could avoid using two different clock sources in your system, and you also would not have to operate the board at a higher-than-necessary frequency. We can identify another application of clock synthesis if we recall that some digital signal processing (DSP) applications are multirate and require different clock frequencies in different parts of the algorithm.
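The 50 MHz to 200 MHz example boils down to finding a multiply/divide pair. The sketch below, in Python, reduces the frequency ratio to its smallest integer form. The function name `synthesis_ratio` is hypothetical, and the sketch ignores the constraints a real clock management block imposes (allowed ranges for the multiplier, the divider, and the internal oscillator frequency).

```python
from fractions import Fraction


def synthesis_ratio(f_in_hz: int, f_out_hz: int) -> tuple[int, int]:
    """Return the smallest (multiply, divide) pair with f_out = f_in * M / D."""
    ratio = Fraction(f_out_hz, f_in_hz)
    return ratio.numerator, ratio.denominator


# The example from the text: a 200 MHz clock from a 50 MHz board clock.
m, d = synthesis_ratio(50_000_000, 200_000_000)
print(m, d)  # 4 1

# A non-integer ratio (assumed target): 125 MHz from the same 50 MHz clock.
print(synthesis_ratio(50_000_000, 125_000_000))  # (5, 2)
```

On a real device the vendor's clocking wizard makes this choice, because several M/D pairs can give the same output frequency and only some of them keep the internal oscillator inside its legal range.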

Now, let’s look at the clock-related resources of FPGAs that allow us to generate a new clock signal and distribute it across the system.

Clock Management Resources of FPGAs

Today’s FPGAs incorporate powerful clock management blocks to facilitate the design process and reduce costs. We’ll refer to these embedded clock management blocks as CMBs. Different vendors use different terms to refer to their CMBs. For example, Xilinx uses clock management tile (CMT) or digital clock manager (DCM), Intel uses the well-known term phase-locked loop (PLL), and Microsemi uses clock conditioning circuitry.

The CMBs can generate new clock signals by performing clock multiplication and division. They may be able to apply a programmable phase shift to a generated clock or even adjust the duty cycle of a clock. CMBs can de-skew not only the clock signals distributed within the FPGA but also the clocks distributed out of the FPGA to other devices on the board (such as the clock going to node C in Figure 1).

One application for the programmable phase shift feature is synchronizing the clock signal with input data. When we’re working with digital signals, we have to sample the signal at moments when the data is stable (after the last data transition and before the next transition). That’s why some FPGAs provide a dynamic mechanism for fine phase shift adjustment of the generated clock. For example, the CMBs of the Xilinx Virtex-II Pro can adjust the phase shift in steps of 1/256th of the clock period. Without this phase shift capability, the clock edge may not fall at the center of the data eye.
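To get a feel for what a 1/256-of-a-period granularity means in time units, here is a small Python sketch. The 200 MHz clock and the quarter-period target shift are assumptions picked for the example; only the 1/256 step size comes from the text.

```python
def phase_step_ps(clock_hz: float, steps_per_period: int = 256) -> float:
    """Size of one phase-shift step in picoseconds."""
    period_ps = 1e12 / clock_hz
    return period_ps / steps_per_period


def steps_for_shift(clock_hz: float, shift_ps: float, steps_per_period: int = 256) -> int:
    """Number of steps that comes closest to the requested shift."""
    return round(shift_ps / phase_step_ps(clock_hz, steps_per_period))


clock_hz = 200e6                   # assumed clock frequency (5000 ps period)
print(phase_step_ps(clock_hz))     # 19.53125

# Assumed goal: move the sampling edge by a quarter period (1250 ps).
print(steps_for_shift(clock_hz, 1250.0))  # 64
```

A real device may also limit the total shift range, so the step count computed here is only the resolution side of the picture.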

A CMB with the duty cycle adjustment feature can be helpful in applications that need a clock signal with a 50% duty cycle. Examples are high-speed communications such as LVDS and double data rate (DDR) interfaces. If the clock duty cycle is not 50%, we can use the duty cycle correction feature of a CMB to reshape the clock signal to have a 50% duty cycle.

To obtain these features, FPGAs use CMBs based on a phase-locked loop (PLL) or a delay-locked loop (DLL). In addition to the PLLs and DLLs, there’s another very important group of clock-related FPGA resources: the dedicated buffers and routes for distributing the clock signals in the FPGA. These dedicated buffers and routes enable us to have a low-skew clock network.

In the rest of the article, we’ll first review the basic concepts of using a PLL/DLL to implement a CMB. Then, we’ll look at the low-skew clock network employed by the Intel FPGAs.

DLL-Based Clock De-skew

Figure 2 shows the basic block diagram of a DLL used to compensate for clock distribution delay.

 

Figure 2. Block diagram showing the use of a DLL to compensate for clock skew. Image courtesy of Xilinx.

 

In this figure, CLKIN is the input clock that we intend to distribute through the “Clock Distribution Network”. As the name suggests, the “Variable Delay Line” block introduces an adjustable delay to CLKIN and produces CLKOUT. The amount of delay is determined by the “Control” logic. The “Control” block monitors its inputs and varies the delay so that the rising edges of the two inputs, CLKIN and CLKFB, are aligned. In this way, the circuit compensates for the delay in the clock distribution network, and theoretically we can de-skew the clock distribution.
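The behavior of the “Control” block can be sketched as a simple search. The Python model below is a deliberately simplified assumption, not the circuit used in any particular device: it steps a tapped delay line one tap at a time and stops when CLKFB is within half a tap of a CLKIN rising edge. The period, network delay, and tap size are illustrative values in integer picoseconds.

```python
def dll_lock(period_ps: int, network_delay_ps: int, tap_ps: int, max_taps: int = 1024):
    """Step the delay line one tap at a time until CLKFB lines up with CLKIN.

    Returns (taps, residual_error_ps). All times are integer picoseconds.
    """
    for taps in range(max_taps + 1):
        # CLKFB = CLKIN delayed by the delay line plus the distribution network.
        total_delay = taps * tap_ps + network_delay_ps
        # Fold the delay into one period to get CLKFB's phase relative to CLKIN.
        phase = total_delay % period_ps
        # Distance to the nearest CLKIN rising edge (at 0 or at one full period).
        error = min(phase, period_ps - phase)
        if 2 * error <= tap_ps:
            return taps, error
    raise RuntimeError("delay line is too short to reach lock")


taps, error = dll_lock(period_ps=10_000, network_delay_ps=3_200, tap_ps=50)
print(taps, taps * 50, error)  # 136 6800 0
```

In this model the loop settles on a delay-line setting of 6800 ps, so the delay line plus the 3200 ps network add up to exactly one clock period. An actual DLL only observes whether CLKFB is early or late, but the end state is the same.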

Note that the “Variable Delay Line” cannot apply a negative delay to CLKIN to compensate for the inherent delay of the “Clock Distribution Network”. So, how can the block diagram of Figure 2 eliminate the clock skew? To clarify this, let’s consider the clock skew example of Figure 1. The path from A to B has a delay of Δb; therefore, we would need to apply a delay of −Δb (i.e., a negative delay) to the clock at node A and then distribute it through the path with delay Δb. How can we circumvent the problem of creating a negative delay? The key point is that the clock signal is a periodic waveform. Assume the period is T. We know that a periodic function satisfies

f(t)=f(t+T)

Applying a negative delay of −Δb, we have

f(t−Δb)=f(t+T−Δb)

Therefore, instead of applying a negative delay of −Δb, we can apply the positive delay of T−Δb. This is shown in Figure 3 below; note that Figure 3 illustrates the delay that the DLL applies to the original clock signal, in contrast to Figure 1, which shows the delay caused by the path from A to B and from A to C.
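The same argument can be checked numerically. The Python sketch below computes the positive delay T−Δb and confirms that, after the additional path delay Δb, an ideal clock is indistinguishable from the undelayed one. The period and path delay are assumed values, and times are integer picoseconds so that the edge comparisons are exact.

```python
def equivalent_positive_delay(period_ps: int, path_delay_ps: int) -> int:
    """Positive delay that acts like a delay of -path_delay_ps on a periodic clock."""
    return (-path_delay_ps) % period_ps


def clock(t_ps: int, period_ps: int) -> int:
    """Ideal 50% duty-cycle clock: 1 in the first half of each period, else 0."""
    return 1 if 2 * (t_ps % period_ps) < period_ps else 0


T = 10_000        # ps, assumed clock period
delta_b = 3_200   # ps, assumed delay of the path from A to B

dll_delay = equivalent_positive_delay(T, delta_b)
print(dll_delay)  # 6800

# Clock at node B = clock at node A delayed by the DLL and then by the path.
aligned = all(
    clock(t - dll_delay - delta_b, T) == clock(t, T)
    for t in range(0, 3 * T, 100)
)
print(aligned)  # True
```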

 

Figure 3. For a periodic signal, an appropriate positive delay can function like a negative delay. Image courtesy of Xilinx.

PLL-Based Clock De-skew

Instead of using a DLL, we can use a PLL to effectively eliminate the delay of the clock distribution network. This is illustrated in Figure 4.

 

Figure 4. PLL-based clock de-skew. Image courtesy of Xilinx.

In this case, a “Voltage Controlled Oscillator” is used instead of a “Variable Delay Line”. The feedback loop adjusts the frequency of the oscillator such that the two clock signals CLKIN and CLKFB have aligned edges.

We can easily stabilize the feedback loop of a DLL; however, this is not the case with a PLL, because the clock distribution network will be within the PLL's feedback loop. That’s why a DLL-based structure more easily lends itself to clock de-skew applications. On the other hand, a PLL is more flexible when synthesizing new clock signals.

Based on our discussion, we can use the CMBs of FPGAs to de-skew the clock distribution of Figure 1, as shown in the simplified block diagram of Figure 5 below. Note that the CMBs can de-skew not only the clock signals distributed within the FPGA but also the clocks going to other devices on the board.

 

Figure 5. Using CMBs to perform clock de-skew. Image courtesy of Xilinx.

 

We saw that the CMBs can be used to synthesize new clock signals. Now, let’s look at another group of clock-related FPGA resources: the dedicated buffers and routes that can be used for distributing the clock signals in the FPGA.

Clock-Related Buffers and Routes

These resources are responsible for distributing high-fanout clock signals with a low skew. Going through the details of these buffers and routes can be overwhelming but we’ll have a brief review of some important concepts. To distribute a clock signal to all parts of a chip with equal propagation delays, we can use a special form of routing called an H tree. You can see an example in Figure 6.

 

Figure 6. An H tree. Image courtesy of IEEE.

 

You can easily verify that there’s a similar path from CLK-in to each of the rectangles (which represent a clocked element). Therefore, theoretically, all the clocked elements will see the same clock signal. Figure 7 shows the H tree that some Intel FPGAs use to distribute a global clock signal (GCLK).
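The equal-path property is easy to check with a few lines of Python. The sketch below builds an idealized H tree recursively and records the wire length from the root to every leaf. The geometry (each level halves the arm length, and wires run first horizontally, then vertically) is an assumption made for illustration, not the layout of any specific device.

```python
def h_tree_leaves(x: float, y: float, half: float, depth: int, length: float = 0.0):
    """Yield (leaf_x, leaf_y, wire_length_from_root) for an idealized H tree.

    Each level draws one 'H' centred at (x, y): a horizontal bar reaching
    `half` to each side, with vertical bars reaching `half` up and down at
    both ends. The four tips of the H are the centres of the next level.
    """
    if depth == 0:
        yield (x, y, length)
        return
    for dx in (-half, half):
        for dy in (-half, half):
            # Route to a tip: along the horizontal bar, then along the vertical bar.
            yield from h_tree_leaves(x + dx, y + dy, half / 2, depth - 1, length + 2 * half)


leaves = list(h_tree_leaves(0.0, 0.0, 4.0, depth=3))
print(len(leaves))                    # 64
print({leaf[2] for leaf in leaves})   # {14.0}
```

All 64 leaves report the same wire length, which is why an ideal H tree delivers the clock with zero skew. In silicon, load differences and process variation keep the real skew from being exactly zero.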

 

Figure 7. The H tree used to distribute GCLK in some Intel FPGAs. Image courtesy of ACM.

GCLKs are driven throughout the device and serve as low-skew clock sources for functional blocks such as adaptive logic modules (ALMs), digital signal processing (DSP) subcircuits, embedded memory, and PLLs.

In addition to GCLKs, there are regional clock (RCLK) networks in the Intel FPGAs. As shown in Figure 8, these RCLKs are driven into only one quadrant of the chip.

 

Figure 8. Intel quadrant clock networks (RCLK). Image courtesy of ACM.

The Intel devices also have periphery clock (PCLK) networks that are distributed in a smaller area of the chip. An example is shown in Figure 9 below.

Figure 9. The Intel periphery clocks (PCLK). Image courtesy of ACM.

 

Now, overlaying all of these networks, we get the network shown in Figure 10.

Figure 10. The clock network of an Intel FPGA combines GCLK, RCLK, and PCLK networks. Image courtesy of ACM.

As shown in Figures 8 and 9, FPGAs have dedicated clock routes that are distributed in just one region of the chip. These are called clock regions. Note that different devices have different clock regions. Also note that we need some buffers to efficiently distribute a clock signal along the dedicated routes that we discussed above. You can find the details in your device datasheet, but one example is shown in Figure 11. This figure shows one clock region of a Xilinx 7 series FPGA. BUFG, BUFH, and BUFR are three clock-related buffers that are shown in the figure.

 

Figure 11. There are several clock-related buffers in each clock region of the Xilinx 7 series FPGAs. Image courtesy of Xilinx.

Here are some concluding notes on FPGA clock resources:

  1. We can use the clock-related buffers to dynamically enable and disable the clock of a particular region of the chip. This clock enable option can be used as a clock gating technique to reduce the power consumption of a design. For more details refer to the Xilinx 7 series Clocking Resources User Guide or Intel’s Clock Control Block IP Core User Guide.
  2. The number of high-performance buffers and routing resources is limited, so they should be carefully managed. If there are several clocks in your design, try to assign high-speed and high-fanout clocks to higher-performance global clock routes and resources. For more details, refer to Chapter 9 of Rapid System Prototyping with FPGAs: Accelerating the Design Process.
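The second note can be illustrated with a toy planning script in Python. Everything in it is hypothetical: the `Clock` record, the clock names and numbers, and the ranking by frequency times fanout are assumptions made for the sketch, not the algorithm used by any vendor's place-and-route tool.

```python
from dataclasses import dataclass


@dataclass
class Clock:
    name: str
    freq_mhz: float
    fanout: int


def assign_clock_routes(clocks: list[Clock], global_slots: int) -> dict[str, str]:
    """Give the limited global routes to the fastest, highest-fanout clocks."""
    # Assumed priority metric: frequency times fanout, highest first.
    ranked = sorted(clocks, key=lambda c: c.freq_mhz * c.fanout, reverse=True)
    return {
        c.name: ("global" if rank < global_slots else "regional")
        for rank, c in enumerate(ranked)
    }


# Hypothetical design with four clocks and only two free global routes.
design = [
    Clock("sys_clk", 200.0, 12_000),
    Clock("dsp_clk", 400.0, 3_000),
    Clock("spi_clk", 10.0, 40),
    Clock("dbg_clk", 50.0, 200),
]
plan = assign_clock_routes(design, global_slots=2)
print(plan)
```

In practice the assignment is expressed through buffer instantiation and constraints, and the tool's clock utilization report shows what was actually used.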


In digital IC design, global clock nets ending up on local routing resources is usually tied to the causes and solutions below.

### 1. Symptoms

When a chip's global clock network does not use the dedicated clock resources as expected and instead occupies the local routing resources between ordinary logic cells, you may see:

- Increased clock delay (typically up to 1-5 ns)
- Significantly larger clock skew (for example, from 50 ps to 300 ps)
- A higher risk of timing violations (setup/hold)

### 2. Root causes

1. **The clock is not constrained correctly**
   - The synthesis tool does not recognize the signal as a critical clock.
   - Example: no `create_clock` constraint defines the primary clock.

2. **The clock path passes through general-purpose logic**

   ```verilog
   assign clk_div = counter[3]; // divided clock taken from a counter bit in fabric logic
   ```

   A derived clock like this originates in ordinary logic, so it may not end up on a dedicated clock buffer (BUFG) unless one is inserted.

3. **Global clock resources are exhausted**
   - The number of global clock buffers in an FPGA is limited (for example, 24 BUFGs per bank on Xilinx UltraScale+).
   - When the number of clocks exceeds the device capacity, the tool automatically falls back to regional clock resources.

### 3. Solutions

1. **Constrain the clocks correctly**

   ```tcl
   # Define the primary clock (10 ns period)
   create_clock -period 10 [get_ports clk]
   # Automatically derive the PLL-generated clocks (Intel Quartus Timing Analyzer command)
   derive_pll_clocks
   ```

2. **Optimize the clock structure**
   - Use dedicated clock buffers (BUFG/BUFH).
   - Avoid clock gating; use a synchronous enable instead.

   ```verilog
   always @(posedge clk) begin
     if (enable) q <= d; // recommended: synchronous enable
   end
   ```

3. **Stretch the available resources**
   - Use regional clock resources (such as BUFH) for low-speed clocks.
   - Use clock multiplexing.

4. **Analyze the design**

   ```tcl
   # Vivado: view how the clock networks are distributed
   report_clock_networks
   # Vivado: check BUFG utilization
   report_clock_utilization
   ```

### 4. Impact comparison

| Metric            | Dedicated clock network                | Local routing resources              |
|-------------------|----------------------------------------|--------------------------------------|
| Delay             | 0.5-2 ns                               | 2-5 ns                               |
| Skew              | <100 ps                                | 200-500 ps                           |
| Power             | Low (dedicated low-capacitance paths)  | High (passes through more switches)  |
| Maximum frequency | 500 MHz+                               | <200 MHz                             |

### 5. Practical advice

1. In FPGA designs, prefer the on-chip PLL/DCM for generating derived clocks.
2. When using multiple clock domains, use clock domain crossing synchronizers (such as a two-flip-flop synchronizer):

   $$ t_{met} > t_{clk} + t_{jitter} $$

3. Run post-route timing analysis on the clock paths:

   ```tcl
   report_timing -from [get_clocks] -max_paths 20
   ```

With a well-planned clock architecture and correct constraints, clock network skew can be kept within 10% of the device specification (for example, <100 ps of skew for a 1 GHz clock). On Xilinx UltraScale+ devices, the dedicated clock network can support global clock frequencies of up to 931 MHz.