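Problem description:
On a Xilinx Kintex-7 (K7) board developed with Vivado 2018.2, builds became very slow. With a 50 MHz clock configuration, synthesis plus implementation usually finished within 40 minutes; after switching to a 100 MHz clock, the build took 8 hours, and bitstream generation failed because of routing conflicts.
Finding the problem:
The build log shows that the run mostly gets stuck at route 4.1, meaning placement and routing are struggling.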
The warning is: WARNING: [Route 35-447] Congestion is preventing the router from routing all nets. The router will prioritize the successful completion of routing all nets over timing optimizations.
In short: congestion is blocking the router, so it gives priority to completing all routes over timing optimization.
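Analysis
I have not worked with FPGAs for long and have mostly been writing code, so I do not really understand how code maps to hardware, and I debugged without any timing constraints. This problem therefore left me quite confused. I searched online for a long time without knowing where to start, so here I record the changes I eventually made.
Whether on the official website or elsewhere, the descriptions of congestion I found mostly give these three causes:
1. Timing is not met;
2. Signal fanout is too high;
3. Resource utilization is too high.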
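Solution steps
1. First check the build log for a detailed description of the problem and for any suggested fix (in terms of timing: WHS < -0.4, THS < -400).
Start with the place_design log.
It reports conflicts.
Then check the route report: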
Router Utilization Summary
Global Vertical Routing Utilization = 19.6222 %
Global Horizontal Routing Utilization = 22.1567 %
Routable Net Status*
*Does not include unroutable nets such as driverless and loadless.
Run report_route_status for detailed report.
Number of Failed Nets = 0
Number of Unrouted Nets = 0
Number of Partially Routed Nets = 0
Number of Node Overlaps = 110
Congestion Report
North Dir 16x16 Area, Max Cong = 91.7723%, Congestion bounded by tiles (Lower Left Tile -> Upper Right Tile):
INT_L_X48Y206 -> INT_R_X63Y221
INT_L_X48Y190 -> INT_R_X63Y205
INT_L_X64Y190 -> INT_R_X79Y205
South Dir 16x16 Area, Max Cong = 90.5159%, Congestion bounded by tiles (Lower Left Tile -> Upper Right Tile):
INT_L_X48Y206 -> INT_R_X63Y221
INT_L_X48Y190 -> INT_R_X63Y205
East Dir 16x16 Area, Max Cong = 86.0064%, Congestion bounded by tiles (Lower Left Tile -> Upper Right Tile):
INT_L_X48Y206 -> INT_R_X63Y221
INT_L_X48Y190 -> INT_R_X63Y205
West Dir 16x16 Area, Max Cong = 95.7777%, Congestion bounded by tiles (Lower Left Tile -> Upper Right Tile):
INT_L_X48Y206 -> INT_R_X63Y221
INT_L_X48Y190 -> INT_R_X63Y205
INT_L_X48Y174 -> INT_R_X63Y189
CRITICAL WARNING: [Route 35-162] 191 signals failed to route due to routing congestion. Please run report_route_status to get a full summary of the design's routing.
Resolution: Run report_route_status to get a full summary of the design's routing. To find the areas of the congestion, use the route congestion Metrics in the Device View and check the logfile for the Congestion Report.
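This gives the basic congestion information and a suggested resolution.
2. In the Tcl Console, enter report_route_status to get the detailed report.
It lists the modules with routing conflicts, but only the first 10.
At this point the code modules causing the problem are roughly identified, but the details are still unclear.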
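Since the on-screen report is truncated, it can help to write the full report to a file and query the nets directly. The following is a minimal sketch for the Vivado Tcl console, assuming the implemented design is open. The file name route_status.rpt is a placeholder of mine, and the ROUTE_STATUS values used in the filter (UNROUTED, CONFLICTS) are the ones I would expect for this failure, so check them against your own report.

```tcl
# Sketch only: run in the Vivado Tcl console with the implemented design open.
# Assumption: route_status.rpt is a placeholder file name.
report_route_status -file route_status.rpt

# Collect nets whose ROUTE_STATUS property marks them as unrouted or conflicting.
# Assumption: UNROUTED and CONFLICTS are the status values of interest here.
set bad_nets [get_nets -hierarchical -quiet \
    -filter {ROUTE_STATUS == UNROUTED || ROUTE_STATUS == CONFLICTS}]
puts "Problem nets: [llength $bad_nets]"

# Print the first 20 so the hierarchy prefix reveals the offending modules.
foreach net [lrange $bad_nets 0 19] {
    puts "  [get_property ROUTE_STATUS $net]  $net"
}
```

Grouping the printed names by their hierarchy prefix gives a quick view of which modules the failing nets belong to.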
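Following the suggestion in the log, the congested area can be located on the device through (Device view > Metric > Vertical/Horizontal routing congestion per CLB).
First click Open Implemented Design, then in the Reports menu on the toolbar choose Report Design Analysis and tick Congestion in that dialog to open the congestion report.
Then select a window, right-click that area in the Device view, and choose Metric > Vertical/Horizontal routing congestion per CLB, which displays the congestion percentage and the tiles involved.
You can select an area such as CLBLM_R_X57Y203, mark and highlight it, pick the relevant BELs in its two slices, and view the schematic. Click Nets, and Find Results shows the signals with routing problems.
A tile can be selected with a command such as: select_objects [get_tiles CLBLM_R_X63Y200]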
For example, the results show signals that have not been routed. Flat Pins shows the fanout, which can be compared against the signals listed by report_high_fanout_nets.
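The same inspection can be done from the Tcl console instead of clicking through the GUI. This is a rough sketch under the assumption that the implemented design is open; the tile name is the one from the example above, and the tile -> site -> cell -> net traversal is just one way to list what sits in a congested tile.

```tcl
# Sketch only: inspect one congested tile from the Tcl console.
set tile [get_tiles CLBLM_R_X63Y200]
select_objects $tile
highlight_objects -color red $tile

# Walk tile -> sites -> cells -> nets to see which logic is placed there.
set cells [get_cells -quiet -of_objects [get_sites -of_objects $tile]]
set nets  [get_nets -quiet -of_objects $cells]

# Print routing status and flat pin count (the "Flat Pins" column) per net.
foreach net $nets {
    puts [format "%-10s %7s  %s" \
        [get_property ROUTE_STATUS $net] \
        [get_property FLAT_PIN_COUNT $net] \
        $net]
}
```

Nets that show both a non-routed status and a large flat pin count are the first candidates to compare against the high-fanout report.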
According to the Xilinx documentation:
When the metrics view is brought up, the Metric Results will show the congestion percentage (%) and the tile in question.
A value over 100 would mean that over 100 percent of the routing resources in this tile are used, and the router will have to divert routes that would normally use this tile around this area.
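The problem and the affected region are now roughly identified; what remains is fixing it.
① Utilization is a major cause of congestion: the higher the utilization, the more likely congestion becomes. As a rule of thumb, a design above 80% slice LUT utilization has difficulty meeting routing and timing requirements.
High utilization can be addressed by balancing utilization across the different site types. For example, if LUT utilization is too high, some of that logic can be moved into block RAM.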
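To see which site types and which modules dominate, the utilization can be reported both for the whole design and per hierarchy level. A small sketch, assuming the implemented (or placed) design is open; the file names and the depth of 3 are arbitrary choices of mine.

```tcl
# Sketch only: file names and the depth value are placeholders.
# Whole-design utilization by site type (slice LUTs, registers, BRAM, DSP ...).
report_utilization -file util_summary.rpt

# Per-module breakdown, to find which part of the hierarchy consumes the LUTs.
report_utilization -hierarchical -hierarchical_depth 3 -file util_hier.rpt
```

If one module shows high LUT usage while block RAM is mostly idle, that module is a reasonable place to start rebalancing.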
② High-fanout signals are handled by replicating registers, or by limiting fanout in the code with (* MAX_FANOUT = value *).
Run report_high_fanout_nets to see the high-fanout signals.
+-----------------------------------------------------------------------------------------------------------------------+--------+-------------+
| Net Name | Fanout | Driver Type |
+-----------------------------------------------------------------------------------------------------------------------+--------+-------------+
| clk_manage_inst/Clk_Syn_inst/inst/CLK_100M | 54764 | BUFG |
| ddr3_manage_inst/u_mig_7series_0/u_mig_7series_0_mig/u_ddr3_infrastructure/CLK | 8102 | BUFG |
| DDC_System_inst/DDC_channel_A/ddc_module_cic_cicc1_i_inst/cic1_compensate_fir_inst/fir_fun_inst/Sum_reg_S1_reg[63]_0 | 7713 | LUT1 |
| clk_manage_inst/reset | 6861 | BUFG |
| DDC_System_inst/DDC_channel_A/ddc_module_cic_cicc1_i_inst/module_cic1_inst/cic_out_scale_inst/Data_Out_Valid | 6244 | FDCE |
| DDC_System_inst/DDC_channel_A/ddc_module_cic_cicc2_inst/cic2_compensate_fir_inst/fir_fun_inst/Adder_Reg_reg[51][32]_0 | 4566 | LUT1 |
| DDC_System_inst/DDC_channel_A/module_mmbf_inst/gen_mult_inst[4].module_fir_hbf/hbf_decimate_inst/Data_Out_ChIdx[0] | 4243 | LUT3 |
| DDC_System_inst/DDC_channel_A/module_dfir_inst/module_fir_inst/fir_fun_inst/Adder_Reg_reg[51][32]_0 | 4121 | LUT1 |
| DDC_System_inst/DDC_channel_A/module_mmbf_inst/gen_mult_inst[4].module_fir_hbf/hbf_decimate_inst/Data_Out_Valid | 2197 | LUT3 |
| DDC_System_inst/DDC_channel_A/ddc_module_cic_cicc1_i_inst/cic1_compensate_fir_inst/fir_fun_inst/fsm_current_state[1] | 1378 | FDCE |
+-----------------------------------------------------------------------------------------------------------------------+--------+-------------+
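The default report shows only a handful of nets. The options below are a sketch of how the report might be widened and narrowed; the thresholds (50 nets, fanout above 1000) and the file name are arbitrary values I picked for illustration.

```tcl
# Sketch only: thresholds and file name are placeholders.
# Top 50 nets, with a breakdown of load types and the timing slack per net.
report_high_fanout_nets -max_nets 50 -load_types -timing -file high_fanout.rpt

# Only nets with more than 1000 loads, printed to the console.
report_high_fanout_nets -fanout_greater_than 1000 -max_nets 100
```

In the table above, the BUFG-driven entries are clocks and the global reset; the LUT1, LUT3, and FDCE entries are the ones worth following up.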
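High-fanout signals that are not driven by a clock cause congestion; pay particular attention to those driven by LUTs. Their drivers can be replicated with: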
phys_opt_design -force_replication_on_nets
-force_replication_on_nets <args> - (Optional) Forces the drivers of the
specified net objects to be replicated, regardless of timing slack. This
option requires the use of the get_nets command to specify net objects.
Replication is based on load placements and requires manual analysis to
determine if replication is sufficient. If further replication is required,
nets can be replicated repeatedly by successive commands.
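This replicates the driver. If the driver still cannot be replicated after running the command, consider replicating it manually or adding a BUFG (but a BUFG adds delay, and the FPGA's BUFG resources are limited).
Command entered: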
phys_opt_design -force_replication_on_nets [get_nets DDC_System_inst/DDC_channel_A/ddc_module_cic_cicc1_i_inst/cic1_compensate_fir_inst/fir_fun_inst/Sum_reg_S1_reg[63]_0]
Running it produced the following error:
ERROR: [Vivado_Tcl 4-265] Option -force_replication_on_nets is specified but not supported yet for post-route physical synthesis. Please remove the option and rerun phys_opt_design
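Because the option is not supported once the design has been routed, the implementation flow has to be changed so that phys_opt runs before route: opt-place-phys_opt-route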
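As a rough illustration of that ordering, here is a non-project sketch in which forced replication runs after placement and before routing. The checkpoint name post_synth.dcp is a placeholder, and impl_1 in the commented project-mode line is only the usual default run name; the net is the LUT1-driven one from the fanout table.

```tcl
# Sketch only: post_synth.dcp is a placeholder checkpoint name.
open_checkpoint post_synth.dcp

opt_design
place_design

# Forced replication must happen on a placed but not yet routed design.
# The name is braced so Tcl does not try to evaluate [63] as a command.
phys_opt_design -force_replication_on_nets [get_nets \
    {DDC_System_inst/DDC_channel_A/ddc_module_cic_cicc1_i_inst/cic1_compensate_fir_inst/fir_fun_inst/Sum_reg_S1_reg[63]_0}]

route_design
report_route_status

# Project mode alternative (assumption: the run is named impl_1):
# set_property STEPS.PHYS_OPT_DESIGN.IS_ENABLED true [get_runs impl_1]
```

Replication is based on load placement, so it is worth re-running report_high_fanout_nets afterwards to confirm that the fanout actually dropped.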
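③ Run report_design_analysis -congestion -complexity -hierarchical_depth 10 to see the congestion types and levels.
Going through each congestion type and the usage of each resource shows that the congestion all occurs in the filter designs inside the filtering module, mainly the FIR filters.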
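For a design this size the report is long, so saving it to a file and then narrowing the complexity analysis to the suspect hierarchy can make it easier to read. A sketch, assuming the implemented design is open; the file names and the depth of 3 are placeholders, and the cell path is taken from the net names in the fanout table.

```tcl
# Sketch only: file names and depth are placeholders.
# Same command as above, written to a file for later comparison between runs.
report_design_analysis -congestion -complexity -hierarchical_depth 10 \
    -file design_analysis.rpt

# Complexity analysis restricted to the DDC channel that holds the FIR filters.
report_design_analysis -complexity -hierarchical_depth 3 \
    -cells [get_cells DDC_System_inst/DDC_channel_A] \
    -file ddc_complexity.rpt
```

Keeping one report per run makes it possible to check whether a change to the filters actually reduced the congestion level.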
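④ Run report_qor_suggestions for suggested fixes (this command focuses on QoR improvements related to critical paths, but it also gives congestion suggestions when congestion is present).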
(1)
For congestion, the suggestion is to reduce the utilization of cells, combinational LUTs, and so on.
(2)
(3)
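This shows the congestion types; there are three: Global, Long, and Short.
Make the changes according to the suggestions above.
The three types of congestion have different causes.
Global: too many combined LUTs or too many control sets in the congested region;
Long: too many BRAMs, URAMs, and DSPs in the congested region, or too many die-crossing paths;
Short: too many MUXF or carry chains in the congested region.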