SUMMARY OF KEY POINTS
. A high-throughput architecture is one that maximizes the number of bits
per second that can be processed by a design.
. Unrolling an iterative loop increases throughput.
. The penalty for unrolling an iterative loop is a proportional increase in
area.
. A low-latency architecture is one that minimizes the delay from the input
of a module to the output.
. Latency can be reduced by removing pipeline registers.
. The penalty for removing pipeline registers is an increase in combinatorial
delay between registers.
. Timing refers to the clock speed of a design. A design meets timing when
the maximum delay between any two sequential elements is smaller than
the target clock period.
. Adding register layers improves timing by dividing the critical path into
two paths of smaller delay.
. Separating a logic function into a number of smaller functions that can be
evaluated in parallel reduces the path delay to the longest of the
substructures.
. By removing priority encodings where they are not needed, the logic structure is flattened, and the path delay is reduced.
. Register balancing improves timing by moving combinatorial logic from
the critical path to an adjacent path.
. Timing can be improved by reordering paths that are combined with the
critical path in such a way that some of the critical path logic is placed
closer to the destination register.
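
The trade-offs summarized above can be sketched as a small cost model. This is only an illustration: the `Design` class, the 5 ns multiplier delay, the three-step computation, and the 8-bit result width are all assumed values chosen for the example, and `balance` is an idealized stand-in for register balancing (real retiming moves discrete blocks of logic and rarely splits delay perfectly evenly).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """Hypothetical cost model of one implementation; all numbers are made up."""
    name: str
    critical_path_ns: float   # longest register-to-register combinatorial delay
    latency_cycles: int       # clock cycles from input to output
    cycles_per_result: int    # clock cycles between accepted inputs
    multipliers: int          # rough proxy for area
    bits_per_result: int = 8  # assumed data width

    def meets_timing(self, clock_period_ns: float) -> bool:
        # Timing is met when the worst path fits inside one clock period.
        return self.critical_path_ns < clock_period_ns

    def throughput_bps(self, clock_period_ns: float) -> float:
        # Bits processed per second at the given clock period.
        seconds_per_result = self.cycles_per_result * clock_period_ns * 1e-9
        return self.bits_per_result / seconds_per_result

    def latency_ns(self, clock_period_ns: float) -> float:
        # Input-to-output delay at the given clock period.
        return self.latency_cycles * clock_period_ns


MULT_DELAY_NS = 5.0  # assumed delay of one multiply step

# Iterative loop: one multiplier reused over three cycles.
iterative = Design("iterative", MULT_DELAY_NS,
                   latency_cycles=3, cycles_per_result=3, multipliers=1)
# Unrolled loop: three registered stages, a new input every cycle, 3x the area.
pipelined = Design("pipelined", MULT_DELAY_NS,
                   latency_cycles=3, cycles_per_result=1, multipliers=3)
# Pipeline registers removed: one cycle of latency, but the three steps chain
# into a single, longer combinatorial path.
flattened = Design("flattened", 3 * MULT_DELAY_NS,
                   latency_cycles=1, cycles_per_result=1, multipliers=3)


def balance(stage_delays_ns):
    """Idealized register balancing: spread the total delay evenly over stages."""
    even = sum(stage_delays_ns) / len(stage_delays_ns)
    return [even] * len(stage_delays_ns)
```

Under these assumptions, at a 6 ns clock the iterative and pipelined designs meet timing while the flattened one does not, and the pipelined design delivers three times the throughput of the iterative one for three times the multipliers. At a slower 16 ns clock all three meet timing, and the flattened design has the lowest latency. Calling `balance([9.0, 3.0])` shows how shifting logic from a 9 ns stage to an adjacent 3 ns stage lowers the worst-case stage delay.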
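This article discusses design principles for high-throughput and low-latency architectures, including techniques such as unrolling iterative loops and removing pipeline registers to improve system performance. It also describes how to improve timing by adding register layers, balancing registers, and similar methods.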