Conquer Placement And Clock Tree Challenges In HPC Designs

Bringing real route information and parasitics to any step in the place-and-route flow.


High-performance computing (HPC) applications require IC designs with maximum performance. However, as process technology advances, achieving high performance has become increasingly challenging. Designers need digital implementation tools and methodologies that can solve the thorny issues in HPC designs, including placement and clock tree challenges.

Placement and clock tree synthesis are critical steps in the physical design of high-performance computing integrated circuits (ICs), and they pose several challenges, including:

  • Congestion: ICs for HPC have a large number of logic elements and interconnects, which can lead to significant congestion, impacting the quality of the design and the final performance of the IC.
  • Timing closure: Achieving timing closure is challenging in HPC ICs because of their high operating frequencies and complex clock networks. It requires careful optimization of component placement, clock network structure, and routing, plus close correlation between pre-route and post-route results, and with signoff tools.
  • Power and thermal considerations: The placement of the components and the routing of the clock network can impact the power consumption and thermal behavior of the IC. Power consumption and thermal management are critical considerations in high-performance ICs as they can significantly impact the reliability and performance of the design.
  • Design rule constraints: Especially at advanced nodes, design rule constraints make placement and clock tree synthesis more difficult. Closing design rule violations through multiple ECO cycles can delay the design schedule.

The placement and clock tree solution for HPC ICs

Siemens EDA offers Aprisa, a modern physical design implementation solution for hierarchical and block-level designs that addresses all of these HPC IC challenges (figure 1). It was built from the ground up with a detail-route-centric architecture that reduces the time to design closure. Its unified data model is shared throughout the entire flow, bringing real route information and parasitics to any engine and any step in the place-and-route flow. This keeps timing and DRC results consistent across engines, which translates into excellent correlation with signoff tools and fewer ECOs.

Fig. 1: Aprisa architecture, the detail-route-centric digital implementation solution for fast design closure.

Aprisa optimizes placement with advanced knowledge of the detail routing, which minimizes the need for manual guidance from designers. By reducing or entirely eliminating place guides, designers can quickly achieve optimal placement without relying on experience and full-flow pre-runs. This early insight into timing, area, power, and congestion saves considerable time and cost, especially as design cycles lengthen with the growing complexity and size of today’s ICs.

Analyzing design tradeoffs

At advanced nodes, the cost of each additional metal layer rises dramatically, often into the millions of dollars. Chip designers must therefore carefully weigh the benefits of saving power against the added cost of additional metal layers.


Fig. 2: Arm Cortex-A76 core: frequency vs. metal-layer usage tradeoff.

Aprisa’s early insight into PPA metrics allows designers to fine-tune the balance of performance vs. power vs. cost very early in the flow. For example, sacrificing a little power savings can help them reduce manufacturing costs while still achieving the desired performance. The opportunity to study these metrics without jeopardizing time-to-market goals empowers designers to make informed decisions about tradeoffs that impact their bottom line.

Clock tree methodology for HPC ICs

Clock tree synthesis (CTS) is another of the top challenges in HPC designs. CTS is a critical step in the physical design process because it largely determines the final timing of the design. CTS distributes the clock from its source to all of its sinks, including registers, latches, clock gates, and macro clock pins. A low-quality clock tree can result in poor timing, high power consumption, and poor signal integrity.

The three primary factors that impact CTS are:

  • Clock skew: the difference in arrival times of the clock signal at different points in the circuit.
  • Insertion delay: the time it takes for the clock signal to propagate from its source through the clock tree to the sinks.
  • Buffering: buffers inserted into the clock tree to maintain the clock signal’s integrity and ensure that the signal arrives at all the flip-flops simultaneously.
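The first two factors can be made concrete with a short calculation. The sketch below is a generic illustration, not Aprisa's reporting: it assumes each sink already has a computed clock arrival time (latency from the clock source), takes insertion delay as the worst-case latency, and takes global skew as the spread between the latest and earliest arrivals. The `Sink` class, the function names, and the arrival values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sink:
    """A clock sink (register, latch, or macro clock pin) and its clock arrival time."""
    name: str
    arrival_ps: float  # hypothetical latency from the clock source to this sink, in ps


def insertion_delay(sinks: list[Sink]) -> float:
    """Worst-case insertion delay: the longest source-to-sink clock latency."""
    return max(s.arrival_ps for s in sinks)


def global_skew(sinks: list[Sink]) -> float:
    """Global skew: the spread between the latest and earliest clock arrivals."""
    arrivals = [s.arrival_ps for s in sinks]
    return max(arrivals) - min(arrivals)


if __name__ == "__main__":
    # Hypothetical arrival times, purely for illustration.
    sinks = [
        Sink("u_core/reg_a", 412.0),
        Sink("u_core/reg_b", 398.5),
        Sink("u_core/reg_c", 371.0),
        Sink("u_mem/CLK", 420.5),
    ]
    print(f"insertion delay = {insertion_delay(sinks):.1f} ps")
    print(f"global skew     = {global_skew(sinks):.1f} ps")
```

Production tools usually also report local skew between each launching and capturing register pair, which matters more for timing than the global spread shown here.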

Aprisa supports useful skew starting at placement and continuing all the way through route optimization, which helps meet the challenging frequency targets of HPC designs. A strength of Aprisa’s CTS technology is that the push and pull clock offsets generated during placement optimization are realized during clock tree implementation.
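The arithmetic behind useful skew can be shown with a generic setup-slack calculation. This is a minimal sketch of the textbook concept, not Aprisa's algorithm: it assumes a linear pipeline, lumps clock-to-Q and combinational delay into one `data_delay`, and treats a positive clock offset as a "push" (clock arrives later). The function names, delays, and offsets are hypothetical, and hold checks are omitted.

```python
def setup_slack(period, data_delay, setup, launch_offset, capture_offset):
    """Setup slack of one register-to-register path (all values in ns).

    data_delay lumps clock-to-Q and combinational delay together.
    launch_offset / capture_offset are the clock arrival offsets of the
    launching and capturing registers (positive = clock pushed later).
    """
    required = period + capture_offset - setup  # latest allowed data arrival
    arrival = launch_offset + data_delay        # actual data arrival
    return required - arrival


def pipeline_slacks(period, setup, stage_delays, offsets):
    """Setup slack of each stage in a linear pipeline reg0 -> reg1 -> ... -> regN.

    offsets[i] is the clock offset applied to register i, so
    len(offsets) must equal len(stage_delays) + 1.
    """
    if len(offsets) != len(stage_delays) + 1:
        raise ValueError("need one clock offset per register")
    return [
        setup_slack(period, delay, setup, offsets[i], offsets[i + 1])
        for i, delay in enumerate(stage_delays)
    ]


if __name__ == "__main__":
    # Hypothetical two-stage pipeline: stage 0 is too slow, stage 1 has margin.
    period, setup, stages = 1.0, 0.05, [1.1, 0.7]
    for offsets in ([0.0, 0.0, 0.0], [0.0, 0.2, 0.0]):
        slacks = pipeline_slacks(period, setup, stages, offsets)
        print(offsets, [f"{s:+.2f} ns" for s in slacks])
```

With zero offsets, stage 0 fails by 0.15 ns while stage 1 has 0.25 ns of slack. Pushing register 1's clock 0.2 ns later borrows time from stage 1, leaving roughly +0.05 ns on both stages. A real useful-skew engine must also verify hold timing, which shrinks on the path whose launch clock is pushed.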

Another significant CTS advantage is the ability to merge and de-merge multi-bit flip-flops and to clone and de-clone integrated clock gates (ICGs) based on timing, the physical location of cells, and path criticality. Because Aprisa accounts for CTS starting at the placement optimization stage, it can produce a better-optimized clock tree and reduce clock power.
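To illustrate the merge side of this idea, the sketch below greedily pairs single-bit flip-flops into 2-bit multi-bit flip-flops. It is a simplified, hypothetical heuristic, not Aprisa's method: it assumes two flops are compatible only if they share a clock net and sit within a distance threshold, merges the closest compatible pairs first, and ignores the timing and path-criticality factors a real tool would also weigh. The `Flop` class, `merge_into_2bit`, and the coordinates are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Flop:
    """A single-bit flip-flop with its clock net and placement location (microns)."""
    name: str
    clock_net: str
    x: float
    y: float


def merge_into_2bit(flops, max_dist):
    """Greedily pair compatible single-bit flops into hypothetical 2-bit MBFFs.

    Two flops are compatible if they share a clock net and lie within
    max_dist of each other. Closest pairs are merged first; flops left
    unpaired stay single-bit. Returns (pairs, singles) as name tuples/lists.
    """
    candidates = []
    for i in range(len(flops)):
        for j in range(i + 1, len(flops)):
            a, b = flops[i], flops[j]
            if a.clock_net != b.clock_net:
                continue  # flops on different clocks cannot share an MBFF
            d = math.dist((a.x, a.y), (b.x, b.y))
            if d <= max_dist:
                candidates.append((d, i, j))

    candidates.sort()  # shortest distance first
    used = set()
    pairs = []
    for _, i, j in candidates:
        if i in used or j in used:
            continue
        used.update((i, j))
        pairs.append((flops[i].name, flops[j].name))

    singles = [f.name for k, f in enumerate(flops) if k not in used]
    return pairs, singles


if __name__ == "__main__":
    flops = [
        Flop("f0", "clk_a", 0.0, 0.0),
        Flop("f1", "clk_a", 1.0, 0.0),
        Flop("f2", "clk_a", 5.0, 0.0),
        Flop("f3", "clk_b", 0.0, 1.0),
    ]
    print(merge_into_2bit(flops, max_dist=2.0))
```

Here only `f0` and `f1` are merged: `f2` is too far from any flop on its clock, and `f3` is alone on `clk_b`.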

After CTS, Aprisa recovers the congestion created during CTS without degrading timing. With traditional tools, designers would have to iterate back to placement optimization to reduce congestion.

During post-CTS and route optimization, Aprisa can apply useful skew to further improve timing and achieve excellent correlation with signoff.

Clock structures for HPC

Multi-point CTS is the most popular approach for HPC designs. It achieves lower clock skew than a single-point tree and uses less power than a clock mesh. A multi-point clock structure splits the design into partitions, each driven by an anchor buffer fed from the top-level design.

Aprisa automatically creates partitions based on anchor cells, factoring in metrics such as timing, physical proximity, and the load on each anchor cell. Designers can also use Aprisa to generate the anchor points automatically, which can save significant time and effort in the design process.
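A simplified version of the partitioning step can be sketched as a capacity-limited nearest-anchor assignment. This is a hypothetical illustration, not Aprisa's algorithm: it considers only physical proximity (Manhattan distance as a rough wirelength proxy) and a per-anchor load cap, and it ignores timing. The `Anchor` class, `partition_sinks`, the `max_load` values, and the coordinates are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Anchor:
    """An anchor buffer at a fixed location with a hypothetical sink-count cap."""
    name: str
    x: float
    y: float
    max_load: int  # hypothetical cap on the number of sinks this anchor may drive
    sinks: list[str] = field(default_factory=list)


def manhattan(ax, ay, bx, by):
    """Manhattan distance, used here as a rough proxy for clock wirelength."""
    return abs(ax - bx) + abs(ay - by)


def partition_sinks(sinks, anchors):
    """Assign each sink (name, x, y) to the nearest anchor that still has capacity.

    Sinks are processed in input order; raises ValueError if every anchor is full.
    Returns a dict mapping anchor name -> list of assigned sink names.
    """
    for name, x, y in sinks:
        ranked = sorted(anchors, key=lambda a: manhattan(a.x, a.y, x, y))
        for anchor in ranked:
            if len(anchor.sinks) < anchor.max_load:
                anchor.sinks.append(name)
                break
        else:
            raise ValueError(f"no anchor has capacity left for sink {name}")
    return {a.name: a.sinks for a in anchors}


if __name__ == "__main__":
    anchors = [Anchor("A0", 0.0, 0.0, max_load=2), Anchor("A1", 10.0, 0.0, max_load=2)]
    sinks = [("s0", 1.0, 0.0), ("s1", 2.0, 0.0), ("s2", 3.0, 0.0), ("s3", 9.0, 0.0)]
    print(partition_sinks(sinks, anchors))
```

In this example `s2` is closer to `A0`, but `A0` is already full, so it spills over to `A1`. That is the load-balancing tradeoff the paragraph above describes, reduced to its simplest form.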

Conquering all the HPC design challenges

Aprisa addresses all of the implementation challenges for HPC designs at advanced nodes using easy-to-deploy, out-of-the-box reference flows. It provides industry-leading correlation to signoff tools along with a number of technologies that reduce ECO iterations. It ensures that all PPA metrics are carefully balanced for HPC design implementation through high-quality clock trees, placement, and patented routing technologies that reduce timing-closure friction between the block and top level during assembly.

With the help of the right place-and-route tool, designers can bring their HPC design innovations to market faster with fewer engineering and compute resources.


