Can a single kit optimize all CPUs, GPUs and DSPs?
On a typical System-on-Chip (SoC), CPUs, GPUs and DSPs each have unique requirements to achieve optimal results from logic libraries and memory compilers. However, at the end of the day, they all reside in the same EDA database that goes through an EDA flow of timing closure, area/power minimization and physical/logical verification before tapeout.
Instead of each processor having its own design kit, what if there were a single design flow, integrated with a set of libraries and memories, that could be used across an entire SoC for a specific foundry process and that addressed the diverse requirements of CPUs, GPUs, DSPs and general SoC blocks?
Figure 1 shows an example of a typical SoC for mobile applications.
Figure 1. Typical Mobile SoC with CPUs, GPUs, DSPs and general SoC blocks
CPUs typically occupy a small area of the overall SoC but need the highest performance, often in burst mode for short periods of high computation. To achieve these speeds, designers use high-performance libraries, low (or super low) threshold devices and overdrive voltages. All of this pushes the logic and memories to their limits on setup and access time, and strains the local power grid, causing IR drop in the short term and electro-migration (EM) in the long term.
GPUs typically occupy a large area of the overall SoC, so optimizing total routed area results in real cost savings. Operating frequencies are much lower, as GPUs lend themselves well to parallelization. But this parallelization can cause tremendous routing congestion where the many buses carrying parallel results come together. To achieve the smallest silicon area, designers use high-density libraries, regular and high (or super high) threshold devices and sometimes lower voltages. The use of multi-bit flip-flops, ultra-high-density memories and other innovative circuits has been shown to reduce area by 10%, leakage power by 20% and dynamic power by 25% in GPU applications.
DSPs can take on many of the attributes of either CPUs or GPUs, depending on the application. If the DSP is being used in a cellular base station, multiplexing as many signals as possible from a box on a cell phone tower, then it shares many of the requirements of a CPU. On the other hand, if the DSP is being used in a handset, area is more critical, at moderate speeds similar to those of a GPU.
General SoC blocks also can take on some of the attributes of CPUs or GPUs. High-speed interfaces such as DDR or SerDes and large switch-box networks that connect processors across the SoC need the highest performance libraries. Moderate-speed interfaces such as USB and MIPI, as well as image processing circuits, can take advantage of high-density libraries. Off-chip asynchronous interfaces and on-chip asynchronous clock domain crossings impose additional requirements. Circuits known as synchronizers are built using metastable-characterized flip-flops to design around potential failures. These flip-flops are specially designed to minimize the probability of entering a metastable state and to maximize the ability to recover from such a state, so that it does not cause a functional failure. They are also specially characterized so that designers can determine the type of synchronizer circuit required for a specific set of operating conditions.
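The characterization data described above typically feeds a mean-time-between-failures (MTBF) estimate. The following is a minimal sketch in Python, assuming the widely used first-order model MTBF = exp(t_r / tau) / (T_w * f_clk * f_data), where tau (the resolution time constant) and T_w (the metastability window) would come from the flip-flop characterization. The function names, the simplified per-stage settling model and every numeric value are illustrative assumptions, not data from any actual library.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def synchronizer_mtbf_s(t_resolve_s, tau_s, t_window_s, f_clk_hz, f_data_hz):
    """First-order MTBF estimate (in seconds) for one synchronizer."""
    return math.exp(t_resolve_s / tau_s) / (t_window_s * f_clk_hz * f_data_hz)


def stages_needed(target_mtbf_s, f_clk_hz, f_data_hz, tau_s, t_window_s,
                  t_overhead_s, max_stages=6):
    """Return the smallest flip-flop count that meets the MTBF target.

    Simplifying assumption: every stage after the first adds one clock
    period of settling time, minus a fixed overhead (setup plus clock-to-Q).
    Returns None if no count up to max_stages is sufficient.
    """
    clk_period_s = 1.0 / f_clk_hz
    for stages in range(2, max_stages + 1):
        t_resolve_s = (stages - 1) * (clk_period_s - t_overhead_s)
        mtbf_s = synchronizer_mtbf_s(t_resolve_s, tau_s, t_window_s,
                                     f_clk_hz, f_data_hz)
        if mtbf_s >= target_mtbf_s:
            return stages
    return None


# Hypothetical operating conditions: 1 GHz receiving clock, 100 MHz data rate,
# 20 ps metastability window, 100 ps overhead, 1000-year MTBF target.
target_s = 1000 * SECONDS_PER_YEAR
fast_flop = stages_needed(target_s, 1e9, 1e8, tau_s=20e-12,
                          t_window_s=20e-12, t_overhead_s=0.1e-9)
slow_flop = stages_needed(target_s, 1e9, 1e8, tau_s=40e-12,
                          t_window_s=20e-12, t_overhead_s=0.1e-9)
print("stages with tau = 20 ps:", fast_flop)
print("stages with tau = 40 ps:", slow_flop)
```

Because tau sits in the exponent, a flip-flop with a smaller resolution time constant can meet the same target with fewer stages, which is why the characterized values matter when choosing the synchronizer type.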
Challenges in High-Performance CPUs
To illustrate the power strain of running CPUs at high frequency with overdrive voltages, Figure 2 shows how electromigration limits (the green line in the figure) can be crossed as frequency rises (for clock cells) or as toggle rate rises (for non-clock cells). The problem gets worse when driving heavier circuit loads with high fan-outs, and worse still at elevated temperatures.
Figure 2. Electro-migration safe operating range of a standard cell based on power bus width for fan-outs of 1 and 3
High-performance logic libraries (typically taller cells) and high-performance design kits must be able to address these types of requirements and enable SoC designers to use their EDA tools and flows to manage IR drop and electro-migration.
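The relationship between frequency, toggle rate, fan-out and the electromigration limit can be approximated with a back-of-the-envelope calculation. The sketch below assumes the first-order average switching current I = toggle_rate * f * C_load * Vdd and a current limit that scales linearly with power bus width. The function names and all constants are hypothetical placeholders; real limits come from the foundry's rules and are checked by EDA tools, with temperature derating that this sketch ignores.

```python
def avg_switching_current_a(toggle_rate, f_hz, c_load_f, vdd_v):
    """First-order average current drawn to charge the load: I = a * f * C * V."""
    return toggle_rate * f_hz * c_load_f * vdd_v


def max_safe_frequency_hz(i_limit_a, toggle_rate, c_load_f, vdd_v):
    """Frequency at which the average current reaches the given limit."""
    return i_limit_a / (toggle_rate * c_load_f * vdd_v)


# Hypothetical values chosen only to illustrate the trend.
C_PER_FANOUT_F = 2e-15       # load capacitance per driven gate
VDD_OVERDRIVE_V = 1.0        # overdrive supply voltage
I_LIMIT_PER_UM_A = 100e-6    # allowed average current per um of bus width
BUS_WIDTH_UM = 0.1           # power bus width inside the cell
CLOCK_TOGGLE_RATE = 1.0      # a clock cell charges its load every cycle

i_limit_a = I_LIMIT_PER_UM_A * BUS_WIDTH_UM

for fanout in (1, 3):
    c_load_f = fanout * C_PER_FANOUT_F
    i_avg_a = avg_switching_current_a(CLOCK_TOGGLE_RATE, 3e9, c_load_f,
                                      VDD_OVERDRIVE_V)
    f_max_hz = max_safe_frequency_hz(i_limit_a, CLOCK_TOGGLE_RATE, c_load_f,
                                     VDD_OVERDRIVE_V)
    status = "within limit" if i_avg_a <= i_limit_a else "exceeds limit"
    print(f"fan-out {fanout}: {i_avg_a * 1e6:.1f} uA at 3 GHz ({status}), "
          f"safe up to {f_max_hz / 1e9:.2f} GHz")
```

In this model a wider power bus raises the limit, while a higher fan-out, toggle rate or supply voltage lowers the safe frequency, matching the trend the figure describes.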
Challenges in High-Density GPUs
To illustrate the density strain of squeezing massively parallel GPUs into the absolute minimum area, Figure 3 shows how routing congestion around cells with high pin density can limit the routability of very densely compacted logic libraries (typically shorter cells).
Figure 3. Routing congestion diagram of processor with red and orange showing highest congestion.
Managing routing congestion at the design level requires high-density libraries built with an understanding of how placers and routers work, so that the libraries give these tools the resources they need to perform their complex tasks effectively and efficiently. Ultimately, a smaller routed block that still meets all of its functional, timing and power requirements translates directly into real cost savings.
And Minimizing Power Everywhere
Power must be efficiently managed at all levels of any design. Power optimization kits provide all of the switches, isolation cells, always-on cells and retention flops needed to manage techniques ranging from simple shutdown to multiple voltage domains to dynamic frequency scaling. Since memory typically covers half of an SoC, supporting multiple sleep modes directly within the memory compilers and instances provides the most efficient and easy-to-implement solution.
Conclusion
The above are just a few examples of the challenges that designers often meet on a single SoC project. Different blocks within the SoC can each take advantage of these libraries, memories, power kits and design kits to produce optimal SoC designs that serve their market requirements at the lowest possible cost. Synopsys has worked with leading CPU, GPU and DSP vendors as well as many experienced SoC design teams to define, build and validate the DesignWare High Performance Core (HPC) Design Kit, which contains the critical libraries with special cells and memory compilers with customized instances, all in one box. This design kit enables designers to achieve the highest performance, lowest power and minimum area on their processor-based SoCs, leveraging advanced EDA tools in proven design flows to develop the most competitive SoCs in the shortest time possible.
For more information, visit: DesignWare Embedded Memories and Logic Libraries
See related white paper: CPU, GPU and DSP Core Optimization for High Performance and Low Power