Advanced DFT And Silicon Bring-Up For AI Chips

Newer EDA technology eases AI chip development.


The AI market is growing quickly, spurring an insatiable demand for powerful AI accelerators. AI chip makers are pressed with aggressive time-to-market goals and need the tools to help them get their chips into the hands of customers as quickly as possible. IC test and silicon bring-up are tasks that can affect both the quality and the time-to-market of AI chips.

Different companies are using different hardware development techniques to meet the AI compute growth requirements. Some use novel, massively parallel architectures that maximize the data processing capabilities for the AI workloads. Others continue to develop and optimize the existing architectures like GPU, CPU, and FPGA to keep up with the performance requirements of AI systems and stay ahead of emerging architectures. However, AI chips generally share a few key design characteristics even though they can have different implementations and architectures.

The AI chip architecture and test requirements have an impact on the DFT implementation strategy. Regardless of architecture, AI chips typically share the following design characteristics: very large designs with billions of gates, a large number of replicated processing cores, and distributed memories. Designers need EDA tools that address the challenges posed by each of these characteristics.

For DFT and silicon bring-up, a tool solution should be able to do these three things to speed up AI chip development time:

  • Exploit AI chip regularity
  • Shift-left DFT
  • Eliminate DFT-to-test iterations

Taking advantage of chip regularity – hierarchical DFT and core groups

AI chips typically contain a large number of identical cores. Exploiting the AI chip regularity for DFT implies that all the DFT work—including test insertion, test pattern generation, and verification—is completed just once at the core level. The complete, signed-off core is then replicated automatically to complete the chip-level DFT implementation.
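The "do it once, replicate everywhere" idea can be sketched in a few lines of Python. This is only an illustration of the principle, not how any Tessent tool stores or retargets patterns; the `CorePattern` class, the `retarget` function, and the instance names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorePattern:
    """One core-level test pattern (hypothetical, simplified representation)."""
    stimulus: str  # bits shifted into the core's scan chains
    expected: str  # bits expected when the response is shifted out


def retarget(core_patterns, instance_names):
    """Reuse the same signed-off core patterns for every instance of the core."""
    return {name: core_patterns for name in instance_names}


# Patterns are generated and verified once, at the core level.
core_patterns = [CorePattern("1010", "0110"), CorePattern("0011", "1001")]

# The chip instantiates the same core several times (names are assumptions).
instances = [f"core_{i}" for i in range(4)]
chip_patterns = retarget(core_patterns, instances)

generated = len(core_patterns)
applied = sum(len(patterns) for patterns in chip_patterns.values())
print(generated, applied)  # prints: 2 8
```

Only two patterns are ever generated, yet eight pattern applications cover the four instances, which is why the pattern generation and sign-off effort stays fixed as the core count grows.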

Hierarchical DFT is an ideal way to exploit the regularity in AI chips and allows complete DFT sign-off of the blocks at different hierarchy levels. In the example shown in figure 1, there are three levels of hierarchy: core (tile), block (supertile), and chip. The core is instantiated multiple times in the block, which is then instantiated multiple times at the chip level.

Fig. 1: Tessent hierarchical DFT allows for complete DFT sign-off at different levels of design hierarchy.

Hierarchical methods are made even more effective by using a packetized scan data architecture approach, provided by Tessent Streaming Scan Network (SSN). At the top level, all the blocks can be connected to an SSN bus infrastructure, which forms the mechanism for delivering the packetized scan data. Block configuration data is delivered to each block via the IJTAG network. DFT signals are then locally generated from the SSH (Streaming Scan Host) IP, enabling each block to run independently.
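To make "packetized scan data" more concrete, here is a toy Python model of the general idea: each packet carries a fixed number of bits for every block, packets are streamed back to back over a bus of arbitrary width, and each block picks out its own bits by position. This is a conceptual sketch only. It does not reproduce the actual SSN packet format or SSH behavior, and every name in it (`packetize`, `to_bus_words`, `extract`, `blk_a`, `blk_b`) is an assumption made for illustration.

```python
def packetize(scan_data, bits_per_block):
    """Build one packet per shift cycle by concatenating each block's bits."""
    names = list(bits_per_block)
    cycles = len(scan_data[names[0]]) // bits_per_block[names[0]]
    stream = ""
    for cycle in range(cycles):
        for name in names:
            n = bits_per_block[name]
            stream += scan_data[name][cycle * n:(cycle + 1) * n]
    return stream


def to_bus_words(stream, bus_width):
    """Chop the packet stream into bus words; the bus width is independent of packet size."""
    padded = stream + "0" * (-len(stream) % bus_width)
    return [padded[i:i + bus_width] for i in range(0, len(padded), bus_width)]


def extract(words, name, bits_per_block, total_bits):
    """Model a block-local host pulling its own bits out of the shared stream."""
    stream = "".join(words)[:total_bits]
    names = list(bits_per_block)
    packet_size = sum(bits_per_block.values())
    offset = sum(bits_per_block[n] for n in names[:names.index(name)])
    n = bits_per_block[name]
    return "".join(
        stream[start + offset:start + offset + n]
        for start in range(0, len(stream), packet_size)
    )


# Hypothetical example: blk_a consumes 2 bits per shift cycle, blk_b consumes 1.
bits_per_block = {"blk_a": 2, "blk_b": 1}
scan_data = {"blk_a": "110010", "blk_b": "011"}

stream = packetize(scan_data, bits_per_block)
words = to_bus_words(stream, bus_width=4)
recovered = {
    name: extract(words, name, bits_per_block, len(stream))
    for name in bits_per_block
}
print(words)
print(recovered)
```

The 3-bit packets do not line up with the 4-bit bus words, yet each block still recovers exactly its own data. That decoupling of per-block scan needs from the physical bus width is the property this toy model is meant to show.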

For AI chips, implementing DFT at the individual core level incurs an area penalty because DFT logic such as isolation wrappers, compression logic, and memory BIST controllers is duplicated in each core. Implementing DFT at the chip level has its own costs: longer ATPG runtime, a large memory requirement to load the entire design, layout challenges when routing scan chains through all cores to the compression engine, and test power constraints because all scan chains are active at the same time.

The sweet spot for DFT, then, is with core groups that balance area overhead against development time overhead, as shown in figure 2. This implementation groups multiple cores together for DFT; the group is often referred to as a super core or tile. All the DFT logic is inserted and signed off at the core group level, and SSN can better manage and balance the test pattern delivery.

Fig. 2: Implementing DFT at the core group level is most efficient, avoiding the drawbacks of both core-level and chip-level DFT.
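The shape of this trade-off can be illustrated with a deliberately simple cost model in Python. Everything here is an assumption chosen only to show why an intermediate group size can win: area overhead is modeled as one copy of DFT logic per group, ATPG effort as growing faster than linearly with group size, and the two are weighted equally. The numbers carry no real-world meaning and are not taken from any tool or design.

```python
def dft_costs(total_cores, cores_per_group, runtime_exponent=1.5):
    """Return (area_overhead, atpg_effort) in arbitrary units (toy model)."""
    groups = total_cores // cores_per_group
    # Assumption: one copy of wrapper/compression/BIST logic per group.
    area_overhead = float(groups)
    # Assumption: ATPG effort on one group grows superlinearly with its size.
    atpg_effort = float(cores_per_group) ** runtime_exponent
    return area_overhead, atpg_effort


TOTAL_CORES = 64                # hypothetical chip
group_sizes = [1, 4, 16, 64]    # 1 = core-level DFT, 64 = chip-level DFT

costs = {size: dft_costs(TOTAL_CORES, size) for size in group_sizes}
max_area = max(area for area, _ in costs.values())
max_effort = max(effort for _, effort in costs.values())

# Normalize both costs to [0, 1] and weight them equally (another assumption).
combined = {
    size: area / max_area + effort / max_effort
    for size, (area, effort) in costs.items()
}
best_group_size = min(combined, key=combined.get)

for size in group_sizes:
    print(size, costs[size], round(combined[size], 3))
print("best group size under these assumptions:", best_group_size)
```

With these made-up parameters the extremes (one core per group, or the whole chip as one group) score worst and an intermediate group size scores best. Different weights or exponents would move the optimum, which is why the right group size is a per-design decision.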

For AI chips using multi-die 2.5D, 3D and 5.5D packaging, designers can apply the same hierarchical methodology together with the IEEE 1838 standard, using EDA tools to implement the standard's features, including the flexible parallel port (FPP). Tessent SSN is well suited to act as the FPP, which further eases the implementation. When SSN is already used at the die level, moving to multi-die is an extension of the existing architecture rather than a completely new one.

Another technology that’s useful for developing AI chips reduces the area overhead of memory BIST controllers. Rather than inserting a controller in every core, a single Tessent MemoryBIST controller can be shared between multiple memories in multiple cores. A shared-bus implementation lets the designer optimize the memory access bus for routing and timing, and it provides the connection interface for the memory BIST controller (figure 3).

Fig. 3: A single memory BIST controller can be shared between memories in multiple cores.

AI chips can also use hierarchical DFT during failure diagnosis after the IC is manufactured. It allows for core-level diagnosis, which significantly accelerates the process of diagnosis and failure analysis. This hierarchical diagnosis methodology aligns perfectly with the AI chip architectures, which contain repeated identical processing cores.
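One way to picture core-level diagnosis is as a mapping from chip-level failing locations back to core-local names, so that failures from many instances can be analyzed against the single core design. The Python sketch below assumes a hypothetical `block/core/local_path` naming scheme and invented flop names. It is not the output format or algorithm of any diagnosis tool.

```python
from collections import Counter


def to_core_local(chip_level_name):
    """Split 'block/core/local_path' into (instance, core-local path)."""
    block, core, local_path = chip_level_name.split("/", 2)
    return f"{block}/{core}", local_path


# Hypothetical failing scan cells reported at chip level.
chip_failures = [
    "block_0/core_1/u_mac/acc_reg[7]",
    "block_1/core_3/u_mac/acc_reg[7]",
    "block_0/core_2/u_ctl/state_reg[0]",
]

mapped = [to_core_local(name) for name in chip_failures]

# Counting by core-local path shows which location fails across instances.
by_location = Counter(local_path for _, local_path in mapped)
most_common_location, hits = by_location.most_common(1)[0]
print(most_common_location, hits)
```

Because every instance shares the same core-local names, a location that fails in several instances stands out immediately, and the detailed analysis only needs the core-level design rather than the full chip.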

Shift left – insert DFT at RTL

DFT logic is traditionally inserted at the gate-level design during or after synthesis. This approach suffers from two significant drawbacks.

  • The gate-level design is usually much larger than the RTL, so it takes much longer to simulate and debug. RTL compile can be up to 4 times faster than gate-level compile, and RTL regression debug can be up to 20 times faster than gate-level regression debug.
  • Any change in the DFT logic or configuration requires another synthesis iteration of the entire design before verification can be performed.
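A quick back-of-the-envelope calculation shows what the best-case speedups quoted above could mean over several DFT iterations. The baseline hours and the iteration count below are hypothetical, chosen only to make the arithmetic easy to follow; the 4x and 20x factors are the "up to" figures from the text, and the time for re-running synthesis at gate level is left out entirely.

```python
def total_hours(compile_h, debug_h, iterations, compile_speedup=1.0, debug_speedup=1.0):
    """Total compile plus regression debug time over a number of DFT iterations."""
    per_iteration = compile_h / compile_speedup + debug_h / debug_speedup
    return iterations * per_iteration


# Hypothetical gate-level baseline: 8 h compile and 40 h regression debug per iteration.
GATE_COMPILE_H = 8.0
GATE_DEBUG_H = 40.0
ITERATIONS = 5

gate_hours = total_hours(GATE_COMPILE_H, GATE_DEBUG_H, ITERATIONS)
rtl_hours = total_hours(
    GATE_COMPILE_H,
    GATE_DEBUG_H,
    ITERATIONS,
    compile_speedup=4.0,   # best case quoted in the text
    debug_speedup=20.0,    # best case quoted in the text
)
print(gate_hours, rtl_hours)  # prints: 240.0 20.0
```

Under these assumptions, five iterations drop from 240 hours to 20 hours. Real designs will see smaller gains whenever the speedups fall short of the best case.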

For a huge AI chip, having to repeat simulation, debug, and synthesis for each iteration would significantly impact the design schedule. DFT insertion in RTL mitigates these issues because any DFT changes can be verified and debugged faster in RTL without going through the synthesis step. RTL DFT insertion also allows for early I/O and floor planning of the chip. Together, these benefits significantly shorten the design development cycle. The difference between DFT during or after synthesis and DFT at RTL is illustrated in figure 4.

Fig. 4: Inserting DFT at the RTL level helps reduce time-consuming iterations.

An EDA tool that inserts DFT at RTL should also perform testability checks and fixes. Designers can achieve higher test quality in less time by performing all the DFT checking and fixing most testability issues at RTL before running ATPG.

Eliminate DFT-to-test iterations – connect DFT with ATE

Reducing the time spent on silicon bring-up is critical in getting AI chips into the hands of customers. The traditional process of silicon bring-up typically involves multiple iterations between the DFT domain and the test/ATE domain for pattern debug, characterization, test optimization, and test scheduling. There is often a back-and-forth interaction between the DFT engineer, who knows the test logic, and the test engineer, who is familiar with the tester, as illustrated in figure 5.

Fig. 5: IP evaluation requires significant learning and is prone to errors causing increased cycle time.

This flow can be vastly improved with software that lets DFT engineers perform the silicon bring-up themselves. The test engineers can then run diagnosis in several resolutions from flop-level to net-level without the help of DFT engineers. This solution speeds up the entire process by eliminating the time-consuming and costly iterations between DFT and ATE domains.

Conclusion – don’t neglect DFT and silicon bring-up solutions for faster AI chip development

The Tessent software from Siemens EDA is well suited to addressing the challenges faced when designing AI chips. The key tools include:

  • Tessent TestKompress – for core-level ATPG pattern generation
  • Tessent SSN – for optimized, packetized scan-based pattern delivery, ideally suited for applications with many identical cores
  • Tessent RTL Pro – for RTL-based hierarchical DFT construction
  • Tessent Multi-die – for DFT of multi-die devices
  • Tessent SiliconInsight – for reducing iterations in silicon bring-up

