SPONSOR BLOG

Better Optimization For Many-Core AI Chips

System-wide functional analysis helps optimize many-core SoCs and get them to market on time.

August 10th, 2021 - By: Richard Oxland

The rise of massively parallel computing, driven by the need to process data for artificial intelligence (AI) and machine learning (ML) applications, has led to a surge in silicon complexity. This complexity is evident in designs like the Cerebras Wafer Scale Engine (figure 1), a tiled manycore, wafer-scale die with a transistor count in the trillions and nearly a million compute cores.

Fig. 1: The Cerebras Wafer Scale Engine is a good example of a huge, complex manycore SoC.

The market for AI SoCs continues to grow and is highly competitive. Semiconductor companies find their niche based on performance, cost, and flexibility, and targeting one or another of these parameters has led to an explosion of new manycore architectures. System architects are trying many different approaches, but all the designs are highly complex, and every chip maker wants to turn that complexity into a competitive advantage.

Of all the sources of complexity, one is particularly important to consider in manycore AI SoCs: functional errors and degraded performance that arise when many threads run in parallel on shared data. Traditionally, designers could debug such problems with classical CPU run control, but that approach does not scale to manycore architectures. Given the round-trip delay, the number of cores, control and data parallelism, multiple levels of hierarchy, and interdependent processes, designers have little chance of determining the root cause of software problems that way.

Additionally, designers need to consider hardware-software co-optimization, which requires extensive functional analysis. To implement AI applications on the SoC, designers must compile the source code to take advantage of the manycore architecture, which frequently requires a custom toolchain with full knowledge of the architecture. The process involves a cycle of hardware and software optimization and testing that starts in SoC emulation and continues through first silicon and subsequent generations of the device, as shown in figure 2.

Fig. 2: System-level data is used throughout the SoC lifecycle.

Through this cycle of functional analysis, the teams can learn:

How effectively data is shared
Whether the network on chip (NoC) is over-subscribed or unbalanced
How to measure application performance without impacting code execution
How to optimize the memory controller profile for data throughput
How to correlate events from across the SoC
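As an illustration of the NoC question above, balance can be summarized from per-link byte counts collected over a sampling window. The Python sketch below is a minimal, hypothetical model rather than part of any product API: the `LinkSample` record, the window length, the link width in bytes per cycle, and the 90% over-subscription threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class LinkSample:
    """Bytes observed on one NoC link during a sampling window (hypothetical record)."""
    link: str
    bytes_transferred: int


def link_utilization(samples: Iterable[LinkSample], window_cycles: int,
                     bytes_per_cycle: int) -> Dict[str, float]:
    """Fraction of each link's capacity used during the window (0.0 to 1.0)."""
    capacity = window_cycles * bytes_per_cycle
    return {s.link: s.bytes_transferred / capacity for s in samples}


def imbalance_ratio(utilization: Dict[str, float]) -> float:
    """Peak-to-mean utilization; 1.0 means traffic is perfectly balanced."""
    values = list(utilization.values())
    if not values:
        return 1.0
    mean = sum(values) / len(values)
    return max(values) / mean if mean > 0 else 1.0


def oversubscribed_links(utilization: Dict[str, float],
                         threshold: float = 0.9) -> List[str]:
    """Links at or above an assumed over-subscription threshold."""
    return sorted(link for link, u in utilization.items() if u >= threshold)


if __name__ == "__main__":
    samples = [LinkSample("r0->r1", 3600), LinkSample("r1->r2", 1200),
               LinkSample("r2->r3", 1200)]
    util = link_utilization(samples, window_cycles=1000, bytes_per_cycle=4)
    print(round(imbalance_ratio(util), 2))   # 1.8
    print(oversubscribed_links(util))        # ['r0->r1']
```

A high peak-to-mean ratio with only one link near saturation points to unbalanced routing rather than a network that is uniformly over-subscribed.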

Getting to this point requires a new approach to optimizing AI SoCs and the software that runs on them. It calls for a system-wide functional analysis to bring high-quality AI SoCs to market on time and to maintain optimal performance after deployment. Some features of system-wide functional analysis include:

Detailed insights into any subsystem or component
An accurate and coherent picture of the whole system from boot
Transaction-aware interconnect monitoring and statistics
Classical processor run control and trace
Support for all common ISAs and interconnect protocols
Flexibility to choose or change which subsystems are important
Flexible and powerful tools to generate data insights
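A coherent, whole-system picture depends on placing events from many monitors onto one timeline. Assuming every monitor timestamps its events against a common, synchronized timebase (an assumption; a real system has to establish this), per-monitor streams can be merged and then queried for events that cluster around a point of interest. The `Event` type and both functions are hypothetical illustrations built on Python's standard `heapq.merge`.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    timestamp: int  # cycles on an assumed common, synchronized timebase
    source: str     # which monitor or subsystem produced the event
    message: str


def merge_timelines(*streams: Iterable[Event]) -> List[Event]:
    """Merge per-monitor streams (each already time-ordered) into one timeline."""
    return list(heapq.merge(*streams, key=lambda e: e.timestamp))


def events_near(timeline: List[Event], t: int, window: int) -> List[Event]:
    """Events from any source within +/- window cycles of time t."""
    return [e for e in timeline if abs(e.timestamp - t) <= window]


if __name__ == "__main__":
    cpu = [Event(100, "cpu0", "load issued"), Event(400, "cpu0", "load retired")]
    noc = [Event(120, "noc", "read request"), Event(390, "noc", "read response")]
    timeline = merge_timelines(cpu, noc)
    for e in events_near(timeline, t=395, window=10):
        print(e.timestamp, e.source, e.message)
```

Run as a script, this prints the NoC read response at cycle 390 followed by the CPU load retiring at cycle 400, linking an interconnect event to its effect on a core.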

An on-chip infrastructure of monitoring and analysis IP and software provides all these benefits from simulation to deployment. Figure 3 shows a typical architecture for SoC functional monitoring and analytics.

Fig. 3: An Embedded Analytics platform provides system-level visibility that turns chip complexity into an advantage.

Consider the example shown in figure 4. This manycore chip is instrumented with a network-on-chip (NoC) monitor that traces all NoC transactions into a circular buffer. Because the NoC monitor is transaction-aware, it can be configured to detect specific bus conditions, for example a deadlock that causes a transaction's duration to exceed a set threshold (measured in cycles). When the threshold is exceeded, the NoC monitor can output the details of the deadlocked transaction and those immediately preceding it, allowing the problem to be diagnosed. This requires no run-time intervention from the debug host.
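To make the mechanism concrete, here is a toy software model of the behavior just described: requests enter a fixed-depth circular trace buffer, responses retire them, and any transaction outstanding for longer than a timeout is reported together with the transactions traced just before it. This is a hypothetical sketch; `NocMonitorModel` and its methods are not the Tessent Embedded Analytics API, and in hardware the check runs continuously on chip rather than being called from a software loop.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional, Tuple


@dataclass
class Transaction:
    txn_id: int
    initiator: str
    target: str
    start_cycle: int
    end_cycle: Optional[int] = None  # None while the transaction is outstanding


class NocMonitorModel:
    """Toy model of a transaction-aware NoC monitor (hypothetical, not a product API)."""

    def __init__(self, depth: int, timeout_cycles: int):
        # Fixed-depth circular trace buffer: the oldest entry is dropped when full.
        self.history: Deque[Transaction] = deque(maxlen=depth)
        self.outstanding: Dict[int, Transaction] = {}
        self.timeout_cycles = timeout_cycles

    def request(self, txn: Transaction) -> None:
        self.outstanding[txn.txn_id] = txn
        self.history.append(txn)

    def response(self, txn_id: int, cycle: int) -> None:
        txn = self.outstanding.pop(txn_id)
        txn.end_cycle = cycle

    def check(self, now: int) -> Optional[Tuple[Transaction, List[Transaction]]]:
        """Return (stuck transaction, transactions traced just before it), or None."""
        for txn in self.outstanding.values():  # dict order: oldest request first
            if now - txn.start_cycle > self.timeout_cycles:
                ids = [t.txn_id for t in self.history]
                if txn.txn_id in ids:
                    preceding = list(self.history)[:ids.index(txn.txn_id)]
                else:
                    # Already overwritten: only later traffic remains in the buffer.
                    preceding = []
                return txn, preceding
        return None


if __name__ == "__main__":
    mon = NocMonitorModel(depth=4, timeout_cycles=500)
    for i in range(6):
        mon.request(Transaction(i, f"core{i}", "mem", start_cycle=10 * i))
    for i in range(6):
        if i != 3:                       # transaction 3 never gets a response
            mon.response(i, 10 * i + 40)
    stuck, preceding = mon.check(now=600)
    print(stuck.txn_id, [t.txn_id for t in preceding])  # 3 [2]
```

With a buffer depth of four, transactions 0 and 1 have already been overwritten, so only transaction 2 survives as context for the stuck transaction. Buffer depth therefore sets how much history is available at the moment of detection.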

Fig. 4: A block diagram of a manycore chip instrumented with a network-on-chip (NoC) monitor.

Using the cross-triggering functionality of the Embedded Analytics message infrastructure, the same NoC monitor can also be configured to trigger trace elsewhere in the system when it detects the deadlock condition, for instance via a status monitor block that traces the behavior of a hardware accelerator.
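Cross-triggering can be modeled as a publish/subscribe path: one monitor fires a named event, and any monitor subscribed to it changes behavior, here by starting to trace. The sketch assumes a hypothetical `CrossTriggerFabric` and `StatusMonitorModel` and an invented event name, `noc_deadlock`; the actual Embedded Analytics message infrastructure is hardware, and its interfaces are not described in this article.

```python
from typing import Callable, Dict, List, Tuple


class CrossTriggerFabric:
    """Hypothetical publish/subscribe model of a cross-trigger message path."""

    def __init__(self) -> None:
        self.subscribers: Dict[str, List[Callable[[int], None]]] = {}

    def subscribe(self, event: str, handler: Callable[[int], None]) -> None:
        self.subscribers.setdefault(event, []).append(handler)

    def fire(self, event: str, cycle: int) -> None:
        # Deliver the trigger to every monitor that registered for this event.
        for handler in self.subscribers.get(event, []):
            handler(cycle)


class StatusMonitorModel:
    """Records accelerator status only after it has been cross-triggered."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tracing = False
        self.trace: List[Tuple[int, str]] = []

    def on_trigger(self, cycle: int) -> None:
        self.tracing = True
        self.trace.append((cycle, "trace started"))

    def sample(self, cycle: int, status: str) -> None:
        if self.tracing:  # samples before the trigger are discarded
            self.trace.append((cycle, status))


if __name__ == "__main__":
    fabric = CrossTriggerFabric()
    accel = StatusMonitorModel("accel0")
    fabric.subscribe("noc_deadlock", accel.on_trigger)
    accel.sample(100, "busy")                # ignored: not yet triggered
    fabric.fire("noc_deadlock", 571)         # e.g. raised by the NoC monitor's timeout
    accel.sample(580, "waiting on memory")
    print(accel.trace)  # [(571, 'trace started'), (580, 'waiting on memory')]
```

Tracing only after the trigger keeps the accelerator's trace focused on the window around the deadlock instead of filling storage with unrelated activity.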

Understanding the issues involved in implementing an effective system validation and optimization environment is key to the successful delivery of manycore SoCs, which is why working with a supplier with deep expertise in this area is essential.

Additional resources:

Technical paper: Harness system-level data to optimize manycore AI and ML chips
Webinar: Optimizing complex AI and ML SoCs: the role of system-level data

Richard Oxland

Richard Oxland is a product manager for Tessent Embedded Analytics products at Siemens Digital Industries Software.
