SPONSOR BLOG

NoC Reliability: Simplified

There are four primary failure modes associated with NoCs.

March 26th, 2015 - By: Randy Smith

Recently, the reliability features of on-chip network (NoC) IP have received much attention. One reason for this focus has been the rush of companies to get into the automotive electronics market and the explosion of new automotive features being implemented in electronic systems. While the details may vary, the high-level view of on-chip network reliability is really quite simple.

At the architectural block level, an on-chip network appears as just another functional block in the system-on-chip (SoC) design, albeit a very important one. The function of the NoC is to move data between other blocks in the SoC design. Modern NoC products also layer in many services and features, such as protocol conversion (including data width), quality of service (QoS), automated support for multiple (or unlimited) combinations of power and clock domains, network security, power management, and interrupt management, to make the designers’ job easier. However, the main function is simply moving data, which is what reliability engineers must be concerned with when trying to meet ISO 26262 compliance.

As the table above shows, there are only four primary failure modes associated with NoCs. One failure mode has to do with the corruption of the data during transport. The other three modes have to do with the delivery of the data to the proper recipient block. While there may be other methods to detect these types of errors, the detection methods listed in the table are very straightforward. As should be expected, how the errors are handled will depend on the level of resiliency desired.

Parity provides only limited protection: a single parity bit detects an odd number of flipped bits, but it cannot correct them and it misses any even number of flips. Error-correcting code (ECC) offers more, but it also requires a few more decisions. Because modern NoCs support transfers in which the sender and receiver have different word sizes, the designer must choose one of three approaches: (1) use only one word size across the entire network; (2) ECC encode based on the recipient word size; or (3) do all encoding at the byte level. While method 1 is very simple to implement, it requires redesigning all of the connected IP cores to share a single word size, which is very costly. And, while method 2 may use fewer wires than method 3, it requires that each endpoint know the word size of the destination of each transaction, which complicates the design of the IP blocks and of the software that programs them. So, method 3 is the preferred approach, as it is more flexible and allows rapid reuse of IP cores from many sources.
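To make the byte-level option (method 3) concrete, here is a minimal Python sketch. It assumes a Hamming code with one extra overall parity bit (SECDED, 13 bits per data byte), which is a common textbook choice and not necessarily what any particular NoC product uses. The function names (`encode_byte`, `decode_byte`, `encode_word`, `decode_word`, `parity_bit`) are hypothetical and exist only for illustration.

```python
# Sketch only: per-byte SECDED (Hamming(12,8) plus overall parity).
# The code choice and all names here are illustrative assumptions.

DATA_POSITIONS = [3, 5, 6, 7, 9, 10, 11, 12]  # non-power-of-two positions
PARITY_POSITIONS = [1, 2, 4, 8]               # Hamming check-bit positions


def parity_bit(byte: int) -> int:
    """Plain parity for comparison: 1 if the byte has an odd number of ones."""
    return bin(byte).count("1") % 2


def encode_byte(byte: int) -> int:
    """Return a 13-bit codeword; bit 0 is overall parity, bits 1..12 are Hamming."""
    code = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (byte >> i) & 1:
            code |= 1 << pos
    for p in PARITY_POSITIONS:
        # Each check bit makes the parity of its group of positions even.
        group = sum((code >> pos) & 1 for pos in range(1, 13) if pos & p)
        if group % 2:
            code |= 1 << p
    if bin(code).count("1") % 2:
        code |= 1  # overall parity makes the whole codeword even
    return code


def decode_byte(code: int) -> tuple[int, str]:
    """Return (data byte, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, 13):
        if (code >> pos) & 1:
            syndrome ^= pos
    overall_odd = bin(code).count("1") % 2 == 1
    if overall_odd and syndrome <= 12:
        code ^= 1 << syndrome  # single-bit error; syndrome 0 means bit 0 itself
        status = "corrected"
    elif syndrome == 0 and not overall_odd:
        status = "ok"
    else:
        status = "uncorrectable"  # e.g. a double-bit error: detected, not fixed
    byte = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (code >> pos) & 1:
            byte |= 1 << i
    return byte, status


def encode_word(value: int, width_bytes: int) -> list[int]:
    """Encode each byte lane independently, least significant byte first."""
    return [encode_byte((value >> (8 * i)) & 0xFF) for i in range(width_bytes)]


def decode_word(lanes: list[int]) -> tuple[int, list[str]]:
    """Decode any number of byte lanes; the receiver needs no sender word size."""
    value, statuses = 0, []
    for i, lane in enumerate(lanes):
        byte, status = decode_byte(lane)
        value |= byte << (8 * i)
        statuses.append(status)
    return value, statuses


if __name__ == "__main__":
    lanes = encode_word(0x1122334455667788, 8)   # 64-bit sender
    lanes[2] ^= 1 << 6                           # one bit flips in transit
    print(decode_word(lanes[:4]))                # 32-bit receiver, low half
    print(decode_word(lanes[4:]))                # 32-bit receiver, high half
```

Because every byte lane carries its own check bits, a 64-bit sender and a 32-bit receiver can interoperate without either side knowing the other's word size. The cost is wiring: under this assumed code, each 8 data bits need 5 check bits, whereas SECDED over a full 64-bit word needs only 8 check bits in total, which is the efficiency trade-off noted for method 2.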

A “shadow network” is simply another network mirroring the connections of the first network. However, to be more useful in detecting errors, designers will want to isolate the implementation of the primary network from the shadow network. Designers ensure network isolation by keeping the shadow network one cycle out of phase and with a different layout from the primary network. In this way, errors caused by the environment (e.g., particle hits, power spikes, etc.) are not likely to affect both networks in the same way. By comparing the results of the primary and shadow networks, designers can determine if significant errors have occurred in the routing, framing, or delivery of transactions.
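The value of the one-cycle offset can be illustrated with a small behavioral model. The Python sketch below is an assumption-laden toy and not a description of any real NoC: flits are modeled as hypothetical `(destination, payload)` tuples, and `lockstep_compare` and `corrupt` are names invented for this example. It models only the time offset, not the different physical layout.

```python
# Sketch only: comparing a primary network against a shadow network that
# runs a fixed number of cycles behind it. All names are hypothetical.
from collections import deque
from typing import Optional, Sequence

Flit = tuple[int, int]  # (destination id, payload), an illustrative format


def corrupt(flit: Flit) -> Flit:
    """Model a transient fault by flipping the lowest payload bit."""
    dest, payload = flit
    return (dest, payload ^ 0x1)


def lockstep_compare(
    primary_out: Sequence[Optional[Flit]],
    shadow_out: Sequence[Optional[Flit]],
    delay: int = 1,
) -> list[int]:
    """Return the cycles at which the delayed primary output differs from
    the shadow output. `delay` is how many cycles the shadow runs behind."""
    mismatches = []
    pipeline: deque = deque([None] * delay)  # delays the primary output
    for cycle, (p, s) in enumerate(zip(primary_out, shadow_out)):
        pipeline.append(p)
        delayed_p = pipeline.popleft()
        # Skip the first `delay` cycles, before the shadow has any output.
        if cycle >= delay and delayed_p != s:
            mismatches.append(cycle)
    return mismatches


if __name__ == "__main__":
    flits = [(1, 0xA0), (2, 0xB1), (3, 0xC2), (1, 0xD3)]
    primary = flits + [None]   # primary delivers first
    shadow = [None] + flits    # shadow delivers one cycle later

    # A glitch at cycle 2 hits both networks, but they carry different
    # flits at that moment, so the corruption does not cancel out.
    primary[2] = corrupt(primary[2])
    shadow[2] = corrupt(shadow[2])
    print(lockstep_compare(primary, shadow))          # offset networks

    # With no offset, the same glitch corrupts the same flit identically
    # in both copies and the comparison sees nothing wrong.
    same = list(flits)
    same[2] = corrupt(same[2])
    print(lockstep_compare(same, list(same), delay=0))
```

In this model a disturbance that strikes both networks in the same cycle lands on different transactions, so the comparator flags it. A misrouted flit in only one network (a changed destination id) is caught the same way, since the comparison covers the destination as well as the payload.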

NoC resiliency is very straightforward to analyze. As always, what to do about a failure depends on the consequences of losing the function being implemented. In the parlance of ISO 26262, the target ASIL helps determine how designers should handle a failure. Incorporating a NoC with rich error detection features makes error handling inside the SoC easier, because designers can use the interrupt and other system services the NoC provides.

Randy Smith is vice president of marketing at Sonics Inc.

