Boldly Go Where No NoC Has Gone Before
Is achieving functional safety goals without compromising PPA possible?
Functional safety, at varying degrees of integrity and with or without ISO 26262, has become a cornerstone of SoCs in many key market segments, not just automotive. The industry goal is to achieve these reliability levels without sacrificing any PPA (power, performance, area) while continually reducing time to market (TTM). Go figure! I know, that’s like saying, make me an omelet without breaking eggs. And egg substitute is not an option. Jokes aside, in my opinion it is a challenging goal, but definitely not impossible.
One of the key IPs that has a direct impact on the SoC’s safety, security, coherency, bandwidth, latency and hence overall system performance is the interconnect. The interconnect is where all the various requirements melt into one cohesive IP, and the pressure is on to make that next leap and to boldly go where no NoC has gone before. Yep, I am referring to Star Trek. To this day, I am in awe of the futuristic vision which was way ahead of its time, but at the same time gelled with all the other human theatrics and requirements of making a great television series. That’s the kind of ingenuity required to meet the stringent needs of functional safety by integrating the state-of-the-art diagnostic features with the standard but stringent PPA requirements of the interconnect.
Let us start off by briefly reviewing the ISO 26262 requirements and then discuss what next generation interconnect capabilities are required to tackle these.
ISO 26262 requires adherence to two main areas:
- IP development process
- IP analytics
While the former requires that company processes adhere strictly to standard practices and regulatory requirements, the latter mandates design safety features and fault protection based on the potential failure modes in order to reach specific ASIL targets. The development process helps ensure that requirements and feedback are managed rigorously, and in turn provides the user an IP with uncompromised quality and regularly scheduled releases that are closely tracked. Though development process compliance is vital to make sure the IP itself has been built under stringent methodologies to ensure quality and continued support, most companies treat it more as a checkmark to make sure the IP meets those requirements. The analytics is where they spend most of their resources and need the most assistance in analyzing their SoC/IC, specifically how the interconnect IP fits into their context, and in analyzing the failure dependencies. This has a direct impact on quality and TTM.
The ISO 26262 process in a nutshell
The analytics part of ISO 26262 compliance deals with identifying failure modes, analyzing impacts, protecting with safety diagnostic features, and calculating metrics for ASIL targets. From an SoC/IC perspective, it involves analyzing dependencies between various IPs at a system level. This can be a time-consuming and resource-heavy task, and depending on the gap between the achieved and target metrics the SoC/IC designer might have to add more safety features and reanalyze. This can take several iterations, which can be prolonged further by a lack of control over a third-party IP’s safety features and their impact on the overall ASIL targets. The goal of a third-party IP provider should be to provide all the design support and collateral needed to enable the SoC team to take control and make appropriate and fast decisions.
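To make the "calculate metrics, compare with the target, add features, reanalyze" loop concrete, here is a minimal Python sketch of the two hardware architectural metrics defined in ISO 26262-5, the single-point fault metric (SPFM) and the latent fault metric (LFM). It is an illustration under simplifying assumptions, not a real FMEDA: every fault is assumed to be safety-relevant (no safe faults), each failure mode has a single diagnostic coverage figure per metric, and the names `FailureMode`, `metrics`, and `highest_asil`, as well as all FIT and coverage numbers, are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    fit: float          # raw failure rate in FIT (failures per 1e9 hours)
    dc_residual: float  # fraction of faults caught by a safety mechanism
    dc_latent: float    # fraction of the remaining multi-point faults that do not stay latent

# Minimum (SPFM, LFM) targets per ASIL from ISO 26262-5, strictest first.
ASIL_TARGETS = {"D": (0.99, 0.90), "C": (0.97, 0.80), "B": (0.90, 0.60)}

def metrics(modes):
    """Return (SPFM, LFM) under the simplifying assumptions stated above."""
    total = sum(m.fit for m in modes)
    # Single-point/residual faults: the part the safety mechanisms miss.
    residual = sum(m.fit * (1 - m.dc_residual) for m in modes)
    # Latent multi-point faults: covered faults that are never revealed.
    latent = sum(m.fit * m.dc_residual * (1 - m.dc_latent) for m in modes)
    spfm = 1 - residual / total
    lfm = 1 - latent / (total - residual)
    return spfm, lfm

def highest_asil(spfm, lfm):
    """Return the strictest ASIL whose SPFM and LFM targets are both met."""
    for level, (min_spfm, min_lfm) in ASIL_TARGETS.items():
        if spfm >= min_spfm and lfm >= min_lfm:
            return level
    return None

# Hypothetical interconnect with ECC on the datapath and buffers.
protected = [
    FailureMode("router datapath", 10.0, 0.99, 0.90),
    FailureMode("buffer SRAM", 20.0, 0.999, 0.95),
]
# Same design plus a weakly protected block, as found on a first iteration.
partial = protected + [FailureMode("control logic", 5.0, 0.60, 0.60)]

for label, modes in (("protected", protected), ("partial", partial)):
    spfm, lfm = metrics(modes)
    print(f"{label}: SPFM={spfm:.4f} LFM={lfm:.4f} ASIL={highest_asil(spfm, lfm)}")
```

In this toy data one weakly covered block is enough to pull the whole configuration from ASIL D down to ASIL B. That is the kind of gap that forces another round of adding safety features and reanalyzing.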
Below are some of the key technologies that provide support to users of third-party interconnect IP and alleviate some of the challenges they face in adhering to these requirements.
- Integrated safety (vs. add-on) for optimized performance/area
- Challenge – Traditional approaches architect the interconnect with a short-sighted focus on meeting the PPA requirements, by which point the decisions on routing and resource sharing have already been made. Including ECC/parity (or any safety feature) as an add-on compromises the performance and area targets by creating additional latencies in critical paths. The other undesirable impact is an almost doubled area, due to a kitchen-sink approach of adding more safety features than required to meet the safety goals. Rather than pay the schedule price of going back to rearchitect, design teams have to live with sub-optimal performance and expensive area.
- Solution – In an integrated solution the effect of ECC/parity (and other resiliency features) is taken into account up front in a top-down methodology, and the decisions on routes and on sharing of routers and resources are made in a more optimized manner to achieve the performance and area goals. The same holds for all safety features, which are taken into consideration when coming up with the right interconnect architecture. Analyzing the impact up front makes it possible to plan for it and to model its behavior earlier in the cycle. Similarly, understanding the safety requirements for SRAMs up front helps ensure that an appropriate banking structure is used, i.e. one that works best with the safety features and thus does not compromise PPA. Treating the safety features as first-class citizens of the interconnect, as opposed to an add-on, is vital for an optimized safe interconnect and can provide up to a 40% reduction in area while still meeting the performance targets.
- Fine grain configurability (vs. coarse control) of third-party IP safety features
- Challenge – Most interconnect IP vendors provide safety features at a coarse level and in some cases on an all-or-none basis. Typical approaches include replicating entire components or, for lack of finer control, adding ECC to more channels than required, which causes high area overhead or suboptimal performance. This also leaves the SoC integrator with a ‘black-box’ view of the interconnect, which in turn leads to over-designing the rest of the system to compensate.
- Solution – On the other hand, providing fine grain control and configurability with a ‘Design Cockpit’ allows the user to be in the driver’s seat with the ability to specify and configure the requirements on a per ‘traffic profile’ basis. Think Captain Kirk’s command chair. In its simplest form, ‘traffic profile’ can be a physical path from a specific master to a specific slave. So, you can imagine there being several traffic profiles in a system that has multiple masters and slaves. Actually, within a traffic profile users can have various logical (virtual) profiles. Apart from specifying the architectural/performance/bandwidth requirements on a per traffic profile basis, NetSpeed also allows users to specify safety requirements at fine granularity. NetSpeed also offers unique advanced features like route duplication for specific paths that users desire to protect. As a result, instead of making the entire interconnect ASIL-D, users can control which paths to target at the higher integrity level of ASIL-D, restrict others to ASIL-C/B, and even leave the rest as non-safe based on their requirements. All this is possible without the need to split the interconnect into a safe and non-safe island. While specifying the features at the per traffic profile granularity, users can choose among the various safety features like ECC, parity, logic protection, timeouts, etc. or a combination of them.
- Rapid analysis and convergence (vs. time consuming) of ASIL target analysis
- Challenge – The analytics part of ISO 26262 measures functional safety compliance in two categories: one is the PMHF/FIT rate, which depends mainly on the area and type of circuitry, and the other is the SPFM/LFM diagnostic coverage, which depends on the interaction between the various pieces and on which safety features are supported. All this is captured in an FMEDA (Failure Modes, Effects, and Diagnostic Analysis). The FMEDA also records the potential failure modes in the various parts of the design and how much of those failures can be covered (detected) by the specific safety features. It also captures the area of the various components to understand the impact of a specific failure mode in the context of the entire IP (or system). In today’s world, an SoC team integrates third-party IPs from a multitude of vendors, which makes it increasingly difficult and time-consuming to analyze the dependencies between the various pieces and the effect on the ASIL targets.
- Solution – NetSpeed alleviates this issue by providing users with all the analytic metrics at every step of the way. Unlike an add-on approach, a top-down requirements driven approach has all the information about the safety features and the interaction/dependency between the various pieces of the NoC. Also, the NoC hardware components themselves are very structured to help reuse. NocStudio is unique in that it has a built-in FMEDA, which uses all this information to estimate the ASIL target of that specific configuration. This way, the user is aware of the decisions made at every step and can make the necessary tradeoffs by analyzing the safety coverage results alongside the entire system. This dramatically shortens the user’s TTM by reducing the time required to understand and analyze failure dependencies for third-party interconnects and create the appropriate FMEDA.
- Achieving ASIL-D targets
- Challenge – ISO 26262 has some really stringent diagnostic coverage requirements for ASIL-D. Apart from requiring an SPFM of at least 99%, it also requires specific documentation of a dependent failure analysis (DFA).
- Solution – NetSpeed achieves ASIL-D readiness by offering unique ECC and parity techniques, route duplication and Network BIST, which are just a few of a variety of safety features that are configurable at a fine granularity. This, with the added configurability of having different traffic in the interconnect at other ASIL levels, gives its users the ability to achieve ASIL-D with the least area overhead and without compromising performance. ASIL-D also requires additional collateral, and NetSpeed augments its design offerings with a comprehensive Safety Manual, DFA, and FMEDA to expedite system certification.
- Scalable coherency impact
- Challenge – Traditional approaches have lacked functional safety support for coherent interconnects. Customers have been forced to split their interconnects into coherent and non-coherent parts and target only the non-coherent interconnect for functional safety. Not only does this double the development time, it also affects performance. Also, in today’s world coherency has become a vital part of any functional safety system.
- Solution – NetSpeed offers the industry’s first coherent interconnect with up to ASIL-D readiness. Similar to the functional safety features, the coherent features are also integrated into the interconnect, instead of a separate add-on package. This enables the optimum balance between interconnect, coherent, and functional safety features by treating all these factors as equally important for the overall quality of the SoC.
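The per-‘traffic profile’ configurability described in the list above can be pictured with a small Python sketch. This is not NocStudio's actual interface. `TrafficProfile`, `link_requirements`, the feature names, and the example routes are all hypothetical, and taking the strictest ASIL on a shared link is a simplifying assumption. The sketch only shows how safety requirements attached to individual master-to-slave paths could be folded down onto the links those paths share, so that protection lands only where it is needed.

```python
from dataclasses import dataclass

ASIL_ORDER = ["QM", "A", "B", "C", "D"]  # least to most stringent

@dataclass(frozen=True)
class TrafficProfile:
    master: str
    slave: str
    asil: str = "QM"                   # QM means no safety requirement
    features: frozenset = frozenset()  # e.g. {"ecc", "parity"}

def link_requirements(profiles, routes):
    """Map each link to the strictest ASIL and the union of the safety
    features requested by the traffic profiles routed over it."""
    req = {}
    for p in profiles:
        for link in routes[(p.master, p.slave)]:
            asil, feats = req.get(link, ("QM", frozenset()))
            if ASIL_ORDER.index(p.asil) > ASIL_ORDER.index(asil):
                asil = p.asil
            req[link] = (asil, feats | p.features)
    return req

# Hypothetical system: only the CPU-to-DDR path needs ASIL-D protection.
profiles = [
    TrafficProfile("cpu", "ddr", "D", frozenset({"ecc", "route_duplication"})),
    TrafficProfile("gpu", "ddr"),
    TrafficProfile("dsp", "sram", "B", frozenset({"parity"})),
]
# Hypothetical routes, each a list of (from, to) links.
routes = {
    ("cpu", "ddr"): [("r0", "r1"), ("r1", "ddr")],
    ("gpu", "ddr"): [("r2", "r1"), ("r1", "ddr")],
    ("dsp", "sram"): [("r3", "sram")],
}

req = link_requirements(profiles, routes)
for link, (asil, feats) in sorted(req.items()):
    print(link, asil, sorted(feats))
```

In this example the link used only by the GPU stays unprotected, while the link it shares with the CPU inherits the ASIL-D features. This is the contrast with an all-or-none approach that would protect every link.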
Achieving the functional safety integrity levels without compromising PPA is an ongoing challenge, especially in the face of shorter TTM. The goal of third-party IP providers should be to help their users achieve those targets instead of just providing a coarse feature set and limited collateral in the form of a Safety Manual. Offering fine-grained, configurable protection mechanisms along with a configurable FMEDA tailored to the user’s specific needs would go a long way toward helping users achieve the required ASIL targets.
Finally, thanks for staying till the end of this journey. In the words of Spock, “Live long and prosper!”
Rajesh Ramanujam
Rajesh Ramanujam is responsible for product marketing at NetSpeed Systems. He has 15 years of experience in hands-on SoC development. He was an SoC system architect at Altera (acquired by Intel) and Huawei and a senior processor architect at Texas Instruments. He received an MS, Electrical and Computer Engineering from Iowa State University and a BS, ECE from the University of Mumbai.