Optimization Challenges For Safety And Security

The road to optimized tradeoff automation is long. Changing attributes along the way can make it even more difficult.


Complexity challenges long-held assumptions. In the past, the semiconductor industry thought it understood performance/area tradeoffs, but over time it became clear that these are not so simple. Performance is no longer an absolute measure. Power has many dimensions, including peak, average, total energy and heat, and power and function are tied together.

Design teams are now dealing with the implications of safety and security, which have considerable impact on power/performance/area (PPA) considerations. We are far from understanding the tradeoffs, let alone optimizing them.

“Area is easy,” says Rupert Baines, CEO of UltraSoC. “But with performance, what are you measuring and what are you trying to optimize? If we are talking about an opamp, you just care about gain-bandwidth, and that is an easy measure. If it is a memory cell, that is also easy. If it is a processor, what we are measuring and what we are trading off becomes a lot more subtle. Performance to do what task?”

Tools have had to cooperate to solve some of these problems. “As integrated circuits get more complex, a holistic overview of tradeoffs becomes impossible to do manually,” says Benjamin Prautsch, group manager for mixed-signal automation at Fraunhofer EAS. “CAD tools are required to support IC design engineers, especially for analog where a lot more automation is possible. Fast sizing methods, model generation, and layout generators are examples of tools that must cooperate more closely in the analog IC design flow in order to tackle today’s increasing optimization challenges.”

The notion of a task has changed over time. “We loved single-thread performance,” says Kevin McDermott, vice president of marketing for Imperas Software. “A marketing guy loves the datasheet number and the trend that was up and to the right. But single-thread performance is not where things are anymore. Today, I have multiple processors or clusters of processors, and it is the ability to accomplish the entire workload within my environment.”

Power continues to add dimensions. “When we talk about power, we have to define what that means,” points out Baines. “Many systems have sleep modes and multiple operating conditions. Are we talking about leakage or dynamic power? Some people don’t actually care about power, they care about energy. How long will your battery last? You can trade off power for energy and ask if it is more efficient for a given context. Power has complications that depend a lot on use cases. What are you really trying to measure? What are you trying to optimize?”
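
By way of illustration, consider two operating points for the same task. The numbers below are invented for the example, not taken from any particular design, but they show why a lower-power mode is not automatically the lower-energy one:

```c
#include <stdio.h>

/* Hypothetical numbers illustrating the power-vs-energy tradeoff Baines
 * describes: a fast mode draws more power but finishes sooner and idles,
 * so the energy per task (and hence battery life) can still come out ahead. */
int main(void) {
    double fast_power_mw = 200.0, fast_time_s = 1.0;   /* race-to-idle mode */
    double slow_power_mw =  60.0, slow_time_s = 5.0;   /* low-power mode    */
    double idle_power_mw =   5.0;
    double window_s      =   5.0;                      /* compare over 5 s  */

    /* Energy = power x time, charged over the same window for both modes. */
    double fast_energy_mj = fast_power_mw * fast_time_s
                          + idle_power_mw * (window_s - fast_time_s);
    double slow_energy_mj = slow_power_mw * slow_time_s;

    printf("fast mode: %.0f mJ, slow mode: %.0f mJ\n",
           fast_energy_mj, slow_energy_mj);
    /* Lower peak power does not guarantee lower energy for the task. */
    return 0;
}
```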

Those changes have been reflected in tools. “You have to make sure that the fundamental algorithms are architected around the fact that there is a functional, time-based component to what had previously been a static problem,” says Rob Knoth, product manager at Cadence. “It is no longer sufficient to use an activity factor. Instead, you have to use functional vectors. There are knobs that can be turned between fidelity and throughput. Those choices are critical because when you inject more of the variables into solving the problem you have to make sure it scales well.”
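The gap between the two approaches can be sketched with a toy dynamic-power calculation. The capacitance, voltage, and per-cycle toggle figures below are placeholders rather than tool output, but they show how a flat activity factor hides the peaks that a functional trace exposes:

```c
#include <stdio.h>

#define CYCLES 8

/* Contrast a static activity-factor estimate with a vector-driven one.
 * Real tools work per net from simulation or emulation traces; this uses
 * one lumped switched capacitance purely for illustration. */
int main(void) {
    double c_eff_f = 50e-12;     /* lumped switched capacitance, farads */
    double vdd     = 0.8;        /* supply voltage, volts               */
    double f_hz    = 1e9;        /* clock frequency, hertz              */

    /* Static estimate: one average activity factor for the whole run. */
    double alpha    = 0.2;
    double p_static = alpha * c_eff_f * vdd * vdd * f_hz;

    /* Vector-based estimate: per-cycle activity from a functional trace. */
    double toggles[CYCLES] = {0.05, 0.05, 0.60, 0.70, 0.10, 0.05, 0.05, 0.05};
    double p_peak = 0.0, p_avg = 0.0;
    for (int i = 0; i < CYCLES; i++) {
        double p = toggles[i] * c_eff_f * vdd * vdd * f_hz;
        p_avg += p / CYCLES;
        if (p > p_peak) p_peak = p;
    }

    printf("static: %.1f mW  vector avg: %.1f mW  vector peak: %.1f mW\n",
           p_static * 1e3, p_avg * 1e3, p_peak * 1e3);
    return 0;
}
```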

As one problem is solved, more become apparent. “Now we are moving into a world where, for many applications, defining the use cases becomes harder,” says Baines. “We need a more intelligent approach for how to think about that because we cannot define all of the use cases a priori.”

Adding safety and security
Safety and security are slowly being added to the tradeoff maze. “Most recently we have grasped safety as a new variable,” says Knoth. “It became PPA while making sure that the system was safe. This was another extension to the notion of functionality. The industry is just starting to grapple with security as a new variable that impacts schedule. The impact to PPA is fascinating.”

Architects must be creative in how safety and security are implemented. “Safety mechanisms come in a variety of flavors each with their own impact on power, performance and area, and effectiveness in catching faults,” says Jacob Wiltgen, functional safety solutions manager at Mentor, a Siemens Business. “Striking the right balance of hardware and software safety mechanisms is critical in ensuring an optimal PPA implementation which achieves the safety and security targets.”

Both fields bring hardware/software interaction much more centrally into the picture. “From a processor subsystem perspective, a ‘safe core’ means one that can always recover from random and systemic faults,” says Chris Jones, vice president of marketing for Codasip. “That means ECC for memories and internal buses, it means memory protection units, it means occupying memory and CPU cycles to run BiST (software built-in self-tests that exercise various parts of the architecture, e.g., making sure the multiplier multiplies), and in the most safety-critical apps it means redundant processors running in lock-step.”
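A software BiST step of the kind Jones mentions might look roughly like the sketch below, which feeds the multiplier known patterns and flags any mismatch. The pattern set and report_fault() are placeholders for this example, not any particular vendor's library:

```c
#include <stdint.h>

/* Minimal sketch of a software BiST step: exercise the multiplier with
 * known patterns and compare against expected results. Real implementations
 * cover corner operands, run periodically under a safety scheduler, and
 * escalate to a system-level fault handler; report_fault() stands in for
 * that handler here. */
extern void report_fault(int test_id);

static const struct { uint32_t a, b, expected; } mul_patterns[] = {
    { 0x00000003u, 0x00000007u, 0x00000015u },
    { 0x0000FFFFu, 0x0000FFFFu, 0xFFFE0001u },
    { 0xAAAAAAAAu, 0x00000002u, 0x55555554u },
};

int multiplier_bist(void) {
    int pass = 1;
    for (unsigned i = 0; i < sizeof mul_patterns / sizeof mul_patterns[0]; i++) {
        /* volatile keeps the compiler from folding the multiply at build time,
         * so the hardware multiplier is actually exercised. */
        volatile uint32_t a = mul_patterns[i].a;
        volatile uint32_t b = mul_patterns[i].b;
        if ((uint32_t)(a * b) != mul_patterns[i].expected) {
            report_fault((int)i);  /* escalate to the safety manager */
            pass = 0;
        }
    }
    return pass;                   /* 1 = multiplier answered correctly */
}
```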

In some industries, controllers that were once isolated are becoming centralized. “Many of them use shared resources and have mixed software on a single processor or cluster,” says McDermott. “This is where the OS has to be aware of the management and responsibility that these containers and activities are different and have to be kept apart. This element is mission-critical. This element needs care and attention—especially when there is a more general capability space where you can download apps or do other riskier things, but you cannot allow them to compromise the system.”

“Safety has had the benefit of being under the magnifying glass for a longer period of time,” adds Knoth. “Security is younger, but the techniques that safety has adopted will find close cousins in the security area. We cannot ignore them. Much like safety, if you architect it with that in mind, you will probably find a more elegant and robust solution than trying to graft it on as an afterthought.”

But that can be difficult today. “Security vulnerabilities are very difficult to detect,” says Sergio Marchese, technical marketing manager for OneSpin Solutions. “Moreover, there are no established development metrics. The assessment of the security state is often left to after-the-fact procedures, where specialized labs determine the achieved security level. There is a strong need for integrating security into the hardware development lifecycle and establishing metrics that can be used to make informed decisions and tradeoffs.”

Understanding the tradeoffs can be difficult. “Spectre and Meltdown are examples of unintended consequences,” points out McDermott. “In the mission to get single-thread performance, designers looked at ways to jump ahead using preemptive or predictive execution, with single-thread performance as the only goal. And it was a job well done. The fact that there were hidden costs that have only recently come to light is just the nature of having a single focus.”

Since then, the context has moved from block to system. “Many systems being designed these days are inherently so complicated that they are impossible to understand,” says Baines. “Spectre and Meltdown are perfect examples of that. Very smart people with a lot of expertise and a huge number of design reviews made decisions that, on a small scale, were the right decisions to have made, and they worked almost perfectly. When you zoom out to a broader context, other factors came into play, and the system became more complex and showed vulnerabilities and flaws. It was not a flaw in unit testing. It was not a flaw in the modeling within the device under test. It was a systemic complexity flaw. The interactions between different things within that systemic complexity caused the problems, and that is the primary issue with safety and security.”

Tool optimization flow
There are generally four stages to understanding and building tools: codify a process, perform analysis, develop metrics to gain understanding, and optimize or automate. The industry clearly is at a different stage for each of the design attributes.

“The tooling we have today is solving a lot of the PPA convergence problem,” says Knoth. “This means design teams can spend more time looking at safety and security. There always will be a spec for a product, and there will be table stakes that it has to meet. With power it may be that it must not melt the package. These are hard limits, but there will be many softer limits. Safety and security also will have hard limits they have to meet. We already see that with standards like ASIL ratings. Security probably will develop similar metrics.”

Security is just getting to the first stage. “We have ISO 26262 and others for safety and the upcoming ISO 21434 for security,” says Baines. “This will be very similar to ISO 26262 but focused on security rather than safety. It’s currently in committee review and expected to be in early public release at the end of June. We need those methodologies, and then we can get the metrics. There are things like FMEDA as an approach, and they can provide metrics. They are not well standardized yet, but we need them.”
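To give a flavor of the kind of metrics an FMEDA produces, the sketch below rolls invented failure-rate numbers (in FIT) into the ISO 26262 single-point fault metric and latent fault metric. The rates are made up for the example; a real FMEDA assigns them per element from failure-mode analysis and the diagnostic coverage of each safety mechanism:

```c
#include <stdio.h>

/* Illustrative FMEDA-style roll-up: given failure rates binned by fault
 * class, compute the single-point fault metric (SPFM) and latent fault
 * metric (LFM) used as ISO 26262 hardware architectural metrics. */
int main(void) {
    double total_fit        = 100.0;  /* all safety-related faults          */
    double single_point_fit =   2.0;  /* directly violate the safety goal   */
    double residual_fit     =   3.0;  /* escape the safety mechanism        */
    double latent_fit       =   5.0;  /* multi-point faults never detected  */

    double spfm = 1.0 - (single_point_fit + residual_fit) / total_fit;
    double lfm  = 1.0 - latent_fit /
                  (total_fit - single_point_fit - residual_fit);

    printf("SPFM: %.1f%%  LFM: %.1f%%\n", spfm * 100.0, lfm * 100.0);
    /* Commonly cited ASIL D targets are SPFM >= 99% and LFM >= 90%;
     * these example numbers would meet the LFM target but miss SPFM. */
    return 0;
}
```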

Safety has progressed a little further. “While there are no tools that automate tradeoffs between safety and PPA targets, it is important to deploy repeatable processes and reduce reliance on expert judgment and guesswork,” says OneSpin’s Marchese. “There are tools available to estimate the effectiveness of safety mechanisms, perform in-depth analysis, and guide optimizations. Software safety mechanisms can reduce the need for safety hardware.”

Some of the tools will need to be updated. “The tools we have, such as simulation and UVM, are very good at block level and sub-system level,” says Baines. “But many of the problems are system-level issues and interactions between systems. What happens when this software runs on this processor in this use case when something fails? This requires a different approach than throwing constrained vectors at one block.”

Tools are needed that help gain understanding at the system level. “The accuracy with which expert-driven judgment can architect a functionally safe and secure system is limited,” says Mentor’s Wiltgen. “Analysis tools that can guide and validate expert-driven judgment are critical to project efficiency and to ensuring bugs don’t escape to silicon.”

Gaining comprehension about large systems may require new approaches. “One such approach is the use of in-silicon metrics and monitors,” says Baines. “Design teams want something like BIST, where they have ways in silicon to measure the system as a whole and work out what is really happening. Then they can develop their coverage metrics accordingly.”

Getting metrics is a necessary step. “Once you have the metrics, we know how to optimize,” asserts Knoth. “That is the role of EDA. To solve those problems, we will need companions and domain experts who understand these areas intimately.”

When can we expect tools? “Reducing the number of configurations in the architectural state space is critical in achieving a final solution which meets the PPA, safety, and security objectives,” says Wiltgen. “There is a foreseeable future where safety analysis, the insertion of safety mechanisms, and the verification of safety becomes an automated workflow. This is similar to high-level synthesis, where designs are correct by construction. Safe by construction is an attainable goal.”

Today, a lot of the responsibility is being placed on the development team. “We will never be able to automate everything and replace the engineers,” says Baines. “Every engineer and chip designer should be thinking about safety and security. When you get to complicated devices, you have a responsibility, and there will be big lawsuits for people not following best practices. That applies both to development methodologies and to in-system analytics and instrumentation. Without data you cannot achieve anything.”

Related Stories
Safety, Security And PPA Tradeoffs
The number of critical design metrics is expanding, but the industry still grapples with their implications.
Memory Tradeoffs Intensify In AI, Automotive Applications
Why choosing memories and architecting them into systems is becoming much more difficult.
The Growing Challenge Of Thermal Guard-Banding
Margin is still necessary, but it needs to be applied more precisely than in the past.
