Dealing With AI/ML Uncertainty

How neural network-based AI systems work under the hood is not fully understood, but the industry is finding ways to live with a black box.


Despite their widespread popularity, large language models (LLMs) have several well-known design issues, the most notorious being hallucinations, in which an LLM tries to pass off its statistics-based concoctions as real-world facts. Hallucinations are one example of a fundamental underlying issue with LLMs. The inner workings of LLMs, as well as other deep neural nets (DNNs), are only partly known, which means that for end users and designers they are essentially black boxes.

The basic structure is an input, an output, and between them a series of interactions that are not fully understood. As the semiconductor industry becomes more reliant on AI/ML, this uncertainty would seem to have potential negative impacts for designers and customers, such as errors whose origins can’t be traced. While the problem is top of mind for the industry, experts say it can be addressed with familiar tools and workflow approaches.

“Transparency and visibility throughout large, complex compute systems is a big issue right now among hyperscalers,” said Tony Chan Carusone, CTO at Alphawave Semi. “The ability to monitor the flow of data throughout processor networks is mandatory. Generally, that means the more information the better, but without sacrificing performance or burning more power.”

The black box problem has led to a new term of art — explainable AI (XAI), not to be confused with Elon Musk’s new start-up xAI.

“Explainability is a huge issue,” said Steven Latré, vice president of R&D for machine learning and AI at imec. “However, it’s important to not put all the different machine learning software algorithms into one big bucket because we’re also making a lot of progress on explainability. But the successes that we have had, like generative AI, are very much a black box, and they have two problems. One, because they are a black box, with the process being pattern matching and with no features on explainability whatsoever, that is an issue by itself. Think, for example, of deep fakes and hallucinations. These examples show it’s hard to have a good understanding of what these things do.”

Adding to the problem is the current size of models, some of which are nearing 200 billion parameters, making them all but impossible to analyze, let alone intuitively understand. “If you look at traditional computer vision systems, we still had a grasp on what they did,” said Latré. “Consider a traditional computer vision deep learning system that takes as input an RGB image, and then spits out what is actually on that image. It is still a challenge in terms of explainability, but you could still look under the hood and see that it combines pixels and makes patterns, such as a line, and then combines the different lines into shapes and so on. But the sheer size of generative AI models is so unbelievably big that we don’t even know that anymore. It is just one big unknown.”

The field of data science is especially interested in breaking into the black box, as more corporate and financial decisions are based on the predictions of AI models. Two popular approaches to explainability are Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP).
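The core idea behind model-agnostic methods like these can be illustrated with a simple permutation test: perturb one input feature at a time and measure how much the black box's accuracy degrades. The sketch below is a toy illustration of that idea, not either library's actual algorithm; the model and data are hypothetical.

```python
import numpy as np

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Accuracy drop when each feature is shuffled -- a simple
    model-agnostic importance measure in the spirit of LIME/SHAP."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(model(X) == y)          # baseline accuracy
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])              # destroy feature j only
            drops.append(baseline - np.mean(model(Xp) == y))
        importances[j] = np.mean(drops)
    return importances

# Toy "black box": predicts 1 when feature 0 exceeds 0.5; feature 1 is noise.
rng = np.random.default_rng(1)
X = rng.random((500, 2))
y = (X[:, 0] > 0.5).astype(int)
model = lambda X: (X[:, 0] > 0.5).astype(int)

imp = permutation_importance(model, X, y)
# Feature 0 should dominate; feature 1, which the model ignores, scores ~0.
```

Even without opening the model, this kind of probing reveals which inputs actually drive its decisions, which is the essence of the model-agnostic approach.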

How to cope
The unknowns are further complicated by human expectations. Consumers may forgive a human for having a bad day, but they expect perfection from an AI, likely based on the naive assumption that because it is expertly trained, its results must be exactly repeatable.

Since that actually may not be possible, at least with the current generation of LLMs, companies would be best advised to set a baseline error rate, noted Steve Roddy, CMO of Quadric. “When it comes to the question of how to test and correct the model, the first thing most companies need to do is establish the realistic goal of what kind of error rate is acceptable, what severity of error needs to be eliminated completely, and then guardband the model using those criteria.”
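A guardband check along the lines Roddy describes can be sketched in a few lines: release a model only if its overall error rate stays within a budget and no error of a severity class that must be eliminated entirely occurs. The function name, severity labels, and thresholds below are illustrative assumptions, not any company's actual criteria.

```python
def guardband_ok(predictions, labels, severities,
                 max_error_rate=0.02, blocked_severity="critical"):
    """Pass only if the overall error rate is within budget AND no
    error of the blocked severity class occurs at all."""
    # Collect the severity label of every misprediction.
    errors = [s for p, l, s in zip(predictions, labels, severities) if p != l]
    error_rate = len(errors) / len(predictions)
    return error_rate <= max_error_rate and blocked_severity not in errors

# One benign error in 100 samples: 1% < 2% budget and no critical miss.
preds  = [1] * 99 + [0]
labels = [1] * 100
sev    = ["minor"] * 100
ok = guardband_ok(preds, labels, sev)
```

A single critical-severity error fails the check regardless of the overall rate, which captures the "severity that must be eliminated completely" criterion.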

Such pragmatic suggestions are how most of the semiconductor industry is dealing with the black box problem. In fact, far from an unknown, the problem is actually familiar, said Patrick Donnelly, solutions architect at Expedera. “If we get a model from a customer, we can go layer by layer, and debug that model and figure out exactly where the accuracy in that model might not be matching. From the end users’ perspective though, a lot of this stuff looks like a black box. Take the example of an EDA company. They might get to a point where they’re trying to debug something that goes wrong, and finally think, ‘I have no idea how this model generated these sorts of design recommendations.’”
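The layer-by-layer debugging Donnelly describes can be sketched as a walk over dumped activations: compare each layer of the customer's reference run against the run on the target hardware and report the first layer where they diverge. This is a toy NumPy illustration that assumes both runs can export per-layer activations; the tolerance and data are hypothetical.

```python
import numpy as np

def first_mismatched_layer(ref_acts, dut_acts, tol=1e-3):
    """Walk the network layer by layer and return the index of the first
    layer whose activations diverge from the reference beyond `tol`."""
    for i, (a, b) in enumerate(zip(ref_acts, dut_acts)):
        err = np.max(np.abs(a - b))
        if err > tol:
            return i, err
    return None, 0.0

# Toy example: layer 2 of the "device" run carries a quantization-like error.
ref = [np.ones((4, 4)) * k for k in range(5)]
dut = [a.copy() for a in ref]
dut[2] += 0.01
layer, err = first_mismatched_layer(ref, dut)
```

Localizing the first divergent layer turns "the model's accuracy doesn't match" into a concrete, inspectable failure point.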

Nevertheless, Donnelly said this might be less of a black box problem than a white box problem. “A white box is like a black box in that you can think there’s no way to possibly debug the internal mechanics of what’s going on. But with a white box, it is definitely possible for someone with that knowledge to go through and figure out where something might be going wrong. Right now with the tools and knowledge that we have and that the end users might have, it takes prohibitively long to do that. Thus, any sort of debugging might not be useful if the model goes wrong.”

So how would testing/verification for AI/ML be incorporated into a workflow when no one knows how the models are implemented? The answer is not straightforward.

Ashish Darbari, CEO of Axiomise, disagrees with the premise that no one knows how the models are implemented. “It is more the case that the model’s details are not visible. There is a difference between the two artifacts. My sense is that even if we don’t know what exact model has been implemented, we could still test that model against a design model. In the case of semiconductors, the problem is less severe than the software-as-domain specialists, such as architects, designers and verification engineers, all of whom tend to know what to expect from the design model. This means extensive testing of the design against a black box AI model would reveal acceptable and unacceptable patterns across the I/O, which will make it easier to establish trust in the AI-generated black box model. Coverage models can be developed independently to validate the quality of the black box model. Again, the guide would be a coverage specification obtained from domain experts — designers and architects. Another way to address the black box models is to bring in symbolic AI to build alternative models, which are better in that one can explain these, gain deeper insights, and use these explainable AI models to compare and equivalence check against the black box models to authenticate their validity and completeness. This will allow the developers of black box models to have perhaps more optimized implementations and not reveal the secret sauce, but still get validated against what is open to investigation, such as an explainable AI model.”
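The black-box cross-checking Darbari outlines, driving both the AI model and a trusted reference with the same stimuli while tracking coverage and mismatches, can be sketched as follows. This is a toy Python illustration: the parity "reference" stands in for a domain expert's design model, and the seeded fault is hypothetical.

```python
def cross_check(black_box, reference, stimuli):
    """Compare a black-box model against a trusted reference over a
    stimulus set, recording mismatches and crude output-space coverage."""
    mismatches, covered = [], set()
    for s in stimuli:
        out = black_box(s)
        covered.add(out)                      # simple coverage metric
        if out != reference(s):
            mismatches.append((s, out, reference(s)))
    return mismatches, covered

# Toy reference: parity of an integer. The "black box" is wrong for
# positive multiples of 7, a seeded fault for illustration.
reference = lambda x: x % 2
black_box = lambda x: (x % 2) ^ (1 if x % 7 == 0 and x > 0 else 0)
mismatches, covered = cross_check(black_box, reference, range(20))
```

The mismatch list pinpoints the unacceptable I/O patterns, while the coverage set shows how much of the output space the stimuli actually exercised, the two ingredients needed to build trust in a model whose internals stay hidden.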

The problem is multi-layered, according to Neil Hand, director of marketing for the IC segment at Siemens EDA. “Designers should make sure that answers are correct by construction, meaning that the underlying algorithms don’t hallucinate or otherwise give bad results, and that outputs are being verified after an AI has been used to reduce the design space. Then, you’re verifying the outputs. There are workflows today where we use verification of the results to make sure they are correct before they’re even shown to people. Then you use reinforcement learning to correct the AI so that it’s only ever giving good results.”
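The "verify the results before they're even shown" workflow Hand describes reduces to a simple gate: an AI proposes candidates, and an independent deterministic checker filters them so only verified outputs reach the user. The sketch below is a minimal illustration under that assumption; the candidate data and checker are hypothetical stand-ins.

```python
def verified_results(candidates, verify):
    """Keep only AI-proposed candidates that pass an independent check,
    so unverified output never reaches the user."""
    return [c for c in candidates if verify(c)]

# Toy stand-in: the "AI" proposed factor pairs of 12; the verifier is exact.
verify = lambda pair: pair[0] * pair[1] == 12
candidates = [(3, 4), (2, 5), (6, 2), (1, 11)]
results = verified_results(candidates, verify)
```

The rejected candidates are exactly the signal a reinforcement loop would feed back to the model, closing the correction cycle Hand mentions.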

Experts believe the black box problem should be addressed by traditional techniques, because it parallels what already happens in human practice. Hand gave an example of an interaction with an expert designer who may have an intuition that something is wrong. “You don’t go back to that designer and say, ‘Why did you tell me to look there?’ The answer is probably, ‘I don’t know. It just seemed like a good idea to look at it.’”

It may not even be as bad as all that, said Frank Schirrmeister, vice president of solutions and business development at Arteris, who noted the fears of black boxes can be based on a misconception that humans aren’t in the loop. Schirrmeister compares LLMs to an intern who may be brilliant, but can’t yet be trusted to act completely independently. “It’s like an intern that gives you inputs on work, which you still need to control. Leadership is ‘trust and verify.’”

Further, the black box problem is already familiar to designers. “Think about it as IP,” he said. “A high percentage of a chip is being reused. I don’t know what’s going on within an Arm core to understand every bit and every signal, and I’m not supposed to, because it’s actually somebody else’s IP. From a verification perspective, I look at that code. I then need to look at the results of the code and verify it in context.”

Including AI/ML in EDA tools
When it comes to incorporating AI/ML into EDA tools, those tools rely on reference models for any testing that is functional verification-oriented. “Models are usually developed by engineers building the testbenches, who understand the underlying design domain,” Darbari said. “They are specialists who understand the ins and outs of the design specifications and implementation, allowing them to build complete test environments. Whether they use simulation, emulation, or formal is not relevant.”

There already are many test tasks or utilities where EDA tools have exploited automation, such as linting, autochecks, X-checking, CDC, reset and clock analysis, as well as static timing and other functions. “These areas are where extensive use of machine learning is, and will continue to, help as design rules are well understood,” Darbari said. “These patterns can be easily learned from different runs on different designs using a range of different customers. The variability in design types and customer types provides a rich suite of models which bolster the capabilities of an AI model to make sharper predictions. For example, for functional verification employing complex Universal Verification Methodology (UVM) or formal models, the situation is interesting. AI has been used by EDA traditionally to leverage regression information to make the subsequent runs faster and efficient. In cases of UVM, there has been a significant increase in employing AI-based learning for intelligent sequence generation to mitigate the stimulus challenge with UVM, but the challenge remains of garnering enough data on different design types.”

In the end, AI/ML may be business as usual with an important twist, said Siemens’ Hand. “What makes AI so interesting is that you’ve got to let down your guard a little bit in order to see the potential. You’ve also got to be willing to accept a bit of uncertainty, because when you think of what AI models are all about, it’s a statistical problem. You look at the amazing appearance of intelligence from our largest statistical engine, which is saying the most probable next word to use is this one. This is a statistical problem that isn’t new to EDA. You look at the whole semiconductor manufacturing problem, it’s statistical. It’s not a guarantee you’ll get a good die every single time. Every time there is variability. For the semiconductor industry, uncertainty isn’t new. AI just adds another layer. But can we leverage it in such a way that we feel comfortable with that uncertainty?”

Related Reading
EDA Pushes Deeper Into AI
AI is both evolutionary and revolutionary, making it difficult to assess where and how it will be used, and what problems may crop up.
AI/ML Challenges In Test And Metrology
New tools are changing the game, but it will take time and collaboration for them to achieve their full potential.
Using AI/ML To Minimize IR Drop
Heterogeneous and advanced-node designs are creating unexpected post-layout challenges for design teams, but some issues can be addressed earlier in the flow.
