Digital Twins For Design And Verification Workflows

Can we model the chip development flow so AI could optimize it?

Experts At The Table: AI is starting to impact several parts of the EDA design and verification flows, but so far these improvements are confined to single tools or to small flows provided by a single company. What is required is a digital twin of the development process itself, on which AI can operate. Semiconductor Engineering sat down with a panel of experts, including Johannes Stahl, senior director of product line management for the Systems Design Group at Synopsys; Michael Young, director of product marketing for Cadence; William Wang, founder and CEO of ChipAgents and professor at the University of California, Santa Barbara; Theodore Wilson, a verification expert pushing for this development; and Michael Munsey, vice president of semiconductor industry for Siemens Digital Industries Software. What follows are excerpts from that conversation.


L-R: Cadence’s Young; Synopsys’ Stahl; Siemens’ Munsey; ChipAgents’ Wang; Theodore Wilson.

SE: What is a digital twin in the context of chip development?

Wilson: I'm hoping for somebody to evolve what might be called an industry-standard digital twin for chip development. This would help with optimization problems. We might have an existing suite of IP, and we want to know which teams should change what for the lowest risk or the highest impact on schedule or quality. A related optimization problem is resolving day-to-day issues. While these are nominally part of engineers' core job function, they might be better viewed as a distraction. Another question would be, what time constants are rate-limiting for the teams and the team members? What are the components of these time constants? Another related question we're trying to resolve is, how might we best deploy the available compute resources? What should we queue for maximum impact, and to best secure the time constants that govern how fast people can iterate? My sentiment is that if teams had such information, they would have data-driven guidance to iterate faster and with more impact. If AI is trained on this data, it could focus the team's attention and provide insights into current and new programs, such as what is best to refactor, or which issue to address first when there are several candidates. A final consideration is that if we have an industry-standard tool with trained AIs, this provides a path for large organizations to apply their data set to companies or teams of interest, whether partners, potential acquisitions, or recent acquisitions, to better assess or secure their execution. This could have an immediate impact for teams working on projects today, and it provides a path to better, more secure execution, as well as a way to assess the execution of teams across the industry.
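
(As a rough illustration of the compute-deployment question Wilson raises, here is a minimal sketch of ranking regression jobs by estimated impact per compute-hour. The job names, numbers, and weights below are hypothetical assumptions, not anything described by the panel; with real project data the weights could be fitted rather than guessed.)

```python
from dataclasses import dataclass

@dataclass
class RegressionJob:
    name: str
    expected_bugs_found: float   # estimate, e.g. from historical failure rates
    cpu_hours: float             # compute cost of the run
    blocked_engineers: int       # people waiting on the result

    def impact_per_cpu_hour(self) -> float:
        # Weighting a blocked engineer at 0.5 bug-equivalents is an arbitrary assumption.
        impact = self.expected_bugs_found + 0.5 * self.blocked_engineers
        return impact / max(self.cpu_hours, 0.1)

def schedule(jobs: list[RegressionJob]) -> list[str]:
    """Dispatch the highest impact-per-compute-hour jobs first."""
    return [j.name for j in sorted(jobs, key=lambda j: j.impact_per_cpu_hour(), reverse=True)]

jobs = [
    RegressionJob("full_chip_nightly", expected_bugs_found=2.0, cpu_hours=400, blocked_engineers=0),
    RegressionJob("smoke_on_commit", expected_bugs_found=0.3, cpu_hours=2, blocked_engineers=5),
    RegressionJob("formal_property_sweep", expected_bugs_found=1.0, cpu_hours=50, blocked_engineers=1),
]
print(schedule(jobs))  # ['smoke_on_commit', 'formal_property_sweep', 'full_chip_nightly']
```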

Stahl: What is the team in this context? Are we thinking about a verification team? Are we thinking about a project team? If you look at teams, they are not inherently motivated to work together. There could be silos that are completely independent, and they optimize their own requirements, schedules and results. They might not necessarily be interested in overall optimization.

Wilson: This is why we need to build the digital twin using actual data, so that if one team sacrifices some of its compute and it's given to another team, the project succeeds overall. Some teams have a significant amount of conflict, such as organizations that have good formal teams alongside existing simulation teams. There can be conflict there, which data-driven analysis can start to resolve. When formal is hitting that exponential piece, it can take other work off the verification team's plate in situations where there was no way the verification team could actually succeed with all the work in front of it. Data-driven analysis would start to address those conflicts, or provide a path where senior management just tells people, 'This is what's going to happen.'

Wang: Ted's opening remarks are spot on. We build AI agents, and I came from an AI background. Some of our clients have been running into issues with refactoring because they have legacy code from an earlier generation of engineers, and their code base is gigantic. Now they want to separate out the design and make it cleaner, and that falls into design optimization. It is a perfect use case for AI agents, or digital twins, to act in that space. Another interesting area is functional verification, and particularly coverage. How do you design a testbench? It is challenging, because for a big design the testbench is a thousand files, and sometimes each file can be a few thousand lines of code. It's not easy to generate a testbench that can optimize performance. But there is certainly demand within functional verification. In addition to optimization and functional verification, debugging is another big challenge. People struggle with traditional workflows because manually searching through the code base can take a long time. But with chip agents, and some of these AI technologies, it becomes much faster to read the code base, understand the context in which you actually need to locate the bugs, and work out how to fix them. We do see a lot of value for AI to really improve EDA. We have validated some of the use cases that have been mentioned.

Wilson: If we have data-driven analysis of project execution, we get to ask some really interesting questions, particularly time constant-related ones. Is debug actually the key problem? I have colleagues who have worked very hard on debug agents, but is that really the rate limiter? Or is it that they are debugging something the team should never have had to debug in the first place, because there's an unrelated code quality issue? If we solve that, all of this time spent debugging would go away. An AI-assisted digital twin would provide powerful insights that guide a workflow roadmap and help define the core job function of these people, or what the ideal core job function should be. There are super interesting questions that may come out of this.

Young: I would step back a little bit. The first question to answer is, 'What is the economic model to support a digital twin platform?' Not every scenario requires a digital twin, but there are certain scenarios, based on conversations with customers in the automotive industry, where they do a lot of collision testing. We're talking about hundreds, even thousands of collisions. That motivates a digital twin. They can model all these scenarios and do data-driven analysis, but it's very difficult in that industry. Part of the reason is that digital modeling of the physical aspects gets a little bit harder. How do you model the entire braking system? How do you model the entire collision zone and the protections and mechanisms that are built into the vehicle, whether it's airbags or other things? There's definitely a desire, but there's a very tricky tradeoff between the business model and the individual teams. For the people on the project, if they're focusing on the ASIC, the manager isn't going to give them extra budget to create this digital twin. It may be a nice-to-have item, but they never put in any genuine investment. What you need is a separation, or a collaboration, between the R and the D. Most engineers are focusing on the D part, and on verification. Very little money is spent on the R part. And this is where the innovation, and the AI-driven agents that will help with design and verification, could really pay off.

Wilson: I see this as analogous to when I started in verification. We wrote testbenches in Tcl or something like that, and there was no possible way the organizations were ever going to come up with Specman or RVM or UVM on their own. We had to have them provided to us. This is a similar case. I've worked in central engineering teams, and it is really difficult, even in those teams, to influence the organization. Maybe the correct role for these teams is to secure what the company is committed to, whether that's true R&D or participating in a standards body, in order to enable digital twins. You are coloring outside the lines. We keep seeing it as a struggle, but the EDA industry has had enormous success bringing standard tools to bear. How do you get it started? Once that starts to happen, a lot will happen very, very fast.

Stahl: The economics behind it are really the key. If there is no economic value that somebody can immediately see, nothing will happen. If you look at verification history, and you mentioned constrained random technology, it brought value. Then standardization followed with UVM, which brought it into the design community. Any impact on productivity, or cost, needs to come from this immediate economic value that companies see. I'll give two examples. If you look at verification, what we are seeing in the market today when people build processor-based systems is that they want to take their verification to the next level by combining some of the architecture knowledge, such as the CPU architecture, with constrained random generation technologies. They have a new way of driving vectors, and the value is immediately obvious because they're targeting a CPU that has value behind it. That methodology shift can happen. A second example is how the complexity of chips, and of the related software, is evolving. It's obvious that you need more capacity and more performance in hardware-assisted verification. We have seen customers who say, 'We are not going the traditional route. We are going to split our hardware-assisted verification into modular elements, and then individually create these modules, maybe for a chiplet. And once we have validated that chiplet, we put multiple chiplets together.' They call this modular hardware-assisted verification. The foundation for it is both the need and the ability of a technology to support it. The methodology in these companies changes, because they are asking the design teams to work in this new way.

Munsey: In the context of chip development, the chip ultimately exists as part of a larger system. That chip is going into a phone, or a car, or an airplane. It's the context of the digital twin that really affects chip development. The benefit, or the value, is that it allows people to make their decisions, whether design or verification, in the context of the entire product the chip belongs in. Ideally, you're starting to make design tradeoff decisions in the context of how that chip is going to interact with the rest of the system. This is important, especially when we talk about software-defined systems, where you're trying to decide what you're going to do in software and what you're going to do in hardware, and then how you're going to verify that. How are you going to go back to your requirements and make sure you're actually refining them throughout the whole process, making sure that the system itself, the semiconductor and the semiconductor interacting with the rest of the product, is meeting those requirements? That is going to start to dictate how you do your design and how you do your verification. And hopefully there is the advantage of being able to optimize your design and verification environments, because they're done in the context of a system, as opposed to a monolithic thing you're designing without the bigger picture around it.

SE: We have seen some economic value already. EDA companies have been spending money creating optimization loops around a single tool, and we've seen co-pilots developed by end users that everyone seems to speak very highly of. That amounts to creating small pieces of a methodology or flow around a single tool. Wouldn't it be a natural step to go from there to looking at complete flows, be it some aspect of verification, or debug, or design? What areas are the low-hanging fruit that can be addressed quickly and easily, and that will demonstrate the value to the end customer? Once they have seen success, they are probably going to come back and ask for more.

Young: We have developed SimAI, which looks at the runs from a certain suite of tests. Then an engineer makes changes, and something breaks in the suite. Instead of having the engineer manually work out what is wrong, we have an AI agent that can sort through the regression, look for the issues, root-cause where the issue is, and then highlight that back to the engineering team through the regression suite. They can understand that this change was made, and that it had a rippling effect that caused that test to fail. These are point improvements in the flow, but they don't address the goal of a digital twin. The feedback is quite clear. The economics for automotive must be very clear to justify a good digital twin, because the cost is enormous. There are issues with a digital twin if it does not cover all the aspects, not just the function of the chip, but the power and efficiency of the system. The company that's going to shine and win the design socket is the one that can prove that, on whatever KPI metrics are being measured, it has better power efficiency than the next guy. Imagine you're building an AI data center, or you're building 2 million cars. If this vehicle, with this footprint, has 5% higher power efficiency, the math works out well for the company that's able to demonstrate that, and they will win the socket. This is the critical piece. We need to marry functional capability and verification, at both the chip level and the system level, with the physical side, the energy. How much efficiency can you build into the digital twin? Most of the time, when I look at digital twins with current technology, I don't see people talking about power efficiency. It is an afterthought. They don't think about it. They talk about the software workload. Arm talks about the fast models it can create and all the other IP you can plug in. But no one is thinking about it from a power efficiency standpoint, because the abstract models are usually so high-level that they don't have the detail to calculate power accurately.
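
(To make the 5% power-efficiency arithmetic concrete, here is a back-of-the-envelope sketch. Every input below, such as the energy per vehicle and the electricity price, is a placeholder assumption rather than a figure from the panel.)

```python
# Back-of-the-envelope only; all inputs are assumed placeholders.
fleet_size         = 2_000_000   # vehicles, per Young's example
kwh_per_vehicle_yr = 3_500       # assumed annual energy use per vehicle
efficiency_gain    = 0.05        # the 5% advantage discussed
electricity_cost   = 0.15        # assumed $/kWh

annual_kwh_saved = fleet_size * kwh_per_vehicle_yr * efficiency_gain
annual_savings   = annual_kwh_saved * electricity_cost
print(f"{annual_kwh_saved:,.0f} kWh/year, roughly ${annual_savings:,.0f}/year across the fleet")
# 350,000,000 kWh/year, roughly $52,500,000/year
```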

Wilson: I have a proposed thin wedge. What if this digital twin really looked like a very fancy CI/CD system at the start? The KPIs for that are the time constants actually experienced by engineers using the system. A time constant that may be of interest is, 'How long until someone gets their first failure?' What are the pieces in there? Is there any statistical basis to see that someone could have gone faster? If you can deliver something like that, there's wide interest.
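
(A minimal sketch of the 'time to first failure' time constant Wilson describes, computed from per-engineer CI records. The record format and names here are hypothetical assumptions, not an existing system.)

```python
from datetime import datetime
from statistics import median

ci_runs = [
    # (engineer, commit_pushed, first_result_seen, passed) -- hypothetical records
    ("asha",  "2024-05-01T09:00", "2024-05-01T09:22", False),
    ("asha",  "2024-05-01T10:05", "2024-05-01T10:31", True),
    ("badri", "2024-05-01T09:10", "2024-05-01T11:40", False),
]

def minutes_to_first_failure(runs):
    """Per engineer, minutes from push until their first failing result."""
    out = {}
    for eng, pushed, seen, passed in runs:
        if passed or eng in out:
            continue
        delta = datetime.fromisoformat(seen) - datetime.fromisoformat(pushed)
        out[eng] = delta.total_seconds() / 60
    return out

ttff = minutes_to_first_failure(ci_runs)
print(ttff)                   # {'asha': 22.0, 'badri': 150.0}
print(median(ttff.values()))  # 86.0 -> the time constant to track per team
```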

Wang: There are two ways to disrupt. One is to keep the human in the loop. In that solution, digital twins or AI agents do the work, but a human engineer stays in the loop, whether it's RTL design or verification. This approach does not change the complete workflow for existing customers. These companies are more willing to accept the current solution, where you have AI agents or digital twins that help with document understanding, spec understanding, or RTL generation, but it doesn't replace the entire loop. In the future, we could have completely autonomous AI agents, or digital twins, that handle certain tasks for us, such as verification. We let it run and look at the results 10 hours later. That is a little bit more futuristic in terms of having self-improving AI agents. We've seen something related in the research community, and it is certainly possible for certain tasks. But with EDA software, getting instant customer feedback is not easy. Typically there's a cycle in the development of EDA tools. In the future, though, there could be a virtuous circle where you run these autonomous AI agents or digital twins, they complete the workflow, you get feedback, whether automated or human, and then you keep improving. Your AI agents or digital twins then write better code. That virtuous circle, where you keep improving your AI, your design gets better, and you have fewer bugs, is the future. But we probably still need another three to five years.

Munsey: Doing this all virtually lends itself to AI techniques coming into the flow, because when you virtualize the whole flow and have a digital twin of the entire product, it's not just about the localized optimizations or the localized decision-making. You can start looking at decisions made up front at the system architecture phase. What impact did those decisions have on my design efforts and my verification efforts? What decisions made during design actually impacted the manufacturability of things later on? What happened in manufacturing, and can that be improved in future designs by making better design decisions and better product decisions up front? You start to create this feed-forward flow, as well as a shift-left flow of mining data that already exists to improve things happening earlier on. It's building out an AI layer across the entire virtual environment where you can start doing cross-domain analysis and cross-domain optimizations, which you normally wouldn't think about if you were focused on one specific task or one specific process in the whole flow.

Stahl: Michael said we don't see a lot of power in our verification thinking. It's underrated, even though people optimize for power, and they might win on a 5% difference. The question really is whether that 5% was achieved through overall optimization, or whether it was the implementation team that really squeezed the technology and got that 5%. With power today, we still have silos in the individual steps, and every individual team tries to do something good. A story appeared in Semiconductor Engineering in 2023 about this being a highly wasteful industry. On the power side, we haven't gotten everybody in a design team on board to look at power holistically and use common thinking. On the functional side, the chances are better, because at least there's something like coverage that can bring together different views. I also believe that by recording what types of bugs are being found in the different steps of the verification process, we can create a database with more intelligence. When I go to a customer presentation with my friend who runs the software tool for verification, I say, 'Please make the code really clean before it comes onto my emulator.' Having certain goals and certain steps in the process is really important. And for functional verification, we probably have a chance to get there.
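
(As an illustration of the bug-recording idea Stahl mentions, here is a minimal sketch that logs which type of bug was found at which verification step so the data can later be mined. The schema and field names are assumptions for illustration only.)

```python
import sqlite3

conn = sqlite3.connect("verification_bugs.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS bugs (
        id          INTEGER PRIMARY KEY,
        block       TEXT,   -- design block or chiplet
        step        TEXT,   -- lint, formal, simulation, emulation, silicon
        bug_type    TEXT,   -- e.g. protocol, reset, clock-domain, functional
        found_on    TEXT,   -- ISO date
        cost_hours  REAL    -- engineering hours to root-cause and fix
    )
""")
conn.execute(
    "INSERT INTO bugs (block, step, bug_type, found_on, cost_hours) VALUES (?,?,?,?,?)",
    ("pcie_ctrl", "emulation", "reset", "2024-05-01", 18.0),
)
conn.commit()

# Which bug types keep escaping to the most expensive (late) steps?
for row in conn.execute(
    "SELECT step, bug_type, COUNT(*), AVG(cost_hours) FROM bugs GROUP BY step, bug_type"
):
    print(row)
```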


