Using Data And AI More Effectively In EDA

EDA produces a lot of data, but how useful is that for AI to consume? The industry looks at new ways to help AI do a better job.

Key Takeaways

  • The data being produced by EDA tools tends to be for human consumption and has weak semantics.
  • Agents are attempting to create actionable information from unstructured data.
  • The Model Context Protocol may provide AI with access to better data.

Semiconductor design generates a lot of data, but how much of that is useful or currently being used by AI tools? And how much more productive could it be with different or more accessible data? These are open questions that semiconductor companies and EDA tool companies are continually exploring.

Reports already are surfacing about significant gains from applying agentic AI to existing tools and data. This enables tight feedback loops for repetitive tasks or optimization. But the industry may need to take a step back before the full gains of an agentic flow become possible, because data changes as it moves through the semiconductor development flow, and the relevance of data between design stages or across designs has not yet been fully considered.

The first steps in this direction were necessary to enable shift left, in which approximations of what would normally happen later in a flow are used to make informed decisions early in the flow. In many cases, this may require data abstraction or reduced-order models (ROMs), which need to be created automatically, potentially with the help of AI methods.

The data we have
Engineers have been successfully creating and verifying designs based on the information produced by tools today. “Almost all the data we get is what I call human first,” says Simon Davidmann, AI and EDA researcher at University of Southampton. “It is aimed at a human, an engineer, to look at and study. Whether it’s logfiles or waveforms, they all have weak semantics. We’ve got all this data, but the data doesn’t describe what it’s doing properly. Most flows treat this data as a byproduct, not a control surface. Data should be a control surface guiding you.”

While LLMs are capable of reading this data, some of it is open to interpretation. “There is always more data that could be generated, but the biggest opportunity is extracting more value from what already exists,” says Olivera Stojanović, CTO at Vtool. “More data isn’t the goal — better data is. Agents can mine logfiles and waveforms to deliver a richer picture of verification progress. Enhanced logging already improves analytic depth, and AI assisted log generation will push this further. The priority should be using analytics to turn the data we do have into meaningful guidance. We often see teams improving their logging once they realize how much insight agents can extract.”

Processes such as verification produce huge quantities of logfiles and other information, and sifting through this information is a prime opportunity for AI. “We are using AI in all kinds of places at my companies, and one of the places we’re using it quite extensively is debugging,” says Dean Drako, CEO of IC Manage. “We have many processes running — all these chiplets and chip components and IP components in the chip — and then something goes wrong. Somehow we detect that something went wrong with our simulations, and then we need to figure out what the problem is. AI is really good and really fast at sifting through large log files, large sets of data, large amounts of stuff, and finding out what went wrong.”

More data could be made available, however. “EDA tools produce a lot of data, but much of it is not exposed to the user,” says Jim Schultz, principal product management at Synopsys. “This data is used by the engines to share information and take corrective actions. Most users don’t want all of this data because they can’t effectively use it to improve their design.”

But is more data required? “I don’t think so,” says Southampton’s Davidmann. “What we need is better instrumentation of it so we get the data in the right forms. I believe it’s an instrumentation challenge that we face. It’s all about observability, semantics, controlling and governing data. The problem is that the data produced today is human-oriented exhaust. It’s not machine-actionable evidence. It’s just exhaust. It dumps all this stuff out, and then you have to plow through it.”

Not everyone agrees. “There’s a persistent assumption in EDA that tool outputs are ‘for human consumption,’” says William Wang, CEO of ChipAgents. “In reality, that hasn’t been true for years. A single regression can generate millions of log lines and massive waveform dumps. No engineer can read them all end-to-end. At best, they grep for keywords, open a small slice of a waveform, and react to whatever looks suspicious. The rest is effectively machine exhaust.”

EDA data is massive, often many terabytes for a single snapshot. “It’s important to store data in a way that minimizes disk space and is efficient to access,” says Bill Mullen, Synopsys Fellow. “Human readability should not be the primary concern, but it’s critical to be able to extract and visualize the data for human understanding.”

To make real progress, the industry needs to get serious about control data. “We have enough data that we can make it work, although there needs to be some improvement in the way we are formulating data to be inferred,” says Sathishkumar Balasubramanian, head of products at Siemens EDA. “The key thing is how you label the data in the data lake, how you vectorize the database, and how you’re connecting to all the relevant sources and keeping your data, or a data lake, current with what teams are supposed to do. When we build the data lake, we create what we call signal and label and origin, and everything else on where it should be used, where it cannot be used, what version of the software it can be attached to. There are a lot of things that you can label and attach to a given data when you’re starting to open it up to a data lake.”

The scope of the data also needs to be broadened. “Current tools are not making full use of all available data,” says Shelly Henry, founder and CEO of Moores Lab AI. “There are two main reasons for this. The first is that most of the data required for effective agentic AI use is isolated in tool-specific data silos with no standard way of correlating data between tools. The other is that the data tends to be ‘trapped’ in human-readable logs, reports and tool-specific databases, which are not geared toward providing machine-parseable structured data.”
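To make the idea of "machine-parseable structured data" concrete, here is a minimal Python sketch that lifts records out of a human-readable log. The log line layout, the field names, and the `tool` tag are assumptions invented for illustration. They do not correspond to any particular EDA tool's output.

```python
import json
import re
from typing import Optional

# Hypothetical log line layout, invented for this example:
#   "<time> ns [<SEVERITY>] <hierarchical.scope>: <message>"
LINE_RE = re.compile(
    r"^(?P<time>\d+)\s*ns\s+\[(?P<severity>[A-Z]+)\]\s+"
    r"(?P<scope>[\w.]+):\s+(?P<message>.*)$"
)


def parse_line(line: str, tool: str) -> Optional[dict]:
    """Turn one human-readable log line into a structured record, or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # banners and free text are skipped, not guessed at
    record = match.groupdict()
    record["time_ns"] = int(record.pop("time"))
    record["tool"] = tool  # origin tag so records from different tools can be correlated
    return record


sample = """\
120 ns [INFO] top.dma.ctrl: channel 0 armed
450 ns [ERROR] top.dma.ctrl: descriptor fetch timed out
-- simulation banner, not a log record --
"""

records = [r for r in (parse_line(l, tool="sim") for l in sample.splitlines()) if r]
for r in records:
    print(json.dumps(r, sort_keys=True))
```

Parsing after the fact is only a workaround. The quotes above suggest the more durable fix is for tools to emit structured records directly.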

Once that happens, much of the data produced today may not even be necessary. “Most of the data that EDA tools produce today ends up being unnecessary in an AI-native workflow,” says Arvind Srinivasan, product engineering lead for Normal Computing. “A large share of it consists of intermediate artifacts, all generated to give a human user confidence that a given step was performed correctly. But many of those artifacts could be discarded entirely, because they exist only in service of the end product.”

Flows require change. “The truth is that when the full EDA landscape is considered, it’s possible to find all of the data that we currently know how to use, which is why the tools were created in the first place,” says Moores Lab’s Henry. “Developing the design telemetry schema could allow the design and process state to be available as a queryable graph with a minimal API to share access across tools. This could presumably be enhanced to provide additional data as the need arises.”
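One way to picture a "queryable graph with a minimal API" is the toy sketch below. The class, its method names, and the node and edge vocabulary are all hypothetical. No such schema has been standardized, and a production system would need versioning, provenance, and access control that are omitted here.

```python
from collections import defaultdict


class TelemetryGraph:
    """Toy design-telemetry store: nodes carry attributes, edges carry a relation name."""

    def __init__(self):
        self.nodes = {}
        self.edges = defaultdict(list)  # source id -> [(relation, target id)]

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs

    def add_edge(self, source, relation, target):
        self.edges[source].append((relation, target))

    def neighbors(self, source, relation):
        """Return ids reachable from `source` over edges named `relation`."""
        return [t for rel, t in self.edges[source] if rel == relation]

    def find(self, **criteria):
        """Return sorted ids of nodes whose attributes match every criterion."""
        return sorted(
            node_id
            for node_id, attrs in self.nodes.items()
            if all(attrs.get(k) == v for k, v in criteria.items())
        )


# Illustrative data: one design block checked by runs from two different tools.
graph = TelemetryGraph()
graph.add_node("blk.alu", kind="block")
graph.add_node("synth.run42", kind="run", tool="synthesis", status="pass")
graph.add_node("sim.run7", kind="run", tool="simulation", status="fail")
graph.add_edge("blk.alu", "checked_by", "synth.run42")
graph.add_edge("blk.alu", "checked_by", "sim.run7")

# Query across tools: which runs that checked this block failed?
failing = [
    run for run in graph.neighbors("blk.alu", "checked_by")
    if graph.nodes[run]["status"] == "fail"
]
print(failing)
```

The point of the sketch is that a single query spans results from two tools, which is the cross-tool correlation that siloed logs make difficult.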

The future will require different data than is produced today. “Ultimately, you’re going to go directly from specification all the way down to physical design,” says Normal’s Srinivasan. “So why do you need intermediate output from EDA tooling, as long as you have models that can validate the same review decisions that those intermediate outputs would have given a human confidence in? The question isn’t really, ‘Are we producing enough data?’ It’s whether we’re producing the right data for the workflows that are actually going to matter.”

AI agents
Today, most AI agents look at the data produced by a single tool, possibly over several runs, and attempt to infer useful information from it. That may result in a change to the design, or to the parameters used to run the tool. “Engineers take simulation results and iteratively refine parameters, converging on the desired performance targets within a short number of iterations,” says Doyun Kim, AI engineer at Normal Computing. “AI agents operate in a similar fashion, continuously extracting insights from data generated through simulations and other processes to inform their next decisions. These approaches work with well-standardized data.”

Agents must either produce better results or save engineering effort. “Knowledge transfers between runs, for different versions of a given problem, are going to be very key,” says Siemens’ Balasubramanian. “The benefits are large. Once we know that we have a self-verifying check loop that doesn’t compromise on accuracy, think about the amount of savings you get, both in terms of time in your computer resource and licenses, and you can get to the answer very fast. This is going to be an order of magnitude better once we get the agentic flow to work. When you have a working agentic flow that is tuned to a certain task, the agent will know about the previous version. It will know how to structure the regression, because this is the fastest way for throughput, and that’s what it is going to do for the next run.”

Identifying change is at the core of that. “What engineers actually need is not more output, but higher-level orchestration,” says ChipAgents’ Wang. “Which failures are new? Which are noise? What should be rerun? What changed? Are we tapeout-ready? We should treat logs and waveforms as telemetry for an intelligent control layer. An AI agent can continuously triage regressions, cluster failures, extract root causes, and drive the next set of tool runs. Humans stay in the loop for judgment and sign-off, but the agent operates the flow. At today’s scale, that’s not a luxury. It’s the only way to manage complexity on the path to tapeout.”
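Two of those questions, "Which failures are new?" and "Which are noise?", can be approximated with very simple clustering. The Python sketch below groups failures by a normalized message signature and compares the result against a previous regression. The test names, messages, and normalization rules are made up for illustration, and a real triage agent would use far richer evidence than message text.

```python
import re
from collections import defaultdict


def signature(message: str) -> str:
    """Normalize a failure message so runs that differ only in numbers share a key."""
    message = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)  # addresses first
    return re.sub(r"\d+", "<n>", message)                  # then plain counts and times


def cluster(failures):
    """Group (test_name, message) pairs by normalized signature."""
    groups = defaultdict(list)
    for test, message in failures:
        groups[signature(message)].append(test)
    return dict(groups)


# Hypothetical failures from the previous and the current regression.
previous = [("dma_basic", "timeout after 500 cycles at addr 0x1F00")]
current = [
    ("dma_basic", "timeout after 620 cycles at addr 0x2A10"),
    ("dma_burst", "timeout after 480 cycles at addr 0x0040"),
    ("irq_mask", "assertion irq_stable failed at 1200 ns"),
]

known = set(cluster(previous))
clusters = cluster(current)
new_signatures = sorted(s for s in clusters if s not in known)

for sig, tests in clusters.items():
    tag = "NEW" if sig in new_signatures else "known"
    print(f"[{tag}] {sig}: {tests}")
```

Here the two timeout failures collapse into one known cluster, while the assertion failure is flagged as new and would be the first candidate for root-cause analysis.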

Agentic AI capabilities of EDA tools are evolving rapidly. “Such capabilities can leverage all available data, whether from this design or from previous runs of the same or different designs,” says Synopsys’ Mullen. “This enables improved quality of results, where optimization can be tuned based on previous data.”

The industry needs to build on those capabilities. “The EDA industry is at a critical AI inflection point,” says Henry. “The key to continuing the performance and productivity trends at the heart of Moore’s Law is to leverage agentic AI to complement and enhance (not replace) engineering talent. This starts with reimagining EDA tools’ data requirements and using AI to create a unified schema for ‘design telemetry,’ similar to the way modern cloud systems standardized traces and metrics.”

Some problems already span multiple runs from multiple tools. “We treat verification as a big data problem, applying analytics and modeling to convert massive simulation output into clear, actionable insight,” says Vtool’s Stojanović. “By applying analytics and modeling, we give engineers visibility and control they simply didn’t have before. Failure triage is about detecting hidden anomalies in passing tests, visualizing behavioral patterns, and correlating design or testbench changes with failures. These types of visualization can expose patterns and outliers that would otherwise stay buried.”

Any way that AI can save time in verification is a huge boost. “For example, helping with simulations where there are no real changes to the design,” says IC Manage’s Drako. “The AI is tremendous at generating test cases, looking at the design — whether it be a Verilog design or a gate-level design — and generating synthetic or realistic test cases to exercise the design fully, so that you can get much better coverage of the design.”

Not all data has to be output, so long as it is discoverable when required. “One of the things that we did in Open Hardware was to build something called RISC-V verification interface (RVVI), because we wanted a trace that we could interact with,” says Davidmann. “You had a definition of the trace, but also the APIs to interact with the model, as well. It became a contract to interact with it, rather than just looking at traces and screenshots. This is the right approach. The future is being able to wrap these things in AI agents as sub-components and do things with them. You can ask questions of the designs and the traces, not just try to regurgitate stuff. You need to give these agents real execution evidence, not just trace information, and to be able to prove it so they can track where it came from. That’s going to change the way these agents work.”

Model Context Protocol (MCP)
The EDA industry has long used APIs to access internal data. “The industry has made a significant step with the Model Context Protocol (MCP) open standard,” says Cristian Amitroaie, CEO of Amiq. “This connects AI agents to external data and applications. Our MCP server uses this protocol to make information in our compiled database of the complete design and verification hierarchy available to AI agents, especially those generating code. Limited training data and lack of context means that AI may hallucinate or generate incorrect RTL or testbench code. We remedy this by allowing agents to leverage our deep knowledge of language semantics and project context in order to generate correct code. We fully expect that other tools will find new ways to leverage the data in our database and that we will find additional useful information we can provide via MCP.”

This is gaining traction within the industry. “The most important thing that I believe needs to happen, that both industry and customer are asking for, is MCP-compliant ways of building the product,” says Balasubramanian. “You define an MCP server, which is an AI way of saying an API layer, but it is much smarter. For each of the products there is a server, and then you open up and define all your commands and everything else. With the agents, with the LLM, and with the RAG infrastructure, each of the use case flows can be built together so that they are self-optimizing, to reach the end goal.”
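The shape of such a server can be sketched in a few lines. MCP messages are JSON-RPC 2.0, and tools are discovered and invoked through the `tools/list` and `tools/call` methods. The Python below is only a toy dispatcher that mimics those message shapes. It is not the official MCP SDK, it omits the initialization handshake and transport, and the `get_timing_slack` tool and its data are hypothetical.

```python
import json

# Hypothetical design data an EDA tool might expose; the values are made up.
SLACK_NS = {"clk_core": -0.12, "clk_io": 0.35}


def get_timing_slack(clock: str) -> str:
    """Illustrative tool: report the worst slack for one clock domain as JSON text."""
    return json.dumps({"clock": clock, "worst_slack_ns": SLACK_NS[clock]})


TOOLS = {
    "get_timing_slack": {
        "handler": get_timing_slack,
        "description": "Worst slack for a clock domain (illustrative)",
        "inputSchema": {
            "type": "object",
            "properties": {"clock": {"type": "string"}},
            "required": ["clock"],
        },
    }
}


def handle(request: dict) -> dict:
    """Answer one JSON-RPC 2.0 request shaped like MCP's tool methods."""
    method, params = request["method"], request.get("params", {})
    if method == "tools/list":
        result = {"tools": [
            {"name": name, "description": tool["description"],
             "inputSchema": tool["inputSchema"]}
            for name, tool in TOOLS.items()
        ]}
    elif method == "tools/call":
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return {"jsonrpc": "2.0", "id": request["id"],
                    "error": {"code": -32602, "message": "unknown tool"}}
        text = tool["handler"](**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}]}
    else:
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -32601, "message": "method not found"}}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}


reply = handle({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "get_timing_slack", "arguments": {"clock": "clk_core"}},
})
print(reply["result"]["content"][0]["text"])
```

The input schema is what lets an agent discover how to call the tool without reading human documentation, which is the sense in which the layer is "smarter" than a bare API.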

It also could be a product differentiator. “As chip companies move from AI co-pilots to AI workers, you fundamentally have to make your EDA tooling compatible with those AI workers,” says Srinivasan. “It’s entirely possible for an EDA vendor to expose an MCP server that lets external systems access data in and out of their tools programmatically. Whether every vendor sees that as advantageous today is an open question. But the biggest chip companies will start pushing in that direction, and if a vendor doesn’t move, those companies will go to one that does. It will be a point of competitive differentiation.”

There are some nuances to consider here. “Everybody’s talking about the MCP,” says Drako. “The right thing to do would be to create MCPs so that tools work better, but what we’re finding is that you really don’t need the MCPs. Sometimes they get in the way. Documentation written for a human is pretty good. And if I go and create documentation or a data format that’s more machine-readable, I can digest it a little faster. But why bother? That’s the beauty of the AI. It takes the AI another minute to digest the data in a different format. But who cares? This way, I didn’t have to go create another thing and maintain it and take care of it.”

The problem is there are good MCPs and bad ones. “It is easy to write an MCP server for a product,” says Balasubramanian. “But it may not work, or it might work for 20% of the time. Each product must have a fully owned, very clear way of doing an MCP server, and then having an MCP orchestrator that’s much more efficient in managing all these MCPs for a given flow. That’s going to be very critical. We have already seen some customers put them together, but they come back and say, for your product, I created an MCP server, but somehow it doesn’t work. That is because you don’t know everything about the product. If it comes from us, we can make an efficient MCP server. MCP compatibility is going to be very key. It’s already happening.”

This will require collaboration. “EDA engineers are some of the smartest people in the world, because they understand both software and hardware,” says Drako. “We have lots of software engineers, a lot of AI engineers, but they don’t know anything about designing hardware. Then we have hardware guys who know how to design hardware, but they just use the tools that the EDA guys give them. It’s a really hard industry. They’re headed in the right direction. They’re doing the right stuff, but it’s the Wild West. We have no idea how it’s going to play and what it’s going to mean.”

Davidmann agrees. “You need to bring people with domain knowledge who really understand design or verification with the AI experts. Bring them together, and in two years you’ll get something really clever, because AI on its own isn’t going to solve the problem, and EDA guys need AI to move to the next generation. You need to bring these guys together.”

Editor’s note: A second part of this report will look deeper into the problems of correlating data as it moves through the development flow, and the interoperability issues associated with putting an agentic flow together.


