Multi-Modal AI In EDA Development Flows

The development of a semiconductor system is more complex than just describing functionality in RTL. How ready are AI models to handle the larger task?

RTL coding is a critical step in the development of semiconductors, but many would argue it is not the most difficult. Things become a lot more complex as you get closer to implementation, and as the system context grows larger than can be comprehended through text alone. In both cases, layout, timing, power, and many other factors come into play. None of them is as easily represented by text, and they do not all follow the same rules.

As the popular maxim says, ‘a picture is worth a thousand words,’ and that may be highly conservative. Block diagrams, timing diagrams, waveforms, state charts, flow charts, floor plans, layouts, heat maps — the list of graphical forms that are used as both inputs and outputs is extensive. AI needs to be able to comprehend them and generate them.

That maxim also can be viewed the other way. Pictures, as well as all those other formats, can be represented by text at various levels. A waveform is nothing more than a value change dump (VCD) file, which in turn is a time-ordered list of transitions on signals. Similarly, a picture can be viewed either as a set of pixels, or as a set of recognized objects that represent the image in a lossy, abstracted format.
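To make that concrete, a VCD file really is plain text, and a few lines of code recover the time-ordered transition list described above. A minimal sketch (the two-signal VCD fragment is invented for illustration, and the parser handles only the scalar value changes shown, not the full VCD grammar):

```python
import re

# A tiny hand-written VCD fragment: two signals, a few transitions.
VCD = """$var wire 1 ! clk $end
$var wire 1 " rst $end
$enddefinitions $end
#0
0!
1"
#5
1!
#10
0!
0"
"""

def parse_vcd(text):
    """Return {signal_name: [(time, value), ...]} from a VCD string."""
    ids, waves, t = {}, {}, 0
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r'\$var\s+\S+\s+\d+\s+(\S+)\s+(\S+)', line)
        if m:
            ids[m.group(1)] = m.group(2)          # id code -> signal name
            waves[m.group(2)] = []
            continue
        if line.startswith('#'):                  # timestamp marker
            t = int(line[1:])
        elif line and line[0] in '01xzXZ' and line[1:] in ids:
            waves[ids[line[1:]]].append((t, line[0]))
    return waves

print(parse_vcd(VCD))
# {'clk': [(0, '0'), (5, '1'), (10, '0')], 'rst': [(0, '1'), (10, '0')]}
```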

Models for AI have been evolving rapidly, and some of the first practical uses for them were not text-based. They solved problems associated with handwriting recognition, and with speech recognition and generation. When ChatGPT first captured the public's attention, it was text-only, and its performance on other formats was very poor.

The first problem is that AI model development has been fragmented. Separate systems often are deployed for image recognition, text processing, audio processing, reasoning, and text generation. Integrating them presents challenges and significantly increases computational load and power consumption, something the industry is becoming increasingly concerned about.

The second problem is that most AI systems in use today do not understand physics. While the latest models are supposed to have this knowledge, it is clear that multi-physics reasoning is beyond their capabilities today. Physics is intertwined with many of the alternative representations we use. A layout represents where a transistor or IP block lives in relation to others, but its performance is influenced by those other blocks. In addition, the other blocks may affect decisions made at a later stage of the development flow. In both cases, behavior is modified by activity over significant durations of time.

AI models have been improving rapidly, so where is AI today in being able to handle this multi-modal, multi-physics problem? And is it ready to deal with some of the problems that electronic design and implementation face?

Why multi-modal is important
In an ideal world we would have a clear and unambiguous specification that can be fed into an AI-based flow, but that is never the case. “I’ve talked to many users about agentic workflows that start from the design specification, and images almost always come up,” says Andy Penrose, software engineering group director at Cadence. “It is clear that architectural block diagrams, FSM flowcharts, and timing diagrams all contain vital information that can be missing from the text of the specification. Image handling is not optional for spec-driven design and verification automation.”

Multi-modal AI is a new slant for AI. “Over the past year, we have doubled down on turning cutting-edge multi-modal AI research into production-grade EDA tools,” said William Wang, CEO of ChipAgents. “We started by perfecting a PDF-parsing engine that extracts intent from timing diagrams, architectural block diagrams, and dense specification charts, and debuted our new Waveform Agent at DAC, which is able to ingest and reason over gigabytes, even terabytes, of simulation dumps in minutes. That capability is grounded in work I led on TabFact, whose table-understanding techniques now underpin most frontier LLMs — in Apple’s MGIE, invented by my former PhD student, which set the bar for language-guided image editing, and in the VaTeX evaluation harness we built for Google’s Gemini. By fusing those innovations with domain-specific knowledge of RTL verification, we can close the gap between human-friendly visual artifacts and machine-driven design automation.”

This is a fundamental shift. In some cases, there is no text at the beginning. “Humans are very comfortable with representations such as a block diagram,” says Hamid Shojaei, co-founder and CTO for ChipStack. “In fact, when we started working on the TPU project at Google, for the first generation we didn’t have any written spec at all. It was just a block diagram that showed different IPs in the TPU project and how they communicate with each other. There was no text, just a single block diagram, and that is where we started.”

There is a lot of concentration on the coding of RTL today. “There are many startups on the front-end side,” says Sathish Balasubramanian, head of products for IC verification and EDA AI at Siemens EDA. “They claim to do RTL generation, and they can because it’s just coding. They don’t care about anything. There are a lot of open-source examples available, and with it, you can create a prototype faster. There is value in it, but it’s fun. You’re getting the outer layer of the onion. But there are a lot more parameters involved in getting to the right answer as you get closer to the physical domain.”

By looking at individual functions within the flow, the value of text augmented with graphics becomes clear. “For verification, there are two areas where images, tables, state transition diagrams and more are very helpful,” says Kartik Hegde, co-founder and CEO at ChipStack. “The first one is design intent extraction, which is understanding what the design does. In that process it is important to understand the block diagrams and state transition diagrams, which are not easily represented as text. We need to recognize all of these forms to understand the spec. The second aspect is the waveform. This is special because it captures both space and time. It is super important in debug. Once we generate test plans and testbenches and run them, if something fails we have to figure out why. In the process of doing that, we need to understand the states for each variable over time, for which you need to go into waveforms.”

Verification is one of the most resource-constrained aspects of the flow, so it is receiving a lot of attention. “Multi-modal AI is most valuable to verification because it lets the tool read the same spec the engineers read,” says Thomas Ahle, head of machine learning at Normal Computing. “That includes PDF text, timing diagrams, state charts, and register maps. You can then turn that understanding into stimulus, coverage, and RTL code.”

Optimization often involves iteration that is impacted by multi-modal constraints. “If you think about building an AI agent, it is essentially a super set of individual prompts that forms a flow,” says Siemens’ Balasubramanian. “They can look at results, get feedback, do a number of actions. For example, a simple floor plan through to the entire place-and-route flow requires the agent to be able to understand all the abstractions of the design, the different databases, all the different sources of information. This includes log files, PDFs, PDFs with images, GDSII, and congestion maps. An agent needs to know a lot more and needs to be able to understand all the intricacies, all the way from writing an RTL assertion, or writing timing constraints (SDCs), to being able to understand them and their implication to synthesis. Synthesis, in turn, means being able to pick the right cells, and for that it needs to be able to understand the .lib, and select cells based on the timing and power properties.”
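The .lib step in that chain can be illustrated directly. Below is a rough sketch of how an agent-side tool might rank cells by leakage power under an area budget. The Liberty fragment is invented, and the regex handles only the flat attributes shown here, not full Liberty syntax:

```python
import re

# Invented Liberty fragment: three drive strengths of a NAND2 cell.
LIB = """
cell (NAND2_X1) { area : 1.06; cell_leakage_power : 12.3; }
cell (NAND2_X2) { area : 1.60; cell_leakage_power : 21.7; }
cell (NAND2_X4) { area : 2.66; cell_leakage_power : 40.1; }
"""

def cells_by_leakage(lib_text, max_area=None):
    """Parse flat cell attributes and return cells sorted by leakage power."""
    pat = re.compile(r'cell\s*\((\w+)\)\s*\{\s*area\s*:\s*([\d.]+);\s*'
                     r'cell_leakage_power\s*:\s*([\d.]+);')
    cells = [(name, float(area), float(leak))
             for name, area, leak in pat.findall(lib_text)]
    if max_area is not None:
        cells = [c for c in cells if c[1] <= max_area]   # area constraint
    return sorted(cells, key=lambda c: c[2])             # lowest leakage first

for name, area, leak in cells_by_leakage(LIB, max_area=2.0):
    print(f"{name}: area={area}, leakage={leak}")
```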

A lot of work remains to be done on the implementation side. “In the broader industry, most non-textual AI is focused on physical design,” says Doyun Kim, senior AI engineer at Normal Computing. “This is where the layout can be represented as an image, and the layout is a collection of billions of rectangles in different layers. But AI developments for physical designs are often limited to prediction tasks such as power or congestion estimation, whose generalizability is restricted by the diversity and quantity of available training data.”

Generating non-textual output
For the foreseeable future, humans are going to remain in the loop for design and verification, and in many cases, humans comprehend graphics better than they comprehend text. Graphical output is vital because it aids in comprehension, and it can provide a level of abstraction above RTL.

In the 1990s, EDA companies attempted to generate state diagrams from Verilog. Those efforts were an abysmal failure. A state chart never was laid out the way a human wanted to see it, and designers would spend so much time fiddling with what had been generated that it wasn't worth it.

“Having LLMs natively generate images or graphics that would aid in understanding is not a viable way to go, because they’re not capable of doing that in a very precise manner,” says ChipStack’s Hegde. “What they’re really good at is writing code. LLMs can directly ingest images, but they’re not good at very precisely generating images. So we generate code that represents these images and then need to find ways to render them.”
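A minimal sketch of that division of labor: the model emits Graphviz DOT text describing a state machine, and a conventional renderer turns it into pixels deterministically. The IDLE/RUN/DONE FSM below is invented, standing in for what an LLM would produce:

```python
# The model is asked for structure, not pixels: Graphviz DOT text for a
# small FSM. Any DOT renderer then draws the image deterministically.
def fsm_to_dot(states, transitions, start):
    lines = ["digraph fsm {", "  rankdir=LR;",
             f'  "" [shape=point]; "" -> "{start}";']   # entry arrow
    lines += [f'  "{s}" [shape=circle];' for s in states]
    lines += [f'  "{a}" -> "{b}" [label="{cond}"];'
              for a, cond, b in transitions]
    lines.append("}")
    return "\n".join(lines)

dot = fsm_to_dot(
    states=["IDLE", "RUN", "DONE"],
    transitions=[("IDLE", "start", "RUN"),
                 ("RUN", "finish", "DONE"),
                 ("DONE", "reset", "IDLE")],
    start="IDLE",
)
print(dot)   # feed to `dot -Tpng` or any other Graphviz renderer
```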

That kind of helper requires good interfaces. “APIs can help enormously, enabling me to read and write,” says Balasubramanian. “For example, FSDB is a very popular format for waveforms. FSDB has a reader and writer. An agent needs to have access to both the reader and writer API. Interpretation of the information is fairly simple. Once I have those rules, it’s similar to how we are learning. We know the FSDB syntax for a signal. When the transition happens, we know where it is placed on the time axis. It’s fairly straightforward.”
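FSDB's actual API is proprietary, so the sketch below uses an invented `WaveReader` class purely to show the shape of the read-side tool an agent would call: answering "what was this signal at this time" from a parsed dump, which is the fairly straightforward interpretation Balasubramanian describes.

```python
import bisect

class WaveReader:
    """Hypothetical stand-in for an FSDB/VCD reader exposed to an agent.
    Holds {signal: sorted [(time, value), ...]} as parsed from a dump."""
    def __init__(self, waves):
        self.waves = waves

    def value_at(self, signal, time):
        """Value of `signal` at `time`: the most recent transition <= time."""
        trans = self.waves[signal]
        i = bisect.bisect_right([t for t, _ in trans], time) - 1
        return trans[i][1] if i >= 0 else 'x'   # 'x' = before the first event

reader = WaveReader({'clk': [(0, '0'), (5, '1'), (10, '0')]})
print(reader.value_at('clk', 7))   # '1' -- last transition was at t=5
```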

But that is about one format in a standalone manner. “An FSDB cannot stand alone,” adds Balasubramanian. “An FSDB needs the design context. Assume I am loading a RISC-V core and doing a top-level performance simulation. When you have that loaded, you have the inputs, which include the design parameters, the test benches, and you have the entire session with the log files. If I query any signal, it will give you the answer for that particular session. With AI agents things can go further. They can access the entire memory for that verification. Perhaps there is a request to do a particular run at 1GHz for this scenario in this particular mode. I could easily run that again. Or I may know that it was tried before and had some issues because the architecture was not good enough.”

Reasoning
Humans and AI want to work on different representations. “I’m not sure if AI should operate on graphical data,” says Andy Heinig, head of the Chiplet Center of Excellence at Fraunhofer IIS’s Engineering of Adaptive Systems Division. “Humans often need a graphical interpretation to see patterns. But AI can work very efficiently on the underlying data, and maybe we only lose some information by this graphical representation. AI also can learn on the underlying data and figure out patterns on the underlying data, for example, on the waveform. They have no reason to plot the waveform, because AI can directly use the data points and reason on those data points. For layout, everything is represented by points, and this is something where AI really can learn on the underlying data structure.”
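A small example of that point: detecting glitches needs no rendered waveform, only the raw (time, value) pairs. The signal and the width threshold below are invented for illustration:

```python
def find_glitches(transitions, min_width):
    """Flag pulses narrower than min_width directly from (time, value) pairs.
    Transition list format matches the VCD sketch earlier in the article."""
    glitches = []
    for (t0, v0), (t1, _) in zip(transitions, transitions[1:]):
        if t1 - t0 < min_width:
            glitches.append((t0, t1, v0))   # start, end, glitching value
    return glitches

# Invented example: a 1 ns spike on an otherwise clean signal.
sig = [(0, '0'), (100, '1'), (101, '0'), (200, '1')]
print(find_glitches(sig, min_width=5))   # [(100, 101, '1')]
```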

The models do have to be aligned with the problem they are attempting to solve, though. “LLMs are generalized reasoning engines,” says Hegde. “If you use them that way and take any complex problem, you can break it down into finer granular pieces. There is going to be an abstraction layer at which the generalized LLMs are pretty good at doing something. Now let’s take the example of waveforms. You have to understand what’s happening in a given waveform. LLMs have not been trained on high-quality waveforms. They have not seen that data. But you could structure that problem in a way where the problem and sub-problems are something LLMs can solve.”

This leads to a couple of approaches. “Access to data is critical for LLM inference,” says Kyle Dumont, CTO and founder of AllSpice. “However, if the LLM cannot understand the data it’s presented with, the uses are limited. There are two solutions to this problem:

  • Build a transformer to convert to a well-understood schema (easier), and
  • Fine-tune a model to train it on a specific encoding (hard).

Some LLMs have employed the above solutions for common formats, such as PDF parsing. However, hardware-specific inference requires translators for the necessary data. Image parsing is very expensive and can produce flawed inference. Errors at the input layers will have magnified effects on the quality of the output response, so it is always better to provide textual interfaces in a common schema, such as XML or CSV.”
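A minimal sketch of the first of those two approaches: flattening a register map into CSV, a schema LLMs parse reliably, before placing it in a prompt. The register entries are invented, as if extracted from a datasheet:

```python
import csv, io

# Invented register map, as might be extracted from a datasheet PDF.
REGMAP = [
    {"name": "CTRL",   "offset": "0x00", "width": 32, "access": "RW"},
    {"name": "STATUS", "offset": "0x04", "width": 32, "access": "RO"},
    {"name": "IRQ_EN", "offset": "0x08", "width": 16, "access": "RW"},
]

def regmap_to_csv(regmap):
    """Serialize a register map into CSV, a well-understood schema."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "offset", "width", "access"])
    writer.writeheader()
    writer.writerows(regmap)
    return buf.getvalue()

prompt = ("Given this register map, list all writable registers:\n"
          + regmap_to_csv(REGMAP))
print(prompt)
```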

Work remains to be done. “Through the silicon design and verification processes, there are a number of different types of data other than text we come across – schematics, gate-level netlists, layouts, and diagrams,” says Normal’s Kim. “They are usually semantically meaningful in their original format and can be represented as text. If we come up with a good text representation which can capture their original semantics well, similar to the SMILES representation for molecules in chemistry, it allows downstream tasks to be handled with text-based processing, which reduces the need for extra, costly compute layers.”
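As a toy illustration of such a representation, here is a one-line-per-gate text encoding of a gate-level netlist that round-trips losslessly. Both the encoding and the netlist are invented, in the spirit of SMILES:

```python
# Invented one-line-per-gate encoding: "OUT = GATE(IN1, IN2)".
# Compact, lossless for connectivity, and friendly to text-based processing.
NETLIST = [
    ("n1", "NAND2", ("a", "b")),
    ("n2", "NAND2", ("n1", "c")),
    ("y",  "INV",   ("n2",)),
]

def encode(netlist):
    return "\n".join(f"{out} = {gate}({', '.join(ins)})"
                     for out, gate, ins in netlist)

def decode(text):
    """Inverse of encode(): recover (out, gate, inputs) tuples."""
    nets = []
    for line in text.splitlines():
        out, rhs = line.split(" = ")
        gate, args = rhs.rstrip(")").split("(")
        nets.append((out, gate, tuple(a.strip() for a in args.split(","))))
    return nets

text = encode(NETLIST)
print(text)
assert decode(text) == NETLIST   # round-trip: the semantics survive the text form
```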

Training and specialization
Outside of the semiconductor industry, training is improved by maximizing the amount of data used. “Every design team, or even design teams within a company, have their own design styles and design guidelines, and often these are related to the products they want to design,” says Fraunhofer’s Heinig. “I would expect that we need different flavors here, and this can only be trained on a company’s internal data. It is not a good approach to mix and match the data from different companies, because this is not what companies really want. I would expect that each company wants to have its own flavor. For example, if you are designing a power-efficient edge processor, this is very different from a high-performance CPU. If you mix all the training data together, you get a mixture out of that. But it’s not very specific.”

There are different approaches to this. “There are two ways of providing that guidance,” says Balasubramanian. “One is essentially making it part of your LLM through fine-tuning, trying to get the LLM to learn more about specific architectures, such as RISC-V. The second way is creating a sophisticated RAG framework. The moment you put in an identifier saying that I am doing a RISC-V verification using an AMBA bus protocol, it should be able to go back and see, ‘Does it have any prior information on RISC-V, and on the AMBA bus protocol for RISC-V?’ We build all the rules and keep adding to the knowledge database.”
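A minimal sketch of that identifier-keyed lookup follows. Production RAG frameworks use vector embeddings and rankers; plain tag matching, over invented knowledge entries, is enough to show the control flow:

```python
# Identifier-keyed retrieval: each note is stored under a set of tags,
# and a query returns every note whose tags are covered by the query.
KNOWLEDGE = [
    ({"riscv"},         "RV32I commit order: instructions retire in order."),
    ({"riscv", "amba"}, "Prior run: AXI burst crossing a 4KB boundary failed."),
    ({"amba"},          "AXI4: AWVALID must not depend on AWREADY."),
]

def retrieve(tags):
    """Return every stored note whose tag set is covered by the query tags."""
    tags = set(tags)
    return [note for keys, note in KNOWLEDGE if keys <= tags]

def add_rule(tags, note):
    """Keep adding to the knowledge database as new rules are learned."""
    KNOWLEDGE.append((set(tags), note))

# "I am doing a RISC-V verification using an AMBA bus protocol" becomes:
for note in retrieve({"riscv", "amba"}):
    print("-", note)
```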

Within verification, vast data sets can be created. “We have the potential to generate huge synthetic data sets,” says Hrolfur Eyjolfsson, AI engineer at Normal. “Much of the visual information in verification, like timing diagrams and state spaces, is actually generated from some piece of code to begin with. This means we can create our own training data instead of relying on scarce real-world examples.”
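A toy version of that idea: because the trace is generated from the spec, the label is correct by construction, which is the point of synthetic data in this setting. The mod-N counter family below is invented for illustration:

```python
import random

def make_training_pair(rng):
    """Generate one (description, trace) pair for a random mod-N counter.
    The trace is derived from the spec, so the label is right by construction."""
    n = rng.randint(2, 8)
    desc = f"A counter that counts 0..{n-1} and wraps."
    trace = [cycle % n for cycle in range(2 * n)]   # two full periods
    return desc, trace

rng = random.Random(0)                # fixed seed for reproducibility
dataset = [make_training_pair(rng) for _ in range(3)]
for desc, trace in dataset:
    print(desc, trace)
```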

“Models can be trained to understand textual information, diagrams, and waveforms in isolation from code,” adds Normal’s Ahle. “This leaves a smaller requirement for code data. However, code data is very sparse. This also means we have to take a reinforcement learning approach, where the system can learn from interacting with the existing tools and systems, rather than just fine-tuning on the code directly.”

Progress
The industry is moving fast, but not all of it is moving in a synchronized manner. This can create some problems. “The latest frontier models can be very good at multiple modalities, but such models may not be available inside secure design environments,” says Cadence’s Penrose. “This is one of the challenges we face going from R&D prototype to production.”

It also requires a different mindset within the EDA industry. “Gone are the days of very closed-loop systems that we have seen in the past,” says Balasubramanian. “It needs to be much more flexible, much more adaptable to the user’s needs. If you talk to me three months from now, I will have a different set of questions, a different set of opinions, because we are still learning. We’re all learning, including us as a supplier, or among our customers. We are learning every day when we are trying to deploy this.”

Related Reading
AI In Chip Design: Tight Control Required
Why and where limitations are needed in AI-driven design, and where software-defined hardware works best.


