Capturing Knowledge Within LLMs

The knowledge contained within your employees’ heads might be the most valuable asset you have, but what happens when it walks out the door?


At DAC this year, there was a lot of talk about AI and the impact it is likely to have. While EDA companies have been using it for optimization and improving iteration loops within the flow, the end users have been concentrating on how to use it to improve the user interface between engineers and tools. The feedback is very positive.

Large language models (LLMs) have been trained on a huge amount of data available on the Internet, and they have been shown to be quite good at summarizing and replicating that information in various forms. What they lack is domain-specific knowledge about design. The number of publicly available design and verification suites is limited to a few open-source projects, and many of those are of questionable quality.

But design companies have lots of internal data, lots of experts, and lots of experience. They are capturing this and using it on top of the LLMs to provide the missing knowledge. The kinds of tasks it is helping with include understanding or finding data in manuals, language construction rules and best practices, summarizing results from tool runs, and many other things. Over time these companies are learning how to capture the experience of their senior engineers, effectively making it available to more junior engineers. The productivity improvements are notable.

One session at DAC looked at three such programs, and I would like to focus on the system being developed and deployed at NVIDIA. It was presented by Sid Dhodhi, GenAI hardware engineer.

“We should be able to take our tools and give them a conversational UI,” Dhodhi said. “We should be able to ask our tools to summarize the data that’s being produced by them. This should make it easier for engineers to analyze the data and to ask questions. Another opportunity is a coding assistant. Having access to a coding AI sitting next to us should make us more productive, at least by an incremental amount. I have been much more productive in my coding tasks just because I have access to these models. The third opportunity is instant access to documentation. As boring as it might seem, we spend a lot of time trying to find documents. This includes things like documentation from previous designs. You need to have search access to all of this. All of this can help provide context about a problem you are trying to solve. If you have a way of analyzing regressions, logs, reports, you have a way of extracting or parsing out problems that you’ve seen, and then you have access to documentation that talks about the error message, bug reports that people have seen before. Then we can combine the two and work in a loop, iterate using agents until you have been able to solve the problem partially.”
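The loop Dhodhi describes — parse problems out of logs and reports, then pair each one with documentation and past bug reports about that error — can be sketched in a few lines. Everything below is invented for illustration: the log format, the error tags, and the documentation index are hypothetical placeholders, not NVIDIA's actual system.

```python
import re

# Hypothetical index mapping error tags to prior context (docs, past bugs).
DOC_INDEX = {
    "UVM_FATAL": "See the methodology guide section on fatal-severity reports.",
    "X_PROP": "Bug #4211 traced X propagation to an uninitialized register.",
}

def extract_errors(log_text):
    """Pull error tags such as 'ERROR[X_PROP]' out of a raw tool log."""
    return re.findall(r"ERROR\[(\w+)\]", log_text)

def triage(log_text):
    """Pair each parsed error with any prior context we have on it."""
    return {tag: DOC_INDEX.get(tag, "No prior context found.")
            for tag in extract_errors(log_text)}

log = "run 1: ERROR[X_PROP] at cycle 512\nrun 2: ERROR[UVM_FATAL] in scoreboard"
for tag, context in triage(log).items():
    print(f"{tag}: {context}")
```

In a production agent this triage result would become the context for the next LLM call, and the loop would repeat until the problem is at least partially resolved.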

Dhodhi noted there are challenges ahead in achieving these goals. “One of the biggest issues is your data. Anyone working with LLMs knows it is a garbage in, garbage out problem. You can’t build anything without good data. You have to invest time in creating and cleaning data. We have access to state-of-the-art models like GPT-4 and others, which are all very well aligned and structured models, so we can use them immediately for our task. But the quality of these models, the alignment quality of these models, from an application perspective, means there could be variations. When you ask a question and get an answer which is almost correct, but then gives you a hallucination, it takes away from the user experience. If your model is not fully aligned, then it’s going to struggle to be reliable in production systems where you try to call the correct APIs every single time, regardless of how they use it.”

He added that while you can ask an LLM to summarize your code, it is going to do it at a superficial level, without sufficient context about what the code is trying to achieve. “It comes down to how efficient these models are for synthesizing information related to your context,” he said. “This can be a problem for organizations that have a large amount of proprietary data that you are trying to protect differently within your own users. Information available to one of your users may not be accessible to another depending on their access permissions. Before anything goes into production, you have to make sure that you are not believing something that is inaccurate, which is why you need it as a human assistant rather than asking it to do everything. Eventually, humans have to be involved.”

NVIDIA released ChipNeMo last year, part of an internal effort to boost hardware engineering productivity and efficiency using LLMs. Essentially, the company took a foundation model and continued to train it on NVIDIA data, such as RTL, chip design documents, and tools. The company determined it was pretty good as a “coaching model.”

“We have built two copilots,” Dhodhi said. “The first is a bug copilot, and the second is a chat copilot. The mission of the bug copilot is to look at bug reports. We use tracking reports to help with coordination and also capture details of something we have previously solved. In the future, when we see a similar issue, we can go back and see what happened before. This is very helpful to people who are new and may not be aware of a particular topic.”
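The “see a similar issue, go back and see what happened before” step amounts to similarity search over an archive of past bug reports. A toy sketch using word-overlap (Jaccard) similarity is below; the reports are invented examples, and a real deployment would use learned embeddings rather than raw word overlap.

```python
# Score each archived bug report against a new one by word overlap.
def tokens(text):
    """Lowercase bag of words for a report."""
    return set(text.lower().split())

def jaccard(a, b):
    """Jaccard similarity: |intersection| / |union| of the word sets."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

def most_similar(new_report, archive):
    """Return the archived report most similar to the new one."""
    return max(archive, key=lambda old: jaccard(new_report, old))

archive = [
    "timing violation on ddr controller after clock gating change",
    "scoreboard mismatch in l2 cache coherence test",
]
new_bug = "clock gating change causes timing violation in ddr path"
print(most_similar(new_bug, archive))  # surfaces the ddr timing report
```

The matched report, with its recorded resolution, then becomes context the copilot can summarize for the engineer filing the new bug.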

NVIDIA has a production system in place, which provides both an engineering and a managerial perspective. It includes next steps and who else needs to be involved, and it receives about 10,000 API calls per day. He said user studies indicate it saves about four minutes per bug, which equates to about 1,000 engineering hours per day.

The second copilot is for chat, allowing NVIDIA to turn user queries into a vector. “You have already put all of the information necessary to answer that query into a database,” he said. “Doing that, you can find the most relevant chunks and feed them to the LLM. We built several of these systems, each focused on a particular topic. It is important to remember that data has to stay high quality, and existing data can become stale. Perhaps a tool changed, or a block is using a different architecture. Every chat copilot is backed by an expert. Many experts can be working together to create these high-quality copilots, and then everyone starts to benefit.”
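The retrieval step Dhodhi describes — embed the query, rank stored chunks by similarity, and feed the best matches to the LLM — is the core of retrieval-augmented generation. The sketch below uses toy bag-of-words vectors and cosine similarity in place of the learned embedding model a real system would use; the documentation chunks are invented.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values())) *
            math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query vector."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "the lint tool flags unconnected ports with warning W110",
    "synthesis constraints live in the block timing directory",
]
context = retrieve("what does lint warning w110 mean", chunks)
prompt = "Answer using this context:\n" + "\n".join(context)
print(prompt)
```

The retrieved chunks are prepended to the prompt, so the LLM answers from the curated internal documentation rather than from its general training data — which is also what lets each topic-specific copilot stay grounded in the data its backing expert maintains.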

It turns out that users wanted it to provide answers as well as references related to those answers. “People want to dive deeper and look at more documentation than provided by the chatbot,” he said. “This has been rolled out to 11,000 engineers, and gets about 350 queries a day. We believe that each of these saves about 45 minutes, particularly for new hires and junior engineers. That translates into hundreds of hours of engineering time per day.”


