Experts at the Table: The good and bad of more data, and how AI can leverage that data to optimize designs and improve reliability.
Semiconductor Engineering sat down to talk about the role of AI in managing data and improving designs, and its growing role in pathfinding and preventing silent data corruption, with Michael Jackson, corporate vice president for R&D at Cadence; Joel Sumner, vice president of semiconductor and electronics engineering at National Instruments; Grace Yu, product and engineering manager at Meta; and David Pan, professor in the Department of Electrical and Computer Engineering at the University of Texas at Austin. What follows are excerpts of that conversation, which was held in front of a live audience at DesignCon. Part one of this discussion is here.
SE: Everyone and everything is collecting an enormous amount of data these days. Where do we store it, and for how long? And how do we determine what’s relevant?
Sumner: Where we’re seeing AI applied first is in the places that already have very large and robust data storage. We are fortunate that semiconductor production test infrastructure runs off a standard format, so you can put the data into really massive databases with good tagging of what’s a pass and what’s a fail. That’s given us a springboard to research these things and to use them as a proof point, because it has applicability in a lot of other industries for validation or anything automated. But those data stores don’t really exist in a standard way today in many places. Where they do exist is where we’re seeing the adoption.
SE: Will all that data be stored in one place? And how will that data be used going forward?
Sumner: It will end up being distributed for a number of reasons. One is that it’s just practical. Second, there’s customer data involved, so you can’t necessarily delete everything. For example, we’re running AI algorithms at multiple places in the design chain. We’re running them in the cloud, but also close to where the data is acquired. That requires the data to be distributed. But at the same time, all the data you want to use to train the model really needs to be in one place and easily accessible.
Pan: And you can use that data to help you to make better decisions. For example, we can generate tens of thousands of different layouts, and then do the simulation, extraction, and the final layout. That’s complementary to the design experts.
Jackson: From an EDA standpoint, new data can often be created by permuting existing layouts or randomly generating new ones. So you can create problems synthetically, and this can be another source of data. This is one of the advantages EDA has.
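As a rough illustration of the kind of synthetic data generation Jackson describes, the sketch below jitters and mirrors simple rectangle-based layouts to expand a small seed set into many training samples. The layout representation, perturbation rules, and parameters are illustrative assumptions, not any particular EDA tool's data model.

```python
import random

# A "layout" here is just a list of rectangles (x1, y1, x2, y2) in nm.
# This is a toy representation for illustration only.

def perturb_layout(rects, max_shift=2, mirror_prob=0.5):
    """Create one synthetic variant by shifting shapes and optionally mirroring."""
    new_rects = []
    for (x1, y1, x2, y2) in rects:
        dx = random.randint(-max_shift, max_shift)
        dy = random.randint(-max_shift, max_shift)
        new_rects.append((x1 + dx, y1 + dy, x2 + dx, y2 + dy))
    if random.random() < mirror_prob:
        # Mirror horizontally around x = 0.
        new_rects = [(-x2, y1, -x1, y2) for (x1, y1, x2, y2) in new_rects]
    return new_rects

def augment(seed_layouts, variants_per_layout=1000):
    """Permute each seed layout into many synthetic training samples."""
    return [perturb_layout(r) for r in seed_layouts for _ in range(variants_per_layout)]

# One seed layout expanded into 1,000 variants; each variant would then be
# run through simulation/DRC to produce labels for training.
seeds = [[(0, 0, 10, 20), (15, 0, 25, 20)]]
dataset = augment(seeds)
print(len(dataset))  # 1000
```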
SE: Given the volume of data, is all of this going to be done in the cloud, or will it be done locally? We’re talking about much larger data sets, which require much more compute horsepower.
Jackson: That will depend on the company. I’m working with printed circuit board design and we’re doing some work with AI, and there’s a lot of compute capability in the cloud that enables the AI. Small companies may be okay with keeping their data in the cloud, but large companies are going to want to run it in their private clouds.
Pan: Data privacy is definitely a big concern. That’s an important area in machine learning. But you don’t have to pass around your data. You can encrypt it and then do homomorphic computing. Secure computing is an emerging research area. So without sharing the data, you can still compute on it.
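A minimal, deliberately toy illustration of the point that a third party can compute on data it cannot read: with unpadded ("textbook") RSA, multiplying two ciphertexts yields a valid encryption of the product of the plaintexts. The key values and workflow below are assumptions for demonstration only; this is not secure, and production work would use real homomorphic schemes and libraries.

```python
# Tiny fixed textbook-RSA key (p=61, q=53): n = 3233, e = 17, d = 2753.
n, e, d = 3233, 17, 2753

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

a, b = 7, 6
ca, cb = encrypt(a), encrypt(b)

# An untrusted server multiplies the ciphertexts without ever seeing a or b.
c_product = (ca * cb) % n

assert decrypt(c_product) == (a * b) % n
print(decrypt(c_product))  # 42
```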
Yu: It depends on what kind of data we’re talking about. We have a very strict policy on customer privacy. Only people who need to access that data can do that. Each employee who joins Meta goes through annual training on data privacy. For design data, it depends on the project. Some data we store on local servers, and we utilize the cloud for big data access, and also for simulation and validation. So it’s case-by-case.
SE: As the hardware ages, how does that impact the behavior of AI?
Sumner: When it comes to aging, it’s important to talk about the environment the AI is running in. It’s not the algorithms that we see age. It’s the training data. So you’ve trained it on a particular set of manufacturing data, and that manufacturing data was taken from a particular manufacturing environment. And then, over time, things drift. You will see one of two situations. One is the whole system drifts, so the AI has to detect that the whole system has moved far enough from its initial training data that it needs to be retrained. The second situation is where some device comes through with something so different from anything it has seen before that the algorithm has to say, ‘Wait, hold on, I am not the best answer here. I need to consult a human because this is just too far away.’ Both of those are examples of decay in the system. Constant refreshing is necessary.
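The two failure modes Sumner describes can be sketched as simple statistical checks against the training distribution: one that flags when a whole batch has drifted (retrain), and one that flags when a single unit is far enough outside the training data that a human should review it. The features, thresholds, and z-score test below are illustrative assumptions, not any vendor's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for training data, e.g., parametric test measurements per device.
train = rng.normal(loc=1.0, scale=0.1, size=(10_000, 4))
mu, sigma = train.mean(axis=0), train.std(axis=0)

def population_drift(batch, threshold=0.5):
    """Flag retraining when the batch mean moves too far (in sigmas) from training."""
    shift = np.abs(batch.mean(axis=0) - mu) / sigma
    return bool((shift > threshold).any())

def needs_human(sample, threshold=4.0):
    """Defer to a human when a single device lies far outside the training data."""
    z = np.abs(sample - mu) / sigma
    return bool((z > threshold).any())

drifted_batch = rng.normal(loc=1.08, scale=0.1, size=(1_000, 4))
print(population_drift(drifted_batch))               # True -> schedule retraining
print(needs_human(np.array([1.0, 1.0, 1.6, 1.0])))   # True -> escalate for review
```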
Jackson: I agree. Constant retraining is necessary to address aging. But as the software is exposed to a larger and larger training set, it also evolves and becomes more effective.
Pan: Retraining from scratch can be very expensive. Instead, you can do transfer learning. For example, a number of years ago we did some work on hotspot detection. When you’re detecting something at 14nm and you migrate that to 7nm, you don’t have to start from scratch. You can keep the original machine learning architecture, but start the training from somewhere in the middle rather than from the beginning.
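A minimal sketch of that transfer-learning idea, assuming a small convolutional hotspot classifier: reuse the architecture and the weights trained at the older node, freeze the early feature extractor, and fine-tune only the later layers on a smaller data set from the new node. The model, file name, and layer split are hypothetical, not the actual published work.

```python
import os
import torch
import torch.nn as nn

# Toy hotspot classifier over 1x64x64 layout clips (hotspot vs. non-hotspot).
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 2),
)

# Start "from somewhere in the middle": load weights trained at the older node,
# if available (the file name is an assumption for illustration).
if os.path.exists("hotspot_14nm.pt"):
    model.load_state_dict(torch.load("hotspot_14nm.pt"))

# Freeze the convolutional feature extractor; only the classifier head adapts to 7nm.
for layer in list(model.children())[:-1]:
    for p in layer.parameters():
        p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def fine_tune_step(clips_7nm, labels_7nm):
    """One fine-tuning step on labeled 64x64 layout clips from the new node."""
    optimizer.zero_grad()
    loss = loss_fn(model(clips_7nm), labels_7nm)
    loss.backward()
    optimizer.step()
    return loss.item()
```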
SE: One of the big issues today is silent data corruption, which is due to hardware defects. Can we trace this through systems using AI and identify the problem and the exact cause?
Yu: AI is like any other tool. It’s not perfect. But the way to avoid these issues is to have a human in the loop and do validation testing frequently, perhaps running the AI on a known scenario to see whether we get the expected result. Using simple approaches like that, you can identify the issue, identify the mismatch, and take a deep dive into those areas. Engineers are not perfect, and AI is not perfect. To constantly improve, you have to double-check and cross-check more often to avoid those kinds of issues.
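A minimal sketch of that kind of human-in-the-loop cross-check: periodically run the AI on scenarios with known-good ("golden") answers and hand any mismatches to an engineer for review. The predictor, scenarios, and tolerance below are placeholders, not Meta's actual process.

```python
GOLDEN_SCENARIOS = [
    # (input vector, expected result)
    ([0.1, 0.2, 0.3], 0.60),
    ([1.0, 1.0, 1.0], 3.00),
]

def ai_predict(x):
    """Placeholder for the model under test."""
    return sum(x)

def validate(tolerance=0.05):
    """Return the scenarios where the AI disagrees with the known result."""
    mismatches = []
    for inputs, expected in GOLDEN_SCENARIOS:
        got = ai_predict(inputs)
        if abs(got - expected) > tolerance:
            mismatches.append((inputs, expected, got))
    return mismatches

issues = validate()
if issues:
    print("Escalate to an engineer for review:", issues)
else:
    print("AI output matches the golden scenarios.")
```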
Jackson: We’re investing heavily in the whole area of verification as it relates to speeding up or assisting people in the design, and the debug of functional problems in those designs. So we definitely see this as a sweet spot, and we’re channeling a lot of energy into AI.
SE: Is that just done at the design stage, or is it across the entire lifecycle of the chip?
Jackson: To a certain extent, it’s the lifecycle of the chip. It’s the testing of it, the deployment, and the debug of problems.
Sumner: This technology works well for things that require exhaustive amounts of people to all pitch in and figure something out, and to be able to do that while removing a lot of the mundane but difficult work. The goal ultimately is for you to be able to go home at night, come back in the morning and to get a report that says, ‘I have gone through gigabytes, or more, of data and here’s the place you should look. And I’m not saying there’s a problem, but there might be, so take a look at that.’ It’s taking needle-in-a-haystack problems and turning them into focused efforts for how you end up dealing with a problem in your product. It also can be applied to how we make our algorithms more trustworthy, creating a sense that I can rely on this thing because it’s been tested and I know it’s coming from a reputable source.
Pan: There are formal ways to verify something and there is simulation. Ultimately, we need both for good coverage. Ideally, we want to be able to identify those weird glitches that cause silent data corruption early in the process. That is a pretty active research topic today.
Related Reading
AI: Engineering Tool Or Threat To Jobs? (Part 1 of this roundtable)
AI works better sometimes than others; what happens when there isn’t enough good data?
How Chip Engineers Plan To Use AI (Part 3 of this roundtable)
Checks, balances, and unknowns for AI/ML in semiconductor design.
Leveraging Chip Data To Improve Productivity
Collecting, analyzing and utilizing data can pay big benefits for design productivity, reliability, and yield.