Any optimization problem must have a clear, unambiguous specification and a way to define the goodness of the solution. Today, we have neither.
For many, the long-term dream for AI within EDA is the ability to define a set of goals and tell the computer to go and create the design for them. A short while later, an optimized design will pop out. All of today’s EDA tools will remain hidden, if they even exist at all. You would be limited only by your imagination.
But we also know that AI is not to be trusted today, especially when millions of dollars are at stake for every error it makes. There has to be some kind of verification.
There are a few other issues. First, it relies on someone being able to define a complete and unambiguous specification. I am not aware of anyone having achieved this without at least a few iterations after problems have been identified. Even something like the RISC-V ISA specification, which was vetted by a large number of eyes, was shown to have problems. Is it 100% correct today? Probably not. This is why we continue to do verification.
But the second problem is that any automatic process needs to know what better means. Even for optimization, which many see as a stepping stone, the AI system needs goals that can be evaluated. It must be able to say, unequivocally, that one design is better than another. But what does better mean?
I have been writing recently about the relevance of PPA metrics in today’s designs, and while the industry is reluctant to say they have little relevance, it is just as quick to say that these are not the primary metrics used to make decisions. Designs have become way too complex, and the physics that links problem spaces together makes it impossible to fully understand all of the dependencies.
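To make the ambiguity concrete, here is a minimal sketch (my own illustration, with made-up numbers, not taken from any real design) of a Pareto-dominance check across three PPA metrics. When neither candidate dominates the other, any ranking of "better" depends on weights that someone has to choose subjectively.

```python
# A minimal sketch of why "better" is hard to define: compare two hypothetical
# design points on power, performance, and area (PPA). One design only dominates
# another if it is no worse on every metric and strictly better on at least one;
# otherwise the ranking depends entirely on weights someone has to choose.

from dataclasses import dataclass

@dataclass
class DesignPoint:
    power_mw: float   # lower is better
    freq_mhz: float   # higher is better
    area_mm2: float   # lower is better

def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if a is at least as good as b on all metrics and strictly better on one."""
    no_worse = (a.power_mw <= b.power_mw and
                a.freq_mhz >= b.freq_mhz and
                a.area_mm2 <= b.area_mm2)
    strictly_better = (a.power_mw < b.power_mw or
                       a.freq_mhz > b.freq_mhz or
                       a.area_mm2 < b.area_mm2)
    return no_worse and strictly_better

# Two plausible (made-up) candidates: neither dominates the other, so "better"
# is undefined without additional, subjective goals.
d1 = DesignPoint(power_mw=450, freq_mhz=2200, area_mm2=3.1)
d2 = DesignPoint(power_mw=380, freq_mhz=1900, area_mm2=3.4)
print(dominates(d1, d2), dominates(d2, d1))   # False False
```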
When talking to one processor manufacturer, I found they longed for the past, when the simple goal was to produce a design with a higher clock rate. Nothing else mattered. But then power scaling broke down and it became increasingly difficult to increase clock rates. The design objective shifted, and the metrics changed to look more at computation rates. Multiple cores were used to solve the problem so that the total number of MIPS could increase while maintaining the same clock rate.
But then came thermal issues. These limited the power density, such that you could not just build a massive array of cores all running flat out for long periods of time. The temperature would reach the point where the chip would start to melt. Before that happens, you have to slow down the clock for all the cores, and possibly reduce the voltage, so they consume less power. There goes your sustainable performance.
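As a rough illustration of that trade-off, the sketch below uses a first-order model (my own, with made-up numbers) in which dynamic power per core scales roughly with the cube of frequency once voltage is scaled in proportion to frequency. Under a fixed package power budget, the per-core clock that can be sustained falls as cores are added, even though aggregate throughput may still rise.

```python
# A back-of-the-envelope sketch (illustrative numbers, not from the article) of why
# sustained per-core performance drops as cores are added under a fixed power cap.
# First-order model: dynamic power per core ~ C * V^2 * f, and if voltage is scaled
# roughly in proportion to frequency, power per core grows ~ f^3.

def sustainable_freq(n_cores: int, power_cap_w: float,
                     nominal_freq_ghz: float = 3.0,
                     nominal_power_w: float = 10.0) -> float:
    """Largest frequency all n_cores can sustain within power_cap_w, given that one
    core at nominal_freq_ghz burns nominal_power_w and power scales ~ f^3."""
    budget_per_core = power_cap_w / n_cores
    scale = (budget_per_core / nominal_power_w) ** (1.0 / 3.0)
    return min(nominal_freq_ghz, nominal_freq_ghz * scale)

cap = 60.0  # watts the package/heatsink can dissipate (made-up)
for cores in (4, 8, 16, 32):
    f = sustainable_freq(cores, cap)
    print(f"{cores:2d} cores: ~{f:.2f} GHz each, ~{cores * f:.1f} GHz aggregate")
```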
Of course, you could just spread the cores out more, but that would increase cost and size, which may be limited by the device in which you want your system to operate. Then you start to add other concerns like safety, security, and reliability, all of which impact the design and its implementation. Every factor you have to consider makes it a bigger problem. Today, we are just getting a first glimpse of this complexity as we start to consider the implications of shift left within the design flow. In a nutshell, it says that concerns which used to be separable back-end issues now impact front-end decisions to a degree where they cannot be left until later.
When humans design, they leave safety margins for everything they are not 100% certain about, and they almost always add some capabilities they believe will future-proof their products. While machines may be able to help with the former, they cannot help with the latter. This is highly subjective, and there are few experts in the field capable of building in just enough without adding too much cost.
By definition, machines cannot do this until they can predict the future. While the rate of change in AI capabilities is stunning, I am not sure I will be around to see this. But then we should remind ourselves about Arthur C. Clarke’s three laws:

1. When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
2. The only way of discovering the limits of the possible is to venture a little way past them into the impossible.
3. Any sufficiently advanced technology is indistinguishable from magic.
What we do today in semiconductors would have been considered magic just a few decades ago, and many a pundit has had to eat his words when talking about how constrained things would be in the future. Who amongst us could have predicted what has happened over just the past two years with ChatGPT?
Aloha
Who indeed.
Happy T-day. Jolly December Solstice. And Awesome Adiabatic 2025!!
Always enjoy and benefit from you sharing your thoughts.
Beaucoup Mahalo, Art
It appears that for EDA there are a number of ways in which persistent data collection, and dashboards showing statistical distributions across teams, can highlight and confirm specific changes that meaningfully improve iteration time, focus, and schedule.
Lacking statistical analysis, refactoring can be viewed purely as added schedule or functional risk, and proposals for, and execution of, any initiatives lack hard numbers to drive decisions, track impact, and inform learning.
The attention of any team member is a precious commodity, and projects are cyclical. So even with dashboards, simple, impactful steps may not be taken.
I suspect that in this more constrained context, AIs may be well suited to tirelessly searching through the data and bringing actionable items to the attention of the teams.
Examples may include targeted effort on faster coverage closure only for teams that represent a significant fraction of total cluster load, and a focus on IT/flow issues that are known to be both recurrent and impactful.
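As a sketch of that kind of triage (the record fields, names, and thresholds here are hypothetical stand-ins, not drawn from any particular regression or cluster database), a script along these lines could sweep job records and surface exactly those two kinds of actionable items:

```python
# A hedged sketch of the kind of triage an AI (or even a plain script) could run over
# persistent regression data. The record fields (team, cpu_hours, failure_kind) are
# hypothetical -- stand-ins for whatever a real cluster/regression database exposes.

from collections import defaultdict

jobs = [  # made-up sample records
    {"team": "cpu_core",  "cpu_hours": 1200, "failure_kind": None},
    {"team": "cpu_core",  "cpu_hours": 900,  "failure_kind": "license_timeout"},
    {"team": "fabric",    "cpu_hours": 150,  "failure_kind": None},
    {"team": "cpu_core",  "cpu_hours": 1100, "failure_kind": "license_timeout"},
    {"team": "io_subsys", "cpu_hours": 80,   "failure_kind": "disk_full"},
]

load = defaultdict(float)
infra_fails = defaultdict(int)
for j in jobs:
    load[j["team"]] += j["cpu_hours"]
    if j["failure_kind"] is not None:
        infra_fails[j["failure_kind"]] += 1

total = sum(load.values())
# Flag teams whose share of cluster load makes coverage-closure effort worth targeting.
big_consumers = [t for t, h in load.items() if h / total > 0.25]
# Flag IT/flow issues that recur often enough to be worth fixing once, centrally.
recurrent = [k for k, n in infra_fails.items() if n >= 2]

print("Target coverage-closure effort at:", big_consumers)
print("Recurrent IT/flow issues:", recurrent)
```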
On longer time scales, this kind of persistent data collection should let teams build a digital twin of the development process itself, enabling data-driven project scheduling, risk assessment, and architectural tuning of the development flow, in much the same way that chip architectures are iterated.
I feel the EDA companies will be able to step past point problems of faster constraint solving, faster coverage closure, and better place and route, to provide insights into, and tracking of, the specific changes that have a demonstrated impact on day-to-day, resource-constrained IP development.