Common themes emerge, but so do different ways of looking at a problem, from business opportunity to concern for the environment.
Chip design and verification are facing a growing number of challenges. How they will be solved — particularly with the addition of machine learning — is a major question for the EDA industry, and it was a common theme among four keynote speakers at this month’s Design Automation Conference.
DAC has returned as a live event, and this year’s keynotes came from the leaders of a systems company, an EDA vendor, and a startup, along with a university professor.
Mark Papermaster, CTO and executive vice president for technology and engineering at AMD
Papermaster began his talk with an observation. “There has never been a more exciting time in technology and computation. We are facing a massive inflection point. The combination of exploding amounts of data and more effective analysis techniques, which we see in new AI algorithms, means that putting all of that data to work has created an insatiable demand for computation. We have relied on Moore’s Law for 30 of my 40 years in the industry. I could count on dramatic improvements every 18 months, with lower device costs and gains in density and performance at each process node. But as the industry has moved into these minute lithographies, the complexity of manufacturing has grown tremendously. It is obvious that Moore’s Law has slowed. Costs go up with each node. The number of masks is rising, and while we are getting density gains, they do not come with the same scaling factors we once saw, or the same performance improvements. There will be a metamorphosis of how we approach the next generation of devices.”
Papermaster noted that embedded devices are becoming pervasive, and they are getting smarter. The demand for computation, driven by AI, is going up everywhere, which requires new approaches to accelerate improvements. “Experts predict that by 2025, the amount of machine-generated data will exceed the data generated by humans. That drives change in how we think about computation. It makes us think about new ways to put accelerators into devices as chips or chiplets. We must take on the challenges collectively as an industry, and that is the metamorphosis that has me excited. That is how we will overcome the challenges.”
One of the big issues involves reticle limitations, which determine how much can be crammed onto a monolithic piece of silicon. Papermaster said this will lead to more design, and more design automation, and that can only come about through collaboration and partnerships. The solutions will rely on heterogeneity and how to handle complexity. Software needs to be designed along with the hardware in a “shift left” manner. “The 225X gain in transistor count over the past decade means we are now looking at designs with 146 billion transistors, and we have to deploy chiplets.”
Fig. 1: Ecosystems created through partnership. Source: AMD (based on Needham & Co. data)
This is not a new idea, however. “If we look back to the first DAC in 1964, it was created as the Society to Help Avoid Redundant Effort (SHARE). That acronym is very prescriptive of what we need right now. We need a shared vision of the problem we are solving,” he said.
Put simply, solving problems the industry now faces cannot be done by any single company, and a lot of the innovation happens at the overlap of partnerships.
Fig. 2: Percentage of gains from scaling. Source: AMD
At 3nm, design technology co-optimization (DTCO) is expected to overtake intrinsic scaling. These trends are a challenge to EDA, to application developers, and to the design community. To solve the problems, the solution platform needs to be re-architected, particularly for AI. That brings engines and interconnects together with chiplets, up through the software stack layers, to create the platform. Engines are becoming more specific, and domain-specific accelerators are needed for an increasing number of tasks.
Fig. 3: Platform approach to problem solving. Source: AMD
“In the next era of chiplets, we will see multiple combinations of 2D and 3D approaches, and partitioning for performance and power will open up new design possibilities. This will create incredible opportunities for EDA, and you will have to rethink many things from the past. We also have to do this sustainably and think more about power. IT computation is on a trajectory to consume all available energy, and we have to cap it now.”
Papermaster called upon Aart de Geus, chairman and CEO of Synopsys, to talk about the sustainability of computing.
De Geus focused on the exponential of Moore’s Law, overlaid with the exponential of CO2 emissions. “The fact that these two curves fit almost exactly should be very scary to all of us,” he said. “Our objective is clear. We have to improve performance per watt by 100X this decade. We need breakthroughs in energy generation, distribution, storage, utilization, and optimization. The call to action — he or she who has the brains to understand should have the heart to help. You should have the courage to act. I support this message from our sponsor, planet Earth.”
Papermaster followed up, saying that AMD has a 30X by 2025 power efficiency goal, exceeding the industry goal by 2.5X. He said AMD is on track and has currently achieved a 7X improvement. “If the whole industry were to take on this goal, it would save 51 billion kilowatt-hours of energy over 10 years and $6.2B in energy costs, and drive down CO2 emissions by an amount equivalent to 600 million tree seedlings.”
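The arithmetic behind those savings figures is easy to check. A quick back-of-the-envelope sketch in Python, using only the numbers quoted above (the implied electricity price is our inference, not an AMD figure):

```python
# Sanity check of the quoted savings, using only the figures from the talk.
energy_saved_kwh = 51e9   # 51 billion kilowatt-hours over 10 years
cost_saved_usd = 6.2e9    # $6.2B in energy costs

# Implied average electricity price (an inference, not an AMD number).
print(f"Implied price: ${cost_saved_usd / energy_saved_kwh:.3f}/kWh")  # ~$0.12/kWh
```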
Papermaster added that AI is at the point of transformation for the design automation industry. “It touches almost every aspect of our activities today,” he said, pointing out that various technologies such as emulation, digital twins, generative design, and design optimization are use cases that are driving EDA. “We are using AI to help improve the quality of results, to explore the design space and improve productivity.”
He also provided one example where packaging can help. By stacking cache on top of logic, AMD could achieve 66% faster RTL simulation.
Anirudh Devgan, president and CEO of Cadence
Devgan’s presentation was entitled “Computational Software and the Future of Intelligent Electronic Systems.” He defines computational software as computer science plus math, noting that it is the underpinning of EDA.
“EDA has done this for a long time, since the late ’60s to early ’70s,” Devgan said. “Computational software has been applied to semiconductors and is still going strong, but I believe it can be applied to a lot of other things, including electronic systems. The last 10 years have been big for software, especially in social media, but for the next 10 to 20 years, even that software will become more computational.”
There are a lot of generational drivers to semiconductor growth. In the past there were single product categories that went through a boom and then a bust. “The question has always been, ‘Will it continue to be cyclical or become more generational growth?'” said Devgan. “I believe, given the number of applications, that semiconductors will become less cyclical.”
He said that while the cost of design is going up, people forget to include the volume. “The volume of semiconductors has gone up exponentially, so if you normalize the cost of design, has cost really gone up? Semiconductors must deliver better value. This is happening, and it is reflected in the revenues over the past few years. There is also an increase in the amount of data that needs to be analyzed. This changes the compute, storage, and networking paradigm. While domain-specific computing was talked about in the ’90s, it has become really critical in the last few years. This brings us closer to the consumer, and to the system companies doing more silicon. The interplay between hardware, software, and mechanical is driving the resurgence of system companies, driven by data. Forty-five percent of our customers are what we would consider system companies,” he said.
Fig. 4: Data’s growing impact. Source: Cadence
Devgan pointed to three trends. First, system companies are building silicon. Second is the emergence of 3D-IC, or chiplet-based, design. And third, EDA can provide more automation by utilizing AI. He supplied supporting information for each of these trends, and then looked at various application areas and how models apply to them. He agreed with Papermaster that gains are no longer coming just from scaling and that integration is becoming more important. And he outlined the phases of computational software’s emergence in different generations of EDA.
Fig. 5: Eras of EDA software. Source: Cadence
Perhaps the most important thing that came out of this discussion was that EDA has to start addressing the whole stack, not just the silicon. It must include the system and the package. “The convergence of mechanical and electrical requires different approaches, and traditional algorithms have to be rewritten,” Devgan said. “Thermal is different. Geometries are different. Classical EDA has always been about more productivity. A combination of a physics-based approach and data-driven approach works well, but EDA has only historically focused on a single run. There has been no transfer of knowledge from one run to the next. We need a framework and mathematical approach to optimize multiple runs, and that is where the data-driven approach is useful.”
Optimization is one area where he provided an example, showing how numerical methods can intelligently search the design space. He said that approach can achieve better results in a shorter time than a person can.
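Devgan did not describe the algorithm itself, but the flavor of an automated design-space search can be sketched with a toy example: random sampling over two hypothetical synthesis knobs, scored by a made-up PPA cost function. Everything here (knob names, cost model) is illustrative, not Cadence’s method:

```python
import random

# Hypothetical tool knobs; real flows expose many more parameters.
KNOBS = {
    "clock_target_ns": [0.8, 0.9, 1.0, 1.1, 1.2],
    "effort_level":    [1, 2, 3, 4],
}

def ppa_cost(cfg):
    """Made-up stand-in for a real synthesis/place-and-route run; lower is better."""
    timing_pressure = max(0.0, 1.0 - cfg["clock_target_ns"]) * 10.0
    power = 0.5 * cfg["effort_level"] + 1.0 / cfg["clock_target_ns"]
    return timing_pressure + power + random.gauss(0.0, 0.05)  # tool noise

def search(n_trials=25):
    """Naive random search; a data-driven optimizer would also reuse past runs."""
    best_cfg, best_cost = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: random.choice(values) for name, values in KNOBS.items()}
        cost = ppa_cost(cfg)
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost

print(search())
```

The point of the data-driven framing Devgan described is that results from earlier runs would inform later candidates, rather than each trial starting from scratch.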
Devgan also addressed sustainability. “This is a big thing for our employees, for our investors, for our customers,” he said. “Semiconductors are essential, but they also consume a lot of power. There is an opportunity for us to reduce that power consumption, and power will become the driving factor in PPA — not just at the chip level, but in the data centers and at the system level. Compared to biological systems, we are orders of magnitude off.”
Steve Teig, CEO of Perceive
After more than three decades of working on machine learning applications, Steve Teig is convinced more can be done. “First, deep learning would be even stronger than it is if we depended less on folklore and anecdotes, and spent a little more time on math and principles,” he said. “Second, I believe efficiency matters. It is not enough to make models that seem to work. We should worry about computational throughput per dollar, per watt, per other things.”
Teig observed that deep learning models are impressive, and would have been considered witchcraft just 15 years ago. “But we need to recognize them for the magic tricks they are,” he said. “We keep making bigger, badder models. We have forgotten that the driver of innovation for the last 100 years has been efficiency. That is what drove Moore’s Law, the advance from CISC to RISC, and from CPUs to GPUs. On the software side, we have seen advances in computer science and improved algorithms. We are now in an age of anti-efficiency when it comes to deep learning. The carbon footprint of training a big language model just once, which costs about $8M, is more than 5X the carbon footprint of driving your car for its entire lifetime. The planet cannot afford this path.”
Fig. 6: Growing AI/ML model sizes. Source: Perceive
He also said that from a technical point of view, these gigantic models are untrustworthy because they capture noise in the training data, which is especially problematic in medical applications. “Why are they so inefficient and unreliable? The most significant reason is we are relying on folklore.”
He structured the rest of his presentation around the theme of “A Myth, a Misunderstanding, and a Mistake.” The “myth” is that average accuracy is the right thing to optimize. “Some errors don’t really matter, while others are more serious. The neural networks that we have do not distinguish between serious errors and non-serious ones. They are all scored the same. Average accuracy is almost never what people want. We need to think about penalizing errors based on their severity, not their frequency. Not all data points are equally important. So how do you correct this? The loss function must be based on severity, and the training set should be weighted based on the importance of the data.”
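One way to make that concrete is to scale each training example’s loss by the cost of getting it wrong, rather than treating every error equally. A minimal sketch in plain Python, with made-up severity weights (the class roles and numbers are hypothetical):

```python
import math

# Hypothetical severities: misclassifying class 2 (say, a tumor) is far worse
# than misclassifying class 0 (say, background clutter).
SEVERITY = {0: 1.0, 1: 2.0, 2: 10.0}

def severity_weighted_nll(probs, labels):
    """Negative log-likelihood in which each example is scaled by the severity
    of its true class, so serious errors dominate the loss."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += SEVERITY[y] * -math.log(p[y] + 1e-12)
    return total / len(labels)

# probs[i][c] = predicted probability of class c for example i
probs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.3, 0.3, 0.4]]
labels = [0, 2, 2]
print(severity_weighted_nll(probs, labels))
```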
The “misunderstanding” is the mistaken belief that neural networks are fully expressive as computing devices. “Many of the assumptions and theorems are very specific and not satisfied by real-life neural networks. It is thought that a feed-forward neural network can approximate any continuous combinational function arbitrarily closely. This depends on having non-polynomial activation functions, and even if this is true, we need an arbitrary number of bits. More concerning is that the only functions you can build in this type of neural network are combinational, meaning that anything that requires state cannot be represented. There are theorems that state that NNs are Turing-complete, but how can this be true when you have no memory? RNNs are literally finite state machines, but they have very limited memory. They are effectively regular expressions that can count, and that makes them equivalent to grep.”
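The finite-state-machine point can be made concrete: a recurrent cell whose hidden state is a one-hot vector over a fixed set of states, updated by a fixed transition rule, is exactly a DFA. A minimal hand-built sketch that recognizes the regular language “bit strings with an odd number of 1s”:

```python
# A hand-built "recurrent cell" whose hidden state is one-hot over two DFA
# states, {even, odd}. With a fixed transition rule it is literally a finite
# state machine, i.e. it recognizes a regular language, just as grep does.

def step(h, x):
    """One recurrent update: h = [even, odd] one-hot, x = next input bit."""
    even, odd = h
    return [even, odd] if x == 0 else [odd, even]  # a 1 flips the parity state

def accepts(bits):
    h = [1, 0]            # start in the 'even' state
    for x in bits:
        h = step(h, x)
    return h[1] == 1      # accept if we end in 'odd'

print(accepts([1, 0, 1, 1]))  # True: three 1s
print(accepts([1, 1]))        # False: two 1s
```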
The “mistake” is the belief that compression hurts accuracy and thus we should not compress our models. “You want to find structure in data, and distinguish structure from noise. And you want to do this with the smallest number of resources. Random data cannot be compressed because there is no structure. The more structure that you have, the more compression you can do. Learning in principle is compressible, which means it has structure. Information theory can help us create better networks. Occam’s Razor says the simplest model is best, but what does that mean? Any regularity or structure of the data can be used to compress that data. Better compression reduces the arbitrariness of choices that the networks make. If the model is too complicated, you are fitting noise.”
Fig. 7: Types of compression. Source: Perceive
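Teig’s link between structure and compressibility is easy to demonstrate with an off-the-shelf compressor: highly structured data shrinks dramatically, while random data does not compress at all. A quick sketch:

```python
import os
import zlib

structured = b"ABCD" * 25_000   # 100 KB with obvious repeating structure
noise = os.urandom(100_000)     # 100 KB of random bytes

for name, data in [("structured", structured), ("random", noise)]:
    ratio = len(zlib.compress(data, 9)) / len(data)
    print(f"{name}: compressed to {ratio:.1%} of original size")
# The structured block shrinks to a fraction of a percent; the random block
# stays at (or slightly above) its original size.
```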
What would perfect compression look like? Teig provided an interesting example. “The best compression has been described by mathematics. It is capturable by the shortest computer program that can generate the data. This is called Kolmogorov complexity. Consider Pi. I could send you the digits 3.14159, etc., but a program to calculate the digits of Pi enables you to generate the trillionth digit without having to send that many bits. We need to move away from trivial forms of compression, like reducing the number of bits for weights. Is 100X compression possible? Yes.”
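The Pi example is worth spelling out. The digit stream never ends, yet a program of a few lines generates as many digits as you like, which is exactly the “shortest program that produces the data” view of Kolmogorov complexity. A sketch using a well-known spigot-style generator (after Gibbons):

```python
def pi_digits():
    """Yield decimal digits of Pi one at a time (Gibbons-style spigot)."""
    q, r, t, j = 1, 180, 60, 2
    while True:
        u = 3 * (3 * j + 1) * (3 * j + 2)
        y = (q * (27 * j - 12) + 5 * r) // (5 * t)
        yield y
        q, r, t, j = (10 * q * j * (2 * j - 1),
                      10 * u * (q * (5 * j - 2) + r - y * t),
                      t * u,
                      j + 1)

gen = pi_digits()
print([next(gen) for _ in range(10)])  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
```

The program is a few hundred bytes, yet it stands in for an unbounded string of digits, which is the sense in which it is the better “compression” of Pi.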
Giovanni De Micheli, professor of EE and CS at EPFL
Giovanni De Micheli started by looking at the hierarchy of cycles in EDA, each linked to the others by some relation. “The cross-breeding of technology and design leads to superior systems,” he said.
After looking at how loops exist historically, in music, art, and mathematics, he then looked at the interactions between actors — between industry, academia, finance, start-ups, and conferences like DAC that provide data exchange. He used all of this to introduce three questions. Will silicon and CMOS be our workhorse forever? Will classical computing be superseded by new paradigms? Will living matter and computers merge?
Silicon and CMOS. De Micheli looked at some of the emerging technologies, from carbon nanotubes to superconducting electronics, to logic in memory and the use of optics to speed up computation in machine learning. “Many of these are paradigm changes, but you have to look at the effort that would be needed to take these technologies and make them into products. You need new models. You need to adapt or create EDA tools and flows. In doing this, you may discover things that enable you to make existing things better.”
Research into nanowires has led to electrostatic doping, and this creates new gate topologies. He also looked at tungsten diselenide (WSe2) and showed a possible cell library in which you can very efficiently implement gates such as XOR and the Majority gate. “Let’s go back to look at logic abstraction,” he said. “We have designed digital circuits with NAND and NOR for decades. Why? Because we were brainwashed when we started. In NMOS and CMOS, those were the most convenient gates to implement. But if you look at the Majority operator, you realize it is the key operator for addition and multiplication. Everything we do today needs those operations. You can build EDA tools based on this that actually perform better in synthesis.”
Fig. 8: Gate topologies and libraries for different fabrication technologies. Source: EPFL
After going through all the background for how to use the Majority operator, De Micheli claimed that it could lead to a 15% to 20% delay reduction compared to previous methods. This is an example of his loop, where an alternative technology teaches us something about an existing technology and helps to improve it, as well as being applicable to new technologies, such as superconducting electronics.
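The tie between the Majority operator and arithmetic is concrete: in a full adder, the carry-out is exactly MAJ(a, b, cin), so an adder falls out of the XOR and Majority cells mentioned above. A minimal sketch:

```python
def maj(a, b, c):
    """Majority of three bits: 1 if at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)

def full_adder(a, b, cin):
    return a ^ b ^ cin, maj(a, b, cin)   # (sum, carry); the carry is exactly MAJ

def ripple_add(x_bits, y_bits):
    """Add two little-endian bit vectors using only XOR and MAJ cells."""
    carry, out = 0, []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out + [carry]

# 6 + 3 = 9: little-endian [0,1,1,0] + [1,1,0,0] -> [1, 0, 0, 1, 0]
print(ripple_add([0, 1, 1, 0], [1, 1, 0, 0]))
```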
“EDA is the technology enabler. It provides a way in which you can evaluate an emerging technology and see what is useful in the virtual laboratory. It creates new abstractions, new methods, and then new algorithms that are beneficial not only for these emerging technologies, but also for established technologies. And we do know that current EDA tools do not take us to the optimum circuit, and therefore it’s always interesting to find new paths to optimize circuits.”
Computing paradigm. De Micheli then looked at quantum computing and some of the applications it is suited to solving. The loop here requires adding the notions of superposition and entanglement. “This is a paradigm shift, and it changes the way we conceive of algorithms, how we create languages, debuggers, etc.,” he said. “We have to rethink many of the notions of synthesis. Again, EDA involves technology-independent optimization, which here leads to reversible logic. It is reversible because the physical processes are inherently reversible. And then there is the mapping to a library. You have to be able to embed constraints. Quantum EDA will enable us to design better quantum computers. Quantum computing is also advancing the theory of computation, expanding the class of problems that can be solved in polynomial time. That includes, for example, factoring, and this will impact security.”
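Reversible logic has a simple canonical example: the Toffoli (controlled-controlled-NOT) gate, which flips its target bit only when both control bits are 1. It is its own inverse, and with the target initialized to 0 it computes AND without discarding information. A small sketch:

```python
def toffoli(a, b, t):
    """CCNOT: flip the target t iff both controls a and b are 1. Reversible."""
    return a, b, t ^ (a & b)

# Applying the gate twice restores the inputs, so it is its own inverse.
for bits in [(0, 0, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)]:
    assert toffoli(*toffoli(*bits)) == bits

# With the target initialized to 0, the third output is AND(a, b), but unlike
# an ordinary AND gate the inputs remain recoverable from the outputs.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", toffoli(a, b, 0))
```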
Living matter and computers. “The important factor in the loop here is about doing correction and enhancement. We have been doing correction for over 1,000 years with the eyeglass. Progress has been tremendous.” De Micheli discussed many of the technologies that are available today and how they are transforming our lives.
Fig. 9: Creating feedback loops in medical applications. Source: EPFL
“This is leading to new EDA requirements that allow us to co-design sensors and electronics,” he said. “But the ultimate challenge is understanding and mimicking the brain. This requires us being able to decode or interpret the brain signals, to copy neuromorphic systems and learning models. This creates interfacing challenges for controllability, observability, and connection.”
That’s step one. “The next level — the future — is from brain to mind, basically being able to connect artificial and natural intelligence. Advances in biology and medicine, together with new electronic and interfacing technology, will enable us to design biomedical systems that help us live better,” he said.
Conclusion
Executing a task in the most efficient manner involves a lot of people. It involves carefully designing the algorithms, the platforms on which those algorithms run, and the tools used to perform the mappings between software and hardware, and between hardware and silicon or an alternative fabrication technology. When each of these is done in isolation, it can lead to small improvements. But the big improvements come when all of the actors work together in partnership. That may be the only way to stop the continuing damage to our planet.