Constant updates, more variables, and new demands for performance per watt are driving changes at the front end of design.
Prototyping, an essential technology for designing complex chips in tight market windows, is becoming significantly more challenging for the growing number of designs that include AI/ML.
Prototyping remains one of the foundational pillars of the whole shift-left movement, allowing software to be developed and tested before actual silicon is available. That, in turn, enables multiple teams to work on a design simultaneously, and it allows for more experimentation with different options to see what works best for a particular application.
“Design size growth continues to push the limit of the size of hardware and emulation systems,” said Lance Tamura, product management director at Cadence. “Design size has already limited what simulation can support down to small blocks or IP or very short test cases. AI has been a significant driver in the growth of emulation and prototyping platforms, as it requires large blocks of customized silicon. As such, AI has driven the growth of design sizes and design starts, resulting in growing demand for emulation and prototyping.”
But the rollout of AI/ML, with continual optimizations and changes to algorithms, is adding some uncertainty into a well-oiled design flow, as well. Changes in software can affect how data moves through a chip or package and how and where it is stored, and they can create new stress points that impact performance, power, and reliability.
“As a general rule, pre-silicon system simulation has always been a critical step in any high-performance ‘data plane’ element of an SoC,” said Steve Roddy, chief marketing officer at Quadric. “What’s new today, in 2023, is that the rapid rush of machine learning inference is simultaneously disturbing nearly all types of subsystems. The known characteristics of proven building blocks that might have allowed a team to use heuristic approaches to size system resources — memory, bus bandwidth, I/O bandwidth, power management — are all disrupted.”
This applies to graphics engines, which use ML inferencing for dynamic super-scaling and up-scaling, as well as Wi-Fi subsystems, which may use a novel ML algorithm to improve beam forming or constellation decoding, said Roddy. It also applies to deep-learning networks used for voice enhancement, and to cameras.
“Each of those novel ML workloads could possibly need to run simultaneously, and each is likely rapidly changing as data scientists invent new network graphs nearly every month,” he said. “Simulation followed by FPGA prototyping is practically the only way to fully wring out a design before writing that enormous check to the mask shop.”
Even prior to the rollout of AI, prototyping was already being pushed to its limits. While design teams appreciate prototyping’s value, they wish it didn’t take so long or have quite so many iterations along the way.
“Shortening the time to get to a working FPGA prototype is the number one request from customers,” said Juergen Jaeger, director of prototyping product strategy at Siemens EDA. “Once the prototype is running and functional, everybody is ecstatic, because now you have a pre-silicon version of your design. You can boot an operating system and run real network traffic through it to see how the system really behaves.”
How it works
Simulation, emulation, and prototyping are like three legs of a virtual stool. Typically, that starts with simulation, a software-based verification methodology in which a representation of the design is created in software, whether it’s VHDL, Verilog, SystemVerilog, or SystemC. “By its very nature, simulation is only as good as the set of questions you ask at the testbench,” Jaeger noted. “And because it’s all running on a computer in software, it’s also the slowest of those methodologies.”
Emulation goes a step further, with the design mapped into the emulator itself. “The design is run on a piece of hardware, and you can view the emulator as a massively parallel compute engine, an accelerator, for the simulation,” Jaeger said. “Then, because the emulation is running fast enough, in addition to the test bench, you also can apply real-world stimuli and connect a real monitor or keyboard or mouse to it. That makes a more complete verification environment because you’re no longer dependent on the questions you ask the test bench.”
Prototyping takes that a step further still, effectively creating a digital twin of the design that is mapped into FPGAs, allowing the functionality of ASICs to be tested before they go into silicon. FPGA-based prototyping is usually the final step in a three-step process.
“All three are complementary to each other and are used in different phases in your project,” said António Costa, director of product marketing at Synopsys. “On a prototype, you can have the real-world interface. If you want to interact with PCIe 5.0, for example, you need a prototype that runs at very high performance. The only way to do this compliance test is to run that interface at speed with the software you are planning to use in your SoC.”
Strategies for change
All of this becomes harder in complex designs and new application areas, where algorithms are in a state of constant flux. While a design may work perfectly post-simulation and emulation, it may behave very differently months or years later with a continual stream of software updates and system optimizations.
That may not be a problem in a consumer device with a life expectancy of a couple years, but it’s a very different story in automotive or data center applications, where chips are expected to function as expected for much longer. In automotive and markets like 5G/6G, regulations and standards have been implemented to ensure that designs don’t stray too far from the original spec. “If you don’t have the compliance, you may not be authorized to sell it if there is something that is not compatible,” Costa said. “This is very important, and is the very last mile of the development that needs to be done every time.”
Costa emphasized that this is a crucial and often overlooked failure point, and why prototyping can never be skipped. A chip may pass simulation and emulation with flying colors — in other words, work perfectly in isolation — but then fail in the market because it can’t be connected via standard protocols.
Even within the bounds of this traditional three-step process, design teams may shake things up a bit for different applications. “While some people think simulation, emulation, and prototyping happen in sequence, it’s often the case that they’re happening in parallel because there are certain things that one technology can do better than another,” said Larry Lapides, vice president of sales and marketing at Imperas Software. “If you follow that sequence, then by the time you get to silicon or you’ve got FPGA prototypes, what happens if you find a software bug? Simulation is an inherently white-box technology that provides insight into everything that’s going on. Whereas if you’re working with hardware, it’s not always the case where you can get all your information. We have users that start with simulation and use it pre-silicon, but also use it post-silicon because of the observability, the ease of automation, and controllability that the simulation provides.”
Many of these changes are a reaction to the growing complexity of designs, and to the need to co-design hardware and software to maximize performance per watt. That creates some predictable problems.
“Hardware and software teams don’t necessarily know how to communicate with each other. They’re coming from different perspectives,” said Lapides. “Hardware folks love models. But if they’re building A) a detailed model, and B) a model of the full SoC, those things are going to slow down the simulation environment for software development. You don’t need to model everything on the SoC. You don’t need the level of detail of a processor pipeline. You don’t need to do a physical implementation of an Ethernet protocol for communications. You need to focus on defining the project with realistic, achievable milestones that build upon each other.”
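Lapides’ point about abstraction can be made concrete. The sketch below is a hypothetical Python model with invented names, not any vendor’s API: for software bring-up, an Ethernet block can be reduced to little more than a queue, because a driver developer cares about packets arriving, not preambles, CRCs, or inter-frame gaps.

```python
# Hypothetical illustration of modeling at the right abstraction level.
# The class name and interface are invented for this sketch.

class FunctionalEthernet:
    """Loosely-timed functional model: sufficient for software bring-up.

    Software only needs to see that a packet sent is a packet received,
    so the model is a queue, not a cycle-accurate MAC/PHY implementation.
    """
    def __init__(self) -> None:
        self.rx_queue: list[bytes] = []

    def send(self, packet: bytes) -> None:
        # No preamble, CRC, or inter-frame gap -- just deliver the payload.
        self.rx_queue.append(packet)

    def receive(self) -> bytes | None:
        return self.rx_queue.pop(0) if self.rx_queue else None


# Usage: protocol logic can be exercised at full host speed.
eth = FunctionalEthernet()
eth.send(b"\x45\x00")  # start of an IPv4 header; physical layer abstracted away
assert eth.receive() == b"\x45\x00"
```

A cycle-accurate version of the same block would be orders of magnitude slower to simulate, and for early software development would add nothing but waiting.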
AI/ML can blur the hardware/software lines even further. “The pressure is on the IP core vendors to produce accurate SystemC models of processors and accelerator blocks that the chip builder can combine in early system models to run early software,” said Quadric’s Roddy. “The chip builder likely will prototype much or all of the full SoC in a large emulation system before tape-out. For semiconductor design teams that choose to build NPU acceleration offload engines with in-house hardware teams, this puts a burden on those teams to also hire SystemC modeling experts. In fact, the chip integrator needs the C model well before the accelerator block, because only when the accurate model is running can the integrator know the accelerator has the correct specifications.”
The flip side of this equation is that AI can be used to help sort through these problems, particularly when it comes to floor-planning. “One of the most frustrating things for users is when the place-and-route tool, after 20 hours, says, ‘Oops, I cannot do it,’” said Jaeger. “There’s no solution for it. You try to change something, you run it again, and you wait another 20 hours.”
Self-improving heuristic algorithms that learn from previous attempts can predict which designs are likely to fail place-and-route, and should help avoid these kinds of problems. “That’s a huge productivity boost, because that now means instead of waiting for 20 hours until it fails, you know after a minute that it will fail, and you can fix it before you even run place and route,” said Jaeger.
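What such a predictor might look like in practice: the minimal sketch below trains a classifier on the outcomes of past compile attempts and scores a new candidate in seconds. The feature set, the toy training data, and the choice of a scikit-learn random forest are illustrative assumptions, not a description of any vendor’s implementation.

```python
# A minimal sketch of learning from past place-and-route runs to flag
# likely failures before launching a new 20-hour compile. All features
# and data here are invented for illustration.
from sklearn.ensemble import RandomForestClassifier

# Each row: features from a past attempt --
# [LUT utilization %, BRAM utilization %, max fanout, clock domains]
history_features = [
    [68.0, 40.0,  9_500, 3],
    [91.0, 85.0, 22_000, 7],
    [75.0, 55.0, 12_000, 4],
    [94.0, 90.0, 31_000, 9],
]
history_outcome = [1, 0, 1, 0]  # 1 = routed successfully, 0 = failed

model = RandomForestClassifier(n_estimators=100)
model.fit(history_features, history_outcome)

# Score a new candidate partition before committing hours to place-and-route.
candidate = [[89.0, 82.0, 25_000, 8]]
p_success = model.predict_proba(candidate)[0][1]
if p_success < 0.5:
    print(f"Likely to fail P&R (p_success={p_success:.2f}); repartition first.")
```

In a real flow the training set would come from thousands of prior compiles, and the features would be far richer (congestion maps, timing slack distributions, and so on), but the shape of the idea is the same.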
Changing emphasis
AI also can change the relative value of the three pillars. “If you’re defining an AI system, you’ve got AI algorithms at the top level, and you’ve got to be able to compile from those algorithms down to some intermediate level, which is going to take advantage of the architecture underneath,” said Imperas’ Lapides. “Then you’ve got to have the hardware compiler that’s going to take those individual pieces and compile them down to the core, after which you’ve got to run this for billions of scenarios. To be able to do that in a normal prototyping environment using an FPGA prototype, which is going to run that at, say, 50 megahertz, or do it in a hardware emulator, which is going to run at maybe 5 megahertz — versus doing it in a simulation environment that’s going to run it at hundreds of megahertz — means there’s a real advantage to simulation. It can’t do everything, but it can do a lot to help optimize the AI compile step from algorithms to the distributed AI accelerator on the silicon. We’ve got customers that are building AI devices that are using tens to hundreds of simulation licenses in parallel to help with functional correctness testing, but also optimization.”
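The clock rates Lapides cites translate into stark wall-clock differences. Here is a back-of-envelope calculation; the 100-billion-cycle workload and the 300 MHz simulation speed are assumptions chosen only to make the ratios tangible.

```python
# Rough wall-clock time for 100 billion target cycles on each platform,
# using the effective clock rates Lapides cites. The workload size and the
# exact simulation speed ("hundreds of MHz" -> 300 MHz) are assumptions.
CYCLES = 100e9

platforms_mhz = {
    "hardware emulator (~5 MHz)":  5,
    "FPGA prototype (~50 MHz)":    50,
    "simulation (assume 300 MHz)": 300,
}

for name, mhz in platforms_mhz.items():
    minutes = CYCLES / (mhz * 1e6) / 60
    print(f"{name}: {minutes:,.1f} minutes per run")

# Emulator: ~333 minutes. FPGA: ~33 minutes. Simulation: ~6 minutes.
# Simulation also scales horizontally: tens to hundreds of licenses can
# run scenarios in parallel, multiplying effective throughput again.
```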
Figuring out which tools to use, when to use them, and how to partition the design is a growing challenge in its own right.
“Existing partitioning algorithms are still quite effective, but realizing both very fast compilation time and fast run time remains a challenge,” said Cadence’s Tamura. “This is made more difficult by the growth of design sizes. AI algorithms have the potential to not just come up with efficient partitioning solutions for current platforms, but may also influence the architectures of future platforms.”
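For readers unfamiliar with the problem, multi-FPGA partitioning is essentially bin-packing with a connectivity objective: fit the design’s blocks onto devices without exceeding capacity, while keeping heavily connected blocks together so that few signals must cross between chips. The toy greedy sketch below uses invented block names, sizes, and net counts; production partitioners, including the ML-assisted approaches Tamura alludes to, are far more sophisticated.

```python
# Toy multi-FPGA partitioner: greedy bin-packing that prefers to place each
# block where it keeps the most nets on-chip. All data here is invented.
blocks = {"cpu": 40, "npu": 55, "ddr_ctrl": 20, "pcie": 25, "noc": 15}  # % of one FPGA
edges = {("cpu", "noc"): 900, ("npu", "noc"): 1400, ("ddr_ctrl", "noc"): 600,
         ("pcie", "noc"): 300, ("cpu", "npu"): 200}                     # nets between blocks
CAPACITY = 100  # percent of one device

def greedy_partition(blocks, edges, capacity):
    fpgas = [{"used": 0, "blocks": set()}]
    # Place the largest blocks first.
    for name in sorted(blocks, key=blocks.get, reverse=True):
        def nets_kept_internal(fpga):
            # Nets that stay on-chip if `name` joins this FPGA.
            return sum(n for (a, b), n in edges.items()
                       if name in (a, b) and ({a, b} - {name}) <= fpga["blocks"])
        fits = [f for f in fpgas if f["used"] + blocks[name] <= capacity]
        if not fits:  # nothing fits, so open another device
            fpgas.append({"used": 0, "blocks": set()})
            fits = [fpgas[-1]]
        target = max(fits, key=nets_kept_internal)
        target["blocks"].add(name)
        target["used"] += blocks[name]
    return fpgas

for i, f in enumerate(greedy_partition(blocks, edges, CAPACITY)):
    print(f"FPGA {i}: {sorted(f['blocks'])} ({f['used']}% utilized)")
```

Even this toy version shows the tension Tamura describes: a fast greedy pass compiles quickly but can strand high-traffic connections across the chip boundary, which is exactly where smarter, learned heuristics have room to help.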
Finding the best approach may require talking to a variety of vendors, and scouring technical papers to see what works best for what applications. It may also require some narrowing of focus on the part of chipmakers, limiting the number of options to those that are well-tested in silicon.
“Standardize your design methods and tools as much as possible,” advised John Mick, ASIC design manager at Keysight. “Develop test cases to prove out changes to the flow before inserting them into the production flow. Use the same process node for multiple designs. Moving to a new node means new IP and design rule sets that can add many months to the ASIC development time.”
That said, the complexity of these designs, and customization for specific applications and use cases, can have a big impact on what works best. “Customers should understand the key capability they’re looking for,” said Jaeger. “For example, if your goal is to mostly run software, then the highest possible performance is your number one priority. If you want to mostly make sure that your RTL is functional, and you don’t have to go through silicon respin, then debug capabilities are the most important thing for you.”
To this point, EDA tool providers often have tailored offerings. “For example, Keysight is simulator agnostic, so we don’t use a ‘canned’ vendor flow. We optimize our design flow to work with different vendors. This means we have developed hundreds of scripts to handle the tools and make it all work,” Mick said.
Synopsys’ Costa added that all of this advice may appear obvious, but in the middle of a crunch, when a team is worried about time and budget, it’s easy to ignore. That can wind up costing even more time in the end. “If you still need to change the RTL every day, that’s probably too soon to go into prototyping. You should go back to emulation or even to simulation and be sure that your design is in good shape. I always hear people talk about schedule. ‘Oh, I don’t have the time, so I need to just use one platform.’ However, you’re probably losing more time because you don’t have the right tool. You don’t have a hammer, you still want to put your nail in the wood regardless, but you don’t have the time to go to the shop to buy a hammer. Further, these are very sophisticated tools. If you don’t know the methodology to debug things in a step-by-step manner, you may be lost very quickly.”
And above all, having a solid plan is key to knowing when to stop. “Verification is never ending,” Jaeger said. “There’s always something more that can be checked. Having a plan also means knowing when good is good enough for what you did.”
Conclusion
The number of options available to chipmakers, and the demands of end markets looking for highly customized solutions that include some form of AI/ML, with orders of magnitude improvements in performance per watt, are stretching existing tools and methodologies well beyond their initial targets.
The key for chipmakers will be to limit the number of options where it makes sense, in order to stay focused, as well as to utilize whatever new techniques, tools, and methodologies are available in places where designs can benefit from them. Still, there will be more variables to contend with, and more challenges that may be unique, for which there may be no single solution. These changes have made the design process much more interesting and innovative, but there’s a price to pay for that level of freedom.
Related Reading
Shift Left, Extend Right, Stretch Sideways
Development flows are evolving as an increasing number of optimization factors become interlinked. Shift left is just one piece of it.
AI Adoption Slow For Design Tools
While ML adoption is robust, full AI is slow to catch fire. But that could change in the future.