Whichever method of power aware verification is chosen, don’t wait until three weeks before tapeout to do it.
The addition of low power circuitry creates so many corner cases that some can escape even the best-written testbenches. The number of additional verification cycles this demands has grown to the point where datacenter managers at semiconductor companies must be wondering whether it is a trick by the power companies to make low-power verification consume as much power as end users will ever save.
Power has added a completely new dimension to an already difficult verification problem.
“In traditional functional verification, we have long had the assumption that the power is on across the board,” said Erich Marschner, verification architect at Mentor Graphics. “The entire design is functional, it’s ready to react to input, and that doesn’t change for the entire duration of the functional verification run because we’ve always assumed when the device is physically realized you’re going to turn the power on and have the power on for the whole system whenever its operating. But as soon as you break that assumption, it’s not just one more change to the situation. There is one more possible change at every cycle of execution of the design. This means at every stage, you have to know what the power situation is. Power states in the system or subcomponents of the system can change at any moment, so it adds substantial complexity to the whole problem. It’s almost like every single cycle of evaluation now has to consider multiple different states.”
Marschner noted that if this were to be thought about in terms of formal verification, where sets of states are being evaluated at every cycle as opposed to a single state, the sets of power states must be considered at every cycle. That essentially increases the complexity exponentially at every cycle. “We can’t run that many more simulations. But we at least have to think about all of the power states of the system, and the power state machine of the system, and walk through all of the different power states and the transitions to make sure we can exercise that power state machine just as well as we can exercise the functional state machines that are part of the design. Because there are both functional state machines and power state machines, now we have to take the cross product of the two. That’s the reason why there’s a lot more verification needed. You have to think about those interactions.”
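As a rough illustration of that cross product (using purely hypothetical state machines, not any particular design), the Python sketch below crosses a small functional FSM with a power FSM and counts the combined states that verification would, in principle, have to consider at every cycle.

```python
from itertools import product

# Hypothetical functional states and power states -- illustrative only.
functional_states = ["IDLE", "FETCH", "DECODE", "EXECUTE", "WRITEBACK"]
power_states = ["ON", "RETENTION", "OFF", "VOLTAGE_SCALED"]

# The combined verification state space is the cross product of the two machines.
combined_states = list(product(functional_states, power_states))

print(len(functional_states))   # 5 functional states
print(len(power_states))        # 4 power states
print(len(combined_states))     # 20 combined states to reason about
```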
Measuring that extra verification is very difficult to do, however. “Basically everybody agrees that if you want to measure coverage for low power it would be a very depressing day because it would be so low,” said Lawrence Loh, vice president of engineering at Jasper Design Automation. “So we’ve moved away from that measurement. They only use coverage to measure the normal functionality, and use a little bit more of a feature-based different approach to verifying low power. Obviously that presents a problem as to how accurate that is because low power in the aggressive form — what most of the mobile industry is using — is relatively new.”
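A back-of-the-envelope calculation, with hypothetical numbers rather than anything from Jasper, shows why such a metric would look so bleak: crossing an otherwise healthy functional coverage model with even a modest set of power states multiplies the bin count far faster than a regression suite can fill it.

```python
# Hypothetical numbers chosen only to illustrate the scale of the problem.
functional_bins = 1000          # bins in the existing functional coverage model
functional_hit = 900            # 90% functional coverage

power_states = 16               # reachable power states of the subsystem
crossed_bins = functional_bins * power_states   # 16,000 cross bins
crossed_hit = 1200              # what a typical regression might actually reach

print(f"functional coverage: {functional_hit / functional_bins:.0%}")  # 90%
print(f"low power cross coverage: {crossed_hit / crossed_bins:.1%}")   # 7.5%
```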
Finding a way to think through the problems that might arise takes time, just as verification itself does. He said today's constrained random verification took many years to mature, as engineers gained experience and learned from their mistakes. Low power verification is going through that phase now, which is why there are so many failsafe mechanisms in low power designs.
“Failsafe is the nice way to position it. In reality what happens is that there are a lot of capabilities that turn off a lot of low power features. What do you do with that? Let’s say I have 15 different domains where I can turn off power. You verify as much as you can, and at the end of the day you may have to disable the capability of five of them. You can see that the return is sometimes questionable for the amount of work you put in,” Loh explained.
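Taking Loh’s example at face value, and assuming the domains can be switched independently, the arithmetic behind that trade-off is straightforward: 15 switchable domains give 2^15 raw on/off combinations, and disabling five of them collapses the space to 2^10.

```python
# Loh's example: 15 power domains that can each be switched off independently.
domains = 15
print(2 ** domains)                # 32768 raw on/off combinations

# Disabling the power-gating capability of 5 domains (leaving them always on)
# shrinks the space that still has to be verified.
disabled = 5
print(2 ** (domains - disabled))   # 1024 remaining combinations
```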
The next thing to tackle is how to handle the areas that need attention in low power verification. “You can do things smarter, more targeted or try to come up with the new solutions or new apps that target these areas,” he said.
In fact, this is the approach that Jasper has taken based on customer feedback. “They know where the problems are but they don’t have good solutions,” Loh added.
Low power verification strategies
As far as strategies to address low power verification go, Marschner observed there are a variety of approaches taken. “Users are always concerned about performance and time so they’re never happy about having to invest more simulation cycles or slower cycles, which is sometimes the case with power-aware simulation because there’s more going on now and simulation slows down a little bit.”
He noted that some engineering teams say they only run about 10 test cases for power awareness because most of their concerns involve functionality, and these 10 cases or this 5% of the regression suite is enough to test out what they’re concerned about with regard to the power management.
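One simple way to implement that “5% of the regression in power-aware mode” approach is to tag the tests that exercise power management and let the regression script select them. The sketch below is a generic illustration with made-up test names, not any particular methodology or tool flow.

```python
# Hypothetical regression list of (test name, tags). Illustrative only.
regression = [
    ("smoke_boot",        {"sanity"}),
    ("dma_burst",         {"functional"}),
    ("cpu_sleep_wakeup",  {"functional", "power"}),
    ("retention_restore", {"power"}),
    ("cache_stress",      {"functional"}),
]

# Run everything in normal mode, but only the power-tagged subset power-aware.
power_aware_subset = [name for name, tags in regression if "power" in tags]
print(power_aware_subset)   # ['cpu_sleep_wakeup', 'retention_restore']
```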
“On the other hand, we’ve seen other customers who’ve said they really want to run all of their regressions in power-aware mode because they don’t know what is potentially going to cause a problem from a power management point of view, or what scenario is going to be the one that will trigger a failure that highlights a problem with power management,” Marschner continued. “One of the observations that I’ve made over the past couple of years is that the people who really have problems with power-aware verification are the ones who wait until the end to do it because invariably, it’s three weeks before tapeout, they haven’t bothered to do it yet, and they are discovering issues that they should have fixed months ago, and now they don’t have time.”
Vic Kulkarni, senior vice president and general manager at Ansys-Apache, explained that in addition to verification of low-power techniques, the issue is also about achieving the “promised” low power. Often, tools will not account appropriately for the overhead of added RTL logic, which is meant to reduce the overall dynamic power, or switch cells that are added to reduce static power. That could end up increasing the overall area and power consumption.
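Kulkarni’s caution can be captured with some simple bookkeeping (hypothetical numbers only): the gross savings from a technique such as clock or power gating pay off only after subtracting the overhead of the added enable logic, isolation cells, and power switches.

```python
# Hypothetical per-block numbers in milliwatts, illustrative only.
gross_dynamic_saving = 12.0   # dynamic power removed by the added clock gating
gating_logic_overhead = 1.5   # dynamic and leakage power of the added enable logic
switch_cell_overhead = 2.0    # leakage and control power of inserted power switches
isolation_overhead = 0.8      # isolation and retention cell overhead

net_saving = gross_dynamic_saving - (
    gating_logic_overhead + switch_cell_overhead + isolation_overhead
)
print(f"net saving: {net_saving:.1f} mW")   # 7.7 mW, not the 'promised' 12 mW
```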
“It is critical to implement RTL power reduction techniques based on power analysis, as opposed to making ‘blind’ structural changes based on a purely algorithmic or formal approach, in order to avoid costly design iterations,” Kulkarni said. “As an example, modeling clock tree synthesis behavior upfront during RTL power analysis is critical to predict whether a suggested RTL change will truly lead to reduced clock power.”
Predicting physical-aware power at RTL becomes a critical issue for designers, especially when it comes to clock power. As such, Kulkarni said, designers have to “calibrate” power estimation at RTL in the context of what happens to the RTL changes post RTL-gate synthesis, clock-tree synthesis and place-and-route.
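That calibration can be thought of as deriving per-category correction factors from a block that has already been taken through synthesis, clock-tree synthesis, and place-and-route, then applying them to new RTL estimates. The sketch below uses made-up numbers purely to show the mechanics.

```python
# Hypothetical calibration data from a previously implemented block (mW).
rtl_estimate = {"clock": 8.0,  "logic": 15.0, "memory": 6.0}
post_route   = {"clock": 11.2, "logic": 16.5, "memory": 6.3}

# Correction factor per category: how much post-layout power exceeded the RTL estimate.
calibration = {k: post_route[k] / rtl_estimate[k] for k in rtl_estimate}

# Apply the factors to a new block's raw RTL estimate.
new_block_rtl = {"clock": 5.0, "logic": 10.0, "memory": 4.0}
calibrated = {k: round(new_block_rtl[k] * calibration[k], 2) for k in new_block_rtl}
print(calibrated)   # clock power scales up the most, as Kulkarni notes
```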
Figure 1 below shows a plot of power savings as a percentage of maximum possible on a given block versus the number of RTL changes made using low-power techniques. “One can see that with a few systematic changes made in RTL which are based on RTL power analysis, they can achieve a higher percentage rather quickly. But then, additional RTL changes have diminishing returns in terms of power savings. In fact, the additional changes start impacting the area of the chip with very little improvement in percent of power savings,” he noted.
Fig 1: Analysis-driven RTL power reduction minimizes design impact and enables designer to make systematic changes. (Source: Ansys-Apache)
Further, Anand Iyer, director of product marketing for the low power platform at Calypto, noted that multicore and distributed processing help EDA software tools run faster, but added that the improvement is not linear in the number of cores.
Drawing on his previous experience as a designer at AMD, where huge server farms devoted 90% of their servers to verification and only 10% to actual design, Iyer said that even with multicore implementations, partitioning a design is still a challenge because of how the boundary conditions have to be handled. “You can never create a clean partition. Reuse is another way to manage the verification complexity, but when you think about power management, it throws reuse out the door because every new use case has a different use model,” he said.
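Iyer’s point that more cores do not buy a linear speedup is the familiar Amdahl’s law effect: whatever fraction of the job stays serial, such as compile, elaboration, or merging results across partitions, caps the overall gain. A minimal sketch, assuming a hypothetical 90% parallel fraction:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup when only part of the job can be spread across cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# Assume 90% of a verification job parallelizes cleanly (hypothetical).
for cores in (2, 8, 32, 128):
    print(cores, round(amdahl_speedup(0.9, cores), 2))
# 2 -> 1.82x, 8 -> 4.71x, 32 -> 7.8x, 128 -> 9.34x: far from linear scaling
```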
Another approach would be an app infrastructure – a loosely coupled system of smaller programs that would run one job at a time, Iyer suggested.
Marschner agreed that there are many ways in which software can be made to run faster, including multicore implementations and distributed processing in a server farm. “In general, all of those techniques get used—eventually. We’re certainly trying to apply everything we can to speed the process up.”
He said one common and effective approach is simply to compile the design together with its power management as a separate unit, and then link that unit into different test environments without recompiling it. This can eliminate a lot of the compilation and build phases.
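What Marschner describes, compiling the design plus its power intent once and linking it into many test environments, is essentially a build-cache pattern. The sketch below is a generic Python illustration rather than any simulator’s actual flow: the compiled unit is keyed by a hash of its sources, so unchanged sources mean no recompile.

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path("compiled_units")

def unit_key(sources: dict[str, str]) -> str:
    """Fingerprint RTL + power intent so a rebuild happens only when they change."""
    h = hashlib.sha256()
    for name in sorted(sources):
        h.update(name.encode())
        h.update(sources[name].encode())
    return h.hexdigest()[:16]

def get_compiled_unit(sources: dict[str, str]) -> Path:
    """Reuse the previously compiled power-aware unit if its sources are unchanged."""
    unit = CACHE_DIR / f"design_{unit_key(sources)}.bin"
    if unit.exists():
        print("reusing compiled power-aware unit; only the testbench is rebuilt")
    else:
        CACHE_DIR.mkdir(exist_ok=True)
        unit.write_bytes(b"compiled power-aware design")  # stand-in for the real compile
        print("compiled power-aware unit")
    return unit

# Hypothetical sources: RTL plus its UPF power intent. Illustrative only.
sources = {
    "design_top.v": "module top; endmodule",
    "power_intent.upf": "create_power_domain PD_TOP",
}
get_compiled_unit(sources)   # first call compiles
get_compiled_unit(sources)   # second call reuses the cached unit
```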
“These are local optimizations that we can all make in the flow; it goes without saying there’s no point in not doing some of these things. The larger win that [engineering teams] could achieve would be to organize their power-aware verification more intentionally up front to focus on the issues that they can address in layers,” Marschner said. This would help them avoid finding bugs late in the process that should have shown up earlier, as well as eliminate a lot of verification activity that might no longer be valid because of the bugs involved.
At the end of the day, if the power aware verification process is organized well, it’s easier to find pervasive and broader scope bugs earlier in the process, while also making sure that the verification tasks done later are worthwhile.