IP And Power

How can power be optimized across an entire chip when most of a chip’s content comes from third-party IP?


Power is quickly becoming a major differentiator for products, regardless of whether they are connected to a wall outlet or dependent on a battery. At the same time, a growing share of a chip’s content comes from third-party IP. So how do system designers ensure that the complete system has an optimal power profile, and what can they do to tune each of the IP blocks to ensure that overall power is minimized?

The answer is as complex as the devices themselves. There is a fine balance between what the IP industry can provide and the needs of the system designer. But there is almost universal agreement that power optimization needs to be pervasive throughout the development flow.

“If you aren’t always thinking about power at every stage, from the blank piece of paper to the finished layout, you will fail,” asserts David Harold, vice president of communications at Imagination Technologies. “Power is fundamental.”

Others agree. “IP suppliers listen to customers and respond to their needs,” adds Geoff Tate, CEO of Flex Logix. “The customer has the perspective of the whole chip and the IP supplier has to take their guidance on the right tradeoffs to make. The customer knows the overall power, the system operating temperature profile, and other attributes of the system. IP blocks are black boxes and there are not likely to be any safe changes a customer can make inside an IP block. An IP block must be designed with options for customers, like multiple variations on Vt masks or various operating modes that the customer can select.”

Power can be optimized in several ways. “Power optimization techniques fall into three categories,” says Alexis Boutillier, functional safety and corporate application manager for ArterisIP. “One is linked to the actual application running at the SoC level, which saves the most power but requires additional logic and interfaces to enter and leave a power state. Then we have an intermediate level, where by correctly architecting your design you can automatically stop the clock for a complete element of the IP. Lastly, we have a low-level optimization done by a synthesis tool, which relies on good design of IP elements.”

Think big
Design of a power-optimized product must start from the very beginning. “The biggest leverage points on power are likely to be algorithms and architecture, not implementation and process technology,” says Chris Rowen, CEO of Babblabs. “The right algorithm, or a better parallel architecture, can reduce energy by a factor of 10 or more, while circuit and fab process usually move the needle by no more than tens of percent.”

Fig 1. SoC consumer portable power consumption trends. Source: ITRS

There are many questions that a system designer needs to be asking, says Tom Wong, director of business development in the IP Group at Cadence. “What is my power budget for each subsystem in the SoC? Do I have to maintain a different Vdd for each of the chip’s voltage islands? What is my power management scheme? Do I need to use power gating? Should I consider DVFS (dynamic voltage and frequency scaling)? In a multi-core environment, do I use a big.LITTLE approach, or should I use an octo-core architecture where only some of the cores are activated at any given moment? How do I make tradeoffs using hardware acceleration to manage power optimization, namely the use of DSP cores, graphics cores, audio cores, communications DSPs?”

Improved architecture and algorithms can have a cascading benefit to power. “Improvement can reduce the number of cycles needed to execute each task and that reduces the necessary clock rate which in turn often lets the voltage be reduced,” says Rowen. “As CMOS power is often dominated by the active term P = CV²f, the simultaneous improvement in voltage and frequency has a triple whammy benefit.”
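Rowen’s cascade can be made concrete with a little arithmetic. The following is a minimal sketch, not a power model for any real device: the capacitance, voltage, and frequency values are invented for illustration, and the function name `dynamic_power` is hypothetical. It assumes the active term P = CV²f dominates and ignores leakage.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz):
    """Active CMOS power in watts, using P = C * V^2 * f."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical baseline: 1 nF of switched capacitance, 1.0 V, 1 GHz.
baseline_w = dynamic_power(1e-9, 1.0, 1.0e9)

# Assume a better algorithm halves the cycle count, so the clock can
# run at half the rate, and that the lower rate allows 0.8 V operation.
improved_w = dynamic_power(1e-9, 0.8, 0.5e9)

ratio = improved_w / baseline_w
print(f"baseline {baseline_w:.2f} W, improved {improved_w:.2f} W, ratio {ratio:.2f}")
```

Under these assumed numbers, halving the frequency alone would halve power, and the voltage drop contributes a further factor of 0.8 squared, so power falls to roughly a third of the baseline. Voltage counts twice because of the squared term, which is the sense of the “triple whammy.”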

In some cases, you cannot consider each IP block to be independent. “System-level and chip-level power consumption optimization has been limited in part by the number of analog IP blocks that designers can readily use in their primarily digital designs,” says Muhammad Faisal, president and CEO of Movellus. “As an example, to implement an SoC with a GPU, memory and Arm cores – all running at different frequencies – designers must typically take the main PLL and divide down its frequency to generate different frequencies for the different IPs. This leaves a significant amount of power optimization on the table.”

Faisal suggests a better way. “Automatically generating previously analog PLLs as digital PLLs that do not disrupt the chip-level integration and timing flows dramatically increases the number of available frequencies to run each subsystem at its own individually optimized frequency. This gives architectural freedom to SoC designers, who can reduce chip-level power consumption by as much as 10%.”
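To see why dividing down a single PLL can leave power on the table, consider a simplified sketch. It assumes integer dividers only, and the main PLL rate and per-block frequency requirements are invented for illustration. The helper `lowest_divided_frequency` is hypothetical and does not describe any vendor’s clocking scheme.

```python
def lowest_divided_frequency(main_mhz, required_mhz, max_divider=16):
    """Slowest integer-divided clock that still meets the requirement."""
    candidates = [main_mhz / n for n in range(1, max_divider + 1)]
    usable = [f for f in candidates if f >= required_mhz]
    if not usable:
        raise ValueError("main PLL is too slow for this block")
    return min(usable)


MAIN_PLL_MHZ = 2000  # assumed main PLL frequency

# Assumed minimum frequency each subsystem needs to meet its workload.
required = {"gpu": 700, "memory": 900, "cpu": 1500}

# With one divided PLL, each block runs at the next available step up.
divided = {
    name: lowest_divided_frequency(MAIN_PLL_MHZ, mhz)
    for name, mhz in required.items()
}

for name, mhz in required.items():
    excess = divided[name] - mhz
    print(f"{name}: needs {mhz} MHz, gets {divided[name]:.0f} MHz ({excess:.0f} MHz excess)")
```

In this toy case every block is clocked faster than it needs to be, because the only available steps are 2000, 1000, 667 MHz and so on. A clock source per subsystem could run each block at its required rate instead. How much power that saves depends on voltage, activity, and clock gating, none of which this sketch models.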

Understanding the full context also can open up other possibilities. “One way to deal with performance optimization of ASICs is to use an accurate PVT monitoring subsystem,” adds Ramsay Allen, vice president of marketing at Moortec. “Such in-chip sensing solutions support the semiconductor design community’s demands for increased device reliability and enhanced performance optimization, enabling schemes such as DVFS, AVS and power management control systems.”

But all of this needs to be considered in context. Looking at power without considering thermal can lead to problems. “Selection of IPs with minimum power can have a great impact on the thermal management,” says Farzad Zarrifar, managing director of IP for Mentor, a Siemens Business. “Exceeding thermal limitation can have adverse impact on the cost of packaging and cooling. In some cases, excess power consumption will prevent the productization of the design. Minimization of power will result in key system benefits, such as longer battery life for portable applications and lower packaging cost, which is paramount in high-volume applications such as IoT and portable VR.”

Top-down process and IP selection
An awkward gap exists between the IP supplier and IP user. “For mostly digital IP, the knowledge an IP block developer has about the sensible ways to save power in the IP block is difficult to translate into guidance for the chip integrator,” observes Drew Wingard, CTO at Sonics. “The result is that the integrator feels like they don’t have enough information to be aggressive about using power control with most digital IP. They tend to waste power, or they use very crude, coarse power control strategies where they only control power when they know the block is completely unused. This leaves significant power savings opportunities on the table.”

Each IP block has to be stitched into the SoC. “Interconnect IP must be able to integrate and correctly respond to the SoC power management requests,” says ArterisIP’s Boutillier. “Every part of the IP is associated with the SoC-controlled power domain. When a request is made from the SoC power management, a complete power-down or power-up sequence of the IP is necessary to safely implement SoC power management requests.”

It would seem sensible to power down a block as soon as it has finished doing something useful. “What you don’t know without system and thermal context is how deep a power state you should enter when you shut down,” adds Wingard. “If you don’t know whether you’re going to be asked to do something else immediately, then you don’t know whether it is safe to go all the way to a power-gated state, where it may take some time to turn power back on. That system context is needed to make the choice between power gating and something that is much quicker, like doing coarse-grained clock gating, which saves less power but has much faster response time.”
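The tradeoff Wingard describes can be framed as a break-even calculation. The sketch below is illustrative only: the `IdleState` type, the `choose_state` function, and every power, energy, and latency figure are assumptions rather than data for a real block. It picks the idle state with the lowest total energy for an expected idle period, subject to how quickly the system needs the block back.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdleState:
    name: str
    idle_power_mw: float         # power drawn while sitting in this state
    transition_energy_uj: float  # energy cost to enter and leave the state
    wake_latency_us: float       # time needed before the block is usable


def idle_energy_uj(state, idle_us):
    """Total energy for one idle period; mW * us = nJ, so divide by 1000."""
    return state.transition_energy_uj + state.idle_power_mw * idle_us / 1000.0


def choose_state(states, expected_idle_us, max_wake_latency_us):
    """Lowest-energy state whose wake latency the system can tolerate."""
    allowed = [s for s in states if s.wake_latency_us <= max_wake_latency_us]
    if not allowed:
        raise ValueError("no idle state meets the wake latency limit")
    return min(allowed, key=lambda s: idle_energy_uj(s, expected_idle_us))


# Hypothetical characterization of one IP block.
STATES = [
    IdleState("clocks_running", idle_power_mw=10.0, transition_energy_uj=0.0, wake_latency_us=0.0),
    IdleState("clock_gated", idle_power_mw=2.0, transition_energy_uj=0.01, wake_latency_us=0.1),
    IdleState("power_gated", idle_power_mw=0.05, transition_energy_uj=5.0, wake_latency_us=50.0),
]

short_idle = choose_state(STATES, expected_idle_us=100, max_wake_latency_us=100)
long_idle = choose_state(STATES, expected_idle_us=10_000, max_wake_latency_us=100)
print(short_idle.name, long_idle.name)
```

With these assumed figures, a 100 µs gap favors clock gating because the cost of entering and leaving the power-gated state outweighs the leakage saved, while a 10 ms gap favors power gating. The inputs that decide the outcome, expected idle time and tolerable wake latency, are exactly the system context that the block on its own does not have.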

When standards are involved
Many choices become limited when selecting IP that performs a standard function. “The most common semiconductor IP cores are DDR/LPDDR PHYs, USB PHYs, PCIe PHYs, and Ethernet PHYs,” says Cadence’s Wong. “These conform to standard industry protocol and interface specifications such as DFI for memory interfaces, PIPE for PCI interfaces, among others. These IP subsystems are quite self-contained. In modern SoCs, there is usually a well-defined chip architecture and a NoC/fabric, so integrating the rest of the IP blocks and other chip functionality to the NoC is well-defined. This is why using third-party IP is now the preferred way to design complex SoCs at advanced process nodes.”

When standards are involved, there may be less opportunity to change the algorithms and architecture, too. “The clock rate may be largely dictated by the interface standard,” says Rowen. “In this case, circuits and low-level architecture decisions may dominate. For example, cleanly identifying which subsets of the IP block can be safely turned off, or run at lower rate, can reduce the overall switching activity significantly. All things being equal, the smaller designs are usually lower-power designs, since power usually scales with capacitance. Capacitance per area is usually fairly consistent in a given process node, so the KISS principle (Keep It Simple, Stupid) isn’t a bad guide.”

Finding a balance
Systems and IP are a balancing act. “eFPGA can be optimized for power first or for performance first; it depends on what customers want in a given node,” says Flex Logix’s Tate. “Generally, on the more advanced nodes the majority of customers want speed, speed, speed. On nodes like 40nm, the customers are much more power conscious and choose features like power-gating, back body biasing and low-voltage state retention over raw performance. But even these customers want the best performance possible within the power constraints imposed.”

Imagination’s Harold seconds that sentiment. “We have to strike a balance. We sell IP, so a perfect knowledge of every end system cannot be achieved. But we work closely with licensee, foundry, and tool partners, so when we design we try to consider a wide range of likely contexts. Our work with tool and foundry partners has been especially important in light of the thermal characteristics of more advanced nodes and our commitment to helping customers get to market quickly and successfully. In today’s SoC designs, complexity and frequencies have steadily increased, while power budgets remain unchanged. Our customers need validated solutions that provide the flexibility to optimize for power, performance or area (PPA) in their processor implementations.”

There is always a temptation to “improve” an IP block. “This is a simple one,” says Wong. “Please DO NOT touch anything inside the IP. When an IP provider delivers an IP core to an end customer, the IP has gone through rigorous testing, QA and silicon validation, and in some cases, interoperability testing between the controller and PHY. This is a way to ensure ‘correct by construction’ and a means to guarantee the quality and performance of the IP and compliance to the standards protocols.”

Other IP vendors offer similar advice. “It is not a good practice to make any change inside IP blocks,” says Zarrifar. “IP users will lose their IP warranty in the majority of cases and may cause unexpected problems. The best choice is to request IP designers to offer selected changes.”

IP providers have to walk a fine line here. “There are opposing pressures at work on IP providers,” admits Rowen. “On one hand, to keep prices reasonable, the IP provider would like to have a finite range of standard products, available off-the-shelf. This may not lead to ‘one-size-fits-all’ but it does encourage ‘a-few-sizes-fit-many’, so that all the expensive design, verification, documentation and support investment can be concentrated on making the highest quality designs for the largest number of designs. On the other hand, IP teams are effectively competing with in-house teams that may be delighted to build a block tuned to the exact requirements. In principle, focusing on just the required features – and optimizing those – can give smaller, faster, lower-power designs. But all that customization can be expensive, and the design may be fragile, in the sense that even small changes in requirements can send the in-house team back to the start of the process.”

However, there is overhead in all power savings techniques. “Plus, the more techniques you incorporate the more you have to design and verify,” adds Wingard. “It’s relatively easier if the design is transparent to you. If the design is a black box, then it is very difficult, and your options are coarse choices like retention voltage. A similar problem is in trying to determine when a block is idle. If the IP block doesn’t make that kind of status information readily available, then designers must rely on heuristic-based approaches.”

“You can also control power consumption at a much lower granularity using global clock gating,” adds Boutillier. “At this level, you don’t know what the application is doing, but you know which element of your IP design is effectively used and which is unused. By having this knowledge, you can automatically stop the clock for an entire block once this block is unused even for a couple of cycles. This technique will drastically lower the power consumption of the clock tree for the entire element inside the IP and is transparent to the customer. This technique can also be exposed to the customer if he is using standard components like Q and P channels from ARM.”
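The effect of stopping a block’s clock after a couple of idle cycles can be approximated with a simple count. This is a rough behavioral sketch, not RTL and not a description of any vendor’s implementation: the `clocked_cycles` function and the activity trace are made up, and the model assumes the clock restarts with no wake-up penalty.

```python
def clocked_cycles(activity, idle_threshold=2):
    """Count cycles in which the block's clock toggles.

    The clock runs whenever the block is active, and keeps running for
    up to `idle_threshold` consecutive idle cycles before it is gated.
    """
    idle_run = 0
    clocked = 0
    for active in activity:
        if active:
            idle_run = 0
            clocked += 1
        else:
            idle_run += 1
            if idle_run <= idle_threshold:
                clocked += 1
    return clocked


# Hypothetical trace: 4 busy cycles, 10 idle cycles, 2 busy cycles.
trace = [True] * 4 + [False] * 10 + [True] * 2

gated = clocked_cycles(trace, idle_threshold=2)
print(f"clock toggles in {gated} of {len(trace)} cycles")
```

For this trace the clock tree toggles in 8 of 16 cycles instead of all 16. Because the decision uses only the block’s own activity, it needs no knowledge of what the application is doing, which is why it can remain transparent to the customer.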

Improving the process
Configurability of IP is an important way to defuse the tension between reuse and perfect fit. “In many cases, IP can be designed so that the major dimensions of usage variation are anticipated in the initial creation,” explains Rowen. “IP creation maps out the range of functions, number of execution units, width of data paths, target processing technologies, and high-speed vs. low-power logic and circuit options – and systematically tests and packages the whole meta-IP. It is more work up front, but it can deliver the best of both worlds. The configurability space may not cover 100% of the system team’s ideal solution, but it can come close. And the speed of configurability makes more time available for both higher-level optimizations at the algorithm and architecture level, and in the back-end to further guarantee the final design quality.”

It all comes down to communications, be that in terms of documentation or in the information that is passed between IP and the system. “When using synthesis, optimization is possible whenever you instantiate a set of flip-flops in your IP design if you ensure there is information that defines when the flop is effectively used,” says Boutillier. “The synthesis tool then tries to combine those qualifier signals to assemble blocks of flops. Having a high-level description language instead of directly writing Verilog ensures your designers are always required to provide this enable information whenever they instantiate a flip-flop in the IP.”
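The grouping step Boutillier mentions can be pictured as follows. This is a loose sketch of the idea rather than how any particular synthesis tool works: the flop and enable names are invented, `plan_clock_gates` is a hypothetical helper, and the minimum group size stands in for whatever threshold a real tool applies before it considers a gating cell worthwhile.

```python
from collections import defaultdict


def plan_clock_gates(flop_enables, min_group_size=3):
    """Group flops by enable signal and decide which groups get a clock gate.

    `flop_enables` maps each flop name to its enable signal, or to None
    when the designer supplied no enable information.
    """
    groups = defaultdict(list)
    for flop, enable in flop_enables.items():
        groups[enable].append(flop)

    # A shared enable over enough flops justifies one clock-gating cell.
    gated = {
        enable: sorted(flops)
        for enable, flops in groups.items()
        if enable is not None and len(flops) >= min_group_size
    }
    # Everything else stays on the free-running clock.
    ungated = sorted(
        flop
        for enable, flops in groups.items()
        if enable not in gated
        for flop in flops
    )
    return gated, ungated


# Hypothetical register bank inside an IP block.
flops = {
    "data[0]": "rx_valid",
    "data[1]": "rx_valid",
    "data[2]": "rx_valid",
    "data[3]": "rx_valid",
    "status": "cfg_write",
    "counter": None,
}

gated, ungated = plan_clock_gates(flops)
print("gated groups:", gated)
print("left ungated:", ungated)
```

Here the four data flops share an enable and can sit behind one gate, while the flop with no enable information cannot be gated at all. That is the case a description language that forces designers to state an enable is meant to prevent.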

There are many ways that the IP vendor can help. “IP vendors can assist the ASIC designers by providing expertise on macro placement and production results, as well as support and guidance on how to implement such optimization and reliability schemes,” says Moortec’s Allen. “This helps designers understand more about architecting and implementing such schemes. It is vitally important to listen to the needs of our customers to constantly improve not only our IP, but the support we offer for implementation and the placement of the PVT monitors within the design.”

Standards could help, as well. “The challenge is we are lacking agreed-upon industry standards for providing guidance,” says Wingard. “Ideally, the IP provider would simply provide the constraint UPF view of the block, which describes the logical places to partition the design and the power savings states that make sense within that partition. It would be good to provide an activity model so that the integrator can figure out how much power savings might be available.”

How could the system be improved? “Every project ends with that question,” concludes Imagination’s Harold. “It requires constant iteration and improvement. That process has to be at the heart of the system.”

