Trimming Waste In Chips

How much extra circuitry is necessary is a matter of debate, but almost everyone agrees it can be reduced.

Extra circuitry costs money, reduces performance and increases power consumption. But how much can really be trimmed?

When people are asked that question, they either get defensive or see it as an opportunity to show the advantages of their architecture, design process or IP. The same holds true for IP suppliers. Others point out that the whole concept of waste is somewhat strange, because waste is a natural by-product of trying to get a good-enough product to market at a reasonable cost.

There are a number of reasons why chipmakers purposely utilize extra area:

Margining. Adding extra circuitry costs money, but it allows chipmakers to buffer timing issues or process variability.
Maturity. Using IP that has been silicon proven reduces risk, even when it may provide more capabilities than required.
Expandability. Leaving space and connectivity hooks to integrate additional features is critical in new or highly competitive sectors, where chips may be outdated even before they reach the market.
Flexibility. Applications processors are perhaps the least efficient means of performing a given function, but they provide safety, flexibility and expandability.
Evolving standards. Changes to standards are almost inevitable in new or evolving markets, such as automotive or communications. The ability to adhere to constantly updated standards without completely redoing a chip can save both time and money in the long run.

Still, almost everyone agrees that waste can be reduced. The question is how much and at what price.

Waste and economics
“Across the whole design there is rarely a case where someone can say, ‘I used all of the logic perfectly, used all the memory,'” says Steve Mensor, vice president of marketing at Achronix. “Consider an FPGA on a board. Companies probably use half of its functionality and the other half is either of no interest or they couldn’t find the right balance of resources to get to a higher utilization percentage.”

The concept of waste has to be defined in the right terms.

“As an IP supplier we are trying to maximize profit and our customers are trying to maximize profit,” says Geoffrey Tate, CEO of Flex Logix. “They would like to get the most optimized IP they possibly can, but minimizing waste is not the objective. It is to get as close to what they want while factoring in other important factors such as cost, availability and if the IP is proven in silicon.”

Ravi Thummarukudy, CEO of Mobiveil, seconds the value of proven IP. "In the physical IP space, the value of something working is more important than an optimized design. If someone has an IP that is working in the right technology process, then even if they have slightly different requirements, it is likely they will choose the one that is proven in silicon even though it may not be the most optimized for them."

Economics vary greatly from one design type to the next. “There is a big piece of the market that will accept waste in the interest of reducing risk,” says Dan Ganousis, a consultant for Codasip. “Those who build 10nm designs with nine-digit budgets and $15 million to $20 million mask sets cannot afford risk. The further you go into advanced IC design, the less risk is tolerated. This is in contrast to the emerging IoT market. For them, getting to market is more important than risk and they cannot release a product six months after their competitors. We also are seeing people who really care about low power and high security.”

IP selection and configuration
Selecting the right IP is part of the equation. "It takes a lot of time to properly select and qualify an IP," points out Ranjit Adhikary, vice president of marketing for ClioSoft. "A poor IP selection can cause problems later on in the flow. It must be easy to compare various IPs and their configurations to see nuances such as the foundry, who uses the IP, which chips using it are silicon-proven, the number of open issues, etc. Creating a number of IPs with different configurations for different applications mandates a mechanism to categorize them."

For the IP provider, configurability is essential. “Configurability is key to making sure the customer can create optimal designs,” says Warren Savage, general manager of the IP Division of Silvaco. “However, configurability is something that adds a lot of cost to the development process because all corners have to be verified. Furthermore, we have to take great pains that the customer knows how to configure the IP properly and, for example, make sure any parameter dependencies are checked.”

It is not uncommon for configurability to become a challenge. “We provide a utility for our DDR controller that is an exploration tool and allows the customer to explore different implementations based upon their traffic patterns, address mapping, and based on their area/power/latency requirements,” says Navraj Nandra, senior director of marketing for Synopsys’ DesignWare Analog & MSIP Solutions Group. “There are about 20 parameters they can enter into the tool and it will provide an optimized piece of RTL for the controller. It would be difficult for the customer to have a tailored piece of IP without this kind of utility.”
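The kind of exploration such a utility performs can be sketched in a few lines. This is a hypothetical illustration, not Synopsys' actual tool: the parameter names, cost factors and weights below are all invented, and a real configurator would derive its costs from characterized RTL rather than a lookup table. The idea is simply to enumerate candidate configurations and rank them against area/power/latency targets:

```python
from itertools import product

# Hypothetical (area, power, latency) multipliers per option value.
# A real tool would derive these from characterized RTL.
OPTIONS = {
    "ports":         {1: (1.0, 1.0, 1.0), 2: (1.6, 1.5, 0.8), 4: (2.8, 2.4, 0.6)},
    "reorder_depth": {0: (1.0, 1.0, 1.0), 8: (1.3, 1.2, 0.7), 16: (1.7, 1.5, 0.6)},
    "ecc":           {False: (1.0, 1.0, 1.0), True: (1.2, 1.1, 1.05)},
}

def explore(weights=(1.0, 1.0, 1.0)):
    """Enumerate every configuration and return them ranked by weighted cost.

    Each option contributes multiplicative (area, power, latency) factors;
    the score is a weighted sum of the three products, lowest first."""
    wa, wp, wl = weights
    results = []
    for combo in product(*(opts.items() for opts in OPTIONS.values())):
        area = power = latency = 1.0
        cfg = {}
        for name, (value, (fa, fp, fl)) in zip(OPTIONS, combo):
            cfg[name] = value
            area *= fa
            power *= fp
            latency *= fl
        score = wa * area + wp * power + wl * latency
        results.append((score, cfg, (area, power, latency)))
    return sorted(results, key=lambda r: r[0])

# A latency-sensitive customer weights latency twice as heavily as area.
best_score, best_cfg, _ = explore(weights=(1.0, 0.5, 2.0))[0]
```

With latency weighted heavily, the ranking trades extra reorder-buffer area for lower latency; with even weights, the minimal baseline wins. The point is that the customer states requirements, and the utility searches the configuration space on their behalf.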

Selecting those parameters can get to be very complex. “Customers will often have a throughput requirement, an area requirement, a power requirement and may have some specific things such as the number of lanes,” says Mobiveil’s Thummarukudy. “But we are the experts in each of the protocols and so we configure the IP for customer requirements. So wastage in the controller space is less than in some other categories.”

In many cases, utilities for IP configuration may become as complex as the IP itself.

"We don't want to bloat the IP with too many configuration options, so we add configurations to tools in a bottom-up manner," adds Nandra. "It means that we are gaining system knowledge. In order to be able to do those configurations, you have to know what the parameters do. So the IP and tools are both becoming complex. That way, we can configure the IP or our customers can."

To make that effective requires careful design of the IP. "Minimizing waste in IP relies on the natural intelligence of the IP architect," says Graham Bell, vice president of marketing for Uniquify Inc. "The architect searches for an architecture that is scalable and extendable, and provides the performance needed. New innovative architectures become valuable property for an IP house."

But there are limits with configurability. “For NoC design, we recognized that we would not be able to use the parametrization capabilities of the existing HDLs,” says Drew Wingard, CTO at Sonics. “We began to annotate RTL using other programming languages to manage the flexibility and configurability so that we could selectively enable features and avoid waste.”

Others look to compilers to create efficient IP. “Compilers can be used to generate optimum IP blocks,” says Farzad Zarrinfar, managing director of the IP division of Mentor, a Siemens Business. “For example, compilers can be used at the architectural level to enable customers to perform tradeoff analysis for speed, area, and power. Or, if features such as redundancy are not required, a compiler would eliminate it and offer size reduction.”

The bottom line is “configurable IP is never going to be as efficient, from a silicon cost point of view, as custom IP,” Savage concedes. “The benefit is that it is much cheaper, faster and safer to get your product to market with configurable IP.”

Synthesizable IP
A natural extension may be to migrate IP to a higher abstraction, but there are several problems associated with that. “High-level design has been thwarted for, among other reasons, the languages,” contends Codasip’s Ganousis. “SystemC is a great verification language, but shoehorning it into becoming an implementation language has proven difficult. One solution is to dumb down the language until you can synthesize it, but in doing that you lose a lot of the power and capability of the language. In addition, the art of how to write a very terse model is fleeting for most RTL people.”

In other areas, technology is lacking. "Synthesis is trying to maximize some objective function," explains Wingard. "The problem is that the objective function for the performance of an SoC is not one that we have an algebra to describe. So, I could have a synthesis engine that could optimize around a set of latency constraints, and maybe even some throughput constraints, but those constraints do not contemplate the actual behavior of the memory controller, because the actual throughput of a DRAM controller depends on the address patterns, the burst length and the time-domain behavior of the components that are interacting with it. We are not aware of any synthesis algorithms that can handle the most fundamental challenge associated with meeting the performance requirements of the chip."

Hard IP
Hard IP is often related to industry standard interfaces. “Tier 1 IP suppliers are all pretty expert at this by now,” says Tom Wong, design engineering director, Design IP Group at Cadence. “I don’t see any one vendor deploying the same IP in the same foundry process node that is substantially better compared to the competition. What is important in hard IP is design margin, quality, maturity and proven silicon.”

There are ways to differentiate at this level. Synopsys’ Nandra explains why form factor can be a differentiator: “Large application processors are trying to put a lot of interface IP on the edge of the chip and they are I/O limited. They are running out of pins. Going down in feature size does not help because you are not adding pins. These customers want the PHYs to be tall and skinny so that the IP has an aspect ratio that does not dominate the beachfront. In the datacenter market, they are all about performance and use sophisticated bump plans on the top level of their chip. They want the IP to be wide and short so that the signal and ground pins match their redistribution layer at the top layer of metal.”

Cadence’s Wong adds another way to differentiate. “In some cases, overhead may be inserted with a combo DDR/LPDDR PHY where a single PHY can support DDR3/3L, DDR4, and LPDDR3/4 interfaces. The benefit is that you have backward and forward compatibility in the SoC that may exist in the marketplace for five or more years. You can interface to different types of memory based on when the price crossover occurs as one memory type becomes obsolete and a new memory is actually cheaper. Having a combo memory PHY in the SoC allows you to extend the lifecycle of the chip.”

In some cases, a single die may be the core for multiple products. “A design may have additional SerDes for multiple packaging alternatives or different price points,” adds Mobiveil’s Thummarukudy. “This again is an economic decision.”

The PHY can also be integrated with the controller for additional saving. “When you integrate the two, it removes all of the wastage between the PHY and the controller in terms of interoperability requirements,” adds Nandra. “The complete solution reduces gate count and we have seen examples where this can provide 20% lower latency and lower area.”

Adding robustness
Pushing everything to the limits is not always the best approach. “Many aspects of design go by rules of thumb,” points out Thummarukudy. “They may have asked for 30% slack so they can provide a timing buffer. This could take care of challenges during physical design or to handle process variability. Is this wastage? I think it is more like insurance for problems during physical design or a process related issue.”

The danger is adding too much. "I could specify my system with enough buffering between the processor and memory such that even if the memory is maximally loaded and I have the maximum latency on the transactions flowing between the processor and memory, I can cover however many transactions the processor can ever issue," explains Wingard. "If my average memory latency is short enough that fewer transactions would have been enough, then I have overdesigned. When do people decide that this is merited? If you designed the chip for the average case, that may not work. When you get contention, things will slow down and you will fall below your required resource for some amount of time, and if you haven't built in some excess capacity, you may never catch up. So you have to do some amount of overdesign, and the skill in the performance architecture is recognizing how much is appropriate."
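The tradeoff Wingard describes can be put in rough numbers using the bandwidth-delay product: the buffer must cover the transactions in flight at the latency you design for, and overdesign is the gap between worst-case and average-case sizing. The figures below are invented for illustration:

```python
import math

def buffer_entries(bw_bytes_per_ns, latency_ns, bytes_per_txn, margin=1.0):
    """Bandwidth-delay product sizing: the number of transactions that must
    be outstanding to sustain the bandwidth at the given latency, scaled
    by an optional overdesign margin."""
    in_flight = bw_bytes_per_ns * latency_ns / bytes_per_txn
    return math.ceil(in_flight * margin)

# Invented numbers: 12.8 bytes/ns (~12.8 GB/s) of demand, 64-byte transactions.
avg_case = buffer_entries(12.8, 100, 64)    # 100 ns average latency -> 20 entries
worst_case = buffer_entries(12.8, 350, 64)  # 350 ns loaded latency  -> 70 entries
```

Sizing for the fully loaded 350 ns case more than triples the buffer relative to the average case. Where between those two numbers to land is exactly the judgment call the performance architect has to make.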

Rightsizing is important and defining realistic scenarios or use-cases is one way to ensure that important performance requirements can be met. “The emerging Portable Stimulus Standard will provide system architects with a valuable tool to be able to define important scenarios,” says Adnan Hamid, CEO of Breker Verification Systems. “These use-cases can then be used as the starting point for the verification team and to verify the implementation meets the specification.”

Architectural waste
In addition to rightsizing the interconnect fabric, the architect also has to provide the right amount of processing power and this is becoming a more difficult task. “CPUs are largely maxed out, although they still continue to move up the curve with Moore’s Law—but much slower than before,” points out Achronix’s Mensor. “Single-core performance has basically been capped out and the number of cores that can be added without declining improvement is finite. So there is a ceiling.”

Some are looking toward better processors. “Instruction set architecture (ISA) affects area and power consumption in processor-based designs,” says Uniquify’s Bell. “The new RISC-V ISA allows for customization to provide just the instructions needed in a design. This translates to lower power and smaller area by eliminating redundant instructions and silicon.”

Ganousis adds that “to eliminate waste you have to get rid of the transistors that do nothing. They leak. You have to realize that no EDA company or foundry wants to eliminate waste. If you advocate for the customer, you come up with a different answer than will be provided by the EDA and foundries.”

All processors require memory, and this has to be sized, as well. "People write their C code and then they know they will upgrade the code in the future," adds Flex Logix's Tate. "How much extra memory should they put in? It is a risk-and-reward situation, and spending the extra silicon provides more flexibility but costs in the short run."

Some even see the CPU as being too inefficient and wasteful. "There is plenty of evidence that FPGAs are more power efficient than a CPU cluster implementation," says Mensor. "CPUs are high-powered solutions, and while very flexible they can consume an order of magnitude more power than using an FPGA to do the same function. The challenge with an FPGA is that when programmed it is intended to do a specific function, whereas a CPU is intended to do any function. A CPU's intent is to be programmed, whereas an FPGA's intent is to run the function after it is programmed. So there are things like partial reconfiguration and on-the-fly configuration to make the changing of the function faster and flexible, but this is still a minority case."

FPGAs themselves provide some interesting tradeoffs. “We implement the FPGA using standard cells so that we can implement the design quicker and can cover multiple process nodes,” Mensor says. “However, we do optimize a couple of the standard cells, such as the multiplexer. We use them a lot in the switch matrix and when the FPGA is programmed the routing changes the multiplexers. They do not have to switch fast from one input to another because they don’t do that very often. That allows for a very optimized cell.”

It is not possible to fully utilize an FPGA. “In any FPGA you will put in overhead,” he adds. “While a design may change very little, at some point you will probably get closer to 100% utilization. Then you will have difficulty doing place and route. You must have some overhead to continue to have flexibility. Even the smallest change could mean that there is insufficient flexibility to make that change.”

Tate says the current thinking is that teams should put in an extra one-third capacity, and over time they will gain experience and learn if that is a good number or not.
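As a back-of-the-envelope version of that rule of thumb (a hedged sketch: the one-third figure comes from Tate's comment above, but the LUT counts are invented), sizing the part means leaving spare fabric on top of the current design:

```python
import math

def fpga_size_needed(design_luts, headroom=1/3):
    """Pick a fabric size that leaves `headroom` spare capacity on top of
    the current design, so later changes can still place and route."""
    return math.ceil(design_luts * (1 + headroom))

# Invented example: a 60k-LUT design with the one-third rule needs an 80k-LUT part.
needed = fpga_size_needed(60_000)
```

Whether one-third is the right headroom is exactly what Tate says teams will learn from experience; the formula just makes the margin explicit instead of implicit.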

Conclusion
All designs contain waste, and the rapid progression of Moore's Law has almost encouraged it. Getting designs to market has been more important than optimizing them. But those staying in older technologies are beginning to face different challenges regarding waste. Rightsizing the design and eliminating excessive margins becomes a way to reduce silicon area, power consumption and fabrication cost, even though it will require a higher development cost. The IoT is challenging many "rules of thumb," and that may trickle all the way through the design chain.
