Maximizing Value Post-Moore’s Law

The value of a semiconductor can be difficult to measure because it involves costs and benefits that accrue over time. And as market segments face different pressures, strategies for maximizing value are heading in several directions.


When Moore’s Law was in full swing, almost every market segment considered moving to the next available node as a primary way to maximize value. But today, each major market segment is looking at different strategies that are more closely aligned with its individual needs. This diversity will create both pain and opportunity in the supply chain.

Chip developers must do more with a comparable number of gates. Foundries are creating more specialized processes that focus on a variety of optimization criteria. IP developers struggle to support them all while delivering on the needs of vertical markets. System integrators care about chip counts on the board. And the people who deploy those boards care about power and cooling costs at one end of the spectrum, or how long a battery will last at the other. Some industries do not care about chip lifetimes because their products are considered disposable, while others are looking to extend chip lifetimes to 15 or 20 years in products meant to last that long.

“Assessing value is really hard because it is over the lifetime,” says Kurt Shuler, vice president of marketing at Arteris IP. “A lot of chips are disposable. Consider your cell phone. You don’t really care if it’s working 10 years from now. For the data center guys and the AI chips, it’s the same thing. Certain industries do want that chip to last for 15 or 20 years, and that’s automotive, industrial — those kinds of things where there’s a huge capital cost component to that piece of equipment and people are not going to be throwing it away.”

Many industries are redefining business models or dealing with changing geo-political situations. “In upcoming markets, like electro-mobility, new usage scenarios like 24/7 operation lead to rising relevance of long-term reliability topics,” says Roland Jancke, head of department for design methodology at Fraunhofer IIS’ Engineering of Adaptive Systems Division. “Recent developments in the worldwide trade arena are raising the importance of stable supply chains and trustworthiness of the parts and components. Therefore, new topics and priorities need to be taken into account in system planning and architectural decisions.”

Data centers
The data center has a single overriding cost component. “About 30% of their costs are spent on cooling,” says Arteris’ Shuler. “That’s why a lot of them are next to hydroelectric dams. Traditionally, there was always a newer chip and a newer machine that would be lower-power and faster-performing. But as Moore’s Law is slowing down, this may not be true in the future.”

The pressure is then on the suppliers. “Should a design team do lots of optimization on a design to reduce power consumption even if it will increase development cost?” asks Tom Wong, director of marketing for design IP at Cadence. “Will the end user end up saving enough on their power and cooling bill to offset the higher product cost? From a total cost of ownership perspective for a data center, operating costs trump everything else, including the delta in design costs to minimize product power.”
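That total-cost-of-ownership arithmetic can be sketched in a few lines. The figures below — unit prices, wattages, fleet size, electricity rate, and cooling overhead — are hypothetical assumptions chosen purely for illustration, not numbers from any vendor.

```python
# Minimal sketch of the data-center TCO argument: a higher-priced but
# lower-power part versus a cheaper baseline. All figures are assumptions.

def tco(unit_price, watts, units, years, usd_per_kwh=0.10, cooling_overhead=0.4):
    """Purchase cost plus energy cost, with cooling modeled as an overhead."""
    hours = years * 365 * 24
    energy_kwh = (watts / 1000.0) * hours * units * (1 + cooling_overhead)
    return unit_price * units + energy_kwh * usd_per_kwh

# Baseline part versus a version where extra low-power design effort raises
# the unit price by 10% but cuts power draw by 20%.
baseline = tco(unit_price=500, watts=300, units=10_000, years=5)
optimized = tco(unit_price=550, watts=240, units=10_000, years=5)

print(f"baseline TCO : ${baseline:,.0f}")
print(f"optimized TCO: ${optimized:,.0f}")
print(f"savings      : ${baseline - optimized:,.0f}")
```

Even with the higher unit price, the lower-power part wins handily over a multi-year deployment, which is the operating-cost argument in a nutshell.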

This focus creates opportunities. “Data centers are looking to optimize performance/watt,” says Dhaval Parikh, director of segment marketing, Infrastructure Line of Business at Arm. “As the demand for scalable cloud-native applications grows, hyperscale data center operators can greatly benefit by deploying servers that maximize their revenue per deployed server or reduce operating cost when used to deploy an internal service. The AWS Graviton2 CPU is a good example of this, offering up to 40% more application performance at a 20% lower price. We also see a clear trend toward enterprise applications benefiting from acceleration to greatly improve workload performance.”

In some cases, this is creating a new breed of chip developers. “I’ve been talking to some companies that are looking at Arm servers, or hardware accelerators for AI attached to x86,” says Shuler. “In the big scheme of things, they want to find more efficient ways to do it because it does burn a lot of power. For AI processing, whether it’s training or inference, the power issue is huge. That is why we see them investing money and creating their own chip teams.”

Getting there calls for some new paradigms. “System advances in accelerated computing platforms such as CPUs, GPUs and FPGAs, heterogeneous systems on chip (SoCs) for AI acceleration, and high-speed networking/interconnects have all pushed chip integration to unprecedented levels,” says Wong. “Facing the true limits of Moore’s Law, the semiconductor industry is coming up against barriers to continued performance improvement. We have reached practical limits that we haven’t encountered previously, so CPUs, GPUs and FPGAs have all started the march to a disaggregated multi-chip solution. Chiplets offer a compelling value proposition because of a number of factors, including yield improvement, physical implementation reuse, being able to pick the best node and process for each part, and shortened design cycles.”
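The yield-improvement piece of that value proposition can be illustrated with the textbook Poisson die-yield model. The sketch below assumes chiplets are tested before assembly (known-good die), and the defect density and die areas are invented for the example.

```python
import math

# Back-of-the-envelope yield argument for chiplets.
# Poisson die-yield model: Y = exp(-A * D), area A in cm^2, D in defects/cm^2.

def die_yield(area_mm2, defects_per_cm2):
    return math.exp(-(area_mm2 / 100.0) * defects_per_cm2)

D = 0.2   # assumed defect density, defects per cm^2

y_mono    = die_yield(800, D)   # one large monolithic die
y_chiplet = die_yield(200, D)   # one of four smaller chiplets

# Silicon area that must be fabricated per good die-equivalent. Because
# chiplets are tested before assembly, waste scales with per-chiplet yield
# rather than with the yield of the whole assembled system.
mono_cost    = 800 / y_mono
chiplet_cost = 4 * (200 / y_chiplet)

print(f"monolithic die yield : {y_mono:.1%}")      # ~20%
print(f"single chiplet yield : {y_chiplet:.1%}")   # ~67%
print(f"relative silicon cost, monolithic vs chiplets: "
      f"{mono_cost / chiplet_cost:.1f}x")
```

Under these assumed numbers, the monolithic part burns roughly three times as much silicon per good die-equivalent, before counting the other chiplet benefits Wong lists.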

Pressures in automotive
Some markets demand extreme levels of flexibility, often because systems will be deployed for long periods of time. The ability to extend or adapt those systems over time increases value.

“The automotive industry is going through a period of significant change,” says David Fritz, senior director for autonomous and ADAS SoCs at Mentor, a Siemens Business. “In the past they would have hundreds of ICs and PCBs, but today they are putting all that functionality together into just a few large SoCs, all targeting a bleeding-edge node. They have to consider power, performance, area, and thermal, and they all have to be modeled before you even make a decision about which CPU should be used, how many CPUs should you use, how big should the cache be, do I really need a GPU, should I put an NPU in there, what is the right combination of these large IP blocks.”

This flexibility is also driving new business models. “You will have the ability to download applications over the air,” adds Fritz. “Those updates can change the fundamental behavior of the vehicle.”

But along with this is a thornier side where safety, security and reliability come together. “There’s one issue that hasn’t gotten a lot of attention yet — reliability,” he says. “Assuming failures are going to happen, how do you safely recover from those failures without having physical redundancy of every subsystem in the entire vehicle, which would cause the cost to skyrocket? There’s a concept called dynamic redundancy. This basically means that in the event of a failure you take over resources that were being used for a low-priority task and utilize them as a replacement for the failed capability.”
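A rough sketch of that idea looks like the following. The task names, priority scheme, and data structures are hypothetical and deliberately simplified; a real implementation would sit inside a safety-certified scheduler.

```python
# Sketch of dynamic redundancy: when a unit running a safety-critical
# function fails, preempt the unit doing the lowest-priority work and
# migrate the critical workload onto it. All names are hypothetical.

from dataclasses import dataclass

@dataclass
class Task:
    name: str
    priority: int          # lower number = more critical

@dataclass
class ComputeUnit:
    name: str
    healthy: bool = True
    task: Task | None = None

def recover(units: list[ComputeUnit], failed: ComputeUnit) -> None:
    """Drop the lowest-priority task in the system and reuse its unit."""
    critical = failed.task
    failed.task, failed.healthy = None, False
    candidates = [u for u in units if u.healthy]
    victim = max(candidates, key=lambda u: u.task.priority if u.task else -1)
    print(f"dropping '{victim.task.name}' on {victim.name}, "
          f"migrating '{critical.name}' there")
    victim.task = critical

units = [
    ComputeUnit("soc0", task=Task("emergency braking", priority=0)),
    ComputeUnit("soc1", task=Task("lane keeping", priority=1)),
    ComputeUnit("soc2", task=Task("infotainment", priority=9)),
]
recover(units, failed=units[0])   # soc0 fails; infotainment gives up soc2
```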

Reliability is more than just chip aging. “Consider the utilization of reset, to either reconfigure or re-power your device, in the middle of functional operation of the rest of the chip,” says Prakash Narain, CEO of Real Intent. “This can introduce metastability, which can cause your device to fail in a very subtle manner. Clock domain crossing (CDC) is a very pervasive failure mode in the sense that if you do not take care of it, then your device will have a small mean time between failures (MTBF). The reset domain crossing (RDC) failure mode has been present in designs for a long time, but it is not as pervasive, meaning the MTBF for RDC failures is much larger than for CDC. In the past, people have ignored this and accepted the need to reboot your device approximately every three weeks. The root cause could be RDC, but it is very difficult to correlate these failures to RDC failures.”
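The MTBF side of that argument follows from the standard synchronizer reliability formula, MTBF = e^(t_r/τ) / (T_w · f_clk · f_data). The parameter values in the sketch below are assumed purely to show the orders of magnitude involved, not taken from any particular process.

```python
import math

# Textbook estimate of mean time between metastability failures at a
# clock-domain crossing: MTBF = exp(t_r / tau) / (T_w * f_clk * f_data).
# t_r is the time available for the flop to resolve, tau and T_w are flop
# characteristics, f_clk and f_data are clock and data-toggle rates.
# All values are assumptions for illustration.

def cdc_mtbf_seconds(t_r, tau, t_w, f_clk, f_data):
    return math.exp(t_r / tau) / (t_w * f_clk * f_data)

common = dict(tau=20e-12, t_w=50e-12, f_clk=1e9, f_data=100e6)

one_stage = cdc_mtbf_seconds(t_r=0.2e-9, **common)   # little resolution time
two_stage = cdc_mtbf_seconds(t_r=1.2e-9, **common)   # an extra clock period

print(f"poorly synchronized crossing: MTBF ~ {one_stage:.2g} seconds")
print(f"properly synchronized       : MTBF ~ {two_stage / 3.15e7:.2g} years")
```

The point is the exponential sensitivity: a structural mistake in how a crossing is handled turns an MTBF measured in geological time into one measured in milliseconds, which is why these failures surface as mysterious field reboots.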

Safety is being addressed, as well. “Safety is complicated because it may take a while before a company gets burned with the consequences of a poor functional safety flow,” says Sergio Marchese, technical marketing manager for OneSpin Solutions. “Take the Toyota unintended acceleration case as an example. The feedback process was very long and hugely expensive, much worse than a re-spin. At least there are standards like ISO 26262 that provide a common, state-of-the-art set of requirements for all the players in the supply chain.”

But security remains a high risk. “There are no established standards and processes,” continues Marchese. “There are no metrics to clearly assess how much more security you get per extra dollar spent. And while we have seen the dire consequences of security attacks on IT systems and IoT devices, cyber-physical systems are an entirely new game. The stakes are much higher. An attack could cause loss of human lives and damage to physical properties.”

Flexibility
Automotive is not the only market seeing increasing demands for flexibility. “The IoT is characterized by having 20 different versions of basically the same little chips,” says Shuler. “From a software standpoint, they’re using the same software built for a whole bunch of different products. That saves money. They try to re-use as much of that software as possible because that’s a major part of the cost. Everybody who’s been successful in these extremely low power IoT type chips has had a platform strategy where they’re creating one central platform and they’re creating 10 or 20 different products from it.”

Flexibility can be directly built into the chip itself. “The real use case for an eFPGA is when things like interfaces are evolving in ways that people cannot foresee,” says Yoan Dupret, managing director and vice president of business development for Menta. “It could be replacing an I2C or an SPI interface by another version, or adapting to a new sensor, which may require a new digital interface. They can save a lot in terms of time to market, and it enables them to use a single chip for a variety of products.”

Often, more functionality has to be packed into the chip. “What we are finding is that designs are becoming more multimode,” says Real Intent’s Narain. “They’re looking to extract more out of their silicon and their architecture. The designs are becoming more configurable and more power managed. You have to architecturally make the design a little more configurable so that it can operate in one mode or the other without having to actually explicitly replicate these modes. So there is increasing optimization and architectural work going on. This makes verification a larger problem.”

It also creates optimization challenges. “Pretty much every SoC today has multiple modes,” says Mo Faisal, president and CEO of Movellus. “They are all individually separate modes — sleep mode, standby mode, etc. Customers want to individually optimize each mode of the SoC, reduce power, and push the performance up across different modes. A lot of times a device will be in standby mode, and you don’t want to be using mission-mode high-performance IP. Multi-mode is an opportunity and also a challenge.”
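As a sketch of what individually optimizing each mode means in practice, consider a per-mode operating-point table. The mode names, voltages, and frequencies below are hypothetical; the point is that each mode carries its own target rather than inheriting mission-mode settings everywhere.

```python
# Illustrative per-mode operating points for a multi-mode SoC.
# All values are invented for the example.

OPERATING_POINTS = {
    "mission": {"vdd_v": 0.90, "clk_mhz": 2000, "high_perf_ip": True},
    "standby": {"vdd_v": 0.65, "clk_mhz": 100,  "high_perf_ip": False},
    "sleep":   {"vdd_v": 0.55, "clk_mhz": 0,    "high_perf_ip": False},
}

def enter_mode(mode: str) -> dict:
    op = OPERATING_POINTS[mode]
    ip_state = "on" if op["high_perf_ip"] else "gated off"
    # In silicon this step would program PLLs, regulators and power gates;
    # here it just reports the chosen operating point.
    print(f"{mode:>7}: {op['vdd_v']} V, {op['clk_mhz']} MHz, "
          f"high-performance IP {ip_state}")
    return op

for m in ("mission", "standby", "sleep"):
    enter_mode(m)
```

Each entry in such a table is a separate optimization and verification target, which is why multi-mode designs are both an opportunity and a challenge.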

Cost sensitivity
Some markets are cost-sensitive. Buyers will choose the system that provides the necessary performance or capability at the lowest cost. “One really effective lever in maximizing value is just picking a cheaper foundry,” says Movellus’ Faisal. “That can cut your cost by as much as 50%. Some market segments do not have this luxury, so if you’re doing very high-performance processors, then you have to be at 7 or 5nm, and that leaves you with few options. Once you have picked the foundry, you have to pick the best process. This is putting pressure on the IP ecosystem because you have to have very efficient processes to make analog IP portable.”

Design teams also need to be careful about which costs are minimized. “Consider the inclusion of embedded FPGA,” says Menta’s Dupret. “When we are talking to chip makers, their first reaction is that eFPGA takes a whole lot of area. And if they increase area, then the die cost will increase and they will have to sell at the higher price. When talking to the customers of our customers, who are integrating the chips, they often see the value because they have risk insurance inside their boards. This may enable them to get rid of extra components, such as FPGAs outside the chip. While they may pay 10 cents or 20 cents, or even $1 more for those chips, it may save them $10 on the board.”

Another aspect of cost sensitivity is a reduction in chip development costs. “From an algorithmic point of view, the baseband processing for a low-power edge device is essentially the same as in the access point or base station, at least for whatever modes the edge device supports,” says Dave Pursley, product management director in the Digital & Signoff Group at Cadence. “Yet the power, performance, and area requirements of those two applications are completely different. It’s different enough that usually completely different RTL IP must be developed for the two applications. High-level synthesis (HLS) allows the same IP to be reused for both applications. For the base station you would likely push the tool to create the highest-performance implementation possible. On the other hand, for the edge device, power and area are your likely targets, as long as a minimum threshold of performance is met. The IP is also likely to last for several generations of products, since creating implementations for different technologies and clock speeds is as simple as a one-line change in a Tcl script.”

Significant cost reductions can come from attacking pain points in the flow. “As design complexity increases, we see violation report volumes rise from 10,000 to 1 million violations, depending on the design,” says Himanshu Bhatt, senior manager for applications engineering at Synopsys. “Manual debug of these violations is a challenge. It is time-consuming and requires expertise and domain knowledge. From an EDA tooling perspective, traditional static verification tools are challenged by the design complexity and struggle to group similar violations and identify the root cause. The use of machine learning (ML) techniques is ideal in this area. ML-enabled root cause analysis can cluster similar violations and identify the exact root cause, which significantly reduces debug time. Debugging becomes greatly simplified, as the user only needs to focus on debugging the violation clusters rather than debugging individual violations one by one.”
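As a toy illustration of the clustering idea — a production flow would use richer features and an actual ML model, and the violation messages here are invented — even grouping reports by a normalized signature shows how thousands of violations can collapse into a handful of root causes:

```python
import re
from collections import defaultdict

# Group static-verification violations that share a root cause, so the
# cause is debugged once rather than per violation. Messages are invented.

violations = [
    "CDC-101: unsynchronized crossing from clk_a to clk_b at u_core/reg_12",
    "CDC-101: unsynchronized crossing from clk_a to clk_b at u_core/reg_47",
    "CDC-101: unsynchronized crossing from clk_a to clk_b at u_core/reg_83",
    "RST-220: async reset deasserted without synchronizer at u_io/ctrl_3",
    "RST-220: async reset deasserted without synchronizer at u_io/ctrl_9",
]

def signature(msg: str) -> str:
    # Drop instance-specific suffixes so violations sharing a root cause
    # collapse onto the same key.
    return re.sub(r"(_\d+|/\w+_\d+)", "", msg)

clusters: dict[str, list[str]] = defaultdict(list)
for v in violations:
    clusters[signature(v)].append(v)

for sig, members in clusters.items():
    print(f"{len(members):3d} violations -> {sig}")
```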

Shortening iteration times is also an effective way to reduce cost. “There has been increased pressure on being able to do design faster, or in a more efficient manner,” says Narain. “That is accomplished by defining a methodology that simplifies the design, and a verification process that minimizes iterations. Static signoff is really all about enabling customers to ensure that their methodology is being followed. Consider multi-mode DFT analysis. The architectural considerations that go into defining the DFT scheme happen at the test architect’s level. Once those schemes are decided upon, you want to ensure that the process is being followed all through the design creation process so that when you enter the ATPG phase, you will not have to cycle back and fix your design just because something wasn’t defined correctly.”

Conclusion
The way in which value is defined is changing. Moore’s Law meant that new designs inherently delivered more value, while improved design and verification techniques kept costs in check. Today, it is becoming increasingly difficult to add value without increasing costs. But companies are stepping up and defining ways to increase the kinds of value that matter most to their industries. This creates challenges, but it also enables an era of additional creativity that will favor agile players who fully understand how value is perceived.


