Part 2: Heterogeneity and new architectures become the focus as scaling benefits shrink; IP availability may be problematic.
Pushing to the next process node will continue to be a primary driver for some chips—CPUs, FPGAs and some ASICs—but for many applications that approach is becoming less relevant as a metric for progress.
Behind this change is a transition from customized software running on generic hardware to a mix of specialized, heterogeneous hardware that can achieve better performance with less energy. Over the past decade, the trend has been to add more functionality into software because it is easier to fix and update. But that approach is slower, uses more power, and is less secure. And because there is no longer an automatic improvement in power and performance with each new process node, chipmakers are stepping back from trying to do everything in software.
This is evident in a variety of applications, but none more so than in the data center, where the need for performance has been synonymous with Moore’s Law. “Moore’s Law is slowing down,” said Kushagra Vaid, general manager and distinguished engineer for Microsoft’s Azure infrastructure. “CPU releases are slowing down. There is a von Neumann bottleneck with caches and multicore. This fundamental design is running out of steam. Performance per watt is being challenged, and the cost per transistor is increasing. In the cloud, there are diverged workloads, and they don’t run efficiently on a general-purpose CPU.”
Rather than relying just on hardware or just on software, the industry is shifting toward software-defined hardware. That shift has several major implications, and it already is changing design strategies within chip companies and systems companies.
“Many companies will determine their software needs, and then choose their processor,” said Bill Neifert, senior director of market development at ARM. “But what we’re seeing is that what they initially think they need is often different from what they actually need, so they end up choosing a different processor.”
The key metric used in making these decisions is performance, which is ironic given that one of ARM’s main differentiators is low power. But behind this is a concurrent shift to smaller, application-specific processors where low power is a given. “The people making these decisions are often not looking at the super high end of monolithic processors,” Neifert said. “They’re looking at chips like advanced microcontrollers. And then they go back and modify the software to take better advantage of the processor. The trend is toward smaller processors to tackle more specified tasks using more specified software. Software is more important, but there is not one processor that has to run any one of 100 possible loads. Instead, it may have to do three or four things well.”
That view is being echoed across the semiconductor industry. “What you’re starting to see are different architectures for different workloads,” said Anush Mohandass, vice president of marketing and business development at NetSpeed Systems. “There will be chips for image recognition, SQL, machine learning acceleration. There will be different chips for different workloads, and those workloads will be used to design chips.”
More markets, more options
Underlying these developments is a broad shift in the market for semiconductors. There is no single new platform that will drive sales of 1 billion or more chips based on a single SoC design. Apple and Samsung have split the high end of the smartphone market, with creeping erosion of their positions by companies such as Huawei, Oppo, Vivo and Xiaomi, among others.
Fig. 1: Market share of smartphone vendors. Source: Statista.
This does not point to a shrinking of demand for semiconductors. Far from it, in fact. The market for semiconductors is looking quite robust—particularly in terms of volume. But there is no single platform generating the same kinds of single and/or derivative design volumes as smartphones, and within an array of new markets the reasons for moving to the next process node are less obvious.
So rather than create a single roadmap for semiconductors, which is what the International Technology Roadmap for Semiconductors managed until last year, the IEEE is breaking that down into a number of specific market areas under the moniker of the International Roadmap for Devices and Systems (IRDS). Among the focus areas so far are such things as big data analytics, feature recognition, optimization for self-driving cars, and graphics for virtual and augmented reality.
“What we are creating is much more application-driven,” said Tom Conte, co-chair of IEEE’s Rebooting Computing Initiative and a professor of computer science and electrical and computer engineering at Georgia Tech. “All of the focus teams will create mini roadmaps.” He noted that parallel to this effort, Japan is developing its own application-driven semiconductor roadmap called the System & Device Roadmap of Japan.
This is especially important outside of mobility, which will continue to push for greater density.
“For some applications, like mobile and infrastructure, they have to drive the performance,” said Lip-Bu Tan, president and CEO of Cadence. “They are racing down from 10nm to 7nm, and they will move to 5nm. But the challenge is that a 2X improvement in performance, power and price—that scaling is slowing down. Cost is definitely going up. You’re not seeing the huge performance and power difference, so other companies may stay at 16nm because there is no compelling reason to move to 7nm. And then some will skip the node. It depends on when the product will come out, the development cycle, and the delta in performance, power and cost. It takes time to improve yield. Some companies are also moving into new packaging approaches and adding parallelism. It’s not just pure computing. There are multiple choices for how to achieve the same goals.”
IP constraints
Compounding the difficulty of moving to the next process node is the availability of IP. Developing IP at the most advanced process nodes is expensive, and returns are uncertain. For one thing, every foundry’s process is different enough below 40nm that it’s a burden for IP vendors to keep up with all of the different possible nodes and implementations at the same node. For another, even within a single foundry’s process, the development process at the most advanced nodes is so complex that IP vendors complain they are often starting well below version 1.0—sometimes version 0.1 or earlier.
“You need ultra-high-performance IP, whether that’s SerDes blocks or interfaces, and you need to figure out what is qualified on what process,” said Mike Gianfagna, vice president of marketing at eSilicon. “This is driving part of the decision for scaling. It has to be proven IP. In a perfect world, this would be a 1.0 PDK, but the reality is that you start developing these at 0.5 at best, and struggle to get to 1.0. So now you have companies skipping nodes because the manpower to do a finFET is astronomical. You need compute power, EDA licenses, storage, and if you do a chip for one finFET node there is not a lot of re-use at the next node because you have to optimize for power and signal integrity.”
That creates a nightmare on the IP management side, as well. “Finding IP is only one part of the problem,” said Ranjit Adhikary, vice president of marketing at ClioSoft. “Integrating it adds a whole different set of problems. You want to know what IP, what flows, what foundries, and often you have no idea what has been taped out.”
At 10nm or 7nm, IP has to be taped out before it can even be seriously considered, because not everything prints as expected. “You also need a comparison of different versions of the IP,” said Adhikary. “So you may have version 1.0 and version 1.1. You need to be able to make that comparison.”
Complexity and uncertainty
That carries over into the SoC world, as well, where there is a growing emphasis on bigger integrated blocks and subsystems rather than individual IP blocks.
“Modern SoC projects are built of different types of interconnected, scalable subsystems,” said Zibi Zalewski, general manager of Aldec’s Hardware Division. “Final configurations are created based on the target market, or the customer’s requirements, and the scalability of subsystems allows them to grow in size and complexity very quickly. So it is not a problem to scale from dual to quad core these days, for example. But it may be an issue to catch up with the proper tools. In addition, the hardware part of the project is no longer the dominating element. The software layer is adding significant complexity to the project. So it’s not just about transistor counts. It’s also the target function.”
There has always been a substantial amount of uncertainty at each new node, and most chipmakers working at these nodes accept that as one of the challenges of working at the leading edge. But two important things have changed. First, there are more factors in flux at each new node, and therefore more things that can go wrong. Second, the markets themselves are in transition, because most of the big opportunities in the future are in new areas, unlike in the past, when there was an evolutionary path to follow from the mainframe to the PC to the smartphone/tablet/phablet. That makes obsolescence a big problem, which is obvious today in cars that are several years old and don’t support texting or search using 4G phones.
Obsolescence is costly, which is one of the driving factors behind IEEE’s application-specific roadmaps. Software initially was one way to deal with that, and it will continue to play a role because it’s easier to modify software than hardware. But FPGAs are growing in popularity, as well, because as the name indicates, they can be programmed in the field.
This is particularly important because many of the future areas of growth for semiconductors are in markets that are rapidly changing, such as autonomous vehicles, medical and industrial electronics, and artificial intelligence. “There are a lot of protocols and interfaces that are either changing or not fully defined yet,” said Kent Orthner, system architect at Achronix. “This is clear with CCIX (Cache Coherent Interconnect for Accelerators), which adds cache coherency over PCIe. That simplifies programming and there is a lot of interest in it, but the specs are not fixed. So for companies that want to tape out now, they want to add some programmability. It’s the same for cars. You want to get technology into a car now, but the algorithms are so new that it’s scary to put something into an ASIC.”
Dealing with the data flood
Another change that affects Moore’s Law is an explosion of data. Ever since the introduction of the PC there has been debate over whether to centralize or distribute data. While some of those debates were political in nature—pitting IT departments against mobile users, and big-iron companies and their ecosystems against mobile device makers and their ecosystems—those arguments are now largely moot. The sheer volume of data makes it much more efficient to process some of the data locally and move only a subset of that data. In effect, the processing moves closer to the data, with the chips optimized for certain types of data, rather than the other way around.
“This is forcing paradigm changes,” said Steven Woo, distinguished inventor and vice president of marketing solutions at Rambus. “Moore’s Law is not working anymore for modern scaling. Digital data is growing far faster than processing capabilities. It would be okay if everything was doubling every two years. And if you want to analyze that data or search through it, that’s different from what the architecture was built to do.”
One such change involves rethinking how much data really needs to be shipped to memory, and how much can be stored locally. “There is spatial locality and temporal locality for data,” said Kurt Shuler, vice president of marketing at ArterisIP. “When you add cache, you take advantage of both. The key is using real estate more intelligently.”
So rather than send everything through to memory, multiple caches and proxy caches can shortcut the flow of data from accelerator chips to different devices. While this is technically still a von Neumann approach, it is a much more fine-grained version of it. The big difference is that the starting point is the data and how it moves, rather than a centralized chip architecture that handles everything. In effect, the burden shifts to the architecture, which is defined by the software, rather than to the speed or process geometry of any single chip.
Fig. 2: Von Neumann architecture. Source: Semiconductor Engineering
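To make the spatial/temporal locality point above concrete, here is a minimal sketch in C (not from the article; the array size and traversal orders are illustrative assumptions) showing how the same computation can either fight the cache or exploit it:

```c
/* Minimal sketch (not from the article): the same sum computed two
 * ways exercises the cache very differently. Row-major traversal has
 * spatial locality (neighboring elements share cache lines), so each
 * line fetched from memory is fully used before it is evicted. */
#include <stdio.h>

#define N 1024

static double a[N][N];

/* Poor spatial locality: strides down columns, touching a different
 * cache line on almost every access. */
static double sum_column_major(void) {
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

/* Good spatial locality: walks memory in the order it is laid out,
 * so consecutive accesses hit the same cache line. */
static double sum_row_major(void) {
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = (double)(i + j);

    /* Same result, very different memory traffic. */
    printf("column-major: %f\n", sum_column_major());
    printf("row-major:    %f\n", sum_row_major());
    return 0;
}
```

Both functions return the same value, but the row-major version streams through memory and reuses each fetched cache line, which is the “use real estate more intelligently” idea in miniature; the same reasoning, scaled up, is what proxy caches and local accelerator caches exploit.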
Security
A new factor that comes into play across all of this is security. It’s harder to hack hardware than software, because if it’s architected correctly, an attacker needs physical access to the device. Software, on the other hand, is easier to breach remotely. Ultimately this makes a stronger case for distributing more functionality across multiple hardware components. But it also adds to the cost, which so far has limited its adoption.
“There are all sorts of technologies available to us for designing chips that are more secure,” Wally Rhines, chairman and CEO of Mentor, a Siemens Business, said during a recent CEO panel discussion at the Electronic System Design Alliance. “The problem is that people who design those chips and build them and sell them really don’t want to pay a lot for that capability. My forecast is that sooner or later we’re going to have an embedded Trojan in a chip that causes someone to lose a lot of money, or causes physical harm, and then the purchasers of chips will come to their semiconductor suppliers and say, ‘Oh, by the way, would you mind adding this sentence to the purchase agreement that there are no embedded Trojans within the chips you’re selling us?’ And then you’ll go back to the lawyers and say, ‘Is that okay to add in?’ They’ll say, ‘Absolutely not.’ And then we’re going to get into a mode of what is best in class and what are people willing to pay for, and it will become a big part of what you design into an integrated circuit, like power analysis.”
Aart de Geus, chairman and co-CEO of Synopsys, agreed. “It’s a very complex problem,” he said. “There are issues on the hardware side and on the software side, but the biggest vulnerabilities are sitting at the intersections. These are the least understood and they’re new. And if you look at the more flamboyant hacks, such as the hack of the Jeep where they came in through the infotainment system, these are very sophisticated. The solution will be multifold. One is that we systematically build this into things. While it may not be secure, at least it will live up to regulations of security. Regulation is the partial answer. Second, it needs to be secure by construction. To do this after the fact is hopeless in an enormous state machine. You’ll never find those issues. In our case, we have invested substantially in six or seven companies to build security into the software, or to detect what we can detect automatically. But gradually the expectation will rise, and you will need to standardize to a certain degree.”
Still, security adds yet another element to software-driven design that needs to be factored in.
“Security is an aspect of workload, too,” said NetSpeed’s Mohandass. “If you have 10 operating systems running, they should not have any knowledge of each other.”
Automation tools
What is evident is that there are many routes to the same goal from an engineering perspective. And while Moore’s Law is often synonymous with shrinking features, its underpinnings are economic. The goal is smaller, faster and cheaper, but without the cheaper part, the other two would never happen.
As scaling continues to slow, the real challenge is continuing the economic benefits of Moore’s Law, and that’s where EDA companies see a big opportunity.
“Small architectural changes can make dramatic changes in performance and power consumption, and this is where tools like high-level synthesis can make a difference,” said Dave Kelf, vice president of marketing at OneSpin Solutions. “Tools like that shape the design cycle so you can spend more time getting better power and performance out of a design. You also can take a design that’s already done and iterate more quickly. You’re stuck with a schedule, but you can shrink the design cycle and verification, which gives you the equivalent of a node of improvement.”
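As a rough, hypothetical sketch of the kind of small change Kelf is describing, consider a filter kernel written for high-level synthesis. The directive shown follows one common vendor convention (AMD/Xilinx HLS pragmas) and is an assumption for illustration, not something taken from the article; changing a single directive changes the generated microarchitecture without rewriting the algorithm:

```c
/* Hypothetical HLS-style kernel. The pragma syntax is tool-specific
 * (shown here in the AMD/Xilinx HLS style as an assumption); other
 * tools use different directives for the same intent. */
#define TAPS 16

int fir_filter(const int coeff[TAPS], const int window[TAPS]) {
    int acc = 0;
    for (int i = 0; i < TAPS; i++) {
        /* Pipelining the loop starts a new multiply-accumulate every
         * cycle; an UNROLL directive here instead would trade area
         * for parallel multipliers. */
#pragma HLS PIPELINE II=1
        acc += coeff[i] * window[i];
    }
    return acc;
}
```

Exploring variants like this in hours rather than weeks, and then iterating the design around them, is where the “equivalent of a node of improvement” can come from without changing the process technology.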
That kind of gain can delay the need to move to the next node every two years. At the same time, faster tooling and better training on those tools can make a dent in how much time, and therefore money, is spent on the design side.
Conclusion
Moore’s Law is alive and well in one respect. From a digital logic standpoint, it is possible to continue shrinking devices at least to 5nm, and perhaps beyond. But it is getting harder, more expensive, and the benefits don’t match up well with many market segments.
Increasingly, solutions are being designed for specific markets using a variety of components that are heterogeneous, software-defined, and much better suited to particular tasks. The era of one-size-fits-all is coming to a close, and that is making broad statements about semiconductor development far less relevant.
Related Stories
Moore’s Law: A Status Report (Part 1)
The ability to shrink devices will continue for at least four more nodes as EUV begins to ramp, but it’s just one of a growing number of options.
Custom Hardware Thriving
Predictions about software-driven design with commoditized IoT hardware were wrong.
Bridging Hardware And Software
Part 1: Different goals and methodologies have long divided hardware and software engineering teams. Some companies have solved these issues, others are working on them.
New Architectures, Approaches To Speed Up Chips
Metrics for performance are changing at 10nm and 7nm. Speed still matters, but one size doesn’t fit all.
Low-cost, small-area CPU cores based on the RISC-V ISA, with custom features and application-specific software to run on them, could be a future trend for specific applications.
Almost all specialized applications in almost all fields (those that are not consumer-centric) are moving to some extent toward hardware that can be customized on the fly, since it remains programmable while maintaining performance. This trend is going to increase. Only the extremely cost-sensitive, consumer-centric devices are going to be driven by Moore’s Law until the very end, when the law itself becomes obsolete.