An emphasis on customization, many more packaging options, and rising costs of scaling are changing dynamics across the industry.
Several chipmakers and fabless design houses are racing against each other to develop processes and chips at the next logic nodes in 3nm and 2nm, but putting these technologies into mass production is proving both expensive and difficult.
It’s also beginning to raise questions about just how quickly those new nodes will be needed and why. Migrating to the next nodes does boost performance and reduce power and area (PPA), but it’s no longer the only way to achieve those improvements. In fact, shrinking features may be less beneficial for PPA than minimizing the movement of data across a system. Many factors and options need to be considered as devices are designed for specific applications, such as different types of advanced packaging, tighter integration of hardware and software, and a mix of processing elements to handle different data types and functions.
“As more devices become connected and more applications become available, we’re seeing exponential growth in data. We’ve also seen fundamentally different workloads, and can expect to see more changes in workloads as data and different usage models continue to evolve. This data evolution is driving changes to hardware and a different need for compute than what was historically experienced,” said Gary Patton, vice president and general manager of design enablement at Intel, during a keynote at SEMI’s recent Advanced Semiconductor Manufacturing Conference. “We absolutely need to continue to scale the technology, but that’s not going to be enough. We need to address heterogeneous integration at the system level, co-optimization of the design in the process technology, optimization between software and hardware, and importantly, continue to drive AI and novel compute techniques.”
So while transistor-level performance continues to be a factor, on the leading edge it’s just one of several. But at least for the foreseeable future, it’s also a race that the largest chipmakers are unwilling to abandon or concede. Samsung recently disclosed more details about its upcoming 3nm process, a technology based on a next-generation transistor type called a gate-all-around (GAA) FET. This month, IBM developed a 2nm chip, based on a GAA FET. Plus, TSMC is working on 3nm and 2nm, while Intel also is developing advanced processes. All of these companies are developing one type of GAA FET called a nanosheet FET, which provides better performance than today’s finFET transistors. But they are harder and more expensive to make.
Fig. 1: Planar transistors vs. finFETs vs. gate-all-around Source: Lam Research
With 3nm production expected to commence by mid-2022, and 2nm slated for 2023/2024, the industry needs to get ready for these technologies. But the landscape is confusing, and announcements about new nodes and capabilities aren’t quite what they seem. For one thing, the industry continues to use the traditional numbering scheme for different nodes, but the nomenclature doesn’t really reflect which company is ahead. In addition, chipmakers are moving in different directions at the so-called 3nm node, and not all 3nm technologies are alike.
The benefits of each new node are application-specific. Chip scaling is slowing and price/performance benefits have been shrinking over the past several process nodes, and fewer companies can afford to design and manufacture products based solely on the latest nodes. On the other side of that equation, the cost of developing these processes is skyrocketing, and so is the cost of equipping a leading-edge fab. Today, Samsung and TSMC are the only two vendors capable of manufacturing chips at 7nm and 5nm.
After that, transistor structures begin to change. Samsung and TSMC are manufacturing chips at 7nm and 5nm based on today’s finFETs. Samsung will move to nanosheet FETs at 3nm. Intel is also developing GAA technology. TSMC plans to extend finFETs to 3nm, and then will migrate to nanosheet FETs at 2nm around 2024.
IBM also is developing chips using nanosheets. But the company has not manufactured its own chips for several years, and currently outsources its production to Samsung.
Scaling, confusing nodes
For decades, the IC industry has attempted to keep pace with Moore’s Law, doubling the transistor density in chips every 18 to 24 months. Acting like an on-off switch in chips, a transistor consists of a source, drain and gate. In operation, electrons flow from the source to the drain and are controlled by the gate. Some chips have billions of transistors in the same device.
Traditionally, at that 18- to 24-month cadence, chipmakers introduce a new process technology with greater transistor density, thereby lowering the cost per transistor. At each new process, referred to as a node, chipmakers scaled the key transistor specs by 0.7X, enabling the industry to deliver a 40% performance boost at the same power and a 50% reduction in area. This formula enabled new and faster chips with more functions.
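The classic scaling math behind those figures can be sketched in a few lines (an illustrative back-of-the-envelope calculation, not vendor data): shrinking every linear dimension by 0.7X roughly halves the area of a given circuit, which doubles transistor density per node.

```python
# Illustrative node-scaling arithmetic (not vendor data).
# A 0.7x shrink in each linear dimension roughly halves area.
linear_shrink = 0.7

area_scale = linear_shrink ** 2   # new area / old area = 0.49, ~50% reduction
density_gain = 1 / area_scale     # transistors per unit area, ~2x per node

print(f"Area per circuit: {area_scale:.2f}x (~50% reduction)")
print(f"Transistor density: {density_gain:.2f}x (~2x per node)")
```

This is the geometric half of the formula; the 40% performance gain at constant power historically came from Dennard-style voltage and capacitance scaling, which has largely broken down at recent nodes.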
Each node is given a numerical designation. Years ago, the node designation was based on a key transistor metric, namely gate length. “For example, the 0.5µm technology node produced a transistor with a 0.5µm gate length,” explained Nerissa Draeger, director of university engagements at Lam Research.
Over time, gate length scaling slowed, and at some point, it didn’t match the corresponding node number. “Over the years, the technology node definition has evolved, and is now considered more of a generational name rather than a measure of any key dimension,” Draeger said.
And for some time, the node numbers have become mere marketing names. For example, 5nm is the most advanced process today, but there is no agreed-upon 5nm spec. The same is true for 3nm, 2nm and so on. It’s even more confusing when vendors use different definitions for the nodes. Intel is shipping chips based on its 10nm process, which is roughly equivalent to 7nm for TSMC and Samsung.
For years, vendors more or less followed the transistor scaling specs as defined by the International Technology Roadmap for Semiconductors (ITRS). In 2015, the ITRS effort was halted, leaving the industry to define its own specs. In its place, the IEEE launched the International Roadmap for Devices and Systems (IRDS), which focuses on, among other things, continued scaling (More Moore) and advanced packaging and integration (More Than Moore).
“What remains the same is our expectation that node scaling will bring better device performance and greater power efficiency and cost less to build,” Draeger said.
That hasn’t been easy. For years, vendors developed chips using traditional planar transistors, but these structures hit the wall at 20nm a decade ago. Planar transistors still are used in chips at 28nm/22nm and above, but the industry needed a new solution. That’s why Intel introduced finFETs at 22nm in 2011. Foundries followed with finFETs at 16nm/14nm. In a finFET, current is controlled by a gate that wraps around three sides of a fin.
FinFETs enabled the industry to continue with chip scaling, but they are also more complex with smaller features, causing design costs to escalate. The cost to design a “mainstream” 7nm device is $217 million, compared to $40 million for a 28nm chip, according to Handel Jones, CEO of IBS. In this case, the costs are determined two or more years after a technology reaches production.
At 7nm and below, static leakage has become problematic again, and the power and performance benefits have started to diminish. Performance increases are now somewhere in the 15% to 20% range.
On the manufacturing front, meanwhile, finFETs require more complex processes, new materials and different equipment. This in turn drives up manufacturing costs. “If you compare 45nm to 5nm, which is happening today, we see a 5X increase in wafer cost. That’s due to the number of processing steps required,” said Ben Rathsack, vice president and deputy general manager at TEL America.
Over time, fewer companies had the resources or saw the value in producing leading-edge chips. Today, GlobalFoundries, Samsung, SMIC, TSMC, UMC and Intel are manufacturing chips at 16nm/14nm. (Intel calls this 22nm). But only Samsung and TSMC are capable of manufacturing chips at 7nm and 5nm. Intel is still working on 7nm and beyond, and SMIC is working on 7nm.
Moving to nanosheets
Scaling becomes even harder at 3nm and below. Developing low-power chips that are reliable and meet spec presents some challenges. In addition, the cost to develop a mainstream 3nm chip design is a staggering $590 million, compared to $416 million for a 5nm device, according to IBS.
Then, on the manufacturing front, foundry customers can go down two different paths at 3nm, presenting them with difficult choices and various tradeoffs.
TSMC plans to extend finFETs to 3nm by shrinking the dimensions of 5nm finFETs, making the transition as seamless as possible. “TSMC’s volume ramp of 3nm finFETs is planned for Apple in Q3 2022, with high-performance computing planned for 2023,” IBS’ Jones said.
It’s a short-term strategy, though. FinFETs are approaching their practical limit when the fin width reaches 5nm, which equates to the 3nm node. The 3nm node equates to a 16nm to 18nm gate length, a 45nm gate pitch, and a 30nm metal pitch, according to the new IRDS document. In comparison, the 5nm node equates to an 18nm to 20nm gate length, a 48nm gate pitch, and a 32nm metal pitch, according to the document.
Once finFETs hit the wall, chipmakers will migrate to nanosheet FETs. Samsung, for one, will move directly to nanosheet FETs at 3nm. Production is slated for the fourth quarter of 2022, according to IBS.
TSMC plans to ship nanosheet FETs at 2nm in 2024, according to IBS. Intel also is developing GAA. Several fabless design houses are working on devices at 3nm and 2nm, and companies such as Apple plan to use that technology for next-generation devices.
A nanosheet FET is an evolutionary step from a finFET. In a nanosheet FET, the fin from a finFET is in effect placed on its side and divided into separate horizontal pieces. Each piece, or sheet, serves as a channel. The first nanosheet FETs will likely have three or so sheets, with a gate wrapped around all of them.
Nanosheets implement a gate on four sides of the structure, enabling more control of the current than finFETs. “In addition to having better gate control versus a finFET, GAA-stacked nanosheet FETs offer higher DC performance thanks to higher effective channel width,” said Sylvain Barraud, a senior integration engineer at Leti.
Nanosheet FETs have other advantages over finFETs. In finFETs, the width of the device is quantized, which impacts the flexibility of designs. In nanosheets, IC vendors have the ability to vary the widths of the sheets in the transistor. For example, a nanosheet with a wider sheet provides more drive current and performance. A narrow nanosheet has less drive current, but takes up a smaller area.
“The wide range of variable nanosheet widths provides more design flexibility, which is not possible for finFETs due to a discrete number of fins. Finally, GAA technology also proposes multiple threshold voltage flavors thanks to different workfunction metals,” Barraud said.
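The quantization difference can be illustrated with the first-order effective-channel-width geometry (a rough sketch; the formulas are standard geometry, and all dimensions below are hypothetical examples in nanometers): a finFET’s effective width only comes in whole-fin increments, while a nanosheet’s varies continuously with sheet width.

```python
# First-order effective channel width (geometry only; illustrative).
def finfet_weff(n_fins: int, fin_height: float, fin_width: float) -> float:
    # The gate wraps three sides of each fin: two sidewalls plus the top.
    return n_fins * (2 * fin_height + fin_width)

def nanosheet_weff(n_sheets: int, sheet_width: float, sheet_thickness: float) -> float:
    # The gate wraps all four sides of each stacked sheet.
    return n_sheets * 2 * (sheet_width + sheet_thickness)

# FinFET effective width is quantized: 1, 2, 3... fins (hypothetical 50nm-tall,
# 5nm-wide fins), so designers can only pick from discrete steps.
print([finfet_weff(n, 50, 5) for n in (1, 2, 3)])   # [105, 210, 315]

# Nanosheet effective width varies continuously with sheet width, e.g. a
# three-sheet stack of 14nm-wide, 5nm-thick sheets:
print(nanosheet_weff(3, 14, 5))                      # 114.0
```

Widening the sheets raises effective width (and drive current) without adding a whole fin’s worth of area, which is the design flexibility Barraud describes.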
The first 3nm devices are starting to trickle out in the form of early test chips. At a recent event, Samsung disclosed the development of a 6T SRAM based on a 3nm nanosheet technology. The device addresses a major issue. SRAM scaling shrinks the device, but it also increases bitline (BL) resistance. In response, Samsung incorporated adaptive dual-BL and cell-power assist circuits into the SRAM.
“Gate-all-around SRAM design techniques are proposed, which improve SRAM margins more freely, in addition to power, performance, and area,” said Taejoong Song, a researcher from Samsung, in a paper. “Moreover, SRAM-assist schemes are proposed to overcome metal resistance, which maximizes the benefit of GAA devices.”
IBM, meanwhile, recently demonstrated a 2nm test chip. Based on nanosheet FETs, the device can incorporate up to 50 billion transistors. Each transistor consists of three nanosheets, each of which has a width of 14nm and a height of 5nm. All told, the transistor has a 44nm contacted poly pitch with a 12nm gate length.
The chip is still in R&D, with IBM targeting it for 2024. But at any node, nanosheet devices face several challenges before they move into production. “There’s no limit to the number of challenges,” said Mukesh Khare, vice president of hybrid cloud research at IBM. “I would say the biggest challenges include leakage. How do you reduce power? How do you improve performance in that small dimension when your sheet thickness is 5nm and the channel length is 12nm? How do you get reasonable RC benefit in 2nm? At the end, the chip has to be superior compared to the prior node.”
Making a nanosheet FET is difficult. “In gate-all-around nanosheets/nanowires, we have to do processing underneath the structure where we can’t see, and where it’s much more challenging to measure. And that’s going to be a much more difficult transition,” said David Fried, vice president of computational products at Lam Research.
In a process flow, a nanosheet FET starts with the formation of a super-lattice structure on a substrate. An epitaxial tool deposits alternating layers of silicon-germanium (SiGe) and silicon on the substrate.
That requires extreme process control. “In-line monitoring of the thickness and composition of each Si/SiGe pair is essential,” said Lior Levin, director of product marketing at Bruker. “These parameters are key for the device performance and yield.”
The next step is to pattern tiny vertical fins in the super-lattice structure. Inner spacers are then formed, followed by the source/drain. After the channel release process, the gate is formed, resulting in a nanosheet FET.
Fig. 2: Process flow for stacked nanosheet FETs. Source: Leti/Semiconductor Engineering
More than transistors
Still, transistor scaling is only part of the equation. And while the scaling race continues, competition is becoming equally fierce on the heterogeneous integration side. Instead of just one monolithic chip developed at a single process node, many of the most advanced architectures incorporate multiple processing elements, including some highly specialized ones, and different types of memories.
“Distributed computing is driving another trend: a growing range of architectures that are domain specific,” Intel’s Patton said. “Another trend we are seeing is domain-specific architectures that are disaggregated from the whole, mainly driven by AI and tailored for efficiency gains.”
Advanced packaging, which integrates complex dies in a package, is playing a role. “Packaging innovations are now starting to play more of a role in achieving improvements in product performance,” Patton said.
“There’s definitely more factors involved in performance, power and area from one node to another,” said Peter Greenhalgh, vice president of technology and fellow at Arm. “If the world was relying just on the fab for all of its gains, you’d be pretty disappointed. Arm provides one piece of the LEGO design. That LEGO is added to other LEGO pieces to build a really interesting chip. There are many expensive ways to do this, but there also will be some level of commoditization and harmonization.”
Concurrent with the shift toward heterogeneous architectures is the build-out of the edge — which spans everything from IoT devices to various levels of server infrastructure — as well as moves by systems companies such as Google, Alibaba, AWS and Apple to design their own hardware to optimize their particular data flow inside of enormous data centers. This has set off a frenzy of design activity that incorporates both custom and non-custom hardware, non-standard packages, and a variety of approaches such as in-memory and near-memory processing that never gained much traction in the past. It also has put a focus on how to partition processing, which components and processes need to be prioritized in a microarchitecture, and what is the optimum process node for various components based upon a particular heterogeneous design.
“A great example of that is video acceleration,” said Greenhalgh. “If you’re a cloud server company and you’re doing huge amounts of video decode and encode, you don’t want to do that on a CPU. You want to put a video accelerator in there. This is a paradigm shift.”
So there are more and different kinds of processor elements. There also are more extensions being developed for existing processor cores.
“We’ve always had the ability to extend the architecture (for ARC processors) by adding custom instructions or bolting on custom accelerators,” said Rich Collins, senior segment marketing manager at Synopsys. “What’s different now is that more and more customers are taking advantage of that. AI is a big buzzword and it means a lot of different things, but behind that term we’re seeing a lot of changes. More and more companies are adding a neural network engine onto a standard processor.”
These changes are more than just technological. They also require changes inside chip companies, from the makeup of various engineering teams to the structure of the companies themselves.
“It used to be that you would invent a bunch of products, put them in a list in a bunch of data books, and people would try to find them,” said Shawn Slusser, senior vice president of sales, marketing and distribution at Infineon. “That is not going to work anymore because of the complexity and longevity of devices. We’re now looking at a model that is more like a superstore for semiconductors. If you want to link the real world to the digital world, everything is there in one place, including the products, the people and the expertise.”
Bigger companies have been developing this expertise in-house. This is evident in Apple’s M1 chip, which was developed using TSMC’s 5nm process. It incorporates Arm v8 cores, GPUs, custom microarchitectures, a neural engine, and an image signal processor, all of which are bundled together in a system-in-package. While that design may not perform as well as other chips on standard industry benchmarks, the performance and power improvements when running Apple applications are readily apparent.
As of today, some 200 companies either have developed, or are currently developing accelerator chips, according to industry estimates. How many of those will survive is unknown, but the move toward disaggregation is inevitable. On the edge, there is simply too much data being generated by cars, security systems, robots, AR/VR, and even smart phones, to send everything to the cloud for processing. It takes too long and requires too much power, memory and bandwidth. Much of that data needs to be pre-processed, and the more the hardware is optimized for handling that data, the longer the battery life or lower the power costs.
This is why VCs have been pouring money into hardware startups for the past several years. Over the next 12 to 24 months, the field is expected to narrow significantly.
“On the inferencing side, the window will start to close as companies come to market and engage with customers,” said Geoff Tate, CEO of Flex Logix. “Over the next 12 months, investors will start to get hard data to see which architectures actually win. For the last few years, it was a matter of who had the best slide deck. Customers view acceleration as a necessary evil to run a neural network model. ‘For my model, how fast will it run, how much power will it take, and how much will it cost?’ They’re going to pick the horse that’s the best in their race or for their conditions.”
Designs are changing on the cloud side, as well. In the cloud, faster processing and the ability to determine exactly where that processing happens can have a big impact on energy efficiency, the amount of real estate required, and the capacity of a data center. For example, rather than just connecting DRAM to a chip, that DRAM can be pooled among many servers, allowing workloads to be spread across more machines. That provides both more granularity for load balancing, as well as a way of spreading out heat, which in turn reduces the need for cooling and helps prolong the life of the servers.
“You’ve got tens of thousands of servers in some of these data centers, and many tens of data centers worldwide,” said Steven Woo, fellow and distinguished inventor at Rambus. “Now you have to figure out how to lash them together. There are some new technologies that will be coming out. One is DDR5, which is more power efficient. And a little further out is Compute Express Link (CXL). For a long time, the amount of memory that you could put into a server has been limited. You can only get so much in there. But with the ability to do more work in the cloud, and to rent virtual machines, there’s a much larger range of workloads. CXL gives you this ability to have a base configuration in your system, but also to expand the amount of memory bandwidth and capacity that’s available to you. So now you can suddenly support a much larger range of workloads than before.”
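The appeal of pooling that Woo describes can be seen with some back-of-the-envelope arithmetic (all numbers below are hypothetical): when DRAM is fixed per server, memory stranded on lightly loaded machines cannot help an overloaded neighbor, while a shared pool lets any server draw what its workload needs.

```python
# Hypothetical illustration of fixed per-server vs. pooled memory (GB).
per_server_dram = 256
demands = [300, 100, 180, 90]   # per-workload memory needs, one per server

# Fixed DRAM: each workload is capped by its own server's local memory.
unmet = sum(max(0, d - per_server_dram) for d in demands)
stranded = sum(max(0, per_server_dram - d) for d in demands)

# Pooled (e.g., CXL-attached) memory: one shared capacity for all servers.
pool = per_server_dram * len(demands)
pooled_unmet = max(0, sum(demands) - pool)

print(unmet, stranded)   # 44 GB unmet while 398 GB sits stranded
print(pooled_unmet)      # 0 GB unmet from the same total capacity
```

The same total capacity serves every workload once it can be reallocated, which is the load-balancing granularity the article refers to.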
Conclusion
The race is still on to reach the next few process nodes. The question that remains is which companies will be willing to spend the time and money needed to develop chips at those nodes when they may achieve sufficient gains through other means.
The economics and dynamics of different markets are forcing chipmakers to assess how to best tackle market opportunities with a maximum return on investment, which in some cases may extend well beyond the cost of developing an advanced chip. There are many options for achieving different goals, and often more than one way to get there.
Related Stories
Breaking The 2nm Barrier
New interconnects and processes will be required to reach the next process nodes.
Challenges At 3/2nm
New structures, processes and yield/performance issues.
New Transistor Structures At 3nm/2nm
Gate-all-around FETs will replace finFETs, but the transition will be costly and difficult.
Big Changes In Tiny Interconnects
Below 7nm, get ready for new materials, new structures, and very different properties.
Moving To GAA FETs
Why finFETs are running out of steam, and what happens next.