Performance Increasingly Tied To I/O

Chipmakers look beyond processor speeds as the rate of performance improvement slows.


Speeding up input and output is becoming a cornerstone for improving performance and lowering power in SoCs and ASICs, particularly as scaling processors and adding more cores produce diminishing returns.

While processors of all types continue to improve, the rate of improvement is slowing at each new node. Chipmakers can no longer obtain the expected 30% to 50% gain in performance and reduction in power just by shrinking features or increasing clock speeds. Moreover, for many new applications processor speed is just one of several key performance metrics. Equally important is the speed at which data can be moved back and forth between devices, such as a smartphone and a data center, or between a processor and various types of memory.

“Speeds are going up, but not just because they are required by enterprise applications,” said Navraj Nandra, senior director of marketing for Synopsys’ DesignWare Analog & MSIP Solutions Group. “They’re also being driven by mobile applications. But it’s not just about the processor anymore. It’s how fast you get the data to memory and I/O.”

There are several key drivers for this shift:

  • Thermal effects are becoming more problematic. After a sharp reduction in leakage current with the introduction of finFETs at 16/14nm, leakage is resurfacing with a vengeance at 10nm and 7nm. That leakage manifests itself as heat, which can accelerate electromigration, reduce reliability, and sometimes destroy a chip. The simplest way to address processor heat is to lower clock frequencies, but that means chipmakers have to use more processor cores running at lower frequencies or turn to alternative approaches. There is a limit to how many cores can be used, though. While some applications can take advantage of multiple cores, the vast majority cannot keep many cores busy simultaneously.
  • Processing is becoming more distributed. Unlike PCs and smartphones, where most processing is done locally, there is a whole new wave of devices that require a mix of local and external processing. How much is done where depends on the application and the architecture, but the basic compute model is changing. Input/output (I/O) is the enabler.
  • The volume of data is rising, and so is the value of that data. But the value is different for different people, sometimes even for the same data. That puts a premium on moving data back and forth as quickly as possible using the least amount of power.
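The limit on how many cores an application can use is often illustrated with Amdahl’s law, which the article does not cite but which captures the same diminishing returns: if a fraction p of a program’s work can run in parallel, n cores give at most a speedup of 1 / ((1 - p) + p / n). The sketch below is a simplified model that ignores communication, memory contention and I/O, and the parallel fractions it uses are illustrative assumptions, not measurements.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper-bound speedup predicted by Amdahl's law.

    parallel_fraction: share of the work that can run in parallel (0..1).
    cores: number of cores the parallel share is spread across.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative (assumed) parallel fractions, not measured values.
    for p in (0.5, 0.9, 0.99):
        row = ", ".join(f"{n} cores: {amdahl_speedup(p, n):.2f}x" for n in (2, 8, 64))
        print(f"p={p}: {row}")
```

Even with 90% of the work parallelized, the model caps the speedup below 10x no matter how many cores are added, which is why adding cores alone stops paying off.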

Taken together, these factors explain why I/O is suddenly getting so much attention in new designs. But, as detailed below, I/O has its own set of issues.

Architecting For I/O
For most designs, logic and memory fill the center of a chip and I/O sits on the periphery. The challenge is getting signals to and from the I/O infrastructure, a problem that becomes more difficult as wires shrink and the distances that signals need to travel increase. Signals still need to be routed across a chip’s real estate, which in most cases is split almost evenly among memory, logic and I/O.
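Why shrinking wires and longer routes make this harder can be shown with a first-order model the article does not spell out: the Elmore delay of a distributed RC wire, roughly r·c·L²/2, where r and c are resistance and capacitance per unit length and L is the wire length. Resistance per unit length rises as the cross-section shrinks (r = ρ / (w·t)), while delay grows with the square of length. The resistivity and capacitance values below are assumptions for illustration, and capacitance per unit length is held constant for simplicity. Real copper resistivity rises well above its bulk value at advanced-node dimensions, which makes the trend worse.

```python
RHO_CU_OHM_M = 1.7e-8  # assumed bulk copper resistivity (ohm*m)
C_PER_M = 2e-10        # assumed wire capacitance, ~0.2 fF/um, expressed in F/m


def wire_delay_s(length_m, width_m, thickness_m, rho=RHO_CU_OHM_M, c_per_m=C_PER_M):
    """First-order Elmore delay of a distributed RC wire: r*c*L^2 / 2, in seconds."""
    r_per_m = rho / (width_m * thickness_m)  # resistance per unit length rises as the wire shrinks
    return 0.5 * r_per_m * c_per_m * length_m ** 2


if __name__ == "__main__":
    base = wire_delay_s(1e-3, 50e-9, 50e-9)      # 1 mm route, 50 nm x 50 nm cross-section
    narrower = wire_delay_s(1e-3, 25e-9, 50e-9)  # half the width
    longer = wire_delay_s(2e-3, 50e-9, 50e-9)    # twice the length
    print(f"baseline: {base * 1e9:.2f} ns")
    print(f"half width: {narrower / base:.1f}x baseline")
    print(f"double length: {longer / base:.1f}x baseline")
```

Under these assumptions, halving the wire width doubles the delay and doubling the route length quadruples it.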

In chips for networking and data center applications, the I/O is almost entirely consumed by SerDes, said Prasad Subramaniam, vice president of R&D and design technology at eSilicon. “We’re working on two types of devices. One is for networking, where there is hardly any general-purpose I/O. All of the pins are high-speed, which translates into about 120 to 200 lanes of SerDes. That’s increasing every generation, too. Last year, we had 15G SerDes. Now it’s at 28G and moving to 56G—all within a matter of two or three years. The second device we’re working on is a memory interface. We’re still seeing DDR4, but not as much as high-bandwidth memory (HBM), which provides up to 2 gigabits per second per pin. If you add 24 pins and eight channels, that’s huge I/O between the ASIC and memory.”
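The lane counts and rates in the quote translate into aggregate bandwidth with simple multiplication. The sketch below computes raw line rate only and ignores encoding overhead (for example, 64b/66b) and forward error correction. For HBM it assumes the JEDEC HBM2 interface width of 1,024 data bits per stack (eight 128-bit channels), which is not stated in the quote.

```python
def serdes_aggregate_gbps(lanes: int, gbps_per_lane: float) -> float:
    """Raw aggregate line rate of a SerDes array in Gb/s (no coding overhead)."""
    return lanes * gbps_per_lane


def hbm_stack_gbytes_per_s(gbps_per_pin: float, channels: int = 8, bits_per_channel: int = 128) -> float:
    """Peak HBM stack bandwidth in GB/s; the default width assumes HBM2 (8 x 128 bits)."""
    data_pins = channels * bits_per_channel
    return gbps_per_pin * data_pins / 8  # divide by 8 to convert bits to bytes


if __name__ == "__main__":
    for lanes in (120, 200):
        for rate in (28, 56):
            tbps = serdes_aggregate_gbps(lanes, rate) / 1000
            print(f"{lanes} lanes x {rate}G = {tbps:.2f} Tb/s raw")
    print(f"HBM2 stack at 2 Gb/s/pin: {hbm_stack_gbytes_per_s(2.0):.0f} GB/s")
```

At 200 lanes of 56G, a single networking chip would move on the order of 11 Tb/s of raw traffic, all of which has to be routed to the chip’s periphery.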

The architectural challenge is getting data to the I/O in the first place. “The real issue is pin access,” Subramaniam said. “As you shrink the logic, there is not enough room to route the signals. You add in multiple metal layers and lots of complex circuitry. There could be a crossbar to route from different I/Os, or you may have a bus so signals need to be routed to the bus. You can address that by adding pipeline stages, but that introduces latency, which is a growing problem. The customer has to say how much latency they’re willing to tolerate.”
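The latency trade-off Subramaniam describes can be sketched with simple arithmetic. If a route’s wire delay exceeds one clock period, registers are inserted so that each segment fits within a cycle, and each register adds a cycle of latency relative to a single-cycle path. These helpers are hypothetical: they ignore register setup and clock-to-output time and assume the route can be split evenly. Times are in integer picoseconds to avoid floating-point rounding.

```python
def pipeline_registers(route_delay_ps: int, clock_period_ps: int) -> int:
    """Registers needed so every segment of a route fits in one clock period."""
    if route_delay_ps <= 0 or clock_period_ps <= 0:
        raise ValueError("delays must be positive")
    segments = -(-route_delay_ps // clock_period_ps)  # ceiling division
    return segments - 1


def added_latency_ps(route_delay_ps: int, clock_period_ps: int) -> int:
    """Extra latency from pipelining: one clock period per inserted register."""
    return pipeline_registers(route_delay_ps, clock_period_ps) * clock_period_ps


if __name__ == "__main__":
    period = 500  # assumed 2 GHz clock
    for delay in (400, 1200, 2600):
        regs = pipeline_registers(delay, period)
        print(f"{delay} ps route: {regs} register(s), +{added_latency_ps(delay, period)} ps latency")
```

Longer routes or faster clocks both raise the register count, which is the latency the customer has to agree to tolerate.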

Those issues get worse at 10nm and 7nm. At 5nm, where quantum effects begin to enter the picture, they are expected to force a rethinking of the overall chip architecture.

In other markets the problem is arguably even worse, because there are more I/O protocols to contend with. On the mobile side, there are Bluetooth, Zigbee, Z-Wave, WiFi, 4G, 5G, LTE and WiGig. Some of these have proliferated because no single protocol does everything perfectly. Others are simply next-generation standards that have taken on a life of their own. Even the data center is awash in I/O protocols, including PCI Express, SerDes, Fibre Channel, and a host of others that can tap into legacy systems.

Most of these protocols are open-ended development efforts, meaning they are backward-compatible and designed for future growth. Collectively, they increase the complexity of the I/O scheme because chips need to be able to support whatever protocols are available. Consider a smartphone SoC, for example. It needs to be able to connect to a car’s infotainment system using Bluetooth, to a new car’s built-in WiFi hotspot, and to various cellular base stations while the car is moving at high speed (including 3G, 4G, LTE, and in some countries 5G). It then has to hand off that call seamlessly to the home network when the driver leaves the car and enters the house. It also has to be able to send text or images, stream video, or carry on a video conference using the same design.

“There’s a lot of discussion about next-generation I/Os,” said Synopsys’ Nandra. “They also have to be low-power, because you can’t have watts of power per pin. And then the conversation usually heads to reliability. Whenever you design chip I/Os, you use the human body model, the charged device model (CDM) and the machine model for ESD testing. But CDM requirements are increasing because packages are getting so large, so that increases the speed requirements at the same time.”
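Nandra’s point about power per pin follows from a simple product: link power equals energy per bit times data rate. With energy in picojoules per bit and rate in gigabits per second, the product comes out directly in milliwatts. The 5 pJ/bit figure below is an assumed, illustrative efficiency, not a number from the article or from any specific SerDes.

```python
def link_power_mw(pj_per_bit: float, gbps: float) -> float:
    """Power of one serial link in mW: (pJ/bit) x (Gb/s) = 1e-12 J x 1e9 /s = 1e-3 W."""
    return pj_per_bit * gbps


if __name__ == "__main__":
    energy = 5.0  # assumed pJ/bit, for illustration only
    per_lane = link_power_mw(energy, 56)
    print(f"one 56G lane: {per_lane:.0f} mW")
    print(f"200 lanes: {per_lane * 200 / 1000:.1f} W")
```

At constant energy per bit, doubling the data rate doubles the power, so each faster I/O generation also has to cut its pJ/bit to stay within the chip’s power and thermal budget.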

There are other challenges, as well. “The SoC core voltage is now less than the I/O voltage,” he said. “At 7nm, the gate oxide thickness is so low that it cannot withstand high I/O voltage. So now the question is whether we can keep everything at 1.8 volts. For the I/O you want that voltage. If you make the I/O smaller, there will be a challenge in meeting the dynamic range of the I/O.”

At 5nm, quantum effects begin creeping into the picture as well, which directly affects I/O voltage. “There is pressure to design to a lower I/O voltage, but that raises the issue of radiation latch-up, which can really mess up reliability,” Nandra said. “It affects the P-N junction where you flip the device into an ‘on’ and ‘off’ state, which traditionally has been handled by the design rules from the foundry. Those determine how to space devices.”

Complex as this has become, at least this is understood in devices such as smartphones. In other markets there is less certainty about how devices ultimately will be used and what kind of connectivity will be required or deployed. Consider a piece of commercial machinery, for example, that has never been connected to anything electronically. Adding connectivity for remote diagnostics and on-site alerts also requires the ability to update those protocols over time, and at this point it isn’t clear which protocols will become widespread.

Growing value of data
Along with these technology challenges, there is another fundamental shift underway that affects I/O—the increasing value of data to many companies and people within those companies. This is more than just mining the data for marketing trends and anomalies. In some cases, it’s the core of the business itself.

“With e-commerce sites, faster response means you can sell more items,” said Steven Woo, distinguished inventor and vice president of solutions marketing at Rambus. “But you can’t ship all of this data to the cloud, so there is more computing being done at the edge and you transmit only the data that is necessary.”
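One common way to transmit only the data that is necessary at the edge is report-by-exception: a device sends a new reading only when it differs from the last transmitted value by more than a deadband. The sketch below is a hypothetical illustration of that idea, not a technique attributed to Rambus, and the sensor values and threshold are assumptions.

```python
from typing import Iterable, List, Optional, Tuple


def deadband_filter(readings: Iterable[float], deadband: float) -> List[Tuple[int, float]]:
    """Return (index, value) pairs worth transmitting.

    The first reading is always sent; after that, a reading is sent only if it
    differs from the last *sent* value by more than `deadband`.
    """
    sent: List[Tuple[int, float]] = []
    last: Optional[float] = None
    for i, value in enumerate(readings):
        if last is None or abs(value - last) > deadband:
            sent.append((i, value))
            last = value
    return sent


if __name__ == "__main__":
    temps = [20.0, 20.1, 20.2, 20.1, 21.5, 21.6, 21.4, 25.0]  # assumed sensor data
    to_send = deadband_filter(temps, deadband=0.5)
    print(f"sent {len(to_send)} of {len(temps)} readings: {to_send}")
```

The deadband sets the trade-off: a wider band sends less data but discards more detail, which is a decision that depends on how the data will be used.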

The problem is that compute architectures are not optimized for this. For decades, the common approach has been to speed up the processor and add fixed amounts of memory. The focus now is on customized hardware acceleration and customized amounts of memory to facilitate more distributed data processing.

“We’re now in the post-Moore era, because Moore’s Law is not working anymore for modern scaling,” said Woo. “The growth of digital data is far faster than ever before, and the ability to analyze that data or search through that data is different than what existing architectures were built to do. You need to change the memory, the I/O and the overall architecture, and those are all active areas of research.”

There also are a number of new memory technologies under investigation to fill the gap between DRAM and SRAM to help eliminate some of these data bottlenecks. The best known are 3D XPoint, ReRAM and MRAM, but there are many others. Woo expects one or two of them to succeed.

More I/O issues ahead
Along with all of these issues and shifts, more sophisticated I/O is being added to chips that never supported it in the past, such as microcontrollers.

“From a cost perspective, the ability to turn out designs using MCUs provides an advantage in certain markets,” said Drew Wingard, chief technology officer at Sonics. “But there is a disadvantage if you have to use proven wireless solutions. Some have a much higher data rate than what you can support on these chips, and the infrastructure is not friendly to high bandwidth interfaces. So you tend to do some of this in software even though the I/O itself is done in hardware. That brings up concerns about security because there are more ways of attacking a device.”

It also raises issues for price-conscious markets about the number of protocols that need to be supported. “The IoT requires a wide variety of protocols, and there is a tendency to over-design,” Wingard said. “But you can’t over-design into a market that is cost-sensitive. You also can’t add all of this I/O to some applications because they won’t work if the battery doesn’t last for more than a week or a month or a year, depending upon the device. You can’t have interfaces dissipating power, and you have to actively power manage these things.”
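Wingard’s point about battery life and active power management can be quantified with a duty-cycle estimate: average current is the time-weighted mix of active and sleep current, and battery life is capacity divided by that average. The capacity and current figures below are assumptions chosen for illustration (roughly a coin cell and a low-power radio), and the model ignores self-discharge, temperature and peak-current effects.

```python
HOURS_PER_DAY = 24


def average_current_ma(duty_cycle: float, active_ma: float, sleep_ma: float) -> float:
    """Time-weighted average current for a device that is active `duty_cycle` of the time."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * active_ma + (1.0 - duty_cycle) * sleep_ma


def battery_life_days(capacity_mah: float, duty_cycle: float, active_ma: float, sleep_ma: float) -> float:
    """Idealized battery life in days: capacity / average current."""
    return capacity_mah / average_current_ma(duty_cycle, active_ma, sleep_ma) / HOURS_PER_DAY


if __name__ == "__main__":
    # Assumed values: 220 mAh cell, 5 mA while the radio is on, 2 uA asleep.
    for duty in (0.001, 0.01, 0.1):
        days = battery_life_days(220, duty, active_ma=5.0, sleep_ma=0.002)
        print(f"radio on {duty:.1%} of the time: ~{days:.0f} days")
```

With these assumed numbers, the same battery lasts years, months or weeks depending only on how often the interface is awake, which is why unmanaged interfaces are not an option in these devices.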

One solution might be to simply reduce the number of protocols, and there seems to be at least some movement in that direction. Scott Jacobsen, Cadence’s director of verification IP product marketing, said that Bluetooth 5.0 could well replace ZigBee, which has been squeezed between WiFi in the machine-to-machine space and Bluetooth in the short-range peer-to-peer area.

“The new version of Bluetooth is lower power, higher performance, and it introduces mesh networking to address point-to-point. That means when you connect a pair of headphones to your car stereo, it’s not just limited to one device. And if a user walks into an office where there are Bluetooth printers and file storage, all devices can be registered on the mesh. In the new standard, data capacity is up 800%, speed is up 2X, and range is up 4X. That means higher signal coverage and double the speed.”
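In the Bluetooth 5 core specification, the 2X speed and 4X range figures come from different LE physical-layer modes rather than from a single mode that delivers both at once. The LE 2M PHY doubles the raw rate to 2 Mb/s, while the LE Coded PHY extends range by adding forward error correction, which cuts the data rate to 500 kb/s (S=2) or 125 kb/s (S=8). The sketch below compares payload airtime across these modes; it counts payload bits only and ignores preamble, access address, header, CRC and the coded PHY’s extra framing.

```python
# Payload bit rates of the Bluetooth LE PHYs, in kb/s (after coding).
LE_PHY_KBPS = {
    "LE 1M": 1000,
    "LE 2M": 2000,        # Bluetooth 5 "2X speed" mode
    "LE Coded S=2": 500,
    "LE Coded S=8": 125,  # Bluetooth 5 long-range mode
}


def payload_airtime_us(payload_bytes: int, phy: str) -> float:
    """Time to send the payload bits alone, in microseconds (no packet overhead)."""
    bits = payload_bytes * 8
    # seconds = bits / (kbps * 1000); microseconds = that * 1e6 = bits * 1000 / kbps
    return bits * 1000 / LE_PHY_KBPS[phy]


if __name__ == "__main__":
    for phy in LE_PHY_KBPS:
        print(f"{phy:>13}: {payload_airtime_us(100, phy):7.0f} us for 100 bytes")
```

The long-range mode spends eight times as long on air as LE 1M for the same payload, so a design chooses between range and speed per link rather than getting both.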

But nothing is perfect in this market because so much is in constant flux. Jacobsen noted that many companies are not building separate Bluetooth chips. As a result, they are wrestling with isolation issues when they combine multiple RF elements on the same chip. There also are challenges in dealing with the higher speeds of the new standard.

Conclusion
I/O will continue to dominate chip design in the future. Compute models have changed from standalone computing to connected computing as the amount of data grows beyond the capability of any single device.

In response, I/O, which used to be a rather straightforward architectural afterthought, is becoming a strategic consideration that can make the difference between a successful product and one that performs badly across all metrics. That importance will continue to grow at each new process node, and as the IoT and Industrial IoT raise the importance of moving data in addition to processing it.
