New data flows, higher switch density and IP integration create issues across the design flow.
Cloud data centers have changed the networking topology and how data moves throughout a large data center, prompting significant changes in the architecture of the chips used to route that data and raising a whole new set of design challenges.
Cloud computing has emerged as the fastest-growing segment of the data center market. In fact, it is expected to grow three-fold over the next few years, and by 2021 it is forecast to account for 95% of all data center traffic, according to Cisco’s Global Cloud Index Forecast. A key part of that equation is virtualization, which allows compute instances and workloads to be allocated dynamically to keep up with the shifting nature of cloud services.
Looked at from a different angle, more than 75% of the traffic now flows in the east-west direction, from server to server, within the datacenter. That raises the first set of issues because the traditional three-tier network topology is optimized for north-south client-server traffic, so it cannot efficiently handle this kind of data flow.
To address that dataflow shift and minimize latency and bottlenecks, cloud data centers are moving to a leaf-spine topology, in which any leaf switch can reach any other leaf switch with a single hop through a spine switch.
Fig. 1: Leaf-spine topology. Source: Cadence Design Systems
“In a leaf-spine topology, each leaf needs to connect to every spine switch,” said Muthukumar Vairavan, senior product marketing manager for interface IP at Cadence Design Systems. “As a result, the number of hosts that can be supported, as well as network bandwidth scaling, becomes a function of the number of ports and the bandwidth per port of the switch equipment. The switch equipment bandwidth is primarily determined by the switch ASIC and the number of optical modules that can fit into a single rack unit.”
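As a rough illustration of that scaling relationship, the sketch below models a non-blocking two-tier leaf-spine fabric built from identical switches. The port count and per-port bandwidth are illustrative parameters, not figures supplied by Cadence.

```python
# Rough model of host and bandwidth scaling in a two-tier leaf-spine fabric.
# Assumptions (mine, not Cadence's): identical switch ASICs at leaf and spine,
# and a non-blocking design where each leaf splits its ports evenly between
# host-facing links and spine uplinks.

def leaf_spine_capacity(ports_per_switch: int, gbps_per_port: float):
    """Return (max_hosts, total_host_bandwidth_tbps) for a non-blocking fabric."""
    host_ports_per_leaf = ports_per_switch // 2   # half the leaf ports face hosts
    # The other half are uplinks, one per spine switch; each spine port serves
    # one leaf, so the leaf count is capped at the switch port count.
    max_leaves = ports_per_switch
    max_hosts = max_leaves * host_ports_per_leaf
    total_tbps = max_hosts * gbps_per_port / 1000.0
    return max_hosts, total_tbps

# Example: a 32-port 400GbE switch (the configuration cited in this article).
print(leaf_spine_capacity(ports_per_switch=32, gbps_per_port=400))
# -> (512, 204.8): 512 hosts, ~205 Tbps of aggregate host-facing bandwidth
```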
The state-of-the-art switch ASIC today has up to 256 lanes of PAM-4 SerDes running at 56Gbps each, for a total bandwidth of 12.8Tbps. In total, the switch can support up to 32 ports of 400GbE (8 lanes of 56Gbps each), he said. “But with the ever-increasing appetite for bandwidth in the hyperscale data centers, switch vendors are looking to double the silicon bandwidth to 25.6Tbps. Traditionally this has been achieved by doubling the SerDes lane speeds, and the Optical Internetworking Forum (OIF) is working on defining a 112Gbps SerDes specification to enable this.”
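The bandwidth figures follow from simple lane arithmetic, sketched below. One caveat: the 12.8Tbps and 400GbE numbers add up if each 56G-class PAM-4 lane is counted at roughly 50Gbps of payload after encoding and FEC overhead; that interpretation is an assumption, not something stated above.

```python
# Lane arithmetic behind the switch-ASIC bandwidth figures quoted above.
# Assumption (mine): a "56Gbps" PAM-4 lane is counted at its ~50Gbps payload
# rate, which is how the 12.8Tbps and 400GbE (8x50G) figures add up.

LANES = 256
PAYLOAD_GBPS_PER_LANE = 50          # 56G-class PAM-4 lane, approximate payload rate
LANES_PER_400GBE_PORT = 8

total_tbps = LANES * PAYLOAD_GBPS_PER_LANE / 1000      # 12.8 Tbps
ports_400gbe = LANES // LANES_PER_400GBE_PORT          # 32 ports of 400GbE

# Doubling the lane rate to 112G-class (~100G payload) doubles the switch bandwidth.
next_gen_tbps = LANES * 100 / 1000                     # 25.6 Tbps

print(total_tbps, ports_400gbe, next_gen_tbps)
```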
The challenge at these speeds is that channel losses are very high and the SerDes needs a lot of equalization. Sophisticated DSP techniques are used to recover the signal, which can result in significant power dissipation. To account for this, platform designs will need to move to better PCB materials and use active cables and re-timers to keep channel losses manageable at these speeds, Vairavan suggested. “Another emerging technology is optics-on-board (OBO), where the optics chip is placed close to the switch ASIC on the board, reducing the electrical channel. OBO also offers better density and cooling by moving the optics away from the faceplate. OIF specifies many classes of SerDes, such as Long-Reach, Medium-Reach and Short-Reach, so that the right SerDes performance/power tradeoff can be made for a particular switch configuration.”
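One way to think about that tradeoff is as a channel-loss budget: pick the lowest-power SerDes class whose budget still covers the end-to-end insertion loss, and use re-timers or OBO to shorten the electrical channel when it does not. The sketch below illustrates the idea; the dB budgets and relative power numbers are placeholders, not values from the OIF specifications.

```python
# Illustrative reach-class selection by channel insertion loss.
# The dB budgets and relative power figures are placeholders chosen to show
# the tradeoff; consult the actual OIF CEI specifications for real values.

REACH_CLASSES = [
    # (name, assumed max insertion loss in dB, assumed relative power)
    ("XSR (in-package, die-to-optics)", 10, 0.3),
    ("VSR (chip-to-module)",            16, 0.5),
    ("MR  (medium reach)",              20, 0.7),
    ("LR  (long reach, chip-to-chip)",  28, 1.0),
]

def pick_reach_class(channel_loss_db: float):
    """Return the lowest-power class whose loss budget covers the channel."""
    for name, budget_db, rel_power in REACH_CLASSES:
        if channel_loss_db <= budget_db:
            return name, rel_power
    raise ValueError("Loss exceeds all budgets; add a re-timer or move to OBO")

# A re-timer or on-board optics shortens the electrical channel, letting a
# lower-power class be used for the same end-to-end link.
print(pick_reach_class(channel_loss_db=24))   # LR-class SerDes needed
print(pick_reach_class(channel_loss_db=9))    # XSR-class is enough
```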
While there are many pieces of IP needed to build one of these chips, four items typically lead the discussion – SerDes, HBM PHY, networking-class on-chip memory and TCAM.
“SerDes is required to implement high-speed off-chip communication,” said Mike Gianfagna, vice president of marketing at eSilicon. “The HBM PHY is needed to interface HBM memory stacks in a 2.5D package to the ASIC. Networking-class on-chip memories are things like two-port and pseudo two-port memories that are optimized for very high speeds, and TCAMs (ternary content-addressable memories) are used to implement efficient network packet routing.”
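To illustrate why a TCAM makes packet routing efficient, here is a minimal software model of a ternary lookup. In silicon, every entry is compared in parallel in a single cycle and a priority encoder picks the best match; the routing entries below are invented for the example.

```python
# Minimal software model of a TCAM doing longest-prefix-match routing.
# Hardware compares every entry in parallel; this loop only models the match
# semantics. The routes are made-up examples.

import ipaddress

# Entries stored longest-prefix first so the first hit is the best match,
# mirroring TCAM priority encoding.
ROUTES = [
    (ipaddress.ip_network("10.1.2.0/24"), "port 3"),
    (ipaddress.ip_network("10.1.0.0/16"), "port 2"),
    (ipaddress.ip_network("0.0.0.0/0"),   "port 1"),   # default route
]

def lookup(dst: str) -> str:
    addr = ipaddress.ip_address(dst)
    for prefix, next_hop in ROUTES:          # done in parallel in a real TCAM
        if addr in prefix:
            return next_hop
    raise LookupError("no matching entry")

print(lookup("10.1.2.77"))   # -> port 3 (most specific /24 wins)
print(lookup("10.9.9.9"))    # -> port 1 (falls through to the default route)
```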
There are two additional dimensions to the IP problem, Gianfagna noted. “First, silicon-proven, high-quality IP is important, but it’s not enough. The IP also must be verified to work together. Things like testability strategies, operating points and metal stack—this kind of compatibility significantly reduces integration risks. Second, the IP must be configurable with the end application in mind. That includes things like compiled memories and TCAMs to support different configuration requirements and programmable performance for the SerDes.”
Location, location, location
No single approach works for everything, though. What a systems company is looking for from its IP providers depends on the application the chip is being designed for.
“There are multiple ways to look at this,” said Manmeet Walia, product marketing manager for high-speed SerDes at Synopsys. “First, at a broader level, is what market you’re servicing, whether it’s enterprise, campus, data center — what are now being referred to as hyperscale data centers — or telecom infrastructure. If you put them in scale, it starts with enterprise, which are the smaller data centers. This moves out into the cloud providers — the Googles, the Facebooks, the Amazons — which is what is being referred to as hyperscale data centers. Finally, there are the service providers, the AT&Ts, etc. Depending on which of these three segments you’re servicing, the requirements are different.”
A second factor involves the specific functions needed by each of those companies. “Whether it’s a CPU chipset, a GPU, some accelerator, an adapter card, switch, storage array or security, depending on what function they have, they can again have different sets of requirements,” said Walia. “Third, from the PHY perspective, it matters where they are sitting within the system, whether it’s all within a blade card, whether they’re sitting on a mezzanine card, whether it’s top of the switch rack. So where they are residing within the chassis dictates their requirements. The market overall is very fractured because it’s getting complicated.”
Beyond this, there is another set of developers talking about chiplets as they push up against die-size and reticle limits, Walia said. “They want to get into chiplets now, and we’re getting requirements from customers who want to do what is being referred to as a USR (ultra-short-reach) SerDes. That’s another market that needs to be addressed.”
Today, the majority of networking design activity is in the cloud, much of which is being driven by artificial intelligence and machine learning applications. “What is interesting here is that all these web companies are now trying to follow that vertical integration model, where they’re even trying to do their own chipsets,” he said. “Whether it’s the Alibabas, Tencents and Baidus in China, or the Facebooks and Googles, all are attempting to do their own AI chipsets. They do not want to go with merchant silicon. So at least from an IP perspective, our business metric is not chip volumes. It’s more in terms of design starts, and that’s absolutely where we see the cloud guys driving most of our IP business.”
Farzad Zarrinfar, managing director of the IP Division at Mentor, a Siemens Business, agreed. “The basic processor doesn’t cut the mustard, so we see key OEMs, leaders of search, leaders of gaming, and leaders of communication all developing their own ASICs. And what goes into that ASIC obviously depends a lot on the application. Whether it is a data center application, an automotive application or an IoT application, we are seeing a lot of building blocks. If you’re looking at a multi-gigabit Layer 3 switch, for example, then you have 1Gig and 10Gig MACs (media access controllers). Some people include the transceivers, SerDes and PHYs inside the ASIC to further increase the level of integration and minimize cost. Others, based on their architecture, keep their PHYs and SerDes off-chip in a larger geometry. Then, for the purely digital portion, they go down to 10nm and 7nm, and are even moving down to 5nm finFET technologies.”
This pattern has emerged quickly over the past few years, Walia said. Initially, these companies were trying to do lower-end applications such as cameras, but now they are migrating into high-end datacenters where they’re trying to do more and more.
For these AI/ML applications, the chips mostly combine large amounts of SerDes with 64-bit Arm processors. “It’s an array of high-end Arm processors with SerDes around them,” Walia noted. “What they’re doing through these SerDes is enabling these cores to talk to each other at very high speed, and then these SerDes are also talking box-to-box with other similar devices. Essentially, it’s an input and an output, but what happens in between is an instruction set that allows the device to be trained over a period of time through repetition and by reading human behavior or other data, so it can self-program. It allows it to learn over a period of time, which is why they need massive processing power.”
Another aspect of networking design involves density requirements, which today translate into integration challenges, Walia said. “Integration challenges are becoming more and more important, and we have been talking to customers that want to integrate as many as 300 lanes of SerDes, even going up to 500 in some cases. That requires us as an IP vendor to provide a lot of services so they can integrate these SerDes. The area needs to be very small, i.e., the beachfront needs to be very narrow to fit more and more SerDes along the edges of the die. Also, we need to have the PHYs available in both orientations so that design teams can optimally place them on all four sides of the die. This is because below 28nm, TSMC does not allow us to flip the poly direction, which means we have to have two distinct designs, two distinct layouts, so that they can effectively put these SerDes on all four sides. Beyond that, we have to allow them to stack the SerDes multiple levels deep, basically two- or three-level-deep SerDes going inside the IP.”
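A back-of-the-envelope sketch of that beachfront constraint is shown below. The per-lane edge width, die size and row depths are invented numbers used purely to illustrate why designers resort to four-sided placement and multi-row-deep SerDes.

```python
# Back-of-the-envelope beachfront check for edge-placed SerDes lanes.
# The per-lane edge width, die size and row depths are invented illustrative
# numbers, not data from any vendor.

def beachfront_needed_mm(lanes: int, lane_edge_width_mm: float, rows_deep: int) -> float:
    """Die-edge length consumed if lanes are stacked 'rows_deep' from the edge."""
    return lanes * lane_edge_width_mm / rows_deep

die_edge_mm = 25.0                      # assumed reticle-limited die, 25mm per side
available_edge_mm = 4 * die_edge_mm     # SerDes placed on all four sides

for lanes in (300, 500):
    for rows in (1, 2, 3):
        need = beachfront_needed_mm(lanes, lane_edge_width_mm=0.7, rows_deep=rows)
        fits = "fits" if need <= available_edge_mm else "does NOT fit"
        print(f"{lanes} lanes, {rows} row(s) deep: {need:.0f}mm needed ({fits})")
```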
Along with this, OEMs are getting fancier with the packaging bump technology rather than going with standard C4 bumps, he noted. “Very likely they’re using interposer-type approaches, so they need to go wider, and bumps may need to have sacrificial pads or microbump technology. Essentially, a lot of bump customizations are required along with a lot of backend services. How do you put all these things together? How do you bring all these signals on the package substrate? How do you take them inside the SoC? How do you do timing closure at 1GB/second or similar speeds? All of that is a massive effort, so when we provide these IPs we have to provide a lot of guidelines around how to use them. Our motto in the past used to be that IPs are very hard to design but easy to integrate. That’s not true anymore. They are hard to design and very hard to integrate.”
Managing IP
Many of the advanced networking chipsets and ASICs being designed today target datacenters, where power, performance, and area are all critical factors. It’s also one of the most profitable segments for chipmakers.
“This is the cloud, where everything is going on,” said Ranjit Adhikary, vice president of marketing at ClioSoft. “Everybody wants to keep things in the cloud, and the market is growing there. Given that, every data center company wants to keep its costs down, so power becomes a really important component, along with reliability. When you’re talking about next-generation networking switches, when you define a platform you need to make sure that the I/O bandwidth and memory subsystems can all deliver the required performance. So you’re basically doing a plug-and-play of all the IPs you have, and you need to make sure they all use the same metal stack, reliability requirements, operating ranges, control interfaces—even the same DFT methodology. At that point an IP management platform becomes a serious component, because at the end of the day you want to be able to get the IPs you want, find out what the parameters are, download them, and check them without having to go through a long cycle to find out whether an IP works or doesn’t work, or where the supporting data for the IP is.”
Finding the various IP blocks and determining whether they’ve been used anywhere else is critical. “A lot of companies will have different PDKs for different foundries, and the design team needs to know if an IP has been foundry-verified, or if another group within the same company is using it,” said Adhikary. “You would like to find out the reliability of the IP. Specifically, has it gone into production? Have there been any problems with it? Ultimately it is a question of how well the power, performance and area are documented. What we find in most companies we go into is that the information is not all in one place, and it becomes important for the IP management system to be tied to the document control systems so that everybody is in sync. And this is just the beginning.”
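A hedged sketch of the kind of metadata check such a platform might automate is shown below. The fields and example records are hypothetical and are not ClioSoft’s data model.

```python
# Hypothetical sketch of an automated IP compatibility check of the kind an
# IP management platform could run before integration. Field names and the
# example records are invented for illustration.

from dataclasses import dataclass

@dataclass
class IPRecord:
    name: str
    foundry_node: str
    metal_stack: str
    dft_methodology: str
    voltage_range: tuple        # (min V, max V)
    silicon_proven: bool

def compatible(a: IPRecord, b: IPRecord) -> list:
    """Return a list of human-readable mismatches between two IP blocks."""
    issues = []
    if a.foundry_node != b.foundry_node:
        issues.append(f"foundry node mismatch: {a.foundry_node} vs {b.foundry_node}")
    if a.metal_stack != b.metal_stack:
        issues.append(f"metal stack mismatch: {a.metal_stack} vs {b.metal_stack}")
    if a.dft_methodology != b.dft_methodology:
        issues.append("different DFT methodologies")
    lo = max(a.voltage_range[0], b.voltage_range[0])
    hi = min(a.voltage_range[1], b.voltage_range[1])
    if lo > hi:
        issues.append("no overlapping operating voltage range")
    if not (a.silicon_proven and b.silicon_proven):
        issues.append("at least one block is not silicon-proven")
    return issues

serdes = IPRecord("112G SerDes PHY", "7nm", "13M_2Mx", "scan+MBIST", (0.72, 0.88), True)
tcam   = IPRecord("Networking TCAM", "7nm", "13M_2Mx", "scan+MBIST", (0.72, 0.88), True)
print(compatible(serdes, tcam) or "no compatibility issues found")
```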
Looking ahead
As cloud service providers look beyond 25.6Tbps switches to 51.2Tbps, it seems unlikely that traditional scaling techniques will suffice.
“Switch ASICs are implemented in advanced technology nodes to leverage the increased density and lower power, but they are hitting reticle and yield limits,” Walia said. “One option that is rapidly gaining popularity is die disaggregation, where a large monolithic die is broken up into manageable die sizes and then integrated inside a multi-chip module (MCM) package using a high-speed, low-power in-package interconnect. The other option is to separate the logic die and the SerDes and put them together in an MCM. The greater optical density needed for 51.2Tbps can be achieved by shifting to in-package optics, where optical dies are integrated with the switch ASIC die in an MCM for a heterogeneous system-in-package (SiP). The fibers can then be routed out to the faceplate or to pigtails.”
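The yield half of that argument can be made concrete with a simple defect-density model. The sketch below uses a Poisson yield model, Y = exp(-A·D0), with an assumed defect density, purely to illustrate why several smaller dies can out-yield one reticle-sized die.

```python
# Why die disaggregation helps yield: a simple Poisson yield model,
# Y = exp(-A * D0), with an assumed defect density D0. The numbers are
# illustrative only, not foundry data.

import math

D0 = 0.1                      # assumed defects per cm^2

def die_yield(area_cm2: float) -> float:
    return math.exp(-area_cm2 * D0)

monolithic_area = 8.0         # ~800 mm^2, near the reticle limit
chiplet_count = 4
chiplet_area = monolithic_area / chiplet_count   # ignoring interconnect overhead

# With known-good-die testing before MCM assembly, only failed chiplets are
# scrapped, so usable silicon per wafer scales with the per-die yield.
print(f"monolithic die yield: {die_yield(monolithic_area):.1%}")   # ~44.9%
print(f"per-chiplet yield:    {die_yield(chiplet_area):.1%}")      # ~81.9%
```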
Switch ASICs and high-speed optical and electrical interconnects are the backbone of cloud-era datacenters. To meet the explosive growth in bandwidth in these datacenters, switch ASIC makers, optical and interconnect IP vendors and standards bodies will need to build ecosystems. The future revolves around heterogeneous solutions that will meet new performance, power and scalability requirements, tailored for specific applications and unique data flows.
Related Stories
Machine Learning Shifts More Work To FPGAs, SoCs
SoC bandwidth, integration expand as data centers use more FPGAs for machine learning.
Data Center Power Poised To Rise
Shift to cloud model has kept power consumption in check, but that benefit may have run its course.
M2M’s Network Impact
Why new architectures are required as machines begin talking to machines.