Increasing Performance With Data Acceleration

Inline acceleration boosts performance for radio access networks.


Increasing demand for functions that require a relatively high level of acceleration per unit of data is providing a foothold for inline accelerator cards, which could mean new opportunities for some vendors and a potential threat for others.

For years, either CPUs, or CPUs with FPGA accelerators, met most market needs. But the rapid increase in the volume of data everywhere, coupled with the need to process it in multiple places — at the end point, the edge, and in the cloud — has made faster movement of that data a priority. The big questions now are how to best approach this problem, and whether new technology will displace or replace existing solutions.

One approach is hardware acceleration, or the use of more silicon to accelerate data processing and relieve some of that burden from the CPUs. Advocates of this approach have noted that Moore’s Law cannot accurately predict the evolution of computing in the cloud era, meaning CPUs cannot scale as effectively as hyperscalers and network operators want them to. If this is true, it poses a challenge to the company Gordon Moore co-founded. Today, Intel supplies many of the CPUs running the servers that handle cloud workloads, and it dominates the emerging virtualized broadband networks.

Intel says its CPUs and FPGAs are fully capable of accelerating data as needed, and the chip giant has yet to introduce an inline accelerator card to process data before it reaches the CPU. Meanwhile, many of Intel’s competitors are charging into this market, claiming off-the-shelf servers need some help with certain workloads.

Inline accelerators are among the assets AMD acquired through its recent $49 billion purchase of Xilinx. Jamon Bowen, AECG DCCG product planning director at AMD, said the company is targeting network virtualization offload in radio access networks, as well as other complex use cases such as high-performance computing, security offload, financial trading, and video encoding.

While inline acceleration is not new, Bowen noted that custom inline processing traditionally has driven demand for FPGAs. “It is new for us to have standard products working with more defined software frameworks,” he said.

Radio access networks may need hardware acceleration to evolve
Virtualized radio access networks are one of the primary use cases for inline acceleration. The RAN has lagged behind telco core networks when it comes to virtualization, largely because of its compute-intensive Layer 1 functionality. A number of traditional network equipment vendors, as well as new market entrants, are working on software that can replace network appliances, often with standardized open interfaces.

Stefan Pongratz, vice president and industry analyst at Dell’Oro Group, said RAN virtualization has proven successful for some narrowband deployments. “We have not seen that same success extended to wideband. The performance gap is significant there, and so is the cost.” He predicts RAN virtualization in 5G will be “more similar to traditional RAN, which uses optimized dedicated silicon.”

Marvell, Qualcomm and AMD are bringing inline accelerator cards to market with telco and cloud customers in mind. AMD is positioning its cards, which include FPGAs as part of the design, as data center workhorses. Marvell and Qualcomm, meanwhile, are directly targeting wireless network operators with ASICs.

These developments are consistent with efforts in other markets, where an emphasis on performance and low power is prompting chipmakers to develop more customized solutions. So rather than relying on just a CPU or GPU, many of these chips are being designed with a variety of processing elements, including some that are specific to an application.

For baseband processing, the math is far too complicated for a general-purpose CPU, said Joel Brand, senior director of product marketing at Marvell. The company’s baseband processor line includes a suite of 5G Layer 1 inline hardware accelerators, along with an Arm Neoverse core.

Fig. 1: 5G complexity at the physical layer of the network stack. Source: Marvell

“Think of this as a 5G NIC,” said Peter Carson, Marvell senior director for solutions marketing. “It’s similar to what has been used for years on smart NICs in the cloud. What’s new to the cloud implementation is the Layer 1 5G baseband.”

Qualcomm also introduced a PCIe inline accelerator card, built around an ASIC, that is designed to boost performance of cloud-native and virtualized 5G network deployments by offloading compute-intensive 5G baseband processing from server CPUs.

Gerardo Giaretta, senior director of product management at Qualcomm, described the card in a press briefing ahead of Mobile World Congress. “The main brain from a processing perspective is our Qualcomm distributed unit chipset that is inside the card,” he said, describing it as an SoC that combines DSPs, Arm CPUs, and integrated front-haul NIC functionality. (Front-haul is the connection between the baseband and the radio.)

“In order to maintain a decent level of cost and power for the solution, you need these accelerators,” said Giaretta. “There are cases where you go to higher-order massive MIMO, where the difference is significant.” He noted that a dense processing solution is also helpful to network operators when they choose to pool baseband units in a central location.

“Qualcomm is a new entrant to the sub-6 GHz macro silicon domain,” said Dell’Oro’s Pongratz. “Marvell’s baseband silicon is proven in the field.”

In fact, Marvell claims five customer wins so far with network operators and cloud infrastructure providers. The only one the chipmaker has been cleared to name is Dell, which has partnered with Marvell to integrate the accelerator into its server hardware.

Qualcomm, meanwhile, has teamed up with HPE. Geetha Ram, worldwide head of Open RAN at HPE, described her company’s work with Qualcomm as strategically important. “When you combine the … low power, high performance card with a server that is edge-suited, which supports a very high density, low power consumption, it’s kind of like a marriage made in heaven,” she said during a recent presentation.

Ram said HPE already has deployed several thousand customer RAN sites using a lookaside architecture, where Layer 1 functionality is split between server CPUs and PCIe cards that return their results to the CPU, and network connectivity is handled by a separate card. Her team compared the performance, price and power consumption of these deployments against inline acceleration, running distributed RAN and centralized RAN traffic scenarios with the same server CPU types in all cases. The research indicated total cost savings could approach 60% with inline acceleration, she said.
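The difference between the two data paths can be sketched in a few lines. The Python model below is a simplified illustration under stated assumptions, not a description of HPE’s, Qualcomm’s, or Marvell’s implementation: the stage names are a hypothetical uplink Layer 1 chain, and the only thing it tracks is how often data crosses the PCIe bus into or out of host memory. In the lookaside case, fronthaul traffic arrives on a separate NIC and each offloaded stage is a round trip to the card; in the inline case, the card terminates fronthaul and runs all of Layer 1, so only its output reaches the CPU.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified uplink Layer 1 chain; stage names are illustrative only.
L1_STAGES = ["fft", "channel_estimation", "equalization", "demodulation", "fec_decode"]


@dataclass
class DataPath:
    cpu_stages: list[str] = field(default_factory=list)
    card_stages: list[str] = field(default_factory=list)
    pcie_transfers: int = 0  # crossings into or out of host memory


def lookaside(offloaded: set[str]) -> DataPath:
    """CPU runs Layer 1 and hands selected stages to a PCIe card, which returns results."""
    path = DataPath(pcie_transfers=1)  # separate NIC moves fronthaul samples into host memory
    for stage in L1_STAGES:
        if stage in offloaded:
            path.card_stages.append(stage)
            path.pcie_transfers += 2  # input goes to the card, result comes back to the CPU
        else:
            path.cpu_stages.append(stage)
    return path


def inline() -> DataPath:
    """Card terminates fronthaul and runs all of Layer 1; only its output reaches the CPU."""
    return DataPath(card_stages=list(L1_STAGES), pcie_transfers=1)


if __name__ == "__main__":
    la = lookaside({"fec_decode"})
    il = inline()
    print("lookaside:", len(la.cpu_stages), "CPU stages,", la.pcie_transfers, "PCIe transfers")
    print("inline:", len(il.cpu_stages), "CPU stages,", il.pcie_transfers, "PCIe transfers")
```

Adjacent offloaded stages are counted as separate round trips, a deliberate simplification; real systems batch and pipeline transfers. The point of the sketch is only that a lookaside design with FEC offload still leaves most Layer 1 stages, and their CPU load, on the host.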

Marvell and Dell also are promising benefits for network operators with their combined solution. Their PCIe card is designed to turn off-the-shelf servers into virtual distributed units (vDUs) in operator networks. It includes a front-haul NIC, and handles timing, synchronization, beamforming, and all other vDU Layer 1 functions.

Andrew Vaz, vice president of product management in Dell’s telecom systems business unit, said Layer 1 functionality currently consumes up to two-thirds of the CPU’s processing capacity in a typical vDU. Freeing up that capacity will give telcos and cloud service providers a way to use servers for purposes unrelated to the RAN, he said.
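As a rough worked example of what Vaz’s upper bound implies, the sketch below converts a Layer 1 share of CPU time into cores that could be returned to other work. The 24-core server is a hypothetical figure chosen for round numbers, not a Dell or Marvell specification, and the model assumes Layer 1 load maps linearly onto cores and disappears from the host entirely once offloaded.

```python
from fractions import Fraction


def cores_freed(total_cores: int, l1_share: Fraction) -> Fraction:
    """Cores made available for non-RAN work if the Layer 1 share of CPU time is offloaded.

    Assumes (hypothetically) that Layer 1 load maps linearly onto cores and that
    offloading removes it from the host completely.
    """
    if total_cores < 0:
        raise ValueError("total_cores must be non-negative")
    if not 0 <= l1_share <= 1:
        raise ValueError("l1_share must be between 0 and 1")
    # Fraction keeps the arithmetic exact, avoiding float rounding on shares like 2/3.
    return total_cores * l1_share


if __name__ == "__main__":
    upper_bound = Fraction(2, 3)  # "up to two-thirds," per Dell's Andrew Vaz
    hypothetical_cores = 24       # illustrative server size, not a vendor figure
    print(cores_freed(hypothetical_cores, upper_bound))  # prints 16
```

Because two-thirds is an upper bound, a given vDU would likely free less, and some host work, such as managing the accelerator card itself, would remain.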

What’s next?
As enterprises invest in private cellular networks, virtual RAN servers could be prime candidates for edge compute workloads, if they have the capacity. Inline acceleration could be one way to accomplish this. Not only does each card offload a significant amount of processing from the CPU, but multiple cards can be mapped to a single CPU, Vaz said.

“You functionally have freed up a large amount of performance capability on those servers with these offload cards to go and really offer some very interesting architectures and massive cost savings,” Vaz told reporters.

Some of those cost savings would be realized by deploying fewer servers. Intel, which has dominated the virtual RAN space with its x86 FlexRAN solution, argues this is the wrong approach. While the company has products that support inline acceleration, it considers inline accelerator cards “inflexible” and contends that virtualized radio access networks should accelerate workloads as needed.

“Our approach is to take into account the entire system to determine what enhancements to build into the CPU instruction sets, and where to place the acceleration, so that customers get the best combination of power, performance, and flexibility,” said Sachin Katti, CTO of Intel’s network and edge group and co-chair of the O-RAN Alliance’s technical steering board. “For our chips with integrated acceleration, we are integrating on-die FEC acceleration with Xeon cores that have 5G-specific instructions to simultaneously deliver the benefits of inline acceleration with the flexibility of general-purpose CPUs. These options provide flexibility to our customers in how they map the right acceleration hardware to the right part of the vRAN workload, and thus best meets the needs of the operators versus a solution that forces an entire Layer 1 into an inflexible hardware accelerator.”

Intel, which bought FPGA maker Altera in 2015, advocates the use of FPGA-based acceleration, claiming programmability increases agility for network operators. The company validates and qualifies FPGAs offered by various OEMs. One of the big advantages of programmable logic is that it can evolve as acceleration needs change, or as new protocols or technologies are introduced or updated. The company also has created an open FPGA stack, delivered via Git repositories.

So far, Intel has inked more than a dozen partnerships for FPGA-based accelerator solutions. Affirmed Networks (owned by Microsoft), Altiostar (soon to be owned by Rakuten), Algo Logic, Arrive Technologies, Benu Networks, Bigstream, Cast, CTAccel, F5, Juniper Networks, Megh Computing, Napatech and rENIAC all support workload acceleration via Intel FPGAs.

Intel also has broad OEM support from major server manufacturers, including those that have partnered with makers of inline accelerator cards. Dell and HPE both build servers that use Intel’s FPGA-based acceleration cards, as do Fujitsu, Quanta Cloud Technology, Supermicro, Inspur and Kontron.

Intel says its FPGAs can be used in radio access networks to perform lookaside acceleration for forward error correction. The company noted that the same FPGA that performs forward error correction can be used to perform inline front-haul compression in the RAN, as well as virtual switching and routing in the network core. Other uses of the card include inline acceleration of IPsec crypto functions and DDoS mitigation.
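Front-haul compression reduces the bandwidth needed to carry IQ samples between the radio and the baseband. One scheme the O-RAN front-haul specification defines is block floating point, in which a block of samples shares a single exponent. The Python below is a simplified sketch of that idea, not Intel’s FPGA implementation; the 9-bit mantissa width and the sample values are illustrative assumptions, and the bit packing and framing a real implementation needs are omitted.

```python
def bfp_compress(block: list[int], mantissa_bits: int = 9) -> tuple[int, list[int]]:
    """Block floating-point compression: one shared exponent plus signed mantissas.

    The exponent is the smallest right shift that brings the largest sample
    magnitude within the positive range of a signed `mantissa_bits` integer.
    """
    limit = (1 << (mantissa_bits - 1)) - 1  # largest positive mantissa, 255 for 9 bits
    max_mag = max((abs(v) for v in block), default=0)
    exponent = 0
    while (max_mag >> exponent) > limit:
        exponent += 1
    # Arithmetic right shift drops the low-order bits of every sample in the block.
    return exponent, [v >> exponent for v in block]


def bfp_decompress(exponent: int, mantissas: list[int]) -> list[int]:
    """Scale the mantissas back up by the shared exponent."""
    return [m << exponent for m in mantissas]


if __name__ == "__main__":
    # Hypothetical 16-bit I/Q values, interleaved I, Q, I, Q, ...
    samples = [1000, -1000, 3, 0, -32768, 32767]
    exp, mant = bfp_compress(samples)
    restored = bfp_decompress(exp, mant)
    print("exponent:", exp)
    print("max error:", max(abs(a - b) for a, b in zip(samples, restored)))
```

Because one large sample forces a large exponent for the whole block, smaller samples in the same block lose precision; keeping blocks small, typically the samples of one resource block, limits that effect.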

Still, operators may not get maximum performance from virtualized networks using existing solutions. HPE’s Geetha Ram said queries from network operators asking how to boost hardware performance led to her company’s decision to partner with Qualcomm on an inline accelerator card. Likewise, Dell pointed to better performance as the leading benefit that operators could expect from integrating Marvell’s inline accelerator.

“This is a step in the right direction that will hopefully address some of the shortcomings,” Dell’Oro’s Pongratz said of the new dedicated acceleration hardware. He added that if operators want to deploy virtualized massive MIMO, some customized hardware will be necessary.

Open questions
The opening of network interfaces is a new wrinkle in the tug-of-war between Intel and its new vRAN rivals. Intel’s hardware and software are present in most virtualized environments now, and they will continue to play a central role for many. But if chipmakers and their OEM partners continue to deliver solutions with open interfaces, inline accelerator cards will become one more option in an increasingly heterogeneous hardware world.
