Will Co-Packaged Optics Replace Pluggables?

New options open the door to much faster and more reliable systems.

As optical connections work their way deeper into the data center, a debate is underway. Is it better to use pluggable optical modules or to embed lasers deep into advanced packages? There are issues of convenience, power, and reliability driving the discussion, and an eventual winner isn’t clear yet.

“The industry is definitely embracing co-packaged optics,” said James Pond, principal product manager for photonics at Ansys. “The reason it’s going to come is not necessarily to replace data center transceivers, but to enable everything else that people want to be able to do.”

Pluggable optics are the default today, a position that grew out of their use in long-haul links. Co-packaged optics (CPO) hold some promise, but that technology needs to overcome reliability concerns before it can be fully embraced. Once it achieves commercial status, it may usher in new applications rather than displacing pluggables.

For many years, the predominant application for optics has been for long-haul communications. “Optical transceivers probably wouldn’t have taken off in the same way without something like pluggable transceivers,” said Pond.

Pluggable transceivers typically generate their laser light with VCSELs embedded in the module. The form factors for those modules have evolved, but the inherent convenience of plugging in a fiber connection hasn’t changed.

“Pluggables have been very, very successful,” said Pond. “The reason is that they’ve been highly modular. As long as they meet the specifications for those communications standards that you’re targeting, you can swap them in and out from any vendor.”

Fig. 1: A quad small-form-factor pluggable (QSFP) module. Source: Jesse Schulman/CC BY-SA 3.0

Data centers are evolving, however, with increasing bandwidth demands. In addition, there’s a growing need to move data not just within a rack, but across the entire extent of a data center — or even to another data center to improve scalability. Optics are attractive for this application, but the distances are far shorter than what’s required for long-haul communications.

Data center power is always a concern, and data movement is a strong contributor to that power. “Data centers have been growing at a rate of 50% or more per year,” observed Andy Bechtolsheim, chairman and chief development officer at Arista Networks, in a presentation at this year’s Hot Interconnects conference. “And the network power per bit has declined at roughly half that rate, which does imply that the amount of power being used up at the network, including the optics, is growing at an unhealthy rate.”
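One rough way to read those two rates together: if traffic grows 50% a year and power per bit falls at half that rate, total network power still compounds upward. The Python sketch below assumes that “half that rate” means a 25% yearly decline in power per bit and that the 50% figure applies to traffic. Both are interpretations of the quote, and the function name is purely illustrative.

```python
def yearly_network_power_growth(traffic_growth, power_per_bit_decline):
    """Net yearly change in total network power, as a fraction.

    Total power scales with (bits moved) x (power per bit), so the two
    yearly factors multiply.
    """
    return (1 + traffic_growth) * (1 - power_per_bit_decline) - 1


# Assumed reading of the quote: +50% traffic, -25% power per bit, per year.
growth = yearly_network_power_growth(0.50, 0.25)
print(f"Network power grows about {growth:.1%} per year")

# Compounded over five years under the same assumptions.
five_year_factor = (1 + growth) ** 5
print(f"After five years: about {five_year_factor:.1f}x the starting power")
```

Under those assumptions, network power rises about 12.5% a year, or roughly 1.8x over five years, even though each bit gets cheaper to move.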

This has helped to generate a debate on the best way to provide laser power. By default, pluggable optics are the incumbent technology. And so-called “on-board optics” (OBO) are a first step toward moving optics deeper into a server or other system. But integrating a laser source inside a chip package has fired imaginations, and now it’s getting a hard look from developers.

Getting signals in to where they’re needed
With pluggable transceivers, an optics module can be plugged into the side of the system, and that’s pretty much all that’s needed. The laser source is in the module, so it’s not part of the computing system. “If one of them breaks, you just unplug it and plug in a new one,” said Pond.

But the place where the signal is plugged isn’t where the signal is used for computing. “The downside of the pluggable transceiver is that you have to electrically communicate from the chip or board up to the pluggable transceiver,” explained Pond. Given today’s speeds, that connection is typically an electrical SerDes link.

“You may also have retiming issues, depending on what the standard is for the pluggable transceiver,” Pond said. “So you’re starting to consume a lot of power in that process, and as you go to higher speeds you have more RF losses just getting to the front plate of the pluggable transceiver.”

With CPO, the laser source can be placed inside the same package as critical computing electronics. I/O chiplets can serve to convert received laser light into electrical signals, or to drive optical signals out based on electrical inputs that need to move an exceedingly short distance from the neighboring ASIC.

“You bring the fiber completely into the system, very close to the switch ASIC,” explained Twan Korthorst, director of photonic solutions at Synopsys. “And you create optical I/O chiplets close to the switch ASIC that receive the optical connection.”

In this scenario, the fiber drives all the way into the server, eliminating the need for the electrical SerDes connection. That holds the promise of lower overall system power while still contributing to increased bandwidth.

Fig. 2: On the top, a QSFP module plugs into the edge of the system, where the signals are converted to electrical and sent to the ASIC over a SerDes link. The bottom shows a co-packaged option, where the fiber runs all the way to the advanced package, where the I/O chiplet makes the electrical conversion for immediate delivery to the adjacent ASIC. Source: Bryon Moyer/Semiconductor Engineering

But because such a laser is buried deep within a package that’s mounted on a board, a failed laser means replacing the entire board — an expensive prospect compared with the relative ease of switching out a pluggable module.

Generating laser light inside a package
There are a variety of options for generating laser light, but none of them involves silicon. While silicon can be used to direct and modulate laser light, it is an indirect-bandgap material, which means that it can’t readily create laser light. Some other material must be used — typically a III-V substance like indium phosphide (although not all III-V materials have direct bandgaps).

“People want on-chip lasers, but silicon can’t bring in light,” said Korthorst. “So you have to have some heterogeneous-integration solution where you can put material on silicon.”

Bulk semiconductor sources like VCSELs are giving way to smaller sources like quantum wells and quantum dots. “Quantum wells have higher gain and higher power output [than quantum dots],” said Jigesh Patel, technical marketing manager at Synopsys. “But if you are doing coherent modulation, then you don’t need that much power.”

Quantum dots are proving to be more robust and temperature-stable, but they’re less efficient. While the power delivered by all of these sources is lower than that from long-haul lasers, that’s not considered a problem. “Distances are completely different when you go from one side of the data center to the other side or from one data center to another,” said Korthorst. “So you can live with relaxed optical power budgets.”

Long-term, there are research projects trying to determine whether materials added to silicon can get it to lase. “People are investigating if they can find a way to lase light from silicon by doping with germanium or erbium,” said Patel. “But I haven’t seen any big deployment or commercial interest.”

Another option is hetero-epitaxy, where the lasing material is literally grown atop silicon to create a quasi-monolithic structure. “By first creating a small edge trench with a certain shape, you can relax mechanical stresses when you have these mismatched materials growing on top of each other,” said Korthorst. Again, this is still a research topic.

Power considerations
A significant consumer of energy is the generation of laser light itself. For long-haul applications, that power needs to be high enough for the signal to travel a long distance.

“The original motivation behind CPO was to reduce electrical power by changing the electrical interface from a high-powered service, like LR [long reach], which is the standard service on chips today, to a much lower-power service that would be just strong enough to drive the CPO on the multi-chip carrier, and then go optical from that chip to the front panel,” said Bechtolsheim.

An attempt was made to define an “extremely (or extra) short-reach,” or “XSR,” standard that would need about 20% lower laser power. “This actually didn’t happen,” he continued. “The service would have required a full dedicated five-nanometer tape-out for both the switch chips and the DSPs. And there just hasn’t been the economics to make that happen.”

Instead, a very-short-reach (VSR) approach was taken for a 15% power savings, with a 3-nm development further reducing power below the prior 7-nm generation. The move to coherent modulation holds the promise of further lowering power.

“The power situation is improving by roughly a factor of two for every two process generations,” said Bechtolsheim. “And this is true for the optics as well.”

But these power improvements apply whether pluggables or CPO are used. So while they are useful for lowering data center power, they don’t necessarily move the needle toward one solution or the other.
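To put the quoted trend in per-generation terms: halving power every two process generations works out to roughly a 29% reduction per generation if the improvement is spread evenly. The sketch below assumes that even spread, which the quote does not state, and `relative_power` is a hypothetical helper name.

```python
def relative_power(generations, halving_span=2):
    """Power relative to the starting node after a number of generations.

    Assumes power halves every `halving_span` generations and that the
    improvement is spread evenly across generations.
    """
    return 0.5 ** (generations / halving_span)


for gen in range(5):
    print(f"generation +{gen}: {relative_power(gen):.3f}x starting power")
```

The same curve applies to pluggables and CPO alike, which is why it does not favor either approach.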

The one beneficial power change that only CPO brings is the elimination of the SerDes link. “That SerDes uses a lot of electrical power,” noted Korthorst.

Energy consumption grows with speed, although moves to more aggressive silicon nodes can improve things. Removing that link should give CPO the nod from an overall system power standpoint.

One way to retain modularity while co-packaging the transceiver is to move only the laser to the edge of the system, but this has negative power consequences.

“To achieve high availability for CPO, people have adopted the notion of an external light source — a pluggable module that’s on the front panel that can be replaced if the laser in that module fails,” said Bechtolsheim. “There are additional coupling losses compared to a conventional pluggable solution, which actually requires additional laser power.”

Dust becomes yet another issue for this setup. “You most likely also need expanded beam connectors to avoid dust contamination, and expanded beam connectors have higher losses than conventional single mode connectors,” Bechtolsheim explained. “The combination of the splitter and the polarization give losses between 0.6 to 1.2 dB per connector. By the time you have these additional connectors, you end up with roughly 2 dB extra loss on the optical side, which increases the laser power by 50% compared to a pluggable optical module.”
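The link between decibels of loss and extra laser power is a standard conversion: a loss of L dB requires 10^(L/10) times the optical power to deliver the same level at the far end. The Python sketch below applies that conversion to the loss figures quoted above. It assumes the laser must fully make up the added loss, and the function name is illustrative.

```python
def db_to_power_ratio(loss_db):
    """Factor by which optical power must rise to offset a loss in dB."""
    return 10 ** (loss_db / 10)


# Loss figures taken from the quote: 0.6 to 1.2 dB per connector, ~2 dB total.
for loss_db in (0.6, 1.2, 2.0):
    extra = db_to_power_ratio(loss_db) - 1
    print(f"{loss_db} dB loss -> about {extra:.0%} more laser power")
```

The exact conversion for 2 dB gives about 58%, in the same range as the roughly 50% cited.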

Reliability considerations
Reliability issues with CPO can be expensive to manage. “If the laser is on-chip and it fails, you have to replace the whole board,” said Korthorst.

That would make no economic sense. “From an overall reliability, manufacturability, and serviceability model, you don’t want to replace a very high-performance, high-powered system just because a laser has failed,” said Bechtolsheim.

Long-reach lasers tend to be reliable. But shrinking and integration can create issues, driven by electro-optical and thermal effects. Some of the issues can be managed, but the integration makes that more difficult. Excess heat also can affect the performance of the co-packaged electronics if not properly handled.

The smaller quantum lasers tend to have low yields in the first place, with as many as 50% of them failing to work properly. Then there’s an aging issue — and, in particular, high infant mortality. This is where the advantage of a pluggable module becomes clear.

“You manufacture 1,000 lasers of the same design, but out of 1,000, only a certain number will meet your design specs,” said Synopsys’ Patel. “And out of those, over a certain period of time, output power will drop in half. That’s just the nature of quantum mechanics.”

Reliability is, of course, important for any part of a server system — and yet we don’t have modular options for CPUs or GPUs, for example. “If an interconnect breaks between CPU and memory, you might just buy a new motherboard,” observed Pond.

The difference is that the reliability of lasers is significantly lower than that of the other components. If that reliability could be increased to be on par with the other chips, then modularity would be much less of a consideration. “Either you make it more modular, or you make it much more reliable,” said Pond. “We have a long way to go to get to that level of reliability.”

Thermal issues complicate matters. “Optics is very sensitive to heat, so you’re constantly thermally tuning,” said Pond. “We use thermal effects to tune the optics to keep them working. But the problem is that, as everything else starts to heat up and heats up the optics, then your tuning can start to get out of control, and you have to start consuming a ton of power just to keep heating up the optics to keep it all in tune.”

Quantum dots may provide some relief. “The roughness of the material and the lasing angle determine reliability, and quantum dots are more insensitive to these variables,” said Patel. “Quantum-dot advocates say that reliability issues can be overcome by the benefits of quantum dots.”

While the fundamental reliability mechanisms are researched, redundancy is being explored with many of the laser options. The idea is to provide multiple lasers, with only one of them operating at a time. If that one fails, then a spare can be fired up for continued operation without having to remove the board.
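The appeal of spares can be seen with a deliberately simple model: if each laser fails independently with some probability over the service life, the link is lost only when every laser has failed. The sketch below uses a made-up 10% failure probability purely for illustration. It ignores switch-over failures and correlated failures (for example, from shared heat), so real results would be less favorable.

```python
def link_survival_probability(p_fail, n_lasers):
    """Chance that at least one of n independent lasers still works.

    p_fail is the probability that a single laser fails over the period
    of interest; all n must fail for the link to go down.
    """
    return 1 - p_fail ** n_lasers


P_FAIL = 0.10  # hypothetical per-laser failure probability, not from the article

for n in (1, 2, 3):
    print(f"{n} laser(s): {link_survival_probability(P_FAIL, n):.4f} survival")
```

In this idealized case, each added spare cuts the chance of losing the link by another factor of ten.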

It’s easy to confuse this redundancy with a recent approach from Ranovus that creates roughly 30 million quantum dots on a single chip. They provide multiple colors for deep wavelength division multiplexing (DWDM), but that still leaves a huge number of dots per color.

“They’re stacked on top of each other,” said Hamid Arabzadeh, CEO of Ranovus. “And these are self-assembling dots, so there’s no mask for it.”

Here, positioning of the laser can depend on the number of channels being generated. “If you go beyond 32 channels, customers want the laser to be outside,” Arabzadeh said. “If it is less than 32 channels, then we can attach the laser on top of the silicon photonics. For the internal laser source, we have cavities inside our monolithic chips that the laser gets dropped into. When you drop the laser inside, the light gets coupled from the edge of the laser into the photonics.”

Fig. 3: The top image shows a typical co-packaged configuration, with electronics mounted atop the optics, which sit on a substrate that holds the complete optical engine. The bottom shows an integrated chip that includes the driver, transimpedance amplifier (TIA), and optics. The optics use a huge number of quantum dots to generate multiple wavelengths for DWDM signals. Source: Ranovus

These aren’t redundant lasers in the sense of having only one on with spares waiting in the wings. Instead, the dots are aggregated to give higher power, and their sheer numbers mean the source can withstand the failure of some of them. They are designed to be coherent, so the combination acts as if it were one coherent laser.

Ranovus says it improves reliability by running its chips through burn-in as part of the normal production cycle to eliminate the infant-mortality fallout before shipping. In addition, quantum dots’ better temperature stability can simplify some of the laser tuning as temperatures fluctuate.

Raising our perspective from just the laser to the entire system, eliminating the SerDes link can help to improve overall reliability. “Right now when transmitting a signal, you’ve got this complex copper interconnect to get to the pluggable,” said Pond. “And then, on the receiving end, you’ve again got a complex copper interconnect. Now you’ve got three interconnects that you could replace with one. You potentially increase overall system reliability due to the fact that you have fewer pieces in your link.”

One-design-fits-all vs. custom design
CPO presents one additional challenge — analysis of the entire contents of the advanced package must be performed for each chip/laser/package combination. These packages have many effects to take into account, making multi-physics tools necessary for simulation and analysis. That includes electronic analysis, optical analysis, signal integrity, thermal analysis, and the ability to model fundamental physics — notably, quantum effects.

“You’re adding a whole new layer of physics into an already complex set of electrical and thermal system problems,” noted Pond. “And now you add optics on top of that. It just makes it more challenging.”

More advanced lasers can make matters worse. “Simulating quantum dot lasers is a lot more complicated than more traditional MQW (multi-quantum-well) edge-emitting lasers,” he added.

Similar challenges apply when modeling an entire subsystem. “What happens in the modeling and simulation world is that when we need to simulate a combined link that has electrical, optical, and electrical components, now we need to figure out how to model the behavior of the optical transmitter and the fiber in the receiver, which is fundamentally different from the way electronic circuits work,” said Todd Westerhoff, product marketing manager, PCB division at Siemens EDA.

That can complicate the design of a CPO solution. While those issues are also important for pluggables, the modularity means that it can be done once for a module, and that work will serve for all of the applications that use the module. CPO breaks that modularity, meaning that each advanced-package application must include detailed analysis as a part of the design flow.

Incumbent vs. upstart
Given that pluggable modules are already in use, their obvious convenience means they have supporters. SerDes elimination appears to be the only strong power story that can knock pluggables out of the socket.

There is a certain attractiveness, however, to CPO in terms of possible efficiency and overall system power. In addition, the removal of SerDes connections simplifies the system design and improves system reliability. But that comes at the cost of a more complex packaging scenario, so it’s not yet clear whether it will net out as an overall simpler approach.

“I still see pluggables sticking around for a long time because of their modularity and ease of use,” said Pond. “CPO is going to enable all kinds of other possibilities. That will probably start with niche applications like, for example, ultra-high-performance computing where you say, ‘I desperately need this speed of access to this much memory, and I’m willing to pay a lot of money for it.’”

CPO also may be an enabler for disaggregated data-center architectures. Whether it will gradually make inroads against pluggables remains an open question. “In the longer term, they may completely supplant the pluggables,” said Pond. “But that would take a long time. Optics takes away two major problems. One is that the losses do not increase once you go faster. And the length through which you propagate doesn’t change the power loss. Once you solve these problems at both ends, you can transmit at any speed over almost any distance with no additional cost. That means the future of interconnect is going to be all optical.”

The question, then, is what kind of optics. For now, CPO appears to have some proving to do before it can really give pluggable optics a run for their money. But CPO promises to expand the role of optics within the system, making it a likely development, even if it means co-existing with pluggables.
