What’s After PAM-4?

Second of two parts: Parallel vs. serial options


[This is part 2 of a 2-part series. Part 1, which addressed serial connections and alternatives, can be found here.]

The future of high-speed physical signaling is uncertain. While PAM-4 remains one of the key signaling schemes today, there is widespread debate about whether PAM-8 will succeed it.

This affects everything from where the next bottlenecks are likely to emerge, and the best approaches to solving them, to how chips, systems and packages are designed.

“PAM-8 is not right now being that actively considered,” said Saman Sadr, vice president of product marketing for IP cores at Rambus. “The amplitude that we can reliably transmit is about 1V peak-to-peak.”

Go too far above that and you risk gate-oxide breakdown, unless you use a thick-oxide transistor, which will be slower. Go much below it and you risk losing signal-to-noise ratio on longer runs.
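To see why a fixed swing of roughly 1 V matters, here is a minimal sketch that assumes an ideal PAM-N signal with evenly spaced levels and ignores noise, intersymbol interference, and transmitter nonlinearity. N levels split the swing into N - 1 stacked eyes, so every extra bit per symbol shrinks each eye.

```python
import math


def pam_eye_height(vpp: float, levels: int) -> float:
    """Ideal vertical eye opening for PAM-N with evenly spaced levels.

    N levels split the peak-to-peak swing into N - 1 stacked eyes.
    Noise, ISI, and transmitter nonlinearity are ignored.
    """
    return vpp / (levels - 1)


def eye_penalty_db(levels: int, reference_levels: int = 2) -> float:
    """Eye-height penalty of PAM-N versus a reference format at equal swing
    (NRZ, i.e. PAM-2, by default)."""
    return 20 * math.log10((levels - 1) / (reference_levels - 1))


if __name__ == "__main__":
    vpp = 1.0  # roughly the 1 V peak-to-peak limit quoted above
    for n in (2, 4, 8, 16):
        print(f"PAM-{n}: eye {pam_eye_height(vpp, n) * 1000:.0f} mV, "
              f"{eye_penalty_db(n):.1f} dB below NRZ")
    print(f"PAM-8 vs. PAM-4: {eye_penalty_db(8, 4):.1f} dB smaller eye")
```

Under these idealized assumptions, each PAM-8 eye is about 143 mV tall versus about 333 mV for PAM-4, a penalty of roughly 7.4 dB that the receiver would have to recover through cleaner signals or more equalization.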

New standards can draw more power as long as the performance increase is high enough. “The typical trend is to double bandwidth with 30% to 50% more power,” Sadr said. If it takes too much more power, then it’s no longer worth doing.
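One way to read the "double bandwidth with 30% to 50% more power" rule of thumb is in terms of energy per bit. This sketch simply divides the power scale factor by the bandwidth scale factor; the break-even point it implies (2x power for 2x bandwidth) is our framing, not a threshold Sadr stated.

```python
def energy_per_bit_ratio(bandwidth_scale: float, power_scale: float) -> float:
    """Energy per bit of a new generation relative to the previous one.

    Energy/bit = power / bit rate, so scaling bandwidth by B and power
    by P scales energy per bit by P / B.
    """
    return power_scale / bandwidth_scale


if __name__ == "__main__":
    for extra_power in (0.30, 0.50):
        ratio = energy_per_bit_ratio(2.0, 1.0 + extra_power)
        print(f"2x bandwidth at +{extra_power:.0%} power -> "
              f"{ratio:.2f}x the energy per bit")
    # At 2x power for 2x bandwidth, energy per bit stops improving.
    print(energy_per_bit_ratio(2.0, 2.0))
```

Doubling bandwidth at 30% to 50% more power cuts energy per bit to 0.65x to 0.75x of the previous generation.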

He’s not alone. “A couple of IEEE standards use PAM-12 and PAM-16, but at lower speeds with lots of lanes,” said Brig Asay, director of strategic planning at Keysight Technologies. For example, 10GBASE-T uses PAM-16 with four lanes transmitting 800 million symbols per second each. That is a modest rate compared with the 200+ Gbps per-lane rates being targeted at the leading edge.
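The 10GBASE-T numbers show how many levels and many lanes trade against symbol rate. This sketch computes only the uncoded line rate (lanes x symbol rate x bits per symbol); it deliberately does not model the standard's coding and framing.

```python
import math


def raw_line_rate_bps(lanes: int, symbols_per_s: float, levels: int) -> float:
    """Uncoded line rate: lanes x symbol rate x log2(levels) bits per symbol."""
    return lanes * symbols_per_s * math.log2(levels)


if __name__ == "__main__":
    # 10GBASE-T figures from the text: 4 lanes, 800 Msymbols/s, PAM-16.
    rate = raw_line_rate_bps(lanes=4, symbols_per_s=800e6, levels=16)
    print(f"{rate / 1e9:.1f} Gbps raw")
```

The raw 12.8 Gbps exceeds the 10 Gbps payload because part of each symbol goes to coding and framing overhead.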

If PAM-8 isn’t the answer, then what’s next for serial connections? “The smallest differentiable voltage at the receiver can be microvolts,” Sadr said. He provided the following analysis of signal attenuation across different reach classes:

  • Signals that go through a backplane or other connectors are referred to as long-reach (LR). Across that channel, the signal loses roughly 35 to 36 dB, so at 112 Gbps it arrives at the receiver with less than 2% of its transmitted amplitude. Energy cost is 5 pJ/bit.
  • Very-short-reach (VSR) signals, which go from chip to chip or to another module on a board, lose 18 dB, leaving roughly 12% of the original amplitude at a cost of 2.5 pJ/bit.
  • Extra-short-reach (XSR) and ultra-short-reach (USR) signals go from chip to chip within a package. They lose 5 to 10 dB at a cost of 1 pJ/bit or less.
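The reach figures above can be turned into amplitude fractions and per-lane power with two standard conversions: insertion loss in dB maps to a voltage ratio of 10^(-dB/20), and power is energy per bit times bit rate. This sketch assumes 112 Gbps per lane for all three classes (the text ties that rate only to LR) and uses the upper end of each loss and energy figure.

```python
def loss_db_to_amplitude_ratio(loss_db: float) -> float:
    """Fraction of transmitted voltage amplitude left after loss_db of insertion loss."""
    return 10 ** (-loss_db / 20)


def link_power_w(energy_pj_per_bit: float, bit_rate_bps: float) -> float:
    """Power drawn by one lane: energy per bit x bit rate."""
    return energy_pj_per_bit * 1e-12 * bit_rate_bps


if __name__ == "__main__":
    # (loss in dB, pJ/bit) from the reach list; 112 Gbps assumed for every class.
    reaches = {
        "LR": (36.0, 5.0),
        "VSR": (18.0, 2.5),
        "XSR/USR": (10.0, 1.0),
    }
    for name, (loss_db, pj_per_bit) in reaches.items():
        ratio = loss_db_to_amplitude_ratio(loss_db)
        watts = link_power_w(pj_per_bit, 112e9)
        print(f"{name:8s} {loss_db:4.0f} dB -> {ratio:6.1%} of amplitude, "
              f"{watts * 1000:.0f} mW per lane")
```

At these assumed rates, an LR lane keeps about 1.6% of its amplitude and burns about 560 mW, while an XSR lane keeps about a third of its amplitude for about 112 mW.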

Packaging plays a role in this decision. “If outside the package, you’d use serial,” said Manuel Mota, product marketing manager for high-speed SerDes PHY, analog, and Bluetooth IP at Synopsys. “If inside the package, then it depends on the packaging.”

What happens if we try to double the rate? “We could go LR with PAM-8, but it would take too much power,” Sadr said. The maximum reach would have to decrease, with something else taking over for backplanes.

Wendy Wu, director of marketing in the IP Group at Cadence, agreed. “A copper backplane with 200GHz will be very power-hungry. It could burn more power just to double the PAM-4 rate. With multi-chip assembly, the reach will be much shorter (XSR). PAM-8 might make sense.”

The power it takes to receive a clean PAM-x signal appears to be the key concern here. “It takes lots of power to clean up PAM-4,” said Amin Shokrollahi, CEO of Kandou.

So, if PAM-8 isn’t next, then what is? One option Sadr sees is a hybrid parallel/serial arrangement, with multiple serial lines in parallel. This kind of arrangement is already in use and is an obvious way to scale bandwidth as long as new skew issues aren’t introduced.
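Running several serial lanes in parallel only works if the receiver can undo the lane-to-lane skew before reassembling the data. The sketch below stripes words round-robin across lanes and prefixes each lane with an alignment marker. The marker and idle symbols are hypothetical and the scheme is only loosely modeled on the alignment markers used in multi-lane Ethernet; real links repeat markers periodically and detect them far more robustly.

```python
from typing import List

MARKER = "AM"  # hypothetical alignment marker sent first on every lane
IDLE = "--"    # hypothetical filler seen before a late lane's marker arrives


def stripe(words: List[str], lanes: int) -> List[List[str]]:
    """Distribute words round-robin across lanes, each lane led by a marker."""
    out = [[MARKER] for _ in range(lanes)]
    for i, word in enumerate(words):
        out[i % lanes].append(word)
    return out


def add_skew(lane_streams: List[List[str]], skews: List[int]) -> List[List[str]]:
    """Model lane-to-lane skew by delaying each lane by some word times."""
    return [[IDLE] * s + lane for lane, s in zip(lane_streams, skews)]


def deskew_and_merge(received: List[List[str]]) -> List[str]:
    """Align every lane on its marker, then re-interleave round-robin."""
    aligned = [lane[lane.index(MARKER) + 1:] for lane in received]
    merged = []
    for i in range(max(len(lane) for lane in aligned)):
        for lane in aligned:
            if i < len(lane):
                merged.append(lane[i])
    return merged


if __name__ == "__main__":
    words = [f"w{i}" for i in range(10)]
    rx = add_skew(stripe(words, 4), skews=[0, 3, 1, 2])
    print(deskew_and_merge(rx) == words)
```

Without the marker-based alignment step, interleaving the raw lanes would mix idle symbols and out-of-order words, which is the "new skew issues" risk in miniature.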

Another option is to go the route that wireless has gone. While wired connections have relied solely on amplitude for encoding data, wireless signals also use phase as a variable, giving rise to “quadrature amplitude modulation,” or QAM. The same technique could be applied to wired links.

QAM is implemented by splitting the data into two bitstreams and applying PAM to each one. One stream modulates a carrier, and the other modulates the same carrier shifted by 90 degrees (the “quadrature” in the name). The two are then summed onto the wire for transmission. Because the two carriers are orthogonal, the receiver can separate the streams again.
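The following sketch shows 16-QAM as two PAM-4 streams: each group of 4 bits picks one PAM-4 level for the in-phase (I) carrier and one for the quadrature (Q) carrier. The receiver recovers each level by correlating a symbol period against cosine and sine. It assumes an ideal, noiseless channel, a carrier of exactly 2 cycles per symbol sampled 32 times, and a plain binary bit-to-level mapping (real systems typically use Gray coding).

```python
import math
from typing import List, Tuple

PAM4_LEVELS = (-3, -1, 1, 3)


def bits_to_16qam(bits: List[int]) -> List[Tuple[int, int]]:
    """Map each 4-bit group to (I, Q): 2 bits choose a PAM-4 level per axis."""
    if len(bits) % 4:
        raise ValueError("need a multiple of 4 bits")
    symbols = []
    for k in range(0, len(bits), 4):
        i_level = PAM4_LEVELS[bits[k] * 2 + bits[k + 1]]
        q_level = PAM4_LEVELS[bits[k + 2] * 2 + bits[k + 3]]
        symbols.append((i_level, q_level))
    return symbols


def modulate(symbols, samples_per_symbol=32, cycles_per_symbol=2) -> List[float]:
    """Sum I*cos + Q*sin: two PAM streams on carriers 90 degrees apart."""
    wave = []
    for i_level, q_level in symbols:
        for n in range(samples_per_symbol):
            phase = 2 * math.pi * cycles_per_symbol * n / samples_per_symbol
            wave.append(i_level * math.cos(phase) + q_level * math.sin(phase))
    return wave


def demodulate(wave, samples_per_symbol=32, cycles_per_symbol=2):
    """Correlate each symbol period with cos and sin to separate I from Q."""
    symbols = []
    for start in range(0, len(wave), samples_per_symbol):
        i_acc = q_acc = 0.0
        for n in range(samples_per_symbol):
            phase = 2 * math.pi * cycles_per_symbol * n / samples_per_symbol
            i_acc += wave[start + n] * math.cos(phase)
            q_acc += wave[start + n] * math.sin(phase)
        # Over whole carrier cycles, sum(cos^2) = sum(sin^2) = N/2 and
        # sum(cos*sin) = 0, so scaling by 2/N isolates each level.
        scale = 2 / samples_per_symbol
        symbols.append((round(i_acc * scale), round(q_acc * scale)))
    return symbols


if __name__ == "__main__":
    syms = bits_to_16qam([0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1])
    print(syms, demodulate(modulate(syms)) == syms)
```

The cross term between the two carriers sums to zero over each symbol, which is the orthogonality that lets both PAM-4 streams share one wire.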

“For 1.6 Tbps, PAM-8 and -12 are being looked at heavily. Even PAM-16, but with a QAM signal. They’re concerned that they can’t do clean eyes on PAM-8,” said Asay. “A couple of big inflection points are coming, for both serial and parallel. It could be a move to QAM.”

Fig. 1: 16-QAM encoding. Source: Wikipedia

Once electrical options have been exhausted, the next move would be to optics. Sadr sees optical as what will succeed PAM-4 on backplanes. The question is whether optical will end up lower in power and/or cost than the alternatives. While optical links have less channel loss, they are more complex to build, since monolithic circuits are no longer possible. But they still may have a cost advantage.

Said Sadr, “Our industry is striving to reduce power, enabling more sophisticated modulation schemes — which historically was the solution to service longer reach. However, this no longer seems to be adequate — as it requires both power increase and higher cost of materials. The viable solution for the next gen (e.g., 224 Gbps), trends towards delivering this signal in NRZ or PAM-4 (differentially) to the optical engine and taking it from there in the optical domain. As the market adopts these solutions, the material cost of the optical and electrical integration will moderate. Early adopters may tolerate the higher costs given the performance benefits, but power will ultimately be king.”

Kandou sees the prospect of this disruptive shift as part of the value of its Chord signaling, which it claims keeps signals cleaner without moving to optics. “We increase the life of copper,” said Shokrollahi.

Parallel connections
Modern parallel connections can involve thousands of signals. This works because technologies like silicon interposers, used to interconnect dice or chiplets within a package, allow much finer line pitches than package traces or standard PCBs do. Meanwhile, signal connections on chips have evolved from large pads for wire bonds to bumps, then micro-bumps, then micro-pillars, each step allowing higher connection density.

With this many signals, board or interposer yield can be an issue, so redundancy is employed. In his article, “Parallel-Based PHY IP for Die-to-Die Connectivity,” Mota wrote, “To maximize yield, the parallel die-to-die PHY includes redundant lanes distributed per channel, lane testing capabilities, and circuitry to re-route signals from lanes that are identified as defective to the redundant lanes.”
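Lane repair of the kind Mota describes can be sketched as a remapping table. This is a hypothetical shift-style scheme, not Synopsys' implementation: physical lanes are scanned in order, defective ones are skipped, and the first good lanes carry the logical data lanes, so spares absorb up to as many defects as there are spares.

```python
from typing import Dict, Set


def remap_lanes(num_data: int, num_spare: int, defective: Set[int]) -> Dict[int, int]:
    """Hypothetical repair map from logical data lanes to physical lanes.

    Physical lanes 0 .. num_data + num_spare - 1 are scanned in order and
    defective ones are skipped; the first num_data good lanes carry data.
    """
    good = [p for p in range(num_data + num_spare) if p not in defective]
    if len(good) < num_data:
        raise ValueError(
            f"only {len(good)} good lanes for {num_data} data lanes")
    return dict(zip(range(num_data), good))


if __name__ == "__main__":
    # 8 data lanes, 2 spares, physical lane 3 failed lane testing.
    print(remap_lanes(8, 2, defective={3}))
```

Shifting every lane above a defect by one position keeps the rerouting muxes simple, which is one reason this style of repair is attractive; the actual mux structure in a given PHY may differ.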

Clocking for parallel connections is modest, in the low-GHz range. Bandwidth comes not from the speed of any particular wire, but from the number of wires carrying data. Performance scales with the width of the bus more than with the speed of the clock.
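The width-over-speed point is simple multiplication. The bus below (1,000 data wires at 4 Gbps each) and the 112 Gbps serial lanes used for comparison are hypothetical numbers chosen only to illustrate the scaling.

```python
import math


def bus_bandwidth_gbps(data_wires: int, gbps_per_wire: float) -> float:
    """Aggregate bandwidth of a parallel bus: width x per-wire data rate."""
    return data_wires * gbps_per_wire


if __name__ == "__main__":
    parallel = bus_bandwidth_gbps(1000, 4.0)  # hypothetical bus
    print(f"{parallel / 1000:.1f} Tbps total")
    # Widening the bus is the main scaling knob: 2x wires -> 2x bandwidth.
    print(f"{bus_bandwidth_gbps(2000, 4.0) / 1000:.1f} Tbps with twice the wires")
    # Hypothetical 112 Gbps serial lanes needed for the same total.
    print(math.ceil(parallel / 112), "serial lanes at 112 Gbps")
```

Each parallel wire runs far slower than a serial lane, but the sheer count of wires carries the load.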

The major challenge for parallel buses is skew. Longer traces are more prone to skew because there is more opportunity for lines to differ in length. Even at a fixed length, however, higher clock speeds make skew a bigger concern: a length mismatch that was negligible at a lower frequency becomes a significant fraction of the bit time when clocking faster. This is why serial channels are used for longer reaches.
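To put numbers on the skew argument, this sketch converts a length mismatch into picoseconds and compares it to one unit interval (bit time). It assumes a homogeneous dielectric with a relative permittivity of about 4, an assumed value in the range of common PCB and interposer dielectrics; real traces vary.

```python
import math

C_MM_PER_PS = 0.299792458  # speed of light in vacuum, mm per picosecond


def delay_ps_per_mm(dielectric_constant: float) -> float:
    """Propagation delay of a trace in a homogeneous dielectric."""
    return math.sqrt(dielectric_constant) / C_MM_PER_PS


def skew_fraction_of_ui(mismatch_mm: float, bit_rate_gbps: float,
                        er: float = 4.0) -> float:
    """Length-mismatch skew as a fraction of one unit interval."""
    skew_ps = mismatch_mm * delay_ps_per_mm(er)
    ui_ps = 1000.0 / bit_rate_gbps
    return skew_ps / ui_ps


if __name__ == "__main__":
    for rate in (2, 8, 16, 32):
        print(f"{rate:2d} Gbps: 1 mm mismatch = "
              f"{skew_fraction_of_ui(1.0, rate):.1%} of a UI")
```

With these assumptions, 1 mm of mismatch (about 6.7 ps) is roughly 1% of a bit time at 2 Gbps but more than 20% at 32 Gbps.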

Clocking adds its own complication. Using a common clock introduces clock/data skew, which can be reduced by forwarding a clock signal from the transmitter to the receiver. But that creates two clock domains: the transmitter domain and the receiver domain. While these clocks may originate from the same source and run at the same frequency, they’ll differ slightly in phase (a relationship known as “mesochronous”), so there must be a way to transition from one domain to the other.

There are two ways to do this. One is to forward the transmitter clock alongside the data; once received, the data then can be resynchronized to the receiver clock domain. Alternatively, the receiver clock can be sent back to the transmitter and used for transmission, in which case the clock-domain transition occurs in the transmitter prior to the I/O stage.
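A common way to resynchronize data between mesochronous domains is a small FIFO written on the forwarded clock and read on the local clock. The behavioral model below assumes an ideal, fixed phase offset and an arbitrary two-cycle start delay; the function and its parameters are hypothetical. Real designs synchronize the read and write pointers and must guard against the metastability this model ignores.

```python
from collections import deque
from typing import List, Tuple


def mesochronous_fifo(data: List[int], period_ps: float, rx_phase_ps: float,
                      start_delay_cycles: int = 2,
                      depth: int = 4) -> Tuple[List[int], int]:
    """Behavioral model of a FIFO bridging two same-frequency clock domains.

    Writes occur on forwarded-clock edges at k * period_ps. Reads occur on
    receiver-clock edges at rx_phase_ps + (k + start_delay_cycles) * period_ps.
    Returns the words read out and the peak FIFO occupancy.
    """
    if not 0 <= rx_phase_ps < period_ps:
        raise ValueError("rx_phase_ps must lie within one clock period")
    events = [(k * period_ps, 0, word) for k, word in enumerate(data)]
    events += [(rx_phase_ps + (k + start_delay_cycles) * period_ps, 1, None)
               for k in range(len(data))]
    # On an exact tie the write is processed first (an idealization).
    events.sort(key=lambda e: (e[0], e[1]))
    fifo, out, peak = deque(), [], 0
    for _, kind, word in events:
        if kind == 0:
            if len(fifo) == depth:
                raise OverflowError("FIFO overflow")
            fifo.append(word)
            peak = max(peak, len(fifo))
        else:
            if not fifo:
                raise RuntimeError("FIFO underflow: start delay too short")
            out.append(fifo.popleft())
    return out, peak


if __name__ == "__main__":
    words, peak = mesochronous_fifo(list(range(20)), period_ps=500.0,
                                    rx_phase_ps=137.0)
    print(words == list(range(20)), "peak occupancy:", peak)
```

Because both clocks share a frequency, the FIFO occupancy stays bounded for any fixed phase offset, so a shallow buffer is enough.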

Parallel connections offer few signal-formatting options; the vast majority use non-return-to-zero (NRZ) signaling. “The whole concept of parallel implementations is simplicity,” said Mota. The data rate can be doubled without raising the clock frequency by having the receiver respond to both edges of the clock, known as double-data-rate (DDR) signaling.

Fig. 2: An illustration of NRZ signaling (with return-to-zero, or RZ, shown for contrast). For single-data-rate (SDR) clocking, a transition happens on only one edge of the clock (rising in the example shown here). With DDR clocking, both clock edges cause a transition, allowing data symbols to move at twice the rate of the clock. Source: Bryon Moyer/Semiconductor Engineering
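The SDR/DDR difference in the figure comes down to how often a bit is launched per clock period. This sketch lists the launch time of each bit on one NRZ wire, assuming an ideal 2 GHz clock and ignoring setup/hold margins and jitter.

```python
from typing import List, Tuple


def launch_times(bits: List[int], clock_ghz: float,
                 ddr: bool) -> List[Tuple[float, int]]:
    """Return (time_ns, bit) for each bit launched on an NRZ wire.

    SDR launches one bit per rising edge; DDR launches one bit on every
    rising and falling edge, so each bit lasts half a clock period.
    """
    period_ns = 1.0 / clock_ghz
    step_ns = period_ns / 2 if ddr else period_ns
    return [(k * step_ns, b) for k, b in enumerate(bits)]


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 0, 1, 0]
    sdr = launch_times(bits, clock_ghz=2.0, ddr=False)
    ddr = launch_times(bits, clock_ghz=2.0, ddr=True)
    print(sdr[-1][0], ddr[-1][0])  # 3.5 1.75 (ns)
```

With the same clock, DDR finishes the byte in half the time, which is the 2x data rate without a faster clock.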

Will this change for future parallel connections? Some think the answer is no, with the only evolution being a ramping up of the clock rate. “Parallel is likely to exceed 10GHz over the years,” said Mota.

Keysight agreed on the faster clocks. “There are some new oscillators coming out that let designers create really clean clocks,” said Asay. But even so, change may already be happening: “DDR-6 may not handle NRZ,” he said. “Or it might jump over PAM to QAM.”

How to probe
As a test and measurement vendor, Keysight also sees the difficulties these new formats pose for probing signals. Asay believes that access to the signal is the key challenge. “Five years from now, actually seeing what’s going on will become a nasty challenge.”

The industry is long past the time when you could probe signals at the edges of packages with through-hole pins. It’s increasingly difficult to probe bumps under the die, especially when stacking dice for memories and other 3D packaging approaches. Likewise, probing chiplets is hard due to the incredibly fine lines that can be created on an interposer or an Intel EMIB (embedded multi-die interconnect bridge).

The signals on the line are themselves increasingly delicate, and probing the wires runs the risk of altering the signal. Asay noted that the probes must have high impedance to avoid affecting the signals, but it’s hard to design for high speed with high impedance. And AC coupling makes the signals even more sensitive, since they’re literally floating, from a DC standpoint.
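One reason high impedance and high speed conflict is probe capacitance: a capacitor's impedance falls as 1 / (2 * pi * f * C). This sketch assumes a hypothetical 0.1 pF probe-tip capacitance and ignores the probe's resistance and inductance. The 28 GHz point is the Nyquist frequency of 112 Gbps PAM-4 (56 GBaud).

```python
import math


def capacitive_reactance_ohms(capacitance_f: float, freq_hz: float) -> float:
    """Magnitude of a capacitor's impedance: 1 / (2 * pi * f * C)."""
    return 1.0 / (2 * math.pi * freq_hz * capacitance_f)


if __name__ == "__main__":
    probe_c = 0.1e-12  # hypothetical probe-tip capacitance, farads
    for f_ghz in (1, 10, 28):
        z = capacitive_reactance_ohms(probe_c, f_ghz * 1e9)
        print(f"{f_ghz:2d} GHz: {z:7.1f} ohms")
```

Under this assumption, the probe looks like about 1.6 kohm at 1 GHz but only about 57 ohm at 28 GHz, comparable to a 50-ohm line, so it would load the signal heavily.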

“There isn’t anything coming that’s going to make this easier,” he said.

Is the future serial?
With all of the attention that serial connections are receiving, it’s natural to wonder whether parallel connections will ultimately give way to serial ones. Parallel connections need thousands of wires, and those take up significant space along the edge of a die, which is referred to as “beachfront.” For large dice, this may be an acceptable tradeoff for lower latency. As Asay noted, “Parallel has better latency that AI and memory prefer.”

Serial connections have fewer wires, but they also require more sophisticated circuitry. In the end, it all boils down to cost. Cheaper packages may require serial connections, which may pencil out if the overall solution is cheaper than using a more expensive package with simpler signaling. For short-reach signals inside a package, it may even be possible to skip embedding the clock in the data stream and instead forward the clock on a separate link, sharing it within the package, as long as skew remains manageable.


Updated on 6/26/20:

The original version of this story included a statement that QAM meant not needing clean eye diagrams. This was challenged in a comment, so I went back to the source of that statement, who provided a modified statement, from which I quote:

“You would not need to ever see an ‘eye,’ however a constellation diagram of the demodulated I-Q would still be the basis of an EVM [error-vector magnitude] measurement.  EVM would be the primary measure of the [transmission] output quality – just as it is with optical QAM (coherent) signals… We have not seen QAM in electrical yet, but when it occurs, many of the same measurement concepts from optical (which were already stolen and repurposed from wireless MIMO) will see life in high-speed electrical interconnect. One of the things that came from the OIF 200G call is that there will probably be multiple modulation schemes used for some of the same reaches, as the tradeoffs needed to make 200G electrical work are too great for an ‘one size fits all’ standard covering that reach.  QAM could play where the latency is not an issue.” – Brig Asay, Keysight Technologies
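Since EVM replaces the eye diagram as the quality metric in this view, here is a minimal sketch of RMS EVM computed from received I-Q points and their ideal constellation positions. It assumes normalization to the average power of the ideal points; some standards normalize to peak constellation power instead, and the function name is hypothetical.

```python
import math
from typing import List


def evm_rms_percent(received: List[complex], ideal: List[complex]) -> float:
    """RMS error-vector magnitude in percent, normalized to the average
    power of the ideal constellation points."""
    if len(received) != len(ideal) or not ideal:
        raise ValueError("need equal-length, non-empty symbol lists")
    err_power = sum(abs(r - s) ** 2 for r, s in zip(received, ideal)) / len(ideal)
    ref_power = sum(abs(s) ** 2 for s in ideal) / len(ideal)
    return 100.0 * math.sqrt(err_power / ref_power)


if __name__ == "__main__":
    ideal = [3 + 3j, -1 + 1j, 1 - 3j, -3 - 1j]  # 16-QAM points
    received = [2.9 + 3.1j, -1.1 + 0.9j, 1.05 - 2.95j, -2.9 - 1.1j]
    print(f"EVM = {evm_rms_percent(received, ideal):.2f}%")
```

Each error vector is the distance between a received point and where it should have landed in the constellation, so EVM summarizes the spread of the constellation clouds much as eye height summarizes an eye.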

Related Stories
High-Speed Signaling Drill-Down
First of two parts: Different schemes emerge for moving signals down channels more quickly.
High-Speed SerDes At 7/5nm
How to place macros inside a PHY in 7/5nm SoCs.
Full-Duplex Wireless Remains A Promise And A Challenge
Rising complexity, available spectra and lack of standards are slowing performance improvements.

