The physical layer interface is necessary for a chip to access the outside world, but it threatens to consume increasing portions of the power budget. What can be done to prevent the PHY from becoming the limiting factor?
Physics has been on the side of chipmakers throughout most of the lifetime of Moore’s Law, but when dealing with the world outside the chip, physics is working against them. Pushing data at ever-faster rates through boards and systems consumes increasing amounts of power, but the power budget for chips has not been increasing.
Could chips be constrained by their interfaces? While a chip with no physical data connections is theoretically possible, it is not practical. It would mean all processing and memory integrated on a single die, and all sensors and actuators either contained internally or interfaced wirelessly. And it would have no means of being accessed, updated, or even tested except through those wireless interfaces. As soon as a data signal passes from the chip to the outside world through the pins of the package, life becomes more difficult, and it is the job of the physical layer interface, or PHY, to deal with that high-parasitic, noisy environment.
The problem is created by a mismatch. The voltages and currents within a chip’s logic circuitry have been on a downward trajectory along with node sizes, but the dimensions of the outside world have not changed in the same manner. Wires outside the package are hundreds of times longer, and they have much higher capacitance and resistance. Inside the chip, processing has become faster, and that depends on getting enough memory throughput—or access to the raw data required—which in turn means that the external interfaces must operate at an ever-faster rate.
How bad can this get? “There is one customer who has had several generations of ASICs, and each chip is 400mm²,” says Bill Isaacson, senior director for ASIC product marketing at eSilicon. “They need a lot of memory bandwidth and they are doing that through DDR DRAMs. 50% of the chip is their logic and 50% is taken up with the PHYs. On top of that, especially when you have a large number of DRAM channels, you have collateral effects, which mean that you need a large package because of the pins and the isolation of those pins. That drives the board, where they need 20 layers. Without that we could get away with 4 or 6 layers. All of this translates into cost.”
PHYs need high drive capability and often contain circuits that try to compensate for the bad things that happen between the chip and the other end. That means they are also power hungry. Isaacson adds that “a third to a half of the total power is consumed just for the PHYs.”
Power is an issue for all chips, regardless of how much they consume. Low-power chips often are powered by batteries, so every mW matters. High-end servers often run into thermal problems, and therefore have to fit within the limits imposed by their operating environment.
Scaling the PHY
Most PHYs contain both analog and digital components, and the analog piece does not scale well with the technology node. That means the PHY could consume a greater percentage of the overall chip at each new technology node. “If you follow the design rules for smaller geometries, then the PHYs do not scale, and in some cases may get bigger,” says Navraj Nandra, senior director of marketing for Synopsys’ DesignWare Analog & MSIP Solutions Group. “The smaller geometries have tighter design rules and that can make them more challenging. If you do nothing, the best case is that it does not scale. The worst case is that it gets larger.”
The voltage disparity also could increase. “The interface voltage is usually higher because the core voltage can shrink with the process but the interface is tied to the channel and the signaling standard,” says Kishore Kasamsetty, director of product management and marketing for Cadence. “Half of the power budget could be consumed by the memory sub-system, and that could be 1W for high-end systems using a 72-bit wide interface. You can build systems that are only 16 bits wide and have a lot less power, but also lower bandwidth. So you run into the memory bandwidth wall and the power wall because of the memory access architecture.”
Thankfully, reality is not quite as bleak as all of this sounds. “We have managed to reduce the area of our PHYs when moving to the next node, but it requires going back to basics and looking at the architecture and finding different ways to do the same function,” says Nandra. “When we moved from bulk 28nm technology to 16nm finFET technology, it took almost a year to re-architect to get the scaling.”
Parallel versus serial
Most chips will contain two types of interfaces, parallel and serial. “A serial interface has a serializer/deserializer (SerDes), and the data is run as fast as you can,” explains Kasamsetty. “With the memory interface, or other parallel interfaces, it is accessing more data at a time. You need parallel access to get enough data throughput.”
The PHY design for each is substantially different, and it basically comes down to the timing reference used for each interface. In a serial link, each lane in a data link is self-sufficient because the clock is embedded in the data. For the parallel interface, the clock is not embedded, so you have to worry about skew between signals. That difference limits the maximum frequency of the interface. For serial links, speeds of 28Gbps are common, whereas parallel interfaces are running at 200MHz.
This impacts the surrounding circuitry. A serial connection requires clock and data recovery circuitry. Parallel interfaces require a PLL or DLL to de-skew each data line.
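The tradeoff described above can be made concrete with some back-of-the-envelope arithmetic. This sketch is purely illustrative: the lane counts, rates, and 64b/66b encoding overhead are assumptions chosen to match typical figures mentioned in the article, not tied to any one standard.

```python
# Illustrative comparison of a serial (SerDes) link versus a
# source-synchronous parallel bus. All figures are assumptions.

def serial_throughput_gbps(lanes: int, rate_gbps: float, encoding_efficiency: float) -> float:
    """Aggregate payload throughput of a serial link.

    Serial links embed the clock in the data stream, which costs some
    raw bandwidth in line encoding (e.g. 64b/66b keeps ~97% of it).
    """
    return lanes * rate_gbps * encoding_efficiency

def parallel_throughput_gbps(width_bits: int, clock_mhz: float, transfers_per_clock: int) -> float:
    """Aggregate throughput of a parallel bus.

    The clock travels alongside the data, so every transferred bit is
    payload -- but skew between the lines caps the achievable clock rate.
    """
    return width_bits * clock_mhz * transfers_per_clock / 1000.0

# Four serial lanes at 28Gbps with 64b/66b encoding...
serial = serial_throughput_gbps(lanes=4, rate_gbps=28.0, encoding_efficiency=64 / 66)
# ...versus a 64-bit double-data-rate parallel bus clocked at 200MHz.
parallel = parallel_throughput_gbps(width_bits=64, clock_mhz=200.0, transfers_per_clock=2)

print(f"serial:   {serial:.1f} Gb/s over 8 signal pins")    # ~108.6 Gb/s
print(f"parallel: {parallel:.1f} Gb/s over 64+ signal pins")  # 25.6 Gb/s
```

The serial link wins on throughput per pin, which is why high-speed interconnects have migrated that way, but every bit still has to be serialized and deserialized, which is where the latency penalty Nandra describes below comes from.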
Figure 1: Block diagram for a DDR PHY
But clock rate is not everything. “Parallel interfaces have a latency advantage because you don’t have to squeeze everything through a serial channel,” says Nandra. “To get the same throughput for a parallel interface, you need many parallel lines. Consider the transformation of PCI. When PCI moved to PCI-X and then to PCI Express, it converted from a parallel interface to a serial one. This increased the bandwidth, which is now going to 16 gigatransfers per second, but the challenge has always been latency.”
There is another problem associated with the memory interface. “With a SerDes interface, it is balanced in that both ends are built on an ASIC process,” points out Kasamsetty. “With memory, the far end is on a non-logic process and is usually much slower. That means it is asymmetric in terms of the responsibility of managing the interface. The burden is on the SoC side.”
This comes about because the memory industry is very cost sensitive and the fabrication process is highly optimized for storage density, not for fast logic. This limits the transfer speed, and the DDR interface is a reflection of the current assumptions associated with the boards, packages, connectors and DRAM technologies available.
So what changed to make DDR5 possible? “All pieces of the interface have to be adjusted,” explains Kasamsetty. “DRAMs are only marginally better, but the SoC is a big factor. DDR4 was designed for 28nm and now we have DDR5 targeting 7nm which means you have faster logic. The PHY is a mix of analog and digital and while the I/O transistors do not scale much, we do benefit from the logic scaling. Things, such as skew, tend to get moved to the digital side. There is also a cost factor. What happens when a company like Intel does a server reference design for the board and the connectors? The volume that is created drives down prices. What may be too expensive today will change when it becomes mainstream. To make DDR5 happen you do need more layers on the board. These things are evolutionary.”
Getting more out of the PHY
In a lot of cases the memory is constantly being written to, and read from. “That is difficult to deal with from a power standpoint,” says Isaacson. “With ASICs based on finFETs, the process really helps with the leakage issue. That means we are really dealing with active power, and if you are doing non-stop reads and writes to the memory, what can be done to reduce power? At that point, we can only talk about scaling voltage or adjusting frequency. It is a lot simpler to scale voltage. Scaling frequency becomes quite complex.”
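Isaacson's point about voltage versus frequency scaling follows directly from the classic switching-power approximation, P ≈ C·V²·f: power falls linearly with frequency but quadratically with voltage. The sketch below uses invented placeholder values purely to show the shape of the tradeoff; they are not measurements from any real PHY.

```python
# Why voltage scaling beats frequency scaling for active power.
# P_dyn ~ C * V^2 * f. Values below are illustrative placeholders.

def dynamic_power_w(c_farads: float, v_volts: float, f_hz: float) -> float:
    """Classic switching-power approximation: P = C * V^2 * f."""
    return c_farads * v_volts ** 2 * f_hz

baseline = dynamic_power_w(c_farads=1e-9, v_volts=1.0, f_hz=1e9)  # 1.0 W

# Cutting frequency by 10% saves power linearly...
f_scaled = dynamic_power_w(1e-9, 1.0, 0.9e9)
# ...but cutting voltage by 10% saves quadratically.
v_scaled = dynamic_power_w(1e-9, 0.9, 1e9)

print(f"frequency -10%: {f_scaled / baseline:.0%} of baseline")  # 90%
print(f"voltage   -10%: {v_scaled / baseline:.0%} of baseline")  # 81%
```

In practice a lower voltage also limits the achievable frequency, so real designs scale the two together, but the quadratic term is what makes the voltage knob the more attractive one.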
The interesting thing is that there is no standard for the PHY. “JEDEC is the standards body and they define what the device should look like and how it should comply,” points out Kasamsetty. “It is up to the PHY provider or SoC companies how they want to use it. You can be compliant with, let’s say, DDR4, but how you do frequency scaling is up to the PHY. In mobile application processors, they do a lot of frequency scaling to save power. As you try and push performance, power efficiency goes down.”
One way to deal with that in the mobile world is to use dynamic frequency scaling at the SoC level, and also at the interface level. “How you manage that is up to the PHY and the controller,” continues Kasamsetty. “This sets the latency between the frequency switches and you could turn off circuits when you reduce frequency, but how many do you turn off, what are the frequency points, will you bypass the PLL and shut them down? There is a lot of low-power state management left to the PHY.”
As is often the case, low-power states add latency. When frequency switching is performed, the PHY needs to be trained and initialized for that frequency. When you change frequency, you could save the training state of the lower frequency point so that when you switch back, there may be no retraining required.
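The retraining-avoidance idea can be sketched as a small cache keyed by frequency point. This is a hypothetical illustration: the `TrainingState` fields and the training sequence are invented stand-ins, not the contents of any real PHY's training state.

```python
# Hypothetical sketch: cache PHY training results per frequency point so
# switching back to a previously trained frequency skips retraining.

from dataclasses import dataclass

@dataclass
class TrainingState:
    delay_taps: int   # invented: per-lane de-skew delay setting
    vref_mv: int      # invented: reference-voltage trim

class PhyFrequencyManager:
    def __init__(self) -> None:
        self._trained: dict = {}   # freq_mhz -> TrainingState
        self.trainings_run = 0

    def _train(self, freq_mhz: int) -> TrainingState:
        # Stand-in for the real link-training sequence, which adds
        # latency every time it has to run.
        self.trainings_run += 1
        return TrainingState(delay_taps=freq_mhz // 100, vref_mv=750)

    def switch_to(self, freq_mhz: int) -> TrainingState:
        # Reuse a saved state if this frequency point was trained before.
        if freq_mhz not in self._trained:
            self._trained[freq_mhz] = self._train(freq_mhz)
        return self._trained[freq_mhz]

mgr = PhyFrequencyManager()
mgr.switch_to(3200)   # trains
mgr.switch_to(1600)   # trains (new frequency point)
mgr.switch_to(3200)   # cache hit -- no retraining
print(mgr.trainings_run)  # 2
```

The design choice the quote hints at is exactly this space/latency trade: storing state per frequency point costs a little area, but it turns every repeat frequency switch into a near-instant operation.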
Nandra provides another example of the ways in which the system can be optimized. “When the PHY and the digital controller are combined, you can reduce issues such as latency in the digital layer by removing redundant sideband signals that are required for compliance, such as for PCI Express. When combined, you don’t need those compliance signals, and removing them helps both area and latency.”
As the design of PHYs becomes more complex, there are a dwindling number of people capable of producing high-quality PHYs at the most aggressive technology nodes. “We develop our own High-Bandwidth Memory (HBM) PHY, and we develop other memory interface PHYs when they are not otherwise available in the market,” says Isaacson. “But when it comes to things that are more serialized, we typically go externally to get those.”
Many companies have switched from make to buy and increasing numbers of them no longer have the necessary expertise in house. “Within the networking space, SerDes is so critical that there are more companies that will design their own,” says Kasamsetty. “In the application processor space, there are more people who do the memory interface themselves and they do that because it affects the value proposition of the product. They may also do it to get a time to market advantage. DDR5 or LPDDR5 is not finished, but for the big players, they can work with the vendors and come up with the designs even before the standard is published.”
A discussion about PHYs would not be complete without looking at the impact that new packaging technologies may have on PHY design. “The environment in terms of LRC improves dramatically when you go from a flipchip or wirebond into a 3D implementation where you are dealing with microbumps,” says Nandra. “The distance reduces so you see less capacitance, and the inductance is next to nothing because you are pushed right up next to the microbump.”
One of the early interfaces in this area is HBM. “If you integrate memory on chip, you are really stuck with SRAM,” says Isaacson. “Those memories can be configured for tremendous bandwidth. A DDR can do 3.2G transfers per second across 16 pins. On an ASIC, I can do a 1,000-bit-wide channel at a similar speed, so there is no comparison from a bandwidth standpoint and no comparison from a power standpoint. It is much more efficient. With HBM, we are dealing with things in package. We can get tremendous memory bandwidth at very low power simply because of how the overall spec for the memory has changed. We are taking advantage of that memory and its impact on the ASIC and the PHY.”
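The arithmetic behind Isaacson's comparison is worth spelling out. The per-pin rate and channel widths follow his quote; the helper function and the assumption that the wide channel runs at a similar per-pin rate are illustrative.

```python
# Rough arithmetic behind the narrow-external versus wide-in-package
# bandwidth comparison. Figures follow the quote; the rest is illustration.

def bandwidth_gb_per_s(width_bits: int, transfer_rate_gtps: float) -> float:
    """Peak bandwidth in gigabytes/second for a bus of the given width."""
    return width_bits * transfer_rate_gtps / 8.0

# A 16-pin DDR channel at 3.2 gigatransfers/second per pin...
ddr_16pin = bandwidth_gb_per_s(width_bits=16, transfer_rate_gtps=3.2)
# ...versus a 1,000-bit-wide in-package channel at a similar rate.
wide_in_package = bandwidth_gb_per_s(width_bits=1000, transfer_rate_gtps=3.2)

print(f"16-pin DDR channel: {ddr_16pin:.1f} GB/s")        # 6.4 GB/s
print(f"1,000-bit channel:  {wide_in_package:.0f} GB/s")  # 400 GB/s
```

A roughly 60x bandwidth gap at similar per-pin speed is why "there is no comparison from a bandwidth standpoint" — and because each short in-package line needs far less drive strength, the power per bit falls as well.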
There are other advantages, as well. “If you consider the speed that each of those parallel lines is running at with HBM2, it is not concerning from a crosstalk perspective,” adds Nandra. “We are looking at 1,024 pins, and the amplitude of those signals and the distances they are traveling are very small. We have HBM2 silicon for a graphics chip, and we are running that at 2Tb/s.”
But there are downsides. “The major market today is the graphics market, and they are using it for high-end systems,” points out Kasamsetty. “These are very expensive, and the cost is not viable for many other applications today. When will the cost curve come down enough for the next big market, such as networking? It will take a while, and for consumer applications, such as phones, it is still some time away.”
Isaacson also points to a technical problem that needs more attention. “With external DRAM, they are thermally coupled through the board, but it is relatively loosely coupled. If you are talking about integrating a memory into the package, the thermal coupling becomes very strong. An ASIC has a wider temperature operating range than DRAM so, if they are all integrated together, the least tolerant to heat defines the spec for the whole system. It becomes a double whammy from a thermal perspective. It is not just dissipating heat. It is maintaining a junction temperature considerably below what you otherwise might be fine with.”
The design and integration of PHYs cannot be done in isolation. “We have to consider the overall system,” concludes Isaacson. “The expertise that goes into building one of these is not just the design of the analog portion of the PHY, but also in the package design, the board design, the signal integrity, the power delivery, the thermal delivery and, depending upon the memory type, may also lead to a discussion around manufacturing capability and mechanical capability. And if you actually build it, then how do you take the risk out of it? It can encompass a wide range of disciplines to be able to come up with a working product at the end of the day.”