Innovative new clocking schemes in the latest LPDDR standard enable easier implementation of controllers and PHYs at maximum data rate as well as new options for power consumption.
Earlier this year, JEDEC released the new standard, JESD209-5, Low Power Double Data Rate 5 (LPDDR5). Those who contributed to the development of the standard come from a diverse technology background and represent both manufacturers and consumers of SDRAM memories. Now we have a new memory standard to help enable a future that requires more compute power, higher reliability, and lower power.
This article, the first in a series highlighting the new LPDDR5 standard, compares the signal clocking architecture of LPDDR5 with that of its predecessor (JESD209-4, LPDDR4).
The LPDDR5 standard offers several feature enhancements compared with the existing LPDDR4/4X standard, including support for larger densities, higher speed operation, a flexible bank architecture, enhanced Reliability, Availability, Serviceability (RAS) capabilities, new low-power features, and a new clocking architecture. LPDDR5 memories will soon be found in applications such as smartphones, automotive, artificial intelligence (AI), embedded applications, SSDs, and various consumer applications.
High-speed external clocking
One of the key aspects of LPDDR5 is the introduction of a new clocking scheme. In all previous generations of LPDDR (and DDR for that matter), a single clock from the host to the device essentially synchronized the interface between the host and device. This clock signal (CK) was used to set the transfer rate of the command and address (CA) signals passing from the host to device. In addition, it fixed the rate at which data (DQ) and the data strobes (DQS) were transferred between the host and device (writes) or the device and host (reads). See Figure 1.
Figure 1: Synchronous CK and bidirectional DQS in pre-LPDDR5 (LP)DDR system
When considering LPDDR4, both the clock signal and the data strobes operate at a maximum rate of 2133 MHz. In LPDDR4, the CA bus is a single data rate (SDR) bus, meaning with every clock cycle one packet of information is transferred from the host to the device. Since the LPDDR4 CA bus is SDR, the maximum effective rate of information transfer on the CA interface is 2133 Mbps. In LPDDR4, the data bus is, as the name implies, double data rate (DDR). Since the data bus is DDR, with every clock two packets of information are transferred, making the maximum effective rate on the data bus 4266 Mbps. See Figure 2.
Figure 2: Waveform showing SDR CA bus and DDR DQ bus as specified for LPDDR4-4266 (Only one of two differential signals shown for CK and DQS)
Note that in LPDDR4 the data strobes are implemented as a bidirectional differential pair. The LPDDR5 standard instead implements two separate pairs of differential signals, both effectively unidirectional: one going from host to device and one going from device to host. The signal going from host to device is called the write clock (WCK) and the signal going from device to host is called the read data strobe (RDQS).
This change in clocking between the host and device is indicative of a change in the fundamental way the device itself works. An LPDDR5 device relies on WCK not only to capture write data from the host but also to generate RDQS and push out DQ on reads from the device. This change brings about both opportunities and challenges. See Figure 3.
Figure 3: CK, WCK and RDQS* in an LPDDR5 system
* There are some special cases where RDQS is bidirectional.
The new clocking architecture decouples the traditional clock signal from the host to the device from the data strobe signals. In fact, while the new maximum rate of WCK and RDQS in LPDDR5 is 3200 MHz, enabling a data transfer rate of up to 6400 Mbps, the fastest the CK will run from the host to the device is only 800 MHz (even when the data channels are operating at 6400 Mbps).
Decoupling the clock signal from the strobes, and thus allowing the clock signal to run significantly slower than the data strobes, allows the CA bus to evolve from an SDR bus in LPDDR4 to a DDR bus in LPDDR5. Even though the CA bus has been changed from SDR to DDR, since the CA clock has been capped at a maximum rate of 800 MHz the maximum transfer rate of information on the CA bus is now 1600 Mbps. While LPDDR4-4266 requires a CA transfer rate of 2133 Mbps, LPDDR5-6400 only requires a CA transfer rate of 1600 Mbps, as seen in Figure 4.
Figure 4: Waveform showing DDR CA bus and DDR DQ bus as specified for LPDDR5-6400 (Only one of two differential signals shown for CK, WCK and RDQS)
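The rate arithmetic behind Figures 2 and 4 can be checked with a few lines of Python. This is only an illustrative sketch: the function and variable names are made up for this example, and the clock figures are the nominal maximums quoted above.

```python
SDR = 1  # one transfer per clock cycle
DDR = 2  # two transfers per clock cycle


def transfer_rate_mbps(clock_mhz: int, transfers_per_clock: int) -> int:
    """Per-pin transfer rate in Mbps for a bus timed by a clock in MHz."""
    return clock_mhz * transfers_per_clock


# LPDDR4-4266: CK and DQS both run at 2133 MHz; CA is SDR, DQ is DDR.
lpddr4 = {
    "ca": transfer_rate_mbps(2133, SDR),
    "dq": transfer_rate_mbps(2133, DDR),
}

# LPDDR5-6400: CK runs at 800 MHz, WCK/RDQS at 3200 MHz; CA and DQ are both DDR.
lpddr5 = {
    "ca": transfer_rate_mbps(800, DDR),
    "dq": transfer_rate_mbps(3200, DDR),
}

print(f"LPDDR4-4266: CA {lpddr4['ca']} Mbps, DQ {lpddr4['dq']} Mbps")
print(f"LPDDR5-6400: CA {lpddr5['ca']} Mbps, DQ {lpddr5['dq']} Mbps")
```

Even though the LPDDR5 CA bus is DDR, its transfer rate ends up lower than the SDR CA bus of LPDDR4 because CK is four times slower than WCK.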
Decoupling the CK and WCK is challenging because the LPDDR5 SDRAM requires internal synchronization of these signals in order to process any data transfer to or from the device. The synchronization of CK to WCK takes several CK cycles, meaning there is a real penalty involved when performing the synchronization operation, so it will be advantageous to avoid this whenever possible. Additionally, there is a specific sequence for how the WCK must behave for synchronization to occur, starting with static assertions for at least one CK, followed by one CK of half rate activity, followed by a variable number of CKs of full rate activity based on the operating frequency. An example of the synchronization procedure is outlined in Figure 5.
Figure 5: Simple illustration of clock and WCK synchronization (Only one of two differential signals shown on CK)
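To get a feel for the cost of synchronization, the sequence described above can be modeled as three phases measured in CK cycles. The sketch below is an assumption-laden illustration, not a rendering of the JEDEC timing parameters: the function names are hypothetical, and the number of full-rate cycles (2 here) is an arbitrary example value, since the real count depends on the operating frequency.

```python
def wck_sync_phases(static_cks: int, full_rate_cks: int):
    """Build the WCK synchronization sequence as (phase, CK cycles) pairs.

    static_cks    -- CK cycles of static WCK; the text requires at least one
    full_rate_cks -- CK cycles of full-rate toggling; frequency dependent,
                     so it is a caller-supplied assumption here
    """
    if static_cks < 1:
        raise ValueError("WCK must be held static for at least one CK")
    if full_rate_cks < 0:
        raise ValueError("full_rate_cks cannot be negative")
    return [
        ("static", static_cks),
        ("half-rate", 1),  # exactly one CK of half-rate activity
        ("full-rate", full_rate_cks),
    ]


def sync_penalty(phases, ck_mhz: float):
    """Return the synchronization cost as (CK cycles, nanoseconds)."""
    cycles = sum(count for _, count in phases)
    return cycles, cycles * 1000.0 / ck_mhz


phases = wck_sync_phases(static_cks=1, full_rate_cks=2)  # 2 is a made-up value
cycles, nanoseconds = sync_penalty(phases, ck_mhz=800)
print(f"{cycles} CK cycles = {nanoseconds} ns at CK = 800 MHz")
```

With these example numbers the sequence costs four CK cycles, or 5 ns with CK at 800 MHz, before any data can move.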
There are two options regarding synchronization of CK and WCK. The easy option is simply to synchronize the signals once and then keep WCK running constantly to maintain synchronization (this is known as free running mode). While this option requires little ingenuity, it does come at the expense of system power. Given the most prolific use of LPDDR5 devices will be in the mobile market, the desire to save power will be strong, which means the system must turn off WCK whenever it isn’t absolutely required. Turning off WCK requires a resynchronization of WCK to CK before any data transfer can occur. In order to manage this efficiently the LPDDR5 memory controller will need to be very clever in how it schedules commands, so the synchronization operation does not add unnecessary latency.
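The trade-off between the two options can be illustrated with a toy model that counts how many CK cycles are spent on synchronization. Everything in it is hypothetical: the class name, the penalty of 4 CK cycles, and the three-command workload are assumptions chosen only to make the comparison visible, and no real controller is this simple.

```python
class WckTracker:
    """Toy model tracking whether WCK is currently synchronized to CK."""

    def __init__(self, sync_penalty_cks: int):
        self.sync_penalty_cks = sync_penalty_cks  # assumed cost of one sync
        self.synced = False
        self.total_penalty_cks = 0

    def stop_wck(self) -> None:
        """Turn WCK off to save power; synchronization is lost."""
        self.synced = False

    def data_command(self) -> int:
        """Issue a read or write; return CK cycles spent synchronizing first."""
        penalty = 0 if self.synced else self.sync_penalty_cks
        self.synced = True
        self.total_penalty_cks += penalty
        return penalty


def total_penalty(stop_between_commands: bool, commands: int = 3,
                  sync_penalty_cks: int = 4) -> int:
    """Total sync cycles for a burst of data commands under one policy."""
    tracker = WckTracker(sync_penalty_cks)
    for _ in range(commands):
        tracker.data_command()
        if stop_between_commands:
            tracker.stop_wck()
    return tracker.total_penalty_cks


free_running = total_penalty(stop_between_commands=False)
gated = total_penalty(stop_between_commands=True)
print(f"free running: {free_running} CKs, WCK gated after every command: {gated} CKs")
```

Stopping WCK after every command is the worst case for latency; a scheduler that groups data commands before stopping WCK would land somewhere between the two totals.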
High-speed internal clocking
The decision to decouple the CA clock and the data strobes affects not only the interface between the host and the device – it also affects the interface of the LPDDR5 controller and LPDDR5 PHY inside the host.
Inside a typical host, a controller and a PHY communicate with the external memory. The interface between the controller and PHY is commonly implemented with a specification known as the DDR PHY Interface (DFI). The DFI specification allows SoC designers to separate the design of the (LP)DDR controller, which typically converts system commands into (LP)DDR commands, and the (LP)DDR PHY, which typically converts the digital domain on the SoC to the analog domain of the host to device interface. Having a defined interface between the (LP)DDR controller and (LP)DDR PHY provides SoC designers a large amount of flexibility when selecting the (LP)DDR controller and (LP)DDR PHY solution.
If we examine an LPDDR4-4266 solution from an internal LPDDR4 controller and LPDDR4 PHY perspective, it is notable that while the PHY will typically run at the same speed as the memory, or a maximum of 2133 MHz, the interface between the LPDDR4 controller and PHY (e.g., the DFI interface) will typically run at half that speed, or 1066 MHz. This is commonly referred to as a DFI 1:2 frequency ratio solution since a single LPDDR controller clock covers two memory clocks. This approach is used to achieve a reasonable maximum clock frequency to close timing within the ASIC design flow for the digital logic of the controller.
The internal LPDDR5 controller and LPDDR5 PHY have a different clocking relationship when used in an LPDDR5-6400 solution. The data interface between the host and device is running at a maximum rate of 3200 MHz. Mimicking the LPDDR4-4266 internal DFI 1:2 frequency ratio would mean that the interface between the LPDDR5 controller and LPDDR5 PHY would be running at 1600 MHz, which is not a reasonable expectation for an LPDDR5 controller of any significant complexity. Instead, it is ideal to transition from a DFI 1:2 frequency ratio to a DFI 1:4 frequency ratio, which allows for four clocks on the memory for every single LPDDR5 controller clock. This will allow the interface between the LPDDR5 controller and LPDDR5 PHY to run at 800 MHz, even while the LPDDR5 PHY runs the data interface to the memory at 3200 MHz.
However, remember that the CA interface between the host and device is clocked at a maximum of 800 MHz, and that clock should not be stepped down to 200 MHz at the DFI simply because the data transfer rate requires a DFI 1:4 frequency ratio. The LPDDR5 PHY must already manage multiple clock rates to interface to the memory, so it is ideal to contain the clocking complexity within the LPDDR5 PHY. By doing this one maintains a DFI 1:1 frequency ratio for the LPDDR5 commands while moving to a DFI 1:4 frequency ratio for LPDDR5 data and keeping the LPDDR5 controller and the entire DFI running at 800 MHz. This new mode of LPDDR5 controller and LPDDR5 PHY interoperation is known as a DFI 1:1:4 frequency ratio: DFI 1:1 for commands and DFI 1:4 for data. See Figure 6.
Figure 6: Illustration of clock domains for an LPDDR5-6400 Solution using DFI 1:1:4 frequency ratio
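The reasoning behind the 1:1:4 ratio comes down to simple division, sketched below. The helper name is hypothetical and the frequencies are the LPDDR5-6400 maximums quoted above; this is arithmetic for illustration, not part of the DFI specification.

```python
def dfi_clock_mhz(memory_clock_mhz: float, memory_clocks_per_dfi_clock: int) -> float:
    """DFI (controller-side) clock for a given memory clock and frequency ratio.

    A DFI 1:N ratio means one DFI clock spans N memory clocks.
    """
    if memory_clocks_per_dfi_clock < 1:
        raise ValueError("the frequency ratio must be at least 1:1")
    return memory_clock_mhz / memory_clocks_per_dfi_clock


WCK_MHZ = 3200.0  # data clock at LPDDR5-6400
CK_MHZ = 800.0    # command/address clock at LPDDR5-6400

data_1_to_2 = dfi_clock_mhz(WCK_MHZ, 2)    # LPDDR4-style ratio applied to data
data_1_to_4 = dfi_clock_mhz(WCK_MHZ, 4)    # data side of DFI 1:1:4
command_1_to_1 = dfi_clock_mhz(CK_MHZ, 1)  # command side of DFI 1:1:4

print(f"data at 1:2     -> DFI clock {data_1_to_2} MHz")
print(f"data at 1:4     -> DFI clock {data_1_to_4} MHz")
print(f"commands at 1:1 -> DFI clock {command_1_to_1} MHz")
```

The command and data sides land on the same 800 MHz clock, which is why the controller and the whole DFI can share one clock domain in 1:1:4 mode.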
Lower speed clocking options
The above sections discuss the external and internal clocking when running at the maximum data rate, 6400 Mbps, as defined by the new LPDDR5 standard. However, there are use cases when it is advantageous to run the interface slower, for example to conserve power when maximum bandwidth to the memory is not required. In such use cases, the LPDDR5 standard offers options to maximize lower speed performance while minimizing power consumption.
The first option is the ability for the CA clock rate to adjust when lowering the data strobe and data transfer rates. Once the data transfer rate drops to 3200 Mbps or slower, it is possible to change the CK to WCK ratio from 1:4 to 1:2, allowing the user to keep the CA transfer rate at 1600 Mbps while the data transfer rate is slowed to 3200 Mbps. See Figure 7.
Figure 7: Waveform showing DDR CA bus and DDR DQ bus as specified for LPDDR5-3200 with CK:WCK ratio of 1:2. Only one of two differential signals shown for CK, WCK and RDQS.
By providing an option to slow down the data bus while keeping the CA bus running at the same data rate, the system has the option to adjust internally as well.
When the CK to WCK ratio is 1:4, the DFI operates internally at a 1:1:4 ratio. When the CK to WCK ratio is 1:2, the DFI operation is updated to work in a 1:1:2 mode. In each case the LPDDR5 controller, DFI, PHY core and CK run at the same speed. However, the DFI frequency ratio for the data operations changes: it is 1:4 when the LPDDR5 SDRAM data transfer rate is greater than 3200 Mbps and the CK to WCK ratio is 1:4, and 1:2 when the LPDDR5 SDRAM data transfer rate is 3200 Mbps or slower and the CK to WCK ratio is 1:2. This adjustment of the DFI operating frequency ratio allows the LPDDR5 controller and DFI domain portion of the LPDDR5 PHY to run at up to 800 MHz for any speed of operation, keeping the latency through the internal LPDDR5 controller and LPDDR5 PHY as low as possible for all speeds of operation.
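The relationship between data rate, CK to WCK ratio, and DFI mode can be summarized in a small selection function. This sketch assumes a policy of always switching to 1:2 as soon as the data rate permits it; the text only says the switch is possible at 3200 Mbps or slower, so that policy and all the names below are assumptions made for the example.

```python
from typing import NamedTuple


class ClockPlan(NamedTuple):
    ck_to_wck: int   # 4 means a CK:WCK ratio of 1:4, 2 means 1:2
    wck_mhz: int
    ck_mhz: int
    dfi_ratio: str   # "1:1:4" or "1:1:2"


def plan_clocks(data_rate_mbps: int) -> ClockPlan:
    """Derive clocks and DFI mode from the DQ data rate (assumed policy)."""
    ck_to_wck = 4 if data_rate_mbps > 3200 else 2
    wck_mhz = data_rate_mbps // 2    # DQ is DDR relative to WCK
    ck_mhz = wck_mhz // ck_to_wck    # CK is WCK divided by the ratio
    return ClockPlan(ck_to_wck, wck_mhz, ck_mhz, f"1:1:{ck_to_wck}")


for rate in (6400, 3200):
    plan = plan_clocks(rate)
    print(f"{rate} Mbps: WCK {plan.wck_mhz} MHz, CK {plan.ck_mhz} MHz, "
          f"DFI {plan.dfi_ratio}")
```

Both data rates keep CK, and therefore the controller and DFI, at 800 MHz, which matches the behavior described above.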
Differential, single-ended, and strobeless operation
During high-speed operations (the assumed majority mode of operation when not in a low-power state), the LPDDR5 device will use CK, WCK, and RDQS in differential mode to provide maximum performance. However, there are use cases for running the interface slower. The LPDDR5 specification has some built-in power saving capabilities for these use cases.
One power saving option provided by the LPDDR5 specification offers the ability to change the three differential signals CK, WCK, and RDQS into single-ended signals when running at data rates at or below 1600 Mbps. If we take the assumption of running the CK to WCK ratio as 1:2, then CK will be running at 400 MHz and WCK (and RDQS) at 800 MHz when CK, WCK, and RDQS are placed into single-ended mode operation.
The user also has the option to place CK and WCK in single-ended mode of operation and turn off RDQS entirely. Intended for low-speed operation, this is known as strobeless mode and requires the LPDDR5 PHY to generate an internal strobe to capture read DQ from the device.
When switching CK and WCK from differential to single-ended mode of operation and changing RDQS from differential to either single-ended operation or strobeless mode, it is required to disable device termination for CK, WCK and RDQS as well as for the CA signals, the DQ signals, and the data mask inversion (DMI) signal. Moving signals from differential mode to single-ended mode or turning them off entirely saves power, and not terminating most of the signals of the LPDDR5 interface saves additional power.
There are choices and restrictions to consider when setting CK, WCK, and RDQS into a single-ended mode. WCK and RDQS may only be configured for single-ended mode when CK is also configured for single-ended mode. It is also possible to enable single-ended mode for CK while keeping both WCK and RDQS in differential mode. If WCK is put into single-ended mode, then RDQS must also be placed into single-ended mode (with the same polarity chosen for the active signal for both WCK and RDQS) or placed in strobeless mode. Table 1 lists all the valid combinations for CK, WCK, and RDQS.
Table 1: Allowed combinations of CK, WCK and RDQS
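The restrictions stated in the text can be expressed as a small rule checker. This sketch encodes only the three rules spelled out in the paragraph above and is not a reproduction of Table 1; an empty result means no stated rule is violated, not that the standard necessarily allows the combination. The mode names and function name are chosen for this example, and the polarity requirement is not modeled.

```python
from itertools import product
from typing import List

DIFF = "differential"
SE = "single-ended"
OFF = "strobeless"  # RDQS turned off entirely


def violated_rules(ck: str, wck: str, rdqs: str) -> List[str]:
    """Return the rules from the text that a CK/WCK/RDQS combination breaks."""
    violations = []
    if wck == SE and ck != SE:
        violations.append("WCK single-ended requires CK single-ended")
    if rdqs == SE and ck != SE:
        violations.append("RDQS single-ended requires CK single-ended")
    if wck == SE and rdqs not in (SE, OFF):
        violations.append("WCK single-ended requires RDQS single-ended or strobeless")
    return violations


# Walk every combination and report which stated rules, if any, it breaks.
for ck, wck, rdqs in product((DIFF, SE), (DIFF, SE), (DIFF, SE, OFF)):
    problems = violated_rules(ck, wck, rdqs)
    status = "no stated rule violated" if not problems else "; ".join(problems)
    print(f"CK={ck:<13} WCK={wck:<13} RDQS={rdqs:<13} -> {status}")
```

For example, single-ended CK with differential WCK and RDQS passes, while single-ended WCK with differential CK does not.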
Summary
The introduction of the LPDDR5 specification not only enables the implementation of a new low-power SDRAM standard, promising larger density devices and faster data rates, it also outlines some innovative new clocking schemes which allow for easier implementation of LPDDR5 controllers and LPDDR5 PHYs when running at the maximum data rates allowed by the specification. Additionally, the specification offers a few options for power savings with the clock and data strobes when the memory cannot be placed in a low-power state but does not need to run at higher data rates.
Synopsys, the memory interface IP leader, offers a complete LPDDR5 IP interface solution including a configurable LPDDR5 controller, LPDDR5 PHYs available in a wide variety of technology nodes, and LPDDR5 Verification IP. Synopsys is an active member of JEDEC helping to drive development and adoption of the newest memory standards. Synopsys’ configurable memory interface IP solutions can be tailored to meet the exact requirements of SoCs for applications such as AI, automotive, mobile and cloud computing.
During low-speed operations, an internal strobe can be used to capture read DQ from the device. Why not use the internal strobe during high-speed operations? Is it a read synchronization problem?
Probably because the internal strobe wouldn’t be able to meet the timing requirements of a higher speed.
LPDDR5 for 5G : The topic for a follow-up article… Thanks.
Really Thank you so much for educating me on LPDDR5, greatly appreciate.
Is it against the rules to use 4:1 at 3200 Mbps or lower? Is it not recommended? Is bank group mode allowed at the fringe speed of 3200 Mbps?