Using Multi-Channel Connections for Optimized LPDDR4 Power & Performance

To deploy LPDDR4 effectively, understand the fundamental changes to its architecture.


LPDDR4, the latest double data rate synchronous DRAM for mobile applications, includes a number of features that enable SoC design teams to reduce the power consumption of discrete DRAM in mobile devices. PCs and servers commonly use DDR devices mounted on dual inline memory modules (DIMMs) hosted on 64-bit wide buses. This board-level solution allows field-upgradeable DRAM capacity expansion, but requires longer, more heavily loaded interconnects, which consume more power than short traces. Systems using LPDDR2, LPDDR3 and LPDDR4 tend to have fewer memory devices on each bus and shorter interconnects, and thus consume less power than systems using DDR2, DDR3 and DDR4 devices.

Design teams can also call on power-saving options within the LPDDR4 DRAM itself. These features include reduced voltage and I/O capacitance; a reduced-width, multiplexed command and address bus; elimination of the on-DRAM DLL; lower-power standby modes with faster entry and exit; and faster, less complex frequency changes.

LPDDR4 Use Models for Mobile Devices

Realistically, mobile users only benefit from the highest operational frequency of LPDDR4 for a small percentage of their device’s usage. That’s when the user is capturing or displaying high definition video, playing games with very intensive graphics needs, processing images, or booting or loading new software. For part of the time, the memory drops to the LPDDR3 speed grade. This level of performance is sufficient to support texts, calls, web browsing, photography, simple gaming: all features that don’t place too many demands on the CPU or GPU.

For the majority of the time, when the mobile device is not in use and in a pocket or at a bedside, the DRAM is switched off or in low speed mode. It will have one channel of the memory active just to perform ‘always-on, always-connected’ tasks. In this mode, the device is performing background activities such as maintaining cell contact, receiving messages, receiving / displaying push notifications, synchronizing mail, and displaying the time.

However, it is the performance of the device during the highest use time that drives many mobile users to upgrade their devices, which is why it is so important to provide an outstanding user experience in this use mode.

LPDDR4 Architectural Changes

The LPDDR4 specification defines a range of performance and feature improvements over its predecessors. Most importantly, LPDDR4 incorporates a fundamental change of architecture: LPDDR4 devices are arranged as two independent channels on each die. DDR2, DDR3, and DDR4 devices offer one command/address bus input and one data bus per package, and most commonly one die per package. LPDDR2 and LPDDR3 may offer one to four dies per package. Two-die and four-die LPDDR2 and LPDDR3 packages generally provide two independent sets of command/address inputs and data buses (channels). In other words, LPDDR2 and LPDDR3 partially enable multi-channel operation by offering two independent channels per package. LPDDR4 brings the issue to the forefront, as there are two independent channels per die and four channels in most packages.

Connecting Multiple Channels

The LPDDR4 architecture is natively two-channel (Figure 1): each die has two command/address inputs and two data buses. Four independent channels are available on an LPDDR4 2-die package. To deploy LPDDR4 effectively, designers must understand how this architectural change affects the system architecture.


Figure 1. LPDDR4 two-channel architecture

A single DRAM device with one channel (for example, a single-die package of LPDDR3) can only be connected one way: the command/address bus on the SoC connects to the command/address bus on the DRAM, and the SoC data bus connects to the DRAM data bus (Figure 2). A chip select enables the DRAM when it is required.


Figure 2. A standard way to connect a single DRAM device

Having two DRAM devices, or one DRAM device with two independent interfaces like LPDDR4, supports four possible configurations: parallel (lockstep), series (multi-rank), multi-channel, and shared command/address.

Parallel (lockstep) connection

The most familiar option for designers experienced in DDR2/DDR3/DDR4 is the parallel, or lockstep, configuration. In the parallel configuration (Figure 3), two or more DRAM dies, or the two channels of an LPDDR4 die, are connected to the same command/address bus. They use the same chip select, but each has an independent data channel. In this parallel connection, all of the DRAM devices receive the same command and address, but they transmit their data over different byte lanes. All of the devices are accessed simultaneously, so the DRAM devices are always in the same state. They always have the same page of memory open and access the same column, although the data stored in each DRAM is different.


Figure 3. Parallel (lockstep) connection
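The lockstep behavior described above can be illustrated with a toy model. This is only a sketch: the Device class, the lockstep_read helper and the stored values are hypothetical names and data invented for illustration, and real DRAM timing is ignored.

```python
class Device:
    """Toy DRAM: rows[row][column] holds one data word (hypothetical model)."""

    def __init__(self, rows):
        self.rows = rows
        self.open_row = None

    def activate(self, row):
        self.open_row = row

    def read(self, column):
        return self.rows[self.open_row][column]


def lockstep_read(devices, row, column):
    """Broadcast one activate and one read; collect each device's data."""
    for dev in devices:
        dev.activate(row)          # same command and address for every device
    return [dev.read(column) for dev in devices]


low = Device({0: [0x11, 0x22]})    # drives the lower byte lanes
high = Device({0: [0xAA, 0xBB]})   # drives the upper byte lanes
words = lockstep_read([low, high], row=0, column=1)
```

Both devices end up with the same row open and return the same column, but each contributes different data to the combined word.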

Series (multi-rank) connection

A second option is to connect devices together in a series, or multi-rank, configuration (Figure 4). This is equivalent to putting multiple DIMMs into the same channel on a PC. The command/address and data buses are connected in common to both of the DRAM devices, but access to the two DRAM devices is controlled independently using two different chip selects on any particular command cycle. The two devices may be in different states, with different pages of memory active. Typically, the SoC controls arbitration of the shared data bus to ensure that the DRAMs do not transmit at the same time.


Figure 4. Series (multi-rank) connection

Multi-channel connection

The multi-channel connection (Figure 5) provides each channel of DRAM or each DRAM device with an independent connection to the SoC, where each device or channel has its own command/address bus, data bus and chip select. This flexible configuration enables each DRAM device (or group of devices) to operate completely independently of the other. They may be in different states, receiving different commands and different addresses, and one may be reading while the other is writing. A multi-channel connection also allows for the DRAMs to operate in different power states. For example, one memory might be in a standby self-refresh mode, while the other is fully active.


Figure 5. Multi-channel connection
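A minimal sketch of the independence that the multi-channel connection allows, with one channel in self-refresh while the other stays active. The Channel class, its methods and the PowerState names are hypothetical and exist only for this illustration; actual LPDDR4 power-state entry and exit rules are more involved than shown here.

```python
from enum import Enum


class PowerState(Enum):
    ACTIVE = "active"
    SELF_REFRESH = "self-refresh"


class Channel:
    """Toy model of one independently wired DRAM channel (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.power = PowerState.ACTIVE
        self.open_row = None

    def activate(self, row: int) -> None:
        # A channel in standby cannot accept an activate in this model.
        if self.power is not PowerState.ACTIVE:
            raise RuntimeError(f"channel {self.name} is in {self.power.value}")
        self.open_row = row

    def enter_self_refresh(self) -> None:
        self.open_row = None  # this toy model simply forgets the open row
        self.power = PowerState.SELF_REFRESH


ch_a, ch_b = Channel("A"), Channel("B")
ch_a.activate(row=0x12)      # channel A keeps serving traffic
ch_b.enter_self_refresh()    # channel B drops to standby on its own
```

Because each channel has its own command/address bus and chip select, nothing done to channel B affects the state of channel A.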

Shared command/address (CA) connection

The final configuration option, which is used more commonly in non-low-power DDR installations, is multi-channel with shared command/address (CA), also called shared AC (Figure 6). In this configuration, both of the DRAM devices receive the same command and address, but as in the series implementation, the chip selects determine which DRAM device is listening on any particular clock cycle, so each device may be in a different state. The DRAM commands are arbitrated between the two channels at the SoC, but each DRAM can transmit data independently.


Figure 6. Shared command/address (CA) connection
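The four configurations differ only in which signals the two devices share. The sketch below records that as data, using the descriptions in the sections above; the Topology class and its field names are hypothetical, chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    name: str
    shared_ca: bool   # command/address bus wired to both devices
    shared_cs: bool   # one chip select enables both devices
    shared_dq: bool   # data bus wired to both devices

    @property
    def independent_state(self) -> bool:
        """Devices can differ in state unless they see every command together."""
        return not (self.shared_ca and self.shared_cs)


TOPOLOGIES = [
    Topology("parallel (lockstep)", shared_ca=True, shared_cs=True, shared_dq=False),
    Topology("series (multi-rank)", shared_ca=True, shared_cs=False, shared_dq=True),
    Topology("multi-channel", shared_ca=False, shared_cs=False, shared_dq=False),
    Topology("shared CA", shared_ca=True, shared_cs=False, shared_dq=False),
]

for t in TOPOLOGIES:
    print(f"{t.name:22} independent state: {t.independent_state}")
```

Only the parallel connection forces both devices into the same state; only the series connection makes them take turns on the data bus.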

Recommendations for connecting channels

Each of these configuration options has its own advantages and disadvantages (Figure 7). For example, the parallel implementation has only eight banks available, and the minimum amount of data that can be fetched is 64 bytes at a time over a 32-bit wide bus. The parallel implementation is also less suited to package-on-package (PoP) implementation.


Figure 7. Recommendations for connecting channels
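The 64-byte minimum fetch of the parallel connection follows from simple arithmetic: data bus width multiplied by burst length. A minimal sketch, assuming a 16-bit LPDDR4 channel and a burst length of 16 (neither figure is stated in this article, and the function name is hypothetical):

```python
# Assumptions (not stated in the article): each LPDDR4 channel is 16 bits
# wide and uses a burst length of 16 beats.
CHANNEL_WIDTH_BITS = 16
BURST_LENGTH = 16


def min_fetch_bytes(bus_width_bits: int, burst_length: int = BURST_LENGTH) -> int:
    """Smallest amount of data that one read or write burst transfers."""
    return bus_width_bits * burst_length // 8


# Parallel (lockstep): both channels respond to every command, so the
# effective data bus is 2 x 16 = 32 bits wide.
parallel_fetch = min_fetch_bytes(2 * CHANNEL_WIDTH_BITS)

# A single 16-bit channel accessed on its own.
single_channel_fetch = min_fetch_bytes(CHANNEL_WIDTH_BITS)

print(parallel_fetch, single_channel_fetch)  # 64 32
```

Under these assumptions, ganging two channels in lockstep doubles the minimum fetch relative to accessing one channel at a time.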

The series connection is also less suited for PoP implementation. It does save some DQ pins, but because the DRAM devices share a data bus it offers half the bandwidth of the other solutions, which makes this approach less attractive.
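The halved bandwidth can be checked with a short calculation. This is a sketch only: the 3200 MT/s data rate is an assumed example speed that this article does not specify, the 16-bit channel width is likewise an assumption, and the function name is hypothetical.

```python
def peak_bandwidth_mb_s(dq_pins: int, data_rate_mt_s: int) -> float:
    """Peak transfer rate in MB/s: one bit per DQ pin per transfer."""
    return dq_pins * data_rate_mt_s / 8


DATA_RATE_MT_S = 3200  # assumed example data rate, not taken from the article

# Two separate 16-bit data buses (parallel, multi-channel, shared CA).
separate_buses = peak_bandwidth_mb_s(32, DATA_RATE_MT_S)

# Series connection: both devices take turns on one shared 16-bit data bus.
shared_bus = peak_bandwidth_mb_s(16, DATA_RATE_MT_S)

print(separate_buses, shared_bus)  # 12800.0 6400.0
```

Whatever data rate is chosen, the ratio stays at two to one because the series connection has half as many DQ pins carrying data.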

A shared-CA solution saves some pins, but the characteristics of LPDDR4 command/address cycles mean that the fetch size would need to be increased to 64 bytes for practical LPDDR4 applications, and the shared-CA structure might be difficult to route in a PoP package.

The recommended solution for most designs is the multi-channel implementation, which has the most available banks (16), the most flexible operation (two channels operating independently), the highest bandwidth (32 DQs), and the smallest fetch size (32 bytes) of the solutions presented. While the multi-channel solution requires more CA pins than the other solutions, the extra CA pins make PoP package routing easier. All this, plus the ability to have different DRAMs in different power states, makes the multi-channel approach a winner in many applications.

For more information on how to handle 2-die and 4-die packages with multi-channel connections, download the new white paper: Optimizing LPDDR4 Performance and Power with Multi-Channel Architectures.