The Power Of eDRAM

It’s not just clock speed that increases the performance of a processor. Memories play a big role.

In last month’s article we looked at different aspects of technology nodes and the multiple techniques used to keep density scaling on track. Moving data around is expensive from an energy standpoint, and processors need high bandwidth to stay “fed,” so engineers are looking at ways to keep data closer to the processing logic, both to minimize the energy needed to get that data there and to get it there quicker.

Embedded DRAM (or eDRAM) has been in existence for a long time and has met with varying success. As we noted last month, the process development teams for processors (logic) and memory went their separate ways many years ago because their end goals weren’t fully compatible. Incorporating both logic and DRAM on one silicon die and getting both to work well is a challenge. Because of the growing need to keep data closer to the logic, and as more pieces of the system continue to get incorporated onto the same die, there appears to be renewed interest in eDRAM. TSMC and Renesas have been using eDRAM in chips that have gone into Xbox and Nintendo Wii gaming systems. IBM started incorporating eDRAM into its Power processors back at the 45nm node with its Power7. A notable new entry into eDRAM is Intel, a company with a DRAM manufacturing history that left that market to pursue processors instead. Its Haswell family now includes an eDRAM version, although not necessarily in the same manner as some of the previously mentioned designs.

In a previous Power8 article, the performance and scaling benefits of IBM’s eDRAM capability were mentioned. One thing that should be stated upfront is that Haswell and Power8 target different parts of the market. Among large processors, Intel’s 15-core Ivytown is a closer match to the 12-core Power8, so any comparisons made here are meant to give the reader a better feel for what’s happening with the technologies rather than to serve as direct part-to-part comparisons.

Figure 1. IBM’s Power8 [1]

Figure 1 shows a layout diagram for Power8 in which the L3 cache, implemented in IBM’s on-chip eDRAM, is clearly visible. Power7+ (in 32nm SOI) had 80MB of on-die eDRAM and Power8 has 96MB (8MB per core, shared). Intel’s 15-core Ivytown processor has 37.5MB of L3 cache implemented in SRAM. Both of these processors are implemented in their makers’ own “22nm” technologies. The IBM die is 649mm² with 4.2B transistors, whereas Intel’s is 541mm² with 4.31B transistors.
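As a rough sanity check on the figures quoted above, the Python sketch below derives L3 capacity per core and average transistor density for the two dies. The dictionary layout and function names are illustrative assumptions, and whole-die transistor density lumps together logic, cache and I/O, so it is only a coarse indicator.

```python
# Figures as quoted in the text; the structure and names are illustrative.
CHIPS = {
    "Power8": {"cores": 12, "l3_mb": 96.0, "die_mm2": 649.0, "transistors": 4.2e9},
    "Ivytown": {"cores": 15, "l3_mb": 37.5, "die_mm2": 541.0, "transistors": 4.31e9},
}


def l3_per_core_mb(chip):
    """L3 capacity divided evenly across cores, in MB."""
    return chip["l3_mb"] / chip["cores"]


def transistor_density_m_per_mm2(chip):
    """Average transistor density over the whole die, in millions per mm^2."""
    return chip["transistors"] / chip["die_mm2"] / 1e6


for name, chip in CHIPS.items():
    print(
        f"{name}: {l3_per_core_mb(chip):.1f} MB L3 per core, "
        f"{transistor_density_m_per_mm2(chip):.2f} M transistors/mm^2"
    )
```

On these numbers Power8 carries 8MB of L3 per core against 2.5MB for Ivytown, on a die with a similar transistor count.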

Figure 2. IBM’s Centaur Memory Buffer Chip [2]

Figure 2 shows a layout diagram for the companion Centaur memory buffer chip. Each Centaur has 16MB of on-die cache, which lets Power8 expand its L4 cache to 128MB when eight of them are attached. As shown, the memory is embedded into the Centaur chip, but is off-chip from the Power8.

Figure 3. Haswell package layout diagram [3], [4]

Figure 3 shows the package layout for Intel’s Haswell part with eDRAM. One thing that is immediately noticeable is that the eDRAM sits alone on its own chip. Intel has implemented an On-Package IO (OPIO) interface to the memory. Clearly, this is more efficient than having to go off-package but not as efficient as staying on-chip. Figure 3 also shows a picture of the metal-insulator-metal (MIM) structure of the capacitor for the bit cell. IBM, by contrast, uses a “trench” approach for its eDRAM. Chipworks has published an excellent article comparing several different eDRAM structures used in the industry, along with pictures of the IBM process and others.

Intel claims 102.4GB/s at 1W using its OPIO. IBM claims 3TB/s of L3 bandwidth (across 12 cores at 4GHz) with its on-chip eDRAM implementation. Intel’s implementation has provided clear power and performance benefits for its parts, but the IBM results show that there is still more to be had if the eDRAM is placed on chip.
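To put the two vendor claims on a more comparable footing, the sketch below converts them into bytes per core per clock cycle and energy per bit. It assumes decimal units (1GB = 10^9 bytes, 1TB = 10^12 bytes) and that the quoted 1W covers the full 102.4GB/s; the constant and function names are illustrative. The two figures also describe different levels of the memory hierarchy, so this is a feel for scale rather than a like-for-like benchmark.

```python
# Vendor-claimed figures as quoted in the text (decimal units assumed).
OPIO_BW_GBPS = 102.4   # Intel OPIO bandwidth, GB/s
OPIO_POWER_W = 1.0     # power quoted at that bandwidth, W
L3_BW_TBPS = 3.0       # IBM aggregate L3 bandwidth, TB/s
CORES = 12
CLOCK_GHZ = 4.0


def bytes_per_core_per_cycle(bw_tbps, cores, clock_ghz):
    """Aggregate bandwidth divided evenly across cores and clock cycles."""
    return bw_tbps * 1e12 / (cores * clock_ghz * 1e9)


def picojoules_per_bit(bw_gbps, power_w):
    """Energy per transferred bit, assuming power_w sustains bw_gbps."""
    bits_per_second = bw_gbps * 1e9 * 8
    return power_w / bits_per_second * 1e12


print(f"IBM L3: {bytes_per_core_per_cycle(L3_BW_TBPS, CORES, CLOCK_GHZ):.1f} bytes/core/cycle")
print(f"Intel OPIO: {picojoules_per_bit(OPIO_BW_GBPS, OPIO_POWER_W):.2f} pJ/bit")
print(f"Bandwidth ratio: {L3_BW_TBPS * 1000 / OPIO_BW_GBPS:.1f}x")
```

Under these assumptions the IBM claim works out to 62.5 bytes per core per cycle, and the OPIO claim to about 1.2pJ per bit.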

In terms of density comparisons with SRAM, Intel reports a bit cell area of 0.108µm² for its SRAM and 0.029µm² for its eDRAM. IBM has three versions of SRAM, with the performance-density balanced (6-transistor) version at 0.144µm², and its eDRAM is at 0.026µm². Many other factors come into play when evaluating DRAM, such as performance and data retention, but this at least provides a quick high-level comparison. Given the growing need for higher bandwidth and better energy efficiency, it’s likely that we’ll continue to see more focus on memories, whether stacked, on-package, or embedded DRAM.
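The bit cell areas above translate into density ratios as sketched below. The helper names are illustrative, 1MB is assumed to be 2^20 bytes, and the array-area estimate counts bit cells only (no sense amplifiers, decoders, redundancy or ECC), so real arrays are larger.

```python
# Bit cell areas in square micrometres, as quoted in the text.
CELL_AREAS_UM2 = {
    "Intel": {"sram": 0.108, "edram": 0.029},
    "IBM": {"sram": 0.144, "edram": 0.026},
}


def density_advantage(areas):
    """How many eDRAM bit cells fit in the area of one SRAM bit cell."""
    return areas["sram"] / areas["edram"]


def raw_array_area_mm2(capacity_mb, cell_um2):
    """Bit-cell-only area for a given capacity (1MB assumed to be 2**20 bytes)."""
    bits = capacity_mb * 2**20 * 8
    return bits * cell_um2 / 1e6  # 1 mm^2 = 1e6 um^2


for vendor, areas in CELL_AREAS_UM2.items():
    print(f"{vendor}: eDRAM is {density_advantage(areas):.2f}x denser than SRAM")

ibm = CELL_AREAS_UM2["IBM"]
print(f"96MB in IBM eDRAM cells: {raw_array_area_mm2(96, ibm['edram']):.1f} mm^2")
print(f"96MB in IBM SRAM cells:  {raw_array_area_mm2(96, ibm['sram']):.1f} mm^2")
```

By this estimate the eDRAM cell is roughly 3.7 times denser than SRAM for Intel and 5.5 times denser for IBM, before any array overhead is counted.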

[1] E. Fluhr, et al., “Power8™: A 12-Core Server-Class Processor in 22nm SOI with 7.6Tb/s Off-Chip Bandwidth”, Session 5.1, ISSCC 2014.
[2] J. Stuecheli, “Power8”, HotChips25, August 2013.
[3] F. Hamzaoglu, et al., “A 1Gb 2GHz Embedded DRAM in 22nm Tri-Gate CMOS Technology”, Session 13.1, ISSCC 2014.
[4] N. Kurd, et al., “Haswell: A Family of IA 22nm Processors”, Session 5.9, ISSCC 2014.