Powerful Memories

As memory takes up a larger portion of the SoC, questions abound about what can be done to reduce power and improve performance.

Memory consumes more of the surface area of a die than any other component. So what changes have happened over the past few years to reduce the power consumption of memories, and where are the big opportunities for saving power? Let’s take a closer look.

A Growing Concern
One of the key drivers for SoCs is the desire to reduce product costs, reduce form factors, reduce power, increase performance and increase the amount of functionality. The only way to do this has been to pull increasing amounts of the total system into a single chip.

“As you integrate more functionality, you have to bring more memory onto the chip because you are limited by I/O,” says Prasad Saggurti, product marketing manager for embedded memory IP at Synopsys. “You also have extra space to use because you get more space for the same number of I/Os. It is easier to add storage than additional logic, but as you add logic, it generally consumes more memory.”

Anand Iyer, director of product marketing for the Low Power Platform group at Calypto Design, provides another reason for increasing memory content. “One of the ways to increase throughput is to increase the number of pipeline stages,” he says. “Increase in pipeline stages requires overall memory requirement to increase.”

How much of the chip’s area is consumed by memory? Patrick Soheili, vice president and general manager for IP Solutions and vice president of business development at eSilicon, says, “Three years ago, when we were at 40nm, we may have seen 35% to 40% of the die being consumed by memory. Today it is closer to 50% to 55%. We have seen a few cases where it is more than 85% of the chip area. The trend line is up and to the right.”

He’s not alone in that assessment. “There is a greater percentage of the chip being consumed by memory,” confirms Synopsys’ Saggurti. “A few years ago it was being projected that up to 80% of the die area would be consumed by memory by 2014 or 2015, but it is not as much as that. We are seeing 60% to 70% quite routinely. In other cases such as GPUs, the percentage is higher. It is growing.”

Venkat Iyer, Chief Technology Officer at Uniquify, provides a concrete example for a chip the company recently completed. “The chip size total is 100mm^2. The memory count total was 3,183, with approximately 70 million memory bits. The chip area occupied by memories equaled 48,982,000 square microns. That is, almost 50% of die area was consumed by memory.”
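The arithmetic behind that example can be checked in a few lines. The sketch below uses only the figures quoted above; the function and constant names are illustrative, and the per-instance and per-bit values are rough derived averages (the bit count is quoted as approximate, and the memory area includes periphery as well as bit-cells).

```python
# Figures quoted for the Uniquify chip; names here are illustrative only.
DIE_AREA_MM2 = 100             # total chip size, mm^2
MEMORY_AREA_UM2 = 48_982_000   # area occupied by memories, square microns
MEMORY_INSTANCES = 3_183       # total memory count
MEMORY_BITS = 70_000_000       # "approximately 70 million" bits

UM2_PER_MM2 = 1_000 * 1_000    # 1 mm = 1,000 microns, so 1 mm^2 = 1e6 um^2


def memory_area_fraction(memory_area_um2: float, die_area_mm2: float) -> float:
    """Return the fraction of the die occupied by memory (0.0 to 1.0)."""
    return memory_area_um2 / (die_area_mm2 * UM2_PER_MM2)


fraction = memory_area_fraction(MEMORY_AREA_UM2, DIE_AREA_MM2)

# Rough averages derived from the quoted (approximate) totals.
avg_bits_per_instance = MEMORY_BITS / MEMORY_INSTANCES
avg_area_per_bit_um2 = MEMORY_AREA_UM2 / MEMORY_BITS  # includes periphery

print(f"Memory share of die area: {fraction:.1%}")
print(f"Average bits per instance: {avg_bits_per_instance:,.0f}")
print(f"Average area per bit: {avg_area_per_bit_um2:.2f} um^2")
```

The computed share is just under 49%, which matches the “almost 50%” in the quote.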

“It is not just about performance or latency, it is power that has become a major problem,” says Dave Lazovsky, CEO of Intermolecular.

Adds Calypto’s Iyer: “Memories typically consume up to 50% of power. With the number of memories growing, power concerns are growing with every new application.”

The Dominant Memory
While there may be hundreds or even thousands of memory instances on a chip, most of them have the same bit-cell at their core: the SRAM cell. “The memory bit-cell for SRAM has not changed for several decades,” says Arvind Shanmugavel, director of applications engineering at Ansys-Apache. “We are still using the same 6T cell, but the way in which they have been used has evolved.”

eSilicon’s Soheili explains that “while SRAM size is larger than for embedded DRAM, there are yield issues and the total cost of ownership for embedded DRAMs is higher. Because of this, most of the industry has stuck to using SRAM on chip. Attempts to create smaller SRAM cells have also had yield issues. A few micro-controllers do use embedded NVM. We see a trend of back to the basics for advanced technology nodes because it is safer.”

Calypto’s Iyer provides another reason for sticking to the basics. “Embedded DRAM or flash are more expensive because they require a different manufacturing process that is not as well optimized for logic.”

Synopsys’ Saggurti agrees. “Embedded flash is primarily in microcontroller applications. These are in larger mature nodes, possibly down to 40nm but most are around 55nm. Flash is always a few nodes behind the leading edge. There are some ROMs but no embedded DRAM. TSMC stopped supporting it at 28nm. The only one left supporting it is IBM at smaller geometries.”

The DRAM Dilemma
Off-chip, the DRAM is still king, but it is not immune to problems. “The biggest part of what is contributing to power consumption is the memory core, particularly for 8-bit devices,” said Ajay Jain, director of product marketing at Rambus. “The DRAM industry is taking this very seriously, which is why you rarely see more than a 266MHz clocking frequency.”

The DRAM uses a capacitor to go along with the 1T bit-cell. It would seem that this would be a smaller and denser memory than SRAM that uses a 6T cell, but Saggurti says that “the capacitor is not scaling and so the benefit is not there. In addition, the number of extra layers needed for manufacturing is growing.”

Intermolecular’s Lazovsky agrees. “The DRAM capacitor technology is using new materials and more complex capacitor architectures, which is driven by the desire to have ultra-low-leakage DRAM technology,” he says.

DRAM refresh cycles also consume a lot of power, and DRAM has lower access speeds than SRAM. As a result, its place on the main die has faded and eDRAM is not likely to see mainstream adoption.

“In signaling, the power is consumed in driving a high-speed bit rate from the driver to the package, over the PCB to another package, and then to the receiver,” says Rambus’ Jain. “That whole channel provides a headache for maintaining signal integrity. By the time the signal gets to the other side it doesn’t look anything like what you started with.”

Power Focus
Given that on-chip memory consumes 50% of the space and power, it would seem logical to conclude a lot of time and attention is given to memory optimization. It would appear that this is not the case. eSilicon’s Soheili claims that “they tend to use an off-the-shelf compiler for memory, and this is likely to have a lot of fat on it.”

Given that the SRAM cell has basically remained unchanged for 20 years, it also may be reasonable to assume that there is little that can be done, but Ansys-Apache’s Shanmugavel disagrees. “Using power gating, keeping the cells at a different voltage and other techniques have evolved and new architectures are being driven by power. The peripheral logic has seen a lot of change.”

Power Reduction Techniques
“Historically, SRAMs were designed for speed,” says Shanmugavel. “Today, additional operating modes are being added.”

Synopsys’ Saggurti provides an overview of the memory power reduction techniques being used today. “Consider that a memory has bit-cells, where the data is stored, and the periphery that allows you to access the data. The first thing you may be able to do is to shut down power to the periphery. The state of the memory is retained. Or you may want to shut down the power to the entire memory. You could also run the memory at different voltages and this could mean taking the bit-cell to a voltage that would be considered safe, and the periphery to an even lower voltage and still be safe. With this dual-rail mode, you have level shifters inside the memory.”

When power is removed from the periphery the memory is normally described as being in a sleep mode and if power is removed from the bit-cells, it is in a shutdown mode.

“There are various levels of sleep mode depending on the amount of the periphery that is being shut off,” explains Soheili. “The tradeoff is between leakage reduction and wakeup time.”

Iyer adds another mode. “During light sleep mode, contents may be read but you cannot write to it.”
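The modes described above can be summarized as a small behavioral model. This is a minimal sketch, not any vendor’s memory interface: the class and method names are hypothetical, only the three basic modes are modeled (active, sleep with the periphery off, shutdown with the bit-cells off), and the graded sleep levels and the light sleep mode are left out.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class Rails:
    """Which parts of the memory are powered in a given mode."""
    periphery_powered: bool
    bitcells_powered: bool


class MemoryMode(Enum):
    ACTIVE = Rails(periphery_powered=True, bitcells_powered=True)
    SLEEP = Rails(periphery_powered=False, bitcells_powered=True)      # state retained
    SHUTDOWN = Rails(periphery_powered=False, bitcells_powered=False)  # state lost


class SramMacro:
    """Hypothetical behavioral model of one memory instance."""

    def __init__(self) -> None:
        self.mode = MemoryMode.ACTIVE
        self._data: dict[int, int] = {}

    def set_mode(self, mode: MemoryMode) -> None:
        # Removing power from the bit-cells discards the stored contents.
        if not mode.value.bitcells_powered:
            self._data.clear()
        self.mode = mode

    def _require_access(self) -> None:
        # Reads and writes need both the periphery and the bit-cells powered.
        rails = self.mode.value
        if not (rails.periphery_powered and rails.bitcells_powered):
            raise RuntimeError(f"memory not accessible in {self.mode.name} mode")

    def write(self, addr: int, value: int) -> None:
        self._require_access()
        self._data[addr] = value

    def read(self, addr: int):
        self._require_access()
        return self._data.get(addr)  # None models unknown (lost) contents


mem = SramMacro()
mem.write(0, 0xAB)
mem.set_mode(MemoryMode.SLEEP)
mem.set_mode(MemoryMode.ACTIVE)
after_sleep = mem.read(0)      # contents survive sleep
mem.set_mode(MemoryMode.SHUTDOWN)
mem.set_mode(MemoryMode.ACTIVE)
after_shutdown = mem.read(0)   # contents do not survive shutdown
```

A real design would also attach a leakage figure and a wakeup time to each mode, since that is the tradeoff Soheili describes; those numbers are process- and compiler-specific and are deliberately not invented here.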

But what of the bit-cells themselves? Soheili explains that “the bit-cells provided by the foundry have a defined voltage range in which they can operate.”

Wakeup time is important. Saggurti describes a light sleep mode that enables a quick wakeup (around 1ns). “This is done by source biasing the bit-cells. In a 28HPM process with voltage at 0.9V you could go down to 0.81V and be fine, but if you are just retaining the state of the memory, you could go down to 0.72V. To make this possible we source bias the bit-cells so that it goes into a very low leakage state.”

So how far can this actually go? “More aggressive low power designs may use sub-threshold operating cell design, operating at <0.5 volts,” claims Hem Hingarh, vice president of engineering at Synapse Design. “Custom memory circuit design allows us to create memories that are working at lower voltage, but this requires much more work involving test chips.”

Challenges remain. Shanmugavel says power modeling of memories is becoming more challenging. “Traditionally, with a read or write, it had a single current profile. With all of the possible modes today, you have to have a proprietary modeling format to be able to have accurate power figures.”

Having these modes is just part of the problem. “One of the challenges with these power modes is that today’s system/SoC is not equipped to handle these modes,” Iyer says. “SoC designers often use fewer modes since rigorous analysis is needed to find out conditions to put memories in appropriate modes. This results in higher power consumption than what can be achieved. Tools are available that can automate this analysis, find out conditions to enable various power modes and update the RTL. Designers are beginning to use these tools during RTL design.”

3D-IC Impact
But for all of the power optimization possibilities on-chip, DRAM is still the memory of choice off-chip. That means that the contents have to be transferred across the chip’s boundary. “Capacitance is a linear function of power consumption – ½CVdd²,” says Shanmugavel. “Capacitance comes from the board interconnect and that capacitance can be large.”
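To make the capacitance relationship concrete, the sketch below evaluates ½·C·Vdd², read here as the energy needed to charge a capacitance C to the supply voltage once. The capacitance and voltage values are assumptions chosen purely for illustration, not measurements from any of the companies quoted; the point is only that the energy scales linearly with C and quadratically with Vdd.

```python
import math


def switching_energy_joules(capacitance_f: float, vdd_v: float) -> float:
    """Energy of one charging event: E = 1/2 * C * Vdd^2."""
    return 0.5 * capacitance_f * vdd_v ** 2


# Hypothetical values, for illustration only.
ON_CHIP_WIRE_F = 0.1e-12   # assumed on-chip interconnect load, 0.1 pF
BOARD_TRACE_F = 5.0e-12    # assumed package + PCB trace load, 5 pF
VDD_V = 1.2                # assumed signaling voltage

e_on_chip = switching_energy_joules(ON_CHIP_WIRE_F, VDD_V)
e_board = switching_energy_joules(BOARD_TRACE_F, VDD_V)

# Linear in C: the energy ratio equals the capacitance ratio.
energy_ratio = e_board / e_on_chip
cap_ratio = BOARD_TRACE_F / ON_CHIP_WIRE_F

# Quadratic in Vdd: halving the voltage quarters the energy.
e_board_half_v = switching_energy_joules(BOARD_TRACE_F, VDD_V / 2)

print(f"on-chip: {e_on_chip:.3e} J, board: {e_board:.3e} J")
print(f"energy ratio {energy_ratio:.1f} vs capacitance ratio {cap_ratio:.1f}")
print(f"board energy at half Vdd: {e_board_half_v / e_board:.2f} of original")
```

Under these assumed numbers the off-chip transfer costs as many times more energy as the capacitance is larger, which is why shortening the channel matters.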

“It is more expensive to move information than compute it,” asserts Lazovsky. “Off-chip and off-package is two orders of magnitude greater than the cost of storing a bit. Finding ways to move data more efficiently is paramount.”

Moving that memory in-package appears to be the best hope to reduce the transfer power. “3D IC is being driven by power consumption and then secondarily form factor. While they are more expensive to manufacture today, the costs will catch up,” says Shanmugavel.

“With 2.5D and 3D, that simplifies the channel because you’re not spanning the PCB,” said Jain. “You don’t even need a PHY in the traditional sense.”

There are potential economic benefits, too. “The costs associated with these advanced nodes and the risks associated with them are bringing 2.5D/3D into the forefront,” says Soheili. “We are currently working on a design using an interposer, stacked memory, and either an FPGA or an ASIC in the middle. The cost right now is a Catch-22. If a lot more people jumped right in, the costs would go down, but most people will not jump in until the costs have come down.”

Anamul Haque, associate vice president of engineering for physical design at Synapse Design, adds, “Stacked dies are being used for memories because it gives them more bandwidth. We can also have different configurations with stacked dies. This provides us with more flexibility.”

We all know the success stories from Xilinx regarding their usage of 2.5D technology for their Virtex-7 product line, but few other public examples exist outside of the memory industry. “I can rattle off 20 or 30 companies who are experimenting with it,” says Soheili.

As with all new technologies, additional challenges arise. “When dies are stacked there are different thermal problems because you are restricting the heat transfer pathways,” says Shanmugavel. “System temperatures could increase. Doing thermal analysis on 3D ICs is almost mandatory but for 2.5D this is not the case because the interposer is not creating much heat.”