Turning up the clock in an SoC or adding more functionality raises a variety of new design challenges. Nothing is as straightforward as it seems.
As SoCs get more complex, whether due to higher frequencies or adding more functionality, there is a spillover effect on bandwidth, memory and power.
There is no simple way to just turn up the clock frequency in a complex SoC. That relatively straightforward objective will likely require more power domains, more cores, more ways to move signals along faster, and in some cases such as slow-motion video, the ability to store much larger chunks of data for longer periods of time. It even raises questions about how memory should be utilized.
“That memory increase becomes a question of how efficiently you want to use the memory,” said Anand Iyer, director of marketing for the low power platform at Calypto. “Today, memories have a lot of low power modes. How do we handle those low power modes? That is going to be an ongoing challenge from a system design perspective: ‘I’m designing an SoC and using a memory. The design is capable of handling these three modes, but now the IP provider is giving me a fourth mode. How does that fourth mode impact power, and what changes do I need to make in the SoC to make use of it?’”
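Iyer's scenario can be made concrete. Below is a minimal, purely illustrative Python sketch of a table-driven power-mode abstraction, in which a newly delivered mode becomes one more table entry rather than an SoC redesign. All mode names, wakeup latencies, and leakage figures here are hypothetical, not from any real memory IP.

```python
# Illustrative sketch: a table-driven abstraction for memory low-power modes,
# so a new mode from the IP provider is a data change, not a redesign.
from dataclasses import dataclass

@dataclass(frozen=True)
class MemPowerMode:
    name: str
    retains_data: bool   # do the bit cells keep their contents?
    wakeup_cycles: int   # latency to return to active operation
    rel_leakage: float   # leakage relative to active (1.0)

MODES = {
    "active":      MemPowerMode("active",      True,  0,   1.00),
    "light_sleep": MemPowerMode("light_sleep", True,  2,   0.50),
    "deep_sleep":  MemPowerMode("deep_sleep",  True,  20,  0.20),
    # The "fourth mode" an IP provider might add later, e.g. full shutdown:
    "shutdown":    MemPowerMode("shutdown",    False, 500, 0.02),
}

def pick_mode(idle_cycles: int, must_retain: bool) -> MemPowerMode:
    """Choose the lowest-leakage mode whose wakeup fits the idle window."""
    candidates = [m for m in MODES.values()
                  if m.wakeup_cycles <= idle_cycles
                  and (m.retains_data or not must_retain)]
    return min(candidates, key=lambda m: m.rel_leakage)

print(pick_mode(idle_cycles=50, must_retain=True).name)     # deep_sleep
print(pick_mode(idle_cycles=1000, must_retain=False).name)  # shutdown
```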
Bandwidth is the biggest challenge with respect to memory in going from DDR3 to DDR4, and maybe even to Wide I/O interfaces.
“The main reason for going to such interfaces is higher bandwidth,” said Karthik Srinivasan, principal applications engineer at Ansys-Apache. “Once you go for higher bandwidth, it comes with a power penalty, and in order to reduce that penalty you have to lower the supply voltage, because power depends quadratically on the supply voltage. Once you reduce the supply voltage, it automatically impacts your noise margin. So you have tighter timing and a lower supply voltage, which is a double-edged sword. On top of that, a tighter timing budget means lower capacitances on the pins of your interfaces, which means your ESD devices and power clamps, which tend to add capacitance, need to be designed in a smarter way.”
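The quadratic dependence Srinivasan refers to is the standard dynamic-power relation, P ≈ αCV²f. A back-of-the-envelope sketch, with assumed capacitance and frequency values rather than datasheet numbers, shows why a modest voltage reduction buys a large power saving:

```python
# Back-of-the-envelope check of the quadratic voltage dependence:
# dynamic power follows P ~ alpha * C * V^2 * f.
def dynamic_power(c_farads, v_volts, f_hz, activity=1.0):
    return activity * c_farads * v_volts**2 * f_hz

C, F = 10e-12, 1.6e9   # assume 10 pF of switched capacitance at 1.6 GHz
p_hi = dynamic_power(C, 1.5, F)
p_lo = dynamic_power(C, 1.2, F)
print(f"1.5 V: {p_hi*1e3:.1f} mW, 1.2 V: {p_lo*1e3:.1f} mW, "
      f"saving {(1 - p_lo/p_hi)*100:.0f}%")   # (1.2/1.5)^2 -> 36% saving
```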
DDR4 is starting to show up in devices these days, and at least on paper it should be both faster and lower power than DDR3. But it’s questionable whether there will actually be a DDR5 in the future. What comes next is a subject of intense research and debate.
“One option for communication with future off-chip memories is through fast serial interfaces as in the Hybrid Memory Cube,” said Bernard Murphy, CTO of Atrenta. “But this approach has higher latency, which could drive a need for larger on-chip caches and, who knows, a return to look-ahead approaches in CPUs to maybe enable pipelining memory accesses. All of this would add to power.”
Further, larger caches may lead to fancier cache partitioning and more hierarchy in the cache to minimize power consumption by allowing low power modes in currently inactive areas. “Additionally, we may see more use of scratchpad memories to avoid needing to go to main memory for internal calculations. All of this is likely to complicate on-chip memory architectures,” Murphy added.
Memory generations don’t change quickly, though. In fact, there is more work to be done to improve the throughput on DDR3.
“The workhorse in 2014 and what we expect for a significant part of 2015 will remain the LPDDR3,” said Ajay Jain, director of product marketing at Rambus. “There are a couple of trends that are evolving in the mobile market as far as the user experience is concerned. One is that people are taking a lot more selfies and they are taking slow-motion video. Slow-motion video is very memory intensive because from the camera, you are taking snapshots at a very, very high rate—120 or 240 frames per second. That pixel data needs to be stored in memory right away. It has to be buffered there, and the caches are not big enough to be able to hold that much data, so it’s got to make its way into memory.”
The industry addressed this with LPDDR3, adding dual channels to be able to handle that kind of bandwidth. The downside with that approach is the power consumption increases, because now instead of one set of channels there are two. So there is twice the amount of switching, and the cost also goes up.
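Some rough arithmetic shows why those capture rates outrun on-chip caches. The resolution, pixel format, and frame sizes below are assumptions chosen for illustration:

```python
# Why 120/240 fps capture has to stream straight to DRAM: raw frame data
# quickly exceeds any on-chip cache. Assume 1080p capture in YUV 4:2:0,
# i.e. 1.5 bytes per pixel.
width, height = 1920, 1080
bytes_per_pixel = 1.5
frame_bytes = width * height * bytes_per_pixel   # ~3.1 MB per frame
for fps in (120, 240):
    bw = frame_bytes * fps
    print(f"{fps} fps: {bw/1e9:.2f} GB/s sustained into memory")
# A last-level cache of a few MB holds only one or two frames, so the
# stream must be buffered in main memory, as Jain describes.
```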
Jain noted another trend in the tablet space, which is the addition of 4K displays. “When you are talking that high of resolution, then your frame buffer still exists in memory. That needs to be very high performance, very high bandwidth.”
Still, he expects LPDDR3 memories to continue to be widely deployed, with some price advantage. For companies trying to deliver advanced features at a lower price point, LPDDR3 is expected to play a significant role.
“Looking forward, we are beginning to see the first LPDDR4s come out,” he said. “One of the major memory manufacturers has announced production, and the really high-end phones will be the first ones to adopt those. The rest of the market, the second-tier vendors, expect to be able to get access to those devices in 2016. LPDDR3, particularly with its low voltage swing, is very much there through 2016 and even 2017. In LPDDR4, they lowered the core voltage of the memory to 1.1 volts, and that does end up reducing power. The second thing they did was the low-swing I/O (LVSTL), which is on the order of 0.4 volts as opposed to a 1.2-volt swing. That is the second contributor to the power efficiency.”
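The LVSTL contribution Jain cites can be bounded with a simple switched-capacitor estimate. The pin capacitance and toggle rate below are assumptions, and termination current is ignored, so this only brackets the switching component:

```python
# Switched-capacitor estimate of the LVSTL benefit: compare a 1.2 V swing
# with the ~0.4 V LVSTL swing. Illustrative numbers only.
c_pin, f_toggle = 5e-12, 1.6e9   # assumed per-pin capacitance and toggle rate
for name, v_swing in (("1.2 V swing", 1.2), ("LVSTL ~0.4 V", 0.4)):
    p = c_pin * v_swing**2 * f_toggle
    print(f"{name}: {p*1e3:.2f} mW per pin")
# (0.4/1.2)^2 means roughly one-ninth the switching power per pin.
```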
On the infrastructure side of the market, both servers and networking devices have traditionally used DDR3. Going forward, the transition from DDR3 to DDR4 is relatively straightforward, but other technologies are also being adopted, including LPDDR4.
“A number of memory manufacturers must be trying to expand their market share or their market footprint of LP4,” said Frank Ferro, senior director of product management at Rambus. “One of the interesting things about LPDDR4 and DDR4 is that now you’re basically at the same speed and the same bandwidth, and LP4 vendors have actually started talking about an even higher speed grade of 4200 MHz. Now, especially in high-end applications where they need bandwidth, manufacturers are starting to look at LPDDR4.”
He believes most customers in the server space will go to DDR4 and then will start to look at alternate solutions. “Of course, the other two that are raising their head are high-bandwidth memory (HBM) and Hybrid Memory Cube (HMC). Those are the memories that are being looked at and evaluated. If you’re really thinking about 2015, the reality is you’re going to start seeing DDR4 deployment around the time of IDF in August. A number of server manufacturers started announcing use of the new Intel Xeon E5 with DDR4. ARM also has indicated plans here for the server market.”
Because memory accounts for about 20% of the power consumption in a datacenter, one of the benefits of moving from DDR3 to DDR4 is the power reduction on the I/O from 1.5 volts down to 1.2 volts, which is about a 25% power savings. That could translate into as much as 8% savings on total datacenter power, Ferro explained.
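Those figures can be sanity-checked. Depending on whether a given piece of interface power scales linearly with voltage (termination current) or quadratically (switched capacitance), the 1.5-to-1.2-volt move saves between roughly 20% and 36%, bracketing the quoted 25%. The sketch below, under those assumptions, lands near the “as much as 8%” datacenter figure:

```python
# Rough sanity check on the quoted figures (illustration, not measurement).
v_old, v_new = 1.5, 1.2
linear    = 1 - v_new / v_old        # I*V-style termination power: 20%
quadratic = 1 - (v_new / v_old)**2   # C*V^2*f switching power: 36%
io_saving  = 0.25                    # quoted I/O saving, between the bounds
dram_share = 0.20                    # memory's share of datacenter power
print(f"linear bound {linear:.0%}, quadratic bound {quadratic:.0%}, "
      f"quoted {io_saving:.0%}")
print(f"datacenter saving if all DRAM power scaled quadratically: "
      f"{dram_share * quadratic:.1%}")   # ~7%, near the 'as much as 8%'
```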
Understanding design tradeoffs
With all of the options available, it is critical to get the design tradeoffs right.
As soon as power-management techniques are introduced, there is a penalty to pay. “It’s going to impact your noise margins if you have a power gate,” said Srinivasan. “It is going to eat into the overall voltage that your bit cells or other peripheral logic see. That definitely impacts the voltage drop. Reliability is another aspect. If you’re not sizing the power gates properly, if you don’t have a sufficient number of power gates, they can be stressed to the maximum, and that can also impact reliability. As such, analysis is becoming more important. Customers are doing analysis-driven optimization, analyzing the design at the IP level and at the subsystem level. When designing a memory, for example, they previously waited until the memory was ready and then performed the simulations. Nowadays, people analyze at the subsystem level to make sure it is robust before integrating it at the next level.”
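The sizing tradeoff Srinivasan describes can be illustrated with a toy IR-drop calculation. The current, on-resistance, and drop budget below are all assumed values, not from any real design:

```python
# Toy IR-drop view of power-gate sizing: parallel gate fingers form a
# resistor network, and too few fingers eat the noise margin.
i_block = 0.200   # amps drawn by the gated block when active (assumed)
r_on = 20.0       # ohms on-resistance of one power-gate finger (assumed)
budget = 0.020    # volts of allowed drop across the gates (assumed)

for n_gates in (50, 100, 200, 400):
    drop = i_block * (r_on / n_gates)   # N fingers in parallel
    verdict = "ok" if drop <= budget else "violates budget"
    print(f"{n_gates:4d} fingers: {drop*1e3:5.1f} mV drop ({verdict})")
# More fingers meet the budget but cost area and leakage, which is why
# teams analyze rather than blindly over-provision.
```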
Interestingly, one thing he has noted more and more is that for people sticking with older technology nodes, power and cost are becoming key drivers pushing them toward the analysis side of design. “If you have higher margins, basically you are paying a higher penalty in terms of cost. You are allocating more resources for metal buses and more resources in terms of area, so you are paying a higher penalty there. That also necessitates that engineering teams using older process nodes look into analysis as one of the ways to minimize cost. And not just a system-level analysis. It’s more of a co-analysis: how do you perform an analysis and give feedback to the next level?”
Leakage in on-chip memory
One more issue that needs to be considered is bit-cell leakage. A memory consists of bit cells, which store the information, and the periphery, which reads information from the bit cells or writes to them.
“There is a lot of focus on reducing the leakage and dynamic power of those memories,” said Prasad Saggurti, product marketing manager for Embedded Memory IP at Synopsys. “Some of the things we’ve been asked to do in our memory compilers is allow the periphery to be shut off. If, let’s say, you want to keep the contents of memory alive (we call that a deep sleep mode), we can go in and shut off the power to the periphery while still keeping the power to the bit cells. And we do one thing in addition. What we found is that the minimum voltage at which you can operate safely is higher than the minimum voltage at which you can store the information safely. In other words, the minimum operating voltage is much higher than the minimum retaining voltage. So when we go into this mode to retain the state of the memory, we not only shut off the power to the periphery, we also reduce the voltage to the actual cells storing the information, and we don’t put that burden on the chip designers. We have a circuit inside the memory IP itself that automatically reduces the effective voltage to those bit cells. That saves a lot of leakage.”
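A simple model, with an assumed split of leakage between periphery and array and assumed voltage levels, illustrates the kind of saving such a deep-sleep mode can deliver. The quadratic scaling below is only a stand-in; real bit-cell leakage scaling is device-dependent and typically steeper:

```python
# Illustrative model of deep sleep: gate off the periphery and hold the
# array at a retention voltage below the minimum operating voltage.
PERIPH_SHARE = 0.4               # fraction of leakage from periphery (assumed)
V_OPERATE, V_RETAIN = 0.8, 0.6   # volts (assumed)

def leakage_mw(total_mw, mode):
    periph = total_mw * PERIPH_SHARE
    cells = total_mw - periph
    if mode == "active":
        return periph + cells
    if mode == "deep_sleep":
        return cells * (V_RETAIN / V_OPERATE) ** 2   # periphery gated off
    raise ValueError(mode)

print(f"active:     {leakage_mw(10.0, 'active'):.2f} mW")
print(f"deep sleep: {leakage_mw(10.0, 'deep_sleep'):.2f} mW")
```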
Overall, as DRAM generations move forward, the drive and termination components of the power become more of an issue, so engineering teams try to manage them in different ways, predominantly by lowering the required power supply for the drivers. Engineers are also looking at applications where they can turn off the terminations of the drivers to reduce power, noted Lou Ternullo, product marketing director for the IP group at Cadence.
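A rough estimate shows why idle termination is worth switching off: an enabled termination burns static power regardless of data activity. The rail voltage, effective termination resistance, and bus width below are assumed values:

```python
# Why disabling idle on-die termination (ODT) saves power: an enabled
# termination draws static current whether or not data is moving.
v_dd = 1.2    # DDR4-class I/O rail
r_tt = 60.0   # ohms, assumed effective termination to VDD/2
p_pin = (v_dd / 2) ** 2 / r_tt   # static power per terminated pin
print(f"per pin: {p_pin*1e3:.1f} mW; 64-bit data bus: {64*p_pin*1e3:.0f} mW")
# Turning termination off on idle ranks or cycles removes this component.
```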
Designers are also looking into memory optimization opportunities, Iyer said. “For example, you are accessing the memory and multiple addresses are being accessed, and maybe it’s a multiplexed case where you are always accessing a specific location when that memory is active, and all the other addresses are generated when the memory is not active. Today, the tools don’t take care of this kind of dynamically stable address. We need tools to handle these kinds of situations.”
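The kind of analysis Iyer is asking tools to perform might look like the following hypothetical sketch, which scans a trace for address bits that never toggle while the memory is enabled. The trace format and signals are invented for illustration:

```python
# Hypothetical sketch: find address bits that are stable whenever the memory
# is actually enabled, making their downstream logic a gating candidate.
def stable_address_bits(trace, addr_width=4):
    """trace: list of (enable, address). Returns bit positions that never
    change across enabled cycles."""
    active = [addr for en, addr in trace if en]
    if not active:
        return set()
    first = active[0]
    stable = set(range(addr_width))
    for addr in active[1:]:
        stable = {b for b in stable if (addr >> b) & 1 == (first >> b) & 1}
    return stable

# Addresses toggle freely while enable=0, but bits 2 and 3 are constant
# whenever enable=1.
trace = [(1, 0b0100), (0, 0b1111), (1, 0b0101), (0, 0b0000), (1, 0b0110)]
print(stable_address_bits(trace))   # {2, 3}: bit 2 stuck at 1, bit 3 at 0
```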
Clearly, this is an active time for memory design, with new challenges being addressed in a variety of ways. What comes next in terms of solutions will be very interesting indeed.