Moore Memory Problems

The scaling of the 6T SRAM cell is slowing and the surrounding circuitry is getting more complex, so more of the die will be taken up by SRAM at future nodes.

Popularity

The six-transistor static RAM (SRAM) cell has been the mainstay of on-chip memory for several decades and has stood the test of time. Today, many advanced SoCs devote 50% of their chip area to these memories, so they are critical to continued scaling.

“The SRAM being used in modern systems is similar to the SRAM they were using in the 1970s and 1980s,” says Duncan Bremner, chief technology officer at sureCore. “The feature size is smaller but little else has changed.”

The 6T SRAM cell looks like two back-to-back inverters plus two access transistors. The cell is designed by the fabs, optimized for fabrication, and in general breaks the conventional design rules that would be applied to the process.

IBM 6T memory cell. Source: App Note 1997.

“When we look at the scale of the memory macro going from a 28nm planar to a 16nm finFET process, the expectation is that the macro should shrink about 40% to 50%, and you will find that the foundry bitcells are able to achieve that,” says Deepak Sabharwal, vice president of engineering for IP at eSilicon. “The foundry pays a lot of attention to how they do the patterning, and they can demonstrate bitcells that are 0.6 of the previous technology. The old standard used to be 0.5, so there has been some slowing.”

Farzad Zarrinfar, managing director of the Novelics group of Mentor Graphics, puts some hard numbers behind the shrinking of the 6T bitcell. “In general, at 65nm technology the typical drawn bitcell size was 0.525 µm². At 40nm it went to 0.299 µm², about a 44% reduction. At 28nm it went down to 0.198 µm², or 0.1566 µm² if you consider the ultra-high-density process. That is another 36% reduction. At 22nm it went to 0.108 µm², another 44% reduction. Going to finFET at 16nm it went to 0.07 µm², another 36% reduction.”
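As a quick sanity check, those quoted areas can be turned into node-to-node scaling factors with a few lines of Python. This is only a back-of-the-envelope sketch using the numbers as cited above; the computed reductions land close to, though not exactly on, the rounded percentages in the quote.

```python
# Node-to-node scaling of the quoted 6T bitcell areas (in µm², as cited above).
areas = [
    ("65nm", 0.525),
    ("40nm", 0.299),
    ("28nm", 0.198),   # 0.1566 for the ultra-high-density variant
    ("22nm", 0.108),
    ("16nm finFET", 0.07),
]

for (prev_node, prev_area), (node, area) in zip(areas, areas[1:]):
    ratio = area / prev_area
    print(f"{prev_node} -> {node}: {ratio:.2f}x area, ~{(1 - ratio):.0%} smaller")
```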

ASML’s executive vice president and CTO, Martin van den Brink, stated at ISSCC 2013 that the SRAM bitcell size might not be reduced from the 20nm to the 10nm node, and might even get larger at 7nm because it may need more than 8 transistors.

But not everyone in the industry has such a negative view. Kurt Shuler, vice president of marketing for Arteris, predicts that “on-chip SRAM is going to scale a bit faster than logic. This is due to design rules moving more and more towards single dimension (1D) metal layout rules. This will impact logic in a negative way, 15% or more, on top of scaling. SRAM is already regular and will get denser faster than logic.”

Zarrinfar talks about some of the changes that have already happened in the bitcell. “Layout has become more complex and difficult. For example, at 40nm, poly could go in any direction but at 28nm poly can only go in one direction. Poly interconnect capacitance is increasing and this is causing problems from a power point of view.”

“Even with all the manufacturing and mathematical approaches that have been applied to the 6T SRAM cell, it is starting to have issues at the basic circuit level,” says Charlie Cheng, chief executive officer of Kilopass Technology. “Unlike logic functions, which have been through several revolutions, the SRAM is in dire need of a new circuit design.”

Power is an increasing problem, and the way to control power is to reduce voltage. “As you move from 28nm to 14nm the supply voltage drops and you have to start playing tricks to maintain a solid 1 or 0,” explains sureCore’s Bremner. “The key is to create more sensitive read amplifiers to detect the signal and more sensitive write amplifiers that can drive a value into a bitcell.”

Variation and characterization
As process geometries get smaller, there are additional pressures on bitcell design. “If you consider a 60nm standard CMOS transistor and halve the size of it, most things track with the conventional laws of physics,” says Bremner. “But if you take a 60nm transistor and quarter it, down to 15nm, the rules change because quantum effects start coming into play. This starts to kick in around 28nm, but below this, if you try to divide 99 charges in a gate region by 2, you start seeing the effect of individual atoms and atomistic variability. Plus or minus an atom is now significant, whereas at 60nm it didn’t really matter.”

Process geometry really does matter here. “There is more sensitivity to variability in the smaller geometries, and that increases the demand for comprehensive Monte Carlo simulation,” points out Zarrinfar. “This means that the surrounding circuitry has to be able to handle 3, 4 or even 5σ variability to ensure there are no yield issues.”

Even that may not be enough. “We see designers having to design to higher sigma, up to 7σ for memory bitcells,” says Amit Gupta, president and chief executive officer of Solido Design Automation. “This is because of the larger number of bitcell instances. Hierarchical Monte Carlo methodologies are being deployed for full-chip memory statistical verification to improve power, performance and area by eliminating overdesign.”

Bremner helps to put this into perspective. “If I have one cell failing in a thousand, a 0.1% failure rate, that is 3σ and is believable. If you have a gigabit of memory you are looking at 1 x 10⁹ bitcells, which is much more challenging. The likelihood of a failure is much higher. With so much memory on a chip, there is more chance statistically that you will have a bitcell that is more marginal than the rest. One of the challenges with memory design is dealing with such high-sigma variability. Plus or minus 3σ, and even more so +/- 6σ, presents a problem because of the effort required to do Monte Carlo simulation.”
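To see why gigabit-class arrays push designers toward 6σ and 7σ bitcells, consider a toy calculation that assumes a single, idealized Gaussian failure mechanism and ignores correlation, redundancy and repair. The bit count and the Gaussian assumption are illustrative, not a statement about any particular process.

```python
import math

def tail_probability(sigma):
    """One-sided Gaussian tail probability: P(X > sigma) for X ~ N(0, 1)."""
    return 0.5 * math.erfc(sigma / math.sqrt(2))

N_BITS = 1e9  # roughly a gigabit of bitcells

for sigma in (3, 4, 5, 6, 7):
    p_fail = tail_probability(sigma)
    print(f"{sigma} sigma: per-cell fail prob {p_fail:.2e}, "
          f"expected failing cells per Gbit {p_fail * N_BITS:.2e}")
```

Under these assumptions a 3σ cell leaves on the order of a million marginal cells in a gigabit array, while a 7σ cell leaves a small fraction of one, which is why hierarchical Monte Carlo and high-sigma methods are used rather than brute-force simulation.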

Can we expect anything different for finFET-based bitcells? “FinFETs give you a leakage reduction, but there is no real gain in terms of variability,” says Sabharwal. “When we design SRAM macros we have to guard-band for about a 3X drop in cell performance due to variation.”

Designing around the bitcell
While technically the bitcell continues to scale, additional pressure is being put on the surrounding logic. “With the smaller geometries, Vt is dropping,” says Zarrinfar. “They have to drop the voltage because power is dependent on it – CV²f. When the voltage is dropped, and when going from an LP to a ULP process, the Vt drops and it makes it more difficult to read and write the memory, meaning that you have to come up with very elegant circuitry. Techniques such as write assist and read assist have to be integrated into the newer nodes. This was not necessary at 65nm.”
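The quadratic dependence on supply voltage is what makes voltage reduction so attractive, and so hard on the bitcell. A minimal sketch of the CV²f relationship, with purely illustrative capacitance, frequency and voltage values, shows the size of the lever:

```python
# Dynamic power scales as C * V^2 * f. Capacitance and frequency are held
# constant here; all numbers are illustrative, not process data.
def dynamic_power(c_farads, v_volts, f_hertz):
    return c_farads * v_volts**2 * f_hertz

C = 1e-12   # 1 pF of switched capacitance (assumed)
F = 1e9     # 1 GHz operating frequency (assumed)

p_nominal = dynamic_power(C, 0.9, F)   # hypothetical nominal supply
p_reduced = dynamic_power(C, 0.6, F)   # hypothetical reduced supply
print(f"0.9V -> 0.6V cuts dynamic power to {p_reduced / p_nominal:.0%} of nominal")
```

Cutting the supply by a third cuts dynamic power by more than half, which is exactly why designers keep pushing voltage down even though it squeezes the bitcell’s margins.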

The challenge, assuming the same circuitry implemented in a different technology, is: “Can we achieve similar scaling and therefore match what the foundry is doing with the bitcell?” says Sabharwal. “It is a difficult question. The complexity lies in the details. If you look at the active bitcell dimensions, we are approaching cells where the wordline direction is 3X or 4X larger than the bitline direction. This is a very skewed kind of cell. The circuitry needed for the wordline needs to fit into a very tight pitch and requires very careful thinking about the structures. You cannot expect to port what you had at 28nm directly to 16nm and achieve the scaling.”

The bitcells are extremely sensitive and need to be enclosed in an environment that can protect them. “You have to put in special circuits for doing write and read operations – these are called read assist and write assist,” explains Sabharwal. “Those circuits are considered to be part of the periphery, but in reality they are part of the bitcell array because they are needed just to gain access to the bitcell. So the scaling of a pure bitcell may look as if it is achieving the technology trend, but if you burden that scaling with the extra circuitry needed for assist, then you will find the numbers don’t look that good.”

Read and write assist are techniques that dynamically change the operating conditions of the bitcell. As an example, the voltage on the wordline may be raised above the cell voltage. “The loading in the wordline direction becomes significant,” says Sabharwal. “One has to put repeaters in, or use alternative structures for routing the wordlines, to get around the problem.”
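The repeater comment can be illustrated with a toy Elmore-style delay model: the delay of an unbuffered distributed RC wordline grows roughly with the square of its length, while splitting it into buffered segments shrinks the wire portion at the cost of repeater delay. All of the resistance, capacitance and repeater numbers below are assumptions for illustration only.

```python
# Toy model of a long wordline split into k buffered segments (one driver or
# repeater per segment). Distributed-RC segment delay ~ 0.38 * R_seg * C_seg.
R_TOTAL = 5_000      # total wordline resistance, ohms (assumed)
C_TOTAL = 500e-15    # total wordline capacitance, farads (assumed)
T_REPEATER = 20e-12  # delay of one repeater stage, seconds (assumed)

def wordline_delay(k_segments):
    segment_delay = 0.38 * (R_TOTAL / k_segments) * (C_TOTAL / k_segments)
    return k_segments * (segment_delay + T_REPEATER)

for k in (1, 2, 4, 8):
    print(f"{k} segment(s): ~{wordline_delay(k) * 1e12:.0f} ps")
```

In this sketch the unbuffered line is dominated by its quadratic wire delay, and a handful of repeaters brings it down sharply before the repeater overhead starts to dominate.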

Going back to the equation for power – CV²f, another approach is to concentrate on the C. “A lot of the power is consumed by the movement of charge on and off parasitic capacitances,” points out Bremner. “We changed the architecture of the memory to optimize this. Now the read amplifiers are split into local and global amplifiers. This means that we do not have a huge wordline that goes across the whole array and so we can swing a smaller sub-division of the wordline and use that to signal to the outside world what we are seeing.”
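A rough model of that local/global split, with made-up capacitance values rather than sureCore’s actual figures, shows why swinging only a sub-division of the wordline saves energy:

```python
# Toy model of a hierarchical (local/global) access: only one local section
# of the wordline plus a short global select line swings, instead of the full
# array width. Voltage and capacitance values are illustrative assumptions.
VDD = 0.8                   # supply voltage, volts
C_FULL_WORDLINE = 400e-15   # full-width wordline capacitance, farads
N_SECTIONS = 8              # number of local sections
C_GLOBAL_SELECT = 60e-15    # global select/signaling wiring, farads

e_flat = C_FULL_WORDLINE * VDD**2
e_hier = (C_FULL_WORDLINE / N_SECTIONS + C_GLOBAL_SELECT) * VDD**2
print(f"hierarchical access uses ~{e_hier / e_flat:.0%} of the flat-array energy")
```

In this sketch only about a quarter of the flat-array energy is spent per access; the real benefit depends on how the array is banked and how much global wiring the split adds.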

Along with variability there are yield issues, meaning that test and repair are becoming increasingly important for large memories. “Repair and ECC are possible for memories of most kinds, essentially reducing the raw-yield requirement by roughly 10X,” says Cheng.
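Cheng’s “roughly 10X” figure can be made concrete with a simple Poisson yield model in which a chip is good as long as all of its defective bits can be repaired or corrected. The failure rates and repair counts below are illustrative assumptions, not Kilopass data.

```python
import math

def chip_yield(n_bits, p_bit_fail, repairable_defects):
    """P(number of defective bits <= repairable_defects), defects ~ Poisson."""
    lam = n_bits * p_bit_fail
    return sum(math.exp(-lam) * lam**k / math.factorial(k)
               for k in range(repairable_defects + 1))

N = 1e9  # a gigabit array (assumed)
print(f"no repair, p=1e-9:  yield {chip_yield(N, 1e-9, 0):.1%}")
print(f"no repair, p=1e-8:  yield {chip_yield(N, 1e-8, 0):.3%}")
print(f"20 repairs, p=1e-8: yield {chip_yield(N, 1e-8, 20):.1%}")
```

With around 20 repairable defects, the array in this sketch tolerates a raw per-bit failure rate roughly ten times higher than an unrepaired array, and still yields better.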

New approaches
Each new process generation has required innovation to keep the 6T SRAM cell alive, and some are looking for alternatives. Those alternatives include new designs for the bitcell itself, a migration to different memory types, and advanced packaging techniques.

“Many new 6T, 8T, 9T, and even 10T designs are competing to replace current 6T SRAMs,” explains Cheng. “Embedded DRAM, which was thought to be dead after 40nm, is also making a comeback as an SRAM replacement. FD-SOI, with its back-biasing essentially raises the Vt and is promising to reduce the standby power by 5X as well.”

Recent techniques have used multiple Vt implants to re-shape the N and P transistor characteristics and enable more precise tuning. However, finFETs impose discrete, quantized sizes for both N and P transistors. “This is an issue because N and P are fundamentally different in terms of speed and current density,” explains Cheng. “Traditional sizing techniques for N and P transistors to create 6T SRAM simply don’t work. The manufacturing team essentially has to use Vt implants and other dopant changes to re-shape the I-on and I-off characteristics for N and P to balance the two.”

Another possibility is to replace SRAM with other memory types. “As logic scales down, the need for embedded non-volatile memory (NVM) increases and goes beyond what embedded flash and eFuse can offer, considering manufacturing cost, area and the development cycle,” says Wlodek Kurjanowicz, chief technology officer at Sidense. “One-Time Programmable (OTP) memory becomes the logic NVM of choice in all IoT ICs, analog/RF and large SoCs alike, and it may become the only viable option at next-generation CMOS nodes.”

The demand for increasing memory is pushing new techniques such as High-Bandwidth Memory (HBM), Hybrid Memory Cube (HMC) and TSV-based 2.5D and 3D memory consolidation and integration. “The need for bandwidth and capacity, for lower power dissipation, and for bigger and bigger scratchpads for multi-core processor systems is all increasing,” says Patrick Soheili, vice president of product management at eSilicon. “We see this in many system types. When you can have memory on-board you avoid a lot of traffic, or can manage the traffic better when you have high-speed gateways.”

Those kinds of tradeoffs are becoming commonplace. “For some time there has been a balance between SRAM and logic on an SoC,” says Shuler. “Using external memories with die stacking has advanced so fast that the race to put all memory on the SoC with eDRAM has just not proven to be practical.”

Shuler also points to another way to relieve some of the pressure. “Most SoCs have moved away from using large discrete blocks of memory on the chip, instead distributing program memory into I/O caches or LLC (Last Level Caches). When connected to a network on chip (NoC) these can be shared among many processing units like the CPU, MCU, GPU, and DSP, allowing memory to be allocated more efficiently. This efficiency saves power with the minimization of unnecessary DMA and multiple accesses.”

Precious memory
One final approach, though not a popular one within the industry, is to consume less memory. “There are a lot of people who keep telling me that memory is free and getting cheaper,” points out Bob Zeidman, president of Zeidman Consulting. “In the past, most of the controllers were going into industrial things that people were willing to pay more for. The Internet of Things (IoT) is changing that, and I think memory will become an issue.”

Zeidman points out that there are driving forces on both sides of this argument. History has meant that bloat continues. “For embedded systems you also have the RTOS, drivers and libraries of code. Nobody made much of an effort to make it small and kept adding to it. These require a lot of memory. Advanced programming languages, object-oriented languages, languages without type casting, garbage collection – these require a lot of memory. Nobody wants to train a software engineer how to write efficient code.”

But with the IoT, cost and power become much larger concerns. Zeidman believes it should be possible to create tools that can work out what code you don’t need and thereby achieve a significant reduction in memory footprint. Maybe that will be enough to relieve the pressure for ever-increasing total memory area.


