Scaling The Lowly SRAM

Choosing the best process node and materials for memories, as well as where they come from, isn’t so easy.

By Mark LaPedus
Chipmakers face a multitude of challenges at the 20nm logic node and beyond, including the task of cramming more functions on the same chip without compromising on power and performance.

There is one major challenge that is often overlooked in the equation—scaling the lowly static RAM (SRAM). In one key application, SRAM is the component used to make on-chip cache memories for microprocessors in PCs and mobile products. SRAM is inherently fast, but the device is expensive and occupies an inordinate amount of real estate on the chip.

So the ability to shrink the SRAM bit-cell at each node, while maintaining power and performance, is critical. Starting at 20nm, the challenges in SRAM scaling escalate, making it harder to design new and faster caches.

“The use of SRAM on Intel products varies by market segment from about 10% to around 50% of the die area,” said Kaizad Mistry, vice president and director of logic technology integration at Intel. “It is important to scale this component of the technology by close to 0.5x from one generation to the next. The key limiter to scaling the SRAM is to minimize transistor variation, which can make the SRAM cell unstable.”

To address the SRAM scaling problem, chipmakers face some tough process and design choices. On the process front, there are various tradeoffs between planar transistors and finFETs. It gets more complicated when choosing between the two technology platforms—bulk CMOS and fully-depleted silicon-on-insulator (FD-SOI). Startup SuVolta also has garnered some attention with its dual-gate 2D transistor, but so far it remains an underdog in a competitive market. SuVolta’s technology is based on a super-steep retrograde well (SSRW) scheme.

On the design side, many chipmakers may continue to use the traditional six-transistor (6T) SRAM cell. There are other options, such as the 8T cell and multi-port architectures. Design solutions such as assist circuitry are emerging as a means to boost the noise margins and reduce the operating voltage of the SRAM.

Another key is the ability to obtain third-party intellectual property (IP). The foundries and third-party IP houses tend to provide SRAM IP in bulk CMOS, although vendors also offer SOI IP. “One of the problems with FD-SOI is that there is not enough IP being developed, but this problem is finally being addressed,” said Asen Asenov, chief executive of Gold Standard Simulations, a provider of simulation services. Asenov is also a director for a startup called sureCore, which is developing SRAM IP for FD-SOI and other processes. The startup is working on physical IP based on FD-SOI with STMicroelectronics.

Going with finFETs
Generally, the cache is used to speed up the access times between the processor and main memory. In a system, cache is organized as a hierarchy of levels, such as Level 1, Level 2 and Level 3. For years, chipmakers have used SRAM for the cache and DRAM for main memory. SRAM is faster but more expensive than DRAM.

And for years, chipmakers usually have opted for the 6T SRAM cell. “It’s basically six devices,” said Betina Hold, a design engineer consultant for ARM. “They are complicated because these devices don’t necessarily do what you want them to do. If you reduce the voltage in logic, it gets slower. The same thing happens in SRAM. And as you shrink SRAM, the variations rise.”

One way to address the SRAM scaling problem is to migrate from planar transistors to finFETs. “One of the advantages of the 3D tri-gate transistors we introduced at 22nm is that the Vt mismatch transistor variation is fundamentally improved,” said Intel’s Mistry. “This allowed us to continue to scale the SRAM cell at around 0.5x. We believe that process and design will allow us to continue to scale SRAMs over the next several generations.”

Intel moved to finFETs, or what it calls tri-gate, for its microprocessor designs at 22nm. Intel’s processes are based on traditional bulk CMOS. For its 22nm tri-gate transistor, Intel recently showed 6T SRAM cells, which have a 1.85x increase in density over its 32nm planar design. Intel also implemented a transient voltage collapse write-assist and wordline underdrive read-assist technology to address process variation, enabling a 175mV reduction in the supply voltage required for a 2-GHz SRAM operation.
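As a rough check on those figures, the sketch below converts the reported 1.85x density gain into an area scale factor and estimates how a whole die would shrink for SRAM shares of 10% and 50%, the range Mistry cites. The function names, and the assumption that the non-SRAM logic scales at exactly 0.5x, are illustrative and do not come from Intel.

```python
def area_scale_from_density(density_gain):
    """Area per bit scales as the inverse of the density gain."""
    return 1.0 / density_gain


def die_area_scale(sram_fraction, sram_scale, logic_scale=0.5):
    """Relative die area after a node shrink.

    sram_fraction: share of the old die occupied by SRAM (0..1).
    sram_scale:    area scale factor of the SRAM portion.
    logic_scale:   area scale factor of everything else (assumed 0.5x).
    """
    return sram_fraction * sram_scale + (1.0 - sram_fraction) * logic_scale


# 1.85x density over the 32nm design works out to roughly 0.54x area per bit.
sram_scale = area_scale_from_density(1.85)
print(f"SRAM area scale: {sram_scale:.2f}x")

for frac in (0.10, 0.50):
    total = die_area_scale(frac, sram_scale)
    print(f"SRAM share {frac:.0%}: die scales to {total:.3f}x")
```

Under these assumptions, the larger the SRAM share, the more any shortfall from 0.5x in the bit-cell shows up in the total die area.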

Other leading-edge chipmakers are still in the planar era at 20nm. Taiwan Semiconductor Manufacturing Co. Ltd. (TSMC) recently described a 112-megabit SRAM for its 20nm planar process, also based on bulk CMOS technology. TSMC uses a partially suppressed wordline scheme for read assist and a negative-bitline-boosting scheme for write assist. The area costs of the read- and write-assist circuits are 1.2% and 3.7%, respectively. Still, with the circuitry, the overall Vddmin improvement is more than 200mV.

Following its 20nm planar process, TSMC plans to debut its initial finFET process at 16nm. “For finFETs, you also need design assist circuitry for the SRAM cell to maintain the Vcc reduction trend,” said Cliff Hou, vice president of research and development at TSMC. “We’ve already implemented a design assist circuit on our 16nm finFET macro. So in our 2 Megabit, high-density SRAM cell, we can lower down the Vcc to 0.6 volt.”

Making a case for FD-SOI

Intel, TSMC and Samsung are in the bulk CMOS camp. Another foundry competitor, GlobalFoundries, is supporting three options: bulk CMOS, FD-SOI and SSRW. “SRAM scaling challenges are twofold: cell stability and physical area. SRAM Vmin is a function of transistor Vt mismatch, which has two components: structural and random dopant fluctuation (RDF). Vt mismatch, to the first order, is inversely proportional to the size of the device/cell. Therefore, the larger the cell, the better the stability. On the physical area challenges, SRAM cell size scaling is limited by a few critical ground rules,” said Subramani Kengeri, vice president of advanced technology architecture at GlobalFoundries.

“Structural variability is slightly higher in a 3D device compared to a planar device. Any fully depleted device such as the ones used in FD-SOI or finFETs helps in the RDF component of the mismatch. Both structural and RDF components have to be carefully evaluated before choosing the optimum device for SRAM scaling,” Kengeri said. “There are many read/write circuit techniques to improve Vmin, but they add an area and performance overhead, and hence, require careful choice and design optimization of area/performance with Vmin.”
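Kengeri's first-order point about cell size can be illustrated with a widely used mismatch model, often attributed to Pelgrom, in which the standard deviation of the threshold-voltage mismatch falls with the inverse square root of the gate area. This is a sketch and not GlobalFoundries' model: the coefficient `A_VT_MV_UM` and the device dimensions below are hypothetical values chosen only to show the trend.

```python
import math

# Hypothetical matching coefficient in mV*um (an assumption, not a foundry number).
A_VT_MV_UM = 2.0


def sigma_vt_mv(width_um, length_um, a_vt=A_VT_MV_UM):
    """First-order Vt mismatch in mV: sigma = A_vt / sqrt(W * L)."""
    return a_vt / math.sqrt(width_um * length_um)


# Two illustrative devices with the same gate length (dimensions assumed).
wide = sigma_vt_mv(width_um=0.10, length_um=0.030)
narrow = sigma_vt_mv(width_um=0.05, length_um=0.030)  # half the gate area

print(f"wide device:   sigma(Vt) = {wide:.1f} mV")
print(f"narrow device: sigma(Vt) = {narrow:.1f} mV")
print(f"mismatch ratio after halving the area: {narrow / wide:.2f}x")
```

In this model, halving the gate area raises the mismatch by a factor of the square root of two, which is why a smaller bit-cell is harder to keep stable at a low Vmin.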

In FD-SOI, meanwhile, the top layer of a wafer from Soitec is thin and uniform, enabling planar fully depleted transistors with a thickness down to 5nm under the gate. Between the top layer and the underlying silicon base is a 25nm-thick layer of buried oxide. FD-SOI also boasts a back-bias feature. Compared to bulk, FD-SOI provides a 25% area savings for the SRAM bit-cell, according to some researchers. They also report that FD-SOI has a lower static noise margin, but a higher write ability and read current.

Two knocks on SOI are the cost and the ecosystem. SOI wafers are more expensive than bulk CMOS wafers. And so, IBM and STMicroelectronics are among the few leading-edge chipmakers that use SOI. “IBM and ST have nice niches where they can absorb the cost of the SOI materials,” said Dean Freeman, an analyst with Gartner.

STMicroelectronics argues, however, that FD-SOI requires fewer masks and process steps than bulk CMOS. This compensates for the premium cost of the wafer and makes the manufacturing costs comparable. And so, chipmakers should take a harder look at FD-SOI. “For Intel, finFETs are a good solution, because they deliver good performance for microprocessors,” said Gold Standard’s Asenov. “I am not so sure that (finFETs) are the best solution for everyone. I have not seen any finFETs for low-power applications. This is what worries me a little bit. If it takes Intel four or five years to implement low-power finFET technology, how long will it take the foundries to do the same?”

In other words, FD-SOI levels the playing field against bulk. “The (cost delta) goes away with fully-depleted devices,” said Harold Pilo, a senior technical staff member at IBM. Recently, IBM described an SRAM cell based on 22nm FD-SOI. The cell features a fine-granularity power-gating technique, which reduced bit cell leakage by 37%, according to IBM.

Regarding the ecosystem, STMicroelectronics and GlobalFoundries are providing foundry services for planar FD-SOI. In its latest roadmap, STMicroelectronics is ramping up the 28nm FD-SOI process, with plans to debut the 14nm and 10nm planar versions in 2014/2015 and 2016/2017, respectively. (On its roadmap, STMicroelectronics does not have a 20nm node for FD-SOI.)

STMicroelectronics does not plan to offer finFETs on FD-SOI until the 7nm node. “(Planar FD-SOI) will be very competitive with bulk finFETs,” said Jean-Marc Chery, general manager of the Embedded Processing Solutions Segment at STMicroelectronics. Compared to 28nm FD-SOI, the 14nm version of FD-SOI provides a 50% boost in performance, 30% less power, and a 40% advantage in scaling area, Chery said.

Another key for the FD-SOI camp is third-party IP. “With the new transistor structures coming on the horizon, this presents some challenges for the fabless industry,” said Paul Wells, chief executive of IP startup sureCore. “If you are going to design these multi-million-dollar chips in an 18-month timeframe, where are you going to get your IP from?”

To fill the void, sureCore is developing 28nm SRAM IP based on FD-SOI. “IP will have to be retargeted for finFETs,” Wells said. “We think it’s not as challenging to retarget existing IP for FD-SOI. Overall, the advantage that FD-SOI has over finFETs is that you can control the back-bias of the transistor.”

With its SRAM IP, sureCore plans to take a different path. “What the industry has done typically is develop read/write-assist circuitry and different sensing schemes in order to address the problem as you reduce the voltage,” he said. “We’ve taken a different approach. We have developed some techniques that reduce that level of switching technology. The net result is we can save a lot of power.”

Another option is SuVolta’s Deeply Depleted Channel (DDC) technology, which works by forming a deeply depleted channel when a voltage is applied to the gate. Fujitsu is supporting SuVolta’s technology. GlobalFoundries is supporting SSRW. Others are looking to specifically support SuVolta’s technology. “We are working with the majority of foundries right now,” said Robert Rogenmoser, senior vice president, product development and engineering at SuVolta.