The Power In Errors

Research into the probability of errors is now underway at big companies and universities as a way to reduce energy consumption at advanced nodes.


By David Lammers
Since 1956, some of the top minds in information processing, including Claude Shannon and John von Neumann, have been pondering the problem of how to build reliable systems out of unreliable components. The communications industry embraced the challenge and deployed error-correction techniques to ensure that today’s most sensitive information is transmitted reliably over noisy transmission lines.

But in computing, the notion of probabilistic, error-resilient computing “never caught on,” said University of Illinois Professor Naresh Shanbhag. Still, probabilistic (stochastic) computing research is underway at Intel, IBM, and a variety of academic research centers, including the University of Illinois, Stanford, and the University of California at Berkeley. And a new startup, Lyric Semiconductor (Cambridge, Mass.), has developed a chip architecture aimed at probability processing.

Jan Rabaey, a Berkeley professor and researcher at the Gigascale Systems Research Center, said the semiconductor industry faces a power-consumption wall that will require error-resilient computing. In a keynote speech at the recent International Symposium on Low Power Electronics and Design (ISLPED), Rabaey said, “Going forward, computing equals the cost of energy. The minimum energy point is set by the leakage in the transistors. From the 22nm generation and beyond, chip companies have not been able to scale the operating voltage. Going forward, capacitance may not go down that much. In fact, it may go up a bit.”

One path toward reducing power consumption is to reduce the supply voltage to a point where errors occur, but in small enough numbers that it is energy efficient to go back and correct the errors that matter. Rabaey said computer architects need to determine a “distribution of probabilistic outcomes” that would lead to what he calls a Probabilistic Turing Machine.
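The tradeoff can be made concrete with a little arithmetic. Dynamic switching energy per operation scales roughly as C·V², so lowering the supply voltage saves energy quadratically, while the error rate climbs as the voltage approaches the point where timing fails. The minimal sketch below assumes a hypothetical logistic error curve, a normalized nominal voltage of 1.0, and a recompute at nominal voltage for each error that matters; it ignores leakage, which Rabaey notes sets the actual minimum energy point. None of these numbers come from real silicon.

```python
import math

V_NOMINAL = 1.0  # hypothetical normalized supply voltage


def switching_energy(v: float, capacitance: float = 1.0) -> float:
    # Dynamic switching energy per operation scales as C * V^2 (leakage ignored)
    return capacitance * v * v


def error_probability(v: float, v_knee: float = 0.8, steepness: float = 20.0) -> float:
    # Hypothetical logistic model: errors climb steeply as v falls below v_knee
    return 1.0 / (1.0 + math.exp(steepness * (v - v_knee)))


def expected_energy(v: float, fraction_that_matter: float = 1.0) -> float:
    # Every operation pays its switching energy; an error that matters
    # additionally pays for a recompute at nominal voltage
    replay = switching_energy(V_NOMINAL)
    return switching_energy(v) + error_probability(v) * fraction_that_matter * replay


def best_voltage(fraction_that_matter: float = 1.0, steps: int = 61) -> float:
    # Sweep normalized voltages from 0.7 to 1.0 and keep the cheapest one
    candidates = [0.7 + 0.3 * i / (steps - 1) for i in range(steps)]
    return min(candidates, key=lambda v: expected_energy(v, fraction_that_matter))


if __name__ == "__main__":
    for fraction in (1.0, 0.1):
        v = best_voltage(fraction)
        print(f"errors that matter: {fraction:.0%}  best V: {v:.3f}  "
              f"energy: {expected_energy(v, fraction):.3f} vs nominal "
              f"{expected_energy(V_NOMINAL, fraction):.3f}")
```

Under these assumptions the cheapest operating point sits below nominal voltage, and it moves lower still when only a small fraction of errors needs correcting, which is the argument for distinguishing errors that matter from those that do not.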

The power situation is exacerbated by the scaling-related issue of transistor variability, which also makes it more difficult and expensive to deliver deterministic outcomes.

“It might not be used for banks, but there are plenty of applications where customers don’t need absolute determinism,” Rabaey said. Among those applications are sensor networks, user interfaces, multimedia compression and graphics processing. A broad swath of RMS (recognition, mining, and synthesis) applications may only be doable, he argues, if designers consider the tradeoffs between power consumption and digital resolution, or accuracy.

“We can build machines to deal with lots of errors and still function,” Rabaey said, acknowledging that the approach would involve “a drastic redefinition in the way data is encoded and decoded.”

UC Berkeley's Jan Rabaey

“Some errors are catastrophic. We need an error model to brace against fatal errors. We need errors that have a smooth rolloff,” Rabaey said.

In stochastic computing, the system collects information from the application and from the circuit fabric. It determines which errors are harmful and which are benign, and classifies them accordingly. The result, proponents argue, is a system that is much more energy efficient.
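One simple way to see why some errors are benign and others catastrophic is to look at where a bit flip lands in a number. The sketch below is a hypothetical illustration, not any published classifier: it flips each bit of an 8-bit pixel value and labels the result by the size of the numerical error, using an arbitrary tolerance of 4. Flips in low-order bits barely change the value, while a flip in the most significant bit changes it by 128.

```python
def flip_bit(value: int, bit: int) -> int:
    # Simulate a single-bit upset by XOR-ing one bit position
    return value ^ (1 << bit)


def classify(expected: int, observed: int, tolerance: int = 4) -> str:
    # Hypothetical rule: an error is benign if its magnitude stays within tolerance
    return "benign" if abs(expected - observed) <= tolerance else "catastrophic"


if __name__ == "__main__":
    pixel = 200  # 0b11001000, an arbitrary 8-bit pixel intensity
    for bit in range(8):
        corrupted = flip_bit(pixel, bit)
        print(f"bit {bit}: {pixel} -> {corrupted} "
              f"(error {abs(pixel - corrupted)}): {classify(pixel, corrupted)}")
```

Because the error from a flip in bit k is 2^k, protecting only the high-order bits is one way to get the kind of graceful degradation Rabaey describes.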

Some MPUs already include logic-level techniques such as error-detection blocks and shadow latches, which sample the signal and detect an error; once an error is detected, the processor rolls back and recomputes. “It is quite expensive in terms of power consumption if the chip has to roll back and recompute,” Shanbhag said. “That approach does not admit the possibility of having a benign error; all errors are equally bad. Today’s MPUs treat all errors as the same.”
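The detect-and-replay scheme can be modeled in a few lines. This is a minimal sketch under stated assumptions: the shadow latch samples the same signal a fixed skew after the main flip-flop, every path delay fits inside that window (so the shadow value is always correct), and the energy figures are hypothetical. Any operation whose path delay exceeds the clock period leaves the two latches disagreeing, which triggers a replay regardless of whether the error would have mattered.

```python
import random
from typing import List, Tuple


def detect_and_replay(path_delays: List[float], clock_period: float = 1.0,
                      shadow_skew: float = 0.25, op_energy: float = 1.0,
                      replay_energy: float = 3.0) -> Tuple[int, float]:
    """Return (replay count, total energy) for one pipeline stage.

    Assumption: the shadow latch samples `shadow_skew` after the main
    flip-flop, so it always captures the settled value within that window.
    """
    replays = 0
    energy = 0.0
    for delay in path_delays:
        if delay > clock_period + shadow_skew:
            raise ValueError("path delay exceeds the shadow latch's sampling window")
        energy += op_energy
        if delay > clock_period:
            # Main flip-flop sampled too early, so main and shadow values differ:
            # roll back and recompute. Every mismatch is treated the same way.
            replays += 1
            energy += replay_energy
    return replays, energy


if __name__ == "__main__":
    rng = random.Random(0)
    delays = [rng.uniform(0.6, 1.1) for _ in range(1000)]  # hypothetical delays
    replays, energy = detect_and_replay(delays)
    print(f"replays: {replays}, total energy: {energy:.1f}")
```

Note that the model has no notion of a benign error: a late low-order bit costs exactly as much as a late sign bit, which is the limitation Shanbhag points to.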

Industry adoption slow
While MPU vendors are edging closer, using terms such as “error-tolerant” or “self-healing” processors, industry adoption of the idea that some errors can be tolerated may still be a decade away.

Within the Gigascale center, ideas for error-resilient system architectures vary. One would serve some applications with an SoC that has a super-reliable core supported by several reasonably reliable cores.
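A heterogeneous SoC of this kind needs some policy for deciding which work runs where. The sketch below is entirely hypothetical: it assumes each task carries an application-supplied `error_tolerant` flag, sends intolerant tasks to the single super-reliable core, and spreads tolerant tasks round-robin across the less reliable cores. Core names are illustrative only.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, List


@dataclass
class Task:
    name: str
    error_tolerant: bool  # hypothetical flag supplied by the application


def assign(tasks: List[Task], relaxed_cores: List[str],
           reliable_core: str = "reliable0") -> Dict[str, str]:
    # Tasks that cannot tolerate errors always go to the super-reliable core;
    # tolerant tasks rotate across the reasonably reliable cores.
    rotation = cycle(relaxed_cores) if relaxed_cores else None
    placement = {}
    for task in tasks:
        if task.error_tolerant and rotation is not None:
            placement[task.name] = next(rotation)
        else:
            placement[task.name] = reliable_core
    return placement


if __name__ == "__main__":
    work = [Task("control", False), Task("decode", True),
            Task("filter", True), Task("render", True)]
    print(assign(work, ["relaxed0", "relaxed1"]))
```

With no relaxed cores available, everything falls back to the reliable core, so the policy degrades to a conventional deterministic design.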

Before joining the faculty at Illinois, Shanbhag designed very high-speed digital subscriber line (VDSL) communications ICs at Bell Labs, incorporating the error-correction techniques that make it possible for consumers to bank online and trade stocks over noisy telecom networks. When he moved to academia in 1995, he began writing about computing applications that would be well served by similar error-correcting architectures. Yet after more than a decade of research, he said, even relatively obvious applications such as PC graphics processing, where a few errant pixels usually would not be noticeable, remain based on deterministic techniques.
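Peak signal-to-noise ratio (PSNR), defined as 10·log10(MAX²/MSE), gives a rough sense of how little a few errant pixels cost. The sketch below assumes a 1920x1080 frame of 8-bit values in which a handful of pixels are completely wrong (off by 255) and every other pixel is exact; the frame size and error count are illustrative choices, not figures from the article.

```python
import math


def psnr_with_errant_pixels(width: int, height: int, errant: int,
                            error_magnitude: int = 255, max_value: int = 255) -> float:
    # Mean squared error over the whole frame: only `errant` pixels differ,
    # each by `error_magnitude`; all other pixels are assumed exact
    if errant <= 0:
        raise ValueError("need at least one errant pixel (PSNR is infinite otherwise)")
    mse = errant * error_magnitude ** 2 / (width * height)
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    for errant in (1, 10, 100):
        psnr = psnr_with_errant_pixels(1920, 1080, errant)
        print(f"{errant:>3} errant pixels: PSNR = {psnr:.1f} dB")
```

Ten fully wrong pixels in a 1080p frame still yield a PSNR of roughly 53 dB; for comparison, lossy image and video compression of 8-bit content typically lands somewhere between 30 and 50 dB.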

Stochastic computing is the best way to keep power consumption under control, Shanbhag said, particularly for data-intensive applications such as the extraction of features, models, and parameters: applications that draw on huge databases to serve decision makers.

Stochastic computation, he said, is based on a model that “allows one to achieve robustness and energy efficiency on SoCs.” Shanbhag is working on a PN-code acquisition filter for CDMA phones, a stochastic computing-based implementation that is two to three orders of magnitude more power efficient than conventional PN-code acquisition filters.
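A code acquisition filter in a CDMA receiver searches for the timing offset of a known spreading code, usually a pseudo-noise (PN) sequence, by correlating the received chips against every cyclic shift of the code and picking the peak. The article gives no details of Shanbhag's stochastic design, so the sketch below shows only a conventional sliding correlator, with two assumptions labeled: a seeded random ±1 sequence stands in for a real PN code, and channel or circuit errors are modeled as a few flipped chips. It illustrates why acquisition is naturally error tolerant: each flipped chip lowers the correlation peak by only 2 out of the code length.

```python
import random
from typing import List


def make_code(length: int, seed: int = 1) -> List[int]:
    # Hypothetical stand-in for a PN sequence: seeded random +1/-1 chips
    rng = random.Random(seed)
    return [rng.choice((1, -1)) for _ in range(length)]


def delay(code: List[int], shift: int) -> List[int]:
    # Cyclically delay the code, as if it arrived `shift` chips late
    n = len(code)
    return [code[(j - shift) % n] for j in range(n)]


def flip_chips(chips: List[int], count: int, seed: int = 2) -> List[int]:
    # Model errors by inverting `count` randomly chosen chips
    rng = random.Random(seed)
    out = list(chips)
    for idx in rng.sample(range(len(out)), count):
        out[idx] = -out[idx]
    return out


def acquire(received: List[int], code: List[int]) -> int:
    # Sliding correlator: test every cyclic shift and keep the strongest match
    n = len(code)
    best_shift, best_score = 0, float("-inf")
    for shift in range(n):
        score = sum(received[(i + shift) % n] * code[i] for i in range(n))
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


if __name__ == "__main__":
    code = make_code(127)
    received = flip_chips(delay(code, 37), count=10)
    print("estimated code phase:", acquire(received, code))
```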

Naresh Shanbhag

Lyric takes on Sudoku
So how does all of this fit into the real world? Lyric Semiconductor emerged from stealth mode at the ISLPED 2010 conference last month, but the startup has kept much of its technology under wraps, partly because it attracted financial backing from DARPA.

Lyric applies statistical inference algorithms to a class of applications, mapping them onto its factor graph-based architecture. The approach creates an algorithmic description of statistical inferences, structures that can be used to derive conclusions from databases. The company has released its first chip, dedicated to error correction on flash memories, and is building a general purpose processor, said Theo Weber, a statistician at Lyric who spoke at the ISLPED conference in Austin last month.

“Lyric’s probability processing technology is designed from the ground up to efficiently consider many possible answers and find the most likely fit,” said Weber. Lyric’s factor graph approach estimates “how variables are connected, such as Sudoku-like problems. It infers meaning from imprecise data,” he said.
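Lyric has kept its hardware details largely under wraps, so the following is only a generic software sketch of inference on a factor graph, not Lyric's method. It runs the sum-product (belief propagation) rule on a hypothetical three-cell "mini-Sudoku" row: each cell has a noisy prior reading, and a single all-different factor ties the cells together. Because this graph is a tree (three variables joined by one factor), the computed marginals are exact.

```python
from itertools import product
from typing import Dict, List

VALUES = (1, 2, 3)  # a hypothetical three-cell "mini-Sudoku" row


def all_different(a: int, b: int, c: int) -> float:
    # Constraint factor: 1 if the three cells hold distinct values, else 0
    return 1.0 if len({a, b, c}) == 3 else 0.0


def marginals(priors: List[Dict[int, float]]) -> List[Dict[int, float]]:
    beliefs = []
    for target in range(3):
        # Factor-to-variable message: sum the factor over the other two cells,
        # weighting each assignment by their incoming messages (here, their priors)
        message = {v: 0.0 for v in VALUES}
        for assignment in product(VALUES, repeat=3):
            weight = all_different(*assignment)
            for other in range(3):
                if other != target:
                    weight *= priors[other][assignment[other]]
            message[assignment[target]] += weight
        # Belief = local evidence times incoming message, normalized
        unnormalized = {v: priors[target][v] * message[v] for v in VALUES}
        total = sum(unnormalized.values())
        beliefs.append({v: p / total for v, p in unnormalized.items()})
    return beliefs


if __name__ == "__main__":
    # Noisy "readings": cell 1 cannot decide between 1 and 2 on its own
    priors = [
        {1: 0.8, 2: 0.1, 3: 0.1},
        {1: 0.4, 2: 0.4, 3: 0.2},
        {1: 0.3, 2: 0.3, 3: 0.4},
    ]
    for cell, belief in enumerate(marginals(priors)):
        print(cell, {v: round(p, 3) for v, p in belief.items()})
```

Cell 1's own reading is a tie between 1 and 2, but because cell 0 very likely holds 1, the constraint pushes cell 1 toward 2: the graph infers a likely answer from imprecise data, in the spirit of Weber's description.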

Lyric is working on a “GP5” general-purpose computer that Weber said is “1000 times more powerful than today’s processors.”