Systems & Design

Coherency Becomes A Stack Of Issues

Keeping SoCs in sync is becoming increasingly challenging; multicore, increased software content and multiple geographies create bigger problems.

March 22nd, 2012 - By: Ed Sperling

By Ed Sperling
As complexity increases and the industry increasingly shifts away from ASICs to SoCs, the concept of coherency is beginning to look more like a stack of issues than a discrete piece of the design.

There are at least five levels of coherency that need to be considered already, with more likely to surface as stacked die become mainstream over the next few years. Perhaps even more mind-numbing, this stack itself will have to take on a level of coherency over the couple generations of chips.

Let’s take a closer look.

Cache coherency
The concept of keeping data coherent historically was relegated to processor makers such as IBM, Intel and AMD, which have focused on improving performance through faster access to data. One solution to that improved performance has been multithreading and multiprocessing. Along with that, these vendors have added in various levels of cache memory for faster recall of important data.

More cores also makes it harder to effectively use these caches. Data has to be kept consistent, which requires more system overhead in terms of processing and power just to maintain that coherency. And it gets even harder as more cores are added into an SoC, which increasingly are not same size, do not run at the same frequency, and sometimes do not even connect directly to the main CPU.

“With cache coherency, some of the traffic may be serviced by the cache on another GPU,” said Drew Wingard, CTO at Sonics. “If you’re just using an ARM core, the CPU coherence is sufficient. But the GPU uses its own local memory. You really want it to be fully cache coherent across all of those.”

But even finding the data to maintain consistency may be a problem in a complex SoC.

“You can view what’s in memory, or view it and be able to change what’s in memory, but first you have to find it,” said Kurt Shuler, vice president of marketing at Arteris. “If you have four cores, the most efficient way to hook them up is for each core to have its own cache and graphics to have its own cache. If you change something, you have to snoop in all the caches to make sure it’s consistent.”

But there is also a move in the completely opposite direction—sharing memories among multiple cores—because it reduces the number of components on the bill of materials. The Low-Latency Interface specification from the MIPI Alliance is a case in point, where a memory can be shared between a modem and an applications processor. Intel, meanwhile, has added on-chip graphics that share memory with the CPU.

“The whole design gets more complex,” said Shuler. “You have more traffic beyond the cores, and from a power standpoint the overhead goes up.”

Still, cache coherency is one of the better-understood pieces of this stack. It has been an issue ever since multiprocessing was first employed in the 1960s. “Snooping” has been widely used since that time.

Software coherency
A newer facet of coherency involves embedded software. Because SoCs now include an increasing amount of software in the design, engineering teams now have to wrestle with coherency issues that previously were dealt with by the operating system.

“Fundamentally you’ve got two combined issues here,” said Andy Meyer, verification architect for Mentor Graphics’ Design Verification Technology Division. “You’ve got cache coherency, where the same data is being viewed in a couple places. And then you’ve got an issue with consistency in the simple code in a uniprocessor that now has to run on a second processor. The ordering of events can change in multiprocessing.”

Those problems crop up regularly in verification, but not always with the expected results. It’s difficult to effectively write the stimulus in a testbench for coherency. What happens, for example, when a core is shut down to save power?

“The scariest part is when there is no OS support,” said Meyer. “There’s also a big problem with heterogeneous cache, such as when you have a CPU working with a GPU.”

Another issue has to do with effective coverage in verification, already a problem for complex SoCs. States frequently are distributed across multiple chips and multiple boards. Timing varies from one state to another, and can be particularly problematic if snooping functions are tied to a state. And parallelism continues to baffle even the most advanced teams.

“Standard coverage methods don’t work well here,” said Meyer. “You have to query in ways you traditionally didn’t have the power to query and ask questions across months of regressions. For instance, ‘Have we been here ever—or in the last two months.’ Until coverage steps up, people with deep knowledge of verification running hundreds of full-time emulator systems are finding out at the last minute that it’s not okay to ship.”

I/O coherency
Tied in with both cache coherency and software coherency is I/O coherency. Increased communication on a chip, between chips, and between a chip and the outside world, have turned what used to be a relatively straightforward networking issue into a complex jumble of prioritization and synchronization.

“You have to deal with this even in single processors,” said Sonics’ Wingard. “You may have a PCI core streaming data into memory. Today, without I/O coherence, it’s difficult to determine what is coming in. The CPU has no way of knowing what was transferred when it dos a copy from non-cache to cache.”

He noted that personal computers had I/O coherency for a long time, particularly with direct memory access. DMA was developed initially to help solve the bottleneck that occurred when a CPU was involved in an I/O transfer. Rather than tie up the CPU with that transfer, the CPU continued running, then accepted an interrupt when the transfer was completed.

But with more of this being moved onto a chip, keeping coherency while moving data back and forth from more places is becoming much more difficult.

Ecosystem coherency
One of the least addressed facets of the coherency stack involves business and communication issues across a supply chain for a particular SoC rather than the actually technology itself. Even where competitive suspicions can be overcome, the very different approaches taken for designing components, IP and software, as well as language barriers, create one of the more difficult and less tangible challenges in the coherency stack.

“The challenge going forward is that you have a bunch of people who may not be that skilled in system development driving the chip and spec for one design, and other supplier trying to orchestrate things,” said Mike Gianfagna, vice president of marketing at Atrenta. “So you bring them together to solve a problem for one customer in 12 weeks and then they move on. You’ve got corporations coming together and bringing all these pieces together almost like the way a movie is done. But is there a coherent way to communicate data and information risks and still provide good visibility from a power/performance/area point of view?”

For decades this task has been handled by IDMs, but in the SoC world there are far fewer IDMs these days. Many of these chips are built using third-party IP such as cores from ARM or MIPS, DSPs from companies such as Tensilica, and standard IP from the Big Three EDA vendors.

Coherency in stacked die
It’s uncertain whether stacking of die, either in 2.5D or 3D configurations will make coherency easier or harder. The answer is likely to be a little of both.

“With 2.5D and 3D, you’re looking at low-power memory access,” said Arteris’ Shuler. “You put the DRAM closer to the CPU, the addressing is wider and you get rid of some of the latency. But you also need coherency across all of this.”

No one is sure yet how multiple high-speed communication channels between die will affect coherency. If the channel between the core is wider and shorter that will improve data speed, but if processors and DRAM are scattered on multiple die, with some of them shut down, some partially shut down, and others fully active, it may make it harder to keep track of data and make sure it is all synchronized.

Ed Sperling

(all posts)
Ed Sperling is the editor in chief of Semiconductor Engineering.

Knowledge Centers
Entities, people and technologies explored

Shift Left Is The Tip Of The Iceberg

A transformative change is underway for semiconductor design and EDA. New languages, models, and abstractions will need to be created.

by Brian Bailey

Partitioning In The Chiplet Era

Understanding how chiplets interact under different workloads is critical to ensuring signal integrity and optimal performance in heterogeneous designs.

by Ann Mutschler

NAND Flash Targets 1,000 Layers

New techniques go beyond improved deposition and etching, but challenges stack up, too.

by Bryon Moyer

3.5D: The Great Compromise

Pros and cons of a middle-ground chiplet assembly that combines 2.5D and 3D-IC.

by Ed Sperling

AI’s Role In Chip Design Widens, Drawing In New Startups

Focus is on letting engineers do much more with the same or fewer resources — and less drudgery.

by Karen Heyman

What Comes After HBM For Chiplets

The standard for high-bandwidth memory limits design freedom at many levels, but that is required for interoperability. What freedoms can be taken from other functions to make chiplets possible?

by Brian Bailey

Memory Fundamentals For Engineers

eBook: Nearly everything you need to know about memory, including detailed explanations of the different types of memory; how and where these are used today; what's changing, which memories are successful and which ones might be in the future; and the limitations of each memory type.

by The SE Staff

Why Small Fab And Assembly Houses Are Thriving

Megafabs churning out the most advanced processors are not the only game in town.

by Bryon Moyer

Coherency Becomes A Stack Of Issues

Ed Sperling

Leave a Reply Cancel reply

Technical Papers

Knowledge Centers
Entities, people and technologies explored

Related Articles

Shift Left Is The Tip Of The Iceberg

Partitioning In The Chiplet Era

NAND Flash Targets 1,000 Layers

3.5D: The Great Compromise

AI’s Role In Chip Design Widens, Drawing In New Startups

What Comes After HBM For Chiplets

Memory Fundamentals For Engineers

Why Small Fab And Assembly Houses Are Thriving

Sponsors

Recent Comments

About

Navigation

Connect With Us

Coherency Becomes A Stack Of Issues

Ed Sperling

Leave a Reply Cancel reply

Technical Papers

Knowledge Centers Entities, people and technologies explored

Related Articles

Shift Left Is The Tip Of The Iceberg

Partitioning In The Chiplet Era

NAND Flash Targets 1,000 Layers

3.5D: The Great Compromise

AI’s Role In Chip Design Widens, Drawing In New Startups

What Comes After HBM For Chiplets

Memory Fundamentals For Engineers

Why Small Fab And Assembly Houses Are Thriving

Sponsors

Newsletter Signup

Popular Tags

Recent Comments

About

Navigation

Connect With Us

Knowledge Centers
Entities, people and technologies explored