Is There Any Hope For Asynchronous Design?

This approach has long held promise, but never managed to deliver. Is there a fundamental problem, or is it just bad luck?


In an era when power has become a fundamental design constraint, questions persist about whether asynchronous logic has a role to play. It is a design style said to offer significant benefits, yet it has never resulted in more than a few experiments.

Synchronous design relies on a clock whose frequency is set by the slowest path in the design, including margin for the variation that can occur in the manufacturing process. A common practice during test is to sort chips into bins based on the performance they achieve. Otherwise, any chip that does not operate above the defined frequency would have to be considered defective.

This is all made more complex by clock skew. Even though a clock signal may be generated from a single point, it encounters different delays as it is distributed across the chip. Clock skew is the difference in the clock’s arrival times at the various registers it is intended to clock, and it, too, is subject to fabrication variation.

To help mitigate these problems, multiple clocks are often used, or other complex design methodologies are deployed. But splitting a chip into asynchronously coupled domains creates a new class of problems, namely clock domain crossing.

In addition, clocks consume a lot of power. Because a clock propagates to so many places on a chip, the clock line has a large amount of capacitance associated with it. Every clock edge requires that capacitance to be charged or discharged, which both slows the edge and burns power. Buffers can be inserted to reduce the capacitive load on each individual driver, but those buffers themselves switch on every edge and consume additional power.
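To put rough numbers on that, the dynamic power of a clock network is approximately the switched capacitance times the square of the supply voltage times the clock frequency. The short Go sketch below works through that arithmetic; the capacitance, voltage, and frequency values are illustrative assumptions, not figures for any particular chip.

```go
package main

import "fmt"

func main() {
	// Dynamic power of a clock net: P = C * V^2 * f.
	// The activity factor is 1 because the clock charges and
	// discharges its capacitance every cycle.
	// All values below are illustrative assumptions.
	const (
		capacitance = 1e-9 // total clock-tree capacitance in farads (1 nF)
		voltage     = 0.8  // supply voltage in volts
		frequency   = 2e9  // clock frequency in hertz (2 GHz)
	)

	power := capacitance * voltage * voltage * frequency
	fmt.Printf("Clock network dynamic power: %.2f W\n", power) // prints 1.28 W
}
```

Even with these modest assumptions, the clock network alone accounts for more than a watt, which is why clock gating and locally synchronous islands are so attractive.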

With an increasing number of chips approaching the reticle limit, it becomes impossible to operate a chip from a single synchronous clock. “If you can’t get across the chip in one clock cycle, you have to look at things as being locally synchronous, but have longer distance runs that are either asynchronous clocked, where you go through synchronizers, or you use methodologies from the good old CPU times — you build a low-skew clock grid,” said Michael Frank, retired fellow from Arteris. “The problem is that clocks take power, and you have big rebuffered trees that feed a large number of flops.”

What makes synchronous design so attractive is that once the longest path has been found, timing essentially can be ignored, as all operations are divided into discrete steps. This is an important factor for tools such as synthesis.

“Asynchronous design is one of these technologies that has endlessly promised, but it’s really hard to build, other than in fairly limited cases,” says Rob Aitken, fellow at Synopsys. “This is over-generalizing, but if you take a given piece of RTL, and assume that RTL has been optimized for synchronous design, and you try to implement the same thing using asynchronous design, the first thing you wind up doing is de-tuning the RTL slightly to make it more amenable to asynchronous design. Then you implement it asynchronously, and you see if the benefit is better than the amount of de-tuning you had to do to get it asynchronous in the first place. The world will eventually find a way to benefit from it, but for now, it’s hard to beat fully synchronous design, just because that’s what everything is optimized for.”

Tools must exist before there can be any fundamental change. “The problem with asynchronous design is the typical chicken and egg problem — no tools, no users; no users, no tools,” says Marly Roncken, director of the Asynchronous Research Center at Portland State University. “This holds big companies back and makes asynchronous the domain of startups and research centers. There are pieces of asynchronous logic in synchronous design, though synchronous designers will probably not call that out as much. I would love to see a seamless integration in the tools domain, so we can use synchronous and asynchronous as complementary parts, each working where its strengths are.”

So will there ever be a compelling event to make asynchronous design become essential? “If you give me a set of design goals that include power and energy, you can hit those a lot faster with an asynchronous approach than if you have to do the same thing synchronously,” says Rajit Manohar, professor of electrical engineering and computer science at Yale University. “This is especially true if the target is aggressive. With enough time and effort, engineers can optimize anything, and there are a lot of very good engineers out there. I don’t want to say it may never be possible for you to hit this particular performance point. Engineers can do a lot to optimize their design when they have the right tooling and the right support and competence.”

Historical attempts
Back in the 1980s and 1990s, many of the top system houses of the day explored the opportunities of asynchronous design. They did so with the tools of the day, all of which were designed for synchronous logic. Several techniques were tried, but ultimately none of those companies adopted asynchronous design practices.

In the 1980s, most designs were done by hand. “Better tools allowed you to do fancier chips, and then you could use the faster processors to run more sophisticated tools,” says Yale’s Manohar. “There was a virtuous cycle of evolution. Today, we have sophisticated EDA tools to design very complex synchronous chips. What happened is that the asynchronous design methodology wasn’t mature enough to catch that cycle. The first synchronous processor was designed in the 1970s. The first asynchronous processor wasn’t designed until 1989. That’s a pretty large number of years between those two.”

In one research paper,¹ the author identified 10 different ways to describe asynchronous systems and the synthesis approaches associated with those. “Making a strong comparison between each, especially in the critical issues of performance, area, and power usage, is difficult, and unfortunately there haven’t been many actual comparisons made,” wrote Scott Hauck. “Even worse, there hasn’t been any truly compelling evidence of real benefits of asynchronous circuits over synchronous approaches, though several impressive examples have been built. The fundamental issue of which approach is best in performance or area or power among the asynchronous styles, as well as if any asynchronous approach is worth the extra effort of abandoning the prevalent synchronous model, is still open.”

Those asynchronous techniques do not include the one that is most prevalent in those early experiments. “The current asynchronous approach is basically taking a Turing machine, which is synchronous, and trying to make an asynchronous system out of it,” says Ron Lavallee, president of You Know Solutions. “You have to start with something that is stateless and asynchronous, and simply build circuits that match that. Developing asynchronous systems was just too hard when compared to a traditional synchronous design approach. Also, it is difficult to make a synchronous-system Turing machine into an asynchronous system.”

The lack of a defined design philosophy has continued to plague asynchronous design. “It’s easy to get burned with asynchronous design,” says Manohar. “If I look at synchronous design, you pick up a VLSI book, they show you 50 different latches. They show you all possible circuit styles, even though commercial tools don’t support many of them. They have decided that there’s a particular style of synchronous logic that works well. They know how to make it work. They can get good results, and that’s what they support. That is what everybody is using. The same thing is true in asynchronous logic. There are a lot of different approaches, and some of them work well and some of them don’t work so well. It’s difficult for somebody from the outside to know which one to use. If you pick the wrong one, you’re going to get into trouble. That is part of the problem. There aren’t that many people producing asynchronous chips on a regular basis who have built this expertise. A lot of it is education and having tools and automation to support methodologies you can count on when you’re designing your chip.”

When looking for a good candidate for asynchronous design, start with circuits where the time taken to perform an operation is data-dependent, meaning that some results can be computed quickly while others take longer. If the computation must complete in a fixed time period, that period has to accommodate the longest possible computation time.

“Multiplication is a simple example,” says Manohar. “Supposing I was writing a piece of software and need to multiply two numbers. I might profile my code and I find the multiplier is the slow part. Then I realize that most of the time, X is zero. If I’m a software developer, I would add a conditional that if X = 0, return 0. Otherwise, do the operation. That is not a great idea for any clocked implementation. In the worst case, I added a test to see if X is zero. It fails, and then I have to do the multiplication, and I make the frequency lower. In an asynchronous case, that’s an improvement because on average, I’m doing better. This is something you have to look at from an algorithmic perspective.”
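As a purely software illustration of Manohar’s point, the Go sketch below adds that early-out check for a zero operand. In a fixed-cycle (clocked) implementation the extra test only lengthens the worst case, while with data-dependent (asynchronous) timing it shortens the average case whenever X is usually zero. The function and values are hypothetical.

```go
package main

import "fmt"

// multiply returns x*y with an early-out when x is zero. In a clocked
// design the extra test only lengthens the worst-case path; with
// data-dependent (asynchronous) timing it shortens the common case.
func multiply(x, y int64) int64 {
	if x == 0 {
		return 0 // fast path: skip the multiplication entirely
	}
	return x * y // slow path: do the full multiplication
}

func main() {
	fmt.Println(multiply(0, 12345)) // the common case in Manohar's example: 0
	fmt.Println(multiply(7, 6))     // 42
}
```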

Languages and tools
And that is where the big stumbling block comes in, because every EDA language in use today is optimized for synchronous entry methodologies.

“We started with decision flow charts,” says You Know Solutions’ Lavallee. “These got deployed across thousands of systems at General Motors’ power train, a manufacturing system. One of the issues we had was getting people to think of flow charts in a parallel fashion. They work by propagating multiple-decision flow charts simultaneously, and initiating event functions as the propagation proceeds. This propagation can be in a physical, biological, or chemical substrate. A flow chart (see figure 1) is a set of events, actions and tests. One complaint about flow charts is they can resemble spaghetti code. We solved that years ago by making them a true parallel programming language. You draw a separate flow chart for each task or function. That helps with the complexity by breaking one big flow chart down into multiple small ones. On top of that, we add objects. Objects allow you to encapsulate action and test structures into a higher-level action and test. And you can keep doing this through as many levels as you want.”

Fig 1: Flowchart for an asynchronous full adder. Source: You Know Solutions

Yale has a hardware description language. “It’s a message-passing programming language, where the messages are communication between components,” says Manohar. “You can use the syntax to describe a data flow design. You can use the syntax to have loops and communication. It’s based on CSP, which was Tony Hoare’s language developed in 1979, but with some semantic changes.”
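Go’s channels also trace back to Hoare’s CSP, so the flavor of this message-passing style can be sketched, loosely, in Go. The example below is not Yale’s language; it simply models two hypothetical components that interact only by sending and receiving on channels, the way handshaking stages do in an asynchronous pipeline.

```go
package main

import "fmt"

// stage reads tokens from in, doubles them, and forwards them on out.
// Each send/receive pair on an unbuffered channel acts like a handshake
// between two communicating components.
func stage(in <-chan int, out chan<- int) {
	for v := range in {
		out <- 2 * v
	}
	close(out)
}

func main() {
	in := make(chan int)
	out := make(chan int)

	go stage(in, out)

	// Producer: send a few tokens, then close the channel.
	go func() {
		for i := 1; i <= 3; i++ {
			in <- i
		}
		close(in)
	}()

	// Consumer: receive whatever the stage produces (2, 4, 6).
	for v := range out {
		fmt.Println(v)
	}
}
```

There is no global clock anywhere in this model; each stage advances whenever its communication partners are ready, which is the essence of the data-flow designs the syntax is meant to describe.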

But it is a lot more difficult to start with a language such as Verilog. “Many synchronous tools lack asynchronous capabilities at the basic core part of their software,” says Portland State’s Roncken. “This is very apparent in timing analysis and test tools, tools that everyone needs. We are tying our research to the asynchronous tools being constructed at Yale, which in turn builds on in-depth knowledge and past experience from Caltech, Philips Electronics, Manchester University, Intel, and other big players in the asynchronous domain from past and present.”

The research at Yale is supported by DARPA’s Electronics Resurgence Initiative (ERI). “We have put together an ASIC flow for asynchronous circuits,” says Manohar. “We have developed a number of things that allow us to design asynchronous circuits with the same efficiency, using much less effort than it takes you to design a complex synchronous chip. But what we’re trying to do is to show that we can design high-quality chips automatically, or with much lower effort than it would take to do a clocked design.”

Verification can present some very different challenges. One of them is repeatability. While a simulation is deterministic, with events always happening in the same order each time it is run, it can be difficult to orchestrate scenarios that involve multiple asynchronous activities. This tends to be a huge problem in live systems, but even within a simulation it can make system state difficult to capture and comprehend. Very minor changes can cause much larger changes in outcomes than with synchronous design, which filters many of those problems out.

This also causes significant problems when a reference model is being used for verification. While both models may be correct, they can exhibit different behavior, especially in the presence of asynchronous activity. Special care must be taken to ensure the reference model can synchronize with the design model.

“There are certain things that are similar, and certain things that are quite different,” says Manohar. “We are doing some work in the space using formal methods and theorem provers to be able to verify properties on asynchronous designs. At a higher level of abstraction, we need other types of verification because we have to check that asynchronous computation is correctly implemented by gates. We have developed certain verification strategies, and they look much more like the strategies used for software verification.”

Some aspects may be simpler. “Verification of a clockless flow chart system is easier because every signal path does not need to be verified,” says Lavallee. “Once the atomic structure of an action or test block has been thoroughly verified for the substrate used, then there is no need to verify them again. Verification is then only the flow line signal paths and the overall behavior of the system.”

Few designs are likely to be fully asynchronous, meaning that both styles have to meld together. “Adaptive systems could potentially solve the problem of variability,” says Synopsys’ Aitken. “You will end up in those situations being bitten by some more mundane concerns. I designed my thing and made it asynchronous, and now it’s better, but there are really two pieces that are the classic failure points for asynchronous. One is test. Recently this has changed, but historically the answer to testing an asynchronous circuit was to synchronize it and then run scan. The other one is really just the number of tricks that get played in synchronous design in order to borrow from clock cycles, in order to make sure that the signal integrity on the clock waveforms is not absolutely miserable. There are a number of not really asynchronous, but not fully synchronous things that happen. Those capabilities enable synchronous designs to get some really serious performance and power gains. It means the benefit of a fully asynchronous system isn’t really as much as it might be in theory.”

The connection point between the two worlds is already well known because of the clock domain crossing (CDC) problem. “Metastability issues happen when you have two asynchronous clocks, and their edges — the relative gap between the active edges — varies dynamically,” says Prakash Narain, president and CEO of Real Intent. “At some point it will become small enough that the flop-to-flop path will not meet timing. You have to compensate for that by ensuring the CDC crossings follow a specific set of logic design principles. For die-to-die interconnect, which is relatively slower, we do have the globally asynchronous, locally synchronous (GALS) methodology. You have one clock domain, but within that clock domain you create islands of totally synchronous logic where the timing is met. Between those islands, you say that the clock tree is not balanced, so I’m going to treat them as if it is an asynchronous crossing. It is the same source clock, with the same frequency, but there will be a phase difference. That may allow you some simplification, but typically it doesn’t.”
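As a loose software analogy to the GALS arrangement Narain describes, the sketch below models two hypothetical “islands,” each paced by its own ticker standing in for a local clock, that exchange data only through a small buffered channel standing in for a CDC FIFO. It is only an analogy; real crossings need synchronizers, gray-coded pointers, or similar structures, none of which appear here.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// A small buffered channel stands in for the asynchronous FIFO
	// between two locally synchronous islands.
	crossing := make(chan int, 4)
	done := make(chan struct{})

	// Island A: paced by a 1ms ticker (its "local clock"), produces
	// one token per tick.
	go func() {
		tick := time.NewTicker(time.Millisecond)
		defer tick.Stop()
		for i := 0; i < 5; i++ {
			<-tick.C
			crossing <- i
		}
		close(crossing)
	}()

	// Island B: paced by a slower 3ms ticker, consumes tokens at its
	// own rate, independent of island A.
	go func() {
		tick := time.NewTicker(3 * time.Millisecond)
		defer tick.Stop()
		for v := range crossing {
			<-tick.C
			fmt.Println("island B received", v)
		}
		close(done)
	}()

	<-done
}
```

Inside each goroutine everything happens in lockstep with its own ticker; the only agreement between the two is the handshake implied by the channel, which is the property GALS designs exploit.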

Carving out a small piece of a synchronous design is not the way to go. “In my experience, the biggest advantages we get using asynchronous design is because we think about the entire problem differently,” says Manohar. “We often come up with a solution that would not work well in the clocked approach, but which gives good results because we’re using asynchronous logic.”

Conclusion
So is there an opportunity for asynchronous design? “If what we know today about asynchronous design and how to automate various steps was known in 1988, the story might be different,” Manohar says. “We are at an interesting point where companies that traditionally were viewed as software companies are now spinning silicon. And that’s an interesting opportunity because there may be a group of people looking at the problem of chip design with fresh eyes. That’s an opportunity for asynchronous design.”

Reference

  1. S. Hauck, “Asynchronous Design Methodologies: An Overview,” Proceedings of the IEEE, Vol. 83, No. 1, January 1995.

Related Reading
Accellera Preps New Standard For Clock-Domain Crossing
Goal is to streamline the clock-domain crossing (CDC) flow between IP vendors, integrators, and tool vendors.



6 comments

Shiv Sikand says:

The Yale HDL sounds like Tangram, which was written by Van Berkel at Philips Nat Lab. That too was based on CSP. I worked with Marly on that project.

Yaron k. says:

Not much has changed since I attended async ’98, it seems… Tools are lacking, value prop is tenuous, and the big technology break is 5-10 years in the future.

Cliff Cummings says:

I earned my MSEE from Oregon State University and my professor for the Asynchronous Digital Logic design class was Don Kirkpatrick, PhD, Adjunct Faculty from Tektronix. Don was also my graduate advisor and one of the brightest digital designers I have ever known. My class used the book, “Asynchronous Sequential Switching Circuits” by Stephen Unger, originally published in 1969 and reprinted in 1983 (I still have the book). After graduation, in 1988, I tried to do some high-speed ECL (Emitter Coupled Logic) design using the asynchronous techniques learned in class but immediately ran into trouble. I talked to Don Kirkpatrick about the design, and he reminded me that a fundamental requirement of asynchronous design is that the gate delays must be much longer than the routing delays, and that was not true for ECL designs. It is also no longer true for CMOS designs (but it was true in the 1980s).
The subtitle of the article asks, “Is there a fundamental problem, or is it just bad luck?” Based on the fundamental requirement that gate delays must be much longer than routing delays, I claim “there is a fundamental problem.” Doing asynchronous design is a very interesting exercise. Getting the routing to work is the fundamental problem. It is also very easy to prove clocked-synchronous designs using simulations and Static Timing Analysis, which is why synchronous design is preferred. We can do synchronous projects quickly and easily prove they will work.
Figure 1 in the article shows an 8-bit full adder and is very complex, something that is easily rendered using RTL design. Try extending that to a billion-gate design!
Regards – Cliff Cummings
VP Training – Paradigm Works

Ron Lavallee says:

Hi Cliff, I can understand your reaction when you first look at figure 1. It is made up of 19 parallel propagating (executing) tasks, but when you look into them they are all quite simple. The point of the drawing is to show that with a parallel system like Flowpro, only a portion of the asynchronous tasks need to run at the same time, thereby saving power. The 8-bit adder is a good example of flowchart hierarchy and selectively running portions of a parallel asynchronous function. As for extending to a 1-billion-gate design, we hardly ever use Boolean structures, but 1 billion parallel asynchronous tasks are doable with a Flowpro Machine, if the substrate will support it.
Ron Lavallee
You Know Solutions

David Scott says:

At a Jim Hogan symposium at SJ State in 2017, I heard Chris Rowen (former MIPS, Cadence, investor) propose that neural network circuits might benefit from asynchronous design. There are mysterious AI chip start-ups these days. Do we know if any of them are working on asynchronous design for that limited subset of their chips? Seems like an interesting idea, anyway: after all, biological neurons are asynchronous!

Verif_semi says:

Intel’s neuromorphic chip employs asynchronous design.
