Asynchronous logic promises a number of benefits in some specific application areas, but is it ready for mainstream adoption?
There are a number of interesting technologies to keep an eye on in terms of how and when they could be adopted for use in SoC design today, including gallium arsenide, GPGPUs, 3D ICs and asynchronous logic.
Asynchronous logic promises a number of benefits in some specific application areas, and one that rises to the surface for potential near-term use is security and the IoT. While area is traded off for lower power and higher speed, there could also be a side benefit: asynchronous logic is not susceptible to side-channel analysis, given its irregular power and EM structure, according to Bernard Murphy, CTO at Atrenta. But exactly when is the big question.
“Asynchronous logic — full bore — is still an academic topic predominantly because the design chain just doesn’t support it. If you start from simulation, which is cycle-based, it therefore assumes a clock. You have synthesis, which does timing optimization, which assumes a clock. You go to timing analysis, that assumes a clock. You go to place and route, timing optimization, that assumes a clock. Virtually nothing in the design flow supports asynchronous design.”
Still, from a conceptual perspective, both synchronous and asynchronous techniques co-exist today. Drew Wingard, CTO of Sonics, said many parts of systems have their own natural rates of doing things, and those natural rates are not the same. Connected to this, the clocks in those parts, each a signal that determines the rate at which things happen, are often asynchronous to each other. In other words, they’re not synchronized to each other. “For instance, the data rate of a USB isn’t related in any way to the data rate of the pixels on my screen, which isn’t related in any way to the speed of the wireless link, or any of the wireless links for that matter. That kind of asynchronous behavior is a design requirement in almost all SoC designs.”
In terms of design, there is another type of ‘asynchronous,’ which is creating logic that does not contain clocks. “It’s a technology that’s been promoted for many years for a number of reasons,” he said. “The most commonly cited benefit of asynchronous logic is that it’s supposed to be an easier way of building larger systems to escape what some people would call the ‘tyranny of the clock.’ The other one is that asynchronous logic should be lower power. There’s a fair amount of academic work that has tried to explore that question of whether asynchronous logic is fundamentally lower power than synchronous logic. The results are very muddy. There are a number of academics who actually believe that anything you can do using asynchronous logic, you can do the equivalent using a synchronous circuit that is essentially equal power and certainly has much simpler design flow and design tools, but it does require that you do careful design.”
Technically speaking, asynchronous logic does not use a reference signal, such as a clock, to achieve its function, said Mrugesh Walimbe, MTS Function Manager (SOC Design) at Open-Silicon. “Thus, after implementation asynchronous logic will provide the function within a range of delay determined by the worst-case and best-case temperature and process variations.”
Synchronous logic uses a signal such as a clock for reference, and the function therefore is achieved in a fixed number of clock cycles, subject to a minimum period based on the implementation. Slowing the reference signal effectively delays the function, he said.
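Walimbe’s distinction can be sketched numerically. This is a toy model, not a real design flow: the synchronous block’s latency is quantized to whole clock periods, while a self-timed block finishes as soon as its outputs settle. The function names and delay numbers are invented for illustration:

```python
import math

# Toy model: a block whose raw combinational delay is `d_ns` nanoseconds.

def sync_latency(d_ns: float, clock_period_ns: float) -> float:
    """Synchronous logic completes on a clock edge, so latency is
    quantized to whole clock periods."""
    return math.ceil(d_ns / clock_period_ns) * clock_period_ns

def async_latency(d_ns: float) -> float:
    """Self-timed logic signals completion as soon as outputs settle."""
    return d_ns

# A 3.2 ns computation under a 2 ns clock takes two full cycles (4 ns)
# synchronously, but only 3.2 ns asynchronously.
print(sync_latency(3.2, 2.0))   # 4.0
print(async_latency(3.2))       # 3.2

# Slowing the reference clock delays the function, as Walimbe notes:
print(sync_latency(3.2, 5.0))   # 5.0
```

The same model also shows why worst-case sizing matters: the synchronous clock period must cover the slowest possible delay, while the self-timed version tracks the actual one.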
Marco Brambilla, director of engineering at Synapse Design, pointed out that the obvious difference is that asynchronous logic is not using a clock. “We all know current digital circuits rely on a global clock distribution, where the entire chip is (in theory) isochronous. However, a huge amount of power is ‘wasted’ distributing this signal. From the implementation point of view, clockless circuits are still implemented via standard cells, but it is a completely new class of cells.”
Another way of putting it is that synchronous logic works by capturing the results of combinational logic into storage on the same ‘heartbeat’ cycle. “This approach has the great property of simplifying the timing analysis process down to a very uniform set of checks that can be performed statically (without stimulus patterns),” said Steve Carlson, group marketing director in Cadence’s Office of Chief Strategy. “Asynchronous logic enables the use of discrete, rather than common, storage triggering signals. This means that each computation can proceed at its own pace, but it also creates a multiplicity of data-trigger analysis computations that must be specified and performed. Almost all SoCs have a combination of synchronous and asynchronous behaviors. Be it on a very local (e.g., asynchronous reset on a flip-flop) or more global (such as GALS: globally asynchronous, locally synchronous) basis, just about every SoC is a combination. There are a few cases of a completely asynchronous approach to design in schemes such as self-timed logic.”
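The “uniform set of checks” Carlson describes boils down to verifying that every register-to-register path fits within one clock period. A toy static-timing sketch, with invented path names, delays and setup margin:

```python
# Minimal sketch of a static setup check: in a synchronous design,
# a path passes timing if its combinational delay plus the register
# setup time fits in one clock period. No stimulus patterns needed.

CLOCK_PERIOD_NS = 2.0
SETUP_NS = 0.1

paths = {               # hypothetical reg-to-reg path delays (ns)
    "alu_to_acc": 1.6,
    "pc_to_imem": 1.9,
    "dec_to_rf":  2.3,  # this one violates timing
}

def setup_slack(delay_ns: float) -> float:
    """Slack = period - (path delay + setup). Negative means violation."""
    return round(CLOCK_PERIOD_NS - (delay_ns + SETUP_NS), 3)

violations = {p: setup_slack(d) for p, d in paths.items() if setup_slack(d) < 0}
print(violations)  # {'dec_to_rf': -0.4}
```

In an asynchronous design there is no single period to check against, which is why Carlson notes that each data-trigger pair needs its own analysis.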
Another claim about asynchronous logic is that if the system is going to be built out of lots of different communicating things and they are naturally asynchronous to each other, maybe using asynchronous logic would be a more natural way to deal with the communication between these elements, Wingard said. “GALS tries to capture that same concept but allows the implementer to choose whether or not the actual implementation is done using mostly synchronous logic with asynchronous crossing between clock domains, which is by far the most dominant implementation scheme—or whether you actually use a truly asynchronous communications scheme with handshake signals, and all this kind of stuff in between.”
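The handshake-based communication Wingard mentions is often a four-phase request/acknowledge protocol. Below is a minimal sketch that walks the protocol steps sequentially in software rather than modeling real hardware; the function names and channel structure are invented for illustration:

```python
# Four-phase (return-to-zero) handshake: req up, ack up, req down,
# ack down. Data is valid while req is asserted. No clock is involved;
# each side simply reacts to the other's wire transitions.

def four_phase_transfer(data, channel):
    """Sender side: drive data, raise req, wait for ack, then return
    both wires to idle before the next transfer can begin."""
    channel["data"] = data
    channel["req"] = 1          # 1. sender asserts request
    receiver_ack(channel)       # 2. receiver latches data, asserts ack
    channel["req"] = 0          # 3. sender drops request
    receiver_idle(channel)      # 4. receiver drops acknowledge
    return channel["received"]

def receiver_ack(channel):
    assert channel["req"] == 1
    channel["received"] = channel["data"]  # data is valid while req=1
    channel["ack"] = 1

def receiver_idle(channel):
    assert channel["req"] == 0
    channel["ack"] = 0

ch = {"req": 0, "ack": 0, "data": None, "received": None}
print(four_phase_transfer(0xAB, ch))   # 171
```

A two-phase (transition-signaling) variant halves the wire activity per transfer; the four-phase form is shown here because it is the easier one to follow.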
Murphy agreed. “GALS is actually a pretty reasonable idea of where you say, ‘I’m trying to balance the clock tree over this giant design and it’s becoming increasingly impossible, so what I’m going to do is have chiplets in my design. Within a chiplet, I will balance a clock tree, but between chiplets I’m going to assume I’m effectively asynchronous.’ That means there are a variety of things I can do. I can still have a global clock tree, but I don’t assume synchronization between clocks in different chiplets. Or I can actually send my clocks from one chiplet to another and then I effectively recover the clock on the other side. So, there are communication protocols where you don’t just send data, you also send the clock, and you recover the clock and reconstruct the data. Alternatively, you can go for truly self-timed logic. You don’t want a clock at all — no clocks whatever within at least maybe an IP. To do that you have to do something different; the most common way people approach it is to do a dual rail, which is for every bit that you would have normally in whatever the function is, you now have two bits and the two bits are encoded for 0 and 1 (normal logic states) and ‘not-ready’ to manage self-timing.”
Using this approach will double the size of the design, but it could increase performance by a factor of two or more while significantly reducing power.
Caltech in the 1980s built a version of the MIPS processor using this technique and demonstrated it to be 2 to 2.5x faster than the synchronous version. “If area is not a huge consideration, then it can really make a huge difference, but of course you are handcrafting the logic and you have to handcraft the layout as well,” he said.
While this is not apt to be embraced by mainstream designs, Murphy suggested something could be handcrafted in specialized IP, such as encryption IP implementing the AES algorithm. That would entail a fairly regular structure with a handcrafted layout, which might be economically feasible if the market is large enough. “It would certainly be bigger, but it would potentially run much faster if you’re putting encryption in a lot of datapath channels. And it has the side benefit that, because you don’t have a clock, you don’t have that regular structure in power and EM signals, so it’s very difficult to do side channel analysis on it.”
Wingard observed that versions of asynchronous techniques are becoming more attractive in very specialized areas. “For instance, it’s normal that the on-chip network has to span lots of clock and power domains. That’s a natural thing that we do because we tend to be at the top level of the chip, so the Wi-Fi interface or the display interface or the USB or the PCI-Express interface are all connecting to each other through us. We have to resolve the boundaries there. Because we also have the requirement that the network wants to span the long distances inside the chip to get to the different ends, there are good reasons to consider the use of some of these asynchronous techniques as a way of both crossing clock domain boundaries while being able to handle long distances, and therefore large wire delays.”
“Traditionally,” he continued, “you tend to see [asynchronous logic] promoted the most in ultra-low-power kinds of things, so one could imagine a wireless sensor node for this fabulous Internet of Things trying to live off energy it was harvesting from the environment. So there’s a place where you would absolutely value anything that could save a little bit of energy. There are good reasons to believe that during transition times, from no activity through very low activity into moderate activity, asynchronous logic is probably more stable with respect to being as close to optimum as you can get from an energy perspective.”
Carlson agreed that communication protocols inherently deal with asynchronous behaviors. “A desirable property of asynchronous design is that it tends to reduce peak current and power. Rather than having nearly all elements switch on the active edge of a synchronizing clock, the transitions tend to be spread more evenly over time. This puts less strain on the power delivery network and helps with device reliability. There is potentially much less ‘wasted’ computation in an asynchronous design. Clock gating has been used successfully to mitigate the gap for synchronous approaches, but as more architectural innovation takes place, the energy efficiency of fully asynchronous operation is gaining interest.”
Paul Cunningham, vice president of R&D at Cadence, added that asynchronous design is a good fit for moving data. This includes FIFOs (first in, first out) and switching fabrics, for example. He also noted that the flip-flop is itself a very small asynchronous circuit, and there has been some success in security-sensitive applications such as smartcards because an asynchronous circuit is more difficult to hack.
Finally, Synapse’s Brambilla observed that proponents of clockless logic claim such circuits are less power-hungry. He has seen many contrasting claims but doesn’t have enough data to form a firm opinion.
However, there are definite advantages in clockless logic, he stressed. First, the activation power is much more distributed. Synchronous logic draws enormous current peaks when the clock toggles. “A large networking ASIC will require several amperes of current during the clock toggling phases. A clockless circuit probably distributes all the toggling more uniformly, so that even if the total power should be the same, the instantaneous power will be closer to the average number, thus putting much less stress on the power distribution—less IR drop and less inductive losses due to the spikes,” Brambilla said.
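Brambilla’s point about instantaneous versus average current can be illustrated with invented numbers: the same total switching charge, concentrated just after a clock edge versus spread across the cycle:

```python
# Toy numeric illustration (all numbers invented): 1000 gate toggles
# per "cycle", each drawing 1 mA for one time slot. The synchronous
# design fires them all in the slot after the clock edge; the clockless
# design spreads the same toggles over 10 slots.

TOGGLES = 1000
SLOTS = 10
MA_PER_TOGGLE = 1

sync_profile = [TOGGLES * MA_PER_TOGGLE] + [0] * (SLOTS - 1)
async_profile = [TOGGLES // SLOTS * MA_PER_TOGGLE] * SLOTS

# Same total charge is delivered either way...
assert sum(sync_profile) == sum(async_profile) == 1000

# ...but the peak demand on the power grid differs by 10x,
# which is what drives IR drop and inductive spikes.
print(max(sync_profile), max(async_profile))   # 1000 100
```

The total energy is identical in both profiles; only the peak, and therefore the stress on the power delivery network, changes.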
As noted above, the absence of clocks also means much lower EMI. “In fact, one claim of the clockless companies is that, thanks to the much less noisy power distribution, it is much more difficult to perform EMI signature attacks, which is a huge benefit for encryption/security applications. The lack of a precise EMI signature makes it much harder for an attacker to detect what computations the chip is performing just by analyzing power consumption and EM radiation,” Brambilla explained.
Further, clockless circuits are intrinsically self-timed, which makes them much more tolerant of lower voltages, voltage instabilities and even process variations. Because of this, he said, several benefits can be expected, including higher yields; lower-voltage operation (beneficial to power consumption); trivial DVFS, if needed; and resilience to crypto attacks based on lowering VDD to force errors. “Since the circuit slows down, but does not introduce errors, the type of attacks performed by lowering VDD becomes ineffective,” he added.
For the right applications, an asynchronous approach does make sense. But finding those applications where the tradeoffs can be justified requires careful understanding of both the technical requirements and business dynamics that surround the application.