AI Begins To Reshape Chip Design

Technology adds more granularity, but starting point for design shifts as architectures cope with greater volumes of data.


Artificial intelligence is starting to impact semiconductor design as architects leverage its capabilities to improve performance and reduce power, setting the stage for a number of foundational shifts in how chips are developed, manufactured and updated in the future.

AI, along with its machine learning and deep learning subsets, can be used to greatly improve the control and the power/performance of specific functions within chips. For those purposes it can be layered on top of existing devices or incorporated into new designs, allowing it to be applied across a wide swath of functions or targeted at a very narrow one.

There are a number of benefits AI provides. Among them:

  • It adds finer granularity for improving performance and reducing power, by varying the accuracy of specific functions through sparser algorithms or data compression.
  • It provides the ability to process data as patterns rather than individual bits, effectively raising the abstraction level for computing and increasing the density of the software.
  • It allows processing and memory reads/writes to be done as a matrix, greatly speeding up those operations.

But AI also requires a significant rethinking of how data moves—or doesn’t move—across a chip or between chips. Regardless of whether it is applied at the edge or in the data center, or whether it involves training or inferencing, the amount of data being processed and stored can be enormous.

New starting points
On the plus side, AI provides a way to balance highly precise results against using more elements with less precision to achieve good-enough accuracy. In voice recognition, for example, precision is far less critical than it is for facial recognition in a security application or object recognition in an autonomous vehicle. What AI brings to the table is the ability to dial in those results as needed for a particular application.

With AI, the starting point is less about hardware and software than about the quality, quantity and movement of data. That requires a different way of looking at designs, including collaboration between groups that typically haven’t worked together in the past.

“Compute is really cheap and compressing/decompressing data is cheap, but storing and loading data in memory is not,” said Jem Davies, an Arm fellow. “To build these systems you need domain-specific experts, machine learning experts and optimization and performance experts. And you need all three of those domains.”

He noted that machine learning can affect everything in a system, much of it hidden from view. “Some is invisible to the user,” said Davies. “It’s being used to improve battery life. There’s also machine learning inside a camera.”

AI works best with neuromorphic approaches and different memory architectures, where data can be approached as a matrix. Making that work optimally requires architecting well beyond the processor. It requires throughput of large quantities of data back and forth to memory, and it requires changes in memory so that data can be written and read left-to-right and up-and-down.

“A lot of the architectural improvements are a combination of software and hardware that is able to handle the software better,” said Gerard Andrews, product marketing director for audio and voice IP at Cadence. “This doesn’t necessarily improve the overall performance of the individual processors, but it does add power and memory efficiency. If you can make a smaller bit, you can cut memory size by half.”

That, in effect, allows higher density in the designs from the software side, and it speeds the movement of data in and out of memory. “The problem we’re seeing is memory does not shrink efficiently and word recognition error rates are going up,” said Andrews. “We are all in on exploring sparsity of algorithms to lower power and improve performance.”
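One common way to exploit sparsity is magnitude pruning: zero out the smallest weights so the hardware can skip those multiplies and the memory traffic that goes with them, then check how much the output changes. The sketch below is a generic illustration of that technique, not Cadence's method; the 256x256 layer, the random weights and input, and the 50% and 90% sparsity targets are all assumptions chosen for demonstration.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(sparsity * weights.size)  # number of weights to drop
    if k == 0:
        return weights.copy()
    # The magnitude of the k-th smallest weight becomes the cutoff.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) > threshold, weights, 0.0)

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256))  # hypothetical layer weights
x = rng.standard_normal(256)         # hypothetical input vector
reference = w @ x                    # dense result used as the baseline

for sparsity in (0.5, 0.9):
    w_sparse = magnitude_prune(w, sparsity)
    zeros = 1.0 - np.count_nonzero(w_sparse) / w_sparse.size
    rel_err = np.linalg.norm(w_sparse @ x - reference) / np.linalg.norm(reference)
    print(f"{zeros:.0%} zeros -> relative output error {rel_err:.3f}")
```

Higher sparsity removes more work and memory traffic but increases the output error. In practice, pruned networks are usually retrained to recover much of the lost accuracy.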

This is just scratching the surface of what’s changing, and those shifts are occurring quickly.

“What’s happening in the memory subsystem is a discontinuous and sudden change,” said Kent Orthner, systems architect at Achronix. “It’s all about latency and bandwidth and how to feed the beast off-chip and on-chip. There are a lot of architectures being developed about how to move data around because you need massive amounts of data pipes. Before this, it was about how much RAM you could add and how deep you could go in memory. Now, it’s huge pipes to relatively shallow use of memory.”

One of the new approaches being explored to reduce the flow of data is spiking neural networks. Rather than firing signals continuously, these networks spike, much like signals in the human brain.

“Spiking neural networks are the next generation of neural nets,” said Bob Beachler, senior vice president of marketing and business development at BrainChip. “Convolutions use linear algebra. With spiking, the data is fed in the form of spikes. You can train by spikes, and if there are a lot of spikes, you can reinforce a few of them or inhibit that. And for bits dedicated to training thresholds, you can do that with very low weight values.”
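To make the idea of spikes more concrete, here is a minimal leaky integrate-and-fire neuron, one of the simplest textbook spiking models. It is a generic sketch, not BrainChip's implementation, and the weight, leak factor, threshold and input spike train are arbitrary assumptions. The neuron accumulates weighted input spikes, loses a fraction of its membrane potential every time step, and emits an output spike (then resets) only when the potential crosses the threshold, so downstream work happens only when something fires.

```python
from typing import List

def lif_neuron(input_spikes: List[int], weight: float = 0.6,
               leak: float = 0.9, threshold: float = 1.0) -> List[int]:
    """Simulate one leaky integrate-and-fire neuron over discrete time steps.

    input_spikes holds 0 or 1 per time step; returns a 0/1 output spike train.
    """
    potential = 0.0
    output = []
    for spike in input_spikes:
        potential = potential * leak + weight * spike  # leak, then integrate
        if potential >= threshold:
            output.append(1)
            potential = 0.0  # reset after firing
        else:
            output.append(0)
    return output

inputs = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0]
print(lif_neuron(inputs))
```

With these parameters, six input spikes produce only two output spikes: closely spaced inputs push the potential over the threshold, while an isolated input mostly leaks away before the next one arrives.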

All told, there are an estimated 70 AI startups working on various approaches or pieces of approaches. On top of that, virtually all of the major chipmakers, IP vendors and tool companies have a hand in some aspect of AI.


Fig. 1: Data compression. Source: Google

AI risks and confusion
But there also is a level of risk associated with AI, depending upon the application and the level of precision. The design of electronic systems in the past has been based upon the complete predictability of logic, much of which has been hard-wired. AI replaces computational precision with distributions of acceptable behavior, and there is a lot of discussion at conferences about what that means for design sign-off. It’s not clear whether existing tools or methodologies will provide the same level of confidence that a device will fall within that distribution, particularly if a system is damaged or degrades over time, or how quickly any aberrant behavior can be detected.

There is a level of confusion about how to apply AI, as well. There are chips designed specifically for AI, chips that are used for AI that were not specifically developed for that purpose, and modifications and overlays on both of those to utilize AI more effectively.

Collectively, this fits under the heading of AI, and it’s set within the context of an industry-wide race to improve performance at the same or lower power. With Moore’s Law scaling dropping off to 20% improvements in power and performance for each node after 16/14nm, everyone is looking at new approaches to replace or supplement those benefits. There is a swarm of options on multiple fronts.
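A back-of-the-envelope calculation shows how slowly those per-node gains add up. The sketch below treats a 20% improvement per node as a compounding 1.2X factor (a simplification that lumps power and performance into one number) and counts how many node transitions it would take to reach a 100X improvement, a target chosen here purely for comparison.

```python
PER_NODE_GAIN = 1.20  # assumption: 20% per node, treated as compounding
TARGET_GAIN = 100.0   # assumption: a 100X improvement, used only as a yardstick

def nodes_needed(per_node_gain: float, target_gain: float) -> int:
    """Smallest whole number of nodes n such that per_node_gain ** n >= target_gain."""
    if per_node_gain <= 1.0:
        raise ValueError("per-node gain must exceed 1.0")
    n = 0
    gain = 1.0
    while gain < target_gain:
        gain *= per_node_gain
        n += 1
    return n

print(nodes_needed(PER_NODE_GAIN, TARGET_GAIN))  # 26
```

Roughly two dozen node transitions would be needed to match what a different architecture might deliver in one step, which helps explain why so much effort is going into architectural alternatives.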

For chips targeted at AI training or inferencing, as well as for processors and accelerators within chips that leverage AI’s capabilities, the general consensus is that improvements of several orders of magnitude are possible using different chip architectures. But that doesn’t work for everything, and a number of variables, such as the size and value of the training data, can render AI useless for some applications. In others, a 100X improvement in performance is considered conservative.

This is why it is taking so long to bring some of these new architectures to market. There is a massive amount of architectural exploration and experimentation underway as the chip industry begins to document what works best where and why.

“There are challenges with the applications and the algorithms, and there are challenges with the chip with processors and memories,” said Ron Lowman, strategic marketing manager at Synopsys. “This makes exploration more important with AI architectures, and it’s one of the reasons CCIX (Cache Coherent Interconnect for Accelerators) is becoming so popular. A lot more customers are looking at exploration of architectures. Everyone is trying to build new architectures to mimic the brain.”

This is more than just better routing and floor-planning. There are new non-volatile memory technologies being developed. There also is a push toward smaller processors located next to smaller memories, sometimes tied to a variety of new accelerators that are customized for different data types. Alongside all of that is a big effort around data compression and quantization.

“There’s work underway to move from 32-bit floating point to 8-bit floating point,” said Lowman. “Now the question is whether you can get down to single-bit quantization.”

Quantization involves mapping a large set of input values to a smaller set of output values, and the big concern is how much loss of precision is acceptable. With enough sensors or data inputs, the impact of that error rate theoretically can be minimized, but that is very much application-dependent.
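The sketch below illustrates that trade-off, along with the memory savings that come from smaller bit widths. It uses symmetric uniform quantization with one scale factor per tensor, which is one common scheme assumed here rather than one the article specifies. It maps 32-bit floating-point values to 8-bit integers, and separately to a single bit per value (the sign, scaled by the mean magnitude, as in binarized networks), then reports storage size and RMS error for each. The random data is a stand-in for real weights.

```python
import numpy as np

def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization: map floats onto integers in [-127, 127]."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 codes back to approximate float32 values."""
    return q.astype(np.float32) * np.float32(scale)

def binarize(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Single-bit quantization: one packed sign bit per value plus one shared scale."""
    alpha = float(np.mean(np.abs(x)))
    return np.packbits(x >= 0), alpha

def debinarize(bits: np.ndarray, alpha: float, n: int) -> np.ndarray:
    """Expand packed sign bits back to +alpha / -alpha values."""
    signs = np.unpackbits(bits)[:n].astype(np.float32) * 2.0 - 1.0  # 1 -> +1, 0 -> -1
    return signs * np.float32(alpha)

def rms_error(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.sqrt(np.mean((a.astype(np.float64) - b) ** 2)))

rng = np.random.default_rng(1)
x = rng.standard_normal(10_000).astype(np.float32)  # stand-in for 32-bit weights

q, scale = quantize_int8(x)
bits, alpha = binarize(x)
x_int8 = dequantize_int8(q, scale)
x_1bit = debinarize(bits, alpha, x.size)

print(f"float32: {x.nbytes:>6} bytes")
print(f"int8:    {q.nbytes:>6} bytes, RMS error {rms_error(x, x_int8):.4f}")
print(f"1-bit:   {bits.nbytes:>6} bytes, RMS error {rms_error(x, x_1bit):.4f}")
```

Each halving of bit width halves the storage (the byte counts above exclude the single scale value kept per tensor), while the error grows sharply at one bit. Whether that error is acceptable depends on the application.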

Another approach along those lines involves source synchronization, particularly for AI chips in the data center, and this is prompting changes in on-chip network topologies. Rather than broadcasting, where every target in the network receives the same data, a multicast approach delivers that data only to the targets that need it.

“With multicast you can do one write to many destinations,” said Kurt Shuler, vice president of marketing at Arteris IP. “It’s usually used for weights. The benefit is you have better utilization of the available network on chip bandwidth. So basically you’re putting fewer cars on the road.”
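To see why a single multicast write puts fewer cars on the road, the sketch below counts link traversals on a hypothetical 4x4 mesh network-on-chip that uses dimension-ordered (XY) routing. Sending the same weights to several destinations as separate unicast packets pays for every path in full, while a multicast that is replicated only where paths diverge pays once for each distinct link. The mesh, routing scheme and destination set are illustrative assumptions, not a description of Arteris IP's products.

```python
from typing import List, Set, Tuple

Node = Tuple[int, int]
Link = Tuple[Node, Node]

def xy_route(src: Node, dst: Node) -> List[Link]:
    """Links traversed by dimension-ordered routing: move along X first, then Y."""
    links = []
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links

def unicast_cost(src: Node, dsts: List[Node]) -> int:
    """One separate packet per destination: every path is paid in full."""
    return sum(len(xy_route(src, d)) for d in dsts)

def multicast_cost(src: Node, dsts: List[Node]) -> int:
    """One packet copied where paths diverge: each shared link is used once."""
    used: Set[Link] = set()
    for d in dsts:
        used.update(xy_route(src, d))
    return len(used)

source = (0, 0)
destinations = [(3, 0), (3, 1), (3, 2), (3, 3)]  # e.g., a column of compute tiles
print("unicast link traversals:  ", unicast_cost(source, destinations))
print("multicast link traversals:", multicast_cost(source, destinations))
```

In this example, unicast needs 18 link traversals and multicast needs 6, because the three links along the bottom row and the links up the far column are shared by every destination's path.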

One of the problems with AI chips is they tend to be very large. “The big issue is the clock tree,” said Shuler. “This requires synchronous communication, because if you deal with communication asynchronously, that takes a lot of area. Plus, you’re more likely to have routing congestion on a large chip. The way around that is to create virtual channel links, where you decrease the number of wires and share communications over a set of wires. That requires arbitration to match the data flow.”
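Sharing one set of wires among several virtual channels means something has to decide which channel transmits on each cycle. A round-robin arbiter is one simple, fair policy for that. The sketch below is a generic cycle-level illustration with hypothetical channel names and queue contents, not a model of any vendor's interconnect.

```python
from collections import deque
from typing import Dict, List, Tuple

def round_robin_schedule(queues: Dict[str, deque]) -> List[Tuple[int, str, str]]:
    """Drain per-channel queues over one shared link, one flit per cycle.

    Each cycle, the grant goes to the next channel in round-robin order
    (starting after the last winner) that has data waiting.
    Returns (cycle, channel, flit) tuples in transmission order.
    """
    channels = list(queues)
    last = len(channels) - 1  # so the first channel gets priority on cycle 0
    schedule = []
    cycle = 0
    while any(queues[c] for c in channels):
        for offset in range(1, len(channels) + 1):
            idx = (last + offset) % len(channels)
            ch = channels[idx]
            if queues[ch]:
                schedule.append((cycle, ch, queues[ch].popleft()))
                last = idx
                break
        cycle += 1
    return schedule

vcs = {
    "vc0_weights": deque(["w0", "w1", "w2"]),
    "vc1_activations": deque(["a0"]),
    "vc2_control": deque(["c0", "c1"]),
}
for cycle, ch, flit in round_robin_schedule(vcs):
    print(cycle, ch, flit)
```

The channels take turns on the shared wires, and an empty channel is simply skipped, so no single traffic class can starve the others. Real arbiters may add priorities or weights on top of this basic scheme.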


Fig. 2: Mapping ports on a chip. Source: Arteris IP

Planning for obsolescence
That’s one piece of the design. Another piece involves the ability to keep current with algorithms, which are being regularly updated, and that affects what kinds of processors are added into chips that utilize AI. Each of those has an impact on the movement of data within a chip and the type of processors used for that data.

CPUs and GPUs offer some programmability, primarily through software. DSPs and FPGAs offer programmability in firmware/hardware. And embedded FPGAs add that programmability directly into an SoC or multi-chip package.

Choosing the types of processors also is dictated by end market applications. For a safety-critical application in a car or industrial setting, for example, it’s expected that technology will stay current and responsive enough to be compatible with other vehicles on the road or other equipment in a factory.

“When we discuss future-proofness, it’s not a question of whether it works or doesn’t work,” said Carlos Macián, senior director of innovation at eSilicon. “A TPU (tensor processing unit), which is a trailblazer, showed that orders of magnitude improvements in performance can be achieved. But for a new workload, if the ASIC is not optimized, you may only get 3X improvement.”

That’s assuming the data is clean and useful. And this is where things can get really complicated.

“AI works well for unstructured data sets,” said Macián. “If you tag people appearing in Facebook, you know that is well suited for AI. But it’s not organized or structured data. So AI is by nature inaccurate, and sometimes it is wrong.”

Not everything needs to be future-proofed. In some markets such as mobile phones, consumers expect to replace devices every few years. In others, electronics are expected to remain fully functional for as long as two decades.

Improving the quality of the data helps, which partly explains why algorithms change so quickly and why field upgradeability is considered essential for some devices. But those changes also can have an impact on performance, and there’s no way to account for them without adding some programmability into the hardware. The question is how much programmability, because programmable logic is significantly slower than hardware that has been tuned to the software.

Conclusion
Unlike many other growth markets for semiconductors, AI is a horizontal technology. It can be applied to a variety of vertical markets, and it can be used in developing the chips for those markets. It also can be used to make existing chips more efficient.

This is just the beginning of the AI revolution, and the impact is already significant. As design teams become better versed in this technology, it will have a big effect on how they design chips and how those chips interact with other chips, and it will create new opportunities for developers of tools, hardware and software, possibly opening up entirely new markets.
