What Does An AI Chip Look Like?

As the market for artificial intelligence heats up, so does confusion about how to build these systems.


Depending on your point of reference, artificial intelligence will either be the next big thing or it will play a major role in all of the next big things.

This explains the frenzy of activity in this sector over the past 18 months. Big companies are paying billions of dollars to acquire startup companies, and even more for R&D. In addition, governments around the globe are pouring additional billions into universities and research houses. A global race is underway to create the best architectures and systems to handle the huge volumes of data that need to be processed to make AI work.

Market projections are rising accordingly. Annual AI revenues are predicted to reach $36.8 billion by 2025, according to Tractica. The research house says it has identified 27 different industry segments and 191 use cases for AI so far.

Fig. 1: AI revenue growth projection. Source: Tractica

But dig deeper and it quickly becomes apparent there is no single best way to tackle AI. In fact, there isn’t even a consistent definition of what AI is or the data types that will need to be analyzed.

“There are three problems that need to be addressed here,” said Raik Brinkmann, president and CEO of OneSpin Solutions. “The first is that you need to deal with a huge amount of data. The second is to build an interconnect for parallel processing. And the third is power, which is a direct result of the amount of data that you have to move around. So you really need to move from a von Neumann architecture to a data flow architecture. But what exactly does that look like?”

So far there are few answers, which is why the first chips in this market include various combinations of off-the-shelf CPUs, GPUs, FPGAs and DSPs. While new designs are under development by companies such as Intel, Google, Nvidia, Qualcomm and IBM, it’s not clear whose approach will win. It appears that at least one CPU always will be required to control these systems, but as streaming data is parallelized, co-processors of various types will be required.

Much of the processing in AI involves matrix multiplication and addition. Large numbers of GPUs working in parallel offer an inexpensive approach, but the penalty is higher power. FPGAs with built-in DSP blocks and local memory are more energy efficient, but they generally are more expensive. This also is a segment where software and hardware really need to be co-developed, but much of the software is far behind the hardware.
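The matrix multiplication and addition described above is, at its core, the operation behind a fully connected neural network layer. A minimal sketch (all shapes and values are illustrative):

```python
import numpy as np

# A single fully connected layer: the dominant AI workload is just
# matrix multiplication followed by addition, repeated at huge scale.
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 128))   # batch of 32 input vectors
W = rng.standard_normal((128, 64))   # learned weight matrix
b = rng.standard_normal(64)          # learned bias vector

y = x @ W + b                        # multiply, then add
print(y.shape)                       # (32, 64)
```

GPUs, FPGA DSP blocks, and dedicated accelerators all compete on how efficiently they execute exactly this pattern at scale.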

“There is an enormous amount of activity in research and educational institutions right now,” said Wally Rhines, chairman and CEO of Mentor Graphics. “There is a new processor development race. There are also standard GPUs being used for deep learning, and at the same time there are a whole bunch of people doing work with CPUs. The goal is to make neural networks behave more like the human brain, which will stimulate a whole new wave of design.”

Vision processing has received most of the attention when it comes to AI, largely because Tesla introduced semi-autonomous driving features well ahead of the expected rollout of fully autonomous vehicles. That has opened a huge market for this technology, and for the chip and overall system architectures needed to process data collected by image sensors, radar and LiDAR. But many economists and consulting firms are looking beyond this market to how AI will affect overall productivity. A recent report from Accenture predicts that AI will more than double annual economic growth rates for some countries (see Fig. 2 below). While that is expected to cause significant disruption in jobs, the overall revenue improvement is too big to ignore.

Fig. 2: AI’s projected impact.

Aart de Geus, chairman and co-CEO of Synopsys, points to three waves of electronics—computation and networking, mobility, and digital intelligence. In the latter category, the focus shifts from the technology itself to what it can do for people.

“You’ll see processors with neural networking IP for facial recognition and vision processing in automobiles,” said de Geus. “Machine learning is the other side of this. There is a massive push for more capabilities, and the state of the art is doing this faster. This will drive development to 7nm and 5nm and beyond.”

Current approaches
Vision processing in self-driving dominates much of the current research in AI, but the technology also has a growing role in drones and robotics.

“For AI applications in imaging, the computational complexity is high,” said Robert Blake, president and CEO of Achronix. “With wireless, the mathematics is well understood. With image processing, it’s like the Wild West. It’s a very varied workload. It will take 5 to 10 years before that market shakes out, but there certainly will be a big role for programmable logic because of the need for variable precision arithmetic that can be done in a highly parallel fashion.”

FPGAs are very good at matrix multiplication. On top of that, programmability adds some necessary flexibility and future-proofing into designs, because at this point it is not clear where the so-called intelligence will reside in a design. Some of the data used to make decisions will be processed locally, some will be processed in data centers. But the percentage of each could change for each implementation.

That has a big impact on AI chip and software design. While the big picture for AI hasn’t changed much—most of what is labeled AI is closer to machine learning than true AI—the understanding of how to build these systems has changed significantly.

“With cars, what people are doing is taking existing stuff and putting it together,” said Kurt Shuler, vice president of marketing at Arteris. “For a really efficient embedded system to be able to learn, though, it needs a highly efficient hardware system. There are a few different approaches being used for that. If you look at vision processing, what you’re doing is trying to figure out what it is that a device is seeing and what you infer from that. That could include data from vision sensors, LiDAR and radar, and then you apply specialized algorithms. A lot of what is going on here is trying to mimic what’s going on in the brain using deep and convolutional neural networks.”
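The convolutional networks Shuler mentions are built on one small operation: sliding a filter across an image and accumulating dot products. A toy sketch (the filter and image here are made up for illustration):

```python
import numpy as np

# Naive 2D convolution: the core operation of a CNN layer.
def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Dot product of the kernel with one image patch
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

edge = np.array([[1.0, -1.0]])        # toy vertical-edge filter
img = np.tile([1.0, 0.0], (4, 2))     # 4x4 image with vertical stripes
print(conv2d(img, edge).shape)        # (4, 3)
```

Real vision chips execute millions of these patch-by-patch multiply-accumulates per frame, which is why hardware efficiency dominates the discussion.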

Where this differs from true artificial intelligence is that the current state of the art is being able to detect and avoid objects, while true artificial intelligence would add a level of reasoning, such as how to get through a throng of people crossing a street, or whether a child chasing a ball is likely to run into the street. In the former, judgments are based on input from a variety of sensors, massive data crunching and pre-programmed behavior. In the latter, machines would be able to make value judgments, such as weighing the many possible consequences of swerving to avoid the child—and deciding which is the best choice.

“Sensor fusion is an idea that comes out of aircraft in the 1990s,” said Shuler. “You get it into a common data format where a machine can crunch it. If you’re in the military, you’re worried about someone shooting at you. In a car, it’s about someone pushing a stroller in front of you. All of these systems need extremely high bandwidth, and all of them have to have safety built into them. And on top of that, you have to protect the data because security is becoming a bigger and bigger issue. So what you need is both computational efficiency and programming efficiency.”

This is what is missing in many of the designs today because so much of the development is built with off-the-shelf parts.

“If you optimize the network, optimize the problem, minimize the number of bits and utilize hardware customized for a convolutional neural network, you can achieve a two to three order of magnitude improvement in power,” said Samer Hijazi, senior architect at Cadence and director of the company’s Deep Learning Group. “The efficiency comes from software algorithms and hardware IP.”
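“Minimizing the number of bits” typically means quantization: storing weights in a few integer bits instead of 32-bit floats. A minimal sketch of symmetric 8-bit quantization; the function names and scheme are illustrative, not from any vendor toolkit:

```python
import numpy as np

# Symmetric linear quantization: map float weights onto int8,
# keeping one float scale factor per tensor.
def quantize_int8(w):
    scale = np.abs(w).max() / 127.0                    # largest weight -> 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).standard_normal(1000).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()
print(q.dtype, err)   # int8 storage; worst-case error under half a step
```

Storage drops 4x versus float32, and integer multiply-accumulate units are far cheaper in silicon than floating-point ones, which is where much of Hijazi’s power saving comes from.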

Google is attempting to alter that formula. The company has developed Tensor Processing Units (TPUs), ASICs created specifically for machine learning. And in an effort to speed up AI development, the company open-sourced its TensorFlow software in 2015.

Fig. 3: Google’s TPU board. Source: Google.

Others have their own platforms. But none of these is expected to be the final product. This is an evolution, and no one is quite sure how AI will evolve over the next decade. That’s partly due to the fact that use cases are still being discovered for this technology. And what works in one area, such as vision processing, is not necessarily good for another application, such as determining whether an odor is dangerous or benign, or possibly a combination of both.

“We’re shooting in the dark,” said Anush Mohandass, vice president of marketing and business development at NetSpeed Systems. “We know how to do machine learning and AI, but how they actually work and converge is unknown at this point. The current approach is to have lots of compute power and different kinds of compute engines—CPUs, DSPs for neural networking types of applications—and you need to make sure it works. But that’s just the first generation of AI. The focus is on compute power and heterogeneity.”

That is expected to change, however, as the problems being solved become more targeted. Just as with the early versions of IoT devices, no one quite knew how various markets would evolve, so systems companies threw in everything and rushed products to market using existing chip technology. In the case of smart watches, the result was a battery that lasted only several hours between charges. As new chips are developed for specific applications, power and performance are balanced through a combination of more targeted functionality, more intelligent distribution of processing between a local device and the cloud, and a better understanding of where the bottlenecks are in a design.

“The challenge is to find the bottlenecks and constraints you didn’t know about,” said Bill Neifert, director of models technology at ARM. “But depending on the workload, the processor may interact differently with the software, which is almost inherently a parallel application. So if you’re looking at a workload like financial modeling or weather mapping, the way each of those stresses the underlying system is different. And you can only understand that by probing inside.”

He noted that the problems being solved on the software side need to be looked at from a higher level of abstraction, because it makes them easier to constrain and fix. That’s one key piece of the puzzle. As AI makes inroads into more markets, all of this technology will need to evolve to achieve the same kinds of efficiencies that the tech industry in general, and the semiconductor industry in particular, have demonstrated in the past.

“Right now we find architectures are struggling if they only handle one type of computing well,” said Mohandass. “But the downside with heterogeneity is that the whole divide and conquer approach falls apart. As a result, the solution typically involves over-provisioning or under-provisioning.”

New approaches
As more use cases are established for AI beyond autonomous vehicles, adoption will expand.

This is why Intel bought Nervana last August. Nervana develops 2.5D deep learning chips that utilize a high-performance processor core, moving data across an interposer to high-bandwidth memory. The stated goal is a 100X reduction in time to train a deep learning model as compared with GPU-based solutions.

Fig. 4: Nervana AI chip. Source: Nervana

“These are going to look a lot like high-performance computing chips, which are basically 2.5D chips using a silicon interposer,” said Mike Gianfagna, vice president of marketing at eSilicon. “You will need massive throughput and ultra-high-bandwidth memory. We’ve seen some companies looking at this, but not dozens yet. It’s still a little early. And when you’re talking about implementing machine learning and adaptive algorithms, and how you integrate those with sensors and the information stream, this is extremely complex. If you look at a car, you’re streaming data from multiple disparate sources and adding adaptive algorithms for collision avoidance.”

He said there are two challenges to solve with these devices. One is reliability and certification. The other is security.

With AI, reliability needs to be considered at a system level, which includes both hardware and software. ARM’s acquisition of Allinea in December provided one reference point. Another comes out of Stanford University, where researchers are trying to quantify the impact of trimming computations from software. They have discovered that massive cutting, or pruning, doesn’t significantly impact the end product. University of California at Berkeley has been developing a similar approach based upon computing that is less than 100% accurate.

“Coarse-grain pruning doesn’t hurt accuracy compared with fine-grain pruning,” said Song Han, a Ph.D. candidate at Stanford University who is researching energy-efficient deep learning. Han said that a sparse matrix developed at Stanford required 10X less computation, had an 8X smaller memory footprint, and used 120X less energy than accessing DRAM. Applied to what Stanford is calling an Efficient Speech Recognition Engine, he said that compression led to accelerated inference. (Those findings were presented at Cadence’s recent Embedded Neural Network Summit.)

Quantum computing adds yet another option for AI systems. Leti CEO Marie Semeria said quantum computing is one of the future directions for her group, particularly for artificial intelligence applications. And Dario Gil, vice president of science and solutions at IBM Research, explained that with classical computing there is a one-in-four chance of guessing which of four cards is red if the other three are blue. Using a quantum computer to entangle superposed qubits and then reverse that entanglement, the system arrives at the correct answer every time.

Fig. 5: Quantum processor. Source: IBM.
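Gil’s four-card example is essentially Grover’s search over four states, which a classical state-vector simulation can sketch in a few lines (a toy model of the math, not of any IBM hardware):

```python
import numpy as np

# Grover's search over N=4 states: one oracle query finds the single
# "red card" with certainty, where a classical guess succeeds 1 time in 4.
def grover_four(marked):
    psi = np.full(4, 0.5)        # uniform superposition over 4 states
    psi[marked] *= -1            # oracle: flip the marked amplitude's sign
    psi = 2 * psi.mean() - psi   # diffusion: inversion about the mean
    return psi

probs = np.abs(grover_four(2)) ** 2
print(probs)                     # all probability lands on state 2
```

For N = 4 a single Grover iteration is exact; for larger search spaces the advantage scales as the square root of the number of states.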

Conclusions
AI is not one thing, and consequently there is no single system that works everywhere optimally. But there are some general requirements for AI systems, as shown in the chart below.

Fig. 6: AI basics. Source: OneSpin

And AI does have applications across many markets, all of which will require extensive refinement, expensive tooling, and an ecosystem of support. After years of relying on shrinking devices to improve power, performance and cost, entire market segments are rethinking how they will approach new markets. This is a big win for architects and it adds huge creative options for design teams, but it also will spur massive development along the way, from tools and IP vendors all the way to packaging and process development. It’s like hitting the restart button for the tech industry, and it should prove good for business for the entire ecosystem for years to come.

Related Stories
What Does AI Really Mean?
eSilicon’s chairman looks at technology advances, its limitations, and the social implications of artificial intelligence—and how it will change our world.
Neural Net Computing Explodes
Deep-pocket companies begin customizing this approach for specific applications—and spend huge amounts of money to acquire startups.
Plugging Holes In Machine Learning
Part 2: Short- and long-term solutions to make sure machines behave as expected.
Wearable AI System Can Detect A Conversation Tone (MIT)
An artificially intelligent, wearable system that can predict if a conversation is happy, sad, or neutral based on a person’s speech patterns and vitals.



  • John Terrence

    Two other approaches not mentioned here:

    1. The PULPino group doing BinaryConnect, with a two order of magnitude gain.

    2. Analog compute, via [a] or isocline/isosemi, with gains of 100x in cost and 10,000x in power.

    By the way, isosemi is also working on a general SIMD processor using analog for compute, and had a GPS chip at ISSCC with 66x lower power.

    [a] https://arxiv.org/abs/1610.02091