Machine learning and AI require more than just power and performance.
The idea that devices can learn optimal behavior rather than relying on more generalized hardware and software is driving a resurgence in artificial intelligence, machine learning, and cognitive computing. But architecting, building and testing these kinds of systems will require broad changes that ultimately could impact the entire semiconductor ecosystem.
Many of these changes are well understood. There is a need for higher performance per watt and per operation, because all of these developments will drive a huge increase in the amount of data that needs to be processed and stored. Other changes are less obvious, and will require a certain amount of guesswork. For example, what will chips look like after they have “learned” to optimize data in a real-world setting? The semiconductor industry is used to measuring reliability as a function of performance that degrades over time. In contrast, a well-designed adaptive learning system in theory should improve over time.
Part of this shift will be evolutionary, rolled out as technology progresses. Some will be closer to revolutionary, based upon the functioning of the human brain, which is remarkably more efficient than any technology yet developed. In both cases, the amount of research and testing being done in this field is exploding, particularly for such applications as robotics, data management and processing, industrial applications, and for vision systems in driver-assisted or fully autonomous vehicles.
“We’ve been having a lot of discussions lately about cognitive computing,” said Wally Rhines, chairman and CEO of Mentor Graphics. “When we’re born, all the cells in our brain are the same. Over time, those cells specialize into regions such as eyesight. The thinking is that if you start with identical (semiconductor) memory cells, you can specialize them with time. And based on the applications you expose the chip to, stored memory accumulates more data over time. The brain is largely about pattern recognition. What differentiates us from animals is predictive pattern recognition. That requires hierarchical memory and invariant memory. So you don’t store every pattern, but if you see a face in the shadows you can still recognize it. The human brain does this much more effectively than technology.”
In fact, the human brain is 12 orders of magnitude more energy efficient at this than the best computer architecture, Rhines said. But in terms of the density of memory and logic cells, semiconductor technology is getting closer. And increasingly this understanding of the human brain and invariant memory types—all using the same form, no matter how information is recalled—is being applied to technology in universities, research houses, and within large companies across a wide swath of markets.
Still, while there has been much research done on the algorithms that enable devices to learn, there is scant information about the best ways to build the underlying hardware to facilitate machine learning, artificial intelligence and deep learning. Today, the various components of a chip are benchmarked based on the power, performance and cost for a given set of operations. Those metrics are used to support a chip’s architecture, and they affect everything from which analog and digital IP blocks are chosen, how much and what types of memory are used, how signal paths are designed, and even how much heat will be generated by doing things a certain way and how to manage it.
With architectures that can be modified through interaction with the real world, not all of these approaches apply. What does remain the same are such things as the demand for increased performance. Throughput between processors and memory, as well as between sensors and processors, is essential for chips to react to real-world events. This has been one of the recent shifts driving 2.5D architectures.
“You need to store state variables, which becomes the basis of dynamic variability,” said Mike Gianfagna, vice president of marketing at eSilicon. “This is one of the reasons everyone is watching 2.5D. We first saw high-bandwidth memory being driven by high-end communication, but it’s starting to diversify and expand into machine learning and adaptive computing. We’re in the middle of several of these designs right now. This is real.”
Gianfagna said the common thread in these systems is an almost steady stream of data, particularly with machine vision and pattern recognition. “So if you have a picture of a cat, a dog, and a water tank, the system can point to a list of possible guesses. But it needs to be right all the time. There are millions of corner cases in this, and some of them are weird corner cases.”
2.5D and 3D packaging, as well as fan-outs, partition a problem among die within a package. At the same time, a lot of attention is being paid to on-chip interconnects and cache coherency. Underlying many of these systems are convolutional neural networks (CNNs), which draw on data collected by many sensors, at least some of which is sent to centralized logic to process, interpret and react.
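For readers unfamiliar with the operation at the core of a CNN, the following is a minimal sketch in plain Python, not anything taken from the companies quoted here. It slides a small kernel over a grid of pixel values and sums the element-wise products at each position (strictly a cross-correlation, which is what most CNN frameworks call convolution). The function name, the 3x3 image, and the hand-picked kernel are illustrative assumptions; in a real CNN the kernel weights are learned from training data.

```python
def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` with no padding and return the response map."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Sum of element-wise products between the kernel and the
            # image patch whose top-left corner is (row, col).
            total = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# Hypothetical 3x3 "image" and a hand-picked 2x2 kernel that compares
# each pixel with its lower-right diagonal neighbor.
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
diagonal_difference = [
    [1, 0],
    [0, -1],
]
response = correlate2d_valid(image, diagonal_difference)
```

Every output value depends only on a small local patch, which is one reason this workload maps well onto parallel hardware and puts pressure on memory bandwidth.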
“We’re seeing a lot more demand for acceleration and compute,” said Jeff Defilippi, senior product manager in ARM’s System and Software Group. “Depending on where they are, there are different needs. We’re seeing more content cache from small access points with some level of intelligence built into them. So then you have a coherent on-chip backplane to connect the various components. It’s coherent off-chip, as well.”
Seven companies (ARM, AMD, IBM, Qualcomm, Mellanox, Xilinx and Huawei) have formed the Cache Coherent Interconnect for Accelerators (CCIX) consortium to develop an open acceleration framework for datacenters and other markets. The goal is orders of magnitude reduction in latency, higher bandwidth, and better integration of cache coherency.
“CNNs will be built on top to transport to memory,” said Defilippi. “This will be a heterogeneous environment where you blend together different computing elements.”
CNNs are developing independently, as well as in concert with adaptive learning systems, and they are finding homes in everything from data centers to driver-assisted vehicles.
“The reason CNNs are taking off is because it effectively says you don’t need to write programs in order to do complex pattern recognition,” said Steve Roddy, senior product line group director for Cadence’s IP Group. “The challenge is with people doing high-end pattern analytics recognition, whether it be searching for that one particular face on all the Brussels subway cameras or whether it be validating your face as you walk up to your front door and it automatically opens your door for you, or whether it be the four cameras on your car speeding down the highway at 70 miles an hour taking high-resolution video at 60 frames per second and trying to figure out where the other cars are, where the lanes are and what the speed signs say, and so forth. That’s huge computation.”
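A rough back-of-the-envelope sketch gives a sense of scale for the four-camera example. The camera count and frame rate come from the quote; the 1920x1080 resolution and 3 bytes per pixel are assumptions made here for illustration, since the source says only "high-resolution."

```python
def raw_video_bytes_per_second(cameras, frames_per_second, width, height, bytes_per_pixel):
    """Uncompressed pixel data produced per second by a set of identical cameras."""
    return cameras * frames_per_second * width * height * bytes_per_pixel


# From the quote: four cameras at 60 frames per second.
CAMERAS = 4
FRAMES_PER_SECOND = 60

# Assumptions for illustration only (not stated in the source).
ASSUMED_WIDTH = 1920
ASSUMED_HEIGHT = 1080
ASSUMED_BYTES_PER_PIXEL = 3  # 8-bit RGB

rate = raw_video_bytes_per_second(
    CAMERAS, FRAMES_PER_SECOND, ASSUMED_WIDTH, ASSUMED_HEIGHT, ASSUMED_BYTES_PER_PIXEL
)
print(f"{rate / 1e9:.2f} GB/s of raw pixels")
```

Under these assumptions the cameras alone generate roughly 1.5 GB of raw pixel data every second, before any neural network has touched it.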
Fusing disciplines, creating new ones
This is the point where things get more difficult, though. In the semiconductor industry, there has been talk of hardware-software co-design for years. Some large chipmakers have mastered this synchronized development process. In defining the upcoming HoloLens “mixed-reality” headset, Microsoft distinguished engineer Nick Baker pointed out that another skill set was required to get the job done.
“We used co-design with our ‘Experience’ team, our hardware team and our software team,” Baker said. “They used algorithms from Microsoft research, HoloLens software, and hardware from our silicon team.”
What Microsoft calls its “Experience” team uses a different skill set than what is developed today in computer science or electrical engineering. Many of these systems require different ways of looking at problems.
“You need to frame questions in a way that you can get to something of use,” said ARM fellow Jem Davies. “One of the founders of ARM asked, ‘How shall we reason about this problem?’ That is a killer question. It sets the tone to raise the question correctly. Are there some new classifications, such as data scientists for neural networks and machine learning? These are the people who will need to classify and filter data. You might be able to make a correlation between redheads and left-handed people, but it’s not useful. We have to ask better questions. Asking questions is really important, but it helps if you ask useful questions. Then you might get useful answers.”
This is harder than it looks. In some ways, it combines the best elements of electrical engineering, computer science, and a philosophical/social science approach to asking very precise questions.
“These are complex interactions,” said Anush Mohandass, vice president of marketing at NetSpeed Systems. “It’s human instinct plus the ability to deal with a machine. There are cases where people try to manually tweak complex systems, and it’s never as good as when it’s done by a machine. But there are other cases where the expert uses instinct based upon deep experience to locate thermal hot spots, for example, and then comes up with a better solution. The best architects understand why they came up with a solution and then they tweak it.”
It’s not just one skill set versus another, though. It’s the combination of multiple skills that don’t ordinarily go together. Asking questions using an adaptive learning architecture without understanding the consequences on a design also can be problematic.
“These systems learn from experience, so it depends on what you give them,” said Raik Brinkmann, president and CEO of OneSpin Solutions. “The tough problem is to generalize from that data what you want to verify. How do you verify a cat from an image?”
With that comes another challenge. It’s nearly impossible to really grasp what is happening inside an adaptive learning system. “If you look at general machine learning error rates and conversion rates, there is no way to make sure they’re accurate,” said Brinkmann. “Verifying these systems is very challenging.”
One of the keys to solving this riddle is better training of systems. That may require more standardized data sets. Experimentation is still in its infancy on this front.
“Today we are shifting from programming toward learning,” said Achim Nohl, technical marketing manager at Synopsys. “This is all heuristics, so it cannot be proven right or wrong. There is supervised or unsupervised learning, but nobody has the answer to signing off on a system. It may all be good enough, but what is good enough really?”
He said that to effectively train a system requires more real-world testing, where real hardware and software is used. That is happening increasingly for convolutional neural network accelerators, which is at least a starting point on the data gathering side. “You have to expand from design verification to system validation in the real world to have the highest confidence that the system will react correctly.”
That requires a certain level of confidence. Until that confidence is achieved, some of these technologies likely will be used only on the fringes. So while vehicles may be capable of driving themselves, they may still require a driver with a steering wheel and a brake pedal.
“You may have six or seven trucks, with only the first one driven by a person,” said Charlie Janac, chairman and CEO of Arteris. “They’ll be taken to a warehouse at the edge of the city and other drivers will pick up the trucks and take them to their final destination.”
But how far will system designers’ confidence really go? Will they be comfortable allowing people to customize their devices far beyond what they’ve done in the past? Janac said an interesting question for much of this segment is whether people will interact with algorithms directly over a user interface. “Are we going to write code or teach machines?”
The answer to that question could have a big impact on system architectures.
“There are two strategies that work here,” said Steve Teig, CTO of Tessera. “One is to use a higher bandwidth pipe. The second is to send less stuff to the pipe. You might need both, but if you look at machine learning, you may have data compression of a 16-megapixel image and on the other side it comes up as 10 megabytes. So even with data compression, you still need more pre-processing to get to less power. If you look at our eyes, ears and fingers, there is a huge amount of preprocessing going on. If the body sent all of the data it receives to the brain, it would total 20 watts. Fingers can determine whether something is hot or cold without sending that message to the brain to process.”
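The two strategies can be compared with simple arithmetic. In this sketch, the 16-megapixel image and the 10-megabyte compressed size come from the quote. The 3 bytes per pixel, both link speeds, and the share of data kept after pre-processing are hypothetical values chosen only to illustrate the trade-off.

```python
def raw_image_bytes(pixels, bytes_per_pixel):
    """Size of an uncompressed image."""
    return pixels * bytes_per_pixel


def transfer_seconds(payload_bytes, link_bytes_per_second):
    """Time to move a payload across a link of the given bandwidth."""
    return payload_bytes / link_bytes_per_second


def after_preprocessing(payload_bytes, kept_percent):
    """Payload left after local pre-processing keeps only `kept_percent` of it."""
    return payload_bytes * kept_percent // 100


# From the quote: a 16-megapixel image that is about 10 megabytes once compressed.
PIXELS = 16_000_000
COMPRESSED_BYTES = 10_000_000

# Assumptions for illustration only (not stated in the source).
ASSUMED_BYTES_PER_PIXEL = 3
ASSUMED_LINK = 1_000_000_000        # hypothetical 1 GB/s link
ASSUMED_WIDER_LINK = 4_000_000_000  # hypothetical 4 GB/s link
ASSUMED_KEPT_PERCENT = 10           # hypothetical: send only a region of interest

raw_bytes = raw_image_bytes(PIXELS, ASSUMED_BYTES_PER_PIXEL)

baseline = transfer_seconds(COMPRESSED_BYTES, ASSUMED_LINK)
# Strategy one: a higher-bandwidth pipe.
wider_pipe = transfer_seconds(COMPRESSED_BYTES, ASSUMED_WIDER_LINK)
# Strategy two: send less stuff to the pipe.
reduced_payload = after_preprocessing(COMPRESSED_BYTES, ASSUMED_KEPT_PERCENT)
less_data = transfer_seconds(reduced_payload, ASSUMED_LINK)
```

Both strategies shorten the transfer, but only the second also reduces the volume of data that downstream logic and memory must handle, which is the point of the pre-processing argument in the quote.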
Chris Rowen, a Cadence advisor and principal at Cognite Ventures, agrees. “We are dealing with vastly greater amounts of data. A car has to be able to do tens of teraflops. It used to be just enough power to run a garage door opener. So now we need more abstract models, where you’re no longer thinking about the flow of the program. It’s the structure of the neural network and how you manipulate training sets. That’s an upheaval in the nature of computing.”
He said that systems will require much more parallelism than in the past to process these algorithms quickly and efficiently. “In the past, if we had a problem, it was typically an old problem with new hardware. But here, we have new problems and the hardware is driven by software more than in the past—and calling it software isn’t even accurate. It’s posing a question. Is that a dog? Where is the road? And that problem has to be distributed and parallel. This is a lot different than the old sort, matrix multiplier. Many of the characteristics of the chip are the same. You still have standard cells, logic synthesis and high-speed I/O. And you have a master CPU and multiple computational elements. But in the past, you believed all of this was compatible. Now there is much more opportunity for parallel architectures running under supervision, whether that is for virtual reality or deep learning or any other type of AI.”
What these systems ultimately will look like and how they will function is still more theory than reality. But at least people are beginning to look at the problem differently. That’s a very large first step.