The Next Phase Of Machine Learning

Chipmakers turn to inferencing as the next big opportunity for this technology.


Machine learning is all about doing complex calculations on huge volumes of data with increasing efficiency, and with a growing stockpile of success stories it has rapidly evolved from a rather obscure computer science concept into the go-to method for everything from facial recognition technology to autonomous cars.

Machine learning can apply to every corporate function, and it can have an impact on companies in every part of the economy. So it’s no surprise that funding is pouring into this sector. A survey by McKinsey & Co showed that total investments in AI development tripled between 2013 and 2016. Most of that — $20 billion to $30 billion — came from tech giants. Those companies expect that machine learning, and other AI models that descend from it, will be as critical to their customers in the future as mobility and networking are now.

What makes this technology so attractive is that machine learning and other forms of AI can be applied broadly and still produce dramatic benefits. Gartner predicts that by 2020, AI technologies will be pervasive in new business software and will be a top-five investment priority for 30% of CIOs.

In fact, most of the big pushes into this market are by established companies that can leverage their developments in other areas.

• Nvidia has emerged as the dominant player in GPUs, the platform of choice for machine learning’s training phase. So far, this is where much of the focus of machine learning has been.
• Intel has introduced the Nervana Neural Processor, a low-latency, high-memory-bandwidth chip it says was purpose-built for deep learning. (Intel acquired Nervana in 2016.)
• Google’s Tensor Processing Unit (TPU) already has taken a chunk out of the market for machine-learning accelerators. The first version was an ASIC Google developed to accelerate inference on its own servers for its speech-to-text app. The second version, Cloud TPU, is more of a high-performance cluster of TPUs designed to compete with Nvidia as a training module.

The bulk of this work is on the training side, which is the first step in a two-phase process. This piece is largely confined to data centers and cloud operations, and it’s a huge market in its own right. Linley Gwennap, principal analyst at the Linley Group, projects the market for datacenter-oriented AI accelerators will reach $12 billion by 2022.

“During the next year or two we’ll start seeing a lot more choices out there for datacenters and other devices,” said Gwennap. “So the question facing the Googles and Facebooks of the world is, ‘Do I keep designing my own chips? Or, if I can get something just as good on the open market, should I do that?'”

The inferencing opportunity
The second phase in machine learning is inferencing, which is basically applying what was learned in the training phase to specific applications and market segments. This is where algorithms are put into real-world applications, and the projected opportunity is even larger. The result has been a stampede of VC-funded startups, few of which have shipped or demonstrated any products yet, as well as a big push by existing companies into this space.

“Inferencing and training are quite different,” said Jem Davies, an Arm fellow. “Inferencing is where you can do all sorts of wacky things, like sorting cucumbers, or many useful things. It’s closer to the user, which is why you’re seeing ‘interesting’ use cases. But it’s also in mobile phones today with predictive text, which started 25 years ago, and facial detection and recognition.”

Inferencing is an important component of assisted and autonomous driving, as well, where data collected from sensors needs to be pre-processed based on machine learning.

“Inferencing needs to happen at the edge,” said Pulin Desai, product marketing director for the Tensilica DSP group at Cadence. “In a car you may have 20 image sensors, plus radar and LiDAR, to provide a 360-degree view. But if you put an image sensor on a car, it might have a 180-degree field of view. That requires distortion correction, which is image processing.”

A key difference between training and inferencing is that training is typically done in floating point, while inferencing often uses fixed point. Many DSPs and FPGAs are optimized for fixed-point arithmetic.
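To make the floating-point versus fixed-point distinction concrete, here is a minimal Python sketch, not taken from any vendor's toolchain. It assumes a simple symmetric 8-bit scheme with one shared scale factor per vector. The function names `quantize` and `fixed_point_dot` and the sample numbers are hypothetical, chosen only for illustration.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers using one shared scale factor.

    Assumption: symmetric quantization, so zero maps to zero.
    """
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for 8 bits
    max_abs = max(abs(v) for v in values) or 1.0   # avoid a zero scale
    scale = max_abs / qmax
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def fixed_point_dot(q_weights, w_scale, q_inputs, x_scale):
    """Accumulate in integers, then rescale once at the end."""
    acc = sum(w * x for w, x in zip(q_weights, q_inputs))  # integer math only
    return acc * w_scale * x_scale


# Made-up values standing in for trained weights and sensor inputs.
weights = [0.42, -1.30, 0.07, 0.88]
inputs = [1.00, 0.50, -0.25, 2.00]

float_result = sum(w * x for w, x in zip(weights, inputs))
q_w, s_w = quantize(weights)
q_x, s_x = quantize(inputs)
fixed_result = fixed_point_dot(q_w, s_w, q_x, s_x)

print(f"floating point: {float_result:.4f}")
print(f"fixed point:    {fixed_result:.4f}")
```

The two results differ slightly because of rounding. The multiply-accumulate loop, which dominates inference, touches only small integers, and that is the kind of work fixed-point hardware handles cheaply.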

“We’re moving out of the point where everything is solved using an x86 processor or people optimizing hardware for a specialized workload,” said Geoffrey Tate, CEO of Flex Logix. “Most computing is going to be done out of datacenters, so the role of FPGAs and other things will have to change—although you’ll probably still see a mix of traditional architectures and new ones as the need to support audio and video expands. I see us all as accelerators.”

In machine learning, FPGA and eFPGA players are lining up for a piece of the inferencing market. All told, Linley estimates there will be 1.7 billion machine-learning client devices by 2022.

“GPUs have gained a lot of attention in the learning phases of machine learning,” said Robert Blake, president and CEO of Achronix. “But the larger market is going to be on the inferencing side, and the cost and power consumption of those products is going to be critical. That’s why an embedded solution in those spaces will be attractive.”

Arm’s Davies agreed. He said power budgets remain in the 2- to 3-watt range, while battery technology improvements have been relatively flat. Lithium battery improvements typically have been in the 4% to 6% per year range. The required increase in compute performance to make all of this work, in contrast, is orders of magnitude.
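The gap between those two growth rates can be checked with simple compounding. The sketch below takes the 4% and 6% annual figures from the text. The tenfold target is an assumption, standing in for a single order of magnitude, and `years_to_reach` is a hypothetical helper name.

```python
import math


def years_to_reach(target_multiple, annual_gain):
    """Years of compound growth needed to reach target_multiple."""
    return math.log(target_multiple) / math.log(1.0 + annual_gain)


for gain in (0.04, 0.06):
    after_decade = (1.0 + gain) ** 10
    to_tenfold = years_to_reach(10, gain)  # assumed target: one order of magnitude
    print(f"{gain:.0%} per year: {after_decade:.2f}x after 10 years, "
          f"about {to_tenfold:.0f} years to reach 10x")
```

At those rates a decade of battery progress yields well under a doubling of capacity, and a single order of magnitude takes roughly four to six decades.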

That will require a different architecture, including an understanding of what processing gets done where.

“We’re seeing all sorts of AI, neural networking chips and cores,” said Steven Woo, distinguished inventor at Rambus. “What’s happening at a higher level is they’re fusing information together. There is a lot of exploration going on. What you’re seeing now is a lot of companies looking for major markets to build infrastructure around. You see that with cell phones, where there are billions of units. Those are driving new packaging infrastructure. You also see it in automotive, which has a lot of money behind it. And with IoT, the potential is there, but the challenge is finding commonality. And with neural networking and machine learning, there seem to be new algorithms every week, which makes it hard to develop a single architecture. This is why you’re seeing so much interest in FPGAs and DSPs.”

Defining machine learning
It doesn’t help that companies are using the terms machine learning, deep learning, artificial intelligence and neural networking interchangeably. While each has its own nuances, the general idea is that with enough real-time data, computers can weigh a number of different scenarios and respond with the best option based upon pre-determined weights. The weighting process is part of the training and inferencing that are central to machine learning.
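As a toy illustration of responding with the best option based on pre-determined weights, the Python sketch below scores each option as a weighted sum of input features and picks the highest. Everything in it is hypothetical: the option names, the feature meanings and the weight values are invented, and a real network would have far more weights, all learned during training.

```python
# Hypothetical weights, standing in for the output of a training phase.
WEIGHTS = {
    "brake":    [0.9, -0.2, 0.1],
    "steer":    [0.3, 0.8, -0.1],
    "continue": [-0.5, -0.4, 0.9],
}


def score(features, weights):
    """Weighted sum of the input features for one option."""
    return sum(f * w for f, w in zip(features, weights))


def infer(features):
    """Return the option whose weighted score is highest."""
    return max(WEIGHTS, key=lambda option: score(features, WEIGHTS[option]))


# Assumed feature order: obstacle proximity, lane drift, clear road ahead.
print(infer([0.8, 0.1, 0.2]))
print(infer([0.0, 0.1, 0.9]))
```

Training is the process of finding values for `WEIGHTS`. Inferencing is the cheap part shown here, which applies the fixed weights to new inputs.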

Deep learning is machine learning in the extreme – featuring more layers of different types of analysis and, ultimately, more complete solutions at the expense of more compute resources to get the training done. Both often involve neural networks, which create mesh-like connections around information nodes in much the way neurons in the human brain make mesh connections with cells around them. Artificial intelligence is something of an umbrella term that means many things to many people, from IBM’s Watson to HAL in the movie “2001: A Space Odyssey.” But the general theme is that devices can learn behavior independent of explicit programming.

Who’s using ML
Machine learning has become very common in customer-focused applications to project sales, look for signs of customer churn, and provide customer service via interactive voice response or online through a chatbot, as well as in consumer applications such as Google’s Translate.

Facebook, for example, uses three deep-learning applications to filter uploads: one to recognize faces and tag people in images being uploaded, one to check posts for hate speech or other objectionable content, and one to target advertising.

“What has surprised me is how fast the revolution in deep learning has been. During the last three years all sorts of applications have flipped almost overnight from being done in the traditional method to deep learning,” according to Bill Dally, chief scientist and senior vice president of research at Nvidia. “It doesn’t take a huge investment in software; you take the application, train the network and you’re done. It’s become pervasive in some areas, but for every application that’s flipped to neural networking, there’s another ten that can flip.”

Most of the adoption outside the tech industry has been experimental, while most of the AI adoption inside the tech industry has been either to enable or improve other services or add new ones to offer customers, according to McKinsey. Of more than 3,000 companies surveyed, only 20% said they use any AI-related technology in a significant part of the business. Surveying 160 AI use cases, McKinsey found AI deployed commercially in only 12%.

Looked at differently, 88% of those use cases have not yet been deployed commercially, which is a huge opportunity. Tech companies, in contrast, including Google and Baidu, spent between $20 billion and $30 billion during 2016, with 90% going to R&D and 10% to acquisitions.

Deep learning as the next big thing
Deep learning may be good in customer service and analytics, but it’s also the primary candidate for systems that provide the instant sensing, decision-making and control needed for autonomous vehicles, according to Nizar Sallem, principal engineer, sensor fusion at Mentor, a Siemens Business.

“The most important application in machine learning is understanding the environment around the car, the different actors on the road and the context based on traffic rules and expectations where the vehicle is at that moment,” Sallem said. “It has to identify what your behavior should be, but also when you are allowed to break the rules to escape danger, or to protect the human beings in the car.”

Market predictions
However capable AI technologies might be, development is still in its infancy. The main providers are still existing tech companies, and the biggest moneymakers are still services to consumers. That includes Google’s speech-to-text and translation services and consumer-interaction/customer-service applications from Amazon, Facebook, Baidu and others, according to a Tractica report. That report estimates AI-driven consumer services were worth $1.9 billion in 2016, and would rise to $2.7 billion by the end of 2017.

Fig. 1: AI revenue by technology. Source: Tractica

Tractica estimates the entire market for AI – including hardware, software and services – will rise to $42.1 billion by 2025.

Fig. 2: AI revenue by segment. Source: Tractica

Machine learning as a service (MLaaS) is a different category, 73% of which is owned by Amazon, IBM and Microsoft. That market is expected to grow from approximately $1.07 billion during 2016 to $19.9 billion by 2025, according to an April report from Transparency Market Research (TMR).

The bulk of machine-learning-enabled services are currently aimed at consumers, according to Tractica, a category that includes Google’s Translate and speech-to-text applications, which served as proofs of concept for its custom TPU.

Switching from customer to competitor
The advent of deep learning also has highlighted some increasingly complex relationships between the semiconductor industry and its largest customers, especially Google and other hyperscale datacenter owners large enough to spec and build their own servers and chips.

Chip companies have worked for years to build or customize silicon to the needs of specific cloud customers. Intel, for example, built FPGA deep-learning accelerators for Microsoft and FPGA-based application accelerators accessible to Alibaba cloud customers. Intel also enlisted Facebook to help design packaging for the rollout of its Nervana Neural Processor and for its upcoming “Lake Crest” ASIC for deep learning.

Google has announced other chips as well, including a machine-learning co-processor for the Pixel 2 phone, its first mobile chip. Google also has developed Titan, a microcontroller it attaches to servers to make sure they don’t boot with faulty, corrupted or malware-infected software on board.

Google justified its investment in the first TPU by saying it could deliver “an order of magnitude better-optimized performance per watt for machine learning” and push Google’s ML apps ahead by about seven years. The first TPUs were designed only to accelerate ordinary servers running the inference of a machine-learning model, not to train the model in the first place. As such, they didn’t compete directly with Nvidia’s or Intel’s ML training products.

When Google announced its Cloud TPUs in May, its claims sounded more directly competitive with those of Intel and Nvidia.

Google described each Cloud TPU as delivering 180 teraflops of floating-point performance, and packaged the units into 64-TPU Pods that deliver a total of 11.5 petaflops. The configuration seems designed to compete with Nvidia’s highly regarded DGX-1 “supercomputer,” which contains eight top-of-the-line Tesla V100 chips and claims a collective top throughput of one petaflop.
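The quoted throughput figures can be cross-checked with simple arithmetic. The Python sketch below divides the quoted pod total by the quoted per-unit figure to get the number of 180-teraflop units that total implies, then compares the pod with the DGX-1 claim. These are vendor peak numbers and may not be measured at the same numeric precision, so the ratio is only a rough guide. The variable names are illustrative.

```python
TERA = 10 ** 12
PETA = 10 ** 15

cloud_tpu_flops = 180 * TERA   # per Cloud TPU, as quoted
pod_flops = 11.5 * PETA        # per TPU Pod, as quoted
dgx1_flops = 1 * PETA          # Nvidia DGX-1 claim, as quoted

# Number of 180-teraflop units implied by the quoted pod total.
units_per_pod = pod_flops / cloud_tpu_flops

# Peak-throughput ratio between one pod and one DGX-1.
pod_vs_dgx1 = pod_flops / dgx1_flops

print(f"implied units per pod: {units_per_pod:.1f}")
print(f"pod vs. DGX-1 peak:    {pod_vs_dgx1:.1f}x")
```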

Competition from the cloud
“Google and some others have had early successes without acceleration, or with just the TPU, but some nets are easy to train; standard image searching is easy,” Dally said. “But for training with increased signal processing – handling image and video streams, and for people who are retraining their networks every week or who put a lot more emphasis on training, GPUs are much more efficient.”

The question is whether a new processor from Google will be enough to steal customers away from the rest of the business, and the answer is probably ‘no.’ Any cloud provider has to support more than one architecture, so a deep-learning-enabled datacenter will be a smorgasbord of CPUs, GPUs, ASICs, FPGAs and IP from a range of technologies, according to Chris Rowen, the former CTO of Cadence’s IP group, who founded Cognite Ventures to fund and advise startups in neural network, IoT and autonomous embedded systems.

Some of the training load could also shift to put more weight on all the billions of inference engines to be built into client devices, Rowen said. There is definitely opportunity for many companies in that space. For machine-learning training on datacenter servers, however, it will be hard to displace established players.

Fig. 3: Evolution of cognitive computing. Source: Cognite Ventures

“There is good reason to want choice, but there are a lot of options, and Intel, Qualcomm and the others are paying attention,” Rowen said. “It’s not a good idea to assume, just because you have a neural network for a smartphone, that you can out-manufacture Samsung.”

—Ed Sperling contributed to this report.
