ML Focus Shifting Toward Software

Hardware is faster, but software is easier, and for some applications that may be good enough.


New machine-learning (ML) architectures continue to garner huge attention as the race to provide the most effective acceleration for the cloud and the edge continues, but that attention is starting to shift from the hardware to the software tools.

The big question now is whether software abstraction, rather than hardware detail, eventually will determine who the future winners are.

“Machine learning historically has come from a place where there was an emphasis on specific end applications, like object detection for automotive or natural-language understanding using voice,” said Sree Harsha Angara, product marketing manager for IoT, compute, and security at Infineon. “Now, ML has started to branch out to several types of applications, which puts more emphasis on design tools and software frameworks.”

Designers of machine-learning systems are starting to demand more from the machine-learning software development kits (SDKs), and some vendors are using their software to bury the hardware details. In the end, less optimal hardware with good software might succeed over better hardware with less software.

We’ve been here before
Compilers and other development tools have long been taken for granted on the software side. But when it came to hardware in the early ’80s, designers were expected to do their designs largely manually. While this has changed today, it took a rocky road to get there. And the programmable logic market was one of the first places where design software made a stand.

At first, PLD software served merely to eliminate busy work. In the early days of programmable logic devices (PLDs), engineers would work out their logic and then figure out which connections were needed in the programmable arrays. These literally would be handwritten in “fuse maps,” which then could be entered into programming equipment for physically configuring a device.

A big change came with Monolithic Memories' introduction of programmable array logic, or PALs. Two things ramped up the PLD industry overall. The first was an architectural change that reduced cost and increased speed. The second, and more influential, was the release of the first so-called PAL assembler, dubbed PALASM.

This eliminated the tedious process of mapping out the connections. Instead, one could enter Boolean equations — much more natural for engineers to work with — and let the tool figure out the programming details. This helped to create a serious inflection point in the business.

A few years after that, new entrants to the PLD market appeared, providing more architectural programmability. There always had been logic such devices couldn’t implement, so the race was on to come up with the most flexible architectures that were still fast and cheap.

A significant turning point came when Altera released its new architecture with MAX-PLUS design software. Engineers struggled to find the boundaries of what that architecture could do, and were pleasantly surprised. Equations that were expected to fail compilation actually worked, because the software was now doing more than simply converting Boolean equations into connections. Instead, it also was transforming those Boolean equations by, among other things, applying DeMorgan’s theorem to convert things that couldn’t be done into things that could be done.
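The kind of rewrite described above can be illustrated with a toy example (this is not any vendor's actual tool). A sum-of-products (AND-OR) array cannot directly form a NOR such as Y = !(A + B), but with an invertible output it can implement the De Morgan equivalent, !A & !B, as a single product term:

```python
# Toy illustration of the De Morgan rewrite a PLD compiler might apply.
# An AND-OR array cannot directly compute Y = !(A + B), but it can
# compute the equivalent single product term !A & !B.

def nor_direct(a: bool, b: bool) -> bool:
    """The equation as the engineer wrote it: Y = !(A + B)."""
    return not (a or b)

def nor_demorgan(a: bool, b: bool) -> bool:
    """After De Morgan's theorem: Y = !A & !B -- one AND-array product term."""
    return (not a) and (not b)

# Exhaustively confirm the transformation preserves the logic.
for a in (False, True):
    for b in (False, True):
        assert nor_direct(a, b) == nor_demorgan(a, b)
print("De Morgan rewrite verified over all input combinations")
```

The tool performs this kind of equivalence-preserving transformation silently, so an equation the raw array "couldn't do" compiles anyway.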

This was a game changer, and it significantly raised the stakes for the software side of the business. Try as you might to point out architectural weaknesses in the hardware, as long as the software masked those weaknesses, it was a losing argument.

The business had to that point been dominated by hardware companies, with AMD the leader (having acquired Monolithic Memories). But hardware companies weren't good at making software, and they didn't like doing it. So rather than struggle with software, AMD outsourced the work to a small software company. That turned out to be a strategic mistake. The software company ultimately went out of business, selling some of its technology to AMD's newly spun-out Vantis subsidiary (which was ultimately not successful), and some to Xilinx, then a newcomer on the scene with its fancy FPGAs (which were wildly successful).

In the end, it was the software that determined the winner. Hardware features followed a leapfrog pattern, and the tools proved to be the differentiator.

“History is littered with remarkable FPGA hardware architectures that all stumbled and exist no more, in part due to the lack of an effective toolchain,” said Stuart Clubb, principal product manager at Siemens EDA. “It doesn’t matter how good a hardware platform is if you don’t have the software to efficiently and effectively leverage that platform.”

A machine-learning parallel?
Today, all eyes are on artificial intelligence, and companies are trying to find the best architecture for machine-learning accelerators that will work either in the cloud or at the edge. While cloud instantiations have dominated, edge implementations — with their tough power and cost requirements — have spurred much more creativity.

But it’s become increasingly difficult to understand the impact of the differences between these architectures. It almost feels as if the big ideas are behind us, with companies pushing and pulling on different parameters in order to find a sweet spot for performance, power, and cost.

“The problem that people are running into is that they have these hardware solutions and are not able to optimize them or get the level of utilization that they want,” said Dana McCarty, vice president of sales and marketing, inference products at Flex Logix.

Machine learning has always had a software element to it, since training and implementing a machine-learning solution is no mean feat. The arrival of early machine-learning frameworks like Caffe and TensorFlow made AI applications practical and accessible to a larger number of developers.

Even so, there’s a lot of work involved in creating a full design — especially for edge applications. Advantest provided an example of using AI on test data — an application that is far less prominent than vision, for example.

“We get this big uncleaned data set, and it will take several weeks of manually going through it, applying tools manually,” said Keith Schaub, vice president of technology and strategy at Advantest. “You’ve got hundreds of features, sometimes 1,000 features or more, and you’re trying to figure out the most important features that correlate to your prediction. They come up with those through regression methods and [principal component analysis]. And then you have this static model with 12 features.”
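The winnowing Schaub describes, going from hundreds of candidate features down to the handful that actually predict the outcome, can be sketched as follows. This is a minimal illustration using a correlation ranking on synthetic data (real flows combine regression methods and PCA, and all names and values here are invented):

```python
import numpy as np

# Minimal sketch of feature winnowing: rank many candidate test features
# by their correlation with the quantity being predicted, keep the top few.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 100
X = rng.normal(size=(n_samples, n_features))

# Synthetic target driven only by features 3, 17, and 42, plus noise.
y = 2.0 * X[:, 3] - 1.5 * X[:, 17] + 1.0 * X[:, 42] + 0.1 * rng.normal(size=n_samples)

# Absolute Pearson correlation of each feature column with the target.
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])

# Keep the k most predictive features.
top_k = np.argsort(corr)[::-1][:3]
print("selected features:", sorted(top_k.tolist()))
```

On this synthetic data the ranking recovers the three driving features; with real, noisy test data the cut is far less clean, which is why the manual process takes weeks.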

And that’s just the high-level feature engineering. The next part is mapping that model efficiently onto a specific device, which is where a competitive advantage might appear. “You have to be able to bridge that model efficiently onto your accelerator, and the company that can do that bridging is going to win,” said Sam Fuller, senior director of marketing at Flex Logix.

While cloud-based engines can largely rely on full floating-point hardware, much edge hardware saves power and cost by focusing on integer implementations. That means taking a design and the trained parameters and quantizing them — converting them from floating-point to integer. That introduces some errors, which may hurt inference accuracy, so it may be necessary to retrain with the desired integer format. Tellingly, over time, it has become easier to train directly to integers.
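The basic float-to-integer conversion can be sketched as follows. This is a minimal illustration of symmetric per-tensor int8 quantization (one of several schemes real toolchains offer; the function names are invented), including a measurement of the error the conversion introduces:

```python
import numpy as np

# Minimal sketch of post-training quantization: map trained float32
# weights onto int8 with a per-tensor scale, then measure the error.

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: float32 -> int8 plus a scale."""
    scale = np.abs(w).max() / 127.0          # largest weight maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=1000).astype(np.float32)

q, scale = quantize_int8(weights)
err = np.abs(dequantize(q, scale) - weights).max()

# Worst-case round-trip error is half a quantization step.
assert err <= scale / 2 + 1e-7
print(f"max quantization error: {err:.6f} (step size {scale:.6f})")
```

The per-weight error is bounded by half a quantization step, but those errors accumulate across layers, which is why accuracy must be re-checked and retraining is sometimes needed.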

Reducing the size and energy consumption of these designs also has required work. One might go through the design identifying parameters that were likely too small to matter and then pruning them away. This was originally a manual process, which required re-evaluating the accuracy to ensure it hadn’t been too badly compromised by the pruning process.
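Magnitude pruning of the kind described above can be sketched as follows. This is a minimal, illustrative version (the 50% threshold is arbitrary) that zeroes the smallest weights and then re-checks how much a layer's output changes:

```python
import numpy as np

# Minimal sketch of magnitude pruning: zero the smallest-magnitude
# weights, then re-evaluate to confirm the output isn't badly compromised.

rng = np.random.default_rng(1)
weights = rng.normal(size=(64, 64)).astype(np.float32)
x = rng.normal(size=64).astype(np.float32)

def prune_smallest(w: np.ndarray, fraction: float) -> np.ndarray:
    """Zero the given fraction of weights with the smallest magnitudes."""
    threshold = np.quantile(np.abs(w), fraction)
    pruned = w.copy()
    pruned[np.abs(w) < threshold] = 0.0
    return pruned

pruned = prune_smallest(weights, 0.5)        # drop the smallest 50%
sparsity = float(np.mean(pruned == 0.0))

# Re-evaluate: compare the layer's output before and after pruning.
rel_err = np.linalg.norm((pruned - weights) @ x) / np.linalg.norm(weights @ x)
print(f"sparsity: {sparsity:.0%}, relative output change: {rel_err:.2%}")
```

In a real flow this evaluation runs against the validation set's accuracy metric, not a single layer's output, and the pruning fraction is tuned iteratively.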

Then there's the matter of adapting software "kernels" for whatever type of processor is being used. This is often bare-metal code written for each node in the network. Some architectures choose processing elements that focus only on the common instructions used for inference. Others maintain "full programmability" so they can be used more flexibly beyond inference.

And if you happen to have a dataflow architecture, then you might need to partition the hardware up and assign different layers and nodes to different regions.
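A toy sketch of that partitioning step (not any vendor's flow; the layer names, costs, and greedy heuristic are all invented for illustration) assigns each layer's estimated compute cost to the currently least-loaded hardware region:

```python
# Toy sketch of partitioning a dataflow accelerator: greedily assign
# each layer's estimated compute cost to the least-loaded region.

layer_costs = {
    "conv1": 90, "conv2": 150, "conv3": 150,
    "conv4": 300, "fc1": 120, "fc2": 30,
}
num_regions = 3

regions = [{"load": 0, "layers": []} for _ in range(num_regions)]

# Largest-first greedy balancing: place expensive layers first.
for name, cost in sorted(layer_costs.items(), key=lambda kv: -kv[1]):
    target = min(regions, key=lambda r: r["load"])
    target["layers"].append(name)
    target["load"] += cost

for i, r in enumerate(regions):
    print(f"region {i}: load={r['load']:4d} layers={r['layers']}")
```

Real partitioners must also account for inter-region bandwidth and layer dependencies, which is precisely the kind of complexity the tools are being asked to absorb.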

These are just a few of the things that must be handled in order to implement a fully functional machine-learning application. Some of it has been manual, but the amount of software automation gradually has been ratcheted up.

The next stage: keeping hardware secret
Over the last year, a change has become visible in the industry. At events such as the Linley Processor Conference or Hot Chips, companies have been announcing new offerings with more discussion of the software. Notably, in some cases they really don't talk about the underlying hardware at all.

Such reticence is to be expected at public forums like conferences, and sometimes companies divulge the details only under NDA to legitimate sales prospects. But the tenor of the conversations seems to have moved increasingly in the direction of saying, "Don't worry about the details. The software will take care of that."

That significantly changes sales discussions from one of trying to convince a prospect that subtle architectural differences will have meaningful results, to one where design experience provides the proof. Can you implement a trial design more quickly than you’ve been able to do before? Do you hit your target performance metrics — cost, speed, power, accuracy, etc. — with minimal manual iteration? Can you achieve a good-to-go implementation with little to no manual intervention?

If the answer to all of those is yes, then does it matter how the hardware achieved that? If the software can perform transformations as needed quickly, does it matter whether the underlying gates had a limitation that required software transformation? If another architecture has far more bells and whistles, but it takes much more effort to complete a design, are those extra features worth that effort?

Companies that rely on their tools to do the talking are betting their customers really don't care what's under the hood – as long as, paired with good software, it can do the desired job at the needed speed, power, accuracy, and cost.

Will history repeat itself with machine learning?
Much like people, companies tend to have personalities. Some are hardware-oriented, while others are software-oriented. Some of both clearly is required for any machine-learning offering, but it feels like the edge may be moving toward those with a software orientation. That means keeping software front and center as a star of the offering, not as an annoying but necessary cameo walk-on. It also means that the hardware and the software need to be designed together.

The need for software to buffer the details is probably greater with ML than it was with FPGAs. Even with tools, FPGAs are designed by hardware engineers. ML models, on the other hand, are designed by data scientists, who are many levels away from the hardware. So tools need to bridge the abstraction gap.

“Unless you talk their language, you don’t stand a chance,” said Nick Ni, director of product marketing for AI and software at Xilinx. “Every vendor is talking about TensorFlow and Python support because they have no other way. Like it or not, you have to support it. But in order to support such a high framework, you have to do everything in between.”

Another failure mode in the PLD industry was designing a clever architecture only to find afterward that software was extremely hard to build for it. The most successful hardware and software teams worked together, with the hardware tweaked as needed to allow for smooth and powerful software algorithms.

Fig. 1: The evolution of design tools, beginning with manual designs and progressing through elimination of tedium, actual ability to manipulate designs, and, finally, to optimize them. With the latter stages, hardware/software co-design is critical for success. Source: Bryon Moyer/Semiconductor Engineering


This will be true for machine learning, as well. If a clever hardware trick is hard to leverage in software, then it likely will never be used. Ultimately, the most successful offerings probably will be architectures that pair well with their tools and that have shed any features that can’t be effectively used by the tools.

“One of the fundamental premises of the company is that the needs of software must drive the design of the hardware,” said CTO Nigel Drego at last fall’s Linley Processor Conference.

At that same conference, Ravi Setty, senior vice president of Roviero, mentioned the role of software in defining the company’s architecture. “We have added maybe 5% complexity in the hardware to achieve something like 90% simplicity in the compiler. The hardware is purely agnostic to any of the neural net information. It’s the compiler that has all of the knowledge. And the hardware – it’s just an execution engine.”

While the role of tools is growing, we're still not at the point of them completely burying the hardware. There is still a rich mix of architectural exploration that has yet to settle out. As with many design automation trajectories, we're entering the realm where many designs will be doable automatically, with hand-tweaking needed to get the most out of the hardware.

At this stage of the market, there’s also a tension between more generalized architectures with software abstraction and purpose-built architectures. “While a general-purpose hardware solution that is software-driven may offer greater flexibility, such solutions frequently lose to specialized hardware when a certain dimension (area, power, speed, cost) is of greater importance,” noted Siemens EDA’s Clubb.

This can create a challenge for software targeting specialized hardware. “Every architecture has unique advantages and is optimized for specific use cases,” explained Anoop Saha, senior manager, strategy and growth at Siemens EDA. “But the challenge for the user remains — how can they compile their network on a specific hardware architecture? And if they are able to do it, how can they optimize it for that particular hardware and leverage the different components available? The hardware-specific optimizations and flexibility need to be handled by software in a more automatic manner.”

Do tools rule?
Ultimately, then, it feels like the long-term hardware winners will be the ones that provide the best design experience, with only enough hardware exposed to help developers with their design decisions. That’s certainly the way FPGAs work today. In fact, in some cases, there are things the FPGA hardware can theoretically do that the software won’t allow.

ML appears to be following a similar path. “Innovations in hardware that provide significant advantages in power and speed are wrapping themselves underneath a common software framework or API,” said Infineon’s Angara. “This means they provide significant gains in running ML without the pain of ‘unsophisticated’ software.”

It remains to be seen whether engineers will stop thinking about the hardware. “Can ML ‘compilers’ be smart enough to target generic hardware platforms to the point where the hardware doesn’t really matter? Probably not,” said Clubb. “ML hardware specialization certainly has advantages and disadvantages in locking down the flexibility and reprogrammability of the solution. Inventive architects and hardware designers always will need to engineer more effective solutions when the general-purpose solution does not meet the needs of the application.”

Scaling and the bulk of the market may affect this, however. When synthesis was new, there were many engineers who thought they could always do a better job than a tool. That might have been true, but it became impractical as designs scaled, productivity expectations rose, and the tools improved.

So while hardware will always matter to a certain extent, it looks as if in the long term, just as it was with programmable logic, software tools could often end up being the kingmaker.

