Making Tradeoffs With AI/ML/DL

Optimizing tools and chips is opening up new possibilities and adding much more complexity.

Machine learning, deep learning, and AI increasingly are being used in chip design, and they are being used to design chips that are optimized for ML/DL/AI. The challenge is understanding the tradeoffs on both sides, both of which are becoming increasingly complex and intertwined.

On the design side, machine learning has been viewed as just another tool in the design team’s toolbox. That’s starting to change, however. As the amount of data increases, it is becoming a much more useful tool. Accuracy is increasing, and so are the possibilities for what can be done with this technology — using AI/ML/DL chips and systems to help design and optimize complex chips that also include various AI/ML/DL components.

“For a long period of time, we have been focusing on machine learning, and it is essentially a statistical way of doing processing,” said Suhas Mitra, product marketing director for Tensilica AI products at Cadence. “This is not new, but suddenly the deluge of data gave rise to this field of AI called deep learning, and deep learning essentially is a way to do statistical processing. It’s just that now there is lots and lots of data to throw around, so better and deeper and bigger models can be created.”

This is where things start to get really complicated. Those models can be processed more quickly using arrays of specialized processing elements, usually highly parallelized to handle matrix multiplication at blazing speeds. And with some version of built-in intelligence, they also can assess different configurations and IP choices to more quickly optimize power, performance, and area/cost.

“Most AI chips are just big, complex SoCs, so a lot of the traditional EDA works well on them,” said Neil Hand, director of strategy, design verification technology at Siemens Digital Industries Software. “One of the nice things about AI chips is they’re very regular structures. They’re basically big sets of processors or MACs or special memory architectures. They’re very regular structures in that core compute, which then allows the EDA tools to take advantage of that regular structure in order to say, ‘Here’s something that I can layer many times and get better performance.’ At that level, an AI chip is not any different really than a large processor cluster or any other type of design. There are other places where it gets a little bit interesting, such as you can use the AI to make better AI chips because it is just a very large chip.”

That means AI/ML/DL algorithms can be used in EDA tools to manage bigger designs, and to spot potential weaknesses or flaws in those designs. “As the designs get bigger, the amount of data gets unfathomably large, and AI/ML is a good way to get better layouts, better root cause analysis, finding those needles in the haystack. That plays into the development as well,” Hand said.

It also opens up a whole new opportunity for EDA and IP companies, and one that is expected to grow as ML/DL and the broader AI category are integrated into design flows.

“From a software reuse standpoint, we’re increasingly seeing companies wanting to productize functional software building blocks that just so happen to be machine learning inference-based things,” said Steve Roddy, chief marketing officer at Quadric. “If you’re going to write an application for, say, a phone as a target, you’ve got a software development kit that has a bunch of pre-baked software. People have done this for years and years and years. Previously, it all might have been tightly written embedded C code. Now, there is a proliferation of languages, so there are higher-level scripting approaches — things like Python that are quick and easy to write.”

This can significantly shorten time-to-market for some of the most complex designs. “You can get an object detector that’s already pre-trained and ready to go, something like a ResNet that has 1,000 classes, 1,000 objects, and you can just deploy it,” Roddy said. “You don’t have to retrain it. You don’t have to understand training, get all the data sets, sample and performance-optimize it. You name it, you can just say, ‘Here’s this thing that can recognize up to 1,000 different objects and I’m going to use it in my smart appliance in the home, or I’m going to stick it on my Roomba so it can figure out as it wanders around the house common everyday objects it should not bump into.’ You don’t always have to have a custom-created building block of something and custom training. It’s not like Tesla that wants to have self-driving cars, so they have to have all their own imagery and train their own neural nets because it’s a life-or-death situation. But not everything’s life or death. A lot of times it’s just how to quickly get the appliance out the door. I want to get the software running, ship it, move on to the next thing.”
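As a rough illustration of that “just deploy it” path, a few lines of Python can load a pre-trained 1,000-class ImageNet classifier and run inference with no training step at all. The torchvision model zoo and the file name below are assumptions made for this sketch; any framework with pre-trained models works the same way.

```python
# Minimal sketch: deploy a pre-trained 1,000-class classifier with no retraining.
# Assumes PyTorch/torchvision are installed; "frame.jpg" is a hypothetical input image.
import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)  # pre-trained on ImageNet
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("frame.jpg")            # hypothetical camera frame
batch = preprocess(image).unsqueeze(0)     # shape: [1, 3, 224, 224]

with torch.no_grad():
    logits = model(batch)                  # scores for 1,000 object classes
print(int(logits.argmax()))                # index of the most likely class
```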

So re-using building blocks, whether they are heuristic C code or machine learning models, becomes the norm, and engineering teams can determine whether a given block is already performance-optimized. “Is it already an optimized thing? Am I licensing or re-using a software package from someone that’s already done the work? Great. If not, I have to go train my own. So it plays out both ways,” Roddy said.

Memory considerations
Memory in any advanced system is a critical architecture consideration, but when it comes to AI/ML/DL chips, it may take AI/ML/DL to figure out which is the best choice and where to place different memories. Performance can vary greatly depending upon the volume of data that needs to be processed, how far it has to travel, and where it is stored and accessed. And that’s just for starters.

“Reduced precision data types are an important feature that enables bandwidth and capacity to be used more effectively,” said Steven Woo, distinguished inventor and fellow at Rambus. “Instead of using 32-bit parameters, 16-bit and 8-bit parameters are commonly used, with some systems even using 2-bit or 1-bit parameters. Using 16-bit parameters allows twice as many parameters to be stored in memory compared to 32-bit parameters, and the scaling goes up as the parameter size goes down. Bandwidth also has benefits, as twice as many parameters can be transferred per second with 16-bit values compared to 32-bit values. There is a tradeoff in terms of accuracy, and designers need to be aware of how reduced precision impacts performance on their AI tasks. Mixed precision processing is a variation where some operations like multiplies are done in one precision (for example, in 16-bit bfloat16 precision), while accumulation is done in higher precision (for example, 32-bit precision).”
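A minimal sketch of that mixed-precision pattern, using NumPy as a stand-in for what a real MAC array does in hardware (float16 here stands in for a 16-bit format such as bfloat16, which NumPy does not provide natively):

```python
# Minimal sketch of mixed precision: operands and products in a 16-bit format,
# accumulation carried in 32-bit so rounding error does not build up in the sum.
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float16)   # half the storage/bandwidth of float32
x = rng.standard_normal(4096).astype(np.float16)

products = w * x                                   # multiplies in reduced precision
mixed = products.astype(np.float32).sum()          # accumulate in higher precision
naive = products.sum(dtype=np.float16)             # accumulate in reduced precision
reference = np.dot(w.astype(np.float32), x.astype(np.float32))

print(mixed, naive, reference)  # the mixed-precision sum tracks the float32 reference more closely
```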

Sparsity is another technique that helps to stretch the capacity and bandwidth resources in a system. “Parameters like weights that have values close to 0 can be pruned (set to 0) so that they don’t need to be stored and hence no computation needs to be performed,” Woo noted. “This again saves capacity in the memory since only non-zero parameters need to be stored, and saves bandwidth. Computation gets reduced, as well. As with reduced parameter sizes, sparsity also can affect the accuracy of the model, so architects need to be aware of the impact that sparsity has on their systems.”
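A minimal sketch of magnitude pruning, under an assumed threshold chosen only for illustration, shows why capacity, bandwidth, and compute all scale with the fraction of weights that survive:

```python
# Minimal sketch of magnitude pruning: weights near zero are set to zero,
# then only the non-zero values (plus their indices) need to be stored or moved.
import numpy as np

rng = np.random.default_rng(1)
weights = rng.standard_normal(1_000_000).astype(np.float32)

threshold = 0.5                               # hypothetical pruning threshold
pruned = weights.copy()
pruned[np.abs(pruned) < threshold] = 0.0

nonzero_idx = np.nonzero(pruned)[0]           # sparse representation: indices + values
nonzero_val = pruned[nonzero_idx]

density = nonzero_val.size / weights.size
print(f"kept {density:.1%} of weights")       # capacity, bandwidth, and compute scale with density
# The accuracy impact must be checked: aggressive thresholds can degrade the model.
```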

The good news is that many of these tradeoffs are familiar to design teams working at the most advanced process nodes. “The same challenges exist,” said Siemens EDA’s Hand. “If you want to take it holistically, you look at the system level, you identify your performance needs, trace those all the way through, make sure that it’s working as you’re doing the implementation. You can utilize your emulation and simulation to see what the performance is looking like. With emulation prototyping, you can start to run real datasets and see if it’s working, or you can model it if you take a lot of the workflow that developers are using at the moment.”

Quadric’s Roddy believes it doesn’t really matter whether you’re running C code or machine learning code, because both can run on the same platform. “If you’re building your own network or training your own network, you do have to go through a whole series of steps to figure out how much of the network you need to have, even for things like bit precision. It was trained in floating point. Do I need all the precision of floating point when I run inference, or can I convert to INT8, which most people do? You might lose 1% accuracy, but you’ve now shrunk the size of the weights in the model by a factor of four, and you’ve cut energy by a factor of 10, for example. So you have to go through some analysis to figure that out. Even the complexity of the network. Let’s say it’s a detector that has lots of classes, lots of different types of objects. Do I really need to be able to recognize 1,000 different objects, or is it a children’s toy and I can trim the number of classes so it only has to recognize the child’s face, or maybe a handful of things that would be found in a child’s bedroom? Speed of development versus the optimality of performance is the tradeoff the developer has to go through. Anything that can be handwritten with heuristic C code could be trained with a neural net probably a lot faster. But you’re going to use more compute cycles each time you run the inference compared to a finely handcrafted piece of control code.”
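The float-to-INT8 conversion Roddy describes can be sketched as simple per-tensor symmetric quantization. The scale factor and random weights below are assumptions for illustration, not any particular tool’s flow:

```python
# Minimal sketch of post-training INT8 quantization: float32 weights are mapped
# to 8-bit integers plus one scale factor, cutting weight storage by a factor of four.
import numpy as np

rng = np.random.default_rng(2)
weights = rng.standard_normal(1024).astype(np.float32)    # trained in floating point

scale = np.abs(weights).max() / 127.0                     # symmetric, per-tensor scale
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

dequant = q.astype(np.float32) * scale                    # what the accelerator effectively computes with
err = np.abs(weights - dequant).max()
print(f"max quantization error: {err:.4f}")
# Whether this error costs ~1% accuracy, or more, has to be validated on the real model.
```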

The goal is to balance performance tradeoffs. “Look at the customer ‘asks’ in those particular segments,” said Cadence’s Mitra. “Understand where the market is headed. Look at some of these applications very carefully and say, ‘If these are the kinds of workloads that we want to enable, what tradeoffs are we going to do in terms of, for example, features like quantization, memory sizing, how many MACs or how many processing elements do I actually put in that particular thing. How scalable and flexible should my design be? What are my tradeoff points? What is the sweet spot of design?’ Also, where are things leveling off? If I create this device with this sweet spot, could I create a variant or a different SKU with a different sweet spot? Each of the sweet spots has its own tradeoffs as far as how much quantization and memory I put down. How much logic do I put down? What is the ratio between compute versus data movement?”

Accuracy is one of the big tradeoffs in AI/ML models. “In an autonomous vehicle application, identifying the street sign or the pedestrian with the highest level of accuracy makes sense, and that shouldn’t be a surprise,” said Gordon Cooper, product marketing manager at Synopsys. “Sometimes it is because a customer will give us a network and say, ‘Tell us the benchmark we’re going to compare against your competitor for how fast you can run this.’ That network has a built-in accuracy. So we have to meet the accuracy that the GPU can do in an efficient accelerator. We’re down to 8 bits rather than 32 bits of floating point — eight bits of integer. To us in the accelerator space, we have to figure out how to provide the same amount of accuracy in a smaller area with more power efficiency. There’s a challenge of quantizing 32 bits down to 8 bits, or maybe some combination of 8 and 16 bits and maybe a few layers. There are multiple cameras and increasing frame sizes, so the data movement is a challenge. Even if you throw a ton of multiplies at it, how do you move the data around to do that efficiently?”

Every time external memory is utilized, it impacts performance and power consumption. “How do you bring in data and maximize the goodness of that data before you have to go out to get another chunk? On top of that, the trend has been to higher levels of performance,” said Cooper. “Hundreds of teraflops of performance is a starting point for ADAS, which is sort of a catch-all. We see radar and lidar as well — people using transformers and convolutional neural networks for point clouds. It used to be very much of a DSP problem, and now we’ve seen it’s a DSP/neural network problem.”
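One common answer to “maximize the goodness of that data before you have to go out to get another chunk” is tiling, where a fetched block is fully reused before the next one is brought in from external memory. The sketch below is illustrative only and is not drawn from any vendor’s accelerator:

```python
# Minimal sketch of tiling for data reuse: each block of the operands is reused across
# a full block of output while it is resident locally, instead of being re-fetched per operation.
import numpy as np

def tiled_matmul(a, b, tile=64):
    """Blocked matrix multiply over square matrices whose size is a multiple of `tile`."""
    n = a.shape[0]
    c = np.zeros((n, n), dtype=a.dtype)
    for i in range(0, n, tile):
        for j in range(0, n, tile):
            for k in range(0, n, tile):
                c[i:i+tile, j:j+tile] += a[i:i+tile, k:k+tile] @ b[k:k+tile, j:j+tile]
    return c

a = np.random.default_rng(3).standard_normal((256, 256)).astype(np.float32)
b = np.random.default_rng(4).standard_normal((256, 256)).astype(np.float32)
print(np.allclose(tiled_matmul(a, b), a @ b, atol=1e-2))  # same result, better locality
```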

Conclusion
Chip design is changing rapidly due to ML/DL/AI, and so are the chips themselves.

“ML opens the floodgates in a manner where new innovative ideas are constantly coming up,” Mitra said. “You are not making another IP that just anybody else can do and adding some better design just from the perspective of design. It’s more about new innovative ideas coming on board and many other designs that have come on board, which have their own tradeoffs. But the floodgates open because everybody will look at these things and see an opportunity to build something that is custom or some new design technology. You can enable some markets that other people cannot, and give 10X or 100X improvement. Again, the floodgates open because there’s no specification, no one way to do a certain thing. That makes it complex but also challenging, while allowing opportunities to extend the boundaries of what we know today.”


