From AI Algorithm To Implementation

Experts at the Table, part 2: The transformation from algorithm to implementation has some significant problems that are not being properly addressed today.


Semiconductor Engineering sat down to discuss the role that EDA has in automating artificial intelligence and machine learning with Doug Letcher, president and CEO of Metrics; Daniel Hansson, CEO of Verifyter; Harry Foster, chief scientist of verification at Mentor, a Siemens Business; Larry Melling, product management director for Cadence; Manish Pandey, Synopsys fellow; and Raik Brinkmann, CEO of OneSpin Solutions. What follows are excerpts of that conversation. Part one can be found here.

SE: There is a separation between inferencing and learning. Each task has different needs, and yet algorithm development often does not look at the needs of inferencing. That is another separation that has to be dealt with.

Melling: It is an interesting problem. It is similar with imaging algorithms, where the algorithm developers work in Matlab and are working with floating point mathematics and matrix math. The job is to come up with the right algorithm, and then you pass that over to a poor deployment person who has to try and make sense of it. They start to slice and dice and cut it and stream the data, converting from floating point to fixed point approximations. All of that starts to happen, and at the end what is going to drive their success or failure is the success of the end-market solution. It is similar to what happens in the GPU world where there were matters of interpretation. Much was in the eye of the beholder. People who do the approximations and manage the process of making it efficient and deployable will be the ones who will win in the marketplace. This is crossing over the line from theoretical to deployable, and doing that efficiently so that it becomes available to the mass consumer market.
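The float-to-fixed conversion Melling describes can be sketched in a few lines. This is an illustrative example, not any particular tool's flow: it converts a floating-point coefficient into a Q2.14 fixed-point word and back, showing the approximation error a deployment engineer has to manage.

```python
# Illustrative Q-format fixed-point conversion (hypothetical helper names).
# A value is stored as an integer scaled by 2**frac_bits; precision is
# traded for cheap integer arithmetic on the target hardware.

def to_fixed(x, frac_bits=14):
    """Convert a float to a fixed-point integer with frac_bits fractional bits."""
    return int(round(x * (1 << frac_bits)))

def from_fixed(q, frac_bits=14):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)

coeff = 0.7071  # e.g. a filter coefficient from a floating-point Matlab model
q = to_fixed(coeff)
approx = from_fixed(q)
print(q, approx)  # the round-trip error is bounded by 2**-15
```

The quantization step size (here 2^-14) is the knob: too coarse and the algorithm's accuracy degrades, too fine and the hardware cost grows, which is exactly the trade-off the deployment engineer must make without breaking the algorithm developer's intent.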

Letcher: Those two tasks are very different from a verification perspective, as well. You can verify the inference as a complete unit. You have a device where you can pass a few vectors through in a traditional way, randomly or otherwise, and determine if it does what you intended it to do. But the training problem is almost hopeless to verify as a combined hardware/software solution. You can never run the necessary amount of data.

Brinkmann: If you look at the GPU companies, the winners won partly on superior hardware, but more so because of the platform. Once they had enough users, a nice API, and quality measures in place, they were hard to displace.

Melling: And their verification approaches, too. Nvidia spent a fortune on fast platforms, emulation, to be able to run so many vectors.

Brinkmann: But if you talk to anyone who is building an AI chip for the edge, they all build a software stack. They start with a model from Caffe or TensorFlow, and they take over the whole task of mapping that to the architecture. They want you to use an API and measure against quality metrics, but they keep the mapping a secret. Whether it is an FPGA synthesis algorithm underneath, or a new architecture, as a user you don't care. You didn't care when it was a GPU, either. There is an interface, you program against that, and the ones who could best map that to the architecture were the winners.

Letcher: That is true when you are selling a general-purpose device. That is when a different person is going to program it. If you are building a device that will go into a self-driving car, and you are building the entire device as a unified piece of hardware/software, then you have full visibility.

Brinkmann: That is true, but I don’t believe it will scale without using the platform idea. It will fail because the investment in such a platform will be huge and if it is used only once, it will not be a good investment. You need to have something that can be reused, even specifically in self-driving cars where you get new data, new requirements and you cannot exchange the hardware in the field. You have to have a reusable hardware platform.

Letcher: Agreed. It needs to be able to evolve. TensorFlow is becoming the abstraction level.

Foster: This is key because the actual hardware defines the specific terms of the optimizations. But at the same time, there has to be the abstraction just because of the investment levels that are involved.

Hansson: The analogy with the GPU is good and relevant. But is there a difference with debug? When you debug machine learning it is hard to understand what is going on. But if you don’t like a picture, you can probably trace it back to needing more of this or more of that.

Melling: That is a big challenge. I am not sure that the programming that occurs on the low-level units making up the matrix that becomes the neural network is even understandable in context. Debugging at the instruction level is not even like a hardware/software thing. While I am not familiar with microcode debug on a GPU, I assume it is more like that, in that you don't think of people diving in and looking at execution on any particular compute component.

SE: In order to go from floating point in TensorFlow, or whatever algorithmic platform, into a fixed-point representation, you have to go through quantization. Now you are manipulating the network and that mapping may not be deterministic. The execution remains deterministic, but when you introduce an error into a weight, do you know what you have affected? How do we verify the transformation?

Melling: Quantization creates accuracy problems.

Letcher: Are we talking about algorithm verification or traditional RTL verification? By the time it gets to the RTL world, the quantization is determined. It has been reduced from 64- or 32-bit floating point to 16 or even 8 bits.

Melling: There is an intermediate. There is the algorithm, which is largely developed in floating-point using the full capabilities and compute of the cloud, and then there is the quantization that happens for deployment. That is the next level of the algorithm.

Letcher: And they are pushing to use reduced precision floating-point.

Brinkmann: Has anyone thought about making sure that when you make mathematical transformations, they are actually sound? Precision is not exactly the same issue. Quantization is a deliberate decision, and if you are in fixed point you may be able to keep everything in check. But if you use floating point and shift a few operations around to optimize or map to specific hardware, you are actually introducing another error. That is less transparent to most people.
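Brinkmann's point is easy to demonstrate: floating-point addition is not associative, so merely reordering operations to suit a hardware datapath can change the result. A minimal example:

```python
# Floating-point addition is not associative: reordering the same three
# operands produces different results, because intermediate rounding differs.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # a + b cancels exactly to 0.0, then + 1.0
right = a + (b + c)  # b + c rounds back to -1e16 (1.0 is below one ulp of 1e16)

print(left)   # 1.0
print(right)  # 0.0
```

A compiler or mapping tool that rewrites a reduction tree for parallel hardware is silently making this kind of choice on every sum, which is why the transformation itself, not just the end accuracy, deserves scrutiny.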

Pandey: For the longest time, it was accepted that you would do FP32 computation, but for most calculations you do FP16, even for training. For inferencing, or when you have hardware resource constraints, people ask if they can compress it further. Can you use an 8-bit representation? That is when you start to do quantization. It is not that you just throw away bits from the model. It would perform terribly. There is a loop involved. You may start with a 16-bit model and reduce it to 8 bits. Then you retrain the weights, readjust things, and potentially cluster and center them. Why things work in the end is an open area of research, but a lot of it is proof through results. If you suffer a 1% loss in accuracy, that may be fine.
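The first step of the loop Pandey describes, quantizing weights to a low-bit integer representation before any retraining, can be sketched with NumPy. This is a generic uniform symmetric scheme for illustration, not the method of any specific framework:

```python
import numpy as np

def quantize(weights, bits=8):
    """Uniform symmetric quantization: map floats to signed integers plus a scale."""
    qmax = 2 ** (bits - 1) - 1              # 127 for 8 bits
    scale = np.abs(weights).max() / qmax    # one step of the integer grid
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from integers and the scale."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)  # stand-in for a weight tensor

q, s = quantize(w)
w_hat = dequantize(q, s)
err = np.abs(w - w_hat).max()
print(f"max per-weight error: {err:.5f} (step size {s:.5f})")
```

The per-weight error is bounded by half a step, but the network-level accuracy loss is not predictable from that bound alone, which is why the retraining and re-evaluation loop Pandey describes is needed, and why the final judgment is empirical.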

Brinkmann: So the best I can get is, ‘It kind of works.’

Pandey: That is the whole thing. When people came up with deep learning architectures, they found that they were better than anything ever considered in the past. It is a different thing to find out why they work, and yet when you put a couple of pieces of paper on a stop sign, the network sees something else entirely.

Brinkmann: When I started work on proof engines, I was debugging one that used some heuristics. I found a bug in one of the heuristics, and after fixing it, the results got worse. What do you do? Should we leave the error in because it had demonstrated that it works? This sounds similar. Maybe you don't need to look for the last RTL bug, or the bug in a transformation, because if you just assess the results, that may be enough. So why do we need all of this verification? Just run it through. What is the right middle ground? I don't believe it will be just measuring at the end.

Pandey: As an alternative, do we wait until we have the most perfect explainable system?

Hansson: The question is what is good enough?

Brinkmann: Having a platform that gives you a predictable result with no obvious bugs—so you can trust the platform and you can trust the process that you are using to map the algorithm to the platform—is important because of debugging. If you can’t pinpoint a problem easily, you want to be able to debug it on the algorithmic side and see what needs to change there. If you are trying to optimize the algorithm against a bug in the hardware, good luck. There is a strong demand for having a platform that you can trust, and that includes the software-to-hardware mapping.

Letcher: There is an extension to that, which is ISO 26262 and similar standards, where you have to formally prove, especially in automotive applications, that it is going to work. How does that overlap with these types of issues? You can't just leave in errors that seem to work.

Pandey: Let us separate two things. Early on we talked about two tightly interlocked cycles of verification. You have a hardware level of abstraction. What the hardware does when you perform a matrix multiplication had better work. You have an interface, and the hardware had better honor it. Now we are talking about taking AI algorithms and mapping them onto hardware. That is algorithm creation itself. Why does it work? How does it work? There are only existence proofs that it works. You have a data set, and you hope it has been curated well.

Melling: Until someone comes out with one that works better.

Pandey: If you have an ML algorithm with 99% test accuracy, I can use that for spam filtering or for reordering and prioritizing my mailbox. That is pretty good. But if I take 99% accuracy and apply that to my car, or to a cancer diagnosis, you are missing out on 1%. Is that good enough? That is a societal question. A doctor's assistant may help screen results so fewer cases slip through, and so long as there are no false positives that may be fine. You have to be very careful about how you deploy. DARPA is working on explainable AI. Can you explain how you are making these decisions? We want to make sure that whatever hardware abstraction is presented to the software system is high fidelity. How do we get the whole process, including the dataset, integrated into a good flow?

Melling: I read an interesting piece about the economics of AI. The point was that machine learning and AI will commoditize prediction, and the value becomes judgment. Getting to 90% or 99% is automatable, but it is that last 1% where judgment has to be applied, and how do you apply judgment when you fall into that space? That will be the differentiator for providing big value. Here is what we know, and here is what we don't know, and the value is in the don't-knows. Invest in the risk.
