Tools To Design CNNs

If neural network architectures are simpler than a CPU, why is it so difficult to create them?

Popularity

Convolutional neural networks are becoming a mainstay in machine learning and artificial intelligence, allowing a network of distributed sensors to collect data and send them to a central brain for processing.

This is a relatively simple idea in comparison to today’s technology, and the idea of the convolutional neural network has been around for some time. But building them into hardware and software is proving rather daunting, given the complexity of the systems they inhabit.

“Almost every company is working on it—Google, Intel, Qualcomm, Lattice Semiconductor, NVIDIA,” said Norman Chang, chief technologist for the semiconductor business unit at Ansys. “In terms of doing the architecture, it is simpler than a CPU. So the design of a convolutional network chip I don’t think is more complicated.”

Those designs also can leverage standard tooling from RTL on down. But there is a mindset change required to make all of this work. Neural networks lend themselves to very high degrees of parallelism, using very wide data paths or very large numbers of cores operating in parallel.

“You’re talking about some big numbers in terms of the width of everything or the number of replications of that function,” said Chris Rowen, CEO of Cognite Ventures. “As such, the capacity of the tools and the ability to deal with very wide data paths and very long wires is somewhat stressed in those circumstances.”
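The degree of parallelism Rowen describes is easy to see in the arithmetic itself. The sketch below (plain NumPy, with hypothetical layer sizes chosen only for illustration) writes out a small convolution layer as nested loops; every output value is an independent multiply-accumulate, which is exactly what hardware designers replicate across wide data paths or many cores.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
H, W, C_in, C_out, K = 8, 8, 3, 4, 3

x = np.random.rand(H, W, C_in)         # input feature map
w = np.random.rand(C_out, K, K, C_in)  # convolution filters

# Each output element is an independent multiply-accumulate, so all
# (H-K+1) * (W-K+1) * C_out dot products could run in parallel.
out = np.empty((H - K + 1, W - K + 1, C_out))
for i in range(H - K + 1):
    for j in range(W - K + 1):
        patch = x[i:i + K, j:j + K, :]
        for c in range(C_out):
            out[i, j, c] = np.sum(patch * w[c])

print(out.shape)  # (6, 6, 4)
```

Here that is 6 × 6 × 4 = 144 independent dot products from a toy layer; production networks expose millions, which is where the tool stress on wide data paths and long wires comes from.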

And because engineering teams want process portability, as well as the ability to make tradeoffs between area and power and clock frequency in the same way that they do on other kinds of high-performance logic design, the mainstream digital libraries, synthesis, place-and-route and simulation tools are directly relevant and used very heavily.

“Where it starts to look different is when you’re moving up in the stack and asking how to program this,” Rowen explained. “How do I think about prototyping an application that runs on this? That’s where the programming model differences between neural networks and other kinds of computational elements start to get pretty large.”

The real essence of a deep-learning application is embodied in its data set, which is used to train the network. This is the creative part that really defines the functionality of the system, but it isn’t written in a piece of C code or Matlab code or any other conventional programming language. Instead, it’s written in the data set.

“The actual structure of the neural network evaluation may be captured in a piece of C code or Python code or Matlab code or TensorFlow code, but that is really just a representation of what’s really a fairly simple and generic structure that has been trained,” said Rowen. “It’s sort of a skeleton on which the flesh of the training and other data has been applied. The nature of the application and the meaning of programming is really quite different in the context of these neural network-based applications. So it’s less driven by conventional programming expertise, and it’s less about somebody who pulls an all-nighter and comes up with a new algorithm embodied in a piece of C. It’s much more about tapping into a new data source that allows them to train in a very different way.”
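Rowen’s “skeleton” point can be made concrete with a pure-NumPy sketch. The forward-pass code below is generic and task-agnostic; swapping in a different set of learned weights turns the identical code into a different application. (The layer sizes and weight values here are arbitrary illustrations, not a real trained model.)

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, weights):
    """Generic skeleton: the same code for any task; the learned
    `weights` carry the actual functionality of the system."""
    a = x
    for W, b in weights[:-1]:
        a = relu(a @ W + b)
    W, b = weights[-1]
    return a @ W + b

rng = np.random.default_rng(0)
# Two different weight sets turn the one skeleton into two different functions.
weights_a = [(rng.normal(size=(4, 8)), np.zeros(8)),
             (rng.normal(size=(8, 2)), np.zeros(2))]
weights_b = [(rng.normal(size=(4, 8)), np.zeros(8)),
             (rng.normal(size=(8, 2)), np.zeros(2))]

x = rng.normal(size=(1, 4))
y_a = forward(x, weights_a)
y_b = forward(x, weights_b)  # same code, different behavior
```

Nothing in `forward` says what the network does; the data set that produced the weights does.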

Samer Hijazi, senior design engineering architect in the IP group at Cadence, said that for CNN technology to propagate properly, two products need to, and will, come to market in the near future.

First, new hardware architectures are needed. These will show up as chips and as IP, and will become available to chip and system designers. Second, enhanced tools are in the works that both enable the design of CNNs and allow the network to be optimized for power, automating network design for power-conscious applications.

“The TensorFlows and Caffes of the world either need to evolve or be augmented so that the design process is mindful of the power question at the architecture level,” he explained. “There has been significant research on this topic, and the groundwork is being done today in technology and science. What does it mean to design a power-conscious CNN or deep-learning algorithm for a particular application? Two concepts are being exploited and analyzed. One is redundancy elimination. The second is sparsity in the design. These are the two primary threads that people are trying to inject into the design process in order to assure that the models are as concise and as limited as they can be. These concepts need to be automated in the design tools so the power parameter becomes just one of the optimization dimensions in network design. Over the next two years, we will start seeing these tools gradually coming to market.”

Rowen noted that the kinds of tools at the higher level are different, and so in many ways programming is replaced by training. “Therefore, training tools replace programming tools and data management replaces software management. Thinking about verification, it’s actually pretty easy to get the hardware right. But now the correctness of the system has less to do with the correctness of the hardware or even the correctness of the program and much more to do with the adequacy and the correctness of the data set used for training. In the past if a flaw was discovered in some embedded system using traditional programming methods there would be some software guy who hit the heel of his hand to his forehead and said, ‘Oh, I didn’t think of that.’ Now there’s going to be some data manager who hits his palm to his forehead and says, ‘Ugh, my data set doesn’t have any examples of that situation.’ So the paranoia about verification doesn’t go away. It shifts to a new place.”
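The “my data set doesn’t have any examples of that situation” failure Rowen describes is, at its simplest, a coverage check on the training data. The fragment below is a deliberately minimal sketch with hypothetical class names; real data-set verification is far richer, but the shape of the check is the same.

```python
# Minimal data-set coverage check (hypothetical classes and labels):
# flag any expected situation with zero training examples.
from collections import Counter

expected_classes = {"pedestrian", "cyclist", "car", "truck"}
training_labels = ["car", "car", "pedestrian", "truck", "car"]

counts = Counter(training_labels)
missing = expected_classes - counts.keys()
if missing:
    print(f"data set has no examples of: {sorted(missing)}")
```

Running this flags `cyclist` as uncovered, which is exactly the kind of gap a data manager would need to catch before deployment rather than after.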

In addition, the tools are likely to come from less likely sources than EDA users may be accustomed to, given that different organizations will have different approaches, including whether the technology is applied pre-training or post-training, Hijazi said. One example is quantization, which can include converting the network from floating point to fixed point. “This is a topic in discussion now. Industry and academia have converged on this as a good idea because deep learning does not need to be in floating point. However, all of the frameworks design networks in floating point, and the quantization happens as an afterthought after the network is designed. Even today, how to do proper quantization is a topic of debate, and there is no specific tool in the industry that can convert floating point to fixed point.”
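To make the floating-point-to-fixed-point step concrete, here is one simple post-training scheme, symmetric scaling of float32 weights into int8 plus a scale factor. This is a sketch of the general idea only; the function names are illustrative, and real quantization tools handle per-channel scales, activations, and accuracy recovery.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric post-training quantization: float32 -> int8 + scale.
    Illustrative sketch; real tools also handle activations, per-channel
    scales, and accuracy recovery."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(2)
w = rng.normal(size=(16, 16)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Rounding error per weight is bounded by half a quantization step.
err = np.max(np.abs(w - w_hat))
```

Because the rounding happens after training, any accuracy loss is baked in, which is why Hijazi points to quantization-aware training as the less developed next step.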

A next step will be training in a quantized format, which is even less developed at the moment, he said.

However, in the area of redundancy elimination, the technologies are more mature. Redundancy elimination is the notion of designing the network and pooling its layers so that each layer is as small as possible and does not contain redundancy, so that two filters aren’t doing the same job. “The ideas are clear. But how to augment existing frameworks to support that is a question of software architecture that is not easy to answer. There is no one answer. It can be part of a framework design tool itself, or it can be a layer on top of it. Both are equally valid options,” Hijazi added.
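One simple way to detect “two filters doing the same job” is to compare filters pairwise by cosine similarity and flag near-duplicates as candidates for removal. The sketch below assumes this similarity-based approach (the function name and 0.95 threshold are illustrative choices, not a specific tool’s method).

```python
import numpy as np

def redundant_filter_pairs(filters, threshold=0.95):
    """Flag filter pairs whose weight vectors point in nearly the same
    direction, i.e. two filters doing (almost) the same job."""
    flat = filters.reshape(filters.shape[0], -1)
    unit = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    sim = unit @ unit.T                      # pairwise cosine similarity
    pairs = []
    n = filters.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(sim[i, j]) > threshold:
                pairs.append((i, j))
    return pairs

rng = np.random.default_rng(3)
f = rng.normal(size=(8, 3, 3, 3))   # 8 random 3x3x3 filters
f[5] = 1.01 * f[2]                  # plant a near-duplicate filter
dupes = redundant_filter_pairs(f)   # expect (2, 5) among the results
```

Whether a check like this lives inside the framework or as a layer on top of it is exactly the software-architecture question Hijazi raises.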

Other tool companies are currently taking the approach of using compression and pruning techniques in existing development tools and hardware in order to reduce bandwidth and computations, noted Gordon Cooper, product marketing manager for Synopsys’ embedded vision processors.

For these hardware providers, the tool is easily half the battle, he said. “Half of our R&D staff is focused on the tool side, not just the R&D side, because you have to have tools that are easy to use and effective and efficient. At the end of the day it’s a moving target which is an interesting challenge. We are on our third generation of CNNs because the industry is pushing toward higher resolution, and the tools become more and more important as we move forward.”

At the same time, CNN designs illustrate the huge chasm between hardware and software designers, according to Randy Allen, director of advanced research at Mentor, a Siemens Business. “I am one of the oddballs in the world because I have done some of both. My thesis, back in 1983, was one of the first on how to do matrix multiplication. But I also did VCS, so I’ve done the hardware side of things, as well. As a general rule, there is a huge chasm that is very hard to bridge, and that’s one of the things that comes across here.”

Having tools to help with power challenges will be especially helpful. “If you’re looking at using a GPU for this type of work, you’ve got a lot of compute power there. If you can gauge the size of the matrix you’re working on and the amount of computation it’s going to take, you can shut down parts of the GPU, allocating the power or doing the right allocation of resources,” Allen said. “Dividing up the matrix in a way that optimizes power is something you can do. For instance, if you have to do a 1,000 x 1,000 matrix multiply and you have a 128-processor GPU, it wouldn’t be outrageous to take 64 of the processors and let them work on that. Then you can do another problem with the other 64, or gate those 64 off.”
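Allen’s partitioning idea reduces to splitting the matrix into independent blocks, one per active processor, so the unused processors can be power-gated. The NumPy sketch below simulates that decomposition sequentially (each loop iteration stands in for one processor’s share of a 1,000 x 1,000 multiply using 64 of a hypothetical 128 units); it is an illustration of the partitioning, not a real GPU scheduler.

```python
import numpy as np

def blocked_matmul(a, b, n_workers):
    """Split the rows of `a` into n_workers independent blocks; each block
    is a standalone sub-multiply, so units not assigned a block could be
    gated off. Sequential simulation of the partitioning idea."""
    out = np.empty((a.shape[0], b.shape[1]))
    blocks = np.array_split(np.arange(a.shape[0]), n_workers)
    for rows in blocks:            # each iteration = one processor's share
        out[rows] = a[rows] @ b
    return out

a = np.random.rand(1000, 1000)
b = np.random.rand(1000, 1000)
c = blocked_matmul(a, b, n_workers=64)  # use 64 of a hypothetical 128 units
print(np.allclose(c, a @ b))  # True
```

The power-aware question a tool would answer is how to choose `n_workers`: more blocks finish sooner but burn more units, while fewer blocks let more of the chip stay gated off.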

While it isn’t completely far-fetched to imagine some industry standard approaches down the line, it’s more likely that startups will bring solutions to market sooner.

“What will probably happen is that a startup will come along and build the right tool that helps divide things up,” Allen said. “Then it will go in that direction. Part of the reason I say a startup is that the EDA companies and the hardware companies really don’t have enough knowledge or insight into the way applications are run to come up with something like that. And the application guys really don’t have enough insight into the problems of power, or that power even is a problem, to focus on it. So it will probably be someone who spans the two who comes along and comes up with something at some point.”


