The Multiplier And The Singularity

AI makes interesting reading, but physics will limit just how far it can go and how quickly.


In 1993, Vernor Vinge, a computer scientist and science fiction writer, first described an event called the Singularity—the point when machine intelligence matches and then surpasses human intelligence. And since then, top scientists, engineers and futurists have been asking just how far away we are from that event.

In 2005, Ray Kurzweil published a book, “The Singularity Is Near,” in which he extended the hypothesis that artificial intelligence (AI) would enter a ‘runaway reaction’ of self-improvement cycles. He suggested that, with each new and more intelligent generation appearing more and more rapidly, the result would be an intelligence explosion producing a powerful superintelligence that qualitatively surpasses all human intelligence. Various dates have been assigned to when this would happen, with the current consensus being 2040, but even Elon Musk fears that in five years “AI will have become dangerous.”

But it’s not clear that AI and the march toward the Singularity are even close to reality. In five years, we may have one more technology node under our belt, meaning that we can expect twice the number of transistors that we have today. But while power per transistor may drop a little, heat will continue to limit what can be done with those chips. Many chips today cannot use all of their compute power at the same time due to thermal limitations.

If we rewind the clock a few decades, we can trace what got us to this point.

The heart of computing
At the heart of every advance has been an advance associated with the multiply operation, along with the ability to move data into and out of those multipliers and to have an element of programmability associated with them.

“The multiply is the most noticeable arithmetic operation, and plays a central role in the computation of many essential functions — filters, convolutions, transforms, weightings,” says Chris Rowen, CEO of Cognite Ventures. However, Rowen always warns against ignoring the other aspects mentioned.
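To make the central role of the multiply concrete, here is a minimal Python sketch of a direct-form FIR filter, one of the functions Rowen lists. The function name, the tap values and the multiply counter are illustrative assumptions, not taken from any product discussed here. It shows that every output sample is a sum of products, so the number of multiplies grows with both the input length and the number of taps.

```python
def fir_filter(samples, taps):
    """Direct-form FIR filter; returns (outputs, number of multiplies used)."""
    outputs = []
    multiplies = 0
    for n in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:                    # skip samples before the start
                acc += tap * samples[n - k]   # one multiply-accumulate
                multiplies += 1
        outputs.append(acc)
    return outputs, multiplies


# Hypothetical 2-tap averaging filter on four samples.
outputs, multiplies = fir_filter([2.0, 4.0, 6.0, 8.0], [0.5, 0.5])
print(outputs)      # [1.0, 3.0, 5.0, 7.0]
print(multiplies)   # 7
```

A single-cycle multiplier matters because that inner multiply-accumulate line is where essentially all of the arithmetic happens.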

The first major advance was wireless communications and the rise of the Digital Signal Processor (DSP). It provided single-cycle multiply operations, which until then had been available only in fixed-function hardware. “Wireless communications used to be seen as the epitome of hard compute problems,” says Samer Hijazi, senior architect in the IP Group of Cadence. “It has been and continues to be one of the hardest compute problems. It is an NP-complete (nondeterministic polynomial-complete) problem. The DSP gave you a wide array of multipliers, specifically an array of fixed-point multipliers. How many bits can you trust and use? As people learn more about what is needed, the type of accuracy needed is evolving.”
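The question of how many bits can be trusted is easier to see with a small fixed-point example. The sketch below assumes the common Q15 format (16-bit signed values with 15 fractional bits). The format choice and the helper names are assumptions for illustration, not something specified by Hijazi. It shows a fixed-point multiply with rounding and saturation, and the small error that appears when the operands are not exactly representable.

```python
Q = 15                       # fractional bits in the assumed Q15 format
Q15_MAX = (1 << Q) - 1       # largest representable integer code, 32767
Q15_MIN = -(1 << Q)          # smallest representable integer code, -32768


def saturate(v):
    """Clamp an integer to the 16-bit signed range."""
    return max(Q15_MIN, min(Q15_MAX, v))


def to_q15(x):
    """Convert a real value in [-1, 1) to its nearest Q15 integer code."""
    return saturate(round(x * (1 << Q)))


def from_q15(v):
    """Convert a Q15 integer code back to a real value."""
    return v / (1 << Q)


def q15_mul(a, b):
    """Multiply two Q15 codes: the raw product has 30 fractional bits,
    so round to nearest and shift back down to 15, then saturate."""
    product = a * b
    return saturate((product + (1 << (Q - 1))) >> Q)


print(from_q15(q15_mul(to_q15(0.5), to_q15(0.25))))   # 0.125 (exact)
print(from_q15(q15_mul(to_q15(0.1), to_q15(0.1))))    # 0.010009765625
```

The second result differs from 0.01 by roughly one part in a thousand, which is the kind of accuracy trade-off a designer weighs when choosing a bit width.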

As applications get more complex, they tend to use a rich variety of arithmetic. “The computation often uses a mix of bit precisions (8b, 16b, 32b, and sometimes odd bit-lengths) and a mix of data formats (integer, fixed point, floating point),” explains Rowen. “This means that an implementation typically needs sufficient flexibility to cover a mix of arithmetic operations, bit precisions and data formats — not just a single form of multiply — to handle the necessary computation without compromising accuracy, efficiency or programmer productivity too much.”

The birth of AI
Artificial intelligence always has been an element of science fiction and this, like many other things in the technology world, does have an impact on the course of development. “For AI, there is one algorithm that has made a big comeback and has enabled the whole industry to rise again,” says Hijazi. “It is an algorithm from the late ’90s called Convolutional Neural Networks (CNN).”

At the crux of it, convolution is just a 3D filter. “It performs a repeated filter that is applied to an entire scene,” explains Hijazi. “It is looking for a specific pattern that you are correlating with every location in the scene and trying to see if it exists. You are doing multiples of patterns at a time and you are doing it in layers. In the first layer, you are looking for some pattern and creating a pattern correlation map or a feature map and then running another correlation map on the first map produced, and so on. So, I am building a sequential pattern layers on top of each other. Each of them is limited in some field of view.”
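The layered pattern matching described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it uses a single-channel 2D image rather than the 3D case, omits the nonlinearities and learned weights a real CNN would have, and the image and patterns are made up for the example. It slides a small pattern over every location to build a feature map, then runs a second pattern over that map.

```python
def correlate2d(image, pattern):
    """Slide `pattern` over `image` and return the map of correlation scores."""
    ph, pw = len(pattern), len(pattern[0])
    ih, iw = len(image), len(image[0])
    feature_map = []
    for r in range(ih - ph + 1):
        row = []
        for c in range(iw - pw + 1):
            acc = 0
            for i in range(ph):
                for j in range(pw):
                    acc += image[r + i][c + j] * pattern[i][j]  # one multiply per tap
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Hypothetical scene: dark on the left, bright on the right.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
vertical_edge = [[-1, 1],
                 [-1, 1]]

layer1 = correlate2d(image, vertical_edge)
print(layer1)   # [[0, 2, 0], [0, 2, 0]]

# Second layer: look for the edge response persisting across two rows.
layer2 = correlate2d(layer1, [[1], [1]])
print(layer2)   # [[0, 4, 0]]
```

Each pattern only sees a small window, which is the limited field of view Hijazi mentions. Stacking layers is what lets later patterns respond to larger structures.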

Convolutional Neural Networks were first developed by Yann LeCun at AT&T Bell Labs in the late 1980s. He later became founding director of the NYU Center for Data Science and is currently director of AI research for Facebook. The first application was an attempt to recognize the zip codes on letters. “It did not become mainstream because they did not have the necessary compute power,” points out Hijazi. “It was only the availability of massive GPUs that it became possible to show the superiority of the algorithms over the ones that had been developed by the experts.”

Technology direction
But while the multiplier may be important, it is just one piece of a system. “Even an extreme vision processor, built to sustain hundreds of multiplies per cycle for convolutional neural network inner loops, dedicates little more than 10% of the core silicon area to the multiply arrays themselves,” says Rowen. “The other area is allocated to operand registers, accumulators, instruction fetch and decode, other arithmetic operations, operand selection and distribution and memory interfaces.”

The modern-day graphics processing unit (GPU), which is widely used to implement CNNs, also has an extensive memory sub-system. “Another piece that is essential for graphics is the massive hierarchical memory sub-system where data is moving from one layer to another layer in order to enable smooth transitions of pixels on the screen,” says Hijazi. “This is essential for graphics but not as needed for AI tasks. It could live with a memory architecture that is less power hungry.”

Another solution being investigated by many is the Field Programmable Gate Array (FPGA). “FPGAs have many DSP slices and these are just an array of fixed point multipliers,” continues Hijazi. “Most of them are 24-bit multipliers, which is actually three or four times what is needed for the inference part of deep learning. Those DSP slices have to be coupled to the memory hierarchy that would be utilizing the FPGA fabric to move the data around. The power consumption of an FPGA may not be that much different from a GPU.”

Rowen provides another reason for favoring programmable solutions. “Very few applications are so simple and so slowly evolving that they can tolerate completely fixed-function implementations. Programmability may come in the form of FPGA look-up tables and routing, or in the form of processors and DSPs, but some degree of programmability is almost always required to keep a platform flexible enough to support a set of related applications, or even just a single application evolving over time.”

But the multipliers in GPUs and the DSP slices in FPGAs may not be ideal for AI. “It may be possible that only 4-bit multiplication is necessary,” says Hijazi. “So the race to reduce the cost of the multiplier is at the core of how we can advance AI. The multiplier is expensive, and we need a lot of it. It limits the flexibility of this newfound capability.”
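A rough way to see why narrower multipliers are attractive is sketched below in Python. It rests on two assumptions that are not from the interviews: the number of partial-product bits in a simple n-by-n array multiplier (n squared) is used as a first-order proxy for cost, and weights are reduced to a few bits with plain symmetric uniform quantization. Real multipliers use techniques such as Booth encoding, and real networks may need more careful quantization, so the numbers are indicative only.

```python
def partial_product_bits(n_bits):
    """Partial-product bits in a simple n x n array multiplier (cost proxy)."""
    return n_bits * n_bits


def quantize(weights, n_bits):
    """Symmetric uniform quantization to signed n_bits integers.
    Returns the integer codes and the scale needed to recover real values."""
    qmax = (1 << (n_bits - 1)) - 1            # e.g. 7 for 4 bits
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) for w in weights], scale


for bits in (24, 8, 4):
    print(bits, partial_product_bits(bits))   # 24 576, then 8 64, then 4 16

# Hypothetical weights squeezed into 4 bits.
codes, scale = quantize([1.75, -0.8, 0.3], 4)
print(codes)                         # [7, -3, 1]
print([c * scale for c in codes])    # [1.75, -0.75, 0.25]
```

By this proxy, a 24-bit multiplier carries 36 times the partial-product bits of a 4-bit one, at the price of the rounding visible in the recovered weights.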

It would seem likely that chips dedicated to AI will be produced. “2017 will see a number of chips targeted at AI and several demonstrable technologies by year end,” predicts Jen Bernier, director of technology communications for Imagination Technologies. “As companies develop chips for AI, they need to consider the increased demands to process data locally and relay data to the cloud for onward processing and data aggregation.”

The reality today
So how close to the Singularity are we? “The algorithm that we are using today was created in the ’90s and has created a lot of hype in the media,” says Hijazi. “But this all stems from one algorithm and its ability to solve one interesting problem — computer vision. The hype about extrapolating this capability has created a lot of enthusiasm, and the media loves the original premise of AI from the ’50s that may be coming to roost. AI did not make a significant leap. One algorithm was developed that enabled one advance.”

People are finding ways to use that algorithm for other tasks, such as Google using it to play the game of Go. Another example is related to voice recognition. “Virtual assistants will be virtually everywhere,” says Bernier. “Voice recognition and interaction will be incorporated into an increasing number of devices and we’ll see new classes of hearable devices. The technology will continue to evolve for more and more interactivity.”

Other advances expected in this area are discussed in the Predictions for 2017.

But does any of this directly lead us to the singularity? Would an AI have been able to invent the algorithms or the hardware structures that got us to this point? AI may well be able to help us optimize what we have, but that is not the Singularity. Engineers, it would seem, see the future in a more rational manner.


  • MD

    Good article.

  • witeken

    No discussion of the Nervana-Intel ASIC approach?

    • Brian Bailey

      Nervana is an approach to improve the performance on the learning side of the equation, something that was not directly covered in this article. It is the inferencing side that will be in cars, IoT devices, etc., and thus provides the limitation in what can be achieved given the available power budget. The learning side requires an order of magnitude or two more computing than inferencing, again something that makes the Singularity even less likely to happen. In order to even approach the human brain we have to make learning as cheap as inferencing.

  • Another Angle

    The one major point missed in this article is that NN architectures are evolving to be more efficient for a given hardware platform. So at the same time hardware nodes will get more efficient, NN architectures will too. This may very well support the 2040 date. However, I doubt the general term of Singularity captures the practical realities of how AI will be deployed and how AI will change the physical and virtual operations that make up the world. The fear of AI only becomes reasonable when it is able to impact both the physical and virtual world pervasively and at a detailed level. Watson is already an interesting AI, but has no practical impact on my world at all. AI will eventually be intelligent enough to be a threat, but I believe it will happen at a pace that humanity is able to experience crises and implement control mechanisms so that it evolves in a more constructive way. Change is inevitable but also manageable.

    • Brian Bailey

      Thank you for your comment. I agree that hardware architectures will evolve, and that was one of the points I was trying to make about how we are currently testing the precision needed for many AI tasks. Using 24-bit precision is highly wasteful for inferencing, although it may not be enough for learning. I also agree with you that without proper control and regulation, the technology can be badly applied and thus indirectly create risk. As an example, leading the industry to think that AI is currently capable of, and should be trusted to, drive a car is perhaps dangerous.