Neuromorphic computing and machine learning dominate the leading edge of chip design this year.
By Jeff Dorsch & Ed Sperling
Machine learning, artificial intelligence and neuromorphic computing took center stage at Hot Chips 2017 this week, a significant change from years past, when the focus was on architectures that addressed improvements in speed and performance for standard compute problems.
What is clear from the presentations is that the bleeding edge of computing has shifted significantly in the past 12 months. While performance and power are still critical elements, the primary concern is no longer how to cram more transistors onto a piece of silicon or how to put multiple chips together. It is now much more geared toward new architectures that can bridge the digital and physical worlds.
Phillip Alvelda, a former DARPA program manager and current CEO of Cortical.ai, said during his keynote that one of the big revelations among researchers is that the human brain makes sense of the external world by simultaneously searching all parts of the brain for memories. This is something of a radical shift in direction, because research until now has assumed that different regions of the brain respond to different stimuli. Distributed memory is an entirely different twist. Moreover, he said the cerebellum has been found to contain more neurons than the rest of the brain, which allows a person to anticipate future states or actions. This is like pre-fetch on steroids.
“What we do is build common code across all memory systems,” Alvelda said, adding that the next generation of processors will be able to read intentions, which he described as “digital telepathy,” and combine collective sensory data from people to achieve “collective cognition.” He also pointed to a new concept, “digital empathy.”
Fig 1: Alvelda discussing the integration and synthesis component in the human brain.
Other presentations were equally thought-provoking. The Google Brain team gave two presentations on the second day of Hot Chips. Software engineer Cliff Young spoke about “domain-specific architectures” and described Moore’s Law as “embattled” under current technological conditions. The original Tensor Processing Unit works as an accelerator for existing servers, attached much the way a floating-point co-processor once was, he said. It was designed with DDR3 memory interfaces, which limited the chip’s bandwidth. The TPU has five main CISC instructions, he noted, and uses systolic execution in a matrix array – a technique that dates back to the 1970s. When it comes to machine learning, Young said, “Inference prefers latency over throughput.”
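The systolic idea is easier to see in miniature. Below is a toy Python model of an output-stationary systolic array: each processing element owns one accumulator, and one multiply-accumulate wavefront sweeps across the grid per cycle. This is only a sketch of the dataflow, not Google’s design, which streams data through a far larger grid of hardware multiply-accumulate units.

```python
import numpy as np

def systolic_matmul(A, B):
    """Toy model of an output-stationary systolic array.

    Each (i, j) processing element holds one accumulator; on every
    "cycle" k it multiplies the A value flowing in from the left by
    the B value flowing in from above and adds it to its accumulator.
    """
    n, m = A.shape
    m2, p = B.shape
    assert m == m2
    acc = np.zeros((n, p))          # one accumulator per PE
    for k in range(m):              # one wavefront per cycle
        for i in range(n):          # rows of PEs
            for j in range(p):      # columns of PEs
                acc[i, j] += A[i, k] * B[k, j]
    return acc

A = np.arange(6).reshape(2, 3).astype(float)
B = np.arange(12).reshape(3, 4).astype(float)
assert np.allclose(systolic_matmul(A, B), A @ B)
```

The appeal of the structure is that operands move only between neighboring elements, so a large fraction of the chip can be multiply-accumulate hardware rather than wiring and control.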
Google Senior Fellow Jeff Dean spoke earlier in the day. “Deep learning is causing a machine-learning revolution,” he said. “Deep learning is transforming how we design computers.” He added, “More compute power is needed,” a theme echoed by other Day 2 speakers. Version 2 of the TPU will be used for training and inference, Dean said. The chips will be programmed via TensorFlow through the Google Cloud, he noted, while 1,000 Cloud TPUs will be made available for free to machine-learning researchers.
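The programmability Dean described means the hardware is reached through the same TensorFlow code used on CPUs and GPUs. Below is a minimal sketch of that programming model using the 1.x-era graph-and-session API that was current at the time; device placement and any TPU-specific configuration, which were not detailed in the talk, are omitted.

```python
import tensorflow as tf  # TensorFlow 1.x graph-and-session API

# Build a tiny compute graph. The same TensorFlow program can be
# targeted at CPUs, GPUs, or (with additional configuration) Cloud TPUs.
x = tf.placeholder(tf.float32, shape=[None, 4], name="x")
w = tf.Variable(tf.random_normal([4, 2]), name="w")
y = tf.matmul(x, w)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0, 4.0]]})
    print(out.shape)  # (1, 2)
```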
Paul Whatmough of ARM Research, who also serves as a research associate in the Harvard Architectures, Circuits, and Compilers program, hailed “the age of deep learning” and called for deeper networks, bigger data, and more computing power in the development of deep neural networks. Harvard is working on a DNN Engine to “process a group of neurons in parallel,” he said, using a 16nm system-on-chip for always-on applications. The chip is based on a Cortex-M0 core and packs on-chip SRAM.
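As a rough illustration of what processing “a group of neurons in parallel” means, a fully connected layer reduces to a matrix-vector product: each row of the weight matrix is one neuron, so a single pass computes every neuron’s output at once. The NumPy sketch below shows only that arithmetic; it is not a description of Harvard’s DNN Engine.

```python
import numpy as np

def dense_layer(x, W, b):
    """Evaluate a whole layer of neurons at once.

    Each row of W holds one neuron's weights, so the matrix-vector
    product computes every neuron's weighted sum in parallel -- the
    same structure a hardware DNN engine exploits with parallel MACs.
    """
    return np.maximum(W @ x + b, 0.0)   # ReLU activation

rng = np.random.default_rng(0)
x = rng.standard_normal(64)             # input activations
W = rng.standard_normal((128, 64))      # 128 neurons, 64 inputs each
b = np.zeros(128)
print(dense_layer(x, W, b).shape)       # (128,)
```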
Jamie Markevitch, principal engineer at Cisco Systems, described a network processor for a distributed router architecture. The chip has 672 processor cores and 2,688 threads. It employs a run-to-completion model and is programmed in C or assembly language, he said. The chip has 9.2 billion transistors and is fabricated with a 22nm process.
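In a run-to-completion model, a thread carries each packet through every processing stage before fetching the next one, rather than handing packets between pipeline stages. The toy Python sketch below illustrates the idea only; the actual chip is programmed in C or assembly, and the stage functions here are placeholders.

```python
import queue
import threading

def parse(pkt):          # placeholder stages; a real NPU program would
    return pkt           # do header parsing, route lookup, ACLs, etc.
def route(pkt):
    return pkt
def transmit(pkt):
    print("sent", pkt)

def worker(q):
    """Run-to-completion: this thread owns each packet from ingress to
    egress, then fetches the next; packets never migrate mid-flight."""
    while True:
        pkt = q.get()
        if pkt is None:              # shutdown sentinel
            break
        transmit(route(parse(pkt)))  # all stages, one thread, no yielding

q = queue.Queue()
t = threading.Thread(target=worker, args=(q,))
t.start()
for i in range(3):
    q.put(f"packet-{i}")
q.put(None)
t.join()
```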
The final session of Hot Chips 29 was devoted to next-generation processors developed by IBM, Advanced Micro Devices, Intel, and Qualcomm, all aimed at data centers. IBM’s Christian Jacobi described the new z14 microprocessor chipset, which is the basis for the company’s latest mainframe model. The z14 is made with a 14nm process. The chipset has two components, the CP chip and the SC chip, each of which is fabricated with 17 metal layers, Jacobi said. The CP chip has 10 cores. Remarking on the longevity of the mainframe market, Jacobi said, “You wouldn’t believe how much COBOL is out there.”
AMD’s Kevin Lepak talked up the company’s new EPYC processor series, based on the x86 Zen core. He emphasized the chip’s security measures, including an AES-128 memory-encryption engine with keys that are never exposed to the operating system. EPYC processors are packaged in multichip modules.
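Conceptually, that kind of engine sits at the memory controller: data is encrypted with a hardware-held key on its way to DRAM and decrypted on the way back, so software, including the OS, never touches the key. The Python sketch below, which assumes the third-party cryptography package and uses AES-128 in CTR mode, is only a model of that idea; AMD’s actual cipher mode and key management are hardware-specific and differ from this.

```python
import os
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

hw_key = os.urandom(16)  # AES-128 key held by hardware; the OS never sees it

def write_to_dram(cache_line):
    """Model of a memory controller encrypting data on its way to DRAM."""
    nonce = os.urandom(16)
    enc = Cipher(algorithms.AES(hw_key), modes.CTR(nonce),
                 backend=default_backend()).encryptor()
    return nonce, enc.update(cache_line)

def read_from_dram(nonce, ciphertext):
    """Model of the controller decrypting data on its way back to the CPU."""
    dec = Cipher(algorithms.AES(hw_key), modes.CTR(nonce),
                 backend=default_backend()).decryptor()
    return dec.update(ciphertext)

nonce, ct = write_to_dram(b"64-byte cache line goes here....")
assert read_from_dram(nonce, ct) == b"64-byte cache line goes here...."
```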
Intel’s Akhilesh Kumar provided details on his company’s Xeon Scalable Processors, unveiled last month and aimed at cloud services, communication services, high-performance computing, and artificial intelligence. “We have made significant changes in the way we put the processor together,” he said. The Core microarchitecture was enhanced, a mesh interconnect architecture was implemented, and the L2/L3 cache hierarchy was re-architected, Kumar noted. The Xeon Scalable Processor Platinum 8180 chip is meant for machine-learning training.
Three people were involved in the presentation on the Qualcomm Centriq 2400 processor, a product of Qualcomm Datacenter Technologies. “Qualcomm is going to be a player in the data center,” said Qualcomm’s Thomas Seier. The 10nm server SoC uses low-power Arm CPUs built around the Falkor core, which he described as “a beefy, brawny core.” Seier added, “The Falkor CPU is designed for the cloud.” It is a 64-bit processor tied together by the Qualcomm System Bus ring interconnect, and it features an 8-wide heterogeneous pipeline, according to Seier. Barry Wolford said the chip is “a highly integrated SoC” with 48 cores organized as 24 duplexes. It supports third-generation PCI Express and comes in an LGA package measuring 55mm by 55mm. Dileep Bhandarkar, Qualcomm’s vice president of technology, capped off the presentation by saying he would not divulge operating speed and other key metrics, nor would he answer any attendee questions on those topics, drawing laughter from the audience. Once the program officially ended, he was swarmed by engineers in the theater at the Flint Center for the Performing Arts.