AI Accelerator Gyrfalcon Soars Post Stealth

Second generation inference accelerator ASIC targets the datacenter.


Milpitas, Calif.-based startup Gyrfalcon Technology Inc. (GTI), which emerged from semi-stealth mode in September, recently announced the second generation of its neural-network accelerator. The first generation targeted endpoint devices. The new chip is aimed at the datacenter.

GTI is not alone in chasing a growing endpoint market. Analyst firm International Data Corporation (IDC) predicts that by 2022, 25% of endpoint devices will execute AI algorithms (inference for neural network applications). In the datacenter, Deloitte predicts that 25% of the chips running machine learning will be FPGAs and ASICs. Today that work runs mostly on GPUs, which handle most training, and on CPUs, which handle most inference.

A host of startups have promised to deliver new architectures optimized for machine learning applications. Nvidia, Intel and a list of established FPGA and SoC providers have done the same. Most of these architectures are not yet commercially available.

GTI’s first product, the Lightspeeur 2801S ASIC, became available in limited quantities after its introduction at the Consumer Electronics Show in Las Vegas in January 2018. GTI has agreements with customers including Samsung, Fujitsu and LG Electronics, and expects the chip to show up in endpoint devices by year’s end.

The 2801S is capable of handling inference chores in the datacenter but was primarily designed for endpoint devices with very low power, space and heat requirements, according to Mark Nadell, GTI’s vice president of marketing.

“What you get is the ability to offload the AI function from a host chip and perform with high precision and speed, whether the host is a GPU or CPU or something else, using low enough power that you can add it to a phone or other edge device or build it into boards for the datacenter,” Nadell said.

The company’s second generation, the Lightspeeur 2803 AI Accelerator, is designed specifically for inference duty in datacenter servers and will be packaged in GTI-designed boards that will typically support 16 chips per board, Nadell said.

The boards and chips are designed to be used with existing racks and processors, adding acceleration without requiring much additional power or cooling, maximizing the potential ROI for cloud providers or datacenter owners trying to add neural-networking capability as cost effectively as possible, Nadell said.

Back to the future
The 2803 was announced Oct. 22, just five weeks after GTI had formally emerged from stealth. The time scale for development is not as compressed as it appears, however.

Most of the company’s approach and technology is based on research that GTI Chief Scientist Lin Yang began as a Ph.D. candidate at the University of California at Berkeley. In research papers, GTI calls this approach a Domain-Specific Architecture for Convolutional Neural Networks (DSA-CNN).

In 1988, Yang co-authored a paper introducing cellular neural networks that has been cited in other research nearly 4,000 times since its publication, describing “ways to use neural networking in a way that makes it possible to save energy and process data far more rapidly using matrix calculations than anyone had thought of at the time,” Nadell said.
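Some readers may not know cellular neural networks. In the 1988 formulation, a 2D grid of cells is connected only to nearby neighbors. The cell equation is usually written as below, normalized here to unit capacitance and resistance. Each cell state $x_{ij}$ evolves from the outputs $y_{kl}$ and inputs $u_{kl}$ of the cells in its $r$-neighborhood $N_r(i,j)$. These are weighted by a feedback template $A$ and a control template $B$, plus a bias $I$. This describes the published model, not GTI's silicon, whose internals the company has not disclosed.

```latex
\frac{dx_{ij}}{dt} = -x_{ij}
  + \sum_{(k,l) \in N_r(i,j)} A(i,j;k,l)\, y_{kl}
  + \sum_{(k,l) \in N_r(i,j)} B(i,j;k,l)\, u_{kl}
  + I,
\qquad
y_{ij} = \tfrac{1}{2}\left( \lvert x_{ij} + 1 \rvert - \lvert x_{ij} - 1 \rvert \right)
```

Each cell sees only its local neighborhood, so the sums are small, fixed-size weighted stencils over a 2D array. That structure maps naturally onto matrix hardware.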

The matrix processing engine in Gyrfalcon Technology’s Lightspeeur 2803 AI Accelerator. Source: Gyrfalcon Technology Inc.

Yang shares a patent on the technology but was unable to develop it commercially. Like other machine-learning and artificial-intelligence approaches, it was too compute-intensive for the hardware available at the time.

When the hardware finally caught up, Yang extended work he’d done in the meantime and adjusted the original concept, Nadell said. For example, he shifted the design from analog to digital processing so most of the work could be handled in memory, which reduced power and latency.

The result is a broadly applicable chip optimized for two-dimensional matrix processing. Embedded SRAM on the ASIC keeps data close to the processing logic. Data can therefore be processed quickly, without spending the energy needed to move it into and out of a central processor.

The first generation, the Lightspeeur 2801S, was packaged both as a standalone accelerator and as a USB stick designed to compete with Intel’s Neural Compute Stick. It is an ASIC built around a Matrix Processing Engine (MPE) using AI Processing in Memory (APiM), GTI’s trademark for an in-memory implementation of approximate computing. Approximate computing is becoming popular in machine-learning processors such as Google’s Tensor Processing Unit (TPU) ASIC. It reduces power use and increases throughput in complex matrix calculations that can tolerate low-precision intermediate results.
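GTI has not published how APiM works. The sketch below is only a generic illustration of the low-precision arithmetic that approximate computing relies on. It applies symmetric uniform quantization to a dot product, the building block of matrix multiplication, at 8-bit and 4-bit integer widths. The function names, bit widths and sample vectors are hypothetical, chosen for illustration.

```python
def quantize(values, bits):
    """Symmetric uniform quantization to signed integers of width `bits`.
    Returns the integer codes and the scale that maps codes back to reals."""
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    levels = 2 ** (bits - 1) - 1                   # e.g. 127 for 8 bits
    scale = max_abs / levels
    return [round(v / scale) for v in values], scale


def dot_float(a, b):
    """Full-precision reference dot product."""
    return sum(x * y for x, y in zip(a, b))


def dot_quantized(a, b, bits):
    """Approximate dot product: integer multiply-accumulate on quantized
    codes, then a single rescale back to real units."""
    qa, sa = quantize(a, bits)
    qb, sb = quantize(b, bits)
    acc = sum(x * y for x, y in zip(qa, qb))  # cheap integer arithmetic
    return acc * sa * sb


a = [0.5, -1.2, 0.3, 2.0]
b = [1.0, 0.25, -0.7, 0.4]
exact = dot_float(a, b)
for bits in (8, 4):
    approx = dot_quantized(a, b, bits)
    print(f"{bits}-bit: {approx:.4f} (exact {exact:.4f}, "
          f"error {abs(approx - exact):.4f})")
```

With these inputs, the 8-bit result lands within about 0.01 of the exact value, 0.79. The 4-bit result is off by roughly 0.1. Narrower integers cost less energy and silicon per multiply, at the price of accuracy the workload must be able to tolerate.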

“Balancing the cost-performance-energy equation has been a challenge for developers looking to bring AI-enabled equipment to market at scale,” according to a statement from GTI co-founder and chief scientist Lin Yang in the Sept. 18 announcement of the product’s debut. “The GTI founding team has been watching the industry struggle with this challenge for decades and believe that our AI Processing in Memory and Matrix Processing Engine provides an elegant solution to avoid having to make trade-offs. By deploying APiM and MPE on a standard, commoditized ASIC, GTI is enabling our customers to bring innovative, AI-enabled devices to the masses.”

The 2801S is a 7 mm x 7 mm ASIC built on a 28 nm process. According to GTI, it has a typical power draw of 300 mW and delivers 2.8 tera operations per second (TOPS), an efficiency of 9.3 TOPS per watt. Up to 32 chips can be combined on one board for either heavy compute loads or discrete task handling, at an overall cost 10x lower than competitive hardware. At CES, GTI compared its Laceli AI Compute Stick with Intel’s Movidius USB stick, which delivers 0.1 TOPS at 1 W. The Laceli is a USB 3.0 device that can be used for image-based deep learning in natural-language, image, video and other AI applications. Intel claims performance as high as 4 TOPS for the Myriad X version of the VPU chip that instantiates the Movidius Neural Compute Engine.

The second-generation chip, the Lightspeeur 2803 AI Accelerator, is designed for datacenter inference. The chips are typically installed 16 to a GTI G.A.I.N. 2803 board to accelerate cloud applications, Nadell said. Each chip delivers as much as 16.8 TOPS at 700 mW, with 2 milliseconds of latency.
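Efficiency in TOPS per watt is simply throughput divided by power. The sketch below applies that division to two figures: 16.8 TOPS at 700 mW for the 2803, and 0.1 TOPS at 1 W for the Movidius stick, as GTI cited at CES. It also makes a naive 16-chip board estimate. That estimate assumes perfect linear scaling and ignores board overhead such as the PCIe interface and power regulation, so real boards will fall short of it. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    name: str
    tops: float   # peak throughput, tera operations per second
    watts: float  # power draw, watts

    def tops_per_watt(self) -> float:
        return self.tops / self.watts


def board_estimate(chip: Accelerator, chips_per_board: int) -> Accelerator:
    """Naive board-level totals: multiplies per-chip figures, assuming
    perfect scaling and ignoring board overhead (interface, regulators)."""
    return Accelerator(f"{chips_per_board} x {chip.name}",
                       chip.tops * chips_per_board,
                       chip.watts * chips_per_board)


lightspeeur_2803 = Accelerator("Lightspeeur 2803", tops=16.8, watts=0.7)
movidius_stick = Accelerator("Movidius USB stick", tops=0.1, watts=1.0)
board = board_estimate(lightspeeur_2803, 16)

for acc in (lightspeeur_2803, movidius_stick, board):
    print(f"{acc.name}: {acc.tops:.1f} TOPS at {acc.watts:.1f} W "
          f"-> {acc.tops_per_watt():.1f} TOPS/W")
```

The quoted figures work out to about 24 TOPS/W for the 2803, versus 0.1 TOPS/W for the stick. Sixteen chips would total 268.8 TOPS at 11.2 W of chip power. Efficiency stays at 24 TOPS/W only because this estimate ignores everything on the board except the chips.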

The roughly 28,000 nodes in the chip can handle a 168 x 168 matrix, Nadell said. They use approximately 10 MB of memory distributed throughout the chip, with no external memory or separate area to hold data before processing.
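The matrix size and the node count agree: 168 x 168 = 28,224, or roughly 28,000. GTI has not described how its nodes are connected. The sketch below therefore assumes a cellular-style layout in which each node holds one element of a 2D input. Each node combines that element only with its immediate neighbors through a 3x3 weighted sum, the operation a convolutional layer applies. The function name and the zero padding at the edges are hypothetical choices. The point is that each step reads only data already held in the grid, the property the article attributes to keeping memory on-chip.

```python
GRID = 168  # 168 x 168 = 28,224 nodes, the "roughly 28,000" figure


def local_3x3_step(grid, kernel):
    """One pass of a 3x3 neighborhood operation over a 2D grid.
    Each output node reads only itself and its eight neighbors;
    neighbors outside the grid are treated as zero (zero padding)."""
    n_rows, n_cols = len(grid), len(grid[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        acc += kernel[dr + 1][dc + 1] * grid[rr][cc]
            out[r][c] = acc
    return out


# Usage: an identity kernel leaves a full-size 168 x 168 input unchanged.
image = [[float((r + c) % 2) for c in range(GRID)] for r in range(GRID)]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
assert local_3x3_step(image, identity) == image
print(f"{GRID} x {GRID} grid = {GRID * GRID:,} nodes")
```

A real accelerator would evaluate all nodes in parallel rather than in nested loops. The data-access pattern is the same, though: fixed, local and independent of any off-chip memory.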

The 2803 is a 9 mm x 9 mm chip built on a 28 nm process and connects over a PCIe interface. It supports ResNet, MobileNet, ShiftNet and VGG neural networks, with model sizes ranging from 4.4 MB to 17.6 MB per chip, for both training and inference.

The 2803 has taped out, and samples are available to partners now. It will be available in volume during the fourth quarter of 2018, Nadell said.

Both chips are manufactured by TSMC and can be packaged singly or in groups, Nadell said. They are designed to be added to existing devices with as little difficulty as possible. GTI-produced development tools make ML enablement accessible to developers who lack deep specialization in ML architectures or processing.

The company expects the first endpoint and edge products featuring its 2801 chips to be out by the end of the year.
