Hyperscale HW Optimized Neural Architecture Search (Google)


A new technical paper titled "Hyperscale Hardware Optimized Neural Architecture Search" was published by researchers at Google, Apple, and Waymo. "This paper introduces the first Hyperscale Hardware Optimized Neural Architecture Search (H2O-NAS) to automatically design accurate and performant machine learning models tailored to the underlying hardware architecture. H2O-NAS consists of three ... » read more

Google’s TPU v4 Architecture: 3 Major Features


A new technical paper titled "TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings" was published by researchers at Google. Abstract: "In response to innovations in machine learning (ML) models, production workloads changed radically and rapidly. TPU v4 is the fifth Google domain specific architecture (DSA) and its third supercomputer f... » read more

Bespoke Silicon Rattles Chip Design Ecosystem


Bespoke silicon developers are shaking up relationships, priorities, and methodologies across the semiconductor industry, creating demand for skills that cross traditional boundaries, and driving new business models that leverage these enormous investments. Bespoke silicon designers today are a rare breed, capable of understanding the unique requirements of a specific domain, as well as a gr... » read more

What Is An xPU?


Almost every day there is an announcement about a new processor architecture, and it is given a three-letter acronym — TPU, IPU, NPU. But what really distinguishes them? Are there really that many unique processor architectures, or is something else happening? In 2018, John L. Hennessy and David A. Patterson delivered the Turing lecture entitled, "A New Golden Age for Computer Architecture... » read more

Challenges For New AI Processor Architectures


Investment money is flooding into the development of new AI processors for the data center, but the problems here are unique, the results are unpredictable, and the competition has deep pockets and very sticky products. The biggest issue may be insufficient data about the end market. When designing a new AI processor, every design team has to answer one fundamental question — how much flex... » read more

Ten Lessons From Three Generations Shaped Google’s TPUv4i


Source: Norman P. Jouppi, Doe Hyun Yoon, Matthew Ashcraft, Mark Gottscho, Thomas B. Jablin, George Kurian, James Laudon, Sheng Li, Peter Ma, Xiaoyu Ma, Nishant Patil, Sushma Prasad, Clifford Young, Zongwei Zhou (Google); David Patterson (Google / Berkeley). Find technical paper here. 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). Abstract: "Google de... » read more

Tapping Into Purpose-Built Neural Network Models For Even Bigger Efficiency Gains


Neural networks can be categorized as a set of algorithms modelled loosely after the human brain that can ‘learn’ by incorporating new data. Indeed, many benefits can be derived from developing purpose-built “computationally efficient” neural network models. However, to ensure your model is effective, there are several key requirements that need to be considered. One critical conside... » read more

Software-Defined Hardware Gains Ground — Again


The traditional approach of running generic software on x86-based CPUs is running out of steam for many applications due to the slowdown of Moore’s Law and the concurrent exponential growth in software application complexity and scale. In this environment, the software and hardware are disparate due to the dominance of the x86 architecture. “The need for and advent of the hardware accelerat... » read more

Machine Learning At The Edge


Moving machine learning to the edge places critical requirements on power and performance. Using off-the-shelf solutions is not practical. CPUs are too slow, GPUs/TPUs are expensive and consume too much power, and even generic machine learning accelerators can be overbuilt and are not optimal for power. In this paper, learn about creating new power- and memory-efficient hardware architectures to meet n... » read more

An Increasingly Complicated Relationship With Memory


The relationship between a processor and its memory used to be quite simple, but in modern SoCs there are multiple heterogeneous processors and accelerators, each needing a different means of accessing memory for maximum efficiency. Compromises are being made in order to preserve the unified programming model of the past, but the pressures are increasing for some fundamental changes. It does... » read more
