Fantastical Creatures


In my day job I work in the High-Level Synthesis group at Siemens EDA, specifically focusing on algorithm acceleration. But on the weekends, sometimes, I take on the role of amateur cryptozoologist. As many of you know, the main Siemens EDA campus sits in the shadow of Mt. Hood and the Cascade Mountain range. This is prime habitat for Sasquatch, also known as “Bigfoot”. This weekend, ar... » read more

Co-optimizing HW Architecture, Memory Footprint, Device Placement And Per-Chip Operator Scheduling (Georgia Tech, Microsoft)


A technical paper titled “Integrated Hardware Architecture and Device Placement Search” was published by researchers at Georgia Institute of Technology and Microsoft Research. Abstract: "Distributed execution of deep learning training involves a dynamic interplay between hardware accelerator architecture and device placement strategy. This is the first work to explore the co-optimization ... » read more

Hyperscale HW Optimized Neural Architecture Search (Google)


A new technical paper titled "Hyperscale Hardware Optimized Neural Architecture Search" was published by researchers at Google, Apple, and Waymo. "This paper introduces the first Hyperscale Hardware Optimized Neural Architecture Search (H2O-NAS) to automatically design accurate and performant machine learning models tailored to the underlying hardware architecture. H2O-NAS consists of three ... » read more

Google’s TPU v4 Architecture: 3 Major Features


A new technical paper titled "TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings" was published by researchers at Google. Abstract: "In response to innovations in machine learning (ML) models, production workloads changed radically and rapidly. TPU v4 is the fifth Google domain specific architecture (DSA) and its third supercomputer f... » read more

Bespoke Silicon Rattles Chip Design Ecosystem


Bespoke silicon developers are shaking up relationships, priorities, and methodologies across the semiconductor industry, creating demand for skills that cross traditional boundaries, and driving new business models that leverage these enormous investments. Bespoke silicon designers today are a rare breed, capable of understanding the unique requirements of a specific domain, as well as a gr... » read more

What Is An xPU?


Almost every day there is an announcement about a new processor architecture, and it is given a three-letter acronym — TPU, IPU, NPU. But what really distinguishes them? Are there really that many unique processor architectures, or is something else happening? In 2018, John L. Hennessy and David A. Patterson delivered the Turing lecture entitled, "A New Golden Age for Computer Architecture... » read more

Challenges For New AI Processor Architectures


Investment money is flooding into the development of new AI processors for the data center, but the problems here are unique, the results are unpredictable, and the competition has deep pockets and very sticky products. The biggest issue may be insufficient data about the end market. When designing a new AI processor, every design team has to answer one fundamental question — how much flex... » read more

Ten Lessons From Three Generations Shaped Google’s TPUv4i


Source: Norman P. Jouppi, Doe Hyun Yoon, Matthew Ashcraft, Mark Gottscho, Thomas B. Jablin, George Kurian, James Laudon, Sheng Li, Peter Ma, Xiaoyu Ma, Nishant Patil, Sushma Prasad, Clifford Young, Zongwei Zhou (Google); David Patterson (Google / Berkeley) Find technical paper here. 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA) Abstract–"Google de... » read more

Tapping Into Purpose-Built Neural Network Models For Even Bigger Efficiency Gains


Neural networks can be categorized as a set of algorithms modelled loosely after the human brain that can ‘learn’ by incorporating new data. Indeed, many benefits can be derived from developing purpose-built “computationally efficient” neural network models. However, to ensure your model is effective, there are several key requirements that need to be considered. One critical conside... » read more

Software-Defined Hardware Gains Ground — Again


The traditional approach of running generic software on x86-based CPUs is running out of steam for many applications due to the slowdown of Moore’s Law and the concurrent exponential growth in software application complexity and scale. In this environment, the software and hardware are disparate due to the dominance of the x86 architecture. “The need for and advent of the hardware accelerat... » read more
