Four-Tier Memory Hierarchy for LLM Reasoning (USC, UW)


A new technical paper, "Not All Thoughts Need HBM: Semantics-Aware Memory Hierarchy for LLM Reasoning," was published by researchers at USC and University of Wisconsin-Madison. Abstract "Reasoning LLMs produce thousands of chain-of-thought tokens whose KV cache must reside in scarce GPU HBM. The dominant response -- permanently evicting low-importance tokens -- is catastrophic for reasoni... » read more

Generative AI In Chip Manufacturing


Generative AI is a natural-language or text-based query, predicting patterns based on a massive set of data. While most of the attention has been focused on chatbots and copilots, it also can be used to identify small, transient aberrations in semiconductor manufacturing that are otherwise difficult to find. Jon Herlocker, vice president and general manager of software analytics at Cohu, talks ... » read more

LLMs On The Edge


Nearly all the data input for AI so far has been text, but that's about to change. In the future, that input likely will include video, voice, as well as other types of data, causing a massive increase in the amount of data that needs to be modeled and the compute resources necessary to make it all work. This is hard enough in hyperscale data centers, which are sprouting up everywhere to handle... » read more

A Chiplet-Based Supercomputer For Generative LLMs That Optimizes Total Cost of Ownership


A technical paper titled "Chiplet Cloud: Building AI Supercomputers for Serving Large Generative Language Models" was published by researchers at University of Washington and University of Sydney. Abstract: "Large language models (LLMs) such as ChatGPT have demonstrated unprecedented capabilities in multiple AI tasks. However, hardware inefficiencies have become a significant factor limiting ... » read more