Redefining AI Inference With New Silicon Architecture


AI inference is rapidly becoming the largest and most demanding segment of the AI market, but the cost of running these workloads continues to be a major challenge. VSORA, a fabless semiconductor company, is tackling this problem head-on with a fresh approach to high‑performance AI processing and a deep collaboration with Cadence. VSORA develops advanced AI chips that dramatically reduce t... » read more

AI Workloads Are Turning The Data Center Network Into A Combined Memory And Storage Fabric


Recent industry trends, including the release of NVIDIA’s Rubin platform (developer.nvidia.com), point to a growing consensus that AI inference is reshaping data center architecture in a fundamental way. As inference workloads become dominant, the data center network is no longer just a communication layer between servers. It is increasingly part of a distributed memory and storage hierarchy,... » read more

AI Power on the Edge


Key takeaways: power and thermal become primary design considerations, not just optimizations; hardware architectures need to be developed from the ground up; and hardware/software/model co-development is essential. Implementing AI on the edge is driven by a different set of metrics than training or even inference in the cloud. It makes power a first-class citizen, if not the mos... » read more

Ensuring AI Reliability: Mitigating OCP’s Silent Data Corruption Risks


Silent Data Corruption (SDC) is an industry challenge affecting data centers worldwide with increasing frequency. This phenomenon stems from untraceable hardware failures that make detection notoriously difficult. SDCs don’t leave any record in system logs or trigger exception mechanisms. The corrupted data they produce can propagate unnoticed, causing cascading failures that often demand ext... » read more

AI Inference Needs A Mix-And-Match Memory Strategy


AI inference is no longer a single workload that can be served efficiently by a single type of accelerator or memory. From fast chat replies to 10M-token codebases, inference spans wildly diverse workloads with very different limits on latency, bandwidth, capacity, and compute, as the figure below demonstrates.[1] [Figure. Source: Meta[1]] The AI inference spectrum of workloads includes: Inter... » read more
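A rough way to see why capacity, and not only bandwidth, dominates at the long-context end of that spectrum is to estimate the size of the attention key/value (KV) cache. The sketch below assumes a hypothetical 70B-class model with grouped-query attention (80 layers, 8 KV heads, head dimension 128, FP16 storage); these numbers are illustrative assumptions, not figures from the article or from Meta.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_elem):
    """Approximate KV-cache size for one sequence.

    The factor of 2 covers keys and values, each stored per layer,
    per KV head, per token.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens


# Hypothetical model configuration (assumption, not from the source).
CONFIG = dict(layers=80, kv_heads=8, head_dim=128, bytes_per_elem=2)  # FP16

workloads = [
    ("chat reply, 4K-token context", 4_096),
    ("10M-token codebase", 10_000_000),
]

for label, tokens in workloads:
    size = kv_cache_bytes(tokens, **CONFIG)
    print(f"{label}: {size / 2**30:,.1f} GiB of KV cache")
```

Under these assumptions each token costs 320 KiB of cache, so a 4K-token chat needs about 1.25 GiB while a 10M-token context needs roughly 3 TiB, far beyond the HBM on a single accelerator. That gap is one reason a mix of memory tiers becomes attractive.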

Four Architectural Opportunities for LLM Inference Hardware (Google)


A new technical paper titled "Challenges and Research Directions for Large Language Model Inference Hardware" was published by Google. Abstract: "Large Language Model (LLM) inference is hard. The autoregressive Decode phase of the underlying Transformer model makes LLM inference fundamentally different from training. Exacerbated by recent AI trends, the primary challenges are memory and in... » read more
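Why the Decode phase stresses memory can be illustrated with a simple roofline-style calculation. During decode, each step multiplies the weight matrices by only a handful of new token vectors, so few arithmetic operations are performed per byte of weights fetched. The sketch below compares the arithmetic intensity of one square matmul against a hypothetical accelerator (1,000 TFLOP/s dense FP16, 3.35 TB/s memory bandwidth) and a hypothetical hidden size of 8192; these are assumptions for illustration, not numbers from the Google paper.

```python
def gemm_intensity(tokens, d_in, d_out, bytes_per_param=2):
    """FLOPs per byte for a (tokens x d_in) @ (d_in x d_out) matmul.

    Counts weight, input-activation, and output-activation traffic once each.
    """
    flops = 2 * tokens * d_in * d_out
    bytes_moved = bytes_per_param * (d_in * d_out + tokens * d_in + tokens * d_out)
    return flops / bytes_moved


# Hypothetical accelerator figures (assumption, not from the source).
PEAK_FLOPS = 1_000e12  # FLOP/s
MEM_BW = 3.35e12       # bytes/s
ridge = PEAK_FLOPS / MEM_BW  # intensity needed to become compute-bound

d = 8192  # hypothetical hidden size
phases = [
    ("decode, batch 1", 1),
    ("decode, batch 32", 32),
    ("prefill, 2048-token prompt", 2048),
]

for phase, tokens in phases:
    ai = gemm_intensity(tokens, d, d)
    bound = "compute-bound" if ai >= ridge else "memory-bound"
    print(f"{phase}: {ai:.1f} FLOP/byte vs ridge {ridge:.0f} -> {bound}")
```

With these assumptions, single-sequence decode achieves about 1 FLOP/byte and even a batch of 32 reaches only about 32, both far below the roughly 300 FLOP/byte ridge point. A long prefill clears it easily, which is why decode, not prefill, tends to be limited by memory bandwidth.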

Next Generation AI: Transitioning Inference From The Cloud To The Edge


AI inference deployments are increasingly focused on the edge as manufacturers seek the consistent latency, enhanced privacy, and reduced operational costs they can’t achieve in cloud-based deployments. While cloud-based platforms provide incredible computational power and enable widely adopted services, the dependence on network connectivity inherently creates variability, cost and security ... » read more

AI Bubble Or Boom?


Are we in an AI bubble? Parallels are being drawn to the dot-com boom and bust of 1999-2000. During the dot-com boom, many high-tech companies' valuations soared 10X, then deflated. The peak P/E ratio for the Nasdaq Composite was 200! Remember Webvan? It went public in November 1999 with an $8 billion valuation, then filed for bankruptcy 19 months later. There was much speculation without profits or gro... » read more

GDDR7 Tackles Massive-Context AI Inference


The AI hardware landscape is evolving at breakneck speed, and memory technology is at the heart of this transformation. NVIDIA’s recent announcement of Rubin CPX, a new class of GPU purpose-built for massive-context inference, underscores this trend. Rubin CPX is designed to tackle workloads that require reasoning across millions of tokens. Use cases include long-form generative video, comple... » read more

Re-Architecting AI For Power


The industry is becoming increasingly concerned about the amount of power being consumed by AI, but there is no simple solution to the problem. It requires a deep understanding of the application, the software and hardware architectures at both the semiconductor and system levels, and how all of this is designed and implemented. Each piece plays a role in the total power consumed and the utilit... » read more
