Building Fixed HW Implementations of Neural Networks (Yale, Cornell et al.)


Researchers from Yale University, Cornell University, Boston University, and NTT Research have published “Physical Foundation Models: Fixed hardware implementations of large-scale neural networks”. Abstract "Foundation models are deep neural networks (such as GPT-5, Gemini 3, and Opus 4) trained on large datasets that can perform diverse downstream tasks -- text and code generation, q...

GDDR7 Momentum Accelerates As A Key Solution For AI Inference


The AI hardware landscape continues to evolve at a breakneck speed, and memory technology is rapidly becoming a defining differentiator for the next generation of GPUs and AI inference accelerators. When NVIDIA introduced Rubin CPX, its new class of GPU tailored for massive context inference, it underscored a new industry reality: memory throughput and efficiency are now just as critical as ra...

AI Workloads at the Edge: Ensuring Performance, Privacy, and Security


Experts At The Table: Semiconductor Engineering gathered a group of experts to discuss why some AI workloads are better suited for on-device processing to achieve consistent performance, avoid network connectivity issues, reduce cloud computing costs, and ensure privacy. The panel included Frank Ferro, group director in the Silicon Solutions Group at Cadence; Eduardo Montanez, vice president a...

Next Generation AI: Transitioning Inference from the Cloud to the Edge


Deploying AI inference at the edge (on smartphones, appliances, industrial devices, and vehicles) promises faster, private, and energy-efficient intelligence. Expedera’s packet-based NPU architecture delivers up to 90% utilization and dramatic reductions in memory movement compared to conventional approaches, enabling next-generation real-time AI capabilities. This white paper examines tech...

Optimizing AI Workloads For Edge Computing


Experts At The Table: Semiconductor Engineering gathered a group of experts to discuss how some AI workloads are better suited for on-device processing to achieve consistent performance, avoid network connectivity issues, reduce cloud computing costs, and ensure privacy. The panel included Frank Ferro, group director in the Silicon Solutions Group at Cadence; Eduardo Montanez, vice president an...

Moving AI Workloads To The Edge


Experts At The Table: Semiconductor Engineering gathered a group of experts to discuss how some AI workloads are better suited for on-device processing to achieve consistent performance, avoid network connectivity issues, reduce cloud computing costs, and ensure privacy. The panel included Frank Ferro, group director in the Silicon Solutions Group at Cadence; Eduardo Montanez, vice president an...

Analog Plus 3D Optics to Accelerate AI Inference and Combinatorial Optimization (Microsoft, Cambridge)


A new technical paper titled "Analog optical computer for AI inference and combinatorial optimization" was published by researchers at Microsoft Research, Barclays and University of Cambridge. Abstract "Artificial intelligence (AI) and combinatorial optimization drive applications across science and industry, but their increasing energy demands challenge the sustainability of digital comput...

Complex Mix Of Processors At The Edge


With AI changing so fast, it’s a juggling act for companies to deliver the best performance now while also future-proofing for unknown AI models or a completely different approach to training and inference that may emerge. There are a slew of options for high-end and budget phones, hyperscalers, and low-cost, low-power edge devices, and while GPUs keep making headlines, many designe...

Implementing AI Activation Functions


Activation functions play a critical role in AI inference, supplying the nonlinearity that lets AI models capture nonlinear behavior. This makes them an integral part of any neural network, but nonlinear functions can be fussy to build in silicon. Is it better to have a CPU calculate them? Should hardware function units be laid down to execute them? Or would a lookup table (LUT) suffice? Most architectures inc...
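
As a rough illustration of the lookup-table option, the Python sketch below tabulates a sigmoid over a clamped input range and linearly interpolates between neighboring entries. The [-8, 8] range, the 64-entry table size, and the names `build_lut` and `lut_eval` are hypothetical choices for this example, not details from the article, and a hardware unit would more likely store fixed-point values than Python floats.

```python
import math
from typing import Callable, List


def build_lut(fn: Callable[[float], float], lo: float, hi: float, entries: int) -> List[float]:
    """Sample fn at `entries` evenly spaced points spanning [lo, hi] (hypothetical helper)."""
    step = (hi - lo) / (entries - 1)
    return [fn(lo + i * step) for i in range(entries)]


def lut_eval(lut: List[float], lo: float, hi: float, x: float) -> float:
    """Approximate fn(x): clamp x to [lo, hi], then interpolate between the two nearest entries."""
    x = min(max(x, lo), hi)                 # inputs outside the table saturate at the edges
    step = (hi - lo) / (len(lut) - 1)
    pos = (x - lo) / step                   # fractional table position, >= 0 after clamping
    i = min(int(pos), len(lut) - 2)         # index of the left neighbor
    frac = pos - i                          # distance past the left neighbor, in [0, 1]
    return lut[i] + frac * (lut[i + 1] - lut[i])


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Assumed table parameters for illustration only.
LO, HI, ENTRIES = -8.0, 8.0, 64
SIGMOID_LUT = build_lut(sigmoid, LO, HI, ENTRIES)

if __name__ == "__main__":
    xs = [k / 100 for k in range(-1000, 1001)]
    worst = max(abs(lut_eval(SIGMOID_LUT, LO, HI, x) - sigmoid(x)) for x in xs)
    print(f"max abs error on [-10, 10]: {worst:.6f}")
```

A larger table lowers the interpolation error but costs more storage, which is one of the area-versus-accuracy trade-offs behind choosing between a CPU, a dedicated function unit, or a LUT.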

HW-Aligned Sparse Attention Architecture For Efficient Long-Context Modeling (DeepSeek et al.)


A new technical paper titled "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" was published by DeepSeek, Peking University and University of Washington. Abstract "Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard attention mechanisms poses significant computational challenges. Sparse attention...
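
The abstract is truncated here, so the sketch below does not reproduce the paper's method. It shows one generic form of block-sparse attention, under stated assumptions: keys are grouped into fixed-size blocks, each block is summarized by its mean key, the query keeps only the `top_k` highest-scoring blocks, and ordinary scaled dot-product attention runs over the tokens in those blocks. The function names and parameters are hypothetical, and a single query vector in pure Python stands in for the batched kernels a hardware-aligned design would need.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def block_sparse_attention(q: Vector, keys: List[Vector], values: List[Vector],
                           block_size: int, top_k: int) -> Vector:
    """Attend from one query to only the top_k key blocks (hypothetical sketch)."""
    n, d = len(keys), len(q)
    # 1. Split key positions into contiguous blocks of up to block_size tokens.
    blocks = [list(range(s, min(s + block_size, n))) for s in range(0, n, block_size)]
    # 2. Summarize each block by the mean of its key vectors.
    summaries = [[sum(keys[i][j] for i in blk) / len(blk) for j in range(d)] for blk in blocks]
    # 3. Score block summaries against the query and keep the top_k blocks.
    block_scores = [dot(q, s) for s in summaries]
    chosen = sorted(range(len(blocks)), key=lambda b: block_scores[b], reverse=True)[:top_k]
    selected = sorted(i for b in chosen for i in blocks[b])
    # 4. Scaled dot-product attention restricted to the selected tokens.
    scale = 1.0 / math.sqrt(d)
    weights = softmax([dot(q, keys[i]) * scale for i in selected])
    return [sum(w * values[i][j] for w, i in zip(weights, selected))
            for j in range(len(values[0]))]


def full_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Dense reference that attends to every key."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, k) * scale for k in keys])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


if __name__ == "__main__":
    keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    values = [[10.0], [10.0], [2.0], [4.0]]
    q = [0.0, 5.0]
    print(block_sparse_attention(q, keys, values, block_size=2, top_k=1))
    print(full_attention(q, keys, values))
```

When `top_k` covers every block, the result matches dense attention; smaller values skip most keys, trading some accuracy for less compute and memory traffic on long contexts.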
