NPU Acceleration For Multimodal LLMs


Transformer-based models have rapidly spread from text to speech, vision, and other modalities, creating new challenges for the development of Neural Processing Units (NPUs). NPUs must now efficiently support the computation of weights and the propagation of activations through a series of attention blocks. Increasingly, NPUs must also be able to process models with multiple input modalities with ac…
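To make that workload concrete, here is a minimal sketch of a single attention block, the unit an NPU executes repeatedly as activations flow through the model. It is written in plain NumPy for clarity; the single-head simplification and the weight names (Wq, Wk, Wv, Wo) are illustrative assumptions, not any particular NPU's or framework's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the chosen axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_block(x, Wq, Wk, Wv, Wo):
    # x: (seq_len, d_model) activations entering the block.
    # Wq/Wk/Wv/Wo: (d_model, d_model) trained weights held on the accelerator.
    q, k, v = x @ Wq, x @ Wk, x @ Wv            # project activations
    scores = (q @ k.T) / np.sqrt(x.shape[-1])   # token-to-token similarity
    return softmax(scores) @ v @ Wo             # weighted mix + output projection

rng = np.random.default_rng(0)
d_model = 64
x = rng.standard_normal((16, d_model))          # 16 tokens, e.g. text or image patches
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                  for _ in range(4))
y = attention_block(x, Wq, Wk, Wv, Wo)
print(y.shape)  # (16, 64): activations propagate on to the next block
```

Note the mix of operations an NPU must cover: large matrix multiplies against stored weights, plus the softmax and scaling that sit between them. A multimodal model runs this same block over tokens from any modality once they are embedded into d_model-wide vectors.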

Vision Transformers Change The AI Acceleration Rules


Transformers were first introduced by a team at Google Brain in 2017 in the paper "Attention Is All You Need". Since their introduction, transformers have inspired a flurry of investment and research that has produced some of the most impactful model architectures and AI products to date, including ChatGPT, short for Chat Generative Pre-trained Transformer. Transformers a…

Nightmare Fuel: The Hazards Of ML Hardware Accelerators


A major design challenge facing numerous silicon design teams in 2023 is building the right amount of machine learning (ML) performance into silicon taping out today, in anticipation of what state-of-the-art (SOTA) ML inference models will look like in 2026 and beyond, when that silicon will be used in devices in volume production. Given the continuing rapid rate of change in mac…