NPU Acceleration For Multimodal LLMs


Transformer-based models have rapidly spread from text to speech, vision, and other modalities. This has created challenges for the development of Neural Processing Units (NPUs). NPUs must now efficiently support the computation of weights and propagation of activations through a series of attention blocks. Increasingly, NPUs must be able to process models with multiple input modalities with ac... » read more

HW and SW Architecture Approaches For Running AI Models


How best to run AI inference models is a current topic of much debate as a wide breadth of systems companies look to add AI to a variety of systems, spurring both hardware innovation and the need to revamp models. Hardware developers are making progress with AI accelerators and SoCs. But on the model side, questions abound about whether the answer might come from revisiting older, less compl... » read more

Vision Is Why LLMs Matter On The Edge


Large Language Models (LLMs) have taken the world by storm since the 2017 Transformers paper, but pushing them to the edge has proved problematic. Just this year, Google had to revise its plans to roll out Gemini Nano on all new Pixel models — the down-spec’d hardware options proved unable to host the model as part of a positive user experience. But the implementation of language-focused mo... » read more

How To Successfully Deploy GenAI On Edge Devices


Generative AI (GenAI) burst onto the scene and into the public’s imagination with the launch of ChatGPT in late 2022. Users were amazed at the natural language processing chatbot’s ability to turn a short text prompt into coherent humanlike text including essays, language translations, and code examples. Technology companies – impressed with ChatGPT’s abilities – have started looking ... » read more

Fundamental Issues In Computer Vision Still Unresolved


Given computer vision’s place as the cornerstone of an increasing number of applications from ADAS to medical diagnosis and robotics, it is critical that its weak points be mitigated, such as the ability to identify corner cases or if algorithms are trained on shallow datasets. While well-known bloopers are often the result of human decisions, there are also fundamental technical issues that ... » read more

Is Transformer Fever Fading?


The hottest, buzziest thing bursts onto the scene and captures the attention of the business press and even the general public. Scads of articles and videos are published about The Hot Thing. And then, in the blink of an eye, the world’s attention shifts to the Next New Thing! Are we talking about the latest pop song that leads the Spotify streaming charts? Perhaps a new fashion trend that... » read more

Neural Network Model Quantization On Mobile


The general definition of quantization states that it is the process of mapping continuous infinite values to a smaller set of discrete finite values. In this blog, we will talk about quantization in the context of neural network (NN) models, as the process of reducing the precision of the weights, biases, and activations. Moving from floating-point representations to low-precision fixed intege... » read more

Generative AI: Transforming Inference At The Edge


The world is witnessing a revolutionary advancement in artificial intelligence with the emergence of generative AI. Generative AI generates text, images, or other media responding to prompts. We are in the early stages of this new technology; still, the depth and accuracy of its results are impressive, and its potential is mind-blowing. Generative AI uses transformers, a class of neural network... » read more

(Vision) Transformers: Rise Of The Chimera


It’s 2023 and transformers are having a moment. No, I’m not talking about the latest installment of the Transformers movie franchise, "Transformers: Rise of the Beasts"; I’m talking about the deep learning model architecture class, transformers, that is fueling anticipation, excitement, fear, and investment in AI. Transformers are not so new in the world of AI anymore; they were first ... » read more

Achieving Greater Accuracy In Real-Time Vision Processing With Transformers


Transformers, first proposed in a Google research paper in 2017, were initially designed for natural language processing (NLP) tasks. Recently, researchers applied transformers to vision applications and got interesting results. While previously, vision tasks had been dominated by convolutional neural networks (CNNs), transformers have proven surprisingly adaptable to vision tasks like image cl... » read more

← Older posts