Transformers At The Edge: Efficient LLM Deployment


Since the groundbreaking 2017 publication of “Attention Is All You Need,” the transformer architecture has fundamentally reshaped artificial intelligence research and development. This innovation laid the foundation for Large Language Models (LLMs) and Vision Language Models (VLMs), fueling a wave of productization across the industry. A defining milestone was the public launch of ChatGPT in... » read more

Shrinking LLMs With Self-Compression


Language models are becoming ever larger, making on-device inference slow and energy-intensive. A direct and surprisingly effective remedy is to prune complete channels whose contribution to the task is negligible. Our earlier work introduced a training-time procedure – Self-Compression [1, 4] – that lets back-propagation decide the bit-width of every channel, so unhelpful ones fade away. T... » read more

The Rise Of Generative AI On The Edge


Artificial intelligence (AI) and machine learning (ML) have undergone significant transformations over the past decade. The revolution of convolutional neural networks (CNNs) and recurrent neural networks (RNNs) is evolving toward the adoption of transformers and generative AI (GenAI), marking a pivotal shift in the field. This transition is driven by the need for more accurate, efficient, and ... » read more

No Fooling With Voxel Pooling


A variety of complicated new transformer models have emerged in the past 18 to 24 months as “must have” networks in advanced automotive use cases. These architectures often introduce new network operators or novel ways of combining tensors – often from different types of sensors – to enhance detection and recognition of objects in L3 / L4 / L5 ADAS and autonomous d... » read more

NPU Acceleration For Multimodal LLMs


Transformer-based models have rapidly spread from text to speech, vision, and other modalities. This has created challenges for the development of Neural Processing Units (NPUs). NPUs must now efficiently support the computation of weights and propagation of activations through a series of attention blocks. Increasingly, NPUs must be able to process models with multiple input modalities with ac... » read more

HW and SW Architecture Approaches For Running AI Models


How best to run AI inference models is a topic of much debate as a broad range of systems companies look to add AI to a variety of systems, spurring both hardware innovation and the need to revamp models. Hardware developers are making progress with AI accelerators and SoCs. But on the model side, questions abound about whether the answer might come from revisiting older, less compl... » read more

Vision Is Why LLMs Matter On The Edge


Large Language Models (LLMs) have taken the world by storm since the 2017 Transformers paper, but pushing them to the edge has proved problematic. Just this year, Google had to revise its plans to roll out Gemini Nano on all new Pixel models — the down-spec’d hardware options proved unable to host the model as part of a positive user experience. But the implementation of language-focused mo... » read more

How To Successfully Deploy GenAI On Edge Devices


Generative AI (GenAI) burst onto the scene and into the public’s imagination with the launch of ChatGPT in late 2022. Users were amazed at the natural language processing chatbot’s ability to turn a short text prompt into coherent humanlike text including essays, language translations, and code examples. Technology companies – impressed with ChatGPT’s abilities – have started looking ... » read more

Fundamental Issues In Computer Vision Still Unresolved


Given computer vision’s place as the cornerstone of an increasing number of applications from ADAS to medical diagnosis and robotics, it is critical to mitigate its weak points, such as difficulty identifying corner cases and training on shallow datasets. While well-known bloopers are often the result of human decisions, there are also fundamental technical issues that ... » read more

Is Transformer Fever Fading?


The hottest, buzziest thing bursts onto the scene and captures the attention of the business press and even the general public. Scads of articles and videos are published about The Hot Thing. And then, in the blink of an eye, the world’s attention shifts to the Next New Thing! Are we talking about the latest pop song that leads the Spotify streaming charts? Perhaps a new fashion trend that... » read more
