Pooling CPU Memory for LLM Inference With Lower Latency and Higher Throughput (UC Berkeley)


A new technical paper titled "Pie: Pooling CPU Memory for LLM Inference" was published by researchers at UC Berkeley. Abstract "The rapid growth of LLMs has revolutionized natural language processing and AI analysis, but their increasing size and memory demands present significant challenges. A common solution is to spill over to CPU memory; however, traditional GPU-CPU memory swapping ofte... » read more

Survey: HW/SW Co-Design Approaches Tailored to LLMs


A new technical paper titled "A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models" was published by researchers at Duke University and Johns Hopkins University. Abstract "The rapid development of large language models (LLMs) has significantly transformed the field of artificial intelligence, demonstrating remarkable capabilities in natural language proce... » read more

MTJ-Based CRAM Array


A new technical paper titled "Experimental demonstration of magnetic tunnel junction-based computational random-access memory" was published by researchers at University of Minnesota and University of Arizona, Tucson. Abstract "The conventional computing paradigm struggles to fulfill the rapidly growing demands from emerging applications, especially those for machine intelligence because ... » read more

Data Formats For Inference On The Edge


AI/ML training traditionally has been performed using floating point data formats, primarily because that is what was available. But this usually isn't a viable option for inference on the edge, where more compact data formats are needed to reduce area and power. Compact data formats use less space, which is important in edge devices, but the bigger concern is the power needed to move around... » read more
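As a concrete illustration of the compact formats described above, here is a small NumPy sketch of symmetric per-tensor INT8 quantization, one common choice for edge inference. The calibration scheme and the 1,024-value tensor are illustrative assumptions; production edge toolchains differ in rounding and calibration details.

```python
# Hedged sketch: symmetric per-tensor INT8 quantization of FP32 weights.
import numpy as np

weights = np.random.randn(1024).astype(np.float32)   # FP32: 4 bytes/value

scale = np.abs(weights).max() / 127.0                # map max |w| to 127
q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)

dequant = q.astype(np.float32) * scale               # approximate recovery
err = np.abs(weights - dequant).max()

print(f"storage: {weights.nbytes} B -> {q.nbytes} B (4x smaller)")
print(f"max quantization error: {err:.4f}")
```

The 4x reduction in bytes cuts both on-chip storage and, more importantly for edge power budgets, the energy spent moving each value.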

AI Accelerator Architectures Poised For Big Changes


AI is driving a frenzy of activity in the chip world as companies across the semiconductor ecosystem race to include AI in their product lineups. The challenge now is how to make AI run faster, use less energy, and be deployable everywhere from the edge to the data center, particularly with the rollout of large language models. On the hardware side, there are two main approaches for accel... » read more

Improving Image Resolution At The Edge


How much cameras see depends on how accurately the images are rendered and classified. The higher the resolution, the greater the accuracy. But higher resolution also requires significantly more computation, and it requires enough design flexibility to adapt to new algorithms and network models. Jeremy Roberson, technical director and software architect for AI/ML at Flex Logix, talks... » read more
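A quick back-of-the-envelope calculation shows why the resolution/compute trade-off bites: the MAC count of a convolution layer grows with the square of the input side length. The layer shapes below are illustrative, not tied to any product mentioned here.

```python
# MACs for one stride-1, same-padded k x k convolution layer.
def conv_macs(h, w, c_in, c_out, k=3):
    return h * w * c_in * c_out * k * k

for side in (224, 448, 896):
    macs = conv_macs(side, side, 64, 64)
    print(f"{side}x{side}: {macs / 1e9:.1f} GMACs")

# Doubling the side length quadruples the MAC count (and roughly the
# energy spent moving activations), which is why edge designs must
# balance resolution against compute and power budgets.
```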

Optimizing Projected PCM for Analog Computing-In-Memory Inferencing (IBM)


A new technical paper titled "Optimization of Projected Phase Change Memory for Analog In-Memory Computing Inference" was published by researchers at IBM Research. "A systematic study of the electrical properties-including resistance values, memory window, resistance drift, read noise, and their impact on the accuracy of large neural networks of various types and with tens of millions of wei... » read more

Low-Power Heterogeneous Compute Cluster For TinyML DNN Inference And On-Chip Training


A new technical paper titled "DARKSIDE: A Heterogeneous RISC-V Compute Cluster for Extreme-Edge On-Chip DNN Inference and Training" was published by researchers at University of Bologna and ETH Zurich. Abstract "On-chip deep neural network (DNN) inference and training at the Extreme-Edge (TinyML) impose strict latency, throughput, accuracy, and flexibility requirements. Heterogeneous clus... » read more

Review of Tools & Techniques for DL Edge Inference


A new technical paper titled "Efficient Acceleration of Deep Learning Inference on Resource-Constrained Edge Devices: A Review" was published in "Proceedings of the IEEE" by researchers at University of Missouri and Texas Tech University. Abstract: Successful integration of deep neural networks (DNNs) or deep learning (DL) has resulted in breakthroughs in many areas. However, deploying thes... » read more

Where And Why AI Makes Sense In Cars


Experts at the Table: Semiconductor Engineering sat down to talk about where AI makes sense in automotive and what the main challenges are, with Geoff Tate, CEO of Flex Logix; Veerbhan Kheterpal, CEO of Quadric; Steve Teig, CEO of Perceive; and Kurt Busch, CEO of Syntiant. What follows are excerpts of that conversation, which was held in front of a live audience at DesignCon. Part two of this... » read more
