Ultra-low-bit LLM Inference Allows AI-PC CPUs And Discrete Client GPUs To Approach High-end GPU-Level (Intel)


A new technical paper titled "Pushing the Envelope of LLM Inference on AI-PC and Intel GPUs" was published by researcher at Intel. Abstract "The advent of ultra-low-bit LLM models (1/1.58/2-bit), which match the perplexity and end-task performance of their full-precision counterparts using the same model size, is ushering in a new era of LLM inference for resource-constrained environments... » read more

New Ways To Optimize GEMM-Based Applications Targeting Two Leading AI-Optimized FPGA Architectures


A technical paper titled “Efficient Approaches for GEMM Acceleration on Leading AI-Optimized FPGAs” was published by researchers at The University of Texas at Austin and Arizona State University. Abstract: "FPGAs are a promising platform for accelerating Deep Learning (DL) applications, due to their high performance, low power consumption, and reconfigurability. Recently, the leading FPGA... » read more

Optimizing EDA Cloud Hardware And Workloads


Optimizing EDA hardware for the cloud can shorten the time required for large and complex simulations, but not all workloads will benefit equally, and much more can be done to improve those that can. Tens of thousands of GPUs and specialized accelerators, all working in parallel, add significant and elastic compute horsepower for complex designs. That allows design teams to explore various a... » read more

ISA and Microarchitecture Extensions Over Dense Matrix Engines to Support Flexible Structured Sparsity for CPUs (Georgia Tech, Intel Labs)


A technical paper titled "VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs" was published (preprint) by researchers at Georgia Tech and Intel Labs. Abstract: "Deep Learning (DL) acceleration support in CPUs has recently gained a lot of traction, with several companies (Arm, Intel, IBM) announcing products with specialized matrix engines accessible v... » read more