LLM Inference on GPUs (Intel)


A technical paper titled “Efficient LLM inference solution on Intel GPU” was published by researchers at Intel Corporation. Abstract: "Transformer based Large Language Models (LLMs) have been widely used in many fields, and the efficiency of LLM inference becomes hot topic in real applications. However, LLMs are usually complicatedly designed in model structure with massive operations and... » read more

Every Walk’s A Hit: Making Page Walks Single-Access Cache Hits


As memory capacity has outstripped TLB coverage, large data applications suffer from frequent page table walks. We investigate two complementary techniques for addressing this cost: reducing the number of accesses required and reducing the latency of each access. The first approach is accomplished by opportunistically "flattening" the page table: merging two levels of traditional 4 KB p... » read more

What’s Next In AI, Chips And Masks


Aki Fujimura, chief executive of D2S, sat down with Semiconductor Engineering to talk about AI and Moore’s Law, lithography, and photomask technologies. What follows are excerpts of that conversation. SE: In the eBeam Initiative’s recent Luminary Survey, the participants had some interesting observations about the outlook for the photomask market. What were those observations? Fujimur... » read more

Machine Learning Inferencing At The Edge


Ian Bratt, fellow in Arm's machine learning group, talks about why machine learning inferencing at the edge is so difficult, what are the tradeoffs, how to optimize data movement, how to accelerate that movement, and how it differs from developing other types of processors. » read more