Week In Review: Design, Low Power

UK regulators on Nvidia-Arm; AI in the data center; processing-in-memory.


The UK’s Competition and Markets Authority is raising new challenges for Nvidia’s proposed acquisition of Arm, suggesting in a new report that an in-depth Phase 2 investigation into the deal is warranted on competition grounds. Andrea Coscelli, chief executive of the CMA, said, “We’re concerned that Nvidia controlling Arm could create real problems for Nvidia’s rivals by limiting their access to key technologies, and ultimately stifling innovation across a number of important and growing markets.” Specific markets noted in the report include data center CPUs, GPUs, and SmartNICs, as well as SoCs for high-performance IoT, automotive, and game consoles. Nvidia has stated that Arm will be kept neutral if the acquisition goes through.

AI hardware
Cerebras revealed details of its CS-2 accelerator system. Powered by the company’s second-generation wafer-scale chip and supporting models with more than 120 trillion parameters, it enables what the company calls ‘brain-scale’ AI in a footprint “the size of a dorm-room refrigerator.” The system uses a new software execution architecture that disaggregates compute and parameter storage, a memory extension technology providing up to 2.4 petabytes of high-performance memory, an AI-optimized communication fabric that extends the on-chip fabric off-chip to connect multiple CS-2s, and selectable weight sparsity. Natural language processing (NLP) in particular was called out as one area where improvements scale well with increasing parameter counts.

Esperanto Technologies debuted a RISC-V inferencing chip for data centers. It contains more than a thousand RISC-V processor cores with custom vector/tensor units, four high-performance out-of-order RISC-V processor cores, and a high-performance memory system. The chip is designed to operate at under 20 watts, with a focus on machine learning recommendation models.

IBM introduced a new Z Series processor with on-chip acceleration for AI inferencing while a financial transaction is taking place. Aimed at financial services workloads such as fraud detection, loan processing, clearing and settlement of trades, anti-money laundering, and risk analysis, the Telum chip contains 8 processor cores with a deep super-scalar out-of-order instruction pipeline running at a clock frequency above 5GHz. The cache and chip-interconnect infrastructure provides 32MB of cache per core and can scale to 32 Telum chips. A system based on the processor is planned for 2022.

For more from Hot Chips, check out New Approaches For Processor Architectures to learn why companies now consider flexibility and customization to be critical elements for optimizing performance and power.

Achronix used Synopsys’ design, verification, and IP solutions in creating its new Speedster7t FPGA, which includes an array of new machine learning processors optimized for high-bandwidth and AI/ML workloads. Achronix plans to continue using DesignWare IP in its next design.

Imagination Technologies intends to re-enter the CPU market with designs based on the RISC-V ISA. It plans to offer discrete CPUs as well as heterogeneous solutions that combine GPU, CPU, and AI processors.

Samsung Electronics displayed its processing-in-memory (PIM) technology, including PIM-enabled HBM integrated with a Xilinx Virtex Ultrascale+ (Alveo) AI accelerator, where it delivered an almost 2.5X system performance gain as well as more than a 60% cut in energy consumption. Beyond HBM, the company is broadening its PIM technology to DRAM modules with an ‘Acceleration DIMM’ that contains an AI engine built inside the buffer chip, as well as LPDDR5-PIM mobile memory that aims to improve performance in applications like voice recognition while reducing energy usage.

JEDEC published the JESD233: XFM Embedded and Removable Memory Device (XFMD) standard. XFMD (XFM stands for Crossover Flash Memory) is a new universal data storage medium providing an NVMe-over-PCIe interface in a small, thin form factor, aimed at enabling replaceable storage for IoT, embedded, and other applications where storage would typically be soldered down.

Argonne National Laboratory is getting a new supercomputer. Named Polaris and built by Hewlett Packard Enterprise, it will enable scientists and developers to test and optimize software and applications for a range of AI, engineering, and scientific projects planned for the forthcoming exascale supercomputer Aurora. Built from 280 HPE Apollo Gen10 Plus systems, Polaris will deliver approximately 44 petaflops of peak double-precision performance and nearly 1.4 exaflops of theoretical AI performance, based on mixed-precision compute capabilities. It includes 560 2nd- and 3rd-Gen AMD EPYC processors and 2,240 Nvidia A100 Tensor Core GPUs.
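The gap between the double-precision and "AI performance" figures comes down to mixed-precision throughput. As a rough sanity check (a back-of-envelope sketch using only the numbers quoted above, with the per-GPU comparison to published A100 figures as an assumption):

```python
# Back-of-envelope check of Polaris's quoted AI-performance figure.
# Inputs are the numbers from the announcement; the comparison point
# (A100 peak with structured sparsity) is an assumption, not from the article.
NUM_GPUS = 2240
AI_PEAK_FLOPS = 1.4e18  # ~1.4 exaflops theoretical mixed-precision AI performance

per_gpu_tflops = AI_PEAK_FLOPS / NUM_GPUS / 1e12
print(f"Implied per-GPU AI peak: {per_gpu_tflops:.0f} TFLOPS")  # prints 625 TFLOPS
```

That works out to 625 TFLOPS per GPU, which is in line with the A100's published FP16/BF16 tensor-core peak with structured sparsity enabled, and far above its double-precision peak, which is why the mixed-precision headline number is roughly 30x the 44-petaflop FP64 figure.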
