SPONSOR BLOG

SMP, Asymmetric Multi-processing And The HSA Foundation

Symmetric multiprocessing has gotten the attention, but future gains will require its lesser-known relative.

September 27th, 2012 - By: Kurt Shuler

When we hear the term “multiprocessing,” we often associate it with “symmetric multiprocessing (SMP).” This is because of SMP’s initial prevalence in the high-performance computing world, and now in x86/x64 servers and PCs. However, it’s been known for years that SMP’s ability to scale performance as the number of cores increases is poor. (For more information on SMP’s inability to scale well, read Jack Ganssle’s 2008 embedded.com article, “The Nulticore Effect,” or the IEEE Spectrum/Sandia Labs article, “Multicore is Bad News for Supercomputers: Adding cores slows data-intensive applications.”)
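
One common way to see why adding identical cores yields diminishing returns is Amdahl's law. The short Python sketch below is a textbook illustration, not the analysis used in the articles cited above (the Sandia work focuses on memory bandwidth); the function name `amdahl_speedup` and the 90% parallel fraction are assumptions chosen only for this example.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` identical cores when only
    `parallel_fraction` of the work can run in parallel (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part always takes the same time; only the parallel
    # part is divided among the cores.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the work parallelizes perfectly.
    for n in (1, 2, 4, 8, 16, 64):
        print(f"{n:3d} cores -> {amdahl_speedup(0.9, n):.2f}x")
```

Even in this idealized model, which ignores memory contention and coherency traffic entirely, the speedup for a 90%-parallel workload can never reach 10x, no matter how many cores are added.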

Processor companies serving the mobility and consumer electronics markets have avoided purely SMP solutions and have instead implemented asymmetric multiprocessing (AMP) architectures. An example of AMP is a mobile phone modem baseband SoC, which contains an ARM processor and a DSP to handle control and signal processing, respectively. We also see AMP architectures in today’s mobile phone application processors, which usually combine multiple CPU cores with separate graphics, video, audio and imaging cores.

Battery size and heat drive asymmetric multiprocessing in mobility devices.
The mobility world has always been forced to use “the best core for the job” because of the constraints imposed by battery size and heat dissipation. So architectures in mobility have always been created from a baseline expectation of heterogeneous-core AMP.

This is in contrast to the server and PC markets, which have relatively unlimited (at least compared to a mobile phone) power-consumption and heat-dissipation capabilities. In these markets, it has always been easier to add more cores of the same type, connect them using cache coherency, and re-use the legacy software to run on top.
Things are starting to change, though, as the SMP approach starts to wear thin. For example, for server farms that power the likes of Google and Facebook, power consumption and heat dissipation have become huge cost and environmental issues. And in the PC space, we have run into a “GHz wall” where the only way to achieve a step-function increase in performance is to have different cores optimized for different workload types.

Why hasn’t AMP been implemented in the PC and server markets?
It’s hard.

In mobility designs, each heterogeneous processing core, whether graphics, audio, DSP, etc., usually has a custom firmware and software stack associated with it. This software must be integrated to communicate with the CPU cores’ operating system, which necessitates coding work in the OS hardware abstraction layer and drivers.

Furthermore, these heterogeneous cores do not have a single view of system memory, so complicated synchronization schemes are usually implemented in hardware and software. Context switching and preemption are difficult to implement. And most importantly, each of these cores requires an expert programmer to code it, someone conversant in a particular core’s instruction set and tool chains. As a result, asymmetric multiprocessing has thrived in the relatively closed-to-developers/ISVs mobility and consumer electronics worlds while SMP has flourished in the wide-open world of PCs and servers.
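
To make the cost of separate memory views concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not any vendor's driver code: `PrivateMemory`, `copy_region`, `dsp_scale` and `offload_scale` are hypothetical names invented for illustration, and plain bytearrays stand in for each core's private memory.

```python
class PrivateMemory:
    """Hypothetical stand-in for one core's private view of memory."""

    def __init__(self, name: str, size: int) -> None:
        self.name = name
        self.data = bytearray(size)


def copy_region(src: PrivateMemory, dst: PrivateMemory,
                offset: int, length: int) -> None:
    """Explicitly copy bytes between two memories (think DMA transfer)."""
    dst.data[offset:offset + length] = src.data[offset:offset + length]


def dsp_scale(mem: PrivateMemory, offset: int, length: int, gain: int) -> None:
    """Pretend DSP kernel: multiply each byte by `gain`, saturating at 255.
    It can only touch the memory object it is handed."""
    for i in range(offset, offset + length):
        mem.data[i] = min(255, mem.data[i] * gain)


def offload_scale(cpu_mem: PrivateMemory, dsp_mem: PrivateMemory,
                  offset: int, length: int, gain: int) -> None:
    """Offload a scaling job when the two cores do not share memory."""
    copy_region(cpu_mem, dsp_mem, offset, length)   # 1. stage the input
    dsp_scale(dsp_mem, offset, length, gain)        # 2. run on the DSP copy
    copy_region(dsp_mem, cpu_mem, offset, length)   # 3. copy the result back


if __name__ == "__main__":
    cpu = PrivateMemory("cpu", 8)
    dsp = PrivateMemory("dsp", 8)
    cpu.data[0:4] = bytes([1, 2, 3, 200])
    offload_scale(cpu, dsp, 0, 4, gain=2)
    print(list(cpu.data[:4]))
```

Every offload pays for two copies, and skipping step 3 would leave the CPU reading stale data. Real systems add cache maintenance and interrupt or mailbox signalling on top of this, which is where much of the integration effort goes.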

The Heterogeneous System Architecture Foundation
The HSA Foundation is a non-profit organization that intends to make it easier for the world to adopt AMP architectures.

Its goals are to:

Make heterogeneous programming easy and a first-class pervasive complement to CPU computing
Continue to increase the power efficiency of heterogeneous systems (AMP), keeping it the platform of choice from smartphones to the cloud
Bring to market strong development solutions (tools, libraries, OS runtimes) to drive innovative advanced content and applications
Foster growth of heterogeneous computing talent through HSA developer training and academic programs to drive both learning and innovation

To achieve these goals, HSA will have to innovate by providing a technical framework and architecture to address the following issues:

Unified Programming Model – Today, CPU and GPU (or other accelerator) cores are programmed separately, with the GPU treated as a remote processor. HSA will allow developers to target the CPU or GPU by writing in task-parallel languages, like the ones they use today when writing for multicore CPUs.
Unified Address Space – HSA supports virtual address translation among the heterogeneous cores with an HSA-specific memory management unit (HMMU). HSA compute engines will use the same pageable virtual address space that CPUs use today.
Queuing – CPUs, GPUs and other cores can queue tasks to each other and to themselves through an HSA runtime. Queuing can be managed in hardware to avoid OS system calls and enable very low latency communication between cores.
Preemption and Context Switching – HSA enables job preemption, job scheduling and fault handling capabilities to overcome potential problems created by rogue or faulted processes.
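
The queuing and unified-address-space ideas above can be sketched in a few lines of Python. This is only an analogy under stated assumptions and not the HSA runtime API: Python threads and `queue.Queue` stand in for hardware-managed queues, a list passed by reference stands in for a buffer in a shared virtual address space, and `Agent`, `square_in_place`, `sum_buffer` and `run_demo` are hypothetical names.

```python
import queue
import threading


class Agent:
    """Hypothetical compute agent (CPU, GPU, DSP) with its own task queue."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tasks: queue.Queue = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self.thread.start()

    def submit(self, fn, *args) -> None:
        """Any agent (or the host) may enqueue work for this agent."""
        self.tasks.put((fn, args))

    def stop(self) -> None:
        self.tasks.put(None)  # sentinel: no more work
        self.thread.join()

    def _run(self) -> None:
        while True:
            item = self.tasks.get()
            if item is None:
                break
            fn, args = item
            fn(self, *args)


def square_in_place(agent, buf, reply_to, done, result):
    """'GPU' task: works directly on the shared buffer, no copy in or out,
    then queues follow-up work back to another agent."""
    for i, value in enumerate(buf):
        buf[i] = value * value
    reply_to.submit(sum_buffer, buf, done, result)


def sum_buffer(agent, buf, done, result):
    """'CPU' task: reads the same buffer the GPU task just wrote."""
    result.append(sum(buf))
    done.set()


def run_demo(values):
    cpu, gpu = Agent("cpu"), Agent("gpu")
    cpu.start()
    gpu.start()
    buf = list(values)          # one buffer, visible to both agents
    done = threading.Event()
    result = []
    gpu.submit(square_in_place, buf, cpu, done, result)
    if not done.wait(timeout=5):
        raise TimeoutError("tasks did not complete")
    gpu.stop()
    cpu.stop()
    return buf, result[0]


if __name__ == "__main__":
    print(run_demo([1, 2, 3]))
```

The point of the sketch is the shape of the interaction: work is handed over by enqueuing a task and a reference to the data rather than by copying buffers between private memories, and either side can queue work to the other.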

How will HSA do this?
HSA’s goals and the issues it has chosen to address are admirable but difficult to achieve. In my next article, I’ll discuss the means by which the HSA Foundation will simplify heterogeneous asymmetric processing. Specifically, I’ll introduce the HSA solution stack, comprising the HSA Assembler, Runtime, Finalizer, and Kernel Driver, as well as HSA software libraries and intermediate languages.

Sources
Ganssle, Jack. “The Nulticore Effect.” Embedded.com, Dec. 8, 2008.
Moore, Samuel K. “Multicore is Bad News for Supercomputers: Adding cores slows data-intensive applications.” IEEE Spectrum, November 2008.
Kyriazis, George (AMD). “Heterogeneous System Architecture: A Technical Review.” Whitepaper, HSA Foundation, August 2012.
Processor core performance graph is from “Multicore is Bad News for Supercomputers: Adding cores slows data-intensive applications.” IEEE Spectrum, November 2008 and Sandia Labs.
Qualcomm Snapdragon S4 block diagram is from http://www.cnx-software.com/wp-content/uploads/2011/10/qualcomm_snapdragon_s4_block_diagram.jpg.
HSA Solution Stack diagram is from Phil Rogers’ presentation at the AMD Fusion 2012 conference titled, “The Programmer’s Guide to a Universe of Probability: The Heterogeneous System Architecture.”

—Kurt Shuler is vice president of marketing at Arteris.

Kurt Shuler is vice president of marketing at Arteris IP. He is a member of the US Technical Advisory Group (TAG) to the ISO 26262/TC22/SC3/WG16 working group and helps create safety standards for semiconductors and semiconductor IP. He has extensive IP, semiconductor, and software marketing experiences in the mobile, consumer, automotive, and enterprise segments working for Intel, Texas Instruments, and four startups. Prior to his entry into technology, he flew as an air commando in the US Air Force Special Operations Forces. Shuler earned a B.S. in Aeronautical Engineering from the United States Air Force Academy and an M.B.A. from the MIT Sloan School of Management.
