Making A Multicore System Work

First of Two Parts: A deep dive into the interconnects on a complex system.


If you think designing a single-core system is hard, designing a multicore system is many times harder. Connecting all the pieces together and making them work properly, let alone work well together, is one of the hardest tasks design engineers and architects will ever face.

System-Level Design tracked down some of the experts in this field and sat them down around a table to discuss what’s going on. Included in the discussion were James Aldis, system-on-chip architect for Texas Instruments’ wireless business unit; Charles Janac, president and CEO of Arteris; Drew Wingard, CTO of Sonics; and Dave Gwilt, product manager for ARM interconnect products. What follows are excerpts of that conversation.

SLD: Let’s start with a really basic question. How do you define multicore?

Gwilt: We’ve been doing multiprocessing heterogeneous stuff for a very long time and in many different markets. Multicore is running a single software image across multiple processing elements.

Wingard: That doesn’t match what we see in practical systems.

Aldis: TI has been producing multicore chips for multiple generations now. We split the software into the piece that’s going to run on the RISC and the piece that’s going to run on the DSP and the piece of application processing that’s going to be offloaded onto a hardware accelerator. That’s all a very manual process. When I think of multicore these days I tend to think of what’s coming up in the wireless space where you have a single software image and it’s magically distributed over identical cores on the same device. But multicore means more than that.

Janac: There are a number of people who have tried to do the homogeneous multiprocessor kind of approach—similar to an FPGA. That works in some applications like defense, aerospace and networking, but it doesn’t work in cost-sensitive applications like wireless and consumer. As a result, we wind up with the majority of the market being heterogeneous multiprocessor SoCs. Those are getting increasingly complex because the wireless carriers are constantly trying to deploy new applications and handset makers are trying to approximate the functions of a PC. That’s putting increasing pressure on the hardware.

SLD: What do you actually gain integrating multiple cores, which share memory and busses, versus single-core chips?

Wingard: We’re doing these high levels of integration because we’re trying to get a certain amount of function at the lowest system cost and power and with the right amount of performance. We integrate not because we want to, but because Moore’s Law says we have so many transistors. It’s the job of the system architect to figure out how to make it work. In many cases, the thing that throttles these chips is that they have to share memory, but if you don’t share memory you don’t save costs. The personal computer space is driving DRAM road maps to give us increasing bandwidth per pin. Then we want to put the right amount of processing and bandwidth on the SoC so we can maximize utilization of that extra DRAM bandwidth. Some of this is also driven by form factor. You can’t do a multichip iPhone because there isn’t enough space inside the package.

SLD: Is the heterogeneous approach because each function requires different processing power?

Gwilt: Absolutely.

Janac: I was at a presentation where one gentleman said he was proud that his company was only using 7 percent of the ARM processor and that the rest of the system was running on these proprietary algorithmic engines. I wouldn’t be very proud of that.

SLD: So that’s 7 percent utilization?

Janac: Yes. They should be adding some intelligence that makes use of that resource and reduces the cost. One of the issues is how do you route the traffic to the cores that are available. What is the idle core doing? If it is idle, can you utilize it better?

Wingard: Today, in the battery-powered domains, they’re shutting off regions of the chip and turning off the power supply to several of the cores. If they don’t have anything to do, they’re shutting them off.

Janac: Or they’re putting them in a lower operating mode.

Wingard: All these games get played, but there’s an inefficiency associated with that. If you use heterogeneous cores, you can get better results. Your battery lasts longer. You can get higher performance. And you are much more able to support these multi-mode devices, which are still not general-purpose computers. PCs don’t do it this way because economics demand a single software platform on which you can run anything you want pretty well. In these devices, application flexibility is much more limited. That doesn’t mean we don’t see clustered processors like the ARM MPCore being useful for these applications. It’s still valuable to span a wider range of performance points by using some number of identical cores that you can schedule software across. ARM can scale an application, and the power associated with running that application, by playing with the voltage and the number of cores that are turned on.

Gwilt: That’s the key—using that to get power scaling across a broader dynamic range.
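The scaling Wingard and Gwilt describe can be illustrated with the textbook CMOS dynamic-power model, where each active core dissipates roughly P = C·V²·f. A minimal sketch in Python; the capacitance value and operating points below are made-up illustrative numbers, not data for any real ARM part:

```python
# Illustrative sketch of dynamic power scaling with voltage, frequency,
# and active core count. P_dyn = C * V^2 * f per core is the standard
# CMOS approximation; all constants here are hypothetical.

C_EFF = 1.0e-9  # effective switched capacitance per core (farads, illustrative)

# Hypothetical operating points: name -> (voltage in volts, frequency in Hz)
OPERATING_POINTS = {
    "low":  (0.9, 300e6),
    "mid":  (1.0, 600e6),
    "high": (1.2, 1000e6),
}

def dynamic_power(point, active_cores):
    """Total dynamic power (watts) for a cluster of identical cores."""
    v, f = OPERATING_POINTS[point]
    return C_EFF * v * v * f * active_cores

def throughput(point, active_cores):
    """Aggregate throughput in core-cycles per second, assuming ideal scaling."""
    _, f = OPERATING_POINTS[point]
    return f * active_cores

# Two ways to deliver roughly a gigahertz-class workload:
four_slow = dynamic_power("low", 4)   # 4 cores at 0.9 V / 300 MHz
one_fast = dynamic_power("high", 1)   # 1 core  at 1.2 V / 1 GHz
# Because power grows with V^2 * f, the four slow cores provide more
# aggregate cycles per second than the single fast core, at lower power.
```

The point of the sketch is the quadratic voltage term: dropping voltage and spreading work across more identical cores buys back throughput at lower total power, which is exactly the "power scaling across a broader dynamic range" being described.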

SLD: Didn’t TI do this with its DaVinci platform?

Aldis: Yes, we did. But there’s another aspect to all of this, too. The more open you make your platform, the more you end up in the PC world that Drew described. One thing we’re seeing in the wireless space with the advent of the iPhone and the mobile Internet devices that are coming through now is an emphasis on getting raw power out of the main processor and software portability. The wireless world, particularly at the high end, is becoming more and more like the PC world. This presents a challenge because just throwing gigahertz at something isn’t going to fly in the wireless world because of the constraints of power and form factor.

SLD: More and more, chip developers are trying to get multiple generations out of chips because of the cost of creating one. Is it harder to do with heterogeneous cores?

Janac: No, that’s where the interconnect comes in. If you have the right structure for the interconnect, you’re actually able to add in and back out IP in a much more cost- and time-efficient manner to get multiple derivatives.

Gwilt: That’s absolutely correct. Nowadays, with the type of interconnect technology that’s available, we’re able to build chips with very large numbers of cores and use only the cores that we require. We can choose those cores dynamically and maintain a highly optimized solution.

Wingard: There are some interesting examples where they take a subsystem, and within the context of a platform they implement that function in dedicated hardware or an optimized programmable processor. They get to higher performance and lower power that way. But in other versions of the same platform they move that same function into software. From the perspective of the application, the platform is the same. They’ve put in a layer of middleware that allows them to be agnostic. That makes it much easier to take this common platform definition and build different variants.

SLD: Your definition of interconnect is different than the historical one. This version seems to have logic built into it so you can optimize performance in multiple products.

Wingard: We want to put enough intelligence into the interconnect so that some part of the platform definition relies upon logic within the interconnect. What’s different about each chip is the set of IP cores, but there’s a set of common functions that are part of the platform definition. Some of those functions live within the interconnect—things like how do we enforce security and how do we manage to recover from errors. What scares me most about phones becoming more like computers is I really don’t like getting blue screens when I’m in the middle of a call. We expect stability in our appliances.

Gwilt: That same requirement for stability is also being driven by the need for integration. Our customers all want to pull together very significant platforms in very short periods of time. Having the ability to manage that stability through the interconnect is a valuable function.

Janac: If you use the interconnect to assemble these kinds of platform applications, you also need some automated and sophisticated tools for the design of the interconnect and for verification. It’s a matter of both the IP and the tools that come with it that are required for rapid time to market.

Wingard: The total amount of communication we have to manage in the interconnect grows with the total number of components that have to be connected. But historically the fraction of the chip that’s dedicated to the interconnect and the main memory controllers has been remarkably constant across a wide variety of applications and design styles. Typically, between 8% and 12% of the die is interconnect and memory-system components. As the chips get bigger, this is the part of the system that must change for each design. I can mix and match components, but the interconnect is going to be different every time. It is the most chip-specific IP, even in a platform definition. That’s why automated tooling for this part of the design is so important.
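Wingard’s 8-to-12-percent observation turns directly into a quick floorplanning sanity check. A minimal sketch; the die size used below is a hypothetical number, not from any design discussed here:

```python
# Quick budget check for the rule of thumb that interconnect plus
# memory-system components occupy roughly 8-12% of the die.
# The die area passed in is an illustrative value.

def interconnect_budget(die_area_mm2, low=0.08, high=0.12):
    """Return the (min, max) area in mm^2 expected for interconnect
    and memory-system logic, given the 8-12% rule of thumb."""
    return die_area_mm2 * low, die_area_mm2 * high

lo, hi = interconnect_budget(80.0)  # hypothetical 80 mm^2 SoC
# Roughly 6.4 to 9.6 mm^2 of the die would go to interconnect and
# memory-system logic under this rule of thumb.
```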

SLD: But interconnects traditionally have been several steps after the initial architectural design. Has that changed?

Aldis: We’re now in our third generation of SoC platforms where we’ve known what our interconnects are going to look like—maybe not with all the i’s dotted and t’s crossed, but we’ve known at a very early stage what we’re going to be using. We also know all the requirements we’re going to put on the different cores in the chip so they can plug into our interconnect environment. Nowadays, when we build a chip the interconnects are enabled before any of the cores. We have legacy cores, of course. But for any new cores, before we have working RTL we have an interconnect. That makes a huge difference in the time it takes to kick off a project, see the test cases running, and start to debug and analyze. We also have a SystemC model for the interconnect technology we’re using. That’s part of the very initial architectural studies.

Wingard: This has a lot to do with the application domain that’s being targeted. In those places where you put multiple cores together, you have to worry about sharing behavior and performance. You quickly get to the point where, until you have a model of the system and understand the implications of the shared memory and the interconnect that feeds it, you don’t know whether you have an architecture that works. For those domains where the approach isn’t ‘slap it together and we don’t care about performance,’ you absolutely have to have the interconnect technology, and it has to be available very early in the architectural phase of the chip. There are many designers from an ASIC background who aren’t used to that.
