Making A Multicore System Work

Second of Two Parts: Software, derivative chips and super chips.

Making all the pieces work together in a multicore system requires a deep understanding of the technology, lots of different layers of synthesis, and some incredibly complex testing strategies.

 

System-Level Design sat down with James Aldis, system-on-chip architect for Texas Instruments’ wireless business unit; Charles Janac, president and CEO of Arteris; Drew Wingard, CTO of Sonics; and Dave Gwilt, product manager for ARM interconnect products. What follows are excerpts of that conversation.

 

SLD: It’s hard to synthesize software now. Is it possible to synthesize the interconnect?

Gwilt: You have to look at techniques for traffic generation rather than software to bring these platforms up for testing the interconnect’s properties. Other techniques are needed to put the interconnect through its paces.

Wingard: The high-level design modeling often breaks down into function versus performance. In the functional domain, you have to have the software, but in many cases you can abstract away the interconnect and the memory system. It’s just bits and bytes at that level of the model. The virtual model stuff done by Virtio, which has been bought by Synopsys, was really more focused on trying to get a functionally accurate model into the hands of the software folks as early as possible without worrying about performance. We’re more worried about the performance, and for many of the accelerators it’s pretty easy to abstract away their communication behavior. The SystemC or TLM (transaction-level modeling) models can be living objects that the architect uses early for provisioning the design, but as the design is integrated and I have the real stuff to play with, I can compare what I’m seeing versus what I predicted. If they’re different, I can re-tune and re-optimize that part of the design, and I still haven’t built the chip yet.
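The comparison Wingard describes, checking what the integrated design shows against what the architectural model predicted, can be illustrated with a minimal sketch. The function name, the block names, the latency figures and the 10 percent tolerance are all hypothetical and chosen for illustration only; none of them come from the panelists.

```python
def find_deviations(predicted, measured, tolerance=0.10):
    """Return {name: relative_error} for every entry whose measured value
    differs from its prediction by more than `tolerance` (a fraction)."""
    deviations = {}
    for name, expected in predicted.items():
        if name not in measured:
            continue  # no measurement available yet for this block
        rel_err = (measured[name] - expected) / expected
        if abs(rel_err) > tolerance:
            deviations[name] = rel_err
    return deviations


# Hypothetical average read latencies in cycles, per initiator.
predicted = {"cpu": 40.0, "video_decoder": 120.0, "display": 80.0}
measured = {"cpu": 42.0, "video_decoder": 150.0, "display": 79.0}

# Blocks listed here are candidates for re-tuning the model or the design.
outliers = find_deviations(predicted, measured)
```

Here only the video decoder falls outside the tolerance, so it is the one part of the model (or of the design) that would be revisited.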

Janac: There are different levels of abstraction. There is tremendous benefit in doing architectural exploration using traffic to model what the SoC is going to look like even before the IP is there, so you can identify latency and make estimations of performance. But as you get onto the next level, which would be the RTL level, you can use cycle-accurate SystemC models, which are much closer to the real performance of the system, and ultimately you get the results from the FPGA and simulation. You’re looking at three or four levels of modeling accuracy. The benefit is you’re likely to come out with something that’s going to work.
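To make the idea of traffic-based exploration concrete, here is a toy sketch in which synthetic traffic generators stand in for IP blocks that do not exist yet. It assumes a single shared memory port with first-come, first-served arbitration and a fixed service time; the initiator names, request rates and cycle counts are hypothetical. Real flows use SystemC/TLM models or vendor tools, and this is not a model of any panelist's product.

```python
import random
from collections import deque


def simulate(initiators, service_cycles, total_cycles, seed=0):
    """Drive one shared memory port with synthetic traffic.

    initiators: {name: probability of issuing a request in any cycle}
    service_cycles: cycles the port stays busy per request
    Returns {name: average latency in cycles}, queueing plus service.
    """
    rng = random.Random(seed)
    queue = deque()   # pending requests as (issue_cycle, name), oldest first
    busy_until = 0    # first cycle at which the port is free again
    latencies = {name: [] for name in initiators}

    for cycle in range(total_cycles):
        # Traffic generation: each initiator may issue one request per cycle.
        for name, rate in initiators.items():
            if rng.random() < rate:
                queue.append((cycle, name))
        # Arbitration: start serving the oldest request when the port is idle.
        if queue and cycle >= busy_until:
            issued, name = queue.popleft()
            busy_until = cycle + service_cycles
            latencies[name].append(busy_until - issued)

    return {n: sum(v) / len(v) for n, v in latencies.items() if v}


# Same two initiators at light and heavy load (hypothetical rates).
light = simulate({"cpu": 0.02, "video": 0.03}, service_cycles=4, total_cycles=20000)
heavy = simulate({"cpu": 0.08, "video": 0.12}, service_cycles=4, total_cycles=20000)
```

Comparing `light` and `heavy` shows how average latency grows as the port approaches saturation, which is the kind of early estimate that can be made before any RTL exists.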

 

SLD: Does that make it harder to scale a platform into derivative systems?

Wingard: There are different strategies for scaling these platforms. Some companies go after the ‘Super Chip.’ They build the most complex platform they ever want to make. It’s a very expensive way to do it, but you get an ideal software vehicle and whatever chips you build are going to be subsets of that chip. It’s very easy to turn off the hardware and software. You tend to come up with a piece of silicon that’s harder to sell because it’s more expensive and bigger than it has to be. A lot of people can’t afford to build a throwaway chip, and the Super Chip approach requires you to understand what the market is going to look like several years from now. The alternative approach is to be in a continuous refinement phase. The architects’ models, which normally would be thrown away, need to be kept alive. If something changes, how does it affect the previous model? And maybe you don’t have to do all the detailed analysis work again. The last thing you want to do is figure this out at the polygon level.

Gwilt: It also helps to have debug and trace hardware built into your core, so you can understand where it’s performing well and where the inefficiencies are. Having debug features may make the difference between being able to extend the platform another generation in software versus having to rebuild the hardware platform. That can be the make or break for a platform.

Wingard: I completely agree. One of our customers shared 30 milliseconds of trace data off one of their production high-definition TV designs. We used that for optimizing the next generation of our interconnect. That kind of visibility and trace data can be invaluable for the IP provider as well as the system architect.

 

SLD: What are the problems TI is seeing with this kind of technology?

Aldis: We’re probably like the first type of customer. Our flagship OMAP product looks more like the Super Chip, and then we go through derivatives—although they would have other things on them, as well. It’s essential for us to maintain models of interconnect technology, the different cores we’re using and the applications. Traffic models give you some idea of what the system looks like when it’s running, and how much the system suffers if you increase latency. We spend a very large proportion of our time trying to recalibrate the kind of model libraries we have for our SoC platforms against the results. When the software people tell you three years in advance that they expect it to need this number of instructions and this kind of cache footprint and you come back three years later and they have the codecs running, but not necessarily with the same numbers, it takes constant work to keep these things alive.
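One way to see why instruction counts, cache behavior and memory latency all have to be recalibrated together is a first-order textbook estimate of run time. The sketch below is only that: it is not TI's model, and the workload numbers are hypothetical.

```python
def runtime_cycles(instructions, base_cpi, misses_per_instruction, miss_latency):
    """Estimate execution cycles: every instruction costs base_cpi cycles,
    plus miss_latency cycles for each cache miss that goes out to memory."""
    return instructions * (base_cpi + misses_per_instruction * miss_latency)


# Hypothetical codec: 1e9 instructions, CPI of 1.0, one miss per 100 instructions.
baseline = runtime_cycles(1_000_000_000, 1.0, 0.01, miss_latency=50)
slower = runtime_cycles(1_000_000_000, 1.0, 0.01, miss_latency=100)

# Relative cost of doubling the memory latency seen by the core.
slowdown = slower / baseline
```

With these assumed numbers, doubling the miss latency makes the workload take about a third longer. If the software arrives with a different instruction count or miss rate than was forecast, the same formula gives a different answer, which is why the estimates need to be revisited against real results.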

 

SLD: How do you optimize utilization of the individual cores on the chip?

Janac: You get different levels of detail as you progress down the design path. Initially you’re working with TLM 2.0 models, which are fully untimed and which give you the function. Performance is an estimate. Then you go to cycle-accurate SystemC, where you get within the range of performance. Then you have FPGA data. You get more and more detail. That’s very important, because you can make all sorts of analyses, tradeoffs and estimates. Then, on your derivative, you can look at how that stacks up.

Aldis: Derivatives are easier because you have a lot more information when you build the derivative than when you build a new platform. Whenever you’re doing early architectural decisions for a major new generation, you use whatever information you’ve got. That’s often very little information, but there’s a huge time pressure to make decisions, decide what the thing is going to be like, whether it’s green or brown, 32-bit or 64-bit or 128-bit. If you don’t have a working model and the project needs to make a decision and someone’s gut instinct is strong enough that it will be okay, you go with that. With a derivative, you have a chance to put model data together and measure data off the platform, so you can make more informed decisions.

 

SLD: Is complexity getting to the point where it’s affecting business decisions about what gets developed?

Gwilt: Our customers’ systems are growing dramatically. Whether they will be able to over-engineer platforms, I don’t think so. Unless you’ve got a very significant parent company that can afford to take a few more square millimeters of silicon than they might need, there will be a lot of pressure on area, gate count, the ability to implement it both in processors and cores.

Aldis: There is no doubt it’s getting more complicated. The number of engineers we need to build a big SoC these days is huge.

 

SLD: Is the number of engineers going up, and are automation tools helping?

Aldis: It is going up. The tools are obviously helping a lot. Four generations ago, for example, we did the interconnect by hand. Now we couldn’t imagine doing that. What’s happening now is the level of modularity on the chip—the number of subsystems containing their own interconnect—is going up. We used to have a team of five or six people building the interconnect. Now we have five or six people that are responsible for 20 interconnects on one device. That could be a break-even case. In other areas there could be less tool support. When you’re doing video processing, you don’t have the level of specialist tools that you have in the interconnect space. There are a lot of interesting tools that enable you to synthesize in C code, but there’s still a lot of hard work running those things or putting those together.

Wingard: One of the approaches we see companies take, which is what TI is doing, is building subsystems. They’re building larger collections of IP into an object that is more application-domain focused, like a video processor with a lot of local memory and a collection of IP cores that is potentially re-usable. All of the software assets needed to make that thing do its job are considered part of that object. That’s a much lower-cost way of re-using that component than if it’s a bunch of separate building blocks. That’s a positive trend we see coming.