Partitioning Drives Architectural Considerations

Experts at the Table, part 1: When and how do chip architects prioritize partitioning?


There are multiple reasons for design partitioning. One is complexity, because it’s faster and simpler to divide and conquer, particularly with third-party IP. A second involves power, where it may be more efficient to divide up functionality so each function can be right-sized. A third involves performance, where memory utilization and processing can be split up according to functional priority. There are also different types of partitioning.

Semiconductor Engineering sat down to explore these issues with Raymond Nijssen, vice president of system engineering at Achronix; Andy Ladd, CEO at Baum; Dave Kelf, chief marketing officer at Breker; Rod Metcalfe, product management group director in the Digital & Signoff Group at Cadence; Mark Olen, product marketing group manager at Mentor, A Siemens Business; Tom Anderson, technical marketing consultant at OneSpin; and Drew Wingard, CTO at Sonics. What follows are excerpts of that discussion.

SE: Partitioning is a really complicated topic because there are different types of partitioning. From a system level, how do you decide architecturally where to put what, how to divide the different pieces for a variety of reasons? When an architect is approaching a design, how do they prioritize decisions around partitioning?

Nijssen: Oftentimes, architects approach a problem entirely from the top down, so they have ‘this function, that function.’ They know the flow of data through the system, and they are not yet thinking about how it will eventually map onto the actual hardware, whether that’s CPUs, ASIC blocks, FPGAs or something else. So at first they assume they can do anything they want, but of course they can’t ignore reality completely. At some point they have to break it down into pieces, and that’s where the partitioning job starts and they start architecting. ‘This thing here has to go into an FPGA, but if I put this in an FPGA then there will be terabytes of data per second flowing in and out between this block and that block.’ So that block also has to get pulled into this FPGA. But now that FPGA is all filled up, so I have to take something else out. That’s already a partitioning task at an architectural level. It’s top-down, getting more and more aware of the actual hardware capabilities and what’s out there in the market. You can’t assume that you’re going to have a [simple] chip where all of your memory is going to be on-die. There are a lot of these tasks that, from the top all the way down, start to get pulled into the system architecture. Those are all partitioning tasks, where different parts of the system are getting mapped to different parts of the hardware.
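The loop Nijssen describes (pull a high-bandwidth block into the FPGA, find the FPGA is now full, move something else out) amounts to repeatedly checking two constraints on a candidate partition: per-device capacity and inter-device bandwidth. The Python sketch below is a minimal illustration under stated assumptions: the block names, resource units, bandwidth figures and the single uniform link budget are all hypothetical, and real partitioning tools model far more (pins, latency, memory, power).

```python
from collections import defaultdict

def check_partition(sizes, traffic, assignment, capacity, link_bw):
    """Return human-readable constraint violations for one candidate partition.

    sizes:      block -> resource usage (hypothetical units, e.g. logic cells)
    traffic:    (block_a, block_b) -> sustained bandwidth in GB/s
    assignment: block -> device it is mapped to
    capacity:   device -> available resources
    link_bw:    max GB/s between any two devices (assumed uniform here)
    """
    violations = []

    # Capacity: total resources of all blocks placed on each device.
    used = defaultdict(int)
    for block, device in assignment.items():
        used[device] += sizes[block]
    for device, total in used.items():
        if total > capacity[device]:
            violations.append(f"{device} over capacity: {total} > {capacity[device]}")

    # Bandwidth: only traffic between blocks on different devices crosses a link.
    cut = defaultdict(float)
    for (a, b), gbps in traffic.items():
        da, db = assignment[a], assignment[b]
        if da != db:
            cut[tuple(sorted((da, db)))] += gbps
    for (da, db), gbps in cut.items():
        if gbps > link_bw:
            violations.append(f"{da}<->{db} link over budget: {gbps} > {link_bw} GB/s")

    return violations


# Hypothetical system: "dsp" and "filter" exchange far more data than a link can carry.
sizes = {"dsp": 45, "filter": 40, "io": 25, "ctrl": 10}
traffic = {("dsp", "filter"): 2000.0, ("io", "dsp"): 50.0, ("filter", "ctrl"): 1.0}
capacity = {"fpga0": 100, "asic0": 500}

first_try = {"dsp": "fpga0", "io": "fpga0", "filter": "asic0", "ctrl": "asic0"}
pull_in = {"dsp": "fpga0", "io": "fpga0", "filter": "fpga0", "ctrl": "asic0"}
move_out = {"dsp": "fpga0", "io": "asic0", "filter": "fpga0", "ctrl": "asic0"}

for name, plan in [("first_try", first_try), ("pull_in", pull_in), ("move_out", move_out)]:
    print(name, check_partition(sizes, traffic, plan, capacity, link_bw=100.0))
```

The three plans walk through the quote: the first fails on link bandwidth, pulling "filter" into the FPGA overflows its capacity, and moving "io" out satisfies both constraints.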

Metcalfe: On the architectural side we’re partitioning across lots of different things. But even at the system-on-chip level, there is partitioning to be done, and one thing we see quite often is that a partitioning that is great from an architectural point of view is not always great from an implementation point of view. Quite often, re-partitioning takes place later on, where you take this initial partition and then have to change it for physical implementation.

SE: How far down do you repartition?

Metcalfe: It really happens quite early in the flow, because once you’ve made that commitment it’s very difficult to undo later. But you may take your logical partitions and start re-partitioning for physical constraints, and then you start the system-on-chip development, because you may also have IP that you have no control over. That has to be a partition all of its own. Partitioning is not necessarily just an architectural thing. Re-partitioning may come up later on.

Anderson: I want to add four important aspects. One involves the IP blocks you’re going to use. In most chips, people decide fairly early on what processor they’re going to use and how many processors, and those are pretty fundamental decisions, as are any changes to them. You need to be aware of those even in the early stage of the architectural analysis. The second aspect is how those are connected to the top-level bus structure. A hot topic now and in the last few years is networks on chip (NoCs), where you reduce your bus structure by having a bunch of independent blocks that are connected by a completely asynchronous bus, resolving some of the old verification and implementation challenges. That tends to be decided very early. Power would be the third aspect. Power domains are pretty fundamental to every chip that’s out there these days, no matter what you’re building. People want lower power, and there are lots of techniques, but most of them boil down to turning off a whole set of blocks, or reducing their voltage, over time. That’s not decided quite as early, but it’s in the minds of the architects pretty early on because power is very much a part of the system specification. The last thing is the notion of designing with verification in mind, and maybe even changing your design a little bit with verification in mind. It’s something that’s been floating around for a while. Being in the verification space, I’m not convinced that it happens very often. Maybe using a NoC is one way to reduce the verification problem. You have a bunch of blocks you really can verify independently, you verify the bus formally to make sure the protocol is right, and maybe you don’t actually do a whole lot with the whole chip as a monolithic verification entity. Those are the different dimensions as I see them.

Kelf: One thing we see happening, getting back to the whole Portable Stimulus thing, is that as you partition the chip from a design point of view, it’s really hard to partition it from a verification point of view. So the blocks are tested on their own, very thoroughly. A lot of UVM tests are created to address all of that. Then you come to the system-on-chip. There’s a phase of infrastructure verification, where you’re making sure that things are connected right and the blocks are basically working right. If you have a cache coherency requirement, you make sure that’s all wrung out. All those kinds of things. And then you’ve got the software test on top of that, booting Linux and that kind of thing. What we’ve found is that although you can partition up the design, partitioning up verification scenarios into those individual blocks, or trying to reduce the verification load by breaking it up, is almost impossible. So how do you create these high-level scenarios, drive tests around the chip and make all that happen? It’s now becoming a really big problem, and as you add more of these blocks, figuring out the verification plan is very tough, and un-partitionable.

Ladd: Architects have to make decisions on performance and power, so they have to make sure that not only is the hardware correct, but the software management of the power is going to be correct, too. And they’re making a lot of these decisions without much data, because the data that tells them, ‘Are we going to meet our goals?’ comes in very late in the design cycle. How do they know they’re making the right decisions, both on the software side and the hardware side? And then when it’s finally implemented, are they going to be correct? If they have to readjust the partitioning at that point, it’s too late. They don’t have the flexibility anymore.

Wingard: Going back to your original question, I almost never get to meet a system architect who has the flexibility to start with a blank sheet of paper. Almost every system is an iteration on something else. So while everyone would like to have this abstract, top-down approach, that’s not reality for most people. Partitioning is still super important, but the good system architects are masters of abstraction. They can understand the system at the highest level, including the components that they’re not allowed to change. And they can dive down incredibly deep in those areas where they need change, either because there’s a new function or because something burned too much power last time, and it’s the thing that blinks red whenever they try to characterize what’s going on and what they’re going to optimize around. So those partitioning choices are made very carefully. As a NoC provider, we want to enable those partitioning decisions to be made both early and late, so the conversation about the difference between the logical partitions and the physical partitions is very real to us. And often it’s around power. The architect may say, ‘I’ve got these fundamental power domains that I need to worry about.’ And then the physical team says, ‘Well, that’s great, this part of the power domain is over here, and this part of it is over there, and logically that’s okay, but we’re going to have to restructure the netlist automatically as we go to build the thing.’ A good thing about NoC technology is that we have a sufficiently abstract model of communication that we’re in a position to do that, and hopefully protect all the performance work that’s been done in the architecture to that point.
We’re also a provider of power control technology, and there’s a full dual in that domain as well: the control of those different power domains, the clock domains and everything else is also very fundamental to the partitioning of the system, especially when we start to think about extremely low-power modes where battery life depends on how much of the chip is always powered.

Olen: I’ve also been on the verification and test side for a long time. Partitioning isn’t anything new. I can remember back in the 1980s and 1990s, when we’d partition for testability, or design for test. How do you partition your scan chains? How do you partition your synchronous versus asynchronous logic? How do you partition your memory for built-in self-test, or LFSR MISRs [linear feedback shift registers, multiple-input signature registers] for logic built-in self-test? So that might be one of the early design-for-partitioning, architectural things that need to be taken into consideration. We’ve since moved into design for, I don’t know if it’s verification or not, power optimization and clock control as well, being able to turn clocks on and off, which is also power-related. On top of all of that, there’s the whole Portable Stimulus thing now, which is the ability to re-use partitioned IP-level stimulus in some way at the system level. We’re seeing, believe it or not, yet another generation of this DFX, which is design for safety, and this is another area where we’re seeing our customers looking at how to partition ASIL A, ASIL B, ASIL C and ASIL D levels. ‘How do I automatically synthesize and partition safety mechanism controllers? How do I understand how to automatically inject triple-level redundancy voting mechanisms in my high-risk areas?’ All of these things are starting to creep into the architectural analysis area. And you said you’ve never seen someone start from a blank sheet of paper. There’s always some kind of requirement, whether it was DFT or design for verification, or design for power/performance, and now design for safety. There’s always some kind of requirement early on: functional safety, data mining and machine learning.
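The redundancy voting Olen mentions is usually called triple modular redundancy (TMR): a high-risk register or block is triplicated, and a 2-of-3 majority voter masks a fault in any single copy. In silicon the voter is combinational logic; the Python sketch below only models its bitwise behavior, and the function names and example values are hypothetical.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit takes the value held by at least two copies."""
    return (a & b) | (a & c) | (b & c)


def tmr_disagreement(a: int, b: int, c: int) -> int:
    """Mask of bit positions where the three copies do not all agree.

    A nonzero result means one copy was corrected; a safety mechanism
    would typically log or report this as a detected fault.
    """
    return (a ^ b) | (a ^ c)


# Illustrative values: bit 3 of copy c has been flipped by a single-event upset.
a, b, c = 0b1011, 0b1011, 0b0011
print(bin(tmr_vote(a, b, c)))          # 0b1011
print(bin(tmr_disagreement(a, b, c)))  # 0b1000
```

The voter corrects any single faulty copy but cannot resolve two copies that fail the same way, which is one reason the triplicated region is usually limited to the high-risk areas Olen describes.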

Anderson: Adding safety to the mix is a great idea, because if you look at the standards that we’re all following in these safety-dependent domains, such as autonomous vehicles, aerospace and so forth, they all have this notion of safety-critical logic. It has special rules. It has to be protected against random errors in the field, and so forth. You have to be able to identify what that logic is, and that affects your partitioning. Here’s this clear box that I have to treat specially. I have to analyze it specially. Somebody will even have to certify it externally before this chip will be accepted.

SE: When you’re not starting from a blank sheet of paper, can you re-partition the parts of a design that you can’t change? Is that possible, or have you gone too far by that point? And how does that affect the decision-making for the rest of the design and the architecture?

Wingard: As an engineer, I have to go after the word ‘can’t.’ We can always change. The question is, what’s the cost? How steep is that wall? How many levels of management approval do I need? I could go get that new version of the processor core. That would be better and make my job easier, but someone’s going to have to cut a big check, and someone’s going to have to harden something, and all kinds of stuff like that. We’re always amazed at the degree of cleverness that we find in the customer base. Things that you would think would be hard to re-partition, people find ways of doing it. Sometimes it’s not a pure re-partition. In the old days, partitioning was about hardware versus software, and what we’ve found is that it’s one of those onions where, at every level you peel, you now have that hardware/software tradeoff. So you may have the audio subsystem, which has its own hardware/software tradeoffs, and that audio subsystem goes into this next-level subsystem. And then you’ve got this…and we’ve got many layers of this because we’ve got programmable processors all over the place. So one of the things we find is that sometimes the re-partitioning choice is to say, ‘I can’t change the bulk of this, but I can pull that function out of it. That software function that was running in this part, I can take that and move it over to something else, and maybe that helps me from a power perspective.’ So maybe there is this big honking block that has to be turned on when it’s doing the main function, but it doesn’t have to be turned on all the time, and maybe I can optimize around my power envelope by moving this task someplace else. Maybe I’ll move that task into dedicated hardware or programmable hardware.

Metcalfe: That’s a really good point, because the partitioning decision is becoming even more complex than it was in the past. As you said, this is not a new discussion, but now we’ve got 3D chips. You’ve got wafers stacked on top of each other, so as well as partitioning across one die, you’re now partitioning across other die. So your reuse case is an excellent one, because you may keep one die the same because you don’t want to touch it. On the other die that’s now put on top of it, you may add some IP to speed up some algorithm, so you’re not only partitioning within a single die. You may be partitioning across multiple die, and then the whole reuse thing gets even more interesting.
