Adventures In Assembly

How engineering teams are approaching SoC assembly is changing.


With few SoC designs today, if any, built completely from the ground up, assembly is an extremely important task to get right. IP components must be put together optimally and efficiently to match the application requirements precisely, and doing so is complex and full of nuance.

How companies approach IP varies significantly from one to the next, from one market segment to the next, and even within those segments from one design to the next.

That helps explain why ARM has been developing and characterizing IP for different markets, adding such features as ECC and soft error correction for the enterprise markets and what amounts to reference designs for other markets. “We take a reference design to the GDS level, so you can harden that and be done in weeks, or you can do variations on that and use that as the starting point,” said Ron Moore, vice president of marketing for ARM’s Physical IP Division. “So for radio IP you may want that hardened. You probably want security IP hardened. This is why POP [ARM’s Processor Optimization Pack IP] has been successful. The IoT OEMs take chips from embedded guys, put them in a package, and they’re as successful as if they did their own silicon. It isn’t a full custom ASIC, but how much do you need to plug to play? It’s a platform concept where someone provides 80% and you do the difference.”

Then, engineering teams must figure out the impact of the assembly decisions they make on the power and performance of the SoC. Drew Wingard, CTO of Sonics, said that even as the company was being formed in the late 1990s he realized that most of the people doing SoCs came at it from an ASIC design background, and they were used to getting an abstract specification from somebody. “In the old days it was more of a system company or a system architect. These days it’s more of a chip architect that basically says, if you hook these things up, we’re going to be good. In that case, assembly is really just about assembly. The founding team of Sonics was coming at this from more of a microprocessor design background, and in that domain, you never dreamed — for very long, at least — of trying to build a processor as just an assembled composition of micro-architectural units. You only think about those micro-architectural units in terms of how they impact the performance, and not just the frequency but the runtime of software on top of that thing. You don’t do your micro-architecture until you’ve already done your performance analysis.”

As part of that, engineering teams build simulators and performance models and try to run interesting sequences of code. “If you don’t know what code you’re going to run, it’s probably because you’re going to build something so general purpose that you can go use some of the benchmarks out there. You start to ask how something is going to work on this hardware, and that’s how you build it,” he explained.

In addition, John Swanson, senior marketing manager for DesignWare IP at Synopsys, noted that SoC assembly traditionally has meant using some type of interconnect and hooking up the processor and the Ethernet MAC to it. For example, AMBA might be used with an ARC core and a time-sensitive-networking-aware Ethernet MAC, and putting those pieces together is what has been called IP assembly. “I think that worked for a while. Now, putting it together is more involved than the interfaces. That gets a chunk of the signals, but you still have to deal with power modes and clocking logic, and you might be multiplexing things together to the same five ports, so there’s still a lot of design. But once you get that put together, you can do updates really quickly, and I think people have accepted that. Once I build the subsystem, let’s say there’s an update to the Ethernet specification, or I want to add a new feature, I can quickly rebuild everything with a new version of the IP, and I can do that from a pretty automated point of view, and that gives me a big, flat design. What gets more challenging today is the amount of stuff we’re putting on these chips. People really have to pay more attention to power and performance, and this is not only in the way you put blocks together but also in the way you build blocks.”

Differences matter
Wingard pointed out that one thing different about SoCs is that they are less general-purpose than processors. “Processors tend to be done in a very general-purpose way, so it’s difficult to say you’re going to make this optimization for this one example because someone’s got counterexamples to say that optimization is not good for this other case. So you end up trying to do things that generally improve performance across the board. SoCs are very different from that. Generally speaking, SoCs tend to be pretty heterogeneous because you know what applications or set of applications you’re targeting, and there is a set of components that can get something done at a better performance/power/area characteristic than running it in software on a general purpose machine.”

As soon as you start down that path, you have to make sure this collection of components still offers those advantages, which is why performance modeling is necessary, he noted. “We’ve got all of these components that are cooperating together and the performance we’re looking at is really pretty abstract. We care about the amount of data being moved, where it’s going, how it’s broken up in terms of chunks of data. Transaction-level modeling is kind of ideal for that. We don’t care about the contents of the data, we just care about how much it is and where it’s moving to. When you worry about memory you worry a bit about the fact that access characteristics of DRAM are very dependent upon the access patterns, so you care about some of the addressing behaviors so that you can try and estimate how well you’re going to be able to keep DRAM pages open, among other things.”
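To make the transaction-level idea concrete, here is a minimal sketch, assuming a trace of transactions that records only the master, the target, an address, and a size, with no data payload. It totals the bytes moved on each path and estimates how often DRAM accesses find their row (page) already open. The row size, bank count, open-page policy, and bank mapping are hypothetical choices for illustration, not a description of any particular memory controller or modeling tool.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical DRAM geometry, chosen only for illustration.
ROW_BYTES = 2048  # bytes per DRAM row (page)
NUM_BANKS = 8


@dataclass(frozen=True)
class Transaction:
    """A transaction-level record: who sends how much data where. No payload."""
    master: str
    target: str
    address: int
    size_bytes: int


def bytes_per_route(trace):
    """Total bytes moved on each (master, target) path, ignoring data contents."""
    totals = defaultdict(int)
    for t in trace:
        totals[(t.master, t.target)] += t.size_bytes
    return dict(totals)


def dram_row_hit_rate(trace):
    """Fraction of DRAM accesses that find their row already open.

    Assumes an open-page policy and a simple mapping in which consecutive
    rows are interleaved across banks (both illustrative choices).
    """
    open_row = {}  # bank index -> row currently open in that bank
    hits = accesses = 0
    for t in trace:
        if t.target != "dram":
            continue
        row = t.address // ROW_BYTES
        bank = row % NUM_BANKS
        accesses += 1
        if open_row.get(bank) == row:
            hits += 1
        open_row[bank] = row  # the accessed row stays open afterward
    return hits / accesses if accesses else 0.0


trace = [
    Transaction("cpu", "dram", 0x0000, 64),
    Transaction("cpu", "dram", 0x0040, 64),    # same row as before: hit
    Transaction("gpu", "dram", 0x10000, 256),  # different row, same bank: closes row 0
    Transaction("cpu", "dram", 0x0080, 64),    # row 0 must be reopened: miss
    Transaction("dma", "sram", 0x0000, 128),
]
print(bytes_per_route(trace))
print(dram_row_hit_rate(trace))  # 0.25
```

The interleaved GPU access is what drags the hit rate down here, which is the kind of addressing behavior Wingard describes caring about: the byte counts alone would not reveal it.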

Another item to be concerned about with the SoC assembly challenge is that for many SoC platforms there isn’t a single set of traffic or performance characteristics that define the application, Wingard said. “Many times these are multi-mode devices, so there are different use cases. You need to be able to look at this performance across a range of scenarios. Here, the idea has been introduced that some of this work can be done without simulation. You can characterize the traffic that would come from all of these different masters and do a static analysis to make sure that the basic network has enough capacity to meet the basic needs, and then you use the dynamic simulation more as a tuning step, reducing runtimes and allowing more architectural choices to be considered.”
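The static analysis Wingard mentions can be sketched in a few lines, assuming each use case is characterized by an average bandwidth demand per master and each master's traffic crosses a known set of shared links. Every use case, master, link name, and number below is hypothetical. The check only sums demand against capacity, so it cannot see latency or burst effects; those are what the follow-on dynamic simulation would tune.

```python
from collections import defaultdict

# Hypothetical bandwidth demand (MB/s) per master in each use case.
USE_CASES = {
    "video_playback": {"cpu": 400, "video_decoder": 1200, "display": 800},
    "camera_record": {"cpu": 600, "isp": 2000, "video_encoder": 900, "display": 800},
}

# Hypothetical topology: the shared links each master's traffic crosses.
ROUTES = {
    "cpu": ["dram_port"],
    "display": ["dram_port"],
    "video_decoder": ["media_link", "dram_port"],
    "video_encoder": ["media_link", "dram_port"],
    "isp": ["media_link", "dram_port"],
}

# Hypothetical sustainable capacity of each link, in MB/s.
LINK_CAPACITY = {"dram_port": 4000, "media_link": 2500}


def link_loads(demands, routes):
    """Sum the average demand of every master onto each link it uses."""
    loads = defaultdict(int)
    for master, mbps in demands.items():
        for link in routes[master]:
            loads[link] += mbps
    return dict(loads)


def static_check(use_cases, routes, capacity, headroom=1.0):
    """For each use case, list links whose load exceeds capacity * headroom.

    No simulation is run: this only compares summed average demand with
    capacity, so latency and burst effects are left for dynamic simulation.
    """
    report = {}
    for name, demands in use_cases.items():
        loads = link_loads(demands, routes)
        report[name] = sorted(
            link for link, load in loads.items() if load > capacity[link] * headroom
        )
    return report


print(static_check(USE_CASES, ROUTES, LINK_CAPACITY))
# {'video_playback': [], 'camera_record': ['dram_port', 'media_link']}
```

Because each use case is checked in a loop rather than simulated, sweeping many scenarios or trying a lower `headroom` margin costs almost nothing, which is what leaves room to consider more architectural choices.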

Automation helps
While every design is different, there are enough common steps in the integration process that at least some of it can be handled by tools, such as those that generate testbenches for the interconnect.

“In simulation, as in emulation, you always want to limit the capacity to just the amount of verification you need to do,” said Frank Schirrmeister, senior group director, product management in the System and Verification Group at Cadence. “So if you want to run a specific subset, you would have to manually assemble all of that every time you change your verification question, and that’s really what you want to avoid.”

In the front end, assembly tools take the interconnect topology data coming from tools such as ARM CoreLink Creator, add the verification components to it, and assemble the RTL from there, he said. “Imagine you want to run 50 different configurations in an interconnect scenario. Imagine having to do this by hand. Just the manual assembly is error-prone enough to get lots of bugs from the way things were put together.”
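The difference between hand assembly and generated assembly shows up even in a toy example. The sketch below assumes a hypothetical interconnect module, `noc_top`, with four hypothetical parameters, and expands every combination of their values into a Verilog parameter override and instance line. It is not how CoreLink Creator or any commercial assembly tool works internally; it only illustrates why enumerating configurations programmatically avoids the transcription errors of editing each one by hand.

```python
import itertools

# Hypothetical interconnect parameters; a real flow would take these from
# the interconnect configuration tool rather than a hand-written table.
KNOBS = {
    "DATA_WIDTH": [32, 64, 128],
    "NUM_MASTERS": [2, 4, 8],
    "NUM_SLAVES": [2, 4],
    "QOS_ENABLE": [0, 1],
}


def enumerate_configs(knobs):
    """Yield one dict per combination of knob values (full cross product)."""
    names = list(knobs)
    for values in itertools.product(*(knobs[n] for n in names)):
        yield dict(zip(names, values))


def parameter_override(config):
    """Render a configuration as a Verilog parameter override list."""
    body = ", ".join(f".{name}({value})" for name, value in config.items())
    return f"#({body})"


def instance_line(config, index):
    """Hypothetical top-level instance for one configuration (ports omitted)."""
    return f"noc_top {parameter_override(config)} u_noc_cfg{index} (/* ports */);"


configs = list(enumerate_configs(KNOBS))
print(len(configs))  # 36
print(instance_line(configs[0], 0))
```

Four small knobs already yield 36 variants; adding one more two-valued option doubles that, which is roughly where hand editing stops being practical.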

That same approach carries over to performance analysis. “You limit your design to just the portion of the design you really want to analyze,” Schirrmeister said. “You don’t take the full chip. For example, you kick out the compute subsystem and replace it with traffic data. The same is true at the end. At some point you want to get to the full chip, and all the same things about identifying the subset apply, but it’s even worse because you have to deal with much higher complexity and you have more components to integrate. At the chip level, you want to be able to assemble the chip quickly and in an automated way. But doing that at the full-chip level comes back to the fuel: the models. If I want to have this tool automatically create my instantiations of different configurations, I need to declare to it, in IP-XACT-type meta-models, how all of the blocks look. For a really complex chip, that may come from a multitude of internal and external resources. The key item here is not so much the tooling as being able to feed it the right level of models of the IP, because that becomes very important.”
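A minimal sketch of why the block descriptions matter, assuming a much-simplified, IP-XACT-inspired model in which each component is just a module name plus a list of ports (real IP-XACT is an XML schema with far more detail). Once the ports are declared, a tool can check that a traffic generator is port-compatible with the compute subsystem it replaces and then emit the instantiation automatically. All module, port, and net names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    name: str
    direction: str  # "in" or "out"
    width: int      # bits


@dataclass(frozen=True)
class Component:
    """Simplified, IP-XACT-inspired block description: module name and ports."""
    module: str
    ports: tuple


def substitute(design, instance, replacement):
    """Return a copy of `design` with one instance swapped for a port-compatible block."""
    original = design[instance]
    if original.ports != replacement.ports:
        raise ValueError(
            f"{replacement.module} is not port-compatible with {original.module}"
        )
    return {**design, instance: replacement}


def instantiate(component, instance, nets):
    """Emit a Verilog instance, connecting each declared port to a net by name."""
    conns = ",\n".join(f"  .{p.name}({nets[p.name]})" for p in component.ports)
    return f"{component.module} {instance} (\n{conns}\n);"


# Hypothetical port list shared by the real compute subsystem and its stand-in.
PORTS = (
    Port("clk", "in", 1),
    Port("awaddr", "out", 32),
    Port("wdata", "out", 64),
)
full_chip = {"u_compute": Component("compute_subsystem", PORTS)}
reduced = substitute(full_chip, "u_compute", Component("traffic_gen", PORTS))
nets = {"clk": "clk", "awaddr": "cpu_awaddr", "wdata": "cpu_wdata"}
print(instantiate(reduced["u_compute"], "u_compute", nets))
```

The generator is only as good as the port declarations it is given: if the model for a block is missing or wrong, the compatibility check and the emitted wiring are wrong too, which is the point about the models being the fuel.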

Other companies, such as indie Semiconductor, use a different approach to assembly.

“We don’t build monolithic chips very often,” said Paul Hollingworth, vice president of marketing at indie Semiconductor. “What we’re doing is building multiple die in a single package. What we’ve done is try to solve the problem you have when you set out to build a custom microcontroller. So which process technology is the best to use? If you want high-performance microprocessing and high-density memory, that pushes you down an embedded-flash, high-density path, which typically means a high NRE. It also typically means a small process geometry, like 55nm for embedded flash. And that, of course, means that your NREs are expensive, your ability to handle voltages is very limited, and you may have some RF challenges and so on.”

Indie’s approach is to build a stripped-down micro-core as Chip No. 1, which is about 75% memory. “The flash and SRAM take up the great majority of the die, with no peripherals and very minimal I/O. That’s because the I/O only talks to the other chip. It doesn’t go off the package. As far as the customer is concerned, the fact that we have two die in the package is completely transparent. It still behaves like it’s an integrated ARM processor core with memory. As far as cost goes, it used to be very expensive to put multiple die in a package, and now it is incredibly cheap, particularly when you get the benefit that you’ve got a far more optimal arrangement of the die in process terms,” he added.

What is missing
Even with all of the progress made, more can be done, given how sophisticated SoC assembly has become. Simon Davidmann, CEO of Imperas, said what is lacking in assembly is a better way to describe things. At his previous company, which took Verilog to a higher level by developing it into SystemVerilog, one of the key contributions was adding the concept of interfaces to Verilog so that large systems could be composed out of blocks.

“The idea was you separated the behavior from the communication, and we put into the language the concept of an interface, which would encapsulate the communication and abstract it, so it could help you with the design and assembly,” Davidmann said. “I’m not sure there are really good tools in the market that help people. Users are trying to do it graphically. There are formats like IP-XACT, but they are quite limited, so there’s still work to be done in evolving to high-level construction languages to make an easier job of chip construction and assembly.”