SoC Integration Headaches Grow

Every chip has in-house and commercial IP, mixed signal components and a slew of tools. What’s the best way to put them together, and how do you wade through the mountains of data?


As the number of IP blocks grows, so do the headaches of integrating the various pieces and making sure they perform as planned within a prescribed power envelope.

This is easier said than done, particularly at the most advanced process nodes. There are more blocks, more power domains, more states and use-model dependencies, and there is much more contention for memories. There are physical effects to contend with, as well: resistance/capacitance and thermal effects, electromigration, electrostatic discharge, and dynamic power density issues in finFETs, along with double patterning at 16/14nm and triple/quadruple patterning at 10nm. And there are far more corner cases, more design rules and, in aggregate, much, much more that can go wrong.

Put all of these factors together and it becomes imperative to be able to check each of the pieces included in a design, which is frequently a mix of in-house and commercially licensed or purchased IP. And while there are fewer designs overall, and fewer vendors doing them as consolidation continues, there are far more tools, techniques and tests required for each new advanced design.

“Blocks now have 1,000 or more connections on them,” said Bernard Murphy, CTO at Atrenta. “The big designs are getting so large that you need a static mechanism for checking this stuff. There are companies stressing formal as a solution, but that’s like using a pile driver for cracking a walnut. We need a lot more static testing to make sure you hook everything up right, that you’ve done the clock and reset right and that you’ve done simulation correctly. You also need to test the hookup. And with more and more analog content, you have to run AMS simulation. Regular simulation is a poor way to check connections because your enable or reset signals may have the wrong polarity when you hook things up.”

Tools are one solution. Architectures are another. And herein lies one of the big ongoing debates: How much of an advantage do pre-integrated and pre-optimized subsystems really provide? So far, the market has largely passed on subsystems in all but a couple of areas, such as processors and audio/video subsystems. But there seems to be a shift underway as traditional markets consolidate and new ones are created.

For one thing, there are fewer IP vendors in traditional markets. For another, there are fewer large chipmakers to buy their IP. That means solutions can be more customized and optimized for the remaining customers, which in turn speeds time to market. But there also is a whole new crop of players in the IoT world, and for them the ability to try out new ideas quickly can be the difference between market share and a wasted effort.

“In a subsystem there is a range of IP that we are stitching together,” said Jim Wallace, director of the systems and software group at ARM. “That can include a multi-layer interconnect, flash, SRAM controller, and IP to build an IoT end node. But the device at the end node is ultra constrained. The apps use between 32K and 64K of RAM, 128K to 256K of flash, and they have DQS (data quality services) security, radio technology, and they primarily rely on battery power. If it’s an ultra-low-power device, it has to last years to months on a charge. If it’s mainstream, it needs to last months to weeks.”
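Wallace’s battery-life ranges follow from simple duty-cycle arithmetic: average current is the awake current weighted by the fraction of time awake, plus the sleep current for the rest. The sketch below is a back-of-envelope model, not ARM’s methodology. It assumes a 225mAh coin cell and invented sleep and active currents, and it ignores self-discharge, temperature and peak-current derating, all of which shorten real battery life.

```python
def average_current_ua(sleep_ua, active_ma, active_ms, period_s):
    """Duty-cycle-weighted average current draw, in microamps."""
    duty = (active_ms / 1000.0) / period_s   # fraction of time awake
    return duty * active_ma * 1000.0 + (1.0 - duty) * sleep_ua


def battery_life_days(capacity_mah, avg_ua):
    """Ideal battery life: capacity divided by average current (no derating)."""
    hours = capacity_mah * 1000.0 / avg_ua   # uAh / uA = hours
    return hours / 24.0


if __name__ == "__main__":
    CAPACITY_MAH = 225.0  # assumed coin cell, nominal rating
    # Ultra-low-power node: assumed 10 ms radio burst at 5 mA every 10 s.
    ulp = average_current_ua(sleep_ua=2.0, active_ma=5.0, active_ms=10.0, period_s=10.0)
    # "Mainstream" node: assumed 50 ms at 8 mA every second.
    main = average_current_ua(sleep_ua=2.0, active_ma=8.0, active_ms=50.0, period_s=1.0)
    for label, avg in (("ultra-low-power", ulp), ("mainstream", main)):
        days = battery_life_days(CAPACITY_MAH, avg)
        print(f"{label}: {avg:.1f} uA average, about {days:.0f} days")
```

With these assumed numbers the first node lasts a little over three and a half years and the second a little over three weeks, which shows how much the wake-up schedule, rather than the sleep current, dominates the budget once a device is active even 5% of the time.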

Wallace said that a subsystem decreases risk for chipmakers, cuts time to market, and allows design teams to focus on what really differentiates their product. The same argument has been used for years in the standard IP business, but as the number of choices increases, particularly at advanced nodes, just being able to pick the right IP is becoming a massive problem. A subsystem eliminates at least some of those issues, and vendors are now characterizing those subsystems in trial layouts so they can better predict performance and interactions with other subsystems and IP blocks.

But even with just two basic parts—a memory controller plus a PHY—integration doesn’t always go as well as expected.

“The first integration challenge is to make sure the physical layer and the controller work well together,” said Frank Ferro, senior director of product management at Rambus. “One company sells a PHY and a controller together, and we had a complaint that one customer couldn’t get them to work properly. It’s the same with a SerDes. That has to be certified by a third party for something like PCI Express, but when you integrate that into an SoC you may not be able to match the performance you expected. So there may be performance at a certification level, but there are questions about whether your SoC can take advantage of it.”

Too much data
That doesn’t solve another problem, though. There is simply too much data in complex designs for anyone to manage effectively. While there has been much talk about continually raising the abstraction level, data management remains a big problem, and increasingly it is showing up at points in the design flow where data volume was never a concern in the past. The amount of data produced at every step of the flow, from architecture to physical layout to verification and test to manufacturing, is growing quickly.

“We have to get to the point where your tools are giving you specific information,” said Mark Milligan, vice president of marketing at Calypto. “As an industry we’ve gone off on the wrong track reporting more and more numbers. What we need are specific actionable recommendations. But even that is only part of the problem. Then designers have to apply that knowledge. Even when you’ve got the concept (area, performance and power), you may want to apply design creativity and do things in a different way. This isn’t about just throwing analysis at the problem. You need real recommendations. You want to be able to try out things and get rapid feedback.”

He’s not alone in that assessment. Simon Davidmann, president and CEO of Imperas Inc., said the focus of testing is power, performance, and whether you want to change the architecture. “There are all these use cases, and each one requires different modeling. This is not all RTL and it’s not all architectural design. Some of it is power analysis. And whenever you have a large software component, the first goal is to get the system to work. You’re probably not thinking about power, security or configurability.”

All of that adds data, which includes everything from RTL to library exchange format data for standard cells, and continues ballooning through GDSII, verification, and on to manufacturing, which is largely mathematical data. At the manufacturing level, in particular, the numbers are jaw-dropping.

“The amount of data is exploding,” said Tom Quan, director at TSMC. “It’s gotten to the point where everyone has to go to a cloud. Most of those are still private clouds, and in our case we have to manage all of that and keep it compartmentalized. But when you think about the amount of data coming from 450 customers with 8,000 products a year, the challenge is how you store and process that data efficiently and access it fast.”

Bigger machines
The biggest beneficiaries of this data explosion have been the hardware acceleration vendors: Mentor Graphics, Cadence and Synopsys. In fact, Synopsys’ purchase of EVE (October 2012) was a direct result of the need for a brute-force approach to process more data quickly. And the fact that emulation is now being used for everything from software prototyping and modeling to verification is a sign of just how much more data is being produced. All of the Big Three EDA vendors have reported continued strong sales of emulation hardware platforms.

The recently announced deal between Ansys and Mentor Graphics adds yet another wrinkle to how emulation is being used to process this data. Mentor has added an API to its verification platform that allows Ansys’ power exploration tools to work with the underlying hardware acceleration engine. Mentor officials have declined to say whether the API would be expanded to allow other tools to take advantage of the hardware.

But despite the improvement in processing power offered by emulators, the growing data problem remains just that—a growing data problem.

“This is an inevitable consequence of Moore’s Law,” said ARM Fellow Rob Aitken. “There are more decisions, more corners, more and more analysis for standard cell libraries, memories. And if you break it all into small pieces, that doesn’t necessarily solve the problem anymore, either. So now the question is whether you can come up with approximate solutions to problems and not solve all the physics.”
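Aitken’s “more corners” compounds multiplicatively, because every independent sign-off dimension multiplies the number of analysis runs. The sketch below enumerates a full-factorial corner set; the dimension names and counts are illustrative assumptions, not any foundry’s actual corner deck.

```python
from itertools import product
from math import prod

# Illustrative sign-off dimensions (assumed values, not a real foundry deck).
DIMENSIONS = {
    "process": ["ss", "tt", "ff", "sf", "fs"],
    "voltage": ["0.72V", "0.80V", "0.88V"],
    "temperature": ["-40C", "25C", "125C"],
    "rc_extraction": ["cworst", "cbest", "rcworst", "rcbest", "typical"],
    "mode": ["func", "scan", "sleep", "turbo"],
}


def corner_count(dims):
    """Number of full-factorial corners: the product of each dimension's size."""
    return prod(len(values) for values in dims.values())


def enumerate_corners(dims):
    """Yield every corner as a dict mapping dimension name to setting."""
    names = list(dims)
    for combo in product(*dims.values()):
        yield dict(zip(names, combo))


if __name__ == "__main__":
    print("corners:", corner_count(DIMENSIONS))
    # One more two-valued dimension (e.g. fresh vs. aged devices) doubles the total.
    with_aging = {**DIMENSIONS, "aging": ["fresh", "eol"]}
    print("with aging:", corner_count(with_aging))
    print("first corner:", next(enumerate_corners(DIMENSIONS)))
```

A brute-force sweep like this is the baseline that the approximate approaches Aitken mentions would try to avoid, by analyzing only the combinations likely to matter instead of every one.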

Conclusion
That’s certainly one approach, but whether it will be a good solution for SoC design remains to be seen. More pieces, whether big or small, more interactions between those pieces, and more things that can go wrong are making it far harder to create working silicon and to integrate more IP blocks and subsystems into those SoCs, and much, much harder to optimize all of it.

Even placement is becoming a problem. Rambus’ Ferro said that placement of a PHY and SerDes now can affect performance and power optimization. “With HBM, you have thousands of pins, and you have to route all of those signals from the controller. Just from a functional integration standpoint, you have to figure out where you have room to place those on a die—and sometimes there are four of them on the same die. This now has to be addressed by multiple teams with package, signal integrity and circuit expertise.”

The data that is created to track problems and solve issues is becoming a problem, as well. Just to put this in perspective, a copy of War and Peace, widely recognized for its heft, is 3.6MB, according to Project Gutenberg. The amount of data SoC design teams now routinely produce is thousands of times larger than that. Finding an error in this mass of data, whether it’s integrating two IP blocks or trying to figure out why one block works better than another, is like looking for a pin on the ground from a moving airplane, with the amount of data increasing at each new node.

At 7nm, there will be quantum effects data to analyze, as well as more process variation, possibly octa-patterning if EUV isn’t commercially viable by then, and certainly far more IP and interactions between those IP blocks. And IP created for one node or one foundry’s process will no longer work on another foundry’s process because the processes themselves are diverging, with a longer list of rules and corners for each one. The design world has been very thorough in ensuring that every step of the process has a data trail, but the data trail is getting so long and so complex that its usefulness will be limited without better ways to refine it and utilize all of that data, not just pieces of it.