End User Report: Reliability

How low power affects Cisco’s technology road map and what changes at future process nodes.


John Kern, vice president of product operations inside Cisco Systems’ customer value chain management group, sat down with Low-Power Engineering to talk about the company’s internal focus on reliability and what factors are causing the most concern. What follows are excerpts of that conversation.

By Ed Sperling
LPE: How does Cisco gauge reliability?
John Kern: The bulk of our revenue today is switching and routing products, which have high use of complex ASIC and microprocessor technology. We have a pretty extensive technology qualification process, as well as an individual component quality process. At a high level, we focus on robust design with solid margin and on manufacturing, and we complement all of that with high-reliability components. We have one process for all right now, but we clearly see the need to differentiate depending upon the use case.

What’s involved in that process?
We have a preventive process and a reactive process. The preventive piece starts with partnering with the right suppliers and component selection for the [bills of material].

Does that require new attention to suppliers?
For the past five years, we’ve had a process to have very tight alignment across our critical technologies. So for our ASIC supply base, SERDES and PHY, we have deep partnerships with a handful of companies that we really rely on. We count on them to make investments in areas where we have need, and to do that we have to be articulate about what our future holds. The payback for that investment, and at times it’s a significant investment, is that we reward them with new business. This has borne a lot of fruit in terms of major technology transitions for us. It also has given us access to intellectual property that has enabled our products.

Does complexity from low-power designs change anything?
Nothing is radically changing in terms of complexity. We embed a lot of differentiation and intellectual property in our ASICs. Each technology node has its own transition. The transitions of late have not been as severe. As we move to 32nm and beyond, the complexity curve goes up. The issue around power and reliability is definitely more of a challenge, and it’s something we’re spending a lot more time and energy trying to get ahead of. We’re using techniques that emphasize power reduction in our ASIC designs that are probably commonplace in handsets, but they haven’t been as prevalent in switches and routers.

These are techniques like power islands and various on/off states?
Yes, multiple Vt’s and clock gating and making use of techniques to optimize for power. We had the luxury in the past of creating an architecture, and whatever the power was, it was acceptable and we designed the system around that. We’ve flipped that around now to where we start with a system-level power budgeting process that then drives down to the individual boards and the individual components.
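
Editor's illustration: the top-down budgeting Kern describes, from system to boards to components, can be sketched roughly as follows. This is a minimal sketch and not Cisco's actual process. The function names, the board and component names, the proportional split, and every wattage are hypothetical, chosen only to show a system-level budget flowing downward and being checked against component estimates.

```python
def allocate(budget_w, shares):
    """Split a power budget (in watts) across named children in proportion to their shares."""
    total = sum(shares.values())
    return {name: budget_w * share / total for name, share in shares.items()}


def over_budget(budgets, estimates):
    """Return the overshoot in watts for every item whose estimate exceeds its budget."""
    return {
        name: estimates[name] - budgets[name]
        for name in budgets
        if estimates.get(name, 0.0) > budgets[name]
    }


# Hypothetical system-level budget; all numbers below are illustrative assumptions.
SYSTEM_BUDGET_W = 400.0

# Step 1: the system budget drives the budget of each board.
board_budgets = allocate(SYSTEM_BUDGET_W, {"line_card": 3, "supervisor": 1})

# Step 2: each board budget drives the budgets of its components.
component_budgets = allocate(
    board_budgets["line_card"], {"asic": 6, "serdes": 3, "pld": 1}
)

# Step 3: compare design estimates with the budgets handed down from above.
estimates_w = {"asic": 195.0, "serdes": 80.0, "pld": 25.0}
violations = over_budget(component_budgets, estimates_w)

for name, excess in violations.items():
    print(f"{name} exceeds its budget by {excess:.1f} W")
```

The point of the ordering is that the component designer receives a number to hit rather than reporting a number after the fact, which is the reversal described in the answer above.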

Why is Cisco involved that deeply?
A lot of it begins with our customers. There’s also a green component. We started to do things like embed into our requirements documents, which define the deployment of the product way up front, more considerations around green and sustainability. These are things like considerations for high-efficiency power supplies for our ASICs and recycling at end of life, which are things we never built into creation of a new product.

Is it all ASICs, or are you moving into programmable chips and SoCs, as well?
In terms of the effect we can make on the system-level, it’s largely ASICs. We are also probably one of the largest users in the world of PLDs. But optimizing in those areas doesn’t make as big an impact on the system level as we can with ASICs. And as far as SoCs, the lines are blurring between SoCs and ASICs. The distinction we make there is we control the designs.

In the low-power world, there’s so much complexity that debugging the chip is becoming more difficult. Is that a problem?
Clearly. At every process node there is a shift. Given the complexity of the ASIC designs we have, this isn’t a new phenomenon. We’ve been dealing with power modeling, signal integrity, multiple Vt planes on the same chip, some of the interaction between substrate design and chip design.

What node is Cisco at?
We’re at 65nm. We’ve launched a host of 40nm designs. We’re on an unusual schedule, though. A lot of companies will launch designs at the lowest power process because they want the learning early and the ramp to volume is faster. We require performance and lower power, so it’s really a function of where the IP qualification is and how far behind the process schedule that is.

Does Cisco do its own designs?
We do most of the ASIC work ourselves. When we enter into an SoC joint development it can be shared, where pieces are done by third parties and our suppliers and pieces are done by us. In most cases we’ll handle the stitching of the chip. We’ve outsourced that a few times, but it’s pretty rare.

Cisco doesn’t have its own fabs though, right?
No. We outsource the fabrication to the traditional players.

Does that mean you’re subject to the more restrictive design rules?
Yes. We will engage with more traditional ASIC players that have something unique in their flow, but we’ll usually wrap the design around their constraints. We use their design rules and libraries and we follow those, depending upon the supplier we’re working with. A few years ago we did all the intellectual property and back-end work. We were our own fabless chip supplier. It played out well for a couple of nodes, but when we moved beyond 90nm it didn’t make sense commercially.

As you push into 32nm, do you foresee more issues with quality and reliability?
For reliability, absolutely. IBM is very fearful that the techniques that have served them well for predicting the lifecycle of a product will be affected by pushing the voltage curve. Our products should last for 10 years. We’re certainly aware of the risk and we’re working with our key suppliers to learn what we can as we venture into that node.

Will you necessarily move to the next node as quickly as in the past?
I think so. It’s hard to tell how much is an aberration based on the current economy or Moore’s Law. We are certainly starting fewer designs because we can make use of the technology to pack more functionality into devices. The devices are more complex. When you look at future nodes, though, the learning curve is going to get steeper.

One last question: How does all of this affect your make or buy decision?
We’ll use as much outside content as possible for the areas that aren’t differentiating. If we do a good job articulating to our partners what we need and it works for them, we’ll usually get the investment we need. There are some cases where we need a capability that’s core to us, so we’ll partition those off. As a company, we’re also starting to get into some adjacent markets. We’ll use whatever is available to get us into the market quickly. The benefit of leverage or cost or scaling that custom silicon can give you isn’t a factor for jumping into a new market.