Trading Off Power And Performance Earlier In Designs

Rising complexity and tighter schedules require a deeper understanding throughout the design flow.


Optimizing performance, power and reliability in consumer electronics is an engineering feat that involves a series of tradeoffs, based on gathering as much data as possible about the use cases in which a design will operate.

Approaches vary widely by market, by domain expertise, and by the established methodologies and perspective of the design teams. As a result, one team may opt for a leading-edge design based on 7nm CMOS, while another might opt for 22nm FD-SOI.

But the tradeoffs are becoming much more complicated as chips are developed for specific markets and applications, and they need to happen faster than in the past. While performance is critical in an AI device, reliability is an equally critical metric in automotive and medical applications. And those priorities can shift back and forth as different architectures gain traction in different markets and new requirements are added.

“This is already visible today through the use of optimized accelerator hardware,” said Andreas Brüning, a member of the integrated sensor electronics group at Fraunhofer IIS/EAS. “In the longer term, robust systems will be based on completely new hardware and software architectures.”

In many cases, this will require new processor architectures that are highly optimized for specific tasks and much more power-efficient, Brüning said.

Which comes first?
How to get to that point requires a combination of ground-up and top-down planning, and there is much discussion about which is the best approach.

“Historically, there was a notion that we may be able to predict everything top down, and then we go from there,” said Frank Schirrmeister, senior group director of product management at Cadence. “What has happened in reality is that things meet in the middle. People do make predictions at the higher level, starting with things like architecture descriptions, using queues and doing very high-level descriptions. Then you come to the point where you want to do a little bit of software in the mix, so you start using things like Arm fast models in that context, going to real software, going away from the pure software models. Then you refine further. As such, the fundamental separation of concerns is that at one point, you really switch from the high-level, top-down description script to a bottom-up confirmation of performance.”

As with most methodologies, the direction depends on which way the design team wants to go, and it can have a big impact on decisions about which type of DRAM and which I/O protocol to use. But even once those choices are made, there is no quick way to determine which approach is better.

“It turns out that the protocols have become so complex that a lot of those performance aspects can really be only confirmed when you use the actual implementation model,” Schirrmeister said. “What that means is that in order to have meaningful data to go through it, you do some elements with just simulation, and you parallelize simulation as much as you can. But then you come to the point where you actually have to run the same analyses in emulation, and look at longer cycles of emulation and prototyping where you can run larger payloads to get the performance properly analyzed. It’s a combination of a top-down approach, which is really done by the system guys, sometimes by the software guys, and the software guys always say, ‘Why are you hardware guys so weird? Why can’t you get me a model which is super fast and super accurate? That must be possible somehow, right?’ They are asking us to defy gravity, and that doesn’t really work. That’s why we bring in other technologies like emulation and prototyping.”

Put simply, the challenge is to make these decisions in the context of a growing number of sometimes conflicting factors.

“If you want to look at the performance of things like PCI Express, remember that a lot of the performance aspects are no longer just in the block,” Schirrmeister said. “They become systemic. How does that block actually interact? Do I have the right software access at the right time? What does cache coherency do to me? Then it becomes a question of having the right test available and being able to write the right tests. That’s what it means to ask the right question. This is also where Portable Stimulus comes in to automate the generation of these testbenches for the performance analysis or for the integration. As it is, it’s really a blurry line. Remember in the good old days it all seemed so clear and straightforward? You do everything top down, optimize your performance, refine and implement. But the line is blurring between analysis, optimization, and then later on, performance validation, because you want to stress test that the performance analysis you did was actually correct. What is needed is an infrastructure in which you can combine all these things.”


Fig. 1: Verifying a system using a variety of technologies. Source: Cadence

That infrastructure includes performance analysis tools, the ability to write the right tests, and the ability to debug the right portions of the design, along with a variety of execution engines underneath, from static and formal analysis to simulation, emulation and prototyping.

Consider memory controllers, for example. It’s well known that flash manufacturing becomes more challenging at smaller process geometries, but there is no single methodology for solving those challenges.

“The easiest way to mitigate buggy memory is sophisticated software for error correction and data compression,” said Chris Jones, vice president of marketing at Codasip. “But these algorithms are complex enough that they require significant processing performance, which can be problematic for low-power and cost-sensitive applications like SSDs. Just adding a high-performance embedded core is not always practical from a power and economic standpoint. So many engineering teams are now looking at custom processors with instruction sets tailored to these correction and compression routines, and wide, specialized interfaces to minimize bus traffic.”
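As an illustration of the kind of routine such a tailored instruction set might accelerate, the sketch below implements a Hamming(7,4) single-error-correcting code in plain C. This is not Codasip's approach or a production algorithm; real SSD controllers rely on far stronger BCH or LDPC codes, and the function names here are hypothetical. The point is only that error correction reduces to dense bit-level parity math, exactly the workload that custom instructions and wide, specialized interfaces are meant to speed up.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch only: a Hamming(7,4) single-error-correcting code,
 * a minimal stand-in for the bit-level ECC work a flash controller does.
 * (Real SSD controllers use much stronger BCH or LDPC codes.) */

static uint8_t hamming74_encode(uint8_t data)          /* data: 4 bits */
{
    uint8_t d3 = (data >> 0) & 1, d5 = (data >> 1) & 1;
    uint8_t d6 = (data >> 2) & 1, d7 = (data >> 3) & 1;
    uint8_t p1 = d3 ^ d5 ^ d7;                         /* covers positions 1,3,5,7 */
    uint8_t p2 = d3 ^ d6 ^ d7;                         /* covers positions 2,3,6,7 */
    uint8_t p4 = d5 ^ d6 ^ d7;                         /* covers positions 4,5,6,7 */
    /* Pack as codeword bit positions 1..7 (bit 0 unused). */
    return (p1 << 1) | (p2 << 2) | (d3 << 3) | (p4 << 4) |
           (d5 << 5) | (d6 << 6) | (d7 << 7);
}

static uint8_t hamming74_decode(uint8_t code)          /* returns 4 data bits */
{
    uint8_t s = 0;
    for (int p = 0; p < 3; p++) {                      /* parity groups 1, 2, 4 */
        uint8_t parity = 0;
        for (int pos = 1; pos <= 7; pos++)
            if (pos & (1 << p))
                parity ^= (code >> pos) & 1;
        if (parity) s |= (1 << p);                     /* build syndrome */
    }
    if (s) code ^= (1 << s);                           /* correct single-bit error */
    return ((code >> 3) & 1) | (((code >> 5) & 1) << 1) |
           (((code >> 6) & 1) << 2) | (((code >> 7) & 1) << 3);
}

int main(void)
{
    uint8_t cw = hamming74_encode(0xB);                /* data nibble 1011 */
    cw ^= (1 << 5);                                    /* inject a single-bit error */
    printf("recovered: 0x%X\n", hamming74_decode(cw)); /* prints 0xB */
    return 0;
}
```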

Making tradeoffs
In developing markets, such as edge computing and devices that will reside on the edge, the tradeoffs are murky, at best. It isn’t even known yet whether part or all of the device will be always on, what voltage it will run at, or how to architect various power domains within the device.

“There was one company developing a Bluetooth Low Energy application, where the company wanted to use it in continuous read mode,” said Paul Hill, director of product marketing at Adesto Technologies. “So they were constantly accessing the device in read. They were fetching software, loading it into cache and executing the code. For that reason, they have a power consumption issue in read mode. But occasionally, when the device goes dormant, they want to turn off the memory device and they want it to go into an ultra-low power mode. The problem with that is the ultra-low power mode has a longer wake-up time, so when it goes active again there’s a longer latency before they can get the next read instruction.”
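A rough duty-cycle calculation makes the tradeoff Hill describes concrete. The sketch below compares two dormant modes for a hypothetical serial flash device; every current, voltage, and latency figure is a made-up placeholder rather than an Adesto datasheet value, and the wake-up time is simply charged at active current before the next read can start.

```c
#include <stdio.h>

/* Duty-cycle sketch of the tradeoff described above: a deeper sleep mode
 * saves dormant current but adds wake-up latency, which is charged here at
 * (roughly) active current before the next read can be issued. All figures
 * are hypothetical placeholders, not Adesto datasheet values. */

typedef struct {
    const char *name;
    double sleep_ua;   /* current while dormant (uA) */
    double wake_us;    /* wake-up latency (us)       */
} sleep_mode_t;

int main(void)
{
    const double active_ma  = 5.0;    /* assumed read-mode current (mA) */
    const double vdd        = 1.8;    /* assumed supply voltage (V)     */
    const double read_us    = 200.0;  /* read burst per cycle (us)      */
    const double dormant_us = 5000.0; /* dormant time per cycle (us)    */

    const sleep_mode_t modes[] = {
        { "standby",               8.0,   5.0 },
        { "ultra-deep power-down", 0.2, 300.0 },
    };

    for (int i = 0; i < 2; i++) {
        double e_active = active_ma * 1e-3 * vdd *
                          (read_us + modes[i].wake_us) * 1e-6;  /* joules */
        double e_sleep  = modes[i].sleep_ua * 1e-6 * vdd *
                          dormant_us * 1e-6;                    /* joules */
        double period_s = (read_us + modes[i].wake_us + dormant_us) * 1e-6;

        printf("%-22s avg power %6.1f uW, extra read latency %3.0f us\n",
               modes[i].name, (e_active + e_sleep) / period_s * 1e6,
               modes[i].wake_us);
    }
    return 0;
}
```

With these placeholder numbers the deeper mode actually comes out worse, because the wake-up energy and latency outweigh the dormant-current savings, which is precisely the kind of system-level accounting the design team has to do before picking a power mode.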

Compare that to solid state drives, where end markets are already well defined. The number of tradeoffs may be even larger than for developing markets, but the goals and methodologies are better defined.

“Say I’m computing on the device, very simple tasks, which are usually read and write,” said Jean-Marie Brunet, senior director of marketing for emulation at Mentor, a Siemens Business. “I’m looking at the IOPS (input/output operations per second), which is how quickly I can stabilize my device and then do the function on it. That is still the major function of storage devices, as well as IoT devices. But what is different is they have to perform a tremendous amount of computation. This is important because when you plan for performance and quality, you have to measure some key metrics of performance very early on, way before silicon is done. If you look at the main requirement for the evolution of the SSD, which is the CSD (computational storage drive), they have to measure, deterministically, latency and performance metrics against what the consumption is. This is actually how their end customer will measure whether the devices are performing correctly.”

Because the SSD market is highly competitive, vendors have to provide performance metrics early in the development cycle. “If you provide metrics very early on, you need to have the environment to do those performance metrics,” said Brunet. “So you need to provide this system, and one very important thing they need to provide is determinism: whatever is calculated or evaluated pre-silicon has to be very similar post-silicon. There’s a lot of requirement on performance, and there’s a lot of requirement on quality. Devices that are moving into the data center, on the edge of a data center, or the cloud, are highly dependent on performance because they’re computing a massive amount of data. The performance requirement is extremely important. They need to show this extremely well, and that’s what most of them do today.”
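As a minimal sketch of that kind of pre-silicon versus post-silicon metric, the snippet below issues a batch of fixed-size random reads against a placeholder file and reports IOPS and mean latency, assuming a POSIX system. Real SSD and CSD qualification flows use far more elaborate workloads with queue depths, mixed read/write ratios, and latency percentiles, but the basic arithmetic is the same.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Minimal IOPS/latency measurement sketch. The target path, block size, and
 * op count are placeholders; a real qualification flow sweeps queue depths,
 * read/write mixes, and latency percentiles. */

#define BLOCK 4096
#define OPS   10000

static double now_s(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    const char *path = "/tmp/testfile.bin";      /* placeholder target */
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    off_t size = lseek(fd, 0, SEEK_END);
    if (size < BLOCK) { fprintf(stderr, "target too small\n"); return 1; }

    char buf[BLOCK];
    double start = now_s();

    for (int i = 0; i < OPS; i++) {
        off_t off = (off_t)(rand() % (size / BLOCK)) * BLOCK;  /* random 4K offset */
        if (pread(fd, buf, BLOCK, off) != BLOCK) { perror("pread"); break; }
    }

    double elapsed = now_s() - start;
    printf("IOPS: %.0f   mean latency: %.1f us\n",
           OPS / elapsed, elapsed / OPS * 1e6);   /* serial, queue depth 1 */
    close(fd);
    return 0;
}
```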

Quality is important here, as well, because the SSD/CSD controller sits at the center of information being passed back and forth, so there cannot be reliability issues. To improve reliability, it is increasingly common to follow the product lifecycle and insert additional reliability measures into the process at any point. At 7nm and below, that could include bigger buffers, for example, because there is so much logic available.

Power, performance, reliability
Given the interconnectedness of power, performance and reliability, the role of power must be addressed, as well. “Reliability is determined by measuring mean time between failures — that’s how you start,” said Shailander Sachdeva, AE for the power products in the verification group at Synopsys. “Then, did you functionally verify the device enough? Did you verify all the vectors, all the scenarios? Did you miss anything? Over time, most design teams have figured that out, but for many years, the focus was mostly on performance — getting the clock speed up, getting the design to megahertz then gigahertz. Now again, the focus is back to cramming more and more transistors, with power usually an afterthought. Later in the design flow, somebody will typically check whether the design meets the power budget by running a vector or scenario through a sign-off level tool at the gate level, but unfortunately by that time it’s very late. Tweaks can be done here and there, but not major changes because it’s so late. Plus, there are market pressures, so they must deliver what they have.”

On top of that, the number of options for I/O and various protocols in the market means that unless a device is very targeted, vendors typically have to support multiples of everything. That puts a burden on the entire supply chain, from the systems companies down to the IP suppliers, and it can add yet another set of tradeoffs.

“The number of metrics that we have to support, the number of protocols we have to support, definitely is going up,” said Hemant Dhulla, general manager of the IP Cores Business Unit at Rambus. “That adds complexity. It also adds risk. By definition, anytime complexity goes up it adds risk, so validation becomes more important. This is why it is very difficult for established tier-one system companies or chip companies to go with startups. Startups may have new technology and a more unique way of doing things, but if you haven’t built up your stats over a few years, it’s very hard to validate it.”

So once again, design teams are caught between extremes, and that carries over to measurements in the design flow. “One extreme is to measure at the very end of the design cycle,” said Synopsys’ Sachdeva. “But the first thing they typically do at the very beginning is a very rough approximation using Excel spreadsheets. ‘How many transistors? What’s the technology node? How much traffic is running? Roughly, this would be my power budget. Let’s see whether we’re meeting that.’ It’s a very vague way of doing it.”

This is like getting to the end of the design cycle, when someone on the design team asks for a power number to see whether it meets the budget or not, Sachdeva said. “If it doesn’t, they’ll have to reduce the frequency to actually reduce the performance, and we are aware of some companies who do actually reduce the performance. They do this very often because the current technology is exceeding the power budget. Since frequency and voltage increase the power, they reduce the frequency and deliver the product.”
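Why frequency ends up as the late-stage knob falls out of the usual first-order approximation, dynamic power ≈ α·C·V²·f plus leakage. The sketch below mimics the spreadsheet-style estimate Sachdeva describes; every constant is an invented placeholder, and the loop simply shows how stepping the clock down moves the estimate back under a budget.

```c
#include <stdio.h>

/* Spreadsheet-style power estimate, as a sketch. All constants below are
 * made-up placeholders, not real silicon data. Dynamic power is taken as
 * alpha * C * V^2 * f, which is why dropping frequency (linear) or voltage
 * (quadratic) is the late-stage fallback when the budget is blown. */

int main(void)
{
    const double alpha     = 0.15;    /* assumed average switching activity */
    const double c_total   = 8.0e-9;  /* assumed switched capacitance (F)   */
    const double vdd       = 0.9;     /* assumed supply voltage (V)         */
    const double leakage_w = 0.5;     /* assumed leakage power (W)          */
    const double budget_w  = 2.0;     /* assumed power budget (W)           */

    for (double f_ghz = 2.0; f_ghz >= 1.4; f_ghz -= 0.2) {
        double p_dyn = alpha * c_total * vdd * vdd * f_ghz * 1e9;
        double p_tot = p_dyn + leakage_w;
        printf("f = %.1f GHz  ->  P = %.2f W  (%s budget of %.1f W)\n",
               f_ghz, p_tot, p_tot <= budget_w ? "within" : "over", budget_w);
    }
    return 0;
}
```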

Rather than working at either extreme, Sachdeva typically advises doing something in the middle. “First of all, start with a proper estimate rather than a rough-cut figure. Use tools like architectural power exploration and/or RTL power exploration to get a better idea. That will give a more realistic figure on whether you are going to meet the budgets or not. This may be a very approximate number, but it’s still much better than a spreadsheet or a back-of-the-envelope estimation. It gives a much better idea of where to start.”

The other extreme is when power is measured in only one scenario or one particular situation, such as a high-traffic or low-traffic case. “That doesn’t cover the whole gamut of situations the design will be exposed to. To get a more realistic idea, you need to do much better in terms of the vectors you are throwing at the power estimation tool, or for that matter, the chip. This is where emulation-based power analysis comes into the picture, which can be done at both RTL and gate level. This is more like a realistic scenario. Instead of running in a test condition, you’re actually running, for example, an automobile. Instead of running on a test track and getting the speed and performance of the engine, this time you are running on a real road. You actually take it out, and you see whether it actually performs the way it’s supposed to. And while it is performing, it can tell you what the mileage is.”

This is important, he stressed, because it’s very likely it will encounter more bugs under those scenarios. “Even if this is performed at gate level, which is very accurate, it can be good enough to create a trend analysis. That is good enough to identify chinks in the armor — the pieces of the design that are consuming too much power and impacting performance.”

Related Stories
Safety, Security And PPA Tradeoffs
The number of critical design metrics is expanding, but the industry still grapples with their implications.
Taking Energy Into Account
In the IC design flow, energy is just as important to consider as power.
Reducing Software Power
Software plays a significant role in overall power consumption, but so far there has been little progress on that front.
Target: 50% Reduction In Memory Power
Is it possible to reduce the power consumed by memory by 50%? Yes, but it requires work in the memory and at the architecture level.
Using Less Power At The Same Node
When going to a smaller node is no longer an option, how do you get better power performance? Several techniques are possible.


