Are You Leaving Performance On The Table? Here Is One Sure Way To Find Out

As system complexity grows, so does the difficulty of correctly configuring the software stack.


Compute platforms are always hungry for more performance – a fact we simply cannot escape. Whether you are targeting high performance computing, IoT, mobile, or the automotive market, you need to unlock the best performance for your specific workloads. This relentless quest for performance comes with an unwelcome side effect: system complexity. As hardware becomes more capable, the amount of software it runs grows at an even faster pace, consuming precious compute resources. Consider this: we now talk about software “stacks” – layers upon layers of software sitting on top of the hardware: the OS, servers, hypervisors, compilers, databases, and of course your end applications, to name only a few. These software stacks are becoming ubiquitous, and you are finding them in new places such as the embedded world. See the example of the automotive software stack, as described by Deloitte Insights below.

Software, of course, can be configured. However, this is a double-edged sword: configure it right and you will optimize the capital spent on hardware and software; get it wrong, and chances are you will need to answer to unhappy customers expecting better performance. To add to the issue, as system complexity grows, so does the difficulty of correctly configuring the software stack. Not so long ago, you could configure your software by hand when your system had only a dozen parameters to toggle… but now, it is a whole different ball game. Each layer of that software stack can have hundreds (if not thousands) of individual knobs you can experiment with. Changing any one of these parameters – or a combination of them – can have a dramatic impact on how your system performs. So, the stakes are high. With a solution space so mind-bogglingly large (in some cases there are many more possible combinations than atoms in the known universe!), how can you be sure that the configuration you came up with is the one that unleashes your system’s best performance? How do you know you’re not leaving any performance on the table?

The brute-force answer to that question is: try every possible combination of parameter settings, run each resulting configuration through a benchmark representative of your target workloads, and measure your performance metric (e.g., run time, power, memory usage). At the end of this experiment, you will know which configuration yields the best performance. Easy, right? Except there is just one problem – time. As foolproof as this methodology is, it is just not practical. Remember: the solution space is incredibly large, and it would take a nearly infinite amount of time to run all the options.
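To see why brute force collapses so quickly, here is a toy sketch of exhaustive search. The knob names and the benchmark function are purely illustrative stand-ins (a real `run_benchmark` would launch the actual workload and measure run time, power, or memory):

```python
import itertools

# Hypothetical knobs -- real stacks expose hundreds of these per layer.
knobs = {
    "cache_size_mb": [64, 128, 256],
    "worker_threads": [2, 4, 8, 16],
    "compression": ["off", "lz4", "zstd"],
}

def run_benchmark(config):
    # Placeholder metric (lower is better). In practice this would apply
    # `config`, run the representative workload, and measure the result.
    return sum(len(str(v)) for v in config.values())

best_config, best_metric = None, float("inf")
for values in itertools.product(*knobs.values()):
    config = dict(zip(knobs.keys(), values))
    metric = run_benchmark(config)  # one full benchmark run per configuration
    if metric < best_metric:
        best_config, best_metric = config, metric
```

Even this tiny example already needs 3 × 4 × 3 = 36 benchmark runs; with hundreds of knobs, the Cartesian product explodes far beyond any realistic benchmarking budget.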

A seasoned performance engineer would argue that you do not need to run every single possible configuration through the benchmark, and that an expert would know which knobs to play with in order to positively impact system performance. There is some truth to that; however, even for the most talented experts, the question remains: how do you know for certain that the configuration you selected is the very best?

And we are not even addressing the effort it takes to run performance optimization experiments manually. Anyone who’s done benchmarking and optimization work can attest that this is an arduous task that typically takes weeks, if not months, of constantly “babysitting scripts.”

Now you might ask: why not just automate that process? You would be right – automation certainly makes the whole process more palatable. But process automation by itself does not reduce the time it takes to identify the best possible configuration. We also need smart algorithms that judiciously select sample points during the experiment and quickly narrow down the solution space.
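To make the idea of “judiciously selecting sample points” concrete, here is a deliberately simple sketch using greedy local search: start from one configuration, perturb a single knob at a time, and keep only improvements. This is a toy stand-in, not the ML-driven strategy a production tool uses, and every name in it is illustrative:

```python
import random

random.seed(0)  # make the sketch reproducible

# Hypothetical knobs, as before.
knobs = {
    "cache_size_mb": [64, 128, 256, 512],
    "worker_threads": [1, 2, 4, 8, 16],
    "compression": ["off", "lz4", "zstd"],
}

def run_benchmark(config):
    # Fake metric (lower is better) standing in for a real benchmark run.
    return (abs(config["cache_size_mb"] - 256)
            + abs(config["worker_threads"] - 8)
            + {"off": 5, "lz4": 1, "zstd": 0}[config["compression"]])

# Start from a random configuration...
best = {k: random.choice(v) for k, v in knobs.items()}
best_metric = run_benchmark(best)

# ...then spend a small, fixed budget of 30 samples instead of
# enumerating all 60 combinations, keeping only improvements.
for _ in range(30):
    candidate = dict(best)
    knob = random.choice(list(knobs))
    candidate[knob] = random.choice(knobs[knob])
    metric = run_benchmark(candidate)
    if metric < best_metric:
        best, best_metric = candidate, metric
```

The point is the shape of the loop: each new sample point is chosen relative to what has already been measured, so the search converges with a fraction of the runs brute force would need.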

Fortunately, we do have access to this type of technology today. Synopsys SLM Optimizer software is a solution that leverages machine learning and AI to provide an automated methodology for performance optimization. The tool enables you to explore the solution space of possible configurations and find which combination of settings and configuration parameters provides optimal performance. The effort required to get this software up and running is minimal: define which knobs you want to tune, specify how to invoke your workload, and indicate where to read the metric you are tracking. Run the tool, then sit back and watch it work. Optimizer’s algorithms select which configurations to test, minimizing the number of sample points needed to arrive at an optimal system configuration. In fact, the algorithms are so good at this type of task that you can often see performance improvements after just a few sample points. This efficiency means that the whole experiment may take only a few hours (depending on your workload, of course), whereas in the past it may have taken engineers days if not weeks to complete the same analysis.

Best of all, Optimizer is agnostic to the system it is trying to optimize. This means you can unleash the technology in many different ways. For instance, Optimizer can be used for compiler flag mining when you are trying to find which combination of flags yields more performance from your compiler of choice than simply using the -O3 optimization flag. Or you can use Optimizer to improve your infrastructure – tailoring server configurations to your specific workloads to reclaim untapped performance, which in most cases translates directly into significant savings.
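As a flavor of what compiler flag mining involves, here is a sketch that enumerates subsets of a few real GCC flags on top of -O3. The timing function is stubbed with a fake value so the sketch runs standalone; the commented-out compile step, the `kernel.c` file name, and the fake timings are all illustrative assumptions:

```python
import itertools

# A handful of real GCC flags to mine on top of -O3; a real experiment
# would draw from a much larger candidate pool.
candidate_flags = ["-funroll-loops", "-fomit-frame-pointer", "-ftree-vectorize"]

def time_build_and_run(extra_flags):
    # In a real experiment this would compile and time the workload, e.g.:
    #   subprocess.run(["gcc", "-O3", *extra_flags, "kernel.c", "-o", "kernel"],
    #                  check=True)
    # ...then time ./kernel on a representative input. Stubbed here with a
    # fake "more flags run faster" timing so the sketch is self-contained.
    return 1.0 / (1 + len(extra_flags))

# Try every subset of the candidate flags (feasible only for tiny pools --
# this is exactly where a smart search pays off).
results = {}
for r in range(len(candidate_flags) + 1):
    for combo in itertools.combinations(candidate_flags, r):
        results[combo] = time_build_and_run(combo)

best = min(results, key=results.get)
```

With a real timing function, `("-O3",) + best` would be the fastest flag set found for that particular workload, which is the essence of flag mining: the winning combination depends on the workload, not just the compiler.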

In a nutshell, pretty much any system that can be configured can be optimized. Use Optimizer to experiment easily and quickly, and ultimately find the configuration that maximizes system performance – leaving no doubt that you are indeed running your compute resources at their optimal settings. Get the most value out of your silicon, system, and software.

In a future blog, we will dive into a specific use case and get our hands dirty with Optimizer to find out what it really takes to optimize for performance.

In the meantime, you can find more information about Synopsys SLM Optimizer here:
