Taking benchmarking to the embedded level.
You can’t optimize something without understanding it. We all know this intuitively, yet we are often too busy implementing to stop and think about it. Some people may not even be sure what it is they should be optimizing, which makes it very difficult to know whether they have been successful. This was a key message delivered by Professor David Patterson at the Embedded Vision Summit.
His keynote, entitled “A New Golden Age for Computer Architecture,” started along the lines of the Turing Lecture originally delivered in 2018 and also given at DAC that year. A summary of that can be found here. He covered the CISC-versus-RISC battle, concentrating on the ways in which the two architectures could be compared in terms of performance. MIPS, he noted, is an easy metric to quote but cannot be used to compare different architectures. He also warned of the pitfalls of using programs that are too simple or, even worse, synthetic benchmarks such as Whetstone and Dhrystone that merely try to match the characteristics of real programs.
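To see why MIPS fails as a cross-architecture metric, consider a toy calculation in the spirit of that argument (all numbers below are invented for illustration, not taken from the talk):

```python
# Toy illustration of why raw MIPS cannot compare ISAs: an "instruction"
# means different amounts of work on different architectures. All numbers
# here are invented for illustration.

def exec_time_s(instructions, cpi, clock_hz):
    # Classic CPU-time equation: instructions x cycles-per-instruction / clock.
    return instructions * cpi / clock_hz

# The same program compiled for two hypothetical machines at the same clock.
machines = {
    "CISC": dict(instructions=40e6,  cpi=3.0, clock_hz=1e9),  # fewer, complex instructions
    "RISC": dict(instructions=130e6, cpi=1.0, clock_hz=1e9),  # more, simple instructions
}

for name, m in machines.items():
    t = exec_time_s(m["instructions"], m["cpi"], m["clock_hz"])
    mips = m["instructions"] / t / 1e6  # native MIPS = instructions / (time x 10^6)
    print(f"{name}: {t * 1e3:.0f} ms, {mips:.0f} MIPS")

# Output: the RISC machine reports 1000 MIPS against the CISC machine's 333,
# yet the CISC machine finishes the same program slightly sooner (120 ms vs
# 130 ms). The ratings count different "instructions," not delivered work.
```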
Instead, he wanted to see a suite of real programs that are updated over time, and that implies there must be an organization behind it. “It’s hard to make progress if you can’t measure it,” says Patterson. “And one of my sayings is, for better or for worse, benchmarks shape a field, and there are examples where there are bad benchmarks.”
The talk then turned to the need for domain-specific architectures, concentrating on architectures for machine learning. Again he looked at how to measure performance, noting that TOPS has even less meaning than MIPS, that it is not useful to show performance based on old networks, and that a suite of real programs, updated over time, is required. “You need to update it but maybe more frequently for machine learning to stay up to date,” says Patterson. “So you need an organization to sustain it, which was MLPerf.”
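A similar toy sketch shows why TOPS is even weaker as a metric (the accelerator and utilization figures below are hypothetical): a peak TOPS rating assumes every multiply-accumulate unit is busy every cycle, while real networks achieve only a workload-dependent fraction of that.

```python
# Hypothetical accelerator with a 4 TOPS peak rating. Delivered throughput
# depends on how well each network's layers keep the MAC array busy; the
# utilization fractions below are invented for illustration.
PEAK_TOPS = 4.0

workloads = {
    "large convolution layers":  0.60,  # compute-bound, maps well to the array
    "depthwise-separable convs": 0.15,  # little data reuse, units sit idle
    "small fully-connected":     0.05,  # memory-bound, mostly waiting on DRAM
}

for name, utilization in workloads.items():
    print(f"{name:27s}: {PEAK_TOPS * utilization:.2f} effective TOPS "
          f"({utilization:.0%} of peak)")

# One chip, one peak rating, a 12x spread in delivered throughput. That is
# why a maintained suite of real, current networks is needed rather than a
# single headline number.
```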
At this point, the talk went into new territory. “So what about embedded computing? So far, MLPerf has been concentrating on the data center for training, and then for large mobile devices like phones. But ML is becoming important even for more deeply embedded applications, which often don’t need the highest performance. They just need to be fast enough.”
He talked about the need to focus on this gap. “People are designing hardware and software for antiquated things using the wrong benchmarking technology, so this bothered some of us so much we decided to try and fix it. We have created an organization called Embench that tries to be better for embedded computing.”
Much of this goes back to comparing ISAs and the role that compilers play in overall performance. Compilers have different optimization levels, which often trade off code size against performance, a tradeoff that matters more in the embedded world, where memory is limited. He compared GCC to LLVM, and also compared an Arm core to a RISC-V core using the same compiler. The results indicated that the choice of compiler makes a bigger difference than the choice of ISA.
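As a minimal sketch of the kind of measurement behind that comparison (assuming GCC, Clang, and the binutils size tool are on the PATH, and with benchmark.c standing in for a real benchmark source), one could build the same code at a speed-oriented and a size-oriented optimization level and record the resulting code size:

```python
# Build one source file with GCC and Clang/LLVM at -O2 (speed-oriented) and
# -Os (size-oriented), then report the text segment, i.e. the code size.
# Compiler names, flags, and the source file name are assumptions.
import subprocess

SOURCE = "benchmark.c"          # hypothetical benchmark source
COMPILERS = ["gcc", "clang"]    # assumes both are installed and on PATH
OPT_LEVELS = ["-O2", "-Os"]

for cc in COMPILERS:
    for opt in OPT_LEVELS:
        binary = f"{cc}{opt}.elf"
        subprocess.run([cc, opt, SOURCE, "-o", binary], check=True)
        # `size` prints a header line, then text/data/bss figures; the first
        # column of the second line is the text (code) size in bytes.
        report = subprocess.run(["size", binary], capture_output=True,
                                text=True, check=True).stdout
        text_bytes = report.splitlines()[1].split()[0]
        print(f"{cc} {opt}: text = {text_bytes} bytes")
```

Pairing these sizes with run times for each binary gives the code-size-versus-performance tradeoff described above.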
“We think the lesson for embedded benchmarking is that code size has to be shown with performance,” he said. “So far, none of the embedded benchmarks include code size.” He also stressed the importance of reporting a geometric standard deviation as well as a geometric mean, and observed that the more mature architectures have more mature compilers, which helps them, but that newer architectures will catch up.
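For readers unfamiliar with those statistics: benchmark scores are ratios (say, speedup relative to a reference core), so they are averaged with a geometric mean, and their spread is captured by a geometric standard deviation. A small sketch, with made-up speedups:

```python
import math

def geometric_mean(xs):
    # exp(mean of logs): the appropriate average for ratio data.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def geometric_std(xs):
    # exp(std dev of logs): a multiplicative spread factor (always >= 1).
    logs = [math.log(x) for x in xs]
    mean = sum(logs) / len(logs)
    variance = sum((v - mean) ** 2 for v in logs) / len(logs)
    return math.exp(math.sqrt(variance))

# Invented per-benchmark speedups of one core relative to a reference.
speedups = [1.8, 0.9, 2.4, 1.1, 1.5]

gm = geometric_mean(speedups)
gsd = geometric_std(speedups)
print(f"geomean speedup: {gm:.2f}x, geometric SD: {gsd:.2f}x")
# Read as: typically about 1.45x faster, with most benchmarks falling within
# a factor of ~1.4 above or below that (i.e., between gm/gsd and gm*gsd).
```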
What he didn’t mention are other metrics that may be equally important, such as power or energy, the impact the memory subsystem can have on performance, and the fact that edge devices are perhaps more application-specific than those in any other domain. That will make finding the right set of benchmark programs extremely tough, because vendors will only want to pick the programs that align with their target audience.