Big Shift In Multi-Core Design

System-wide concerns in AI and automotive are forcing hardware and software teams to work together, but gaps still remain.


Hardware and software engineers have a long history of working independently of each other, but that insular behavior is changing in emerging areas such as AI, machine learning and automotive as the emphasis shifts to the system level.

As these new markets consume more semiconductor content, they are having a big impact on the overall design process. The starting point in many of these designs is data, not the hardware or software, and the main goal is figuring out how to process that data faster using the least amount of energy. What gets done in hardware and what gets done in software isn’t always clear at the outset, and it can change as the design progresses.

The perception has always been that a rift exists between hardware and software developers, leading to an ‘us and them’ mentality and to finger-pointing when problems arise, said Colin Walls, embedded software technologist at Mentor, a Siemens Business. In contrast, working as a single team can yield great benefits, including:

  • Early access to hardware. As designs get bigger and more complex—an understatement with multi-core systems—software development needs to start earlier to be completed on schedule. Software developers can make more progress faster if they can run their code on something that looks like final hardware. If the hardware team shares its simulation models, software can be verified in much more detail long before real hardware is available.
  • Debugging. Shaking out the errors in multi-core software is extremely challenging, requiring sophisticated debuggers or analysis tools. These tools can only be used if the appropriate connection facilities (JTAG, etc.) are included in the design. This is not a huge overhead, but it needs to be considered early in the hardware design, so cooperation between the teams needs to begin at the start of the project.
  • Power management. Most modern designs have tight power consumption constraints, either for battery life preservation or environmental considerations. Power management has become a software issue, as only the software “knows” what resources (peripheral devices, CPU performance, etc.) are required for the task in hand. To implement software power management, the hardware needs to accommodate software control of peripheral blocks (allowing them to be turned on and off), dynamic voltage and frequency scaling (to tune CPU power) and provide access to the CPU’s low-power modes. As design for power must be considered from Day 1, early cooperation between teams is vital.
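The power-management point above can be made concrete with a small sketch. This is a hypothetical illustration, not any particular SoC's driver code: the clock-gate bit, the operating-point table values, and the function names are all invented for the example, and the clock-gate register is simulated with a plain variable so the sketch is runnable.

```c
#include <stdint.h>
#include <stddef.h>

#define UART0_CLK_BIT (1u << 4)

/* Simulated clock-gating register. A real driver would use a volatile
 * pointer to a memory-mapped register at an SoC-specific address. */
static uint32_t clk_gate_reg;

/* Peripheral clock gating: software turns a block off when it is idle. */
static void uart0_clock_enable(int on)
{
    if (on)
        clk_gate_reg |= UART0_CLK_BIT;
    else
        clk_gate_reg &= ~UART0_CLK_BIT;
}

/* DVFS hook: pick the lowest operating point that meets the demand.
 * The frequency/voltage pairs here are illustrative only. */
typedef struct { uint32_t freq_mhz; uint32_t millivolts; } opp_t;

static const opp_t opp_table[] = {
    {  200,  900 },   /* low-power point  */
    {  600, 1000 },
    { 1200, 1100 },   /* full performance */
};

static opp_t select_opp(uint32_t demand_mhz)
{
    for (size_t i = 0; i < sizeof opp_table / sizeof opp_table[0]; i++)
        if (opp_table[i].freq_mhz >= demand_mhz)
            return opp_table[i];
    return opp_table[2];  /* saturate at the highest point */
}
```

The hardware side of the bargain is visible even in this toy version: the clock-gate bit and the table of legal voltage/frequency pairs only exist for software to use if the hardware team designed them in.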

More cores, more confusion
The shift to multiple cores, whether those are multi-core or many-core designs, requires much more cooperation between these disciplines, and that will only become more obvious in the future.

“Multi-core means I have several processors on a single chip, and this is something that we already have had for years now,” said Xavier Fornari, product manager at ANSYS. “Your PC has several cores, and people already developed applications for that. Now, the trend is twofold. First of all, there will be no more single-core processors. Foundries will stop doing that production because there is no means to gain more power for computation by using a single core. The trend for years to come will be multi-core, for sure. That is okay for your laptop, and that is okay for video games. This is totally different when you talk about safety-critical applications, because then one must ensure that the complete electronic system, including hardware and software, will work properly.”

Engineers in automotive, aerospace and defense already are trying to figure out how to make sure the full system will work properly, he said. “In addition to that, there are now many-core platforms. The difference is that multi-core may be five or six cores. Many-core has more specialized and more computational units. They often go to 256 or even more processing units, so this is a totally different platform. Programming this is something really complex on one hand. On the other hand, you have to demonstrate that the application will work properly, and no one knows exactly how to achieve this currently.”

Still, while it makes sense for both hardware and software teams to work concurrently, side by side, throughout the entire design and verification process, that rarely happens.

“In automotive, there is a hardware team and there’s a software team and they don’t talk to each other—or rather it’s very much disconnected in time,” observed Max Odendahl, CEO of Silexica. “On the hardware side at the beginning of the project, based on Excel, I select my target hardware platform. At some point I start implementing code, and I’ll worry later about whether this will fit or not fit or whether I need to change the algorithms. For sure there’s a big gap there, but there’s even a big gap within the software team. There’s a Matlab algorithmic team, which couldn’t care less if what they are writing is actually implementable on any given hardware because they’re incentivized to have the amazing demo, the amazing algorithm, the best vision algorithm, the best recognition, the best whatever. Then, they literally throw it over the fence to the multi-core implementation team, who goes back to them to say, ‘You know what, it doesn’t fit.’ Then the back-and-forth includes, ‘You just don’t know how to implement it,’ ‘You’re not a very good implementer,’ or, ‘No, your algorithm is not possible to be implemented on any given hardware.’ So even within software teams we see a big divide.”

The shift from C to C++, from static to dynamic scheduling, and from bare-metal or deeply embedded real-time operating systems to POSIX and Adaptive AUTOSAR is all happening at the same time. That widens the gap between what is being implemented and what actually runs on the hardware, simply because several layers now sit in between.

“For example, there’s operating systems, middleware, and the end user application,” said Odendahl. “If you look at a power train, it’s much more embedded, much more bare-metal, static scheduling, so I’d probably use Excel. But there are big differences. What’s interesting is that the automotive industry seems to be going for further abstractions just because it’s so difficult. ‘I have the middleware, there’s so many different sensors I need to combine, I need to connect to the cloud, and I need to make it updatable. I want to go for isolation so I can have separate teams implement it.’ The problem, though, is then at one point you need it. That’s good for joint development, but it’s not good for putting on the hardware. It’s not good for making sure it performs according to your requirements for all the different scenarios. What else is the industry doing? ‘Oh, I run a billion miles in the cloud, but all of that is functional testing of my algorithms again.’ Is that implementable in hardware? Is that 50% utilized of my hardware or 100%? Nobody really knows. ‘We’ll do that once the great demo is actually working.’”

Minding the gap
There are indications that mindset is beginning to change, particularly in markets such as automotive where systemic complexity extends well beyond a single chip or even a single vehicle.

“In the past, if you were a software engineer, the thinking was, ‘I have this chip available. Here’s what I can produce with my software,'” said Kurt Shuler, vice president of marketing at Arteris IP. “Nowadays, especially in the ADAS side of things that have an AI component or some kind of programmable object detection for the ADAS functionality, or an AI chip—whether it’s for the data center, edge, inference or training—the thinking has shifted more to system-design decisions. If this is designed with this given set of software algorithms, it is clear what needs to happen at a system level from the hardware and software point of view. At what level of detail should I optimize this hardware for the particular software I expect to run? This means the hardware and the software are now much more tightly integrated in those use cases than they probably have ever been unless it’s a very detailed embedded application. So now, in the early stages of design for these types of chips, whether it’s the autonomous driving chips or the AI chips, the software architect is in there, too.”

This is a definite sign of progress. “Before, they didn’t care,” Shuler said. “The layer/API between hardware and software is becoming less generic and more specific for those kinds of use cases, solving those kinds of problems. What that means, though, is there are software guys who went to Stanford and trained on JavaScript and have no idea what a register is. Then there are hardware guys who have no idea what a hypervisor or object-oriented programming is.”

And while there may not be total agreement about the extent of the current hardware/software gap, it has been shrinking due to advances on the tool side, particularly in software prototyping. “It’s not that they work together so much now, but the development of the software can go on while the development of the hardware is occurring because of the virtual environments that have been created to allow teams to build their whole SoC in a virtual environment and do so in a way that’s cycle correct,” said Mike Thompson, senior manager of product marketing for ARC processors at Synopsys. “There’s a lot of need to work through that virtual environment to develop the software, to make sure that the software interacts appropriately with the rest of the system. The software development often starts at the very beginning of the design, even when the architectural development is still happening, because the modeling they’re doing in the virtual environment is critical to how the hardware engineers are going to put the system together.”

Beyond automotive, multi-core CPUs are becoming prevalent across all industries, driven by a couple of factors. First, there is a need to scale SoC compute performance, but that isn’t possible with a single-core CPU. Second, workloads can be efficiently consolidated from smaller CPUs and MCUs into larger multi-core SoCs.

“In automotive, there is a trend to replace electronic control units (ECUs) with multi-core SoCs in what are referred to as ‘domain controllers’ or ‘vehicle computers,’” said Robert Day, director of automotive solutions and platforms at Arm. “But in markets that are not enterprise-led, software can struggle to efficiently keep up with an increased number of cores, and the software platforms used in industrial, automotive, IoT, medical and other ‘real-time’ markets are not always optimized for multi-core systems. Hence, they cannot realize the performance gain from multiple cores without some redesign. One way of handling multi-core SoCs is to build systems with true symmetric multi-processing (SMP) operating systems.”
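What an SMP operating system buys is illustrated by a minimal sketch below, assuming POSIX threads on a multi-core Linux-class system. The worker count and the data being summed are arbitrary; the point is that the same code runs unchanged on one core or many, with the SMP scheduler free to place the workers on any available core in parallel.

```c
#include <pthread.h>

#define NWORKERS 4
#define N 1000

/* Each worker sums its own slice of the array into a private slot,
 * so no locking is needed until the final combine step. */
static long partial[NWORKERS];
static int data[N];

static void *worker(void *arg)
{
    long id = (long)arg;
    int chunk = N / NWORKERS;
    long sum = 0;
    for (int i = (int)id * chunk; i < ((int)id + 1) * chunk; i++)
        sum += data[i];
    partial[id] = sum;
    return NULL;
}

long sum_parallel(void)
{
    pthread_t t[NWORKERS];
    for (int i = 0; i < N; i++)
        data[i] = i;

    /* Under an SMP OS these threads may run truly concurrently
     * on separate cores; on a single core they are time-sliced. */
    for (long i = 0; i < NWORKERS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);

    long total = 0;
    for (int i = 0; i < NWORKERS; i++) {
        pthread_join(t[i], NULL);
        total += partial[i];
    }
    return total;
}
```

This is the easy case Day alludes to: software written against a threading API can scale with core count, whereas a legacy bare-metal loop cannot.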

Another important layer in an embedded platform software stack is embedded virtualization, Day noted. A bare-metal (Type-1) hypervisor allows systems integrators to architect systems that run multiple operating systems or applications on the different cores of a multi-core system.

Processor providers such as Arm and others enable the efficient use of hypervisors by providing hardware virtualization features across different architectures, which means the hypervisors can be lightweight, capable of real-time performance, and safety-certifiable. It also means there can be a mix of real-time and general-purpose OSes, mixed levels of safety criticality, and a mixture of proprietary, legacy and open-source software on a single multi-core CPU. This virtualization approach allows software to help bridge the gap with the latest generation of multi-core SoCs, he said.

Still, it is surprising that any gap remains, given that the hardware development process is fairly well understood at this point and that software is an integral part of it.

“There are still challenges in putting together dozens or even hundreds of IP blocks, closing timing and so on, but generally we know how to do that,” said Rupert Baines, CEO of UltraSoC. “Similarly, the software development process is fairly well understood at a unit level. But today, most of the problems in system development come when we’re integrating the hardware and software together. That’s when the problems are first seen, and that’s where they first become evident because there are dependencies you simply cannot predict pre-silicon, even with today’s most powerful emulation platforms. You can’t emulate or simulate enough cycles to cover every possible corner case. Real silicon running real code will always throw up unforeseen phenomena.”

Many of these bugs are not show-stoppers, Baines stressed. “They may be dependencies that cause the system to perform a little less well than it could or should, and they are often security concerns that come from hardware-software interactions. This gets much worse with multi-core designs, simply because there are so many more permutations. There is literally a combinatorial explosion.”

The software dilemma
It may look worse from a software perspective. “Software development tends to be much more complex, which is due to the degrees of freedom you have when you’re developing software,” said Sam Tennent, senior manager, R&D at Synopsys. “If you’re designing a piece of hardware, there are maybe a few different architectures you can pick. With software, it’s almost infinite. You can solve the problem in many ways.”

This is particularly evident in multithreaded software development, where there are not a lot of standard ways of doing it at the moment. “While some of these issues are being at least partly addressed in new areas and new software, such as standard frameworks for AI and graphics, it doesn’t solve the problem of legacy software, which is the big issue,” Tennent said. “Many, if not most, engineering teams tend to be re-using software from previous projects in their new projects because no one wants to write all of that, and there’s not a good way at the moment of taking the old software and converting it to a multi-core environment.”

Specifically, today’s engineering teams want to speed up their applications, and the best way to do that is to parallelize operations. But unless the software was developed with that in mind, it’s very difficult to see how to do that. “The engineers who originally designed the software are not around anymore, so there may not be a great understanding of how it all hangs together and how it all works,” he noted. “So you may be faced with redesign to get a reasonable degree of performance increase in the multi-core environment. And, it’s not just these issues happening in the industry. A lot of other complexities are coming in as well. A whole other aspect of the problem is that, as well as having multi-core CPUs, many of these are distributed. If you look at a modern car, it has lots of distributed processing islands and each of these is multi-core. So there’s not only the problem of converting your software. There’s all that connectivity to deal with, as well, which makes the whole thing even more complex.”
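Why legacy code resists parallelization can be shown in a few lines. The two toy functions below are illustrative, not from any real codebase: the first loop has independent iterations and can be spread across cores (for example by uncommenting the OpenMP pragma), while the second has a loop-carried dependency, so naively splitting it across cores produces wrong answers and the algorithm itself must be restructured first.

```c
#include <stddef.h>

/* Independent iterations: each element of out depends only on the
 * matching element of in, so the loop parallelizes trivially,
 * e.g. with "#pragma omp parallel for" enabled. */
void scale(float *out, const float *in, size_t n, float k)
{
    /* #pragma omp parallel for */
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * k;
}

/* Loop-carried dependency: iteration i needs the result of
 * iteration i-1. Splitting the index range across cores breaks
 * correctness; it must be rewritten (e.g. as a parallel prefix
 * scan) before it can use multiple cores. */
void prefix_sum(float *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```

In a large legacy codebase, dependencies like the second one are rarely this visible, which is exactly why converting old serial software to a multi-core environment is so labor-intensive.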

Given the number of disparate approaches, there may be room for standardization of some of these practices. Frank Schirrmeister, senior group director for product management and marketing for emulation, FPGA-based prototyping and hardware/software enablement at Cadence, said this could start with standardizing a description of the target architecture—the topology—to figure out how the application or function maps into it.

“It makes sense to standardize the topology of the architecture and how you write your function on top of it to map it,” Schirrmeister said. “A good example is the work happening within the working groups of the Multicore Association. If you standardize the way the software (functions) map/communicate with each other, you then can map this with the interconnect topology into the underlying hardware architecture. As soon as you standardize both, you have the ability to write compilers and have a description language, where you really differentiate by the quality of your compiler and what goes into the hardware.”
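The idea of a data-driven topology description and a separate task-to-core mapping can be sketched as below. This is purely illustrative and is not the Multicore Association's actual format: the core names, cluster numbering, and task names are all invented for the example.

```c
#include <string.h>

/* Hypothetical description of a target topology... */
typedef struct { const char *name; int cluster; } core_desc;
/* ...and of a mapping from software tasks onto its cores. */
typedef struct { const char *task; const char *core; } task_map;

static const core_desc topology[] = {
    { "cpu0", 0 }, { "cpu1", 0 },   /* real-time cluster       */
    { "cpu2", 1 }, { "cpu3", 1 },   /* general-purpose cluster */
};

static const task_map mapping[] = {
    { "motor_control", "cpu0" },    /* latency-critical task   */
    { "logging",       "cpu2" },    /* best-effort task        */
};

/* A mapping tool would check that every task's target core exists
 * in the topology before generating any build configuration. */
int mapping_is_valid(void)
{
    for (size_t m = 0; m < sizeof mapping / sizeof mapping[0]; m++) {
        int found = 0;
        for (size_t c = 0; c < sizeof topology / sizeof topology[0]; c++)
            if (strcmp(mapping[m].core, topology[c].name) == 0)
                found = 1;
        if (!found)
            return 0;
    }
    return 1;
}
```

Once both sides are expressed as data like this, the mapping step becomes something a compiler or tool can automate and validate, which is the point Schirrmeister is making.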

Future trends
In addition to looking at systems holistically and virtually, on-chip analysis may have a role as well, said UltraSoC’s Baines, because it provides real-time insight into actual system performance. “You can see the software running and exactly how it interacts with hardware. Some of the main benefits of building hardware monitoring into an SoC are felt by the bring-up and software teams. It’s a practical way in which the hardware team can assist the ‘downstream’ teams to do their jobs better—even at the cost of some silicon. So it’s a question of someone—the CTO, the chip architect or whoever—investing in one part of the process to get massive improvements somewhere else along the line.”

Looking ahead, Thompson sees architects getting more control over managing the device, whereby an architect or a team of architects puts the device together and then a CTO drives the design, overseeing both the hardware and software. “The architectural development of the chip is becoming more important than the hardware/software implementation,” he said. “If you don’t get it running at the architectural level, you might as well not bother with the rest of it. With some of the systems that we see in devices today, the building blocks are themselves major systems. And now you’re combining that with 10 or 12 other major systems to create the total functionality of the chip. This is especially true if you look at something like a vision chip or an AI device. The level of a processing complexity is just staggering.”

Related Articles
Looking Beyond The CPU
While CPUs continue to evolve, performance is no longer limited to a single processor type or process geometry.
Making Sure A Heterogeneous Design Will Work
Why the addition of multiple processing elements and memories is causing so much consternation.
