Software and hardware interdependencies complicate debug in embedded designs. New approaches are maturing to help reduce debug time.
Debugging embedded designs is becoming increasingly difficult as the number of observed and possible interactions between hardware and software continues to grow, and as more features are crammed into chips, packages, and systems. But there are advances on this front, involving a mix of techniques that includes hardware trace, scan chain-based debug, and better simulation models.
Some of this is due to new tools, some is a combination of existing tools, and some involves a change in methodology in which tools are used in different combinations at different times in the design-through-manufacturing flow.
“Tracing and capturing the internal signals based on trigger events, and storing them in a trace buffer that can then be read from a debug port, allows seamless collection of data without disrupting the normal execution of the system,” said Shubhodeep Roy Choudhury, CEO of Valtrix Systems. “A lot of hardware instrumentation may be required, but placing the trigger events near the point of failure can yield a lot of visibility into the issue.”
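The trigger-and-capture scheme Roy Choudhury describes can be sketched in software. The following is a minimal, host-runnable sketch, not real hardware instrumentation; the buffer depth, entry layout, and trigger condition are all illustrative assumptions. In a real design this logic lives in hardware and the buffer is read out through a debug port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRACE_DEPTH 256u

typedef struct {
    uint32_t cycle;   /* timestamp of the captured sample */
    uint32_t signal;  /* value of the observed internal signal */
} trace_entry_t;

typedef struct {
    trace_entry_t buf[TRACE_DEPTH];
    size_t head;       /* next slot to write (circular) */
    size_t count;      /* valid entries, saturates at TRACE_DEPTH */
    uint32_t trigger;  /* signal value that freezes capture */
    int frozen;        /* set once the trigger event is seen */
} trace_buffer_t;

void trace_init(trace_buffer_t *t, uint32_t trigger) {
    memset(t, 0, sizeof *t);
    t->trigger = trigger;
}

/* Called every cycle. Capture runs continuously until the trigger fires,
 * so the buffer ends up holding the samples leading up to the event,
 * without disrupting normal execution. */
void trace_capture(trace_buffer_t *t, uint32_t cycle, uint32_t signal) {
    if (t->frozen)
        return;
    t->buf[t->head] = (trace_entry_t){ cycle, signal };
    t->head = (t->head + 1) % TRACE_DEPTH;
    if (t->count < TRACE_DEPTH)
        t->count++;
    if (signal == t->trigger)
        t->frozen = 1;  /* freeze near the point of failure */
}
```

Placing the trigger value close to the failing condition means the frozen buffer holds exactly the history that matters.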
One of the big challenges revolves around the software that runs on an embedded system. “To test that software is really hard,” said Simon Davidmann, CEO of Imperas Software. “I remember as an engineer trying to get access to the prototypes. You have to timeshare them because you never have enough. Use of simulation changes the ability to develop software for the embedded world because if you can build a good simulation model of your platform, it means you can test it and run it and put it in regression farms, implement continuous integration, etc. You get much higher quality software because you’ve got much more access to verifying it. When it comes to debugging, simulation today is much more efficient than debugging on a prototype or hardware, and you get a few big benefits — versioning, controllability, observability, and the ability to abstract and stop where you want — and then see everything. With a simulator, you get observability that you couldn’t get otherwise.”
This gets much more complicated as different types of chips and memories are added into designs in order to improve performance and reduce power.
“In heterogeneous systems, a typical system will have IP blocks from multiple different vendors — as well as, in some cases, IP blocks provided by internal groups within the SoC developer — plus IP blocks from non-traditional IP suppliers,” said George Wall, product marketing director in the IP Group at Cadence. “If they have licensed a hardware module from another SoC company, each one has its own set of standards, its own implementation. So it’s a very heterogeneous type of device. How do you integrate all of that at the debug level? This is a challenge. We do support open standards on the debug side, as do a lot of other commercial vendors. But the internal IP, and IP from non-traditional places, may be sourced from companies where those standards may not have been supported.”
This adds a whole other layer of complexity because the engineering team is dealing with some black boxes. They don’t know what’s inside them, and they don’t want to tinker with them very much. “They just want to make sure there is visibility externally so they can see what’s going on,” Wall said. “There really is no industry standard to say, ‘Well, here’s the right amount of visibility.’ So it’s a challenge.”
Fig. 1: Cadence/Green Hills tool integration for embedded system design. Source: Cadence
Increasing visibility
But new approaches can create new efficiencies, Davidmann said. “We’ve seen engineers who are trying to get their firmware to run with their operating systems, write traces and abstract things in order to monitor not just at the variable level or function level, but probe into the OS and watch what the OS is doing. Then they can trace that. This means instead of getting a billion lines of instruction trace, they’d get a few thousand lines of function trace or scheduler trace, so they can watch what’s happening and visualize some of it at a high level. And then when things go wrong, they can drill down. For example, one user implemented assertions to monitor the OS, and if something happened it would fault. It kept a rolling buffer of, say, 10,000 instructions and 1,000 function calls in a trace. When it stopped, they could look back and see what had happened at that point.”
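The rolling-buffer-plus-assertion pattern Davidmann describes can be sketched in a few lines of C. This is a simplified illustration, not the user's actual implementation; the hook names, buffer depth, and monitor API are assumptions. The key idea is that the buffer keeps only the most recent entries, so a post-mortem look-back stays small.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FTRACE_DEPTH 1000

static const char *ftrace[FTRACE_DEPTH];
static size_t ftrace_head, ftrace_count;
static int os_faulted;

/* Instrumentation hook: called on entry to each traced function,
 * e.g. from compiler-inserted stubs or an OS scheduler callback. */
void ftrace_enter(const char *func) {
    ftrace[ftrace_head] = func;
    ftrace_head = (ftrace_head + 1) % FTRACE_DEPTH;
    if (ftrace_count < FTRACE_DEPTH)
        ftrace_count++;
}

/* Monitor assertion: on a violation, record the fault so the rolling
 * buffer is preserved with the history leading up to the problem. */
void os_monitor_check(int invariant_holds) {
    if (!invariant_holds)
        os_faulted = 1;
}

/* Walk back n entries from the most recent one (0 = latest). */
const char *ftrace_lookback(size_t n) {
    if (n >= ftrace_count)
        return NULL;
    size_t idx = (ftrace_head + FTRACE_DEPTH - 1 - n) % FTRACE_DEPTH;
    return ftrace[idx];
}
```

When the monitor fires, a debugger or script walks backward through `ftrace_lookback()` to reconstruct the scheduler or function history at the failure point.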
This can be done in hardware, but not easily. “You’ve got to build it all in,” he said. “With a simulator, engineering teams can write their own, extend it, and build their own tools in order to have better visibility, and they can put assertions in to monitor things better and point out bugs that may not be fatal but still impact performance. It’s about visibility and observability. If you use a simulator properly, nothing’s a black box.”
The big difference between debugging software applications running on generic hardware versus embedded applications is the unique hardware the embedded application runs on.
“Generally, that’s something that’s been purpose-built for that particular application, rather than a generic compute node,” said Sam Tennent, senior manager for R&D at Synopsys. “What’s key there are the interactions between the hardware-dependent software, which is that layer right at the bottom where the software must be aware of the hardware. There will be layers above that, which are abstracted away from the hardware. We see interest in the layer that has to talk to the hardware and the specific issues that brings up, which are different from the issues that you might see with higher-level software.”
The debug engineer needs to know a little bit about hardware, he said. “They need to be aware of things like device registers, interrupts — all of these things that happen down at the hardware level that can affect what the software’s doing. They really need the visibility of not just what’s happening in the software domain, but also what’s happening in the hardware domain. And they need to be able to correlate these things.”
Virtual prototypes are one way to approach this. “The fact that virtual prototypes are using abstracted models of the hardware means you can get visibility into what’s happening at the hardware layer at the same time as you can see what’s happening in your software,” Tennent said. “Typically you can correlate these things. You can look at an event in your software, or you can look at your software routines and see exactly how those are interacting with the hardware. For example, if the hardware brings up an interrupt, you can trace that, and you can see exactly what the software does in reaction to that. This is really useful when you’re debugging issues down at that level.”
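The interrupt example Tennent gives can be made concrete with a small sketch that keeps a trace on both sides of the hardware/software boundary, the way a virtual prototype does. The register layout, flag names, and log format here are invented for illustration; on a real target the register struct would sit at a fixed MMIO address behind a volatile pointer.

```c
#include <assert.h>
#include <stdint.h>

#define UART_IRQ_RX_READY (1u << 0)

/* Device register model. On real hardware these fields would be
 * memory-mapped; here they are ordinary variables. */
typedef struct {
    volatile uint32_t status;  /* interrupt status flags */
    volatile uint32_t data;    /* received byte */
} uart_regs_t;

static uart_regs_t uart;
static uint32_t hw_event_log[16], sw_event_log[16];
static int hw_events, sw_events;

/* "Hardware" side: the device receives a byte and raises the RX
 * interrupt. The event is logged in the hardware domain. */
void uart_model_receive(uint32_t byte) {
    uart.data = byte;
    uart.status |= UART_IRQ_RX_READY;
    hw_event_log[hw_events++] = byte;
}

/* "Software" side: the ISR reads the register, logs the reaction in
 * the software domain, and acknowledges the flag. */
void uart_isr(void) {
    if (uart.status & UART_IRQ_RX_READY) {
        sw_event_log[sw_events++] = uart.data;
        uart.status &= ~UART_IRQ_RX_READY;  /* acknowledge */
    }
}
```

Comparing the two logs entry by entry is exactly the hardware/software correlation described above: a hardware event with no matching software entry points directly at a missed or mishandled interrupt.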
Pure software simulators are still useful for debugging up at higher levels, but typically they are not used to model things like hardware registers, for example. “They have a high level API, which the software is using, but they’re not modeling right down at the register level,” Tennent said. “This means any issues that you have down at that level are not going to be picked up by something like these software emulators.”
Davidmann noted that when a processor model is encapsulated in Verilog, the engineering team can then debug using Cadence, Siemens EDA and Synopsys tools. “They can debug the software stack in our debugger and all in the one simulation. As they are single stepping, they can see the waveforms in the Cadence device, for instance, and can look in at the hardware and the software all in one simulation. It’s not one debugger because conceptually, there are signals and wires in the hardware, but it can be done in one run and be synchronized so that the engineer can click on a point in the software and see what the waveform was doing at that point in time.”
Cadence’s Wall advises engineering teams to think upfront about how to ensure the debug of the firmware running on the system. “Consider the types of interactions that will be occurring between the firmware and other devices in your SoC,” he said. “Think about how to gain visibility into those interactions. One common method at the CPU level is to implement tracing capabilities, where the trace output can be used to at least tell you what code was running when certain things happened. There also are a lot of things that need to be done at the system level to ensure the visibility of those interactions. Special visibility registers can be added to periodically check the embedded firmware that provides a state of the system. There are other techniques of implementing trace instrumentation in the other blocks in the system that can be controlled or enabled by the firmware running on the processor. So if it’s having difficulty interacting with one particular block, it can turn on the trace for that block, and then read the trace from a memory location to understand the problem.”
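Wall's last technique, firmware-controlled trace for an individual block, can be sketched as follows. This is a simplified host-side model under assumed names: the control register, enable bit, and trace memory layout are all illustrative, and a real block would stream trace words autonomously rather than through a function call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLK_CTRL_TRACE_EN (1u << 0)
#define TRACE_MEM_WORDS   64u

static volatile uint32_t blk_ctrl;            /* block control register */
static uint32_t trace_mem[TRACE_MEM_WORDS];   /* trace region in memory */
static size_t trace_wr;

/* Block-side model: emits a trace word only while trace is enabled. */
void blk_trace_emit(uint32_t word) {
    if ((blk_ctrl & BLK_CTRL_TRACE_EN) && trace_wr < TRACE_MEM_WORDS)
        trace_mem[trace_wr++] = word;
}

/* Firmware side: enable trace for a misbehaving block, then read the
 * captured words back from the memory location to diagnose the problem. */
void fw_trace_enable(void)  { blk_ctrl |= BLK_CTRL_TRACE_EN; }
void fw_trace_disable(void) { blk_ctrl &= ~BLK_CTRL_TRACE_EN; }

size_t fw_trace_read(uint32_t *dst, size_t max) {
    size_t n = trace_wr < max ? trace_wr : max;
    for (size_t i = 0; i < n; i++)
        dst[i] = trace_mem[i];
    return n;
}
```

The point of the pattern is selectivity: trace stays off (and costs nothing) until the firmware has trouble with one particular block, then captures only that block's activity.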
Improving low-level software debug
Depending on whether the application is at the bare metal level, or has an RTOS, can make a difference, too.
“The application, based on these two branches, is always debugged using some integrated development environment, which is the main entry point when you’re debugging something,” said Haris Turkmanovic, embedded software lead at Vtool. “How complex the debugging process is depends on how complex the integrated development environment is. If there is a well-developed integrated development environment, it will be much easier to debug.”
So what does the process of debugging look like? “You first need to know what to expect,” Turkmanovic said. “If that expectation is not satisfied, you know that you have a problem. That expectation is based on various values, which are part of the memory. When you debug something, you need to go into the memory, look at the memory content to see how it’s changed, go step by step through your code. Each debugging process consists of iteration. You go through your code, step by step, watching the memory content to see if something behaves unexpectedly. Then you can catch the problem. Basically, you’re watching the memory to see if the content is written as expected. If it’s not, then you localize the problem. If you have a big system, you can divide it into parts and look at each part separately. This process can be easier if there is some kind of operating system, because embedded platforms that run embedded systems are very complex. They usually have a memory protection unit that allows you, for example, to divide a memory into multiple regions. If you want to access part of the code from one region to another region, the MPU will notice. The second approach when you have an operating system is to use the built-in functions, which will monitor the execution of your program. If you try to do something that was not planned, this function will be called, and breakpoints there can catch the error.”
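The MPU technique Turkmanovic mentions can be sketched in software. Here the region table and fault hook are simulated; a real MPU performs the check in hardware and raises a fault exception, but the debugging workflow is the same: put a breakpoint in the fault handler and catch the offending access. All names and the region layout are illustrative.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uintptr_t base, limit;  /* region covers [base, limit) */
    int writable;
} mpu_region_t;

static mpu_region_t regions[4];
static int nregions;
static int fault_count;  /* incremented by the fault hook */

/* Fault hook: in a debugger, a breakpoint here catches the violation
 * with the offending address in hand. */
static void mpu_fault_hook(uintptr_t addr) {
    (void)addr;
    fault_count++;
}

void mpu_add_region(uintptr_t base, uintptr_t limit, int writable) {
    regions[nregions++] = (mpu_region_t){ base, limit, writable };
}

/* Check a write access; call the fault hook if no writable region
 * covers the address. Returns 1 if the access is allowed. */
int mpu_check_write(uintptr_t addr) {
    for (int i = 0; i < nregions; i++) {
        if (addr >= regions[i].base && addr < regions[i].limit) {
            if (regions[i].writable)
                return 1;
            mpu_fault_hook(addr);
            return 0;
        }
    }
    mpu_fault_hook(addr);  /* address outside every region */
    return 0;
}
```

Dividing memory into regions this way turns a silent cross-region corruption into an immediate, localized fault, which is exactly the "localize the problem" step described above.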
Improving the efficiency of the debug process requires a systematic approach.
“The entry point of that systematic approach is to know what you expect and what the limits are,” Turkmanovic said. “If you don’t have a systematic approach to debug, and if you go into debugging just by guessing, you can go into an endless loop and the debugging will never end. If you don’t have an expectation, and if you don’t have a systematic approach, it’s very hard to debug. For example, if you try to get data at higher speeds than the system can handle, you will always get a bug because you cannot do something that is impossible.”
In cases where automatically generated code and/or custom configurations are used, automation of processor core verification is particularly important. Formal verification techniques play a key role here.
“Formal verification provides faster runtime than simulation, allowing simulation licenses to be freed up for other tasks like integration testing,” noted Rob van Blommestein, head of marketing for OneSpin, a Siemens Business. “Set up is also much quicker and easier. RISC-V’s flexibility to create custom instructions creates a verification hurdle for simulation. Formal technology easily can be applied to verify custom extensions and instructions. Complete coverage of all corner cases can be achieved with formal with minimal to no effort in the development of the testbench. Unspecified behavior, such as undocumented instructions, also can be uncovered using formal. The engineering team will be able to understand coverage progress as it relates to ISA requirements throughout the verification process. Direct traceability of verification and coverage can be achieved.”
New techniques in formal verification technology also help verify that the set of assertions is sufficient to cover a RISC-V core design and ensure there is no unverified RTL code.
“Any extra functionality in the design, including hardware Trojans, is detected and reported as a violation of the ISA. This includes the systematic discovery of any hidden instructions or unintended side effects of instructions. Overall, formal delivers better quality of results with much less effort,” van Blommestein added.
With embedded Linux code, the picture gets more complex. “There we have multithreaded, multiprocessor systems, and it is not easy to debug using some kind of debugger that can go step by step, inspect memory, or something else,” said Gradimir Ljubibratic, embedded software Linux lead at Vtool. “In the Linux world, we mostly depend on debug logs and debug events, so in real time we can see what is going on with the system, how the system is fluctuating, how different components interact, and so on. Before we even try to test everything on a real system, we use unit tests to test the different small components of the system. We are currently in the process of implementing continuous integration testing to help us detect bugs in early stages of development. Also, different tools can be used for memory profiling to inspect what is going on with the code if we have some kind of stack overflow or memory violations and so on. This is mainly related to user-space application development.”
Taimoor Mirza, engineering manager for the EPS IDE team at Siemens EDA, agreed. “In modern software development, where multiple threads on multiple cores on multiple SoCs have to be covered, and where each of the components needs to talk to others in the system, traditional debug techniques quickly reach their limit. These are important and necessary to track failures on a single thread/core, but for full-system understanding of debugging, the view needs to be extended. This is where analysis and profiling tools come into play to help in analyzing and understanding complex system behavior and help track down these sorts of problems. The user gets an overview of the overall system behavior, making it easy to spot areas where problems arise, such as networking issues, scheduling issues in operating systems, or problems in device drivers. Tools also can import externally recorded data and show it in sync with the software traces.”
Fig. 2: Debugging with complex testbenches. Source: Siemens EDA
Trace points also can be added into the code to extend the usability, while APIs allow for the creation of custom agents that understand these trace points and give the user a hint of where to search for problems, Mirza said. “Also, IDE firmware developers can add instrumentation and use it in their own agents to help end users find problems with higher-level code. Once a problem area is zeroed in on, the user can use problem-specific techniques to try to get more details.”
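A minimal sketch of application-level trace points and a custom agent of the kind Mirza describes might look like this. The macro, record format, and agent logic are assumptions for illustration, not any vendor's API; real IDE trace-point facilities carry timestamps and richer payloads.

```c
#include <assert.h>
#include <stdint.h>

#define TP_LOG_DEPTH 128

typedef struct {
    int id;        /* event identifier, e.g. TP_NET_RETRANSMIT */
    uint32_t val;  /* event payload */
} tp_entry_t;

static tp_entry_t tp_log[TP_LOG_DEPTH];
static int tp_count;

void tracepoint_record(int id, uint32_t val) {
    if (tp_count < TP_LOG_DEPTH)
        tp_log[tp_count++] = (tp_entry_t){ id, val };
}

/* Trace-point macro sprinkled through application code. */
#define TRACEPOINT(id, val) tracepoint_record((id), (val))

/* A tiny "custom agent": counts occurrences of one event ID. A spike
 * in, say, retransmit events hints at a networking problem and tells
 * the user where to start looking. */
int agent_count_event(int id) {
    int n = 0;
    for (int i = 0; i < tp_count; i++)
        if (tp_log[i].id == id)
            n++;
    return n;
}
```

The value of the agent layer is that it turns a raw event stream into a summarized hint, so the user reaches for problem-specific tools only after the search space has been narrowed.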
For example, if the user finds issues in the Linux Kernel or Kernel Modules, tools can be used to help debug the Linux kernel or Kernel modules. If the user is having issues with an RTOS, RTOS awareness features can be used to provide additional information on the issue.
In general, Mirza said, there are certain tips and techniques that can help when addressing any situation. For example, make code easier to debug through use of -O0 -fno-inline, which disables optimizations and inlining, so you can step through all code, and do so naturally. You also can use -Og instead of -O0 to specifically optimize for debugging. Essentially, this asks the optimizer to assist with debuggability, rather than hamper it or be disabled.
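A short example shows why those flags matter. The helper below is deliberately trivial; under -O2 a compiler will typically inline it into its caller, so it disappears from the debugger's view and cannot be stepped into or given a breakpoint. Built with -O0 -fno-inline (or -Og), it remains a real function. The function names are illustrative.

```c
#include <assert.h>

/* A small helper the optimizer would happily inline away.
 * Build with: gcc -O0 -fno-inline (or -Og) to keep it steppable. */
static int scale_and_offset(int x) {
    return 3 * x + 7;   /* breakpoint target when built for debugging */
}

int process_sample(int raw) {
    /* With -O0 -fno-inline, stepping into this call lands in
     * scale_and_offset(); with -O2 that frame usually no longer exists. */
    return scale_and_offset(raw);
}
```

The behavior is identical either way; only the debug experience changes, which is why -Og is a reasonable default for development builds.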
There are multiple other techniques available. In addition, debug teams can use static and dynamic analysis tools, such as Klocwork or the Valgrind suite, which come with a learning curve and sometimes give false positives, but they can find problems you weren’t even looking for, so it’s best to use them early in development and then continuously.
Running a project early on an emulator allows for better remote and parallel development, as well as better high-level test automation. Also, optimizing the edit-build-debug cycle can really pay off later. Development scripts also can be created, makefiles tuned, and the IDE adjusted to automate the fastest build and load. These kinds of adjustments can have a big impact on debug schedules and overall time to market.
Security concerns
Security is becoming a bigger topic with developers, too. “There is a natural tension between security requirements and debug visibility,” said Wall. “The information the SoC designer wants to get while the SoC is running is also potentially vulnerable, but valuable to a hacker. You cannot think about these aspects in silos. You have to think upfront how all these pieces will interact with each other. It has to be architected upfront.”
“Most organizations with whom I’ve worked aren’t focused on better debugging practices,” said Mike Fabian, principal security consultant for Synopsys’ Software Integrity Group. “Resilience, quality, and safety are all goals and objectives of organizations at differing levels of maturity across the board. They are focused on finding errors earlier using the latest advances to ensure routine releases are resilient and meet an accepted level of security. There needs to be a mandatory use of vetted design blueprints, clear SDK/framework/coding standards, supply chain diligence, active mechanisms in place to protect customer privacy, and automated governance and technical guardrails to avoid preventable errors early. Debugging an issue later in the development cycle, assuming that debugging means ‘this code isn’t working as intended,’ is a failure of those processes and controls. Finding bugs faster and earlier is more cost effective.”
Conclusion
Finally, because late bugs are always a schedule risk, emphasis should be placed not just on debug techniques but also on the stimulus and test generators. “Enabling software-driven stimulus and real-world use cases early in the design lifecycle increases the chances of hitting the complex bugs,” said Valtrix’s Roy Choudhury. “And since application software is not developed with the mindset of finding design bugs, and is often complex to debug, using stimulus generators which can exercise the system better and utilize the debug infrastructure present in the system is always a good idea.”