IoT Debugging Crosses The Hardware-Software Divide

Embedded design means engineers of different disciplines need to work closely together during the design phase of a project to avoid bugs.

By Paul Hill and Gordon MacNee

Debugging is an important part of embedded design, and one that necessarily crosses the hardware/software divide. At a system level, the functionality of an embedded design is increasingly defined by firmware, so avoiding bugs requires engineers from different disciplines to work closely together during the design phase of a project. It can also mean resisting the urge to point fingers when a bug does inevitably arise.

Perhaps it is the nature of software-defined hardware that makes modern embedded design such an interesting profession. Every new microcontroller (MCU) seems to offer higher integration and more advanced features, but it is completely void of purpose until it has been programmed. While this level of integration and configuration is clearly an enabling factor and one that is delivering massive advancements in product design, it can occasionally present engineers with unforeseen issues.

Embedded components such as MCUs are also becoming more functional and more configurable, and they offer many features that simply aren’t required in every design. These extra functions can usually be ignored, and they rarely cause problems.

As most engineers will appreciate, these features will typically be controlled by registers that can be modified through software. As such, they will have a default setting at power-up and, if left unchanged, will continue to operate under those default settings. In many cases this may not present a problem, but if these features remain unused and perhaps untested, there is a chance that their impact will be felt in some unforeseen way. Bugs may develop in the system, caused by perfectly legitimate features that have perhaps been overlooked.
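One way to guard against overlooked defaults is to check, early in start-up, that every configuration register the design depends on actually holds the value the design assumes. The following is a minimal sketch of that idea in C, not a vendor API: the `reg_check_t` structure, the function name, and any register names used with it are hypothetical, and real register addresses, masks, and values would come from the MCU's reference manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per configuration register the design relies on.
 * All names here are hypothetical and for illustration only. */
typedef struct {
    const char *name;              /* label for logging */
    const volatile uint32_t *reg;  /* address of the register to inspect */
    uint32_t mask;                 /* bits the design cares about */
    uint32_t expected;             /* required value of those bits */
} reg_check_t;

/* Count the entries whose masked register value differs from the expected
 * value. If first_bad is non-NULL, it receives the index of the first
 * mismatch; it is left unchanged when every entry matches. */
size_t reg_check_count_mismatches(const reg_check_t *table, size_t n,
                                  size_t *first_bad)
{
    size_t bad = 0;

    assert(table != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        uint32_t actual = *table[i].reg & table[i].mask;

        if (actual != (table[i].expected & table[i].mask)) {
            if (bad == 0 && first_bad != NULL) {
                *first_bad = i;
            }
            bad++;
        }
    }
    return bad;
}
```

On a target, the table would point at memory-mapped registers and the check would run once after initialization, with any mismatch logged or treated as a start-up fault. On a host, the same function can be exercised against ordinary variables standing in for registers.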

Finding faults can be difficult, time-consuming, and costly, even under ideal conditions. Normally a fault will be identified by its effects, which will provide enough evidence to allow engineers to trace the cause. Whether that cause is hardware or software related will be largely irrelevant but perhaps still contended; the important thing is that it is found and rectified.

If the cause of a fault is a low-level feature that hasn’t been initialized correctly, then finding it could become even more challenging. Understanding how the initial state of the hardware platform could impact an entire design requires a much higher appreciation of the overall system, and tracking down these elusive conditions really can consume resources.

For example, consider an SPI bus on an MCU accessing a serial flash memory, which is a relatively simple feature used in many embedded systems. If an error is detected in the stored value, it would seem to indicate that the memory, rather than the MCU, was suffering from a fault. This was one customer’s experience when successive reads from the status register of a flash memory showed it was detecting read/write errors. Understandably, it was assumed that the memory device was failing, a theory that appeared to be supported by the fact that if a short delay was introduced between status register reads, the number of faults detected seemed to fall. In addition, a power-cycle seemed to clear the fault for a while.

The engineers believed these symptoms pointed to the serial memory failing, even though it was still well within its specified cycle limit, having only completed around 60k write cycles. When the serial flash memory device was returned to Adesto for further tests, no fault was found, even after over 300k write cycles were executed.

To track down the real fault, Adesto engineers investigated the customer’s application and probed the SPI signals. What appeared to be a fault with the memory device actually turned out to be a system noise issue, and one that could be easily corrected. Although it was due in part to the mismatch in impedance of the PCB tracks between the MCU and the flash memory, the noise wasn’t entirely the result of poor PCB design or signal integrity problems.

Even though it appeared to be a PCB or circuit design issue, the noise was in fact overshoot and undershoot on the SPI signals, caused by excessive drive strength of the signals. The overshoot was enough to disrupt the charge pump of the flash memory device and cause read and write errors. In some cases, overshoot and undershoot on an SPI signal can also be interpreted as signal transitions, which can also result in read or write errors.


Trace image showing the overshoot and undershoot present on the SPI lines

One possible solution was to put an RC circuit on the signal traces in order to slow down the transitions. However, it was discovered that the design was based on a relatively new MCU which allowed the drive strength of the I/O pins to be modified in firmware. Reducing the drive strength of the signals was enough to eradicate the overshoot and undershoot on the SPI signal lines, effectively removing the source of system-level noise.
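A firmware change of this kind is typically a read-modify-write on a pad or pin control register. The sketch below illustrates the pattern under stated assumptions: the 2-bit drive-strength field at bits [5:4], the macro names, and both functions are hypothetical, since the article does not name the MCU or its register layout. The real field position and encoding must be taken from the device's reference manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a 2-bit drive-strength field at bits [5:4] of a
 * pad control register. Real devices will differ. */
#define PAD_DRIVE_SHIFT  4u
#define PAD_DRIVE_MASK   ((uint32_t)0x3u << PAD_DRIVE_SHIFT)
#define PAD_DRIVE_MAX    3u

/* Set the drive-strength field without disturbing the other bits.
 * Returns 0 on success, or -1 if level is out of range (the register is
 * left untouched in that case). */
int pad_set_drive(volatile uint32_t *pad_ctrl, unsigned level)
{
    assert(pad_ctrl != NULL);

    if (level > PAD_DRIVE_MAX) {
        return -1;
    }

    uint32_t value = *pad_ctrl;                      /* read */
    value &= ~PAD_DRIVE_MASK;                        /* clear the field */
    value |= ((uint32_t)level << PAD_DRIVE_SHIFT);   /* insert new level */
    *pad_ctrl = value;                               /* write back */
    return 0;
}

/* Read back the current drive-strength field. */
unsigned pad_get_drive(const volatile uint32_t *pad_ctrl)
{
    assert(pad_ctrl != NULL);
    return (unsigned)((*pad_ctrl & PAD_DRIVE_MASK) >> PAD_DRIVE_SHIFT);
}
```

Reading the field back after writing it, and then confirming the result on an oscilloscope, ties the software setting to the signal behavior that hardware engineers actually observe.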

The most important point here isn’t really that the flash memory device was doing its best to contend with a significant amount of system noise, but that a configurable feature on an MCU could introduce effects that were so easily interpreted as faults in a separate part of the design. In this instance, the fault was detected through a robust approach to design and was resolved through the diligence of Adesto engineers.

Perhaps the real lesson here is that what may appear to be a hardware fault could easily be fixed through software. What looks to be a failure in one component could be traced to an incorrect configuration in another component. The working relationships between hardware and software engineers, as well as customer and supplier, should be strong enough to withstand the challenges that designing with the latest technology can present. Even though default settings are meant to help, they should be verified. Optimization of these settings can lead to significant improvements in system performance and reliability.

Read the full case study, “How system level noise in digital interfaces can lead to spurious errors in serial flash memory,” on Embedded.com.

Gordon MacNee is an EMEA applications manager for Adesto Technologies.


