The MCU Dilemma

Microcontroller vendors are breaking out of the box that has constrained them for years. Will new memory types and RISC-V enable the next round of changes?


The humble microcontroller is getting squeezed on all sides. While most of the semiconductor industry has been able to take advantage of Moore’s Law, the MCU market has faltered because flash memory does not scale beyond 40nm.

At the same time, new capabilities such as voice activation and richer sensor networks are requiring inference engines to be integrated for some markets. In others, reliability, safety and/or security are adding to the levels of complexity demanded in these devices. As a result, MCU vendors are rethinking what these devices look like, dispelling many long-held assumptions.

The range of MCUs is large. “The technology required for embedded applications is constantly evolving, with thousands of choices across the MCU market,” says Thomas Ensergueix, senior director for automotive and IoT lines of business at Arm. “This ranges from simple, low-cost, deeply embedded sensors to highly complex devices running operating systems, enabling our partners to deliver everything from ultra-low power, energy harvesting devices to functionally safe AI-enabled intelligent machines. The goal is to get from idea to silicon at the lowest cost, with the lowest risk and the fastest time to market.”

The low-end remains important for MCUs. “The majority of our customers remain on 8-bit microcontrollers,” says Farzad Zarrinfar, managing director of IP at Mentor, a Siemens Business. “The 8051/2 are still very popular cores. These people are very concerned about low power, such as Bluetooth low energy types of applications.”

At the other end, the range is expanding. “Today’s applications demand flexible processing solutions in order to find the best tradeoffs in terms of processing performance, power consumption, integration of memory and programmable arrays, underlying semiconductor technology and other aspects,” says Roland Jancke, head of department for design methodology at Fraunhofer IIS’ Engineering of Adaptive Systems Division. “Existing lines of distinction between traditional concepts like CPU and MCU are softening. NXP, for instance, calls it ‘scalable processing continuum’ and is developing so-called crossover processors between MCUs and application processors.”


Fig. 1: Not so simple anymore. Arm’s newest MCU, which can be paired with an NPU for up to 480X performance improvement. 

End of the road for embedded flash
Flash has been holding MCUs back. “Flash has done very well down to 40nm,” says Bard Pedersen, director of technology for Adesto Technologies. “At 40nm the cells get so small that the likelihood of them retaining their correct values drops. When you look at embedded flash at 40nm and compare that to 28nm, the flash cell doesn’t really shrink. The rest of the logic shrinks, but the flash doesn’t shrink that much, which means that embedded flash gets more expensive.”

Others agree. “The floating gate flash that has been used is not ready to move beyond 40nm, and so alternative technologies are clearly the path forward,” says Ron Lowman, strategic marketing manager at Synopsys. “MCU vendors see a roadmap for the next year or two that may lead to the adoption of alternative technologies like MRAM and ReRAM. Before they do that, they want to make sure the technology is robust and reliable.”

There is progress being made on that front. “We work closely with the ecosystem to enable rapid prototyping and implementation,” says Arm’s Ensergueix. “One example of the progress is the work Samsung is doing with the first commercially available embedded MRAM (eMRAM) compiler targeting Samsung’s 28nm FD-SOI process; this has been silicon-proven.”

Another candidate is resistive RAM. “Resistive RAM has great potential somewhere in the future,” says Adesto’s Pedersen. “However, it is not a direct competitor for embedded flash yet. ReRAM does not have the same scaling limitations as flash. The memory cells themselves actually behave better as you shrink, because the resistive RAM cell is basically an atomic string. The rest of the cell is wasted space. That wasted space can contribute noise, so the more you cut it down, the better it gets.”

Other memory types are being considered, as well. “One-time and multiple-time programmable (OTP and MTP) memories are good for security applications or calibration applications,” adds Synopsys’ Lowman. “You may even want to consider them for mature code. For example, Dialog has put Bluetooth stacks into their OTP rather than using embedded flash.”

Going external
A number of MCU vendors are working on moving memory off-chip. “Embedded flash is being squeezed by external Serial Peripheral Interface (SPI) flash,” says Lowman. “That affects the price point of when, where and how embedded flash is used. You can use a single data rate, or now you have dual and quad and octal connectivity. This is a viable option for some applications that want lots of embedded flash — especially when you are getting above 1MB.”

Putting the memory in a separate chip might seem like it would be slower. That’s not necessarily the case, however. “The JEDEC spec goes all the way up to 200MHz, which means you get a bandwidth of 400MB/s from memory to processor,” says Pedersen. “Consider NXP’s RT1050 series, which was launched two years ago. Having a Cortex-M7 core and everything including the kitchen sink on the die — and because they only have SRAM on chip, no flash memory — the die is tiny, the device is low-cost, low-power, and it can run extremely fast.”
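A quick back-of-the-envelope check of that figure, assuming an octal interface clocked at 200MHz in double data rate mode (the lane count and DDR factor are assumptions here, not details from the quote):

```c
/* Rough peak-throughput estimate for an external xSPI-class flash interface.
 * Assumptions: 8 data lines (octal), 200MHz clock, DDR transfers, and no
 * accounting for command, address or dummy-cycle overhead. */
#include <stdio.h>

int main(void) {
    const double clock_hz   = 200e6; /* interface clock cited above */
    const int    data_lines = 8;     /* octal interface (assumed) */
    const int    ddr_factor = 2;     /* two transfers per clock cycle (assumed) */

    double bits_per_s  = clock_hz * data_lines * ddr_factor;
    double bytes_per_s = bits_per_s / 8.0;

    printf("Peak raw bandwidth: ~%.0f MB/s\n", bytes_per_s / 1e6); /* ~400 MB/s */
    return 0;
}
```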

Pedersen provides a performance comparison. “If you transfer from flash to SRAM and execute from there, the device can operate at 600MHz and achieve 3,000 CoreMarks. Compare that to competing devices from NXP that use embedded flash. None of them can run faster than 400MHz, which translates to about 2,000 CoreMarks. If we take the NXP device running at 600MHz and use the octal memory interface, we get around 2,000 CoreMarks. So you have a small, low-cost microcontroller with a huge, low-cost external memory device, and you have the same performance as an embedded flash device. The only difference is the two-chip solution consumes less power, it costs less in dollar terms, and you can have a much larger memory.”
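Those figures line up with a Cortex-M7-class core delivering roughly 5 CoreMarks per MHz when it runs from zero-wait-state memory; the per-MHz number below is our assumption, used only to show how the quoted results scale:

```c
/* Sanity check on the CoreMark figures quoted above.
 * Assumption: ~5 CoreMarks/MHz for a Cortex-M7-class core executing from
 * zero-wait-state memory; real scores depend on compiler and memory system. */
#include <stdio.h>

int main(void) {
    const double coremark_per_mhz = 5.0; /* assumed efficiency */

    printf("600MHz, executing from SRAM:   ~%.0f CoreMarks\n", 600 * coremark_per_mhz);
    printf("400MHz, embedded flash device: ~%.0f CoreMarks\n", 400 * coremark_per_mhz);
    /* Executing in place over octal SPI adds wait states, which is why the
     * 600MHz two-chip setup lands near 2,000 CoreMarks instead of 3,000. */
    return 0;
}
```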

Unleashing new nodes
Being freed from a node restriction opens up other possibilities. “40nm was the process node of choice for many MCU vendors,” says Lowman. “It clearly looks as if 22nm, or some other technology, is well positioned longer term. One of the key challenges is the adoption of neural processing capabilities that are needed on chip. How do you structure the memory that goes around that? Structuring that memory to accommodate what they are trying to do is one of the biggest challenges with neural processing.”

And MCU vendors are moving down to finer geometries. “The devices coming out now are in a different node,” says Pedersen. “The NXP RT1170 is a 1GHz device. We can assume that it is further down than 40nm. For the MCU vendor, choosing which node to go to comes down to cost versus performance. You get to the point where the device becomes pad-limited and there is no need to shrink any further. Even in 40nm, the Cortex-M7 is tiny. When shrinking it further, they had to add more features, so they add more SRAM and a lot more I/O functions just to fill out the fabric.”

Node migration does not make sense for all devices. “The higher nodes will continue to exist because you have a billion devices that just need a small 8-bit microcontroller to flip a switch or run a small motor,” adds Pedersen. “They will never go into the deeper nodes because that would be overkill. So you will see a spectrum, from the deepest nodes of 7nm or 5nm for the biggest CPUs, all the way up to 350nm and 500nm for certain applications where that is the right tool.”

Newer packaging techniques also may make an entry. “The concept of chiplets is targeting this trend, as well,” says Fraunhofer’s Jancke. “It will enable tiny silicon parts to be implemented in the best-suited technology and be tightly integrated together with other parts within one package. Of course, EDA tools need to take this into account and allow designing beyond the scope of a single silicon area — even for multiple chiplets placed on an interposer within a package.”

Adding complexity
There is demand for a lot of new functionality in MCUs these days. “I look at a lot of designs and I am seeing more MCUs containing a network access controller (NAC),” says Mentor’s Zarrinfar. “This is a networking solution that regulates how secure devices connect when they first attempt to access a network. I frequently see hash circuitry or AES encryption blocks. Security covers a very wide range of concerns and could involve fingerprinting, or supply chain measures to make sure that chips are not sold to the wrong people.”

Using external memory could make the devices less secure. “Solutions may add encryption on the memory interface,” says Pedersen. “There is an AES block sitting there on the chip. Adding that didn’t cost much. Right now, most of the customers don’t even bother to use the high-speed octal DDR memory, they just use a standard quad SPI because they don’t need the performance that the chip has. They just use it because it is a good solution at a low cost. They may run it at a lower speed, using slower memory and that is good enough.”
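A conceptual sketch of what encryption on the memory interface involves: with a counter-mode scheme, the keystream depends only on the key and the flash address, so decryption reduces to an XOR that can happen inline as each block is fetched. The keystream function below is a placeholder rather than real AES, and the whole sketch is ours, not a description of any vendor’s block:

```c
/* Toy model of decrypt-on-fetch for code stored in external flash.
 * The keystream function is a stand-in for a hardware AES engine
 * (it is NOT real AES); addresses and key sizes are illustrative only. */
#include <stdint.h>

#define BLOCK 16 /* AES block size in bytes */

/* Placeholder for an AES-CTR keystream: in real hardware this would be
 * AES(key, nonce || block_address). */
static void keystream_block(const uint8_t key[16], uint32_t addr,
                            uint8_t out[BLOCK]) {
    for (int i = 0; i < BLOCK; i++)
        out[i] = (uint8_t)(key[i] ^ (addr >> ((i % 4) * 8)) ^ 0xA5u);
}

/* Decrypt one block as it arrives from the SPI flash, before it is handed
 * to the CPU or cache. Counter mode needs no read-modify-write of the flash. */
void decrypt_fetch(const uint8_t key[16], uint32_t addr,
                   const uint8_t cipher[BLOCK], uint8_t plain[BLOCK]) {
    uint8_t ks[BLOCK];
    keystream_block(key, addr & ~(uint32_t)(BLOCK - 1), ks);
    for (int i = 0; i < BLOCK; i++)
        plain[i] = cipher[i] ^ ks[i];
}
```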

That has been the hallmark of many MCUs. “While there is likely to be a core of functionality that most devices require – security, connectivity, CPU – many will be differentiated by bespoke accelerators and a mix of application specific sensors and actuators,” says Arm’s Ensergueix. “This requires a range of solutions that provide a pre-validated, secure subsystem, which can be augmented by our partners with their own differentiating IP.”

One area in which there is rapid progress involves neural processors. “The impact of machine learning (ML) in the industry is enormous,” adds Ensergueix. “Neural network algorithms are expected to run across multiple devices, from high performance cloud servers to low power devices. Depending on the type of applications, ML algorithms will run on the CPU, GPU, or NPU, and software frameworks greatly simplify the deployment of machine learning by abstracting the differences between the different types of accelerators available on the device.”

The software stack may enable MCU vendors to differentiate. “Neural algorithms can be compressed and condensed, and there are many types of algorithms,” explains Lowman. “I have seen an uptick in adoption beyond standard CNNs or RNNs to algorithms that have more sparsity or transforms, like spiking NNs. They could leverage smaller memories. Tools help customers model those algorithms so they can understand and find architectures that work best given their power budget. It is a really exciting time because there are so many moving pieces that you really need some innovative engineers to figure out the optimal path.”
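To see why sparsity can translate into smaller memories, compare dense storage of a pruned weight matrix with a compressed sparse row (CSR) layout; the layer dimensions, weight width, and sparsity level below are made up for illustration:

```c
/* Compare memory footprints for a dense vs. CSR-encoded weight matrix.
 * Dimensions and sparsity are illustrative, not from any real network. */
#include <stdio.h>
#include <stdint.h>

int main(void) {
    const long rows = 256, cols = 1024;   /* one fully connected layer */
    const double density = 0.10;          /* 90% of weights pruned to zero */
    const long nonzeros = (long)(rows * cols * density);

    long dense_bytes = rows * cols * sizeof(int8_t);     /* 8-bit weights */
    long csr_bytes   = nonzeros * (sizeof(int8_t)        /* values        */
                                   + sizeof(uint16_t))   /* column index  */
                       + (rows + 1) * sizeof(uint32_t);  /* row pointers  */

    printf("dense: %ld bytes, CSR: %ld bytes\n", dense_bytes, csr_bytes);
    return 0;
}
```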

Reducing cost
MCUs always will be cost-sensitive. “Custom design sounds good because it enables you to match an algorithm to the best possible solution,” says Zarrinfar. “If there are data intensive aspects, these can be mapped to data path architectures to provide acceleration. However, it is easier said than done and many people run into problems when they do this. Mistakes are costly, and this is why they use available MCUs.”

Total cost of ownership is important. “Several years ago the IoT was the hottest topic, and so connectivity was put on everything,” says Lowman. “But it really didn’t provide enough value to the end user. Today, there are many product ideas where the trend is to provide more value. Devices need to do more things on their own, not just provide connectivity to capture some data. It actually is doing something with the data. The ability to provide some custom acceleration in hardware lets them ensure that the power consumption, and the value the processor provides, is competitive with what others are offering. So we do see a lot of AI accelerators out there, which the MCUs will have to keep up with.”

Processor migration
Within the SoC space, RISC-V often pops up in conversation. “RISC-V has had good penetration, particularly for the SoC guys,” says Pedersen. “RISC-V is becoming a more popular core simply because it allows a lot more customization than Arm has been allowing up until now. I believe that the SoC vendors targeting specific markets will be quicker to adopt RISC-V than general-purpose microcontroller vendors. That may change in a couple of years, when you have the next generation of engineers coming out of the universities. Every university is using RISC-V as a teaching tool because it is more interesting to have something they can modify rather than a block set in stone.”
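As a concrete example of that room for customization, the RISC-V encoding reserves “custom” opcode spaces for user-defined instructions, and the GNU assembler’s .insn directive can emit them without modifying the compiler. The funct fields and the operation performed here are hypothetical; only the custom-0 opcode value comes from the base specification:

```c
/* Invoking a hypothetical user-defined RISC-V instruction from C code.
 * The instruction sits in the reserved custom-0 opcode space (0x0b); the
 * funct3/funct7 values and the operation it performs are made up, and the
 * accelerator behind it would be the vendor's differentiating logic. */
#include <stdint.h>

static inline uint32_t my_accel_op(uint32_t a, uint32_t b) {
    uint32_t result;
    /* .insn r <opcode>, <funct3>, <funct7>, rd, rs1, rs2 */
    __asm__ volatile (".insn r 0x0b, 0x0, 0x00, %0, %1, %2"
                      : "=r"(result)
                      : "r"(a), "r"(b));
    return result;
}
```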

Of course, Arm is not idly sitting by. “Arm recently added Arm Custom Instructions to the Cortex-M CPU architecture,” says Ensergueix. “This further enables our partners to rapidly differentiate.”

But this is not a path for everyone. “Open-source IP, especially RISC-V processors, offers advantages while enabling innovation and differentiation in hardware as well as software,” says Tom Anderson, member of OneSpin’s Technical Marketing team. “RISC-V allows user-defined instructions to be added, and other forms of IP may well follow suit in offering customization options. However, both the variety of sources for RISC-V cores and the ability to modify them demand a strong verification process. Users must ensure that a core is compliant with the RISC-V specification, that any user extensions work properly and do not break compliance, and that the core’s microarchitecture is fully verified, as well.”

At the end of the day, it may depend upon the quality of the toolchain. “RISC-V is a fascinating introduction,” says Lowman. “We have our ARC processor that supports adding custom instructions, and they are all trying to replicate what ARC delivers. A lot of companies are moving away from proprietary processors, and they see RISC-V and then realize that they don’t have a processor team anymore to support it. So this requires a good toolchain. We have the ASIP Designer tool, and it has a preconfigured framework that supports RISC-V. It spits out its own compiler and toolchain.”

Conclusion
When the industry hits a barrier, it finds innovative ways to deal with the situation. Sometimes, after a period of transition, the new direction can be significantly better than the previous trajectory. The limitation of flash is causing many aspects of the MCU market to be reconsidered. As a result, devices already are entering the market with higher levels of functionality and performance, at lower cost and power.

Related Stories
Memory In Microcontrollers
Different approaches where more memory is required.
Non-Volatile Memory Tradeoffs Intensify
Why NVM is becoming so application-specific and what the different options are.
Tech Talk: MCU Memory Options
A look at the tradeoffs between embedded NVM and system in package.



2 comments

Enzo Donze says:

I enjoyed reading the story. Just wanted to add that Stmicro is proposing embedded PCM for their automotive microcontrollers.

jay says:

Hmm. So RISC-V is akin to C++ of instruction set design, giving ppl enough rope to hang themselves but very flexible, while ARM is like solid C but a bit constraining. Funny plain old C is making resurgence of late in software world, verification/debug being such a major cost these days polySi price may be less worry.
