Updates and last-minute design changes add a whole new set of challenges at sign-off.
Adding different kinds of processing elements into chips is creating system-level incompatibilities because of sometimes necessary, but usually uncoordinated, firmware updates from multiple vendors.
In the past, firmware typically was synchronized with other firmware, and the chip was verified and debugged. That becomes much more difficult when multiple heterogeneous processing elements are introduced into the design for applications such as AI and networking infrastructure. Rather than updating firmware for a single type of processor, there are now multiple processor types, each with its own firmware updates for everything from performance and power to security.
The problem stems from the fact that no single processor can do everything optimally. A CPU, for example, can handle high-level provisioning of resources for other compute elements more efficiently than other processors, but it is inefficient at multiply-accumulate functions. In contrast, GPUs, FPGAs, eFPGAs and DSPs are better at fixed- or floating-point math, but inefficient at managing resources.
The chip industry has seen this coming for some time. Starting at 90nm, chipmakers began making more design tradeoffs than at any prior point, adding more cores to limit static current leakage and more memory to improve performance. At 16/14nm, these issues became acute enough that 3D transistors were required. And at 7/5/3nm, there will be more changes in transistor types (nanosheets, nanowires), materials (cobalt interconnects, possibly new substrate materials), and manufacturing and test equipment (EUV, ALD, ALE, ATE) just to maintain power and performance improvements of 20% or less.
That’s not enough of a gain for most applications at advanced nodes, however. So to boost performance and reduce power further, chipmakers are making architectural and micro-architectural changes. This is critical in established markets such as mobile phones and networking, as well as in new markets such as AI training, inferencing, 5G infrastructure and autonomous vehicles. Instead of a single processing element, chipmakers are adding multiple processing elements, often with small amounts of memory in close proximity to those processors. They also are experimenting with in-memory and near-memory processing, which can save time and energy by doing the processing closer to large volumes of data, and with various types of multi-die packaging that provide high-speed interconnects to memory and between chips.
Each of the processing elements in these devices contains firmware, which is being constantly updated to deal with changes in algorithms and other higher-level code. But because all of these pieces are being developed by different vendors, keeping firmware updates and versions synchronized is becoming more challenging.
“We’re using processors from different vendors, and each vendor has its own release schedule and different quality of firmware,” said Mike Gianfagna, vice president of marketing at eSilicon. “This is a very big headache. Most of the problem is in the bring-up of the firmware. There are firmware interactions with other firmware. You also need to make sure everyone on the design team has the same version of firmware that they’re using.”
It gets more complicated. The same processor or memory may utilize different versions of firmware from the same vendor, depending upon the end application. Or it may use different versions of IP as that IP is updated over the course of the design cycle, each with its own unique firmware.
“So now you’ve got embedded processors, firmware and software, and when you hand off the package it gets more difficult,” said Gianfagna. “Keeping this all harmonized is very difficult.”
That may be an understatement. In many cases, there are hundreds of different processors or IP blocks, each with their own firmware.
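One way to picture the bookkeeping problem is a simple manifest check. The sketch below is a hypothetical Python illustration, not any vendor's actual tooling; the block names and version numbers are invented. It only shows the kind of reconciliation design teams end up doing by hand when every IP block carries its own firmware from a different vendor.

```python
# Hypothetical sketch: reconcile the firmware version each IP block is expected
# to carry against what a design team has actually checked in. All block names
# and version numbers are invented for illustration.

EXPECTED = {
    "cpu_subsystem": "3.2.1",
    "gpu_cluster":   "1.9.0",
    "dsp_audio":     "2.4.7",
    "npu_inference": "0.11.3",
}

CHECKED_IN = {
    "cpu_subsystem": "3.2.1",
    "gpu_cluster":   "1.8.5",   # vendor released 1.9.0, but the team never picked it up
    "dsp_audio":     "2.4.7",
    # npu_inference is missing entirely
}

def audit(expected: dict, checked_in: dict) -> list[str]:
    """Return a human-readable list of firmware mismatches."""
    problems = []
    for block, want in expected.items():
        have = checked_in.get(block)
        if have is None:
            problems.append(f"{block}: no firmware checked in (expected {want})")
        elif have != want:
            problems.append(f"{block}: checked in {have}, expected {want}")
    return problems

if __name__ == "__main__":
    for line in audit(EXPECTED, CHECKED_IN):
        print("MISMATCH:", line)
```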
“With heterogeneous computing, you build these systems using a lot of components,” said Ranjit Adhikary, vice president of marketing at ClioSoft. “But all applications may have different versions from different vendors. What gets updated and how that impacts them determines how they choose their firmware. This is becoming more and more of a challenge.”
Consider safety-critical markets such as medical devices and automotive, for example, where standards are still being defined and the technology is evolving to adapt to those standards.
“If you look at the medical device market, it takes a long time to make sure everything is working properly,” Adhikary said. “Typically, if something impacts you, you update the firmware. If it doesn’t impact you, you don’t do anything. But when you want to schedule an update, that’s usually for selective features. You may not need all of them. So you can choose not to do the update, or you can do the update at a different time. All of this requires a lot of testing. A lot of applications work with virtualization software like VMware. When you do simulation and get the latest handoff, you have to figure out what’s changed and whether that still works. You need to keep track of every design element and what version of the software they’re using.”
Different approaches to firmware
Firmware is software that is embedded into a hardware device to control basic operations, such as I/O, performance and power management. It can be exposed so that vendors can update it, typically using erasable programmable read-only memory (EPROM) or flash memory. Or it can be embedded permanently into ROM and never touched again.
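As a rough illustration of why field-updatable firmware needs structure around it, the following hypothetical Python sketch packs a firmware image with a version number and a checksum, and refuses to report a version for an image that fails the integrity check. The 12-byte header layout is invented; real devices define their own formats.

```python
# Hypothetical example: a boot-time integrity check on a firmware image held in
# flash. The header layout (magic, version, CRC32) is invented for illustration.
import struct
import zlib

MAGIC = 0xF1A5F1A5

def build_image(version: int, payload: bytes) -> bytes:
    """Pack a firmware image: magic word, version, CRC of the payload, then the payload."""
    return struct.pack("<III", MAGIC, version, zlib.crc32(payload)) + payload

def validate_image(image: bytes) -> int:
    """Return the firmware version if the image is intact, otherwise raise."""
    magic, version, crc = struct.unpack_from("<III", image, 0)
    payload = image[12:]
    if magic != MAGIC:
        raise ValueError("not a firmware image")
    if zlib.crc32(payload) != crc:
        raise ValueError("payload corrupted or truncated")
    return version

if __name__ == "__main__":
    img = build_image(version=7, payload=b"\x90" * 256)
    print("firmware version", validate_image(img))
```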
In large chips, updating firmware can quickly spiral out of control. There are hundreds or even thousands of different elements in some chips, and there are a limited number of connections to allow everything to be updated. At older nodes this hasn’t been a serious problem because there is enough margin and potential workarounds available at a higher level, so that when something doesn’t work the chip continues to function. This is why systems companies are able to send out software patches to override hardware issues. While they may impact performance or power, that generally allows products to function until the next rev of that chip is available.
This approach becomes much more complicated in heterogeneous designs, especially in markets where products are supposed to last longer than smartphones or other consumer devices, or in segments where safety is involved. The inability to update firmware that is deeply embedded in a chip can quickly render a device obsolete or limit its usefulness over time.
“If you have a many-core design, there’s a leap of faith that these devices will always be able to run simple tasks,” said Chris Jones, vice president of marketing at Codasip. “If one doesn’t run, there is no way to isolate the core. This is a tradeoff people have been willing to make in the past in the name of power conservation.”
In large chips, complexity generally is tackled by segmenting portions of a design across the supply chain. This is the classic divide-and-conquer strategy that has been used successfully for many years. But as chips become more complex, the amount of third-party IP being used goes up considerably. Even large chipmakers don’t develop all of the IP in those chips, and as systems vendors increasingly develop their own chips, they are relying heavily on various commercially developed components such as instruction-set architectures, memories, I/O blocks and on-chip networks.
Add in data from multiple sources, or AI chips where many processors and memories are scattered around a chip, and firmware updates become increasingly difficult to manage.
“This is getting worse because of the heterogeneous nature of these designs,” Jones said. “The software guys are always going to push homogeneity. If you add in SMP (symmetric multi-processing), it makes the software job easier because the OS handles everything. But heterogeneity is a fact of life for low-power designs and cost.”
The problem extends to all software development, which is the fastest but not necessarily the most efficient way of keeping systems updated. But the higher the software level, the more exposed it is and the easier it is to change.
“Modern SoCs are built from different types of interconnected, scalable subsystems, with the final configurations based on the target market or customer’s requirements,” said Zibi Zalewski, general manager of Aldec’s Hardware Division. “The scalability of subsystems allows them to grow in size and complexity very quickly, so it’s not a problem to scale from dual- to quad-core, for example, but it may be an issue to catch up with the proper tools. In addition, the hardware part of the project is no longer the dominant element. Software layers add significant complexity to the project, so it’s not just about the number of transistors. It’s the target function.”
Just getting hardware and software to work together is a challenge. Keeping them updated and debugged makes that more difficult.
“Going into future systems, whether that’s autonomous cars or 5G base stations, the problem is how to utilize the hardware, how to understand the code, and how to make all of this work,” said Max Odendahl, CEO of Silexica. “There is programming language after programming language, more modularity, more abstraction, more isolation. So you make it easier for individual teams to develop. But the problem is you have to put it all together. Maybe you need more parallelism to utilize your hardware.”
In some cases, the firmware becomes the glue to higher-level software. In others, it is used only internally within a processor or block. And in still others, the higher-level software ties into hooks that the firmware provides. All of this can get incredibly complicated very quickly.
“There are a million different ways to write your application,” said Odendahl. “There is shared memory, global variables, shared pipes and threads and multiple binaries all talking to each other. How do you understand whether the software you’ve written has anything to do with what was intended in the first place? We still see a lot of siloed thinking. The architect is doing some XOR in UML. Does that even mesh? People talk about software architecture erosion. Does the code have anything to do with what you intended in one place? To figure that out, you need very detailed software code understanding.”
That requires something of a mindset change for design teams, which need better ways to track these changes.
“We see issues with firmware in memory management and register management,” said Kurt Shuler, vice president of marketing at Arteris IP. “In an SoC, when you do a respin, the register map will change, but that’s not sent to the software guys. So you have to maintain synchronization between the registers, the peripherals and the hardware abstraction layers and the drivers, and today people are trying to track this with spreadsheets.”
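A common alternative to spreadsheets is to generate both the hardware and software views of the register map from a single description, so a respin that moves a register cannot silently diverge from the drivers. The sketch below is a minimal, hypothetical Python version of that idea; the block names, base addresses and offsets are invented.

```python
# Hypothetical "single source of truth" sketch: one register map description
# generates both a C header for the hardware side and Python constants for the
# driver side. All names and addresses are invented for illustration.

REGISTER_MAP = {
    "UART0": {"base": 0x4000_0000, "regs": {"CTRL": 0x00, "STATUS": 0x04, "DATA": 0x08}},
    "DMA0":  {"base": 0x4001_0000, "regs": {"CTRL": 0x00, "SRC": 0x04, "DST": 0x08, "LEN": 0x0C}},
}

def emit_c_header() -> str:
    """Generate #define lines for the firmware/driver C code."""
    lines = ["/* Auto-generated -- do not edit by hand */"]
    for block, info in REGISTER_MAP.items():
        for reg, offset in info["regs"].items():
            lines.append(f"#define {block}_{reg} 0x{info['base'] + offset:08X}u")
    return "\n".join(lines)

def emit_python_constants() -> str:
    """Generate the same addresses as Python constants for test and tooling code."""
    lines = ["# Auto-generated -- do not edit by hand"]
    for block, info in REGISTER_MAP.items():
        for reg, offset in info["regs"].items():
            lines.append(f"{block}_{reg} = 0x{info['base'] + offset:08X}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(emit_c_header())
    print(emit_python_constants())
```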
Security and verification issues
Security adds another dimension to this problem. Complexity often leads to security vulnerabilities, and firmware that is not updated regularly can be one of the attack vectors. At the same time, hackers can gain entry to a system as the firmware is being updated, which frequently is done over the air.
“This is becoming more and more of a problem, because anything you do in software can be hacked,” said ClioSoft’s Adhikary. “If something happened, do you know what changed? If malicious code was inserted, can you see when and where that was done? If you understand all of this, it makes it easier to debug, but you need to know what handoffs were made, what versions were done and used. Did whoever made those changes check it in? What issues were they working on?”
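At its simplest, guarding an over-the-air update means refusing to flash an image that does not match what the vendor published out of band. Production systems rely on cryptographic signatures and secure boot rather than a bare hash comparison, but the hypothetical Python sketch below shows the shape of the check; all values are invented.

```python
# Illustrative only: reject an over-the-air firmware image whose digest does not
# match the value published out of band. Real systems use signed images and
# secure boot; this only shows the basic shape of the check.
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def apply_update(image: bytes, published_digest: str) -> bool:
    """Apply the update only if the image matches the published digest."""
    if sha256_hex(image) != published_digest:
        print("rejecting update: digest mismatch, possible tampering")
        return False
    print("digest verified, flashing image")   # placeholder for the real flash step
    return True

if __name__ == "__main__":
    image = b"\xde\xad\xbe\xef" * 64
    good = sha256_hex(image)             # what the vendor would publish
    apply_update(image, good)            # accepted
    apply_update(image + b"\x00", good)  # rejected: image was altered
```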
It’s also becoming more challenging to track all of these changes on the verification side, because changes in firmware can affect the overall functionality of a device, and not all tools recognize that.
“We all remember the times it was enough to use simulation and implement the design for prototyping before the tape-out,” said Aldec’s Zalewski. “The next important milestone in the verification history was emulation, which was driven by the SoC projects to accelerate the simulation stage. And now, when the complexity of those doubles, hybrid co-emulation has gained in popularity, connecting high-level software modeling with hardware system verification. The integration of tools, enabling different teams working together as early as possible, has become a major challenge for the tools providers.”
Conclusion
Firmware always has been a key part of chip design, but by and large it has been inaccessible to the outside world once that code is written. That has changed over the years, as systems companies require more updates to keep devices current and secure, and they need some of that firmware to be exposed so they can modify it as needed.
What’s changing, though, is the number of elements that need updating, in part because there are so many more components in a complex chip, and in part because many of the new markets requiring heterogeneous integration are still evolving. This is particularly evident in AI, where algorithms are changing regularly, and in 5G, where it’s not entirely clear yet how some of these devices will be tested. Firmware written for one element in those designs may be updated far more frequently than in the past, and it may have to be updated over longer periods of time, particularly when chips are used in markets such as assisted and autonomous vehicles. This makes tracking firmware and potential interactions considerably more difficult, and that is unlikely to change anytime in the foreseeable future.
Related Stories
AI Begins To Reshape Chip Design
Technology adds more granularity, but starting point for design shifts as architectures cope with greater volumes of data.
New Design Approaches At 7/5nm
Smaller features and AI are creating system-level issues, but traditional ways of solving these problems don’t always work.
How To Build An Automotive Chip
Changing standards, stringent requirements and a mix of expertise make this a tough market to crack.
Building AI SoCs
How to develop AI chips when algorithms are changing so quickly.