The Cost Of Programmability

How much flexibility should be incorporated in a chip and at what level should it be programmable? Those questions are getting more complicated.


Nothing comes for free, and that is certainly true for the programmable elements in an SoC. But without them we are left with very specific devices that can only be used for one fixed application and cannot be updated.

Few complex devices are created that do not have many layers of programmability, but the sizing of those capabilities is becoming more important than in the past.

There are many reasons to add programmability. Among them:

  • Chip Bring-up. This includes programmability to make the chip operational. It may include calibration, healing circuitry for memory banks, and other functions necessary to get a functional chip.
  • Chip Configuration. One chip often can serve multiple functions. It can be programmed once by the manufacturer (burning in configuration information) or configured through the software stack shipped with the chip. In other cases, that programming may be done by the user, but it is likely to be done once and then not altered. (A sketch of this one-time model appears after this list.)
  • Firmware Patches. Typically this involves security updates, OS updates and driver improvements made by the chip manufacturer, but in some cases it may require over-the-air upgrades to address newly discovered security vulnerabilities.
  • Improved Algorithms. New algorithms can be loaded into AI systems, or software updates can improve capabilities. When this involves an FPGA fabric, it is updated by the chip manufacturers because the industry has not yet worked out how to make this capability accessible to end-system companies.
  • Application. The application itself is supplied by the user.
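
A minimal sketch of the one-time configuration model mentioned above, assuming a hypothetical fuse (OTP) word whose bits record which features a given part variant may expose; the addresses and bit assignments are invented for illustration, not taken from any real device.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical fuse/OTP word burned by the manufacturer. The same die can
 * ship as several product variants depending on which bits are set. */
#define FUSE_CONFIG       (*(volatile const uint32_t *)0x1FFF0000u)

#define FUSE_FEAT_AI      (1u << 0)   /* AI accelerator enabled           */
#define FUSE_FEAT_CRYPTO  (1u << 1)   /* crypto engine enabled            */
#define FUSE_FEAT_ETH2    (1u << 2)   /* second Ethernet MAC bonded out   */

static bool feature_enabled(uint32_t feature_bit)
{
    return (FUSE_CONFIG & feature_bit) != 0u;
}

/* Boot-time use: only bring up the blocks the burned-in configuration
 * allows, so one chip design serves several product variants. */
void platform_init(void)
{
    if (feature_enabled(FUSE_FEAT_AI)) {
        /* clock and power up the AI accelerator here */
    }
    if (feature_enabled(FUSE_FEAT_CRYPTO)) {
        /* expose the crypto engine to the driver stack here */
    }
}
```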

Some levels of programmability only require the ability to write information into registers that are used to control hardware, but usually that requires some software running on a processor.
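Where register writes are all that is needed, the programming amounts to a handful of volatile accesses. Below is a minimal sketch, assuming a hypothetical peripheral whose operating mode is selected through a single memory-mapped control register; the addresses and bit fields are invented for illustration.

```c
#include <stdint.h>

/* Hypothetical peripheral: one control register selects the operating mode,
 * one status register reports readiness. */
#define PERIPH_BASE    0x40010000u
#define PERIPH_CTRL    (*(volatile uint32_t *)(PERIPH_BASE + 0x00u))
#define PERIPH_STATUS  (*(volatile uint32_t *)(PERIPH_BASE + 0x04u))

#define CTRL_ENABLE      (1u << 0)
#define CTRL_MODE_SHIFT  4u
#define CTRL_MODE_MASK   (0x3u << CTRL_MODE_SHIFT)
#define STATUS_READY     (1u << 0)

/* Select a mode and enable the block. This is the "write a few registers"
 * level of programmability: no firmware runs on the block itself, only the
 * code that performs the writes. */
void periph_configure(uint32_t mode)
{
    uint32_t ctrl = PERIPH_CTRL;
    ctrl &= ~CTRL_MODE_MASK;
    ctrl |= (mode << CTRL_MODE_SHIFT) & CTRL_MODE_MASK;
    ctrl |= CTRL_ENABLE;
    PERIPH_CTRL = ctrl;

    while ((PERIPH_STATUS & STATUS_READY) == 0) {
        /* wait until the block reports it has taken the new configuration */
    }
}
```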

Processors are notoriously power-hungry, especially if the wrong kind of processor is used for a given task. There is an increasing array of processing options available. More recent introductions include extensible instruction-set processors, embedded FPGA fabrics, and AI engines.

Complex devices contain multiple heterogeneous processors and flexible communications fabrics. Each of those has to be appropriately sized for the task at hand, and potentially for future needs, as well. Analyzing the needs and the demands the processors place on the rest of the chip has become a lot more complex than it was in the past.

“What people are really trying to balance is performance, power and cost — the traditional three,” says Joe Mallett, senior marketing manager for Synopsys. “Do they know enough to semi-harden or potentially fully harden portions of the customized piece they are trying to insert?”

Programmability is added at multiple stages during the design flow. “Designers often will choose to design-in flexibility – especially in cases where they do not know exactly where a chip will be used, or perhaps where a standard is not fully understood,” says Tommy Mullane, senior systems architect for Adesto. “They want to be able to cover all bases. It also can be difficult to get key stakeholders to agree on a set of restrictive requirements. As a result, the approach tends to be to add more flexibility rather than less, and to allow configuration by software, or allow the hardware itself to be directly configured with embedded programmable resources. But adding more flexibility means increased die size and more man hours needed for designing and testing those extra features – all leading to increased costs.”

Notions of programmability are expanding, too. “In recent times, programmability in chips has transcended the traditional compute-based use cases of CPU and GPU – and has influenced other types of silicon, as well,” says Anoop Saha, market development manager at Mentor, a Siemens Business. “In particular, software-defined networking and software-defined storage have leveraged the programmability in chips to dramatically disrupt the market by creating both new types of silicon as well as the software stack on top of it.”

Configurations
Configurability often is required to support business models. “We see the need to be able to configure an SoC, and this is nothing new, or use the same resources in different ways at different times,” says Pierre-Xavier Thomas, group director for technical and strategic marketing for Cadence’s Tensilica IP. “It may depend upon the applications that are running or for enabling different kinds of services. The chip is capable of performing certain services, but you may want to turn on or off some services at different times. You need this level of flexibility. This is more a matter of business model enablement.”

We are beginning to see new kinds of configurability coming into SoCs. “Programmability can be classified as low speed and high speed,” says Synopsys’ Mallett. “Low speed might be something where you are configuring the chip or creating a bus that is custom, or an interface, or handling of memory. High-speed would be along the lines of a specialized AI processor or security processor. When you have the SoC, they generally are trying to define an SoC that is going to fit the most segments and applications so that they can target as many customers as possible. Having that configuration piece allows them to potentially expand how many applications they could go into.”

And new configurability mechanisms are finding their way into an increasing number of SoCs. “The cost to design and fabricate chips is increasingly high,” says Geoff Tate, CEO of Flex Logix. “eFPGA enables a single chip design to be targeted for multiple applications/customer needs using the eFPGA block to customize it. And multiple eFPGA blocks can be distributed across the ASIC situated where flexibility is needed.”


Fig. 1: Configuration options abound, such as implementing a shared common internal protocol that allows heterogeneous cache-coherent systems, like this example with AMBA ACE and CHI clusters sharing a common view of memory with hardware accelerators. Source: ArterisIP

Understanding needs
The number of vertically integrated companies, where hardware and software can be co-designed, has been growing. That allows chips to be tuned for a specific application. “There is always the battle between programmability versus performance,” says Mentor’s Saha. “A chip that targets a very specific application can be highly optimized for that application. This contrasting nature adds a whole new dimension of hardware-software co-design problems. Now it is not just the ISA – a well-defined interface between hardware and software – that separates a processor from the application. The co-design problem now has many different layers, from low-level silicon to compilers, libraries, data-type precision, and so on.”

Again, there are tradeoffs with this approach. “A bespoke solution requires work up-front to understand the problem for which the chip is being designed, and also requires making decisions in advance about what the chip will and will not do,” says Adesto’s Mullane. “Investing up-front in this work can lead to a less expensive chip that is ready to use with a minimum of work from end users. The result is that you get to market faster, with a less expensive product. While building some flexibility into some solutions can be a good idea, generally if a problem space is well-known, a chip can be crafted that solves the need without wasting resources on extra flexibility.”

But there will always be chips that are designed and used by different companies. “Companies that only develop the chip do not have all of the software in place,” adds Cadence’s Thomas. “They may have an example of software, but they need to enable their customer to develop their own algorithms and enable the full system on their chip. Now the chip needs to come with the characteristics of having enough computing power, being easy to program, and enabling software algorithms to be implemented efficiently using all of the programmable resources on the chip.”

To make things even more complicated, there are new programmability demands for areas such as security. “Traditional SoCs made the decision between a processor and a dedicated channel for secure boot, and figured out the pieces that need to be configured and what can be hardened,” says Mallett. “There are two aspects to security. The first part is booting the system, the second is operation. The first is handled by some kind of secure boot or variant of that. Operation is handled in many different ways, but many companies handle it through the processor, where you may be separating certain types of operations by having a virtual machine running underneath the OS. The ability to configure your entire security processor gives you the ability to potentially combine both of those use cases into a single fabric, where you might be able to have security for boot as well as security for runtime being handled by the fabric itself. That is a different usage model than what has been used in the past, but is an area that people are looking at because of the changing vulnerabilities and the changing definitions of security itself.”
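As a rough illustration of the boot-time half of that picture, the sketch below checks a firmware image against a reference digest held in immutable storage before transferring control to it. The digest function, addresses, and memory layout are illustrative assumptions, not any specific vendor’s boot ROM.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in digest: a real boot ROM would use SHA-256, typically via a
 * hardware hash engine. This 64-bit FNV-1a hash only illustrates the flow. */
static uint64_t digest(const uint8_t *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ull;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ull;
    }
    return h;
}

/* Hypothetical layout: the firmware image sits at a fixed address, and the
 * reference digest is held in fuses or other immutable storage. */
#define IMAGE_ADDR     ((const uint8_t *)0x08020000u)
#define IMAGE_SIZE     (256u * 1024u)
#define EXPECTED_ADDR  ((const volatile uint64_t *)0x1FFF0008u)

typedef void (*entry_fn)(void);

/* Boot-side security: refuse to transfer control to firmware whose digest
 * does not match the immutable reference. Runtime security, the second
 * aspect in the quote, would be handled separately, for example by an
 * isolated security processor or a configurable security fabric. */
void secure_boot(void)
{
    if (digest(IMAGE_ADDR, IMAGE_SIZE) == *EXPECTED_ADDR) {
        ((entry_fn)(uintptr_t)IMAGE_ADDR)();  /* hand control to the image */
    }
    for (;;) {
        /* verification failed: halt here or drop into a recovery loader */
    }
}
```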

Right-sizing the processor
When Moore’s Law was in full swing, each generation of chip had more compute capability than the previous generation and would likely start with a very similar software workload. That meant compute headroom was almost a given. “There are a lot of requirements for more processing and more complex processing that enable rapid deployment of algorithms onto a new chip,” says Thomas. “These algorithms may be an evolution at the time you architect the chip. People are trying to put as much capability as they can for a given die area and power budget in order to develop algorithms efficiently on that chip.”

New application areas are changing some of those requirements and require additional consideration of the system level. “For example, the amount of data you are getting from the sensor is so large that you do not want to transfer that data to a central computer or the cloud,” says Thomas. “You need to process the data fast, and a lot of data, in order to identify the important data that you need to carry to the next level of the decision chain.”

Many of these systems rely on AI processors, and those involve completely new paradigms for processing. “While processor companies may have an idea about how they want to handle their AI engine, they have not yet solidified the architecture enough, or maybe not yet had enough usage to understand which algorithms they may end up having to support,” says Mallett. “By making it configurable, or at least partially configurable, they can then extend the capability of the SoC itself in an area where the requirements may not yet be set.”

This lack of knowledge can impact more than the software. “The data being generated by sensors needs to go through a neural network or machine learning engine,” says Thomas. “AI is evolving so rapidly, and there is a constant stream of new neural networks that may have a different ‘shape’. They might not map as well, depending upon the shape of the hardware that is available. So you get some efficiency issues as new networks become available and you cannot modify the chip. That can lead to disappointing performance of the final system.”
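The efficiency loss Thomas describes can be seen with simple arithmetic. The sketch below estimates how much of a fixed MAC array a layer actually uses when its dimensions do not divide evenly into the array; the array size and layer shapes are illustrative assumptions.

```c
#include <stdio.h>

/* Fraction of MAC slots doing useful work when a fixed R x C array processes
 * a layer with 'rows' x 'cols' of work, tiled with padding at the edges. */
static double utilization(int array_r, int array_c, int rows, int cols)
{
    int tiles_r = (rows + array_r - 1) / array_r;   /* ceiling division */
    int tiles_c = (cols + array_c - 1) / array_c;
    double used  = (double)rows * (double)cols;
    double total = (double)tiles_r * array_r * (double)tiles_c * array_c;
    return used / total;
}

int main(void)
{
    /* A 32x32 array maps a 256x256 layer perfectly; a 96x80 layer from a
     * newer network leaves part of the array idle on every pass. */
    printf("256x256 layer on 32x32 array: %.0f%% utilization\n",
           100.0 * utilization(32, 32, 256, 256));
    printf(" 96x80  layer on 32x32 array: %.0f%% utilization\n",
           100.0 * utilization(32, 32, 96, 80));
    return 0;
}
```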

FPGA fabrics increasingly are being used to tackle this kind of issue. “Some workloads run much better on an FPGA than on processors,” says Flex Logix’s Tate. “This is behind the boom of FPGA PCIe boards in servers from Xilinx/Intel and the use of FPGAs in Microsoft’s data centers. Now, eFPGA integration enables SoC designers to use eFPGA to accelerate their heaviest workloads, offloading their processor for higher speed at lower cost.”

These types of processors have to consider more than just compute power. “You can keep throwing more MACs at the problem,” says Thomas. “But the issue is how much power, how much data has to come in and out, because the AI problem is a massive data transfer problem between the coefficients and the activation. The more MACs, the more throughput you need to bring in the necessary data.”

Mallett agrees. “Maybe the basic math engine for all of the algorithms is the same, and the configurable piece is associated with the data interconnects and how the memories talk to each other for different algorithms.”
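
The data-movement tradeoff Thomas raises can be put in rough numbers. The sketch below uses invented figures for MAC count, clock rate, and on-chip reuse to show how quickly peak compute outruns the bandwidth available to feed it.

```c
#include <stdio.h>

int main(void)
{
    /* Illustrative assumptions, not measurements of any real accelerator. */
    const double macs          = 4096.0;   /* parallel MAC units             */
    const double clock_hz      = 1.0e9;    /* 1 GHz                          */
    const double bytes_per_mac = 2.0;      /* int8 weight + activation fetch */
    const double reuse         = 64.0;     /* on-chip reuse of fetched data  */

    double peak_macs = macs * clock_hz;            /* MAC operations per second */
    double raw_bw    = peak_macs * bytes_per_mac;  /* bandwidth with no reuse   */
    double net_bw    = raw_bw / reuse;             /* bandwidth with local SRAM */

    printf("Peak compute:                    %.1f TMAC/s\n", peak_macs / 1e12);
    printf("Bandwidth to feed it, no reuse:  %.0f GB/s\n", raw_bw / 1e9);
    printf("Bandwidth with %.0fx reuse:       %.0f GB/s\n", reuse, net_bw / 1e9);
    return 0;
}
```

With these assumed figures, 4,096 MACs at 1 GHz would need roughly 8 TB/s of memory traffic without reuse, falling to about 128 GB/s with 64x on-chip reuse, which is exactly the interconnect-and-memory question Mallett points to.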

This changes the programming paradigm. “In traditional SoCs, things like Arm processors and register controls and possibly some firmware that runs on some specific blocks tend to be things that software programmers understand,” continues Mallett. “They are the primary users of those chips. The challenge comes when you are providing hardware programmability into a software world.”

Conclusion
Today, an SoC team has more tools in its toolbox than in the past. What was once just a processor became a configurable processor, and is now joined by fully configurable fabrics. Many of these new types of programmability are not what programmers are used to. It is not yet clear whether this will become a new, expanded role for them, or whether compilers will become more intelligent than they have been in the past.

Thomas provides one vision for the future. “Compilers will become aware of the features available in the hardware and help with the software development. We are talking about mapping algorithms into a target implementation, using library components that are aware of the hardware features that are available. So code generation needs to have a view about how work is going to be fragmented into the different resources, and they can map efficiently to those resources — guided by metadata.”
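
A hypothetical sketch of that kind of hardware-aware selection: a library routine consults capability metadata for the target and dispatches to a matching kernel. The capability structure, flags, and kernels are invented for illustration and stand in for whatever metadata format a real toolchain would define.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical hardware metadata a compiler or library could consult. */
typedef struct {
    int has_mac_array;    /* dedicated MAC array present        */
    int mac_array_lanes;  /* parallel lanes in that array       */
    int has_efpga;        /* eFPGA fabric available for offload */
} hw_caps;

typedef void (*kernel_fn)(const float *in, float *out, size_t n);

/* Placeholder kernels; real ones would drive the MAC array, load an eFPGA
 * bitstream, or fall back to plain processor code. */
static void kernel_reference(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) out[i] = in[i] * 2.0f;
}
static void kernel_mac_array(const float *in, float *out, size_t n)
{
    kernel_reference(in, out, n);   /* would offload to the MAC array     */
}
static void kernel_efpga(const float *in, float *out, size_t n)
{
    kernel_reference(in, out, n);   /* would invoke an eFPGA accelerator  */
}

/* Library-level selection guided by the hardware metadata. */
static kernel_fn select_kernel(const hw_caps *caps)
{
    if (caps->has_mac_array && caps->mac_array_lanes >= 32) return kernel_mac_array;
    if (caps->has_efpga)                                     return kernel_efpga;
    return kernel_reference;
}

int main(void)
{
    hw_caps caps = { .has_mac_array = 1, .mac_array_lanes = 64, .has_efpga = 0 };
    float in[4] = { 1, 2, 3, 4 }, out[4];

    kernel_fn k = select_kernel(&caps);
    k(in, out, 4);
    printf("selected: %s\n",
           k == kernel_mac_array ? "MAC array" :
           k == kernel_efpga     ? "eFPGA"     : "reference");
    return 0;
}
```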


