Hidden Costs Of Shifting Left

How much time and effort can be saved by doing certain tasks earlier, and where are the pain points?


The term “Shift Left” has been used increasingly within the semiconductor development flow to indicate that tasks once performed sequentially must now be done concurrently, usually because of tightening dependencies between tasks. One example being talked about today is the need to perform hardware/software integration much earlier in the flow, rather than leaving it as a sequential task that starts when first silicon becomes available. But does it work, and does it really save time and effort?


Fig. 1: Impact of shift left on hardware/software development. Source: Semiconductor Engineering

Systems have changed significantly over the past 10 years. Historically, bringing up software on hardware was not that difficult. System architectures had remained fairly constant and the task was fairly predictable. But that has changed with modern systems. “Products are getting complicated,” proclaims Alex Starr, senior fellow at Advanced Micro Devices. “A typical system today has over thirty firmware engines in it, in addition to the actual CPU. There has been a huge complexity increase. On top of that there are security concerns for these firmware engines, and there is power management, all of which is controlled by the firmware engines.”

Suman Mandal, emulation architect for Intel, adds, “There are more programmable components in a system, and the list just keeps growing. You can no longer just verify the hardware. You can no longer just verify the software on the assumption of stable hardware that hasn’t changed much over the years. You have had the luxury of building on the shoulders of giants and not having to worry about it much. Today, we are seeing all of these coming together in a timeframe that a lot of the engineering communities are not used to.”

Even if all the IP blocks used within a system have been exhaustively verified, there are still verification challenges. “Unless there is one company who makes everything, then you will always have integration problems,” says Nasr Ullah, senior director of engineering at Samsung’s Austin R&D Center. “Even if you have one company, you will have integration problems. Things do not work together, no matter how well they are written. That requires people to look and figure it out.”

And when software is added into the equation, things get worse. “What does software do to hardware and how do you isolate things such as non-determinism to effectively debug?” asks Mandal. “When you are doing verification, how do you figure out when something is going wrong and when something is just working as expected? Sometimes it is just that you didn’t expect it to happen in that way. Isolating these conditions adds to the challenge.”

It is not enough to just have software running. It has to run well, and that requires coordination. “You have to do this earlier in the process in order to stay competitive,” adds Starr. “That complexity is difficult for verification engineers because it has not always been associated with verification. You can’t use the standard tools, such as simulators. You need emulation, you need virtual platforms, you have to understand what the software is doing. Nobody really knows what is going on in these large software workloads that we ultimately have to run.”

This requires new tools. “We have to establish the causality and relationship between what is happening in software with what is happening in hardware,” says Mandal. “These are the challenges that tools need to solve and we need a spectrum of tools as we go from component to sub-system to a full system.”

The need for speed
When software gets involved, run time becomes an issue. “We talk about emulation as being able to get to these software workloads, and we have done a lot of system emulation work — but it is too slow,” says Starr. “You can’t get enough done with system-level emulation.”

Mandal agrees. “Emulators are getting faster, but the designs are also getting larger. Over the past five years, the performance has essentially remained flat. FPGA-based prototypes have an edge in that domain. They can go a lot faster. It may be 5X or 10X. The differentiator is the threshold of tolerance to latency, to interactivity. That few X is sometimes enough to move the software team from saying, ‘I can’t work with this,’ to ‘Okay, I will manage.'”

There is no one approach that is the best for all of the problems. “Even with the most advanced emulators, it is not possible to get to many corner cases,” says Bill Neifert, senior director of market development for Arm. “Sometimes it is just raw throughput that you need. Hybrid helps, but if you need true implementation-accurate behavior, you need to use FPGAs to handle that.”

What is hybrid? “The future is the marriage of simulation and emulation,” explains Ullah. “You have to mix and match and have a mechanism where they can work well together to solve both the speed and the complexity.”

That requires software models for some of the components of the system. “It was 20 years ago, when I started working with virtual modeling,” says Neifert. “Unfortunately, it has not held up to all of the early promise. But it is a valid and valuable part of the flow. What does work is the segregation of the software running on a model of the processor connected to an emulation system that is running the hardware. That is done partially for speed, but also in recognition that there is a processor and it needs to run all of the software, and this enables faster debug.”
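To make that hybrid split concrete, here is a minimal C++ sketch of the arrangement Neifert describes: the firmware logic runs against a fast virtual CPU model, and only memory-mapped accesses cross a transactor boundary into whatever stands in for the emulated hardware. The class names (Transactor, EmulatedDevice, VirtualCpu), addresses, and register behavior are hypothetical illustrations, not any vendor's API.

```cpp
// Sketch of the hybrid split: software runs on a fast virtual CPU model,
// and only memory-mapped accesses cross into the emulated hardware side.
// All names and addresses here are hypothetical.
#include <cstdint>
#include <iostream>
#include <map>

// Boundary between the virtual (software) side and the emulated (hardware) side.
struct Transactor {
    virtual uint32_t read(uint32_t addr) = 0;
    virtual void write(uint32_t addr, uint32_t data) = 0;
    virtual ~Transactor() = default;
};

// Stand-in for the hardware living in the emulator: here just a register file.
struct EmulatedDevice : Transactor {
    std::map<uint32_t, uint32_t> regs;
    uint32_t read(uint32_t addr) override { return regs[addr]; }
    void write(uint32_t addr, uint32_t data) override { regs[addr] = data; }
};

// The processor model runs firmware natively and only crosses the boundary
// for memory-mapped I/O, which is where the speed and debug benefits come from.
struct VirtualCpu {
    Transactor& bus;
    explicit VirtualCpu(Transactor& b) : bus(b) {}
    void run_firmware() {
        bus.write(0x1000, 0x1);             // e.g., enable a firmware engine
        uint32_t status = bus.read(0x1004); // poll its status register
        std::cout << "status = 0x" << std::hex << status << "\n";
    }
};

int main() {
    EmulatedDevice dev;
    dev.regs[0x1004] = 0xA5;   // pretend the emulated hardware set a status bit
    VirtualCpu cpu(dev);
    cpu.run_firmware();        // software debug happens on the fast virtual side
}
```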

There are other times when adding virtual models can help. “There are things that we don’t know yet because they haven’t been implemented, so we have them as software models,” says Ullah. “The key thing is time. We don’t want to spend so much time getting a hybrid solution to work such that we can’t use it.”

Hybrid creates another technical problem. “To get high enough performance in a hybrid environment you need a lot of bandwidth to the transactors,” says Starr. “How can you feed the transactors, in real time, without slowing the system down? That becomes a limiting factor, and FPGAs can’t even come close to addressing that. You can use FPGAs for focused testing, but emulators and hybrid are needed, and the vendors need to invest to improve that.”
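One common way such environments try to cope with that bandwidth limit is to batch many transactions into each host/emulator crossing rather than paying the round-trip cost on every access. The sketch below is only a hedged illustration of that idea; the BatchingTransactor class, the Transaction struct, and the batch size are assumptions, not any vendor's transactor API.

```cpp
// Illustrative batching transactor: accumulate transactions and pay the
// host/emulator crossing cost once per batch instead of once per access.
#include <cstdint>
#include <iostream>
#include <vector>

struct Transaction { uint32_t addr; uint32_t data; };

class BatchingTransactor {
    std::vector<Transaction> pending;
    size_t batch_size;
    size_t round_trips = 0;
public:
    explicit BatchingTransactor(size_t batch) : batch_size(batch) {}
    void write(uint32_t addr, uint32_t data) {
        pending.push_back({addr, data});
        if (pending.size() >= batch_size) flush();
    }
    void flush() {
        if (pending.empty()) return;
        ++round_trips;        // one crossing carries the whole batch
        pending.clear();
    }
    size_t crossings() const { return round_trips; }
};

int main() {
    BatchingTransactor bus(64);              // 64 transactions per crossing
    for (uint32_t i = 0; i < 1000; ++i)
        bus.write(0x2000 + 4 * i, i);        // stream of writes from the model
    bus.flush();
    // 1000 writes cost 16 crossings instead of 1000.
    std::cout << "round trips: " << bus.crossings() << "\n";
}
```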

How far left?
When is the right time to start developing and integrating software? “When Arm develops IP, we start developing a system and software as soon as it is even in the concept stage,” says Neifert. “We start with the methodology as early as we can. We are developing models now for processors that we won’t even talk to partners about for another year or two.”

Arm isn’t alone on this. “We develop performance models before the hardware gets started,” adds Ullah. “We have to be able to give the architects ideas about what they need to do, and we rely on the model to allow us to make tradeoffs. These are C++ virtual models.”

The situation is helped by the incremental nature of most designs. This is especially true of processors and many of the interfaces into a design. “We take the design from a previous generation and extract traces of the software that we can run on the models of the next generation,” continues Ullah. “Some can be virtual models, some can be hardware, and if we are lucky we get something that we can boot the OS on. Then we can look at newer software. In the past that used to be when all features were complete, but we are moving that up.”
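As a rough illustration of what such a trace-fed C++ performance model might look like, the toy sketch below replays a synthetic access trace (standing in for one extracted from a previous generation) against two candidate cache configurations and compares estimated cycles. All parameters, latencies, and class names are made up for illustration and do not reflect any real design.

```cpp
// Toy performance model: replay an access trace against two candidate
// cache configurations and compare estimated cycles. Made-up numbers only.
#include <cstdint>
#include <iostream>
#include <unordered_set>
#include <vector>

struct CacheConfig {
    const char* name;
    size_t lines;          // number of cache lines (fully associative toy model)
    uint64_t hit_cycles;
    uint64_t miss_cycles;
};

uint64_t estimate_cycles(const CacheConfig& cfg, const std::vector<uint64_t>& trace) {
    std::unordered_set<uint64_t> cache;   // which line addresses are resident
    uint64_t cycles = 0;
    for (uint64_t addr : trace) {
        uint64_t line = addr / 64;        // 64-byte lines
        if (cache.count(line)) {
            cycles += cfg.hit_cycles;
        } else {
            cycles += cfg.miss_cycles;
            if (cache.size() >= cfg.lines) cache.erase(cache.begin()); // crude eviction
            cache.insert(line);
        }
    }
    return cycles;
}

int main() {
    // Synthetic trace standing in for software extracted from a prior design.
    std::vector<uint64_t> trace;
    for (int i = 0; i < 10000; ++i) trace.push_back((i % 512) * 64);

    for (CacheConfig cfg : { CacheConfig{"small", 256, 4, 100},
                             CacheConfig{"large", 1024, 4, 100} })
        std::cout << cfg.name << ": " << estimate_cycles(cfg, trace) << " cycles\n";
}
```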

If the benefits are so large, why do more people not do it? “I have been advocating shift left my entire career,” admits Ullah. “I am glad that the industry is moving towards that. What I did not envision was the cost of it. It is much higher than I expected it to be. There is a hidden cost that we never looked at. We get all the models, we get the hardware and software, but making it work together is a huge cost.”

Starr agrees. “Yes, it has been effective. Yes, it has been painful. There is a technical aspect to it, but the hardest thing is the cultural aspect. You are dragging teams that are used to doing stuff in a post-silicon environment and pulling them pre-silicon. Software teams always run at light speed, and it is hard to make them accept these slower domains. Even though verification teams think those domains are thousands of times faster, to those being pushed left they are incredibly slow.”

It all comes back to speed. “The challenge is always from people on the right, looking left,” says Mandal. “They see everything as being very slow. When we say shift left we actually mean stretch left, because what takes two hours to do in the environment they are used to could take two weeks in shift left. Yes, they will be doing it two or three quarters earlier, but it will take longer. That means more time needs to be allocated to accomplish things.”

There are also difficulties associated with putting the methodology in place. “The problem when you start pulling things left is that the cycle for those teams is out of sync,” says Starr. “The point where they are doing emulation work may be the point they want to be on real silicon, but you don’t want them to just drop everything and go work on the silicon. It is a resourcing problem.”

Ullah suggests another way to approach the problem. “Sometimes the only way to solve it, when starting a shift left strategy, is to put it all in one team. Instead of waiting for the bring-up team to learn to do things two quarters ahead of time with new tools, you build a team that makes that the primary focus. If you change the culture at the same time as you put in new tools and methodology, it is hard to succeed.”

There is a lot the industry can do to help. Verification tools, especially those targeting system-level tasks, have to think more about the software and the implications of integration. The Accellera Portable Stimulus Standard is an example where this is an issue today. The current draft does not include the proposed Hardware/Software Interface (HSI) layer. If the HSI layer does not make it into the first release, the industry will have no standard way to target integration testing. Some vendors promise to include this capability, but it will not be portable between them.

