The role of silicon in verification and validation is changing. While there is a strong desire to Shift Left, silicon itself has also become a valuable verification engine.
Not too long ago, the return of first silicon from the foundry was a nail-biting moment as power was applied to the chip. Today, better verification methodologies, increased use of emulation, and more mature fabrication practices have transformed how teams utilize first silicon. It is about to be transformed again, and there are some interesting possibilities on the horizon.
Much of what used to be considered post-silicon verification has undergone a Shift Left and is now a pre-silicon function. The bring-up of software often starts long before tape-out, on virtual models or on RTL loaded into emulators or FPGA prototypes. However, the state space of a modern chip is so large that it is impossible to fully verify the chip with these engines, and a significant verification challenge remains.
While verification checks whether the product meets the specification, validation determines whether the product is fit for purpose. An increasing number of designs may be chasing a moving target, which means that validation has to be continuously assessed throughout the life of the product.
“There is a big desire to do things as early as possible,” says Colin McKellar, vice president of verification platforms for Imagination Technologies. “Instead of getting the chip back and spending two months in the lab and then four months in software’s hands before the chip is ready to go, they are looking to launch the product very quickly and looking to go high volume as soon as possible.”
But the time spent in post-silicon is not going down. “Post-silicon verification and validation are increasingly important,” says Andy Gothard, director of marketing for UltraSoC. “An increasing amount of engineering cycles are being spent on post-silicon bring-up and systems integration. They are addressing rather subtle problems, which only show up post-silicon, such as, ‘Why is the CPU not delivering the expected performance?’ or, ‘Why do some DMA transfers take too long?’ Many of these problems don’t show up until the chip is integrated into the final system, and increasingly not until the product itself is actually field-deployed. That’s why field firmware and software updates are now a fact of life.”
Bridging pre- and post-silicon verification
With an increasing number of system-level verification tests being run pre-silicon, the industry has had to develop tools that enable the development of those tests. “Developing and debugging the testcases can be as complex as design, so you want to have them done in advance,” says Frank Schirrmeister, senior group director for product management and marketing at Cadence. “Then, when you go into post-silicon, you really can say that I have verified that these testbenches are testing something valid. You do not have to debug the testbench. You can focus on debugging the design.”
Knowing that you have a testcase that can run in both environments makes a lot of sense. “Customers have a real interest in exactly rerunning the pre-silicon tests in post-silicon, because the setup they use to test the silicon is mimicked virtually in the pre-silicon environment,” says Johannes Stahl, senior director of product marketing at Synopsys. “They can define the traffic shape, the traffic format, the way that the different Ethernet channels are being loaded with traffic. This is very valuable for shortening the time to get silicon running, and is another way that people are shortening post-silicon validation.”
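To make this concrete, here is a minimal sketch in C of what a shared traffic descriptor could look like. It is not drawn from Synopsys’ tooling or any specific product; the structure, field names and values are assumptions, intended only to show how a single scenario definition might drive both the pre-silicon testbench and the post-silicon test equipment.

/* Hypothetical traffic-profile descriptor shared between the pre-silicon
 * testbench and the post-silicon test setup. All names and values are
 * illustrative assumptions, not any vendor's actual format. */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef enum { TRAFFIC_CONSTANT, TRAFFIC_BURSTY, TRAFFIC_RANDOM } traffic_shape_t;

typedef struct {
    uint8_t         channel;      /* Ethernet channel index                 */
    traffic_shape_t shape;        /* how the offered load varies over time  */
    uint32_t        frame_bytes;  /* frame size in bytes                    */
    uint32_t        load_mbps;    /* sustained offered load in Mb/s         */
    uint32_t        burst_frames; /* frames per burst (bursty traffic only) */
} channel_profile_t;

/* One scenario: the same table can configure an emulation testbench or the
 * line-rate test equipment attached to first silicon. */
static const channel_profile_t scenario_42[] = {
    { 0, TRAFFIC_CONSTANT, 1518, 9000,  0 },
    { 1, TRAFFIC_BURSTY,     64, 2500, 32 },
    { 2, TRAFFIC_RANDOM,    512, 4000,  0 },
};

int main(void) {
    for (size_t i = 0; i < sizeof scenario_42 / sizeof scenario_42[0]; i++)
        printf("ch%u: shape=%d, %u-byte frames at %u Mb/s\n",
               (unsigned)scenario_42[i].channel, (int)scenario_42[i].shape,
               (unsigned)scenario_42[i].frame_bytes, (unsigned)scenario_42[i].load_mbps);
    return 0;
}

The value of keeping the descriptor in one place is that a scenario which fails on silicon can be fed back, unchanged, into the pre-silicon environment for debug.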
Post-silicon has become just another verification engine. “Every time you move to a different engine, there is the ability to re-use tests that were done before and it is probably wise to do so,” says Stephen Bailey, director of strategic marketing for Mentor, a Siemens Business. “You use them as a smoke test. But that is just to make sure you are ready to move on and start achieving new things on that new engine. The same is true when you look at silicon itself, being the last engine. There is going to be a set of tests that were used pre-silicon. You will have written your diagnostics and different types of things that you want to use, and you will verify all of that before you have silicon. That will be done in emulation or prototypes. Once all of those things are checked off, you will start putting as much real-world testing data through it as you can. It could be on the software side internally, or from external interfaces. It really depends upon the objective and the application market. You are continually trying to expose any problems. It is better that you find it than a customer. Then you may have a software fix for it.”
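As a simplified example of the kind of diagnostic that gets checked off in emulation or prototyping and then reused as a smoke test on first silicon, consider the bare-metal routine below. The register addresses and expected ID value are placeholders, not taken from any real chip.

/* Minimal bare-metal smoke test, compiled unchanged for emulation, FPGA
 * prototype and first silicon. Addresses and the expected ID value are
 * placeholders for illustration only. */
#include <stdint.h>

#define CHIP_ID_REG  ((volatile uint32_t *)0x40000000u)  /* hypothetical */
#define EXPECTED_ID  0x0A53C001u                         /* hypothetical */
#define SRAM_BASE    ((volatile uint32_t *)0x20000000u)  /* hypothetical */
#define SRAM_WORDS   256u

int smoke_test(void) {
    /* 1. Is the part alive, and is it the part we think it is? */
    if (*CHIP_ID_REG != EXPECTED_ID)
        return -1;

    /* 2. Walking pattern over a small SRAM window. */
    for (uint32_t i = 0; i < SRAM_WORDS; i++)
        SRAM_BASE[i] = 0xA5A50000u | i;
    for (uint32_t i = 0; i < SRAM_WORDS; i++)
        if (SRAM_BASE[i] != (0xA5A50000u | i))
            return -2;

    return 0;  /* ready to move on to real-world workloads */
}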
There is only so much verification that can be done pre-silicon. “Pre-silicon execution will always be orders of magnitude slower than the real silicon, so you cannot run as many cycles, nor will you be able to see all of the real test cases that happen when you do validation in a system context,” says Synopsys’ Stahl. “You cannot remove that risk prior to silicon. In pre-silicon you may run 10 scenarios, and those are re-run post-silicon and they all work. When you expand the scenarios, you may find one that does not work. You have to program the pre-silicon environment to run that scenario, and that gives you more eyes because you have full visibility into the silicon. So the connection between post- and pre-silicon debug capabilities is there. It has always been there. When we have a test capability that is completely mirrored, it becomes even stronger.”
Dealing with bugs
Some bugs manage to escape. “Once a functional bug is caught, the same behavior has to be reproduced in an RTL simulation environment for proper bug exploration, debug and resolution,” says Bipul Talukdar, director of applications engineering in North America for SmartDV. “For this, simulation, emulation and formal verification tools are applied in the validation process. A standardized, application-oriented post-silicon validation methodology is required to be successful.”
Reproducing the bug can be challenging. “There is an element of luck in there because you have to hope you can wiggle things in exactly the same combination as the bits in the silicon, which is not always easy,” says Imagination’s McKellar. “This becomes increasingly difficult as complexity goes up. If you cannot recreate the silicon issues in an easily debuggable way, it can get very painful very quickly. We can do scan dumps and we can read those back into emulators, then push things through and hope that we can use the scan output to get a starting point for a last known good state. But that can be very challenging and time-consuming.”
Some companies solve this problem on their lab boards. “The emulator has an application that is called deterministic ICE,” says Mentor’s Bailey. “Even if you are taking in ICE input, it can capture that and replay it in a virtual use mode. That enables you to debug there. Here, you are taking the same concept but starting with the actual silicon. Now you want to go back and replay it in a prototype or emulation, where you can debug it more easily. A lab board could provide a sniffer that can capture what is going into the chip. I can see that many would use an external sniffer recorder system because the amount of data coming in is high, so the chip will not have the memory to capture it all. Within the chip, there could be instrumentation to help record specific sequences about what is happening inside the chip itself. It was in this mode, and then this happened, followed by this. That could be important for recreating bugs.”
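One way such on-chip instrumentation can be exposed to firmware is a simple event ring buffer, sketched below. The event codes, buffer depth and timestamp source are assumptions; the point is only to show how a “this happened, then this” sequence can be recorded cheaply and dumped after a failure so the run can be replayed in emulation.

/* Sketch of firmware-side event recording into a small ring buffer so the
 * sequence leading up to a failure can be dumped and replayed pre-silicon.
 * Event codes, buffer depth and the timestamp source are assumptions. */
#include <stdint.h>

#define TRACE_DEPTH 1024u  /* power of two, so the index mask works */

typedef struct {
    uint32_t timestamp;  /* e.g. a free-running cycle or tick counter  */
    uint16_t event_id;   /* "entered mode X", "DMA started", and so on */
    uint16_t data;       /* small payload: channel number, error code  */
} trace_entry_t;

static trace_entry_t trace_buf[TRACE_DEPTH];
static uint32_t      trace_head;

/* Called at interesting points in the firmware; cheap enough to leave in
 * production builds. */
void trace_event(uint32_t now, uint16_t event_id, uint16_t data) {
    trace_entry_t *e = &trace_buf[trace_head++ & (TRACE_DEPTH - 1u)];
    e->timestamp = now;
    e->event_id  = event_id;
    e->data      = data;
}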
Some IP providers embed capabilities to help analyze problems. “Silicon debug time for hardware and software issues can be quite large, so there is a focus on improving the visibility of SoCs,” according to an Arm spokesperson. “By using specialized circuitry, issues can be isolated to select IPs and coverage can be better analyzed. Embedded logic analyzer circuitry can be connected on-chip to IP signals or interfaces to provide internal visibility by analyzing sequences and selectively tracing IP signals, thereby reducing debug time and improving coverage analysis.”
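From the software side, driving such embedded logic analyzer circuitry typically comes down to a handful of memory-mapped registers. The register map below is invented for illustration and does not describe Arm’s (or anyone else’s) actual blocks; it only shows the general shape of selecting signals, setting a trigger and arming the capture.

/* Hypothetical driver view of an embedded logic analyzer: select a signal
 * group, set a trigger pattern, arm the capture. The register map is
 * invented for illustration. */
#include <stdint.h>

#define ELA_BASE        0x50010000u                      /* hypothetical */
#define ELA_REG(off)    (*(volatile uint32_t *)(ELA_BASE + (off)))
#define ELA_CTRL        ELA_REG(0x00)  /* bit 0: enable, bit 1: arm        */
#define ELA_SIG_SEL     ELA_REG(0x04)  /* which IP signal group to tap     */
#define ELA_TRIG_VALUE  ELA_REG(0x08)  /* trigger pattern                  */
#define ELA_TRIG_MASK   ELA_REG(0x0C)  /* which bits of the pattern matter */

void ela_arm(uint32_t signal_group, uint32_t pattern, uint32_t mask) {
    ELA_SIG_SEL    = signal_group;  /* e.g. a DMA engine's request/grant bus */
    ELA_TRIG_VALUE = pattern;
    ELA_TRIG_MASK  = mask;
    ELA_CTRL       = 0x3u;          /* enable capture and arm the trigger    */
}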
What people are looking at has changed from simple visibility to understanding what is happening at the system level. “Bring-up, post-silicon debug, customer engineering and in-field performance optimization have all required the talents of engineering staff who have an intimate understanding of the chip, the surrounding system and the software running on it,” says UltraSoC’s Gothard. “Hardware monitors have to be entirely non-intrusive because you’re looking at real-world behavior. They must be smart, so that engineers can home in on a particular behavior of interest, and dramatically reduce the amount of data they need to parse and assimilate. Automated tools can then spot patterns and anomalies that would be beyond human capabilities to detect.”
“It is not a question of if you put this in or not – it is a necessity,” says Stahl. “It costs a little bit but without it you have very little chance to figure out what may be going wrong when the silicon doesn’t work.”
Sometimes, the bug is not functional. “A datacenter company told us that the capabilities they inserted into the chip added a couple of percent of overhead,” says Cadence’s Schirrmeister. “If they see an unexpected performance drop, they are able to switch it into debug mode and can figure out and isolate the issue – typically at the hardware/software interface. They are willing to take the hit in area to be able to keep the debug logic in so they can look at the performance issues in the real silicon.”
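A sketch of what that can look like from software: read always-on counters around the operation in question and use the ratio to decide whether the problem is in the engine or in the fabric. The counter addresses and the dma_submit()/dma_wait() calls below are assumptions, not the customer’s actual interface.

/* Sketch of using always-on debug counters to localize a performance drop
 * at the hardware/software interface. Counter addresses and the DMA calls
 * are assumptions for illustration. */
#include <stdint.h>
#include <stdio.h>

#define PERF_CYCLES     (*(volatile uint32_t *)0x50020000u)  /* hypothetical */
#define PERF_BUS_STALLS (*(volatile uint32_t *)0x50020004u)  /* hypothetical */

extern void dma_submit(uint32_t channel, uint32_t bytes);  /* assumed platform API */
extern void dma_wait(uint32_t channel);                    /* assumed platform API */

void profile_dma(uint32_t channel, uint32_t bytes) {
    uint32_t c0 = PERF_CYCLES;
    uint32_t s0 = PERF_BUS_STALLS;

    dma_submit(channel, bytes);
    dma_wait(channel);

    uint32_t cycles = PERF_CYCLES - c0;
    uint32_t stalls = PERF_BUS_STALLS - s0;

    /* A high stall-to-cycle ratio points at fabric contention rather than
     * the DMA engine itself, a distinction that is hard to make without
     * on-chip counters once the part is in a real system. */
    printf("DMA ch%u: %u bytes in %u cycles, %u bus-stall cycles\n",
           (unsigned)channel, (unsigned)bytes, (unsigned)cycles, (unsigned)stalls);
}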
Bridging products
Monitoring existing products can be used to help define the next generation. “There are so many processors and so many things that need to be optimized, that the importance of post-silicon analysis and monitors is increasing,” says Stahl. “In the post-silicon phase, you can do more and collect more data. That may allow you to get more information that can be utilized in the next generation architecture.”
A multi-generational product approach can be used. “Tapeout is not determined by the chip being perfect, or done with verification,” says Schirrmeister. “It is driven by needing to meet ‘this shuttle’ or to meet ‘this deadline.’ ‘I need something now, and I have to have enough confidence that I have validated enough to get something reasonable.’ For the next generation, you use traces created from the previous chips and from virtual platforms to essentially define the environmental characteristics. You really plan across multiple generations of chips, especially at bigger companies. This is the notion of validation and optimization of performance. These are hard-to-find issues in modern design.”
Conclusion
While part of the task of post-silicon verification has been pushed into the pre-silicon phase, chip complexity also has made post-silicon verification and validation more important than ever. The complex interaction of events within a chip means that products have to be tuned for the traffic patterns and scenarios they see during deployment, and that information needs to be captured and analyzed.
New verification technologies like Portable Stimulus are making it easier to bridge the gap between the two phases. Testbenches can now be developed and debugged pre-silicon and used to create huge numbers of scenarios that are then executed post-silicon. That common environment makes bringing issues back into the simulation/emulation world much easier.