The importance of moving to a high-level synthesis flow.
Every design team is looking to reduce RTL verification time in order to meet aggressive schedules. Successful teams have moved their level of design abstraction up to the C++ or SystemC level and employ High-Level Synthesis (HLS) within their design flow. By taking advantage of this high-level description, these teams also plug into integrated C and RTL verification flows, reusing tests and testbenches, and automating the test environment.
Just moving to an HLS methodology simplifies code (permitting quick what-if architectural analysis), allows for running more tests, and reduces simulation time dramatically. However, these time savings can be eaten up by the time spent trying to close RTL verification. More techniques are needed to attack as many inefficiencies as possible, including:
Applying these techniques to the RTL verification closure problem provides a major step toward C++/SystemC signoff, which allows teams pressured to deliver increased functionality with fewer resources to achieve more productivity.
The first step to saving verification time is to check the C++/SystemC HLS source code for common mistakes and to formally prove user-defined assertions. Some code bugs are impossible to catch with a testbench, such as out-of-bounds array access and reading from uninitialized variables. Other common checks include divide-by-zero, incomplete switch or case statements, and illegal shifts. Without a testbench, formal-based property checking can find bugs in the code before synthesis and simulation, making the errors easier to detect and fix early in the design process.
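The kinds of mistakes listed above can be illustrated with a minimal C++ sketch. The function names, array size, and `Mode` enum are hypothetical and not taken from any real design; the user-defined assertions are ordinary `assert` calls of the sort a property checker could attempt to prove, assuming the tool accepts them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical example, not from a real design.
constexpr int kTaps = 8;

// Returns (coeffs[idx] >> shift) / div, with each common hazard guarded
// by a user-defined assertion.
uint32_t scaled_coeff(const uint32_t (&coeffs)[kTaps], int idx, int shift,
                      uint32_t div) {
    assert(idx >= 0 && idx < kTaps);   // guards out-of-bounds array access
    assert(shift >= 0 && shift < 32);  // guards an illegal shift amount
    assert(div != 0);                  // guards divide-by-zero
    uint32_t result = 0;               // initialized before any read
    result = coeffs[idx] >> shift;
    return result / div;
}

enum class Mode { Bypass, Half, Double };

// Every enumerator has a case, so the switch is complete.
int32_t apply_mode(Mode m, int32_t x) {
    switch (m) {
        case Mode::Bypass: return x;
        case Mode::Half:   return x / 2;
        case Mode::Double: return x * 2;
    }
    assert(false && "unhandled Mode");  // reached only if a case is missing
    return 0;
}
```

A testbench exercises these assertions only for the inputs it happens to drive; a formal check tries to show that they hold for all inputs.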
After the checks have passed, the team can confidently generate the RTL with the assertions and coverpoints using an HLS tool. The tool synthesizes the C++/SystemC assertions into OVL or PSL assertions embedded in the generated RTL. This creates optimized code for functional coverage analysis.
The HLS tool builds an RTL functional test environment that reuses the original C++/SystemC testbench to simulate the RTL design and to compare its results to the HLS model. This allows the functional verification to remain primarily at a high level for faster development and simulation, while confidently proving that the RTL results are equal. The tool automatically creates transactors that convert function calls into pin-level signal activity.
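To make the transactor idea concrete, here is a minimal sketch in plain C++. It assumes a hypothetical one-byte-per-cycle interface with a `valid` flag; real tool-generated transactors follow the design's actual interface protocol, and none of these names come from an HLS tool.

```cpp
#include <cstdint>
#include <vector>

// One clock cycle of pin-level activity on a hypothetical interface.
struct PinCycle {
    bool valid;
    uint8_t data;
};

// Input transactor: expands the arguments of one function call into
// per-cycle pin values, one valid cycle per byte, then one idle cycle.
std::vector<PinCycle> to_pin_activity(const std::vector<uint8_t>& args) {
    std::vector<PinCycle> cycles;
    for (uint8_t a : args) {
        cycles.push_back({true, a});
    }
    cycles.push_back({false, 0});
    return cycles;
}

// Output transactor: collects the data from valid cycles back into values
// that can be compared against the high-level model's results.
std::vector<uint8_t> from_pin_activity(const std::vector<PinCycle>& cycles) {
    std::vector<uint8_t> values;
    for (const PinCycle& c : cycles) {
        if (c.valid) {
            values.push_back(c.data);
        }
    }
    return values;
}
```

Because the conversion happens at the boundary, the original testbench can keep calling functions while the RTL sees only signal activity.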
The RTL simulation can also be used to measure code coverage on the synthesized RTL. To close RTL coverage in an HLS flow, a tool that can read the RTL and formally prove which code is reachable and which is unreachable is even more beneficial. Mentor Graphics' Questa CoverCheck automatically generates excludes for unreachable code and generates witness waveforms for the reachable code, which can then be either automatically excluded or cross-probed back to the original HLS C++/SystemC source, allowing the team to add tests. Using this flow, the team can achieve RTL functional and code coverage closure automatically for the generated RTL.
By adopting a high-level synthesis flow, NVIDIA was able to achieve impressive improvements for their 10-million-gate video encoder/decoder. In particular, the team:
NVIDIA was able to cut the development schedule of the encoder/decoder by five months. Then, the team needed to upgrade two 8-bit video decoders to 4K 10-bit color in order to support their customers. Using the HLS flow, they successfully delivered their IP in weeks. Without HLS, they would have had to cancel these designs due to schedules that would have been impossible to meet with an RTL flow. The success of these projects removed any skepticism about HLS within NVIDIA and led to its use in all future NVIDIA video and imaging designs that include new or re-designed components or that target different standards or process technologies.
NVIDIA noted another significant advantage of moving to an HLS flow: it establishes a continuous process of refinement that leads to better end results. This process is called micro-architectural exploration and it supports performing what-if analysis by changing particular design parameters and running a quick synthesis to immediately see how these changes impact the design in terms of area, performance, and power tradeoffs. Micro-architectural exploration opens up a design to improvements that could not have even been considered in a traditional RTL flow.
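As a rough illustration of what "changing particular design parameters" can look like in source code, the sketch below exposes a parallelism factor as a template parameter. Everything here is an assumption for illustration: the `MacArray` name, the `PAR` parameter, and especially the `multipliers` and `cycles` constants, which are crude proxies and not figures any synthesis tool would report.

```cpp
#include <cstdint>

// Hypothetical multiply-accumulate block. N is the vector length and PAR is
// the number of multiplications assumed to run in parallel per cycle.
template <int N, int PAR>
struct MacArray {
    static_assert(PAR > 0 && N % PAR == 0, "PAR must evenly divide N");

    static constexpr int multipliers = PAR;  // crude area proxy (assumption)
    static constexpr int cycles = N / PAR;   // crude latency proxy (assumption)

    static int32_t run(const int16_t (&a)[N], const int16_t (&b)[N]) {
        int32_t acc = 0;
        for (int i = 0; i < N; i += PAR) {   // one outer iteration per "cycle"
            for (int p = 0; p < PAR; ++p) {  // PAR multipliers side by side
                acc += static_cast<int32_t>(a[i + p]) * b[i + p];
            }
        }
        return acc;
    }
};
```

Instantiating `MacArray<8, 1>` and `MacArray<8, 4>` gives the same functional result with different proxy costs, which is the kind of tradeoff a quick synthesis run would quantify properly.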
A point of emphasis in the NVIDIA verification flow was to compare two different versions of the C model. One model was a high-level, purely functional C model that the team considered the “golden” model. Using the continuous refinement process, they created the optimized HLS C model. They ran these two models next to each other using the same test sequences to ensure that they yielded the same results. Because C models are much simpler than Verilog models, verification engineers could run 1000X more tests on the C code compared to what they could run on the RTL code. This provided better coverage and allowed bugs to be fixed early on when they are easier to resolve. It also produced bug-free RTL and significantly reduced the time and effort required for RTL verification.
Standards are key to enabling HLS and to the goal of moving up in abstraction to C++/SystemC. Accellera recently announced the SystemC Synthesizable Subset, which is a big step toward this goal. This standard documents SystemC constructs that a team should use to create their design in order for tools to synthesize the code correctly for RTL generation. By following this standard, teams will have confidence that their code will synthesize and simulate without violations, saving debug time.
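The side-by-side comparison described above can be sketched as a small harness. The two models here are hypothetical stand-ins (a four-sample average written two ways), not NVIDIA's code; the point is only that both receive the same pseudo-random test sequence and any disagreement is counted.

```cpp
#include <cstdint>
#include <random>

// "Golden" model: written for clarity, using a plain division.
uint32_t golden_avg4(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return (static_cast<uint32_t>(a) + b + c + d) / 4;
}

// "Optimized" model: the same function restructured with a shift,
// standing in for a hardware-oriented rewrite.
uint32_t optimized_avg4(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return (static_cast<uint32_t>(a) + b + c + d) >> 2;
}

// Drives both models with the same seeded test sequence and returns the
// number of tests on which their outputs differ.
int count_mismatches(int num_tests, uint32_t seed) {
    std::mt19937 rng(seed);
    int mismatches = 0;
    for (int i = 0; i < num_tests; ++i) {
        const uint8_t a = static_cast<uint8_t>(rng() & 0xFF);
        const uint8_t b = static_cast<uint8_t>(rng() & 0xFF);
        const uint8_t c = static_cast<uint8_t>(rng() & 0xFF);
        const uint8_t d = static_cast<uint8_t>(rng() & 0xFF);
        if (golden_avg4(a, b, c, d) != optimized_avg4(a, b, c, d)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

A fixed seed keeps any failing sequence reproducible, so a mismatch can be replayed against both models while debugging.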
To further accelerate the standardization process, Mentor Graphics will donate the open-source Algorithmic C (AC) Datatypes library to the Accellera Systems Initiative. AC Datatypes provide an easy way to model static bit-precision with minimal runtime overhead. The datatypes were developed to provide a way to write bit-accurate algorithms that can be synthesized into hardware, and they enable precise modeling of bit-true behavior in high-level design descriptions while accelerating simulation by up to 100X versus other datatypes. For example, ac_int of bit widths in the range 1 to 32 can run 100X faster than the corresponding sc_bigint/sc_biguint datatypes and 3X faster than the corresponding sc_int/sc_uint datatypes.
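To show what "static bit-precision" means in practice, the sketch below defines a hypothetical `UIntN<W>` template whose width is fixed at compile time. It is a stand-in written for illustration only: it does not use or reproduce the AC Datatypes API, and unlike a real bit-accurate library it simply wraps every result back to `W` bits.

```cpp
#include <cstdint>

// Hypothetical W-bit unsigned integer (1 <= W <= 32); not the AC library.
template <int W>
class UIntN {
    static_assert(W >= 1 && W <= 32, "width must be between 1 and 32 bits");
    // Mask with the low W bits set; the shift amount stays in 0..31.
    static constexpr uint32_t kMask = 0xFFFFFFFFu >> (32 - W);
    uint32_t v_;

public:
    // Values wider than W bits are truncated on construction.
    constexpr UIntN(uint32_t v = 0) : v_(v & kMask) {}

    constexpr uint32_t value() const { return v_; }

    // Addition wraps modulo 2^W in this sketch.
    friend constexpr UIntN operator+(UIntN a, UIntN b) {
        return UIntN(a.v_ + b.v_);
    }
};
```

With this sketch, `UIntN<4>(15) + UIntN<4>(1)` wraps to 0, mirroring how a 4-bit hardware register would behave, while a native `int` would silently hold 16.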