Why a combination of verification engines is required to successfully develop a new class of chips.
I have previously written about the choices design teams face when selecting specific verification engines: virtual, formal, simulation, emulation, FPGA and actual silicon. Now a new class of SoC is emerging for machine learning and artificial intelligence, with complexities previously unheard of, and it deepens the challenge of picking the right tool for the job. The choice between the different hardware-based verification engines becomes even more intricate, and it becomes clear that only a smart combination of engines will make verification teams successful.
It turns out I seem to have been writing about this for years. In “The Agony of Choice” back in 2012, I essentially argued that a combination of engines in what we now call the “Verification Suite” is necessary. In “The Agony of Hardware-Assisted Development Choices” in 2014, I mused about how to choose the best type of acceleration. Now, look where we are today, with chips like the Nvidia Tesla V100. It features 7.8 teraflops double-precision, 15.7 teraflops single-precision and 125 teraflops deep learning, with 300GB/s interconnect and 900GB/s memory bandwidth. That’s 21.1B transistors on an 815mm² die. And others, like Google with its 180-teraflops Cloud TPU, are joining the party.
So how does one verify chips like this?
With great difficulty! But EDA is here to help, especially when the different engines are used in combination. The basic requirements that I outlined in previous blogs and articles are still valid: users weigh the cost of each engine against its performance, capacity, bring-up time, accuracy and debug capabilities for hardware and software. Secondary needs, such as fine-grained execution control, system connections, and advanced use models for power and performance optimization, are expected to be supported as well; not every engine supports them equally, so different use models map to the engine best suited for the task at hand. For instance, pure software development requires speed, and FPGA-based approaches are often best for that. Detailed hardware/software debug requires accuracy and debug versatility; processor-based approaches for simulation and emulation are superior in that domain.
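To make that trade-off concrete, here is a minimal sketch of how a team might tabulate such a comparison; the engines, criteria weights and scores are hypothetical placeholders I chose for illustration, not measured data.

```python
# Illustrative only: a toy weighted-scoring matrix for comparing verification
# engines against the criteria above. Weights and scores are hypothetical.

CRITERIA_WEIGHTS = {
    "performance": 0.25,
    "capacity": 0.20,
    "bring_up_time": 0.15,
    "accuracy": 0.20,
    "debug": 0.20,
}

# Scores on a 1-5 scale, chosen purely for illustration.
ENGINE_SCORES = {
    "simulation":          {"performance": 1, "capacity": 2, "bring_up_time": 5, "accuracy": 5, "debug": 5},
    "processor_emulation": {"performance": 3, "capacity": 5, "bring_up_time": 4, "accuracy": 4, "debug": 5},
    "fpga_prototyping":    {"performance": 5, "capacity": 3, "bring_up_time": 2, "accuracy": 3, "debug": 2},
}

def weighted_score(scores: dict) -> float:
    """Combine per-criterion scores using the weights above."""
    return sum(CRITERIA_WEIGHTS[c] * s for c, s in scores.items())

if __name__ == "__main__":
    for engine, scores in ENGINE_SCORES.items():
        print(f"{engine:22s} {weighted_score(scores):.2f}")
```

Cost is left out of the sketch; in practice the weighted score would be judged against each engine’s price. The point is simply that a software bring-up project would raise the performance weight and land on FPGA-based prototyping, while a hardware/software debug project would raise the accuracy and debug weights and land on simulation or processor-based emulation.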
The new class of SoCs described above pushes the boundaries even further, into the previously unseen territory of monolithic multi-billion-gate designs.
Emulation has long been pushing the envelope on capacity: our latest system, Palladium Z1, extends architecturally to 9.2BG, and other industry players have announced a roadmap that they claim will get them to 15BG in 2020. Two drivers push capacity that far: the trend towards emulation farms handling lots of verification jobs, and the need to handle very big designs that span multiple billion gates. Regarding the former, we actually started calling emulation “Verification Compute Platforms” back in 2010, and we now have several installations that monolithically extend to 4.6BG. Regarding the latter, customers confirm that processor-based emulation is uniquely positioned to serve very, very big designs, as Fujitsu did in late 2016 in “Fujitsu Adopts Cadence Palladium Z1 Enterprise Emulation Platform for Post-K Supercomputer Development.” In that press release, Akira Kabemoto, corporate executive officer and senior executive vice president, says, “In order to build a high-performance, scalable supercomputer system, we needed a solution that could accommodate designs larger than one billion gates, and the Palladium Z1 enterprise emulation platform met all of our complex requirements.”
So what makes processor-based emulation unique? Its emulation throughput.
In “Towards A Metric To Measure Verification Computing Efficiency” back in 2015, I outlined the need to optimize the combination of compile, allocation, execution and debug. When we announced the Palladium Z1 Enterprise Emulation Platform later that year, that combined throughput became the main metric by which we measured improvements.
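As a rough sketch of that idea (my paraphrase, not the metric’s official definition), overall verification throughput can be viewed as verified cycles delivered per unit of wall-clock time spent across all four phases. The numbers below are invented for illustration.

```python
# A minimal sketch of an end-to-end throughput metric, assuming throughput is
# measured as verification cycles delivered per hour of total wall-clock time
# spent on compile, allocation wait, execution and debug.

def verification_throughput(cycles_executed: float,
                            compile_hours: float,
                            allocation_wait_hours: float,
                            execution_hours: float,
                            debug_hours: float) -> float:
    """Cycles per hour across compile, allocation, execution and debug."""
    total_hours = compile_hours + allocation_wait_hours + execution_hours + debug_hours
    return cycles_executed / total_hours

# Example: roughly 10 hours of execution at about 1 MHz emulation speed
# (~3.6e10 cycles), plus time spent compiling, queuing and debugging.
print(verification_throughput(3.6e10,
                              compile_hours=6,
                              allocation_wait_hours=2,
                              execution_hours=10,
                              debug_hours=6))
```

Folding all four phases into one number makes the point: an engine that executes fast but compiles slowly, or sits in a queue waiting for capacity, still delivers poor overall throughput.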
Many considerations impact the choice of engines for hardware-accelerated development. Charting performance against capacity, processor-based emulation is the only practical solution for the very big, multi-billion-gate designs required to enable AI and machine learning. Even in the smaller capacity domains, 300MG and below, emulation’s full-speed debug differentiates it from FPGA-based emulation, not to mention its ability to handle granularity as small as 4MG and to fully utilize available resources, which I will leave for another blog.
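To make the capacity argument concrete, here is a minimal decision sketch using the rough gate-count points mentioned in this post; the thresholds and the helper function are illustrative assumptions, not product specifications.

```python
# Illustrative mapping from design size (in million gates) to a practical
# hardware-assisted engine choice, using the rough capacity points discussed
# in this post. Real decisions also weigh debug needs, speed and cost.

def suggest_engine(design_mgates: float, need_full_speed_debug: bool = True) -> str:
    if design_mgates > 1000:          # multi-billion-gate AI/ML-class SoCs
        return "processor-based emulation"
    if design_mgates <= 300 and not need_full_speed_debug:
        return "FPGA-based prototyping (software bring-up speed)"
    return "processor-based emulation (full-speed debug, fine granularity)"

print(suggest_engine(4600))                                 # e.g., a 4.6BG design
print(suggest_engine(250, need_full_speed_debug=False))     # e.g., software bring-up
```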
So the bottom line is that both FPGA-based prototyping (not emulation; I said prototyping) and processor-based emulation are needed. I argued that in “Balancing Emulation And FPGA-Based Prototyping For Software Development,” citing several customers (like Microsemi) who do it elegantly. A congruent flow between the two gives you the best of both worlds and relieves users of the need to compromise on the in-between approach of FPGA-based emulation.