Containing The Explosion In Data

Designs are getting bigger, verification runs longer, and every stage of development and deployment provides valuable data — if you can find it.

The amount of data that could be kept for every design is gargantuan, and it continues to grow as lifecycle management, continuous verification, regulatory requirements, and globalization add to the data that needs to be stored.

But data has no value if it cannot be found or used in ways that provide more benefit than the cost of storing it. “Data management is not unique to IC development and verification,” says Matt Graham, product engineering director at Cadence. “Our industry is probably paying closer attention to the data because, unlike so many other big software projects, we have to have first-pass success. The expectation is that we should be able to, and we need to be able to, leverage all of the additional data that we now have the ability to collect and store. And not just to have it, but actually to improve the overall process.”

Where is the data stored? “The bigger it gets, the more pressure to go to cloud computing,” says Simon Rance, head of marketing at Cliosoft. “Then you can quickly scale. But that comes at a price. Until a couple of years ago, there were security concerns. That problem has gone away because it’s been shown those types of cloud companies have much better data protection than most on-premise solutions. Verification is going into the cloud because of the sheer amount of it and the processing needed for it. We are waiting to see more upfront digital and analog design go to the cloud. There’s a lot more manufacturing and test data, and all the data that goes from end to end. Does that have to sit in the cloud, or is there a way of collecting and maintaining all of that data on-premise?”

Verification is certainly causing a data explosion. “With emulation and prototyping we now can gather more verification data and do more exploration of broader and more complex use cases, something that truly resembles the usage of the device,” says Cadence’s Graham. “We can start to do things that look at hardware-software interactions. But data management isn’t just about data collection and keeping an audit trail. You need to be able to leverage that to meet the demands of our customers in terms of their time to market or other efficiencies.”

The data that needs to be kept is changing, and so is the value of that data. “Today, it’s all about IP blocks, and how those IP blocks are assembled,” says Dean Drako, CEO of IC Manage. “Design assembly has become a new skill set, a new set of tools, a new set of challenges. I may have three or four IP blocks that were designed to hold things together, but I’m re-using 400 blocks that were used in a previous design. It’s all about this giant assembly process of IP that already exists, some of which is new and improved.”

Multiple levels of data
When considering blocks of IP, there are two sets of data that are important. First is the data associated with the development and verification of an IP. The second involves how and where that IP is being used. In many companies there is a large repository for IP blocks, a number of which may have several versions, each modified from an original to optimize it for a particular use case or integration requirements.

The first challenge is finding the right block. “They are effectively asking for a search engine on their IP blocks so they can find the one they’re interested in,” says IC Manage’s Drako. “They need to know what version of it is available, which version of it is tested, what is the status of the 17 versions of this IP block, and which process technologies is it suitable for. And then, as appropriate, it can be pulled into the design they are assembling to basically build what they want.”

Some blocks require more work than others. “A piece of IP sitting on the shelf may have worked in silicon three times. If we are not changing it, we may just do a very cursory check on it and put it in our device,” says Graham. “We’re not going to fully re-verify it. But when we do need to fully verify it, we need to bring it off the shelf, re-achieve whatever results we had before, and use that as our baseline for moving forward. So what is the minimum set of data that we need to be able to archive something?”

That goes way beyond just the Verilog and some testbenches. “Design teams are adding notes, where they are basically sharing some of the gotchas they saw during its previous usage,” says Cliosoft’s Rance. “Those are dos and don’ts for the next go-around. They may have figured out why something didn’t work, why they had problems integrating it with other pieces as originally planned, and that is being captured as digital notes in the form of metadata. This is analysis during the current design that will be useful for the next design, or even future designs.”

What metadata to keep is important. “We attach a lot of parameters or metadata to each IP block — when it was used, what its performance is, how well tested it is, how many designs it has been used in, which engineers used it,” says Drako. “Then you can filter and search for the things that match the requirements.”
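As a rough, hypothetical illustration of that kind of metadata-driven search, the sketch below (in Python, with invented field names such as process_node and silicon_proven) filters an IP catalog against a set of requirements:

# Minimal sketch of metadata-driven IP search. The catalog structure and
# field names (process_node, silicon_proven, etc.) are illustrative only.

ip_catalog = [
    {"name": "eth_mac", "version": "2.1", "process_node": "7nm",
     "silicon_proven": True, "designs_used_in": 14, "max_freq_mhz": 1200},
    {"name": "eth_mac", "version": "3.0", "process_node": "5nm",
     "silicon_proven": False, "designs_used_in": 1, "max_freq_mhz": 1500},
    {"name": "alu_32", "version": "1.4", "process_node": "7nm",
     "silicon_proven": True, "designs_used_in": 40, "max_freq_mhz": 2000},
]

def find_ip(catalog, **requirements):
    """Return IP records whose metadata matches every requested field."""
    return [ip for ip in catalog
            if all(ip.get(k) == v for k, v in requirements.items())]

# "Which silicon-proven blocks do we have for the 7nm flow?"
for ip in find_ip(ip_catalog, process_node="7nm", silicon_proven=True):
    print(ip["name"], ip["version"], "used in", ip["designs_used_in"], "designs")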

Context is important. “When a block of IP is selected, the design team may not exist anymore,” adds Rance. “But the metadata should allow them to pull up that previous chip design so you can see where it lives in the hierarchy, what it was associated with, what were all of the files that are associated with it, and all of the other pertinent data. That’s the value-added data that we’re adding to the actual design or real data.”

Store or regenerate
One of the evolving questions is whether data should be stored, or if it’s enough just to be able to regenerate the data when needed. “Because of the sheer volume, and the cost of resources such as emulators and the power they consume, we don’t have the luxury of regenerating those results all the time,” says Graham. “What amount of code coverage data do we need to store? Can it be an aggregated, summarized type of result, or do we need certain things for each individual simulation that was executed? What amount of detail of the failures? Can we just say how many there were and in what category, or do we need to store deeper information? Is it enough to have the data to be able to recreate a given failure, or do we need even deeper information? Do we need to keep wave traces and things like that? We’ve almost gotten away from this idea of recreating those results. We don’t have the time, it is a luxury to recreate those results. We need to be able to retrieve whatever data it is and be able to do some meaningful analysis with that, but without having to store everything. Storing everything we captured on the initial run, ad infinitum, is not an option.”
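One way to frame those questions is as a choice of summarization level. The following sketch, using made-up failure records, collapses per-simulation failures into counts per category while retaining just enough detail (test name and seed) to recreate a failing run:

# Hypothetical per-simulation failure records; only the seed and test name
# needed to recreate a run are retained alongside aggregate counts.
from collections import Counter

failures = [
    {"test": "dma_burst", "seed": 91, "category": "protocol_checker"},
    {"test": "dma_burst", "seed": 17, "category": "protocol_checker"},
    {"test": "pcie_link", "seed": 5,  "category": "timeout"},
]

summary = {
    "total_failures": len(failures),
    "by_category": dict(Counter(f["category"] for f in failures)),
    # Enough to regenerate any failing run, without storing wave traces.
    "repro": [(f["test"], f["seed"]) for f in failures],
}
print(summary)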

So many questions, and so little industry consensus. “Many people are undecided about what needs to be kept and what doesn’t,” says Rance. “It needs a solution, and nobody seems to have found that magic solution yet. There is a ridiculous amount of verification data. Customer A will say we can’t throw data away because if something goes wrong, we may need that data to figure it out and analyze why. The ability to regenerate that data is a primary driving consideration, but some want instant access to that data when something goes wrong. It’s not just the regenerating, it is also for root cause analysis.”

“It is just not possible to keep all the verification data,” says Drako. “Very rarely will you not have the ability to recreate it. That’s generally how it’s been done for the last 20 years. There are probably some folks talking about keeping more of it, probably not all of it, but keeping more of it. We do metadata creation from verification results. And then we store that metadata in databases to view trends, and to view status on convergence towards design completion, and do statistical data analysis.”
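A minimal sketch of that kind of flow, with an SQLite database and invented table and column names chosen purely for illustration, might reduce each regression to a few metadata columns and query the convergence trend later:

# Sketch: store per-regression metadata in SQLite and query the trend.
# Table and column names are illustrative, not from any specific tool.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE regressions (
    run_date TEXT, tests_run INTEGER, tests_passed INTEGER, coverage_pct REAL)""")
conn.executemany(
    "INSERT INTO regressions VALUES (?, ?, ?, ?)",
    [("2024-01-08", 5000, 4600, 71.2),
     ("2024-01-15", 5200, 4990, 78.5),
     ("2024-01-22", 5200, 5150, 84.1)])

# Convergence trend toward design completion: pass rate per weekly run.
for date, rate, cov in conn.execute(
        "SELECT run_date, 100.0 * tests_passed / tests_run, coverage_pct "
        "FROM regressions ORDER BY run_date"):
    print(f"{date}: {rate:.1f}% passing, {cov:.1f}% coverage")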

Metadata
Metadata has become a crucial aspect to stored data. “It is important to track associations,” says Rance. “What is that piece of IP associated with — not just from how it is logically connected, but where it is hierarchically in the system? It is a spider’s web, connecting everything to that piece of data so that we know not only where it is, or what it is associated with, but what its version number is and where it is hierarchically. That allows you from a search function to pull this up, and you can visualize it graphically.”
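That spider’s web of associations can be pictured as a small graph keyed by IP instance. The sketch below uses invented block names and fields to show how a search could walk from a block up through its parent hierarchy and list its associated files:

# Sketch: association metadata as a simple graph. Names and fields are made up.
associations = {
    "usb_phy@1.3": {
        "parent": "io_subsystem@2.0",
        "files": ["usb_phy.v", "usb_phy_tb.sv", "usb_phy_notes.md"],
        "used_in": ["chip_a_2022", "chip_b_2023"],
    },
    "io_subsystem@2.0": {"parent": "soc_top@5.1", "files": [], "used_in": ["chip_b_2023"]},
    "soc_top@5.1": {"parent": None, "files": [], "used_in": ["chip_b_2023"]},
}

def hierarchy_path(block):
    """Walk parent links to show where a block sits in the system hierarchy."""
    path = []
    while block is not None:
        path.append(block)
        block = associations[block]["parent"]
    return " -> ".join(reversed(path))

print(hierarchy_path("usb_phy@1.3"))          # soc_top@5.1 -> io_subsystem@2.0 -> usb_phy@1.3
print(associations["usb_phy@1.3"]["files"])   # the files associated with that block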

Increasing amounts of data are being condensed into much smaller amounts of metadata. “Today, we spend more time generating algorithmically complex kinds of metadata,” says Graham. “For example, 10 years ago, instead of storing an entire log file from a simulation or a formal run, we extracted failure or warning messages from that. It was compressed plain text, but it was error at time X. Some checker failed because of value x y z mismatch. We could then throw away the log file, throw away any compiled binaries or anything like that, but keep that metadata in terms of error messages and the environment. Today, the metadata is getting progressively more and more complex. For example, we may want to store toggle coverage. We do not store how many times a given signal toggled, but did it toggle at all? The metadata goes from a 32-bit counter to a single bit. We could reduce that further for a given hierarchy. We may not care that individual bits in a bus toggle. We just want to know that all the bits or none of the bits toggled. Again, we can collapse that further and further.”
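The toggle-coverage reduction described above can be sketched as follows, with invented signal names and counts: a per-signal counter becomes a single bit, and a whole bus collapses further to a single all/none/partial answer:

# Sketch of collapsing toggle coverage, as described above.
# Signal names and counts are invented for illustration.

toggle_counts = {               # raw data: 32-bit counters per signal
    "bus_data[0]": 1048576,
    "bus_data[1]": 3,
    "bus_data[2]": 0,
    "irq_valid": 17,
}

# Step 1: counter -> single bit per signal ("did it toggle at all?").
toggled = {sig: count > 0 for sig, count in toggle_counts.items()}

# Step 2: collapse a whole bus to one answer: all bits toggled, none, or partial.
bus_bits = [v for s, v in toggled.items() if s.startswith("bus_data")]
bus_summary = "all" if all(bus_bits) else ("none" if not any(bus_bits) else "partial")

print(toggled)                  # per-signal single bits
print("bus_data:", bus_summary) # bus_data: partial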

There is always a danger that, when collapsing data, you throw away something that ends up being important. “The metadata has to be really, really, really right,” says Drako. “Otherwise, it is a recipe for disaster. Mistakes do happen, but that metadata is often more useful than the design itself. A search engine needs metadata to know that this is an ALU or an Ethernet, and it is almost impossible to know this by looking at the design files. It needs to know how fast it is, how much power it needs, and whether it can be put into a power-down mode. These are things that might be important to the designer, who can get answers without ever looking at the circuit or the code.”

Keeping track of data
Where and how data is stored and managed is not obvious. “Some people think the verification plan should be the one all-knowing, all-seeing document because it contains the representation of the results that were meaningful at the time,” says Graham. “It also can contain a bunch of other metadata or environmental data. The challenge is that it assumes that the verification tool chain and one EDA vendor is the center of the universe for all verification activity. Add to that the idea of Application Lifecycle Management, and there is no single all-knowing, all-seeing document that says what was done, how it was done, and how you could redo it. Instead, there is an interconnected set of tools where the verification tool can interchange data with the release management tool, the revision control tool, the requirements management tool, and the bug tracking tool, so that all of those pieces of all of those systems are in sync for a given project. Then when you archive that set of data, it can be archived as a set.”
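One way to picture archiving “as a set” is a manifest that records, for a single project milestone, the identifier held by each tool in the chain. The structure and tool entries below are assumptions, not any vendor’s format:

# Sketch: a project-milestone manifest tying together the state of each tool.
# Keys and identifiers are hypothetical.
import json

archive_manifest = {
    "project": "soc_top",
    "milestone": "rtl_freeze_2024_q1",
    "revision_control": {"repo": "soc_top.git", "commit": "a1b2c3d"},
    "verification": {"vplan": "vplan_v12.xml", "regression_id": "reg_4481"},
    "requirements": {"baseline": "REQ-BASELINE-7"},
    "bug_tracking": {"open_at_freeze": 42, "query": "project=soc_top AND open"},
    "release_management": {"package": "soc_top_1.0.0-rc2"},
}

# Archiving the set means storing this manifest alongside the referenced data,
# so the whole state can be retrieved (or regenerated) together later.
print(json.dumps(archive_manifest, indent=2))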

There are pressures for this to become more centralized. “We are seeing the requirement, especially from industries like automotive, where they don’t want multiple systems because that is difficult to manage,” says Rance. “They want to try to define one single data management solution where somebody can go in and you can put in a search query. And you can start investigating a failure and see what it is associated with in the entire process — not just in the chip design itself, but all the way through to manufacturing.”

That certainly would make some tasks easier. “If you look at one of the design management systems, they may have thousands of designs potentially being worked on simultaneously,” says Drako. “There is giant sharing of IP across the company. You want anyone in any part of the world to be able to pull a block and use it in their chips to get their job done — until you start getting into government projects, or across different geographical boundaries, different regulations and restrictions. Now you start to get into a lot of concerns about IP security. Overlaid on top of the design management system is a very sophisticated security layer to restrict access to the proper people, to see the things that they need to see to get the job done, but not see the things they don’t need to see and shouldn’t be able to see.”
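In its simplest conceivable form, such a security layer might check a user’s project membership and region against a block’s restrictions before a pull is allowed. The toy sketch below uses invented users, blocks, and rules:

# Toy sketch of an access check layered over an IP design-management system.
# Rules, users, and block attributes are all invented for illustration.

ip_restrictions = {
    "crypto_core": {"allowed_regions": {"US"}, "export_controlled": True},
    "uart_lite":   {"allowed_regions": {"US", "EU", "APAC"}, "export_controlled": False},
}

users = {
    "alice": {"region": "EU", "projects": {"chip_b"}},
    "bob":   {"region": "US", "projects": {"chip_a", "chip_b"}},
}

def can_access(user, block, project):
    """Allow a pull only if the user is on the project and their region
    satisfies the block's restrictions."""
    rules = ip_restrictions[block]
    return (project in users[user]["projects"]
            and users[user]["region"] in rules["allowed_regions"])

print(can_access("alice", "crypto_core", "chip_b"))  # False: EU user, US-only block
print(can_access("bob", "crypto_core", "chip_a"))    # True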

And part of the necessary security is licensing. “It’s not just who owns the data, but who has the rights to it,” says Rance. “If something is to be re-used in a year or two, the system may flag that you have these pieces of IP, but you can’t go ahead and re-spin this because the license agreements have expired. That type of information is sitting in the legal department. This is another aspect of this data problem that we are being asked to solve. Nobody wants to get in trouble for using a piece of IP they have not licensed.”
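That kind of flag could be as simple as comparing a license expiry date, stored as metadata, against the current date before a block is approved for a re-spin. The field names and dates below are made up:

# Sketch: flag IP whose license terms have lapsed before it is reused.
# License fields and dates are hypothetical.
from datetime import date

ip_licenses = {
    "ddr_phy":   {"licensor": "VendorA", "expires": date(2023, 6, 30)},
    "pcie_ctrl": {"licensor": "VendorB", "expires": date(2026, 1, 1)},
}

def reuse_warnings(block_names, today=None):
    """Return the blocks in a proposed re-spin whose license has expired."""
    today = today or date.today()
    return [name for name in block_names
            if ip_licenses[name]["expires"] < today]

print(reuse_warnings(["ddr_phy", "pcie_ctrl"], today=date(2024, 3, 1)))
# ['ddr_phy'] -> needs legal review before the re-spin proceeds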

Extending verification
The problems associated with the size of verification data already have been discussed, but this will continue to grow. “Verification is no longer something that starts and ends,” says Graham. “It may ebb and flow. There are times when it’s busier and there’s a deadline to be hit. But when the deadline is hit, that’s a milestone, not a finish line.”

Verification is becoming a continuum from concept to design to deployment and back to a modified concept. “The virtual platform is becoming the means to increasing amounts of verification,” says Simon Davidmann, CEO for Imperas Software. “It is the only simulation platform until they have RTL. Once they’ve got RTL they can try to run Verilog, but it’s too slow. They can put it in an emulator, and that’s a bit better but it’s often too detailed. When they have taped out, they still use the virtual platform because they are still developing software. It’s not just for verification. It’s because they’re building applications, layering and putting all the software stacks on top. Even when the devices come back, some work continues on the simulation platform because you have much better visibility and controllability. The need for verification might diminish after tape-out, but the need for simulation continues. People are doing continuous integration, even after they’ve got the designs back and the designs are in the field, because it’s a much better development environment.”

In some industries, the concepts of test labs are being replaced by digital twins. “It is relatively normal for a test organization to keep several versions of every chip in their test lab, so that when they’re going to do a new software release, or upgrade, they can test it on every known existing hardware variant,” says Drako. “But the test lab will not have all of the versions of the hardware. Think about Microsoft’s compatibility test lab for a new release. It is massive, even though it is still not enough.”

This is causing the industry to look at becoming smarter. “We’re spending a lot of energy to analyze the genuine differences between each one of those 100 different implementations,” says Graham. “Then we can focus on those. What is the delta between two designs? In some cases it’s 100%, but in a lot of cases it’s a small percentage. Can we use data analysis to understand these differences? We need to make the models, or digital twins, be parameterized or configurable.”
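That delta analysis can be pictured as a plain diff of two variants’ configuration descriptions, so re-verification on a parameterized model can focus on what actually changed. Parameter names and values here are invented:

# Sketch: compute the delta between two hardware variants' configurations.
# Parameter names and values are invented for illustration.

variant_a = {"cores": 4, "l2_kb": 512, "eth_ports": 2, "pcie_gen": 4}
variant_b = {"cores": 4, "l2_kb": 1024, "eth_ports": 2, "pcie_gen": 5}

def config_delta(old, new):
    """Return only the parameters that differ between two variants."""
    keys = old.keys() | new.keys()
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}

delta = config_delta(variant_a, variant_b)
print(delta)   # e.g. {'l2_kb': (512, 1024), 'pcie_gen': (4, 5)} (order may vary)
# Focus re-verification (and the parameterized digital twin) on just this delta.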

Then, generating those differences in a model is easy. “With a model you can easily flip from version one to version 1.2 of the IP, and there’s no physical change,” says Imperas’ Davidmann. “If you have to reflash your FPGA in your regression farm, that is a nightmare. The use of simulation isn’t just for verification anymore. That will continue to be an important usage, but it’s for developing the software.”

Conclusion
The amount of design and verification data that can be stored is massive and growing, particularly in what has traditionally been thought of as the verification space. But the roles of models, virtual platforms, and digital twins are becoming heavily intertwined.

At the same time, most SoCs are a combination of multiple IP blocks, the majority of which are reused, yet that reuse has so far had little impact on the amount of verification performed. Both data management and verification have in the past relied on brute-force solutions, but that is no longer practical. While the industry may have recognized the need for change, that change is still a work in progress.


