
Data Management Evolves

Huge increases during the design cycle require new techniques to manage data.


Semiconductor Engineering sat down to discuss data management challenges with Jerome Toublanc, business development executive at Ansys; Kam Kittrell, vice president of product management in the Digital & Signoff Group at Cadence; Simon Rance, vice president of marketing at Cliosoft; Rob Conant, vice president of software and ecosystem at Infineon Technologies; and Michael Munsey, senior director of technology solutions sales at Siemens EDA.

Above, left to right: Jerome Toublanc, Ansys; Kam Kittrell, Cadence; Simon Rance, Cliosoft; Rob Conant, Infineon Technologies; and Michael Munsey, Siemens EDA.

 

SE: How does the industry stand with data management? Where are we headed?

Toublanc: It’s everything we care about. As you go through the design flow to sign-off for the chip-package-system, we need a lot of data, and we generate tons of it. And our customers want to make sure this is properly done to guarantee the success of the product.

Kittrell: From SystemC through RTL, and RTL through place-and-route — as well as 3D-IC, which is emerging now — it’s mainly an assembly line of data. It’s all going from concept to building things along the way with building blocks. The concept is strong about data management within this flow context, because every customer builds their own flow that they customize on top of to handle their own unique, specific data requirements. We’ve added some utilities along the way to help people compare different runs as they’re building the chip, and we see this expanding rapidly as we go forward.

Munsey: I’ve been in this data management space now for 10 or 12 years, and I’ve seen it grow from small clumps of data that were very tool-specific or application-specific. It’s grown to the point of being very unwieldy. The metadata needs to be treated as part of the product, and that needs to happen from a product lifecycle point of view. So as the product and process develop, the data needs to be better structured and better tied back to the product development processes. You need to be able to analyze all this data as you go along, making intelligent decisions about what happened in the past, and what you need to do in the future as it maps to the way the designs are being met.

Rance: We have products that help manage design and IP data. What we’ve been seeing is a huge increase in data management and IP management, in general, as well as an increase in IP and design reuse. More and more data has to be managed, or has to be found. And depending on the size of the company, that data could be on one continent or different continents. You may not know where all the design data is, so there’s a need for both design data management systems and for improved collaboration. How do you re-use those designs? Do you know the quality of that data so you can apply it to a future design? That’s where we’re seeing that transition now in the data management space.

Conant: At a different level, I came to Infineon through the acquisition of a company I started that does product analytics for connected products. What we’re doing is aggregating and collecting data about how our semiconductor products behave in the field from inside our customers’ processes. I would bet that each of you has consumer electronics products that use the data collection capabilities of Infineon WiFi chips. We are improving our semiconductor products and software in those products from data in the field with hundreds of millions of data points a day. It’s a massive data problem. It’s also a massive AI/ML opportunity to improve the performance of those products in the field. There are people who have been working on WiFi for 20 years, and we’ve found things in the data that blew their minds. They never knew these situations existed in the field. Their algorithms and the semiconductors are working in the field in a very different way than they expected, and we only found that by aggregating and collecting data.

SE: What do global design teams need to be concerned about from a geopolitical perspective when it comes to data management? How do you share the data properly? Who can see it? Who can’t? How do you put those protections in place?

Munsey: There are multiple levels to this, because you could have access to data, but things change if you travel to another region of the world. Just because you have access in the U.S. doesn’t mean you can have access to it everywhere. A lot of IP leakage and data leakage happens inadvertently when somebody goes to China with their laptop, opens it up on their network, and then that’s technically an export of the data the minute it happens. There must be systems in place with protections like geo-fencing. There are certain regions in the world where you could block things based on those regions. Then, there are levels of restriction based on the country you’re part of, and each has certain accesses to data. You can set up guardrails to try to prevent some of the leakage, but you can’t stop engineers from being able to do their jobs and work together. You still need ways to allow engineers to collaborate. The goal is to build a workspace with a representation of the design, where the data I can’t see or shouldn’t see because I’m in a certain region gets black-boxed. So I could still do the work I need to do, depending on wherever I am in the world or who I’m collaborating with at any given time. These different layers must be established at the country level, the person level, and then internal classifications as well, to be able to build systems that allow for collaborations to still happen while you protect the data.
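The black-boxing idea Munsey describes can be pictured with a minimal Python sketch. It is an illustration only, not any vendor's implementation: the `Block` type, the `build_workspace` helper, the block names, and the region codes are all hypothetical, and a real system would also layer in the person-level and classification-level rules he mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A design block plus the regions where its contents may be opened (hypothetical model)."""
    name: str
    allowed_regions: frozenset


def build_workspace(blocks, user_region):
    """Map each block name to 'full' or 'black-box' for a user located in user_region."""
    return {
        b.name: "full" if user_region in b.allowed_regions else "black-box"
        for b in blocks
    }


# Hypothetical design: one export-restricted block, one shareable block.
design = [
    Block("cpu_core", frozenset({"US"})),
    Block("io_ring", frozenset({"US", "EU", "IN"})),
]

us_view = build_workspace(design, "US")
eu_view = build_workspace(design, "EU")
print(us_view)
print(eu_view)
```

In this sketch the engineer outside the allowed region still gets a complete workspace; the restricted block is simply present as an opaque placeholder rather than missing.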

Rance: To go beyond that, it’s not just managing the data. It’s having the administration and the rules to apply to that at a pretty fine granularity. A lot of problems happen by accident. The DoD has established export controls to make sure those systems will prevent data leakage, flag it, and make people aware of it. Sometimes this is done out of malice, but typically it isn’t.

Kittrell: EDA companies put a tremendous amount of effort into protecting data — our data, our customers’ data. There are several protocols, especially when they’re getting IP from customers. A lot of times, we have problems with the tools because they have to share data with us. It can go into certain regions, but only be seen by certain people. We have to enumerate who they are, and we have to think through all these things. Even for our own employees, you don’t want to expose all data to everyone because it’s too dangerous. There’s too much transition in multiple ways. So it’s got to be really thought through, and revisited often as to what data people need to share. One of the interesting things, whenever we’re going into cloud computing, is that we’re running tests on cloud computing. And some of the IP providers that were benefiting the most from IP expansion in the cloud were the most restrictive about letting their IP go to the cloud for testing and so forth, which is understandable. It’s better to be cautious, because once it’s out it’s gone.

SE: How much data are you seeing today, and how is it being managed?

Toublanc: That’s a difficult question. The amount is difficult to quantify. It will really depend on the type of product. But the volume expands very quickly. This is a trend we’ve seen over the past few years, and it’s becoming a real challenge for customers because they have to store data. They have to do that in a secure manner to make sure everything is under control. As people continue to move toward cloud computing, they’re opening the door to more simulation or scenarios, but in doing so they expand the amount of data for the same type of product. On top of that, people are now adding machine learning, which is great when you have data, because this is the idea behind machine learning. You need a lot of data, and the business is generating extra data. From a chip perspective, we did not have as much data for very similar projects in the past. If you take a project that’s very similar to another two years earlier, the amount of data is very big because there’s a big increase in bandwidth.

Munsey: Some customers have told me at the end of a project it’s not unusual to have petabytes of data. And this goes back to how you manage it, because a lot of data is unstructured. So what’s important to keep, and what can you do with it? If you do want to mine it later on and look for historical trends, it’s very difficult to predict what you need to keep and what you don’t. More people are looking at data models to define important metrics. If you can define the right metrics, then you can set up systems to scrape data out of report files, log files, other tool runs, and everything else, and then start populating that in a structured environment — even if the data is unstructured — and keep the important stuff for trend analysis and things like that.
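As a rough sketch of the approach Munsey outlines (define the metrics first, then scrape them from unstructured logs into structured records), the Python below uses regular expressions. The metric names and log line formats are assumptions made up for illustration; real tool reports each have their own formats.

```python
import re

# Hypothetical metric patterns; these do not match any specific tool's report format.
METRIC_PATTERNS = {
    "worst_negative_slack_ns": re.compile(r"WNS\s*[:=]\s*(-?\d+(?:\.\d+)?)"),
    "total_power_mw": re.compile(r"Total power\s*[:=]\s*(\d+(?:\.\d+)?)\s*mW"),
    "cell_count": re.compile(r"Cell count\s*[:=]\s*(\d+)"),
}


def scrape_metrics(run_id, log_text):
    """Return one structured record with the defined metrics found in an unstructured log."""
    record = {"run_id": run_id}
    for name, pattern in METRIC_PATTERNS.items():
        match = pattern.search(log_text)
        # A metric that is absent is stored as None so every record has the same keys.
        record[name] = float(match.group(1)) if match else None
    return record


sample_log = """
INFO: starting timing analysis
WNS: -0.125
Total power = 412.5 mW
INFO: done
"""

record = scrape_metrics("run_001", sample_log)
print(record)
```

Keeping every record to the same set of keys is what makes later trend analysis across runs straightforward, even when a given log is missing some values.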

Rance: Data volumes are certainly growing exponentially. We see this in verticals like automotive, for example, with sensor data coming back real-time, along with in-field data. Vast amounts are being analyzed in real-time. Some of it is just being stored so it can be evaluated at a later date, like performance improvements or design improvements. I don’t think anybody has the right solution yet as to how to quantify or characterize the data as to, ‘We definitely need to store this and keep that. Or, we need to keep this data for this duration for some sort of analysis, some sort of an analytics.’ That’s the next thing. How do you quantify it? How do you qualify it? How do you leverage it? And when do you finally say, ‘Okay, now we can release it because we can’t keep storing and storing forever?’

Conant: I agree the amount of data is growing exponentially. We serve a lot of different IoT companies, and when companies embark on that process, the first thing they’re trying to do is deliver some value to that customer. But then it pretty rapidly turns into what kind of data can be collected from those products, and improve the performance of those products in the field. Once you get to that point, then the company is the consumer of the data. There are a lot of use cases that are hypothetical or maybe not well defined yet, and so people have a desire to hoard that data. As a result, you end up with massive quantities of data. How long do you keep it? Ninety days? Ten years? It’s really hard to decide. The data models evolve pretty quickly. That means the older data becomes stale over time. But we’re seeing that grow quite a bit. I’ve been in this IoT space for a long time, and maybe ‘IoT’ is not even the right or meaningful word anymore. It’s really about things that are connected to the internet, which includes almost everything that’s electronic — unless it’s air-gapped military equipment. If you look at the electricity meter in a home, the smart grid space, utilities are collecting gobs and gobs of data today. Infineon provides a tremendous amount of power electronics into all sorts of different kinds of applications. When we think about data from the grid, the data from wind farms, a lot of that relates back to the power components that are inside those products, which becomes a semiconductor problem. It’s an opportunity for semiconductor companies to improve the performance of those products in real-world environments in ways that are really defensible and hard to copy. It’s a really strong, competitive differentiator if you want to have access to that data, and to be able to use it to inform the product development process.

SE: What about specific design data from specific tools? If there is learning from each run, does that mean we can just leave the data we’ve already processed behind?

Kittrell: For design implementation, we’re seeing two vectors of massive growth. One is that, just to create a 5nm, 1 billion or 2 billion instance chip, you will have at least a trillion rectangles because all the logic has to be stored there. You’ll have multiple instantiations of these to run. You don’t just have your final product. It’s a given that any time we give the engineers disk space, they’re going to run out of it in about 48 hours. So they have to manage what is important, and they have some methods to do that. Another aspect of this is machine learning coming in to assist customers that have been doing designs. This takes an engineer, and usually if they’re doing a lot, they’re not just doing one run at a time. They’re doing three or four runs at a time. Typically, you can only keep three or four experiments in your head at once before you start making a bunch of mistakes along the way. The machine could do 10 at a time, but now you’re generating a lot more data. You’re having to organize that data to get some sort of feedback. This is all built into our AI platform. Where we’re taking that next is, ‘Instead of doing three or four blocks, how do you do it as a sub-system?’ Usually there’s a team that does a GPU sub-system. How do I map that to one engineer? So there’s more and more data, and it’s exploding in two different directions for us. Then it comes back to how to categorize the data. How do I manage it? What is the important information, because there’s lots of unimportant information?

Munsey: And it’s also context-specific. There will be data that’s specific to that design, in the way it was implemented, in a certain process technology. You might want to archive that, but there’s IP in that design that’s going to be re-used in other designs. There might be functional verification information that is still valid, and you want to be able to re-use that information. Again, you need to understand what’s IP-specific and can be re-used, versus what’s implementation-specific that just needs to be archived.

Kittrell: One of the things we have to do is optimize for power. We need to have interesting power scenarios to optimize around. Otherwise, we’ll do damage. We’ll actually hurt the power for the things we care about. The verification guys and the implementation guys usually don’t talk to each other that much. The verification guys have to generate, for this particular version of RTL, this power vector, which is interesting for the implementation. And then they somehow must communicate that.


