CEO Outlook: Chiplets, Data Management, And Reliability

Fundamental shifts and persistent problems in current and future semiconductor design.


Semiconductor Engineering sat down to talk about changes in chip design with Joseph Sawicki, executive vice president for IC EDA at Siemens Digital Industries Software; John Kibarian, president and CEO of PDF Solutions; John Lee, general manager and vice president of Ansys’ Semiconductor Business Unit; Niels Faché, vice president and general manager of PathWave Software Solutions at Keysight; Dean Drako, president and CEO of IC Manage; Simon Segars, former CEO of Arm and board director at Vodafone; and Prakash Narain, president and CEO of Real Intent. This is the second of three parts of that conversation, which was held in front of a live audience at the ESD Alliance annual meeting. Part one is here.

SE: How do you know what you’re looking at is good data when you’re designing a complex chip?

Narain: We are focused on a very, very specific problem. If you look at security or product lifecycle management, you have to quantify that. But now we’re going to make use of that function, so you have to plan for it. It has to become part of your strategy, in which case you have to sign off on all the other variables that need improvement. At the end of the day, if you look at the workflow, it’s a collection of discrete sign-off steps that one needs to go through to get a reasonable point of failure and develop a plan.

Faché: Data management is a core capability that we all need. There’s a tremendous amount of data. It starts with requirements. It’s data from simulations that are coming from all of our tools. Our test equipment is generating a large amount of data. The key is really to make sure there’s a good methodology to explore the data and tag it. That allows you to relate that data to a particular design or test setup, or the conditions under which it was generated, and then you can get insights from it. You want to be able to compare simulation and test, look at that correlation, and use it to improve design and predict how an actual product will behave. That’s why data management is a critical capability.
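To make the tagging and correlation idea concrete, here is a minimal Python sketch. It is not taken from any Keysight tool; the `Measurement` fields, the tag values, and the function names are all hypothetical, chosen only for illustration. It assumes each record is tagged with a design, a source (simulation or test), and a condition, and that simulation and test results can be paired by matching condition tags.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One tagged data point (all field names are illustrative assumptions)."""
    design: str      # which design the value belongs to
    source: str      # "simulation" or "test"
    condition: str   # setup or condition under which it was generated
    value: float


def paired_values(records, design):
    """Pair simulation and test values for one design by shared condition tag."""
    sim = {r.condition: r.value for r in records
           if r.design == design and r.source == "simulation"}
    test = {r.condition: r.value for r in records
            if r.design == design and r.source == "test"}
    shared = sorted(sim.keys() & test.keys())
    return [sim[c] for c in shared], [test[c] for c in shared]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(vx * vy)


def sim_test_correlation(records, design):
    """Correlate simulation against test for every condition both cover."""
    sim_vals, test_vals = paired_values(records, design)
    return pearson(sim_vals, test_vals)
```

Because every record carries its own tags, a measurement with no matching counterpart is simply left out of the comparison rather than being paired with the wrong setup.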

SE: But sharing of data is a big problem. A lot of critical data comes out of the fab. How does it move through an organization so it can be shared with the chip architect and other parts of the design-through-manufacturing chain?

Kibarian: Everyone in this industry is super-secretive, yet we’re the most collaborative industry in the world. Somehow we manage to share data, and do it in a way that’s respectful. There are all sorts of agreements that get put in place to make that all work — how you share data, how you store it in the cloud, the amount of encryption required for zero trust. And then there’s the science fiction-like stuff that people are playing around with, like homomorphic encryption of data. We play around with a lot of this, too, but today it’s really about agreements for who can see what data. Today, PDF stores a few petabytes of data for customers around the world, so we deal with that piece of it. But in the future, we could do a lot more with encryption technologies that allow you to work with the data.

Lee: There’s both too much data, because you can generate a lot of data, and a problem with what you do with that data. That’s where AI/ML can and does help. At the same time, we actually don’t have enough data, and that goes back to coverage. System sizes are getting large, so you will never have complete coverage. AI can help there, too. I want to make another comment on data. If you look in a single company, and let’s say they’re doing a 3D-IC, they have a chip team, they have a package team. And the data being used by these teams might be extremely crude, like a spreadsheet or an e-mail that says, ‘Hey, go look at this file.’ So the digital transformation that exists in many places outside of semiconductor design really has not been solved within the EDA community. There are a lot of good tools and techniques out there that we can bring into our industry.

SE: Most of the chiplets that have been utilized so far have been developed in-house by big chipmakers. As we get into more of a virtual marketplace, how do we make sure all these chiplets will work as expected and that they are characterized consistently? Is this even going to work? And is this really just an extension of the IP market?

Segars: You can think about IP in two ways. You can ship RTL — soft IP — to people to create software that is usable. You have to have the very latest functionality. You have to implement it in some process to get similar results in performance, and then be able to go and play around with it. The complexity around shipping libraries or memories in chiplets means you’ve now made a hard implementation of that. Then, the moment you lock it down, somebody wants something slightly different. And then all the work you did on proving it and building it and sending it off kind of goes out the window. So there’s a bit of a shift that’s going to be required to make use of chiplets. But it’s a practical way forward and people are really comfortable with chiplets once they’ve done them. I do think there is massive potential to be able to optimize everything in digital, put it onto a substrate with some analog, silicon photonics, power, and all of that in one package. To me, that opens up new dimensions for integration, performance, and energy efficiency. Right now it’s people with black belts in everything who are doing this. But this creates huge potential.

Sawicki: There are two metaphors here, and they’re both pointing in different directions. One is the IP that has existed in our industry for a while, where you put together relatively complex systems and you can buy them off the shelf. When they get integrated to your chip, you get options for customization that allow you to optimize the overall system performance in interesting ways. The other metaphor is that these chiplets are like a package part that you bring onto a board, which gives you the opportunity to not have to have all the R&D costs, but you still get that flexibility. As more and more people accept that challenge, you identify where it’s working so far and take control of that whole thing. Xilinx was the first one. Now Intel and AMD are doing all these things. Do we go back to the same model? This is an interesting dilemma.

SE: How are we going to test the chiplets before we put them together and manufacture the whole system?

Faché: If you look at chiplets, and you have the physical design, it requires a lot of physical simulations — electromagnetic and thermal. There are a lot of tools out there, but it’s critical that these tools are applied to solve the problems they’re well-suited for, and that they’re integrated in an overall workflow so you’re not going from a highly specialized tool to another specialized tool. In that case, you would have to deal with data transfer, which takes a lot of time because there are a lot of errors potentially associated with that. You need the right tool for the job and it has to be integrated in an overall workflow.

Lee: We’ve worked on products that are not just for a point in time. If you look at a system in an automobile, you want that to be operating for 10-plus years. So onboard sensors, onboard data in/data out is extremely important, both pre-test and throughout the whole lifecycle of a part. And the problem there is that we can get as much data as we can measure, but there’s a lot of data we can’t measure. That’s an opportunity for AI/ML and multi-physics-based models.

Drako: Chiplets are no different than the multi-chip modules that we were doing literally 30 years ago. It’s messy, it’s hard to test, but we’ve been doing it for 30 years. Today it’s a little harder, but we’ll figure it out. There’s no magic stuff there. As far as the integration between the devices, it’s no different than the TTL Data Book we had in the 1970s with your pin out. ‘Here’s what the pins do. Here’s a CPU core, etc.’ But instead of a board, we’re going to use some other substrate. We’re going to be really good at that. There’s no crazy innovation needed there. But on the data front, we’ve got production data, test data, yield data, and simulation data. And then we have a design, and there are different piles of data. And with AI, we have a completely new pile of data, which is training data. If you think about the data we use in the EDA industry, it falls into four or five very distinct piles. We have a system for distributing and managing that data that puts a process around it, so we’re not sending spreadsheets between engineers and e-mails to do the hand-off. Our customers do that kind of thing for their design data. But training data is a completely new set of data the EDA industry doesn’t deal with. So which data are you asking about?

SE: All of it. Isn’t that the issue?

Drako: We’ve got design teams in at least three to five countries working around the clock doing different things. You need to have that if they’re not working from home, and that data needs to be available and distributed. That’s hard. It’s big. And so how do you protect it, share it, control it, and secure it? It’s a big problem, and we will continue to work on it.

SE: You’ve got this data, but you now also have these other elements coming into design, which is that chips are being used in mission- and safety-critical applications, and they’re supposed to last much longer. How do we improve reliability throughout their lifetimes? What has to change?

Kibarian: People now instrument chips to collect data when they’re in the field. There are standards like ISO 26262 that say you’re going to test the chip when it’s in the field, and that data is going to come back. You need to have the data from manufacturing to compare it with. One of the things we need to work standards around is making sure the data is transmitted somewhere, and that it’s secure, starting with blockchain or other technologies to make sure that the information from manufacturing is really the data that was there. Whenever there’s an RMA, there’s always a fight about why is that data like this? And if that chip was bad, well, other chips may be bad. Prove to me that these are the only chips that you’re going to do a recall on. That always happens between the chipmaker and its customers with yield data. There’s another issue around zero trust. Besides encryption, it means that when you as a supplier say that you’re storing the data for 20 years, you will store it without changing it to support your needs. So besides encryption, besides long-term storage, besides the telemetry on the chip, you need to then deal with whether the data you’re pairing it with was changed or manipulated. That’s where blockchain technologies help. The record is immutable. It hasn’t changed.
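The tamper-evidence idea behind that point can be illustrated with a short Python sketch of a hash chain. This is a simplification and an assumption, not a description of what PDF Solutions or any blockchain product does: there is no distributed consensus here, only a chain of SHA-256 digests in which each entry commits to the one before it. The record fields (`lot`, `yield`) and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, payload):
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(chain, payload):
    """Append a record whose hash commits to everything stored before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, payload),
    }
    chain.append(entry)
    return entry


def verify_chain(chain):
    """Return True only if no stored payload or link has been altered."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Editing any stored manufacturing record changes its digest, so `verify_chain` fails from that entry onward. That makes later changes detectable rather than impossible, and it holds only as long as the most recent hash is kept somewhere the party storing the data cannot rewrite it.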

Narain: If you have a mission-critical application, you need to design the ability into the system for diagnosis and recovery. That requires planning and designing at a very high level from this particular system you’re building. You have to figure out your reliability strategy and implementation and make sure it works well. Verification requires you to identify the single points of failure and make sure they are covered. And then, what is the tooling and verification mechanism? So the requirement is to eliminate single points of failure, and then create the tooling and verification mechanism.

Related Reading
Chiplets: Deep Dive Into Designing, Manufacturing, And Testing
EBook: Chiplets may be the semiconductor industry’s hardest challenge yet, but they are the best path forward.
