Looking Inside Of Chips

ProteanTecs’ CEO talks about the growth of on-chip monitoring, the need for deep data analytics and predictive maintenance, and the importance of resiliency.

Shai Cohen, co-founder and CEO of proteanTecs, sat down with Semiconductor Engineering to talk about how to boost reliability and add resiliency into chips and advanced packaging. What follows are excerpts of that conversation.

SE: Several years ago, no one was thinking about on-chip monitoring. What’s changed?

Cohen: Today it is obvious that a solution is needed for optimizing performance, power, and reliability at the same time. The big hyperscalers and automotive OEMs all are saying the same thing. There are so many issues involving variability, downtime, and safety related to changes in electronics. It’s very difficult to develop a chip or a system using 16/7/5nm, and now 3nm technology, while meeting all the requirements for mega-functionality with high reliability and high quality. Additionally, you still must meet the performance and power requirements.

SE: There seems to be a shift away from everything having to be perfect, as long as it still works. Is that correct?

Cohen: Yes, and resiliency is a major part of it. Today’s solutions are very costly, meaning there is a lot of redundancy. It’s happening in hardware, but also in software for mission-critical applications. You’ll see applications running a few times to make the right decision, and that’s very expensive and time consuming. We thought there must be a better way to approach resiliency. The whole concept of having more data and analyzing it is not new. We’ve found that analyzing data is not enough, because the data alone may not lead you to make the right decisions. Analytics by itself is limited to the underlying data, and without the right data, there are too many unknowns. At the end of the day, a data center has to remain up and running, and a car must remain safe. We came up with this concept of deep data analytics over five years ago when we founded proteanTecs. We thought it would be adopted first by the automotive market, but the reality was different. Data centers were our first adopters. Today we’re gaining a lot of momentum in the automotive area, and we’re moving fast into other applications as well, such as mobile and communications. There are so many markets that can benefit from deep data solutions.

SE: That’s a lot of data, and one of the challenges is being able to process it locally, right?

Cohen: That’s correct. In the past, we’ve talked about our multi-layered solutions, including the Agents, which are the on-chip monitors that create the data. Once this data is generated, then our machine-learning algorithms make sense of it, and finally, you can analyze the insights using software in the cloud. There is also another vital layer for us, which is a kind of proxy software that sits at the edge and helps us bring the data as close as possible to the decision point. That edge device could be many things. It could be a tester, an ECU, a server, or a switch. Software is loaded with models that are based on measurements and data collection, and even some higher-level data analytics. Customers apply these models into the edge software for near real-time decisions, such as inline outlier detection or power reduction, on the test floor. Or if the system is in the field, you can monitor its performance against thresholds for continuous diagnostics. If it’s a car for example, you can track if there is application overstress, or if there is a potentially hazardous situation involving the hardware.
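
To make the idea of inline outlier detection at the edge concrete, here is a minimal Python sketch. It is not proteanTecs' method: it assumes each device under test reports one numeric monitor reading, and it flags readings with a standard modified z-score based on the median and the median absolute deviation. The function name, the sample values, and the 3.5 cutoff are all illustrative assumptions.

```python
from statistics import median


def find_outliers(readings, threshold=3.5):
    """Return the indices of readings whose modified z-score exceeds threshold.

    Hypothetical helper: 'readings' stands in for one monitor value per device.
    """
    med = median(readings)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(x - med) for x in readings)
    if mad == 0:
        # At least half the readings are identical; flag anything that differs.
        return [i for i, x in enumerate(readings) if x != med]
    # 0.6745 scales the MAD so the score is comparable to a normal z-score.
    return [
        i for i, x in enumerate(readings)
        if 0.6745 * abs(x - med) / mad > threshold
    ]


# Example population of readings from seven devices (made-up numbers).
sample = [10.0, 10.2, 9.9, 10.1, 10.0, 14.0, 9.8]
flagged = find_outliers(sample)
```

A median-based score is used here because a single bad device does not shift the reference point the way it would shift a mean, which matters when the population on a tester is small.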

SE: You’re talking about predictive analytics, where the goal is to identify potential problems before they turn into real problems?

Cohen: Exactly. The edge software and cloud analytics allow you to use models for predictive decisions. But it’s predictive based on real measurements, with calculated time to failure. And it’s correlated to the physics of failure so you can easily find the root cause. Predicting a failure, or predicting that something will go wrong, allows the service provider to perform predictive maintenance or prescriptive maintenance. For prescriptive maintenance, we added one more element to the software, which is real-time applications. These applications can start with known things, such as AVS (adaptive voltage scaling) or DVFS (dynamic voltage and frequency scaling). AVS and DVFS can be based on a much broader set of parameters so they are deployed in the field in a more accurate and high-reliability manner. Today, throttling voltage or frequency is not widely adopted in mission-critical applications because of the overall accuracy, safety, and reliability requirements. But you could reduce power and increase performance in a trusted manner based on the insights coming from deep data analysis.
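
One simple way to picture a "calculated time to failure" is to fit a trend to a measured margin and extrapolate to the point where it crosses a failure limit. The Python sketch below does this with an ordinary least-squares line. The linear degradation model, the function name, and the sample numbers are assumptions made for illustration; the interview does not describe the models proteanTecs actually uses, which are said to be correlated to the physics of failure.

```python
def estimate_time_to_failure(times, margins, fail_margin=0.0):
    """Estimate the remaining time until 'margins' reaches 'fail_margin'.

    Fits a least-squares line to (time, margin) samples and extrapolates.
    Returns None when the fitted trend is flat or improving.
    Hypothetical helper; units are whatever the caller uses for 'times'.
    """
    n = len(times)
    if n < 2 or n != len(margins):
        raise ValueError("need at least two (time, margin) samples")
    mean_t = sum(times) / n
    mean_m = sum(margins) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    slope = sum((t - mean_t) * (m - mean_m)
                for t, m in zip(times, margins)) / sxx
    if slope >= 0:
        return None  # no downward trend, so no predicted failure
    intercept = mean_m - slope * mean_t
    t_fail = (fail_margin - intercept) / slope
    # Report time remaining after the most recent sample, never negative.
    return max(0.0, t_fail - times[-1])


# Made-up example: margin drops by one unit per time step from 10.
remaining = estimate_time_to_failure([0, 1, 2, 3], [10, 9, 8, 7])
```

In this example the fitted line reaches zero at time 10, so the estimate is 7 time units after the last sample. A maintenance window could then be scheduled inside that interval.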

SE: If you get more granular with the data, can you do more with it?

Cohen: It’s both granularity as well as more data types. It’s moving from today’s preventive maintenance (pre-scheduled) to predictive maintenance to even prescriptive maintenance, and at the same time, providing measurements to increase the lifetime expectancy, while improving performance and power. Our Agents can interact with the software. It’s not just a means to throttle the hardware for the right performance, life expectancy, or power. You can optimize and better utilize the hardware because you have visibility into all that is happening inside the chip.

SE: That requires partners, though. What’s going on in the ecosystem?

Cohen: The ecosystem is growing. You’ll start to see even more discussions and developments around deep data analytics and lifecycle monitoring. Companies are already adopting this internally because the need is there. Additional companies are entering this market too, which is a sign of market validation. We started this whole new category, and today we have an end-to-end, holistic solution, from the birth of a chip to in-field applications.

SE: Any plans to use in-chip monitoring for detecting security breaches or unusual activity?

Cohen: Since there is so much information coming out from the chip, we have a way to find a signature, a unique ID that can be used for supply chain security purposes. We’ve developed a solution for counterfeit detection and device authentication, and have customers working with us on this. But it all goes back to your question about where we can extend this technology. The main market need we are addressing is so massive that even without having other opportunities we can grow significantly in the next few years.

SE: There’s a lot of data available from inside the chip. How much of a problem is latency in being able to utilize that data quickly enough?

Cohen: We took that into account from the beginning. The amount of data that is transmitted out of the chip is very minimal. It requires negligible bandwidth, and there is no latency effect. We’re adding Agents with very little impact on PPA (power, performance, area). That’s our first priority, because otherwise it’s going to be very hard to implement. We have minimized the amount of data flowing out to the edge or the cloud.

SE: What happens when your technology is used inside an advanced package?

Cohen: The reality is that 2.5D and 3D packaging is happening. People have been talking about that for many years. I remember doing an MCM (multi-chip module) back in 1991. But now, there is a huge wave in the industry to assemble these different types of advanced packages. For us, it’s a great opportunity. A few months back, proteanTecs joined the UCIe (Universal Chiplet Interconnect Express) Consortium, the new open industry standard for universal interconnects. One of our high growth offerings is our interconnect monitoring solution, which measures the quality of every bump in between the chiplets. We published some of our work with GUC in a white paper that features recent silicon results. Our Interconnect Monitoring Agents are integrated in their PHY, providing the ability to measure the quality of the die-to-die interconnects between chiplets in a granular way. This can be done in test, but also during operation. By connecting the whole thing, you suddenly have a full picture. Otherwise, you have thousands of bumps, and you’re basically blind to what’s happening inside. And with advanced packaging, that’s a very expensive risk.

SE: And this is where the resiliency comes in, right? The biggest stress point is probably a bump in the corner of a chip. But if you know it will fail, you can take measures to keep the device operating.

Cohen: Exactly. From an interconnect perspective, this technology is applied to choose the best lanes for data during manufacturing and during operation. During reset, you can do lane repair. It’s like putting a very small digital scope on every bump to provide full characterization of every lane. With enough software on top of the data, we can identify not just electrical parameters, but electromechanical parameters, which is one of the biggest challenges. Some of these packages are going into cars. If you have an early indication of a problem, such as degradation of the eye-opening, you could schedule a replacement. But as we discussed earlier, resiliency is expensive today — both in terms of hardware and space.
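
The lane selection and lane repair described above can be sketched as a ranking problem: given a per-lane quality measurement, keep the best lanes and leave the weakest ones as unused spares. The Python sketch below assumes a hypothetical per-lane "eye margin" number where larger is better. The data layout, names, and thresholds are illustrative only and do not reflect the UCIe specification or any proteanTecs interface.

```python
def select_lanes(eye_margins, lanes_needed, min_margin):
    """Pick the 'lanes_needed' healthiest lanes from a die-to-die link.

    eye_margins: dict mapping lane id -> measured eye margin (larger is better).
    min_margin:  lanes below this value are treated as unusable.
    Returns the chosen lane ids in ascending order.
    """
    healthy = [lane for lane, m in eye_margins.items() if m >= min_margin]
    if len(healthy) < lanes_needed:
        raise RuntimeError("not enough healthy lanes to repair the link")
    # Rank healthy lanes by margin, best first, and keep only what is needed.
    ranked = sorted(healthy, key=lambda lane: eye_margins[lane], reverse=True)
    return sorted(ranked[:lanes_needed])


# Made-up measurements for a five-lane link that needs three active lanes.
measured = {0: 0.8, 1: 0.3, 2: 0.7, 3: 0.9, 4: 0.6}
active = select_lanes(measured, lanes_needed=3, min_margin=0.5)
```

Here lane 1 falls below the minimum and is excluded, and lane 4 is healthy but held back as a spare. Rerunning the selection at reset with fresh measurements would swap in the spare if an active lane degrades.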

SE: Isn’t this one of the problems with redundancy?

Cohen: Yes, and the whole 2.5D/3D push is a move to get resiliency at a lower cost, because you can mix and match different technologies in a single package and make cost calculations before you decide on a certain solution. There will always be some redundancy. But it’s different than having total redundancy. For example, if you look at the big AI machines, where they have hundreds of compute elements, maybe they can take out one or two. Our approach is that you can optimize your system to get to a very high level of reliability without redundancy. But it’s still important to have local redundancies on die-to-die connectivity, for example. There is no single solution for that.

SE: You’re saying there are many ways to get to the same place, right?

Cohen: Sort of. We enable a targeted solution rather than a very costly one.

SE: Has your customer base changed? Are you now dealing with a different type of company, or level within those companies?

Cohen: First, we’re not limited in terms of which companies we work with because our technology can benefit so many applications. But I do see more collaborations, such as analytics or interconnect partners. We’re starting to collaborate with more people in the ecosystem to help the end customer — the service provider — get the most visibility and insights out of their product.

SE: What’s next for proteanTecs?

Cohen: We are now five years old, and our company has doubled in employee count in the last year. While there are certain macro-economic concerns, we do see a lot of growing demand for our solutions in many segments. It’s on us to continue to execute and lead the market.


