
Audio, Visual Advances Intensify IC Design Tradeoffs

New features highlight need for better tools, faster PHYs, more standards, and a lot of optimization.


A spike in the number of audio and visual sensors is greatly increasing design complexity in chips and systems, forcing engineers to make tradeoffs that can affect performance, power, and cost.

Collectively, these sensors generate so much data that designers must consider where to process different data, how to prioritize it, and how to optimize it for specific applications. The tradeoffs include everything from always-on, always-listening features to longer screen-on time, all of which must be balanced against demands for longer battery life. On top of that, there are persistent concerns about data security, as well as an increasing need for context-aware AI algorithms.

An estimated 14 billion smart sensors will be connected to the internet by the end of 2025, said Joe Davis, senior director of product management for power integrity at Siemens EDA. “And these are just the sensors connected to the internet, which is growing the fastest because it is where it’s possible to get the data and do something with it. It’s not just observing it and taking a picture. It’s doing some processing of the data.”

Case in point: Sony has a device that can recognize jaywalkers without infringing on privacy. “It performs the action recognition, then sends a signal,” Davis said. “And because all the sensing and processing is done locally, it’s not sending someone’s face over the internet, so it protects their privacy. In terms of architecture, providers in this space traditionally have been at very mature nodes and have optimized these technologies. A lot of the technology is still there, but to get the processing needed now, they’re having to marry those sensors with more advanced technologies.”

In many cases, the power budget is extremely limited because these devices need to run on one or more batteries. “Devices are getting more battery focused in terms of wanting to run on very small batteries,” said Prakash Madhvapathy, product marketing director for Tensilica audio/voice DSPs at Cadence. “Users expect to see very long battery life and continuous operation throughout the day, so 24/7/365. Devices need to be always-on, for the convenience of the user. They also need to be intelligent to understand what the user intends at a particular moment in time, without being told explicitly what it has to do.”

At the same time, these devices require more compute power because they need to process more data. “The use cases seen in the past are now evolving into much more sophisticated use cases, where the end consumer is expecting a lot more from the device than they did previously,” Madhvapathy said. “That has been a positive feedback loop, where the devices themselves are showing more capability, which has raised expectations of both the manufacturers and the end consumers. And that has driven the need for more compute power in the device itself.”

Madhvapathy observed that these two factors seem to be in contradiction with each other. “In one case, you want always-on, long battery life. In the other case, you are looking for more compute power, which is going to consume battery life. The challenge becomes how these two can coexist, and how can the manufacturer or the OEM create a product that provides the best of both worlds?”

This is evident with increasing amounts of autonomy in vehicles, which require hundreds of TOPs (tera operations per second). “In those cases, for the power/energy, there’s a little bit of tradeoff you have to do, but you don’t expect to use the same product that you use at the low end,” said Amol Borkar, director of product management and marketing for Tensilica vision and AI DSPs at Cadence. “In a typical product development cycle, you start segmenting the market to determine what the range of products to focus on for the low, medium, high products, or good, better, best type of approach. It’s very difficult to have one product that can span that entire spectrum. It can broadly span a range, but if you’re talking about something that goes to an always-on capability, but then also has to be reconfigurable to run a self-driving vehicle – that’s generally not going to happen. If it does happen, it will be over-designed and not fit the requirements for any segment.”
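The scale of those compute requirements follows from simple arithmetic. As a rough sketch (all figures below are illustrative assumptions, not vendor data), the TOPS budget for a multi-camera system is the per-frame workload multiplied by frame rate and stream count:

```python
def required_tops(gmacs_per_frame, fps, num_streams):
    """Sustained throughput needed in TOPS, counting 1 MAC as 2 ops
    (one multiply plus one accumulate)."""
    ops_per_sec = 2 * gmacs_per_frame * 1e9 * fps * num_streams
    return ops_per_sec / 1e12

# Hypothetical vision network: 200 GMACs per frame, 30 fps, 10 camera streams.
print(required_tops(200, 30, 10))  # -> 120.0 TOPS
```

Multiply the streams, frame rates, or network size by a few and the budget quickly reaches the hundreds of TOPS cited for highly autonomous vehicles, while an always-on wearable sits many orders of magnitude lower, which is why one product rarely spans both ends.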

This holds true for audio or visual parts. What used to be discrete is increasingly being integrated into a system or sub-system.

“With greater proliferation of AI, we’re starting to see a lot of merging between these product families,” Borkar said. “In the always-on space, it started with developers saying, ‘I just want to do audio processing, such as keyword spotting and keyword detection.’ Now they are adding some vision processing for human presence detection, for instance. To take that further, developers don’t want two different IPs to do this type of multi-modal processing. They want one IP that can do both the vision processing as well as the audio processing, and this is on the low-end side. On the high-end side, it’s more about, ‘I’ve got a system that does this camera-based person detection, or ADAS/pedestrian/street sign detection, but at the same time I’m also doing short-range radar processing. I don’t want to put in a separate processing block or IP block to do that. I just want one block that does the processing, even though it’s multi-modal.’”

Communications concerns
Another concern for chip architects is the ability to communicate images and video quickly enough for a particular application.

Displays today have significantly higher resolution than in the past, which in turn requires higher bandwidth. The problem is that PHY speeds are not keeping pace with the improvements in resolution, said Hezi Saar, director of product marketing for mobile, automotive and consumer IP at Synopsys. This is especially evident in AR/VR and mobile applications, where rising resolutions are outstripping available PHY bandwidth. The solution, at least for now, involves compression standards such as VESA DSC and VESA VDC-M.

“Visually lossless compression has been introduced to the market, which would reduce the demand for faster PHYs and faster switching, enabling lower power because you don’t need to send the same data,” Saar said. “You can compress it, and the data remains more or less in the same ballpark so the power per bit is effectively contained. This kind of compression is being adopted across the board with HDMI, DisplayPort, and MIPI, used in mobile as well as in automotive.”
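The bandwidth arithmetic behind that tradeoff is straightforward. A minimal sketch (panel parameters are illustrative, and real links add blanking intervals and protocol overhead on top of the active-pixel rate):

```python
def display_gbps(width, height, refresh_hz, bits_per_pixel, compression_ratio=1.0):
    """Active-pixel bandwidth of a display stream in Gbit/s,
    ignoring blanking intervals and link-layer overhead."""
    raw_bps = width * height * refresh_hz * bits_per_pixel
    return raw_bps / compression_ratio / 1e9

# 4K panel, 60 Hz, 30-bit color: uncompressed vs. 3:1 DSC-style compression.
print(round(display_gbps(3840, 2160, 60, 30), 2))       # -> 14.93
print(round(display_gbps(3840, 2160, 60, 30, 3.0), 2))  # -> 4.98
```

A 3:1 visually lossless ratio brings a stream that would otherwise demand a next-generation PHY back within the reach of existing link speeds, which is exactly the relief Saar describes.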

Initial objections to this approach were driven by concerns about the safety implications of a lost pixel, or what would happen if a pixel was not seen for a second or a millisecond. While there are multiple opinions on this subject, screens in a vehicle typically are not used for driver safety, and compression saves a lot of resources.

“Then, the architecture questions become simpler,” Saar said. “The tradeoffs come down to, ‘What is the frame buffer going to implement? How much memory would you use internally in your SoC versus how many things externally? How many lanes of communication do you need? What will the power budget be?’ All of this is driven by the amount of bandwidth you need to drive the display.”

Due to the breadth of applications, A/V chips and IPs must be highly workload-dependent and application-specific to achieve an optimum system. This means that when system architects design those chips, they must take into account the kinds of workloads that will be running, and choose the compute blocks that are needed for meeting the performance and power profiles.

“The biggest challenges we are addressing involve just higher data rates,” said Rami Sethi, vice president and general manager at Renesas Electronics. “You’re going to see more and more compute capability moving toward the edge, doing as much as you can there and not moving everything to the cloud. Even inside of networking equipment, we’re seeing more localized compute where it’s needed. We’re even seeing more people talking about compute in memory, just putting that processing as close to the data as possible.”

At the same time, those compute elements are becoming more specialized. “We make the interface run faster, more effectively, and more reliably,” Sethi said. “But down the road, there is an opportunity to put additional functionality in there. All of the data passes through our chips, between the CPU and the memory. We can add more value on the data processing side with security, and potentially with data compression algorithms.”

Others agree. “If you want general-purpose hardware, like CPUs, you can put everything on an x86 or an Arm CPU,” Madhvapathy said. “But it won’t be power-efficient and it won’t be compute-efficient, because they’re not designed for a particular class of workloads. You never design anything for one workload only. You design them for one or two classes of workloads, so that you’re not too narrowly focused. But at least for the workloads, the DSPs will end up being a lot more efficient in processing, both in terms of time and in terms of power than the main CPUs. This is why the trend for the past decade has been to move the processing from the CPUs over to the DSPs for efficient processing, both for vision as well as for audio and speech.”

The same kinds of tradeoffs and work-arounds are happening in the consumer electronics space, as well, where there are demands for greater computational performance and longer battery life. “Traditionally engineers have worked to either optimize for low power or for high performance,” said Roddy Urquhart, senior marketing director at Codasip. “One of the few ways forward is hardware specialization in order to meet the requirements of a particular application. Twenty-five years ago, this would have been addressed by creating an ASIC. But ASICs lack flexibility, and many applications require programmability to handle different releases of standards, such as coding, or to handle firmware updates.”

So while general-purpose processors can handle a wide range of software tasks, they are far less energy-efficient. “If they are used with specialized software, it is quite likely that many of the processor features — and hence circuits — will be simply unused or under-used,” Urquhart said. “By contrast, if a software workload is profiled to identify computational bottlenecks, then a specialized processor can be designed to address the computational bottlenecks, but without including unnecessary features. Such a design should be lean in terms of circuitry, as well as delivering good performance.”
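That profiling step can be caricatured in a few lines. The kernel names and counts below are hypothetical; the point is that a trace of executed operations tells the architect which one or two workload classes deserve dedicated hardware, and which can stay on a plain core:

```python
from collections import Counter

def hotspots(trace):
    """Rank executed kernels by frequency, hottest first."""
    return Counter(trace).most_common()

# Toy trace of a hypothetical audio front-end: FFT and MAC kernels dominate,
# so accelerating them covers most cycles while rare kernels need no circuitry.
trace = ["fft"] * 700 + ["mac"] * 250 + ["io"] * 50
print(hotspots(trace))  # -> [('fft', 700), ('mac', 250), ('io', 50)]
```

In practice the same idea is applied to instruction-level traces rather than kernel names, and the result drives which custom instructions or accelerators are worth their silicon.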

This creates other issues, though. Creating a specialized processor from scratch requires a multi-disciplinary approach that is out of the skillset of many companies, and it’s one of the reasons why the RISC-V open instruction set has gained traction. It simplifies design by offering a base set of integer instructions, optional extensions, and provisions for teams to create custom instructions. “Another simplification is when processors are licensed using a processor description language,” he said. “The core description can be modified and tuned at a high level, and the RTL, verification environment, and software toolchain can be synthesized from the high-level description.”

More tools coming
Still, Siemens EDA’s Davis maintains there are not particularly good tools available at the system level because a lot of this is moving so fast. “There hasn’t been the opportunity in the marketplace to develop those models and get them deployed. Back in the day when everything was modeled in a data book and everything was online, you could put your system together and do all the systems tradeoffs very early on. But these capabilities are advancing so quickly that those models are not available. People generally are using spreadsheets and things like that to do this kind of analysis. There are some capabilities out there, but when you get down to the IC level, each manufacturer, each design company is having to reach out and partner with their foundries to understand the environment of the tradeoffs. There’s a lot of work that has to happen in those areas in order to make those tradeoffs.”

While the tool providers are working on the tools, there is a need today to be able to do this analysis. “People have the dream of being able to sit down and come up with the optimal solution. But as always, when you’re projecting forward, you’re going to design this chip architecture today. They’re going to design it next year, and it’s going to get fabbed and deployed a year after that. I’m looking three years into the future,” Davis said.

To contend with these issues, the answer is increasingly heterogeneous integration using some type of advanced packaging. That makes it possible to have the most advanced processing with low leakage on the digital side, and marry that to the analog side where circuits can be developed at process geometries that make the most sense.

“A lot of these historically mature node companies that are doing all these sensors and amplifiers and noise cancelling — all of this requires advanced processing, and they have to bring in the advanced technology to get the compute resources at low power,” Davis said. “Now we’re talking about system level integration, so the 2.5D/3D stack becomes much more challenging. There’s a digital die, along with one or more analog dies, because if I’m going to put in a sensor and a radio, I might have three different technologies all put together in a package. We’re seeing a lot of that. We’re also seeing silicon photonics, especially in the compute center. The compute center used to be fine as long as you didn’t melt the silicon. The attitude used to be, ‘We’re plugging it into the wall. Who cares?’ They do care now when they’ve got hundreds of thousands to millions of these cores in a building, with big cooling towers on top, because it is generating a lot of heat.”

Architectures that use die-on-die or package-on-package will be more common to solve some of these problems. “It depends on which application you’re talking about,” said Synopsys’ Saar. “Sometimes real estate is important, so you use package-on-package. Sometimes latency is very important. Or sometimes you want to do this kind of computation locally. Then you put a DDR on top of your die. Doing that would improve performance, reduce latency, and improve on power. This means when you process the video data, it can be done more efficiently. Some kind of die-to-die interfaces will become more common in the more complex systems. Automotive ADAS is a candidate. Mobile is a candidate on the SoC side. Even in an application like an IP camera or network video recorder, if you are a company that owns the whole thing — you have the AI engines in the cloud and you’re providing the full service, all the electronics, and you’re also manufacturing the SoCs — then you potentially could do an SoC that can go to the IP camera. You also may be able to connect two dies using die-to-die technology, so you can do the network video recording that connects all the IP cameras together.”

To boost the efficiency of advanced audio/visual systems, very specialized hardware is needed. “Looking to the way this has been done in smartphones and PCs, to improve battery life in all of these devices — regardless of audio, visual, PC, ADAS, or whatever these systems may be — you can’t have your system completely running 100% all the time,” said Aakash Jani, head of technical marketing at Movellus. “Otherwise, you’re just going to kill your system. This brings the idea of toggling different power domains, creating wildly different power domains, whether you’re doing wavefront analysis one minute and you’re fully inferencing the next second after that. You’re going to have your system switch in dynamic power very quickly. If it doesn’t, that will translate into real-time latency.”

This extends well beyond just audio/visual systems. “Increased intelligence on a chip must be balanced against battery life,” Jani said. “You need to have very fine-grained power control, depending on the workload. The power systems and power management need to parallel the different workloads that you may be seeing so that you’re not just wasting cycles by burning power.”
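The payoff of that fine-grained toggling is easy to quantify. A minimal sketch, assuming illustrative power figures for an always-listening device that wakes a large inference engine only 2% of the time:

```python
def average_power_mw(phases):
    """Time-weighted average power. phases: (duty_fraction, power_mw) pairs
    whose duty fractions must cover the whole period."""
    assert abs(sum(d for d, _ in phases) - 1.0) < 1e-9, "duties must sum to 1"
    return sum(d * p for d, p in phases)

# Hypothetical figures: 2 mW always-on keyword spotting, 500 mW full inference.
duty_cycled = average_power_mw([(0.98, 2.0), (0.02, 500.0)])   # ~12 mW average
always_full = average_power_mw([(1.0, 500.0)])                 # 500 mW average
```

Under these assumptions, duty cycling cuts average power by more than 40x, which is how always-on convenience and bursts of heavy compute coexist on a small battery.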

One of the biggest design constraints and issues designers are trying to deal with is voltage droop or IR drop. “Because clocks are so intertwined with whatever systems they are in, and because they’re such a large power contributor, they have a very personal relationship with voltage droops,” he explained. “As these systems are switching, especially high-frequency systems, for a smartphone, a PC, and even in the data center, there are large fluctuations in power. The clock network is not only a contributing factor to that, but the way it is designed could also be a solution.”

Conclusion
Big picture, all of these low-level challenges must be set against a long-term view, and designs must be scalable.

“What you do today is not what you’re going to do tomorrow,” noted Paul Karazuba, head of marketing at Expedera. “From a design perspective, most companies that do hardware are not interested in one generation. They’re interested in many generations to sustain a growing company. What I do today in the audio or video realm may be on a 4K camera. I’ll probably be on an 8K camera in a couple generations. You need to have an architecture that scales — and not just the architecture, but the underlying design languages and software ecosystems that you work within. You don’t want to have to go to a completely different architecture with each generation, so you need to have something where you as a system engineer, system architect, or chip designer have confidence that your solutions are going to be able to scale and that your suppliers are going to be able to support the future needs of your product.”

Add AI into the mix, which is proliferating widely across audio/visual applications and many others, and things get even more complicated.

“Depending on the market, you have to start designing for algorithms that don’t exist today, and that’s completely counterintuitive,” Karazuba said. “In automotive, for example, if you design a chip today, it doesn’t hit the market for three years. It’s got to be in the market for 10 years. In that 13 years, the neural networks that it is processing are not going to remain the same. So with advanced neural networks, and custom neural networks, and networks that don’t exist today, those are the decisions that the system engineers need to make, so they can try to design for something that doesn’t exist.”

— Ed Sperling contributed to this report.


