What object-based audio means for systems design.
Technology innovation can be dizzying—literally. Walk through a place as vast and cavernous as Mobile World Congress (and walk and walk and walk) and you can’t help but get wobbly at all the innovation in the building.
But find your way to the Fraunhofer booth and things get even more interesting.
Here, you take a seat outside the booth, slip on a virtual reality headset and settle into a seat in a concert hall. A buzzing sound flits around just outside your vision, and then a lightning bug appears and floats in your field of vision, above your head and behind you. Throughout its visit, the bug’s buzz travels in 360 degrees, depending on its path.
Truly immersive
There is no outside sound—no hint of the cacophony of the convention center; just you and the lightning bug. And then a penguin arrives and does the same floating dance. Turn your head to follow the object and you not only see other aspects of the concert hall (as you’d expect to see in real life), you follow the sound as well. In other words, pivot right and both the object and its sound move in front of you rather than off to your side.
Slip the headset off and stand up and be prepared to hang on to something, because the experience is dizzying. But that’s not the point. What you’ve just experienced, thanks to a voice-over during the presentation, is object-based audio.
With object-based audio, a sound is placed at a point in 3D space, adding a height dimension to audio rendering. The audio system then decides how and where to render that sound across whatever speaker configuration or headphones are in use.
Previously, getting an immersive audio experience came via channel-based systems, designed to play back on a certain configuration of speakers (5.1, for instance, is designed for five speakers and one subwoofer). In the channel-based set-up, the surround sound effect occurs because audio is directed to a speaker—for example to a rear speaker to simulate the experience of a vehicle approaching from behind.
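To make the contrast concrete, here is a minimal sketch, in Python, of the object-based idea: the object carries its audio plus a 3D position as metadata, and a renderer on the playback side turns that position into per-speaker gains. The speaker layout and the toy panning rule below are illustrative assumptions, not any standard’s actual renderer (real systems use techniques such as VBAP or HRTF-based binaural rendering).

```python
import numpy as np

# Hypothetical speaker layout: unit vectors pointing toward each speaker in a
# 5.1-style ring (angles are illustrative, not taken from any standard).
SPEAKERS = {
    "front_left":  np.array([-0.50,  0.87, 0.0]),
    "front_right": np.array([ 0.50,  0.87, 0.0]),
    "center":      np.array([ 0.00,  1.00, 0.0]),
    "rear_left":   np.array([-0.71, -0.71, 0.0]),
    "rear_right":  np.array([ 0.71, -0.71, 0.0]),
}

def render_object(position, layout=SPEAKERS):
    """Map one audio object's 3D position metadata to per-speaker gains.

    This is a toy direction-based panner for illustration only; a production
    renderer would be far more sophisticated.
    """
    direction = position / np.linalg.norm(position)
    # Weight each speaker by how closely it points toward the object,
    # clamped at zero so speakers facing away from the object stay silent.
    weights = {name: max(np.dot(direction, vec), 0.0) for name, vec in layout.items()}
    total = sum(weights.values()) or 1.0
    return {name: w / total for name, w in weights.items()}

# An object's metadata might carry a position like this one:
gains = render_object(np.array([0.2, -1.0, 0.5]))   # behind and above the listener
print(gains)   # most of the energy lands on the rear speakers
```

The point of the sketch: nothing in the object says “rear speaker.” The position travels with the sound, and the receiver works out which speakers (or which ear, on headphones) should carry it.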
Objects in space
Object-based audio takes things to a new plane. The BBC’s R&D blog has a cogent explanation of object-based technology:
Rather than broadcasting the stereo loudspeaker signals, which contain a pre-mixed mixture of dialogue, narration, sound effects, music and background atmospheres, each of these sounds is sent as a separate audio object. Along with these objects some metadata is included which describes when and where these sounds should occur and how loud they should be. All this is broadcast to a receiver. The receiver then reassembles the audio objects in accordance with the metadata. Because this reassembly is happening in the receiver, the objects can be reassembled slightly differently for each listener by locally changing the metadata.
Why is this cool? Because it allows users to customize their experience on the receiving end of the transaction. Want to hear your local announcers instead of the network team during a broadcast sporting event? Done. Choose that object. Want to hear more viola and less violin? Done. Raise the volume on that object, lower it for the violin section.
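Here is a hedged sketch of what that personalization might look like at the receiver: each object arrives with audio and gain metadata, the listener’s preferences rewrite the metadata locally, and the scene is rebuilt from the objects. The object labels and fields are made up for illustration; they are not actual MPEG-H, Atmos or DTS:X structures.

```python
import numpy as np

# Hypothetical object-based stream: each object carries audio plus metadata.
sample_rate = 48_000
t = np.arange(sample_rate) / sample_rate
objects = [
    {"label": "network_announcers", "gain": 1.0, "audio": 0.2 * np.sin(2 * np.pi * 220 * t)},
    {"label": "local_announcers",   "gain": 0.0, "audio": 0.2 * np.sin(2 * np.pi * 330 * t)},
    {"label": "crowd",              "gain": 1.0, "audio": 0.05 * np.random.randn(len(t))},
]

def personalize(objects, overrides):
    """Apply listener preferences by rewriting per-object gain metadata locally."""
    for obj in objects:
        if obj["label"] in overrides:
            obj["gain"] = overrides[obj["label"]]
    return objects

def mix(objects):
    """The receiver reassembles the scene: sum each object scaled by its gain."""
    return sum(obj["gain"] * obj["audio"] for obj in objects)

# Swap the network feed for the local announcers without touching the crowd bed.
personalize(objects, {"network_announcers": 0.0, "local_announcers": 1.0})
output = mix(objects)
```

Because the mix happens on the listener’s device, two people watching the same broadcast can hear genuinely different versions of it.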
What it means for systems design
It’s a pretty straightforward concept, and pieces of it already exist in gaming environments. But the really dizzying part is how to implement it for mobile devices. It’s one thing for a home theater system plugged into the grid. Quite another in the mobile space, where power and performance are often at design loggerheads.
“Think about what you have to process at that level, in that type of device,” said Gerard Andrews, Cadence Senior Product Manager for IP technologies. He and I visited the Fraunhofer booth together at Mobile World Congress. “Just processing the increased number of objects and 3-D positional information is a significant challenge.”
It will require increased DSP processing power, for sure, but “If you’re decoding MPEG-H through a set of headphones, you want to give the most real-life experience possible but preserve your battery as much as possible,” he said.
That requires careful thought about how to partition the audio processing within the system context. Do you relegate the duties to the SoC’s general-purpose processor? Or do you leverage specialty DSPs? It’s more the latter, Andrews argues.
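As a rough illustration of that partitioning question, here is a sketch in which stream parsing and scene control stay on the host processor and the per-object render loop is dispatched to an audio DSP when one is available. The class names and the simple capability flag are assumptions for illustration, not any vendor’s driver API; both paths compute the same trivial mix so the sketch stays runnable.

```python
class CpuObjectRenderer:
    """Software fallback on the application processor: flexible but power-hungry."""
    def render(self, objects):
        return sum(obj["gain"] * obj["audio"] for obj in objects)

class DspObjectRenderer:
    """Stand-in for an offload path to a programmable audio DSP.

    In a real system this would submit frames to DSP firmware; here it just
    performs the same mix so the example runs anywhere.
    """
    def render(self, objects):
        return sum(obj["gain"] * obj["audio"] for obj in objects)

def choose_renderer(has_audio_dsp: bool):
    # Favor the DSP when present: the per-object filtering and panning is the
    # hot loop, and it costs far less energy there than on general-purpose cores.
    return DspObjectRenderer() if has_audio_dsp else CpuObjectRenderer()

renderer = choose_renderer(has_audio_dsp=True)
```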
“These emerging audio standards (MPEG-H, which is backed by Fraunhofer, Qualcomm and Technicolor; Dolby Atmos from Dolby; and DTS:X from DTS) are going to be more computationally intensive,” he said. “And you need programmable audio DSPs capable of handling multiple standards.”
The wrestling match over the future direction of object-based audio is now underway: The Advanced Television Systems Committee (ATSC) has begun reviewing the three proposals to determine which might drive the ATSC 3.0 next-generation television broadcast standard. The goal is to establish the ATSC 3.0 Audio System Candidate Standard this fall, according to ATSC.
However it shakes out, get ready to rethink how you design your audio subsystems and how you’ll get to enjoy a truly immersive audio experience in your earbuds.
Just make sure you take a seat.