What Was Everywhere At CES? Voice.

How DSPs enable the new ubiquitous user interface.


I still don’t know why I’d ever want a voice-enabled washing machine, but the display in the Samsung booth with voice recognition for appliances was indicative of one of the strongest trends at this year’s Consumer Electronics Show in Las Vegas. Everything is becoming voice enabled. Samsung has Bixby, Amazon has Alexa, Google has “Hey Google,” and Apple has Siri.

These big-name voice recognition applications have one thing in common: they rely on the cloud for voice processing. In the real world, that's often not possible, or even desirable, because cloud-based voice processing adds significant latency. People may want voice recognition on their smartphones even in airplane mode. And you may not want your automobile to wait for cloud processing before obeying your commands.

On-device artificial intelligence (AI) handles this audio processing directly on the local device, and opportunities for it are growing. Just about every one of these new voice-enabled applications will require specialized silicon, yet it's rare to dedicate a separate chip to voice: today's voice-processing DSPs have such small footprints that voice can easily be integrated on the same chip as the main host controller or other functions. Voice also appears in sophisticated audio applications, such as televisions and even smartphones, where multiple audio DSPs are often used: one optimized for the lowest power for always-on listening, another optimized for more demanding, theater-quality audio processing.

What should you consider when integrating voice/audio DSPs into your next chip design? First and foremost, make sure the software for your target applications has already been ported to the DSPs you are evaluating. The last thing you want to do is port it yourself; you will never meet your product delivery schedule.

What types of software will you need to run on your DSPs? It varies by application, so let's look first at mobile. Many phones use one DSP for always-on voice recognition (at power low enough that you don't have to charge your battery every day) and another for high-quality audio playback. Even for always-on low-power voice recognition, voice-processing algorithms continue to improve, adding support for acoustic echo cancellation (AEC), noise reduction/suppression, noise-dependent volume control, and beamforming rather than omnidirectional microphone pickup. On the playback side, more powerful DSPs deliver an immersive headphone experience, accurately recreating the studio soundstage through headphones using technology such as Dolby Atmos. VR/3D position-dependent audio is also emerging, integrating head tracking with audio processing for total auditory immersion and a real sense of space.
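To make the beamforming idea concrete, here is a minimal delay-and-sum sketch, the simplest form of microphone-array beamforming. It is an illustration only, not any vendor's DSP code; the function name and the use of whole-sample steering delays are assumptions for clarity (real implementations use fractional delays and run as fixed-point filter kernels).

```python
import numpy as np

def delay_and_sum(mic_signals, steering_delays):
    """Delay-and-sum beamformer sketch.

    mic_signals:     2-D array, one row per microphone.
    steering_delays: per-microphone delay in whole samples toward
                     the desired source direction.

    Each channel is time-aligned by its steering delay and the
    channels are averaged, so sound arriving from the steered
    direction adds coherently while off-axis noise partly cancels.
    """
    num_mics, _ = mic_signals.shape
    aligned = [np.roll(sig, -d) for sig, d in zip(mic_signals, steering_delays)]
    return np.sum(aligned, axis=0) / num_mics
```

A two-microphone example: if the source reaches the second microphone three samples later, steering delays of [0, 3] re-align the channels before averaging.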

Hearables are a new class of wearable products. For products such as Apple's AirPods, Bragi's The Dash, and Samsung's Gear IconX, audio-related applications include Bluetooth audio and voice, voice processing and noise reduction, voice control, and biometrics.

Most televisions will now be voice enabled, and they must also deliver the best possible sound quality from ever-smaller speakers. 4K ATSC 3.0 broadcasts use the Dolby AC-4 and MPEG-H codecs, which demand a step-function increase in DSP performance, with complexity running to multiple hundreds of millions of clock cycles per second (MCPS). Televisions are also moving from stereo sound to object-based audio such as Dolby Atmos and DTS:X, as used in movies and gaming, which further raises processing complexity. Soundbars are augmenting or replacing internal speakers, and often combine voice recognition with sophisticated audio capability.

In automobiles, the big challenge is noise. Active noise cancellation (ANC) is essential, both for road noise (road noise cancellation, RNC) and engine noise (engine order cancellation, EOC). ANC requires sophisticated signal processing, three to six microphones in the headliner, and latency of less than three microseconds from microphone to speaker. By canceling noise with DSPs, automakers can reduce cost and weight by eliminating sound-deadening material. In effect, DSPs are green, thanks to the mileage improvement from weight reduction. Designers are also using proximity sensors integrated with the infotainment system to generate audible alerts that warn pedestrians of oncoming electric vehicles.
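The signal processing at the heart of ANC is adaptive filtering; production systems typically use variants such as filtered-x LMS to account for the speaker-to-ear acoustic path. The sketch below shows only the core LMS idea, under assumed simplifications (a reference signal perfectly correlated with the noise, floating-point math, no secondary-path model), and is not any automaker's implementation.

```python
import numpy as np

def lms_cancel(reference, primary, num_taps=16, mu=0.01):
    """Core LMS adaptive-canceller sketch.

    reference: signal correlated with the noise (e.g., from an
               accelerometer or engine-order reference).
    primary:   microphone signal containing the noise to cancel.

    The filter learns to predict the noise in `primary` from
    `reference`; the residual error is the cancelled output.
    """
    w = np.zeros(num_taps)              # adaptive filter weights
    out = np.zeros(len(primary))
    for n in range(num_taps, len(primary)):
        x = reference[n - num_taps:n][::-1]  # most recent sample first
        y = w @ x                            # noise estimate
        e = primary[n] - y                   # residual after cancellation
        w += mu * e * x                      # LMS weight update
        out[n] = e
    return out
```

With a reference that is simply a delayed, scaled copy of the noise, the residual shrinks toward zero as the filter converges, which is the effect ANC aims for in the cabin.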

Voice is truly becoming the new ubiquitous user interface (UI). Automatic speech recognition continues to improve. Mass consumer adoption will require high accuracy, far-field support (longer distance recognition), and low latency.

Given the security concerns around cloud-based always-on voice recognition, on-device recognition is an ideal solution, and it will drive a new generation of products.
