Babblelabs: Deep Learning Speech Processing

Startup to apply deep learning technology to advanced speech processing.


Pronounced “babble labs,” the startup is the brainchild of serial entrepreneur Chris Rowen, who is setting out to transform speech processing and will leverage deep learning to do so.

Rowen, CEO of Babblelabs, has spoken for some time about the move of processing to more general-purpose hardware, with applications layered on top, so it’s not so surprising that his next startup would head in this direction. (Editor’s Note: The startup’s name was spelled Babblabs when this story was first published in January 2018. We have updated the spelling.)

Chris Rowen, CEO of Babblelabs
Source: Babblelabs

“There has been a very important and fairly rapid evolution of capabilities, triggered by this deep learning revolution, that has advanced speech to being one of the fundamental methods of interfacing with computer systems. You can sort of see it in these waves: we had the keyboard-based PC, and then we grew up to Windows and mice. Then we grew up to the cloud as a way to capture and present information, and to the cell phone with its display and touchscreen and much more presentation of visual information, but we didn’t make very much progress on the input side. That is much of the current, very real crisis, for example, in cars. Part of the reason we are so eager to move to autonomous vehicles is not just that people are bad at driving, but that people using the historical interfaces, like the screens of their phones, are really dangerous. Driving has gotten more dangerous, and it’s part of the reason why autonomous vehicles have become so important,” he explained.

“Speech is a fundamentally important interface for us,” he continued, “and we actually do talk a lot. In a global world, we talk to people and to services that are far away, so our electronic devices are very important to that communication. When you have this kind of emerging technology breakthrough in how we process speech, it creates a flood of new opportunities.”

“Babblelabs wants to be at the cutting edge of this flowering of speech in all of these ways, both in helping to make human-to-human communication work better and, of course, in the human-to-machine interfaces that we are gradually coming to know and love in speech recognition, speech synthesis, and other kinds of speech processing,” Rowen asserted.

But it’s a hard problem, he admits.

As far as a product roadmap goes, it is still early days, so Babblelabs isn’t talking specifically about products yet. “We’re very much still in a planning and prototyping phase on technologies that will lead to products. We are deeply experienced in audio processing and embedded processing, and the key symbol of that is Dror Maydan, who was a cofounder of Tensilica and ran all of its software development and software tools effort over its entire history, as well as being the lead architect for Tensilica’s audio DSPs. He really mastered both the hardware and the software dimensions and has been at the heart of deploying billions of units of audio processing, in which speech processing is pretty prominent.”

Rowen has a long history with Maydan: the two worked together at Silicon Graphics before the founding of Tensilica, and they share a common PhD advisor in John Hennessy of Stanford University.

For now, Babblelabs isn’t at the point of setting expectations for specific products, as Rowen explained that they are learning a lot as they work intensively on the technologies. “However, it seems pretty clear that there are opportunities for this class of software both in the cloud and in embedded devices, and we clearly are going to develop products that serve both. There are different issues associated with embedded products versus cloud products. Cloud products need to be very agile and support a very wide range of different kinds of uses, because the leverage of the cloud is that it is so agile. Embedded of course means that you have to pay a lot of attention to the implementation: how do you make it low enough megahertz, low enough compute, small enough footprint, low enough latency, that you can take full advantage of the interactivity that is both possible and, in many of the interesting use cases, mandatory to make a good product?

“The fact that you have all of these natural latencies in dealing with the cloud is both a benefit and a problem. It’s a benefit in the sense that you get to focus on functionality and you just can’t worry too much about the details of every last cycle that gets spent, because there is a lot of overhead in doing it. On the other hand, you have such enormous flexibility there that you can move more rapidly; you can deploy into products that serve different users very much more quickly, and if you learn something in the experience, you can have another spin of the product weeks later. Whereas when you’re thinking about something which is going to go into a car or into a telephone, you’ve got to think in rather different terms about the development process and about the update process, and you have to get things really right so that they become very integral to the experience of the platform.

“Particularly in something like speech, people gravitate to speech because it is such a natural interface for them, but it means that the experience has to be all the more seamless because the expectations are high. In fact, it is really kind of ironic that in the use of some of the other interfaces, like touchscreens and the like, we have really trained our brains to fit those interfaces and have made it work that way, but it actually takes a fair amount of effort to do that. When you go to speech, the biggest motivation for it is that we already know how to do that. We want it to be as invisible an interface as possible, and that puts a really high demand on the quality of the sound, the flexibility of the language you can use, and latency, the responsiveness that comes of it. Those are fairly demanding aspects. Nevertheless, we see both cloud deployments and embedded deployments as an essential part of the vision of the products that we will develop.”

Babblelabs is also talking to partners and potential long-term customers about what is now becoming possible with speech, what the use cases are, and what speech-related problems people have with language in the cloud. What is the cockpit of the car going to look like? What are the things you’re going to want in mobile telephony and in the mobile phone user interface? How do we find the most interesting and valuable use cases associated with the IoT? There certainly are some very interesting ones.

“When you think about it, there are cases where even small improvements in the quality or the comprehensibility of speech can save lives. I think that’s important,” Rowen added.

Babblelabs recently closed a $4 million seed round of investment led by Cognite Ventures, with additional funding support from Jerry Yang, founding partner of AME Cloud Ventures, along with industry gurus John Hennessy, Harvey Jones, James Hogan, and Kurt Keutzer.


