Preventing scam attempts that involve mimicked voices.
Voice anti-spoofing is a set of techniques designed to prevent scam attempts that involve mimicked voices and to improve the user experience of voice user interface (VUI) systems by preventing accidental triggers.
These attacks and accidental triggers can significantly disrupt the experience of using voice systems, so they demand a robust solution.
Voice anti-spoofing works by detecting and blocking voice-spoofing attacks, which can involve recorded, computer-generated, or computer-modified voices. Fig. 1 shows the key components of the solution.
Fig. 1: Anti-spoofing solution components.
By using these techniques, voice anti-spoofing systems can effectively combat different types of voice-spoofing attacks and enhance the overall user experience, in addition to assuring smart doorbell users everywhere that it really is their neighbor at the front door.
Renesas’ Voice Anti-spoofing is engineered for speed and responsiveness while maintaining high accuracy, and it runs entirely at the edge. We combine hardware across the RA MCU family (RA6, RA4, and RA2 series) and the RX MCU family with the Cyberon voice stack to identify the trigger/wake word, and then use Reality AI-generated models to check whether the voice in the signal is real or recorded.
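The two-stage flow described above (wake-word detection first, then a real-versus-recorded check) can be sketched as a simple gate. This is a minimal illustration only, not the Renesas or Cyberon implementation: `voice_gate`, `GateResult`, the two callables, and the `threshold` value are all hypothetical names and parameters introduced here, and on a real MCU this logic would live in the firmware generated from the e² studio project.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical type alias: one audio window as a sequence of PCM samples.
Audio = Sequence[float]


@dataclass
class GateResult:
    wake_word_detected: bool
    is_live_voice: bool

    @property
    def accepted(self) -> bool:
        # A trigger is accepted only when both stages pass.
        return self.wake_word_detected and self.is_live_voice


def voice_gate(
    audio: Audio,
    detect_wake_word: Callable[[Audio], bool],
    live_voice_score: Callable[[Audio], float],
    threshold: float = 0.5,  # assumed decision threshold, not from the source
) -> GateResult:
    """Run wake-word detection first, then the live-vs-recorded check."""
    if not detect_wake_word(audio):
        # No wake word: the anti-spoofing model is never invoked.
        return GateResult(wake_word_detected=False, is_live_voice=False)
    score = live_voice_score(audio)
    return GateResult(wake_word_detected=True, is_live_voice=score >= threshold)


if __name__ == "__main__":
    # Stand-in callables; a real system would call the wake-word engine
    # and the trained classifier here.
    result = voice_gate([0.0, 0.1, -0.1], lambda a: True, lambda a: 0.9)
    print(result.accepted)
```

Running the cheaper wake-word stage first means the classifier only executes on audio that already contains a trigger candidate, which is one plausible way to keep the pipeline responsive on a small device.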
Renesas’ Reality AI model uses “Hi Renesas” as the wake word. Users may speak with any common English accent and in their natural voice (male or female) to use this solution. In our benchmarking, the model was 96% accurate against recorded voice played from a phone speaker (iPhone or Android) and achieved ~99% K-fold validation accuracy during training.
Fig. 2: e² studio solution workflow.
Using Renesas’ IDE, e² studio, a user can collect data, integrate Cyberon’s voice stack for wake-word detection (“Hi Renesas”), and finally integrate any AI models generated with the Reality AI Tools module.
Fig. 3: e² studio – Reality AI Tools integration workflow.
We collected real voice data (captured live through the microphone on Renesas hardware) and recorded (played-back) voice data from a small group of people. This data was fed to Reality AI’s feature extraction and training engine to develop and output a model. We achieved ~99% K-fold validation accuracy in training, which prompted us to select the model for live testing and benchmarking.
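For readers unfamiliar with K-fold validation, the sketch below shows the general idea: the data is split into k folds, and each fold is held out once while a model is trained on the rest. It is not how Reality AI Tools implements training. The nearest-centroid classifier, the function names, and the feature vectors are all stand-ins assumed for illustration, and the sketch expects at least k samples.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# (feature vector, label); label 1 = live voice, 0 = replayed voice.
Sample = Tuple[List[float], int]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_centroids(samples: Sequence[Sample]) -> Dict[int, List[float]]:
    """Toy stand-in for model training: one mean vector per class."""
    by_label: Dict[int, List[List[float]]] = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    return {y: [mean(col) for col in zip(*xs)] for y, xs in by_label.items()}


def predict(centroids: Dict[int, List[float]], x: List[float]) -> int:
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )


def k_fold_accuracy(samples: Sequence[Sample], k: int = 5) -> float:
    """Average held-out accuracy over k train/validate splits."""
    scores = []
    for held_out in k_fold_indices(len(samples), k):
        held = set(held_out)
        train = [s for i, s in enumerate(samples) if i not in held]
        model = train_centroids(train)
        correct = sum(predict(model, samples[i][0]) == samples[i][1] for i in held_out)
        scores.append(correct / len(held_out))
    return mean(scores)
```

When the data comes from only a few speakers, a common variation is to build the folds per speaker rather than per sample, so that no speaker appears in both the training and validation portions of a split.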
The model was then integrated back into the e² studio project and extensively tested in live office settings with people who were not included in the training set. In this benchmark, it achieved 96% accuracy.
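A live benchmark like this one boils down to counting how often the model's decision matches the ground truth. The sketch below shows one way to compute accuracy together with false-accept and false-reject rates from a list of trials. The source reports only an overall accuracy figure, so the breakdown into rates, the `spoof_metrics` name, and the trial format are assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple


def spoof_metrics(trials: Iterable[Tuple[bool, bool]]) -> Dict[str, float]:
    """Summarize trials given as (is_live, predicted_live) pairs."""
    tp = tn = fp = fn = 0
    for is_live, predicted_live in trials:
        if is_live and predicted_live:
            tp += 1  # live voice correctly accepted
        elif is_live:
            fn += 1  # live voice wrongly rejected
        elif predicted_live:
            fp += 1  # replayed voice wrongly accepted
        else:
            tn += 1  # replayed voice correctly rejected
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one trial is required")
    return {
        "accuracy": (tp + tn) / total,
        # Share of replayed attempts that got through.
        "false_accept_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Share of live attempts that were blocked.
        "false_reject_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }
```

Reporting the two error rates separately is useful because they have different costs: a false accept lets a replayed voice through, while a false reject only forces a legitimate user to repeat the wake word.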
Fig. 4: Reality AI Tools training results.
Adapting this application example to your own VUI-based system will bring further customization needs, which are simpler to address when you use the Voice Anti-spoofing application example as a reference. For further information, you’ll find development resources on the Reality AI Tools page.
Renesas’ anti-spoofing application example demonstrates the capability of Reality AI Tools to address real-world challenges, improve user experience, and enhance voice user interface (VUI) systems with additional features. Our AI models have a small footprint and can be extended through additional data collection.