Voice Activity Detection (VAD) is the first component of voice control and voice assistant applications, yet it is often overlooked in system design. Voice pre-processing, Automatic Speech Recognition (ASR), Natural Language Processing (NLP), Keyword Spotting (KWS), and Wakeword Detection (WWD) are all important functions in such a system. Without a properly functioning VAD, however, they either consume unnecessary power (i.e. CPU resources) by processing non-speech audio, or fail badly.
A well-designed VAD should have the following features:
- Positively detect voice in high-noise, low signal-to-noise ratio (SNR) scenarios
- Positively detect voice in time-variant noise environments
- Not produce false positives on transient and non-stationary noise sources
- Be computationally efficient
Energy-level-based VADs are a sufficient choice for many voice control applications because of their low computational complexity and their ability to use a floating threshold based on the observed noise characteristics.
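A minimal sketch of an energy-based VAD with a floating threshold follows. It assumes the audio has already been split into fixed-length frames of float samples; the function names and the `margin_db` and `alpha` values are hypothetical illustrations, not VOCAL's implementation. The threshold is a running noise-floor estimate plus a fixed margin, and the floor is updated only on frames judged to be noise.

```python
import math
from typing import List, Optional, Sequence


def frame_energy_db(frame: Sequence[float]) -> float:
    """Mean-square energy of one frame in dB; a tiny floor avoids log10(0)."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_sq + 1e-12)


def energy_vad(
    frames: Sequence[Sequence[float]],
    margin_db: float = 6.0,  # hypothetical: dB above the noise floor that counts as speech
    alpha: float = 0.95,  # hypothetical: smoothing factor of the noise-floor tracker
) -> List[bool]:
    """Return one speech (True) / non-speech (False) decision per frame.

    The threshold floats: it is the current noise-floor estimate plus margin_db.
    The floor is initialised from the first frame (assumed to be noise only) and
    is updated only on frames classified as noise, so it follows slowly changing
    background noise without absorbing speech energy.
    """
    decisions: List[bool] = []
    noise_floor_db: Optional[float] = None
    for frame in frames:
        energy_db = frame_energy_db(frame)
        if noise_floor_db is None:
            noise_floor_db = energy_db
        is_speech = energy_db > noise_floor_db + margin_db
        if not is_speech:
            # Exponential smoothing of the noise floor on noise-only frames.
            noise_floor_db = alpha * noise_floor_db + (1.0 - alpha) * energy_db
        decisions.append(is_speech)
    return decisions
```

The whole decision rests on one fullband number per frame, which keeps it cheap. In this minimal form, a sudden rise in noise larger than `margin_db` would be flagged as speech indefinitely, so practical designs usually add hangover frames and let the floor creep upward during long "speech" runs.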
The accuracy of energy-level-based VADs begins to suffer as the SNR approaches 0 dB, which can cause the voice control system to fail. The waveform in the image below shows a noisy speech signal.
The fullband signal exhibits no observable increase in energy level that would indicate the presence of speech. However, a spectrogram of the same signal shows an observable speech signal at frequencies above 1000 Hz.
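The observation above can be illustrated by comparing fullband energy with energy restricted to a band above 1000 Hz. The sketch below is a hypothetical illustration, not VOCAL's method: it uses a naive DFT so that it needs only the standard library, and the function names are assumptions.

```python
import math
from typing import List, Sequence


def half_power_spectrum(frame: Sequence[float]) -> List[float]:
    """|X[k]|^2 for DFT bins k = 0..N/2 (naive O(N^2) DFT, adequate for one short frame)."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        power.append(re * re + im * im)
    return power


def fullband_energy(frame: Sequence[float]) -> float:
    """Sum of squared samples: the quantity a fullband energy VAD looks at."""
    return sum(x * x for x in frame)


def band_energy(frame: Sequence[float], fs: float, f_lo: float, f_hi: float) -> float:
    """Sum of DFT power in bins whose frequency k * fs / N lies in [f_lo, f_hi)."""
    n = len(frame)
    return sum(
        p for k, p in enumerate(half_power_spectrum(frame)) if f_lo <= k * fs / n < f_hi
    )
```

As a synthetic example (assumed signals, not the recording in the figure): at fs = 8000 Hz with a 256-sample frame, adding a 2 kHz tone at one tenth the amplitude of a strong 250 Hz tone raises `fullband_energy` by only about 1% (roughly 0.04 dB), while `band_energy(frame, 8000, 1000, 4000)` rises from essentially zero to about 164.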
Spectral flatness and cepstral features are two good feature sets on which to build detectors that efficiently detect speech in low-SNR scenarios.
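Both feature sets can be sketched in a few lines. The Python sketch below is illustrative only: the function names, the 60 to 400 Hz pitch search range, and the `eps` floor are assumptions, and a naive DFT is used to keep it self-contained. Spectral flatness is the ratio of the geometric mean to the arithmetic mean of the power spectrum; the cepstral feature is the largest value of the real cepstrum (inverse DFT of the log magnitude spectrum) within the pitch quefrency range.

```python
import math
from typing import List, Sequence, Tuple


def dft_power(frame: Sequence[float]) -> List[float]:
    """Power |X[k]|^2 for all N DFT bins (naive O(N^2) DFT, fine for one short frame)."""
    n = len(frame)
    out = []
    for k in range(n):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        out.append(re * re + im * im)
    return out


def spectral_flatness(frame: Sequence[float], eps: float = 1e-12) -> float:
    """Geometric mean / arithmetic mean of the power spectrum, in (0, 1].

    Approaches 1 for flat (noise-like) spectra and 0 for peaky, harmonic
    spectra such as voiced speech. DC is skipped; bins 1..N/2 are used.
    """
    power = dft_power(frame)
    bins = [p + eps for p in power[1 : len(frame) // 2 + 1]]
    geo = math.exp(sum(math.log(p) for p in bins) / len(bins))
    arith = sum(bins) / len(bins)
    return geo / arith


def cepstral_peak(
    frame: Sequence[float],
    fs: float,
    f0_min: float = 60.0,  # assumed lowest pitch of interest (Hz)
    f0_max: float = 400.0,  # assumed highest pitch of interest (Hz)
    eps: float = 1e-12,
) -> Tuple[float, int]:
    """Largest real-cepstrum value in the pitch range, as (value, quefrency in samples)."""
    n = len(frame)
    log_mag = [0.5 * math.log(p + eps) for p in dft_power(frame)]

    # The log magnitude spectrum of a real signal is real and even, so the
    # inverse DFT reduces to a cosine sum.
    def ceps(q: int) -> float:
        return sum(lm * math.cos(2 * math.pi * k * q / n) for k, lm in enumerate(log_mag)) / n

    q_lo, q_hi = int(fs / f0_max), min(int(fs / f0_min), n // 2)
    return max((ceps(q), q) for q in range(q_lo, q_hi + 1))
```

Voiced speech is harmonic, so it yields low flatness and a cepstral peak near its pitch period (for example, 40 samples for a 200 Hz pitch at fs = 8000 Hz), whereas noise-like frames yield comparatively high flatness and no distinct cepstral peak. A single-frame periodogram of white noise fluctuates, so its flatness typically sits around 0.5 to 0.6 rather than at 1. A detector would threshold these per-frame values; the thresholds are application-specific.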
VOCAL Technologies can offer a custom-designed VAD solution. VOCAL offers off-the-shelf and customizable audio processing modules designed to meet your specific audio requirements. Please contact us to learn more. VOCAL's software may be licensed standalone, as a library, or as part of a complete design. Our software libraries are optimized for leading microprocessors and DSPs from ARM, TI, ADI, Intel, AMD and other vendors.