Complete Communications Engineering

Voice Activity Detection (VAD) is the first component of voice control and voice assistance applications, yet it is often overlooked during system design. Voice pre-processing, Automatic Speech Recognition (ASR), Natural Language Processing (NLP), Keyword Spotting (KWS), and Wakeword Detection (WWD) are all important functions of the system, but without a properly functioning VAD they will either consume unnecessary power (i.e., CPU resources) or fail outright.


A well-designed VAD should have the following features:

Energy-level-based VADs are sufficient for many voice control applications because of their low computational complexity and their ability to use a floating threshold that adapts to the observed noise characteristics.
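As a rough illustration of the idea, the following is a minimal sketch of an energy-based detector with a floating threshold. It is not VOCAL's implementation. The function names, the 160-sample frame length, the 6 dB margin, and the noise-floor update rule are all assumptions chosen for the example.

```python
import math
import random


def frame_energy_db(frame):
    """Mean-square energy of one frame in dB (floored to avoid log(0))."""
    mean_square = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(mean_square, 1e-12))


def energy_vad(samples, frame_len=160, margin_db=6.0, rise=0.05):
    """Return one True/False speech decision per full frame.

    The noise floor estimate drops immediately to any quieter frame and
    creeps upward slowly during non-speech frames, so the decision
    threshold (noise floor + margin_db) floats with the background noise.
    All parameter values here are illustrative assumptions.
    """
    decisions = []
    noise_floor_db = None
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        e_db = frame_energy_db(samples[start:start + frame_len])
        if noise_floor_db is None:
            noise_floor_db = e_db  # assumption: the first frame is noise only
        is_speech = e_db > noise_floor_db + margin_db
        if e_db < noise_floor_db:
            noise_floor_db = e_db  # follow a falling noise level at once
        elif not is_speech:
            noise_floor_db += rise * (e_db - noise_floor_db)  # slow rise
        decisions.append(is_speech)
    return decisions


if __name__ == "__main__":
    # Synthetic test signal: quiet noise, a loud 200 Hz burst, quiet noise.
    rng = random.Random(0)
    quiet = [rng.gauss(0.0, 0.01) for _ in range(1600)]
    loud = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(1600)]
    print(energy_vad(quiet + loud + quiet))
```

Because the noise floor is frozen while a frame is classified as speech, this sketch would not recover from a permanent jump in background noise larger than the margin. A practical detector would need some additional recovery mechanism.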
The accuracy of energy-level-based VADs begins to suffer as the SNR approaches 0 dB, which can cause the voice control system to fail. The waveform in the image below shows a noisy speech signal.

VAD waveform

The fullband signal exhibits no observable increase in energy level that would indicate the presence of speech. However, a spectrogram of the same signal shows an observable speech signal at frequencies above 1000 Hz.

VAD spectrogram

Spectral flatness and cepstral features are two good feature sets on which to build detectors for efficient speech detection in low-SNR scenarios.
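To make the first of these features concrete, here is a minimal sketch that computes the spectral flatness of a single frame, defined as the ratio of the geometric mean to the arithmetic mean of the power spectrum. The function names and the direct DFT are assumptions made to keep the example self-contained. A real system would use an FFT and would typically smooth the measure over time before making a decision. Cepstral features are not shown.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """Power spectrum of a real frame via a direct DFT.

    Only bins 1 .. N/2 - 1 are returned (DC and Nyquist are skipped).
    The direct DFT is O(N^2) and is used here only to avoid dependencies.
    """
    n = len(frame)
    spectrum = []
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(frame, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum.

    Values near 1 indicate a flat, noise-like spectrum. Values near 0
    indicate energy concentrated in a few bins, as with tonal or
    harmonic signals.
    """
    power = [p + eps for p in power_spectrum(frame)]
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))


if __name__ == "__main__":
    rng = random.Random(1)
    noise = [rng.gauss(0.0, 1.0) for _ in range(256)]
    # 1000 Hz tone at an assumed 8000 Hz sampling rate, plus weak noise.
    tone = [math.sin(2 * math.pi * 1000 * n / 8000) + rng.gauss(0.0, 0.01)
            for n in range(256)]
    print("noise flatness:", spectral_flatness(noise))
    print("tone flatness: ", spectral_flatness(tone))
```

A detector built on this measure might flag frames whose flatness falls below a threshold as likely speech, since voiced speech has harmonic structure. Any such threshold is an assumption that would need tuning against the actual noise environment.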

VOCAL Technologies can offer a custom-designed VAD solution. VOCAL offers off-the-shelf and customizable audio processing modules designed to meet your specific audio requirements. Please contact us to learn more. VOCAL’s software may be licensed standalone, as a library, or as part of a complete design. Our software libraries are optimized for leading microprocessors and DSPs from ARM, TI, ADI, Intel, AMD and other vendors.

Related Information

Platforms

VOCAL’s optimized software is available for the following platforms. Please contact us for the specific platforms supported for noise reduction.

Processors
  • Texas Instruments – C6xx (TMS320C62x, TMS320C64x, TMS320C645x, TMS320C66x, TMS320C67x), DaVinci, OMAP, C5xx (TMS320C54x, TMS320C55x)
  • Analog Devices – Blackfin, ADSP-21xx, TigerSHARC, SHARC
  • PowerPC, PowerQUICC
  • MIPS – MIPS32, MIPS64, MIPS4Kc
  • ARM – ARM7, ARM9, ARM9E, ARM10E, ARM11, StrongARM, ARM Cortex-A8/A9/A15, Cortex-M3/M4
  • Intel / AMD – x86, x64 (both 32 and 64 bit modes)

Operating Systems
  • Linux, uClinux, BSD, Unix
  • Microsoft Windows ACM / RTC / CE / Mobile
  • Apple iOS / iPhone / iPad & MacOS
  • eCOS / eCOSPro
  • Google Android
  • Green Hills Integrity
  • Micrium μCOS
  • Symbian
  • Wind River VxWorks
  • VOCAL LANsEND