
Voice-Activity-Detection (VAD) Feature Set

Many features have been used successfully for voice activity detection in practice. The VOCAL Technologies Acoustic Library provides a robust and reliable VAD module that includes the following feature set.

Energy based

The decision mechanism is implemented by the following logic,

VAD = \{\begin{matrix} 1, & if E_s>E_l \\ 0, & otherwise\\ \end{matrix}

where E_s and E_l are the short- and long-term energy values. The short-term energy can be calculated frame by frame in the time domain.

E_s=\frac{1}{N}\sum_{t\ =\ 1}^{N}{x^2\left(t\right)}

where N is the frame length.

The long-term energy can be obtained from a frame-by-frame moving average with an adjustable decay parameter, as below,

E_l=\left(1-\gamma\right)E_s+\gamma\ E_l^{OLD}

where E_l^{OLD} is the long-term energy from the previous frame and \gamma is the decay parameter.
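The energy-based rule above can be sketched in a few lines of Python. This is only an illustration of the formulas in this section, not the library's code: the function names, the default gamma of 0.9, and the initial long-term energy are assumptions chosen for the example.

```python
def short_term_energy(frame):
    """E_s = (1/N) * sum of x(t)^2 over one frame of N samples."""
    return sum(x * x for x in frame) / len(frame)


def energy_vad(frames, gamma=0.9, initial_long_term=0.0):
    """Return one 0/1 decision per frame using VAD = 1 if E_s > E_l.

    gamma and initial_long_term are illustrative values (assumptions),
    not parameters documented for the library.
    """
    long_term = initial_long_term
    decisions = []
    for frame in frames:
        e_s = short_term_energy(frame)
        # Moving average: E_l = (1 - gamma) * E_s + gamma * E_l_old
        long_term = (1.0 - gamma) * e_s + gamma * long_term
        decisions.append(1 if e_s > long_term else 0)
    return decisions


# Two quiet frames, one loud frame, then quiet again.
frames = [[0.01] * 4, [0.01] * 4, [1.0] * 4, [0.01] * 4]
print(energy_vad(frames, gamma=0.9, initial_long_term=0.01))
```

Since E_s - E_l = \gamma (E_s - E_l^{OLD}), the comparison is equivalent to testing the current frame against the previous long-term energy whenever \gamma > 0. A larger \gamma makes the long-term estimate adapt more slowly.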

Correlation based

The decision logic is

VAD = \{\begin{matrix} 1, & if R\left(\tau\right)>\lambda \\ 0, & otherwise\\ \end{matrix}

where R\left(\tau\right) is the normalized correlation function at delay \tau and \lambda is a threshold. The correlation can be calculated frame by frame in the time domain.

R\left(\tau\right)=\frac{\sum_{t\ =\ 1}^{N}x\left(t\right)x\left(t+\tau\right)}{\sum_{t\ =\ 1}^{N}{x^2\left(t\right)}}

where N is the frame length.
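As a rough sketch of the correlation measure, the Python below evaluates R(\tau) exactly as written above. The function names, the default threshold of 0.5, and the choice to return 0 for an all-zero frame are assumptions for illustration; the buffer must hold N + \tau samples so that x(t + \tau) is available for every t.

```python
import math


def normalized_correlation(samples, tau, n):
    """R(tau) = sum x(t) x(t + tau) / sum x(t)^2 over n samples.

    samples must hold at least n + tau values.
    """
    if len(samples) < n + tau:
        raise ValueError("need at least n + tau samples")
    num = sum(samples[t] * samples[t + tau] for t in range(n))
    den = sum(samples[t] ** 2 for t in range(n))
    # Assumption: treat a frame with zero energy as uncorrelated.
    return num / den if den > 0.0 else 0.0


def correlation_vad(samples, tau, n, threshold=0.5):
    """VAD = 1 if R(tau) > threshold, else 0 (threshold is illustrative)."""
    return 1 if normalized_correlation(samples, tau, n) > threshold else 0


# A tone with a period of 8 samples correlates strongly at a delay of 8.
periodic = [math.sin(2 * math.pi * t / 8) for t in range(24)]
print(correlation_vad(periodic, tau=8, n=16))
```

In practice the delay \tau would typically be searched over a range of plausible pitch periods rather than fixed; the single-delay form here simply mirrors the formula.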

Spectrum Flatness

The decision logic is

VAD = \{\begin{matrix} 1, & if SF>\lambda \\ 0, & otherwise\\ \end{matrix}

where SF is the spectrum flatness measure. It can be calculated in the frequency domain from G_m and A_m, the geometric and arithmetic means of the signal spectrum.
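The sketch below computes a flatness value from the two means, assuming the conventional ratio G_m / A_m taken over the power spectrum. That definition is an assumption, as are the function names and the small floor that keeps the logarithm finite; the library's exact formula may differ, and the direction of the threshold test depends on which definition is used. A naive DFT is used only to keep the example free of dependencies.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (O(N^2), for clarity only)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(frame, floor=1e-12):
    """Assumed definition: geometric mean / arithmetic mean of the power spectrum."""
    power = [max(v, floor) for v in power_spectrum(frame)]
    geometric = math.exp(sum(math.log(v) for v in power) / len(power))
    arithmetic = sum(power) / len(power)
    return geometric / arithmetic


# An impulse has a flat spectrum; a pure tone concentrates power in one bin.
impulse = [1.0] + [0.0] * 31
tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
print(spectral_flatness(impulse), spectral_flatness(tone))
```

With this ratio, values near 1 indicate a flat, noise-like spectrum and values near 0 indicate a peaky, tonal one.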

Cepstral Based

The idea is that energy-based VAD fails in high-noise applications, whereas the cepstral vector is less prone to error under high noise. The decision logic is similar,

VAD = \{\begin{matrix} 1, & if \Delta C_s>\Delta C_l \\ 0, & otherwise\\ \end{matrix}

where \Delta C_s\ and\ \Delta C_l are the short-term and long-term differential cepstral vectors. They can be computed from the frequency-domain representation of the signal.
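The text does not say how the differential cepstral vectors are formed or compared, so the following Python is only one possible reading, with every specific choice an assumption: the real cepstrum is taken as the inverse DFT of the log magnitude spectrum, the short-term measure is the Euclidean distance between the cepstra of consecutive frames, and the long-term measure is a moving average of that distance. All function names and parameter values are hypothetical.

```python
import cmath
import math


def real_cepstrum(frame, num_coeffs=8, floor=1e-12):
    """First num_coeffs real-cepstrum coefficients via a naive DFT."""
    n = len(frame)
    log_mag = []
    for k in range(n):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(frame))
        log_mag.append(math.log(max(abs(acc), floor)))
    # The log magnitude of a real signal is symmetric in k, so its
    # inverse DFT is real and only the cosine terms remain.
    return [sum(v * math.cos(2 * math.pi * k * q / n)
                for k, v in enumerate(log_mag)) / n
            for q in range(num_coeffs)]


def cepstral_vad(frames, gamma=0.9):
    """Assumed rule: flag a frame when its cepstral change exceeds the
    moving average of past changes."""
    decisions = []
    previous = None
    long_term = 0.0
    for frame in frames:
        cep = real_cepstrum(frame)
        delta = 0.0 if previous is None else math.dist(cep, previous)
        long_term = (1.0 - gamma) * delta + gamma * long_term
        decisions.append(1 if delta > long_term else 0)
        previous = cep
    return decisions


# Two identical frames followed by a frame with a different spectral shape.
a = [1.0] + [0.0] * 15
b = [1.0, 0.5] + [0.0] * 14
print(cepstral_vad([a, a, b]))
```

Because the cepstrum is built from log magnitudes, a uniform gain change mostly shifts the zeroth coefficient, which is one reason cepstral distances tend to be less sensitive to level than raw energy.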

Besides the features above, VOCAL Technologies also incorporates other, simpler measures in the library's VAD module, such as spectral peak information and energy ratio.

VOCAL Technologies, Ltd.
520 Lee Entrance, Suite 202
Amherst New York 14228
Phone: +1-716-688-4675
Fax: +1-716-639-0713