
Voice activity detector in highly degraded environments

In highly degraded environments, the objective is sometimes only to detect automatically the presence or absence of speech, or simply of a human actor. This scenario arises in military aircraft, cargo trains and military combat environments. In such scenarios, the conventional approach of applying an energy detector or a zero-crossing count to the input signals will not suffice, because the signal energy is buried beneath the noise floor. We consider instead an aggressive adaptive noise reduction approach that raises the speech profile above the noise floor so that it can be detected. Note that the objective here is not speech recognition, but merely detection.
Consider a far-field source impinging on two microphones, as shown in Figure 1 below:
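For reference, the two conventional features mentioned above can be sketched in a few lines of Python. This is a minimal illustration rather than a production detector: the function names, the 160-sample frame length and the -20 dB threshold are assumptions chosen for the example.

```python
import math

def frame_energy_db(frame):
    """Mean-square energy of one frame in dB (small floor avoids log(0))."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def split_frames(samples, frame_len):
    """Cut the signal into non-overlapping frames, dropping any short tail."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]

def energy_vad(samples, frame_len, threshold_db):
    """Flag a frame as active when its energy exceeds a fixed threshold."""
    return [frame_energy_db(f) > threshold_db for f in split_frames(samples, frame_len)]

# Clean-signal example: silence, a tone burst, then silence again.
tone = [0.5 * math.sin(2.0 * math.pi * k / 16.0) for k in range(160)]
samples = [0.0] * 160 + tone + [0.0] * 160
print(energy_vad(samples, 160, -20.0))  # [False, True, False]
```

A fixed threshold of this kind works on a clean signal, but once the noise energy exceeds the speech energy the active and inactive frames become nearly indistinguishable, which is the failure mode described above.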

Figure 1: Two-microphone array


Suppose the signal at each microphone i \in \{1,2\} is given as:

x_i(t,w) = s(t,w) e^{\left(-jw (i-1)\frac{d}{c} \cos{\theta} \right)} +n_i(t,w)

where s(t,w) is the desired time-frequency speech signal, \theta is the direction of arrival (DOA) of the speech signal with respect to endfire, d is the microphone spacing, c is the speed of sound, and n_i(t,w) is uncorrelated noise.
Because of the noise level, the direction of the signal cannot be recovered from the two microphones, leading to an expected angle of arrival \hat{\theta} = 0. The signals can therefore be approximated as
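The microphone model above can be evaluated directly for a single time-frequency bin. The sketch below is illustrative only: the function names are hypothetical, d is taken to be the microphone spacing in metres, c the speed of sound (assumed to be 343 m/s), and w the angular frequency in rad/s.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def steering_phase(i, w, d, theta, c=SPEED_OF_SOUND):
    """Phase factor exp(-j w (i-1) (d/c) cos(theta)) seen at microphone i."""
    return cmath.exp(-1j * w * (i - 1) * (d / c) * math.cos(theta))

def mic_signal(i, s, n_i, w, d, theta, c=SPEED_OF_SOUND):
    """x_i = s * phase_i + n_i for one time-frequency bin."""
    return s * steering_phase(i, w, d, theta, c) + n_i

# Example: a 1 kHz bin, 3.43 cm spacing, source at endfire (theta = 0).
w = 2.0 * math.pi * 1000.0
x1 = mic_signal(1, 1.0 + 0.0j, 0.0, w, 0.0343, 0.0)
x2 = mic_signal(2, 1.0 + 0.0j, 0.0, w, 0.0343, 0.0)
print(cmath.phase(x1), cmath.phase(x2))
```

Microphone 1 is the phase reference, so its factor is always 1. For a source at broadside (\theta = 90 degrees) the factor at microphone 2 is also 1, because the wavefront reaches both microphones at the same time.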

x_i(t,w) \approx \hat{n_i}(t,w) +(i-1) \hat{s}(t,w)

where \hat{n_i}(t,w) denotes the noise and \hat{s}(t,w) denotes the correlated speech signal, treated as noise. An adaptive noise reduction algorithm takes x_1(t,w) as a noise reference. Figure 2 below illustrates an example with an initial SNR of \approx -6.5 dB and an SNR improvement of \approx 26 dB, yielding a final SNR of \approx 19.5 dB.
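The noise-reference arrangement can be illustrated with a classical time-domain adaptive noise canceller. The text does not name the adaptive algorithm, so normalized LMS (NLMS) is swapped in here as one common choice. The sketch also assumes that the noise in the primary channel is a filtered copy of the reference noise, which is what this kind of canceller needs in order to work. The filter `h`, the sinusoidal stand-in for speech, the step size and the tap count are all hypothetical, and the resulting figures are not those of Figure 2.

```python
import math
import random

def nlms_cancel(primary, reference, taps=8, mu=0.05, eps=1e-8):
    """Subtract the part of `primary` predictable from `reference` (NLMS)."""
    weights = [0.0] * taps
    window = [0.0] * taps            # most recent reference samples, newest first
    output = []
    for p, x in zip(primary, reference):
        window = [x] + window[:-1]
        estimate = sum(wt * v for wt, v in zip(weights, window))
        error = p - estimate         # error signal = enhanced output
        norm = eps + sum(v * v for v in window)
        weights = [wt + mu * error * v / norm for wt, v in zip(weights, window)]
        output.append(error)
    return output

def power(x):
    return sum(v * v for v in x) / len(x)

def snr_db(signal, noise):
    return 10.0 * math.log10(power(signal) / power(noise))

def run_demo(n=8000, target_snr_db=-6.5, seed=0):
    rng = random.Random(seed)
    ref = [rng.gauss(0.0, 1.0) for _ in range(n)]   # plays the role of x_1
    h = [0.8, -0.4, 0.2]                            # hypothetical noise path
    noise = [sum(h[j] * ref[k - j] for j in range(len(h)) if k - j >= 0)
             for k in range(n)]
    amp = math.sqrt(2.0 * power(noise) * 10.0 ** (target_snr_db / 10.0))
    speech = [amp * math.sin(2.0 * math.pi * 0.05 * k) for k in range(n)]
    primary = [s + v for s, v in zip(speech, noise)]  # plays the role of x_2
    out = nlms_cancel(primary, ref)
    half = n // 2                                   # skip the convergence phase
    residual = [e - s for e, s in zip(out[half:], speech[half:])]
    return snr_db(speech[half:], noise[half:]), snr_db(speech[half:], residual)

snr_in, snr_out = run_demo()
print(f"input SNR {snr_in:.1f} dB, output SNR {snr_out:.1f} dB")
```

The small step size is a deliberate choice: the speech component stays in the error signal and perturbs the weights, so a larger `mu` converges faster but leaves more residual noise.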


Figure 2: Aggressive adaptive noise reduction output

With the SNR raised from about -6.5 dB to about 19.5 dB, a detector applied after noise reduction is far superior to one applied to the raw input.
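A rough calculation shows why. Assuming speech and noise are uncorrelated, so that their powers add, the rise in frame energy when speech appears depends only on the SNR. The helper name below is hypothetical. The SNR values are those quoted for the example above.

```python
import math

def energy_rise_db(snr_db):
    """Increase in frame energy (dB) when speech is added to the noise."""
    return 10.0 * math.log10(1.0 + 10.0 ** (snr_db / 10.0))

snr_before = -6.5                 # dB, before noise reduction
snr_after = snr_before + 26.0     # dB, after a 26 dB improvement (19.5 dB)

print(round(energy_rise_db(snr_before), 2))  # about 0.88 dB
print(round(energy_rise_db(snr_after), 2))   # about 19.55 dB
```

Before processing, speech raises the frame energy by less than 1 dB, which is easily masked by ordinary fluctuations in the noise level. After processing the rise is close to 20 dB, so even a simple energy threshold separates the two cases.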

VOCAL Technologies offers custom-designed solutions for robust voice activity detection, beamforming, acoustic echo cancellation and noise suppression. Our custom implementations of such systems are meant to deliver optimum performance for your specific beamforming task. Contact us today to discuss your solution!


VOCAL Technologies, Ltd.
520 Lee Entrance, Suite 202
Amherst New York 14228
Phone: +1-716-688-4675
Fax: +1-716-639-0713