Voice activity detector in highly degraded environments

In highly degraded environments, the objective is sometimes only to detect automatically the presence or absence of speech, or simply of a human actor. This scenario arises in military aircraft, cargo trains, and military combat environments. In such scenarios, the conventional approach of applying an energy detector or a zero-crossing count to the input signals will not suffice, because the signal energy is buried beneath the noise floor. We consider the use of an aggressive adaptive noise reduction approach to raise the speech profile above the noise floor for detection. Note that the objective here is not speech recognition, but merely detection.
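
For reference, the conventional detectors mentioned above can be sketched as follows. This is a minimal illustration, not a production detector: the frame length, hop size, and the -30 dB threshold are hypothetical values chosen for the example, and the function names are not taken from any particular library.

```python
import math

def frames(x, frame_len, hop):
    """Split a sequence of samples into overlapping frames."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(v * v for v in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum((a < 0) != (b < 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)

def energy_vad(x, frame_len=256, hop=128, threshold_db=-30.0):
    """Flag each frame whose energy exceeds a fixed threshold.

    threshold_db is a hypothetical value, not one given in the text.
    """
    decisions = []
    for f in frames(x, frame_len, hop):
        # The small constant avoids log10(0) on silent frames.
        energy_db = 10.0 * math.log10(short_time_energy(f) + 1e-12)
        decisions.append(energy_db > threshold_db)
    return decisions
```

A fixed threshold of this kind only separates speech from silence when the speech energy stands clear of the noise energy, which is exactly the condition that fails at a negative SNR.
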
Consider a far-field source impinging on two microphones, as shown in Figure 1 below:

Figure 1: Two microphones

Suppose the signal at each microphone $i \in \{1,2\}$ is given as:

$x_i(t,w) = s(t,w) e^{\left(-jw (i-1)\frac{d}{c} \cos{\theta} \right)} +n_i(t,w)$

where $s(t,w)$ is the desired time-frequency speech signal, $\theta$ is the direction of arrival (DOA) of the speech signal with respect to endfire, and $n_i(t,w)$ is uncorrelated noise.
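
The microphone model above can be evaluated for a single time-frequency bin as in the sketch below. The speed of sound `c = 343.0` m/s, the complex Gaussian noise model, and the function names are assumptions made for illustration; the text specifies none of them.

```python
import cmath
import math
import random

def steering_phase(w, mic_index, d, theta, c=343.0):
    """Phase term exp(-j w (i-1) (d/c) cos(theta)) for microphone i (1-based)."""
    return cmath.exp(-1j * w * (mic_index - 1) * (d / c) * math.cos(theta))

def mic_observation(s, w, mic_index, d, theta, noise_std=0.0, c=343.0, rng=random):
    """x_i(t, w) for one bin: phase-shifted speech plus complex noise.

    s         complex speech value s(t, w) for this bin
    w         angular frequency in rad/s
    d         microphone spacing in metres
    theta     direction of arrival in radians, measured from endfire
    noise_std standard deviation of the (assumed Gaussian) noise
    """
    noise = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std)) / math.sqrt(2)
    return s * steering_phase(w, mic_index, d, theta, c) + noise
```

With `noise_std=0.0` the first microphone returns `s` unchanged and the second returns `s` rotated by the inter-microphone phase, which is the only place the DOA enters the model.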
At this noise level, the direction of arrival cannot be recovered from the two microphone signals, leading to an expected angle of arrival $\hat{\theta} = 0$. Thus the signals can be transformed into

$x_i(t,w) \approx \hat{n_i}(t,w) +(i-1) \hat{s}(t,w)$

where $\hat{n_i}(t,w)$ denotes the noise and $\hat{s}(t,w)$ denotes the speech signal, which appears as a correlated noise component. An adaptive noise reduction algorithm then treats $x_1(t,w)$ as a noise reference. Figure 2 below illustrates an example with an initial SNR of $\approx -6.5$ dB and an SNR improvement of $\approx 26$ dB, yielding a final SNR of $\approx 19.5$ dB.

Figure 2: Aggressive adaptive noise reduction output

With the speech raised this far above the noise, a detector applied after noise reduction (post-processing) is clearly far superior to one applied to the raw input (pre-processing).
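
The text does not name the adaptive algorithm used. As one common choice, the sketch below uses a time-domain normalized LMS (NLMS) canceller with the first microphone as the noise reference. Everything in `demo` is an assumption made for illustration: the three-tap noise path between the microphones, the sinusoid standing in for speech, the filter length, and the step size. It also assumes the noise at the second microphone is a filtered copy of the reference, which a canceller of this kind needs in order to work. It is not intended to reproduce the SNR figures quoted above.

```python
import math
import random

def nlms_cancel(primary, reference, n_taps=8, mu=0.05, eps=1e-8):
    """Remove from `primary` the component predictable from `reference` (NLMS)."""
    w = [0.0] * n_taps
    out = [0.0] * len(primary)
    for n in range(n_taps - 1, len(primary)):
        u = reference[n - n_taps + 1:n + 1][::-1]   # most recent sample first
        e = primary[n] - sum(wk * uk for wk, uk in zip(w, u))  # enhanced output
        norm = sum(uk * uk for uk in u) + eps       # input power normalization
        w = [wk + mu * e * uk / norm for wk, uk in zip(w, u)]
        out[n] = e
    return out

def snr_db(signal, noise):
    """Ratio of signal energy to noise energy, in dB."""
    return 10.0 * math.log10(sum(v * v for v in signal) / sum(v * v for v in noise))

def demo(seed=0, n=20000):
    """Return (input SNR, output SNR) in dB for a synthetic example."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]        # x_1: noise reference
    path = [0.8, -0.3, 0.1]                                # hypothetical noise path
    noise_at_2 = [sum(path[k] * noise[i - k] for k in range(len(path)) if i - k >= 0)
                  for i in range(n)]
    speech = [0.58 * math.sin(2.0 * math.pi * 0.01 * i) for i in range(n)]  # stand-in
    primary = [s + v for s, v in zip(speech, noise_at_2)]  # x_2: speech plus noise
    out = nlms_cancel(primary, noise)
    half = n // 2                                          # skip the convergence period
    residual = [o - s for o, s in zip(out[half:], speech[half:])]
    return snr_db(speech[half:], noise_at_2[half:]), snr_db(speech[half:], residual)
```

Calling `demo()` returns the SNR before and after cancellation, so the improvement is their difference in dB. A smaller `mu` reduces the residual error after convergence at the cost of slower adaptation.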

VOCAL Technologies offers custom-designed solutions for robust voice activity detection, beamforming, acoustic echo cancellation, and noise suppression. Our custom implementations of such systems are meant to deliver optimum performance for your specific beamforming task. Contact us today to discuss your solution!
