Eigen Space Voice Activity Detector For Multiple Mics

Voice activity detectors (VADs) based on received signal energy are widely employed in broadband acoustic systems. They have a potential drawback in scenarios with high-energy ambient noise. Cross-correlation-based VADs also have a drawback: because the microphones are close to each other, their noise samples are correlated to some degree. An alternative approach is to use a combination of eigenvalues and coherence. The case of two microphones is discussed below.
Consider a far-field acoustic signal impinging on two microphones, separated by a distance $d$, at an angle of $\theta^{\circ}$. The signal at microphone $i$, $x_i$, can be written as

$x_i(t) = s(t - \tau_i) + \nu_i(t-\hat{\tau}_i), i \in \{1, 2\}$

where $\tau_i = \frac{d}{c} \sin{\theta}$ is the delay of the desired signal at microphone $i$, if present; $\hat{\tau}_i = \frac{d}{c} \sin{\beta}$ is the delay of the noise signal at microphone $i$, with expectation $\mathrm{E}[\beta] = 0$; $s(t)$ is the source signal; $\nu_i(t)$ is the noise; and $c$ is the speed of sound.
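As a rough numerical illustration of the delay term, the sketch below evaluates $\frac{d}{c} \sin{\theta}$ and builds a synthetic two-microphone frame. The function names, the 343 m/s speed of sound, the 5 cm spacing, and the independent Gaussian noise are all assumptions made for the example and do not come from the model above, which also allows a delayed noise term. Microphone 1 is taken as the zero-delay reference.

```python
import math

import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def inter_mic_delay(d, angle_deg, c=SPEED_OF_SOUND):
    """Far-field delay in seconds: (d / c) * sin(angle)."""
    return d / c * math.sin(math.radians(angle_deg))


def delay_signal(x, tau, fs):
    """Circularly delay x by tau seconds via a linear phase shift in frequency."""
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum = np.fft.rfft(x) * np.exp(-2j * np.pi * freqs * tau)
    return np.fft.irfft(spectrum, n=n)


def simulate_two_mics(s, tau, fs, noise_std, rng):
    """Return (x1, x2) with microphone 1 as the zero-delay reference.

    Noise is independent white Gaussian per microphone (a simplifying assumption).
    """
    x1 = s + noise_std * rng.standard_normal(len(s))
    x2 = delay_signal(s, tau, fs) + noise_std * rng.standard_normal(len(s))
    return x1, x2


# Example: 5 cm spacing, source at 30 degrees, 16 kHz sampling (all assumed values).
fs = 16000
tau = inter_mic_delay(0.05, 30.0)
rng = np.random.default_rng(0)
s = np.sin(2 * np.pi * 200.0 * np.arange(1024) / fs)
x1, x2 = simulate_two_mics(s, tau, fs, noise_std=0.01, rng=rng)
print(f"delay = {tau * 1e6:.1f} microseconds ({tau * fs:.2f} samples)")
```

The phase-shift delay is circular, which is acceptable for a short synthetic frame but would need proper buffering on a real stream.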

Both $s(t)$ and $\nu_i(t)$ are zero-mean ergodic processes. The decision problem is to determine whether a given frame contains the signal or only noise.

We utilize the imaginary part of the coherence between the two microphone signals, $i\Gamma_{x_1,x_2}$, given by

$i\Gamma_{x_1,x_2} = \alpha_o(\omega) (\alpha_1(\omega) \sin{(\omega \tau)} + \sin{(\omega \hat{\tau})})$
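One way to estimate this quantity from sampled data is sketched below. It averages windowed cross- and auto-spectra over sub-frames (a Welch-style estimate) and takes the imaginary part of the normalized cross-spectrum. The function name, the Hann window, and the sub-frame sizes are assumptions for illustration; the article does not specify an estimator. The cross-spectrum is formed as $X_1 X_2^{*}$, so a signal that reaches microphone 2 later yields a positive $\sin{(\omega \tau)}$ at low frequencies.

```python
import numpy as np


def imaginary_coherence(x1, x2, nfft=256, hop=128):
    """Estimate Im{coherence} per frequency bin from averaged sub-frame spectra."""
    window = np.hanning(nfft)
    bins = nfft // 2 + 1
    s11 = np.zeros(bins)                 # auto-spectrum of x1
    s22 = np.zeros(bins)                 # auto-spectrum of x2
    s12 = np.zeros(bins, dtype=complex)  # cross-spectrum X1 * conj(X2)
    for start in range(0, len(x1) - nfft + 1, hop):
        f1 = np.fft.rfft(window * x1[start:start + nfft])
        f2 = np.fft.rfft(window * x2[start:start + nfft])
        s11 += np.abs(f1) ** 2
        s22 += np.abs(f2) ** 2
        s12 += f1 * np.conj(f2)
    eps = np.finfo(float).tiny  # guards against division by zero on silent input
    coherence = s12 / np.sqrt(s11 * s22 + eps)
    return coherence.imag


# Example: a common source delayed by one sample versus two independent noises.
rng = np.random.default_rng(0)
s = rng.standard_normal(8192)
im_delayed = imaginary_coherence(s, np.roll(s, 1))
im_noise = imaginary_coherence(rng.standard_normal(8192), rng.standard_normal(8192))
print(np.mean(np.abs(im_delayed)), np.mean(np.abs(im_noise)))
```

Averaging over several sub-frames matters here: with a single sub-frame the coherence magnitude is identically one and carries no information.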

and note that for a pure noise signal, $i\Gamma_{x_1,x_2} \approx 0$. Furthermore, the largest eigenvalue of the frame covariance matrix will be orders of magnitude larger than the smallest eigenvalue. The largest eigenvalue will correspond to the speech signal, if present. Denote the eigenvalues as $\lambda_{max}$ and $\lambda_{min}$. We form a metric

$M_{x_1,x_2} = \lambda_{max} \, i\Gamma_{x_1,x_2}$

and compare the metric to the noise floor, which is a function of the $\lambda_{min}$ values of previous noise-only frames. A sample of the performance of this VAD is shown in the figure below.

Result from VAD
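The metric and noise-floor comparison described above can be sketched as follows. Several details are assumptions, since the article does not give them: the frequency-dependent imaginary coherence is reduced to a scalar by averaging its absolute value over bins, the noise floor is a scaled running mean of recent $\lambda_{min}$ values, and the first frame is treated as noise to bootstrap the floor. The class name `EigenCoherenceVad` and its parameters are hypothetical.

```python
from collections import deque

import numpy as np


def frame_eigenvalues(x1, x2):
    """Return (lambda_min, lambda_max) of the 2x2 frame covariance matrix."""
    cov = np.cov(np.vstack([x1, x2]))
    lam_min, lam_max = np.linalg.eigvalsh(cov)  # eigvalsh returns ascending order
    return float(lam_min), float(lam_max)


class EigenCoherenceVad:
    """Frame-level decision: lambda_max * |Im{coherence}| versus a noise floor."""

    def __init__(self, scale=1.0, history=20):
        self.scale = scale                            # assumed threshold scaling
        self.noise_lambda_min = deque(maxlen=history)  # lambda_min of noise frames

    def noise_floor(self):
        """Scaled mean of stored lambda_min values, or None before any frame."""
        if not self.noise_lambda_min:
            return None
        return self.scale * sum(self.noise_lambda_min) / len(self.noise_lambda_min)

    def process(self, x1, x2, im_coherence):
        """Return (is_speech, metric) for one frame and update the noise floor."""
        lam_min, lam_max = frame_eigenvalues(x1, x2)
        # Assumed aggregation: mean absolute imaginary coherence over bins.
        metric = lam_max * float(np.mean(np.abs(im_coherence)))
        floor = self.noise_floor()
        is_speech = floor is not None and metric > floor
        if not is_speech:
            # Only noise-only frames contribute to the floor.
            self.noise_lambda_min.append(lam_min)
        return is_speech, metric
```

In use, `im_coherence` would be the per-bin imaginary coherence estimated for the same frame, and `scale` would be tuned on recorded noise for the target array.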

VOCAL Technologies offers custom designed direction of arrival estimation solutions for beamforming with a robust voice activity detector, acoustic echo cancellation and noise suppression. Our custom implementations of such systems are meant to deliver optimum performance for your specific beamforming task. Contact us today to discuss your solution!
