The performance of automatic speech recognition (ASR) engines in vehicle environments is degraded by ambient vehicle noise. Whereas beamforming steers a narrow beam toward the target signal, nullforming addresses the problem by placing a sharp null toward the unwanted signal. Consider the two-microphone array topology illustrated in Figure 1.

Figure 1: Two-microphone array

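Two differential signals are synthesized in the time-frequency domain: $x_1(w) = s(w)\left(1-e^{-jw \frac{d}{c}(1+\cos{\theta})}\right)$

and $x_2(w) = s(w)\left(e^{-jw \frac{d}{c}}-e^{-jw \frac{d}{c}\cos{\theta}}\right)$

Here $s(w)$ is the source spectrum at angular frequency $w$, $d$ is the microphone spacing, $c$ is the speed of sound, and $\theta$ is the direction of arrival of the source. Note that $x_1(w)$ vanishes at $\theta = \pi$ and $x_2(w)$ vanishes at $\theta = 0$, so each differential signal already carries a fixed null.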
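The two differential signals can be evaluated numerically to see where their fixed nulls fall. The following is a minimal sketch rather than a reference implementation: the function name `differential_signals` and the example values (a 1 kHz tone, 2 cm spacing, 343 m/s speed of sound) are assumptions chosen for illustration and do not come from the text above.

```python
import numpy as np

def differential_signals(s, omega, d, c, theta):
    """Evaluate x1 and x2 for source spectrum s at angular frequency omega.

    d is the assumed microphone spacing (m), c the speed of sound (m/s),
    and theta the direction of arrival (rad).
    """
    tau = d / c  # acoustic travel time between the two microphones
    x1 = s * (1.0 - np.exp(-1j * omega * tau * (1.0 + np.cos(theta))))
    x2 = s * (np.exp(-1j * omega * tau) - np.exp(-1j * omega * tau * np.cos(theta)))
    return x1, x2

# Illustrative values only (assumptions, not from the article)
omega = 2 * np.pi * 1000.0  # 1 kHz
d, c = 0.02, 343.0          # 2 cm spacing, speed of sound in air

# Source directly behind the array (theta = pi): x1 is nulled
x1_back, x2_back = differential_signals(1.0, omega, d, c, np.pi)
# Source directly in front of the array (theta = 0): x2 is nulled
x1_front, x2_front = differential_signals(1.0, omega, d, c, 0.0)

print(abs(x1_back), abs(x2_back))
print(abs(x1_front), abs(x2_front))
```

For any other angle both signals are nonzero, which is why an adaptive weight is needed to steer the null toward an arbitrary direction.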

An optimal adaptive filter is desired to extract the speech signals using: $y(w) = x_1(w) - W(w) x_2(w)$

where $W(w) \in \mathbb{R}$. The weight is chosen to minimize the output power, which in the noise-free case can be driven to zero: $J(w) = \mathbb{E} [y(w) y(w)^*] =0$

Here, $^*$ denotes the complex conjugate. Expanding the product with real-valued $W(w)$ gives $J(w) = \Phi_{x_1(w),x_1(w)} - W(w)\Phi_{x_1(w),x_2(w)} - W(w)\Phi^*_{x_1(w),x_2(w)} +|W(w)|^2\Phi_{x_2(w),x_2(w)}$

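where $\Phi_{x_i(w),x_j(w)} = \mathbb{E}[x_i(w) x^*_j(w)]$ is the cross-spectral density of $x_i$ and $x_j$. Differentiating $J(w)$ with respect to $W(w)$ gives: $\frac{\partial{J(w)}}{\partial{W(w)}} = -2\mathbb{R}e\{\Phi_{x_1(w),x_2(w)}\} +2 W(w) \Phi_{x_2(w),x_2(w)}$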

Setting the derivative to zero yields the optimal weight: $W_{opt}(w) = \frac{\mathbb{R}e\{\Phi_{x_1(w),x_2(w)}\} }{\Phi_{x_2(w),x_2(w)}}$
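The closed-form weight can be checked on synthetic data. The sketch below assumes a single noise-free source at one frequency bin, replaces the expectations $\mathbb{E}[\cdot]$ with averages over frames, and uses a hypothetical helper `optimal_weight`; the numeric values (1 kHz, 2 cm spacing, 60 degree arrival angle, 256 frames) are illustrative assumptions and not taken from the text above.

```python
import numpy as np

def optimal_weight(x1, x2):
    """Estimate W_opt = Re{Phi_12} / Phi_22 from frames of one frequency bin."""
    phi_12 = np.mean(x1 * np.conj(x2))  # sample estimate of E[x1 x2*]
    phi_22 = np.mean(np.abs(x2) ** 2)   # sample estimate of E[x2 x2*]
    return np.real(phi_12) / phi_22

# Illustrative setup (assumed values)
rng = np.random.default_rng(0)
omega = 2 * np.pi * 1000.0   # 1 kHz bin
tau = 0.02 / 343.0           # d / c for 2 cm spacing
theta = np.deg2rad(60.0)     # direction of the signal to be nulled

# Random complex source spectrum over 256 frames
s = rng.standard_normal(256) + 1j * rng.standard_normal(256)

# Differential signals as defined in the article
x1 = s * (1.0 - np.exp(-1j * omega * tau * (1.0 + np.cos(theta))))
x2 = s * (np.exp(-1j * omega * tau) - np.exp(-1j * omega * tau * np.cos(theta)))

w_opt = optimal_weight(x1, x2)
y = x1 - w_opt * x2
residual_power = np.mean(np.abs(y) ** 2)

# For a single source, x1 / x2 is real, so the null is exact up to rounding
w_expected = -np.sin(omega * tau * (1 + np.cos(theta)) / 2) / np.sin(
    omega * tau * (1 - np.cos(theta)) / 2
)
print(w_opt, w_expected, residual_power)
```

In this idealized case the ratio $x_1(w)/x_2(w)$ is real, so a real-valued weight is enough to cancel the source completely. With noise or several sources present, the residual power would generally be nonzero.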
