One metric used to quantify the performance of a beamforming algorithm is its ability to suppress spatially uncorrelated noise. Such noise typically originates in the microphones themselves, as electrical self-noise whose level depends on the quality of the microphones deployed. The technical term for this quantity is the white noise gain. Any beamformer design should keep the white noise gain in mind, since amplifying white noise negates the improvement gained from using multiple microphones to amplify the desired signal.

Consider the microphone array shown in Figure 1.

Figure 1: N microphone array

The signal received at each microphone can be denoted in the time-frequency domain as $x_i(t, \omega) = s(t,\omega) e^{-j\omega\tau_i} + v_i(t,\omega)$

where $x_i(t, \omega)$ is the recorded signal at microphone $i$, $s(t,\omega)$ is the desired signal, $\tau_i$ is the propagation delay to microphone $i$, and $v_i(t,\omega)$ is the noise at microphone $i$. All the microphone signals can be stacked in vector notation such that ${\bf X}(t, \omega) = s(t,\omega) {\bf d}(\omega,\tau_1, \cdots, \tau_N) + {\bf v}(t,\omega)$

with ${\bf X}(t, \omega) = [x_1(t, \omega), \cdots, x_N(t, \omega)]^T$, ${\bf d}(\omega,\tau_1, \cdots, \tau_N) = [e^{-j\omega\tau_1}, \cdots, e^{-j\omega\tau_N}]^T$ and ${\bf v}(t, \omega) = [v_1(t, \omega), \cdots, v_N(t, \omega)]^T$. Now, suppose an estimate of the steering vector ${\bf d}(\omega,\tau_1, \cdots, \tau_N)$, denoted ${\bf W}(\omega,\tau_1, \cdots, \tau_N)$, is returned by an algorithm. The output signal then becomes $y(t,\omega) = {\bf W}^H (\omega,\tau_1, \cdots, \tau_N) {\bf X}(t, \omega)$
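The signal model and beamformer output above can be sketched numerically for a single time-frequency bin. This is a minimal illustration, not a reference implementation: the function names `steering_vector` and `beamformer_output`, the four-microphone delays, the 1 kHz bin and the signal value are all assumptions chosen for the example.

```python
import cmath
import math


def steering_vector(omega, delays):
    """d(omega) = [exp(-j*omega*tau_1), ..., exp(-j*omega*tau_N)]."""
    return [cmath.exp(-1j * omega * tau) for tau in delays]


def beamformer_output(weights, x):
    """y = W^H x: conjugate each weight, multiply by the input, and sum."""
    return sum(w.conjugate() * xi for w, xi in zip(weights, x))


# Hypothetical setup: 4 microphones, one bin at 1 kHz, delays in seconds.
omega = 2 * math.pi * 1000.0
delays = [0.0, 1.0e-4, 2.0e-4, 3.0e-4]
s = 0.5 + 0.25j  # desired signal value in this bin (assumed)

d = steering_vector(omega, delays)
x = [s * di for di in d]     # noise-free microphone signals, x = s * d
y = beamformer_output(d, x)  # W = d, i.e. a perfect steering estimate

# With W = d the phases cancel and the output is N times the signal.
residual = abs(y - len(delays) * s)
```

When the estimate matches the true steering vector, each term of the sum is $s(t,\omega)$, so the output is $N s(t,\omega)$ and `residual` is zero up to floating-point rounding.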

The output power spectral density (PSD), with $\Phi_{{\bf X}(t,\omega) {\bf X}(t,\omega)} = E[{\bf X}(t,\omega) {\bf X}^H(t,\omega)]$, then becomes: $\Phi_{y(t,\omega) y(t,\omega)} = {\bf W}^H (\omega,\tau_1, \cdots, \tau_N) \Phi_{{\bf X}(t,\omega) {\bf X}(t,\omega)} {\bf W} (\omega,\tau_1, \cdots, \tau_N)$

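In the noise-free case, the PSD becomes: $\Phi_{y(t,\omega) y(t,\omega)} |_{\text{signal}} = |s(t,\omega)|^2{\bf W}^H (\omega,\tau_1, \cdots, \tau_N) \Phi_{{\bf d}(\omega,\tau_1, \cdots, \tau_N) {\bf d}(\omega,\tau_1, \cdots, \tau_N) } {\bf W} (\omega,\tau_1, \cdots, \tau_N)$, where $\Phi_{{\bf d} {\bf d}} = {\bf d} {\bf d}^H$, so the expression equals $|s(t,\omega)|^2 |{\bf W}^H (\omega,\tau_1, \cdots, \tau_N) {\bf d}(\omega,\tau_1, \cdots, \tau_N)|^2$.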

Similarly, in the signal-free case, the PSD becomes: $\Phi_{y(t,\omega) y(t,\omega)} |_{\text{noise}} = {\bf W}^H (\omega,\tau_1, \cdots, \tau_N) \Phi_{{\bf v}(t,\omega) {\bf v}(t,\omega) } {\bf W} (\omega,\tau_1, \cdots, \tau_N)$

The white noise gain is the output SNR for unit-variance, spatially uncorrelated noise. In that case $\Phi_{{\bf v}(t,\omega) {\bf v}(t,\omega)}$ is the identity matrix, and because every entry of a steering vector estimate has unit modulus, the noise PSD becomes: $\Phi_{y(t,\omega) y(t,\omega)} |_{\text{white noise}} = {\bf W}^H (\omega,\tau_1, \cdots, \tau_N) {\bf W} (\omega,\tau_1, \cdots, \tau_N) = N$

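Thus, the white noise gain is given as: $\mathrm{WNG}(t,\omega) = \frac{|s(t,\omega)|^2 |{\bf W}^H (\omega,\tau_1, \cdots, \tau_N) {\bf d}(\omega,\tau_1, \cdots, \tau_N)|^2}{ N}$. For a perfect estimate ${\bf W} = {\bf d}$ and unit signal power, this evaluates to $N^2 / N = N$.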
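As a rough numerical check of this expression, the sketch below evaluates the white noise gain for an exact and for a slightly mismatched steering estimate. The helper name `white_noise_gain`, the delay values and the unit signal power are assumptions made for illustration. The denominator is computed as ${\bf W}^H {\bf W}$, which equals $N$ whenever the weights have unit modulus, as they do here.

```python
import cmath
import math


def white_noise_gain(weights, d, signal_power=1.0):
    """Output SNR for unit-variance, spatially uncorrelated noise."""
    wh_d = sum(w.conjugate() * di for w, di in zip(weights, d))
    noise_power = sum(abs(w) ** 2 for w in weights)  # W^H W
    return signal_power * abs(wh_d) ** 2 / noise_power


omega = 2 * math.pi * 1000.0
true_delays = [0.0, 1.0e-4, 2.0e-4, 3.0e-4]
est_delays = [0.0, 1.2e-4, 1.9e-4, 3.3e-4]  # hypothetical imperfect estimate

d = [cmath.exp(-1j * omega * tau) for tau in true_delays]
w_exact = list(d)
w_est = [cmath.exp(-1j * omega * tau) for tau in est_delays]

wng_exact = white_noise_gain(w_exact, d)  # equals N for a perfect estimate
wng_est = white_noise_gain(w_est, d)      # lower: delay errors cost gain
```

With unit-modulus weights the gain cannot exceed $N$, so errors in the estimated delays show up directly as a loss in white noise gain.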

It should be noted that for non-line audio signals there is also ambient noise, so the noise coherence matrix may not be the identity. The final beamforming algorithm design should take all of these scenarios into account.
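To illustrate the effect of a non-identity noise coherence matrix, the sketch below replaces ${\bf W}^H {\bf W}$ in the denominator with ${\bf W}^H \Gamma {\bf W}$ for a coherence matrix $\Gamma$. The exponentially decaying coherence model, the value of `rho`, the broadside source and the function names are assumptions chosen only to keep the example small. They are not a model of any particular acoustic environment.

```python
def quadratic_form(weights, matrix):
    """Return W^H M W as a real number (M is assumed Hermitian)."""
    total = 0j
    for i, wi in enumerate(weights):
        for j, wj in enumerate(weights):
            total += wi.conjugate() * matrix[i][j] * wj
    return total.real


def array_gain(weights, d, coherence):
    """|W^H d|^2 / (W^H Gamma W) for a given noise coherence matrix."""
    wh_d = sum(w.conjugate() * di for w, di in zip(weights, d))
    return abs(wh_d) ** 2 / quadratic_form(weights, coherence)


n = 4
d = [1 + 0j] * n  # hypothetical broadside source: all delays equal
w = list(d)       # perfect steering estimate

identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
rho = 0.5         # assumed coherence between adjacent microphones
correlated = [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

gain_white = array_gain(w, d, identity)         # reduces to the WNG
gain_correlated = array_gain(w, d, correlated)  # lower for this example
```

In this example the correlated noise adds coherently across microphones in the same way the signal does, so the gain falls below the white noise gain of $N$.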

VOCAL Technologies offers custom designed solutions for beamforming with a robust voice activity detector, acoustic echo cancellation and noise suppression. Our custom implementations of such systems are meant to deliver optimum performance for your specific beamforming task. Contact us today to discuss your solution!