The rule of thumb for delay-and-sum beamformers is that doubling the number of microphones yields a $3dB$ gain in signal-to-noise ratio. This rule assumes there is no interfering signal that is correlated across the microphones. Below we derive the expected signal-to-noise-plus-interference ratio (SNIR) gain for a uniform linear array (ULA). Consider a far-field source impinging on a ULA of N microphones through an anechoic medium, as shown in Figure 1:

Figure 1: Uniform linear array (ULA) of N microphones
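As a quick numerical check of the rule of thumb, the sketch below assumes the only disturbance is noise that is uncorrelated across microphones (the $\alpha = 0$ case derived below). In that case the delay-and-sum power gain is N, so each doubling adds $10\log_{10}2 \approx 3dB$. The helper name `dsb_white_noise_gain_db` is hypothetical.

```python
import math

def dsb_white_noise_gain_db(num_mics: int) -> float:
    """SNR gain in dB of an N-microphone delay-and-sum beamformer,
    assuming the noise is uncorrelated across microphones (no interferer)."""
    # The aligned signal adds coherently (power grows as N^2) while the
    # uncorrelated noise adds in power (grows as N), so the SNR improves by N.
    return 10.0 * math.log10(num_mics)

for n in (2, 4, 8, 16):
    step = dsb_white_noise_gain_db(2 * n) - dsb_white_noise_gain_db(n)
    print(f"{n:2d} -> {2 * n:2d} mics: +{step:.2f} dB")
```

Each line reports +3.01 dB, independent of N, which is exactly what the rule of thumb states.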

Suppose the signal at each microphone $i \in \{1, \cdots, N\}$ is given as:

$x_i(w) = s(w) e^{\left(-jw \frac{(i-1) d}{c} \sin{\theta} \right)} + v(w) e^{\left(-jw \frac{(i-1) d}{c} \sin{\beta} \right)} + n_i(w)$

where $s(w)$ is the desired speech signal, $\theta$ is the direction of arrival (DOA) of the speech signal measured from the normal to the axis joining the microphones, $v(w)$ is the directional interfering signal, $\beta$ is the DOA of the interfering signal, $d$ is the inter-microphone spacing, $c$ is the speed of sound, and $n_i(w)$ is zero-mean i.i.d. noise with variance $\sigma_n^2$.
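The signal model can be sketched for a single frequency bin as follows. This is an illustration under stated assumptions, not the article's own simulation: the speed of sound is assumed to be 343 m/s, and $n_i(w)$ is drawn as circular complex Gaussian noise, which is one choice consistent with "zero-mean i.i.d. with variance $\sigma_n^2$". The function names and the 30° interferer in the example are hypothetical.

```python
import cmath
import math
import random

SPEED_OF_SOUND = 343.0  # assumed c in m/s (not specified in the text)

def steering_phases(n_mics, omega, d, angle_rad, c=SPEED_OF_SOUND):
    """exp(-j*w*(i-1)*d/c*sin(angle)) for i = 1..N (list index k = i-1)."""
    return [cmath.exp(-1j * omega * k * d / c * math.sin(angle_rad))
            for k in range(n_mics)]

def mic_snapshot(s, v, sigma_n, n_mics, omega, d, theta, beta, rng=random):
    """x_i(w) = s(w) a_i(theta) + v(w) a_i(beta) + n_i(w) for one frequency bin."""
    a_s = steering_phases(n_mics, omega, d, theta)
    a_v = steering_phases(n_mics, omega, d, beta)
    std = sigma_n / math.sqrt(2.0)  # per real/imaginary part, so E|n_i|^2 = sigma_n^2
    return [s * a_s[k] + v * a_v[k] + complex(rng.gauss(0.0, std), rng.gauss(0.0, std))
            for k in range(n_mics)]

# Example: 4 mics, 50 mm spacing, 4 kHz bin, speech at 90 deg, interferer at 30 deg.
omega = 2 * math.pi * 4000.0
x = mic_snapshot(1.0, 0.5, 0.1, 4, omega, 0.05, math.radians(90), math.radians(30))
```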

The input SNIR per frequency bin $w$, denoted $iSNIR(w)$, is given by:

$iSNIR(w) = \frac{|s(w)|^2}{|v(w)|^2 + \sigma_n^2}$

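A delay-and-sum beamformer steered toward $\theta$ compensates the steering delay at each microphone and averages the aligned signals, $y(w) = \frac{1}{N} \sum\limits_{i=1}^{N} x_i(w) e^{\left(jw \frac{(i-1) d}{c} \sin{\theta} \right)}$. Its output becomes:

$y(w) = s(w) + v(w) \frac{1}{N} \frac{\sin{\left(w N \frac{d}{2c}( \sin{\theta} - \sin{\beta})\right)}}{\sin{\left(w \frac{d}{2c}( \sin{\theta} - \sin{\beta})\right)}} e^{\left(jw \frac{N-1}{2} \frac{d}{c}( \sin{\theta} - \sin{\beta}) \right)} + \frac{1}{N} \sum\limits_{i=1}^{N} n_i(w) e^{\left(jw \frac{(i-1) d}{c}\sin{\theta} \right)}$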
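The interferer term relies on the finite geometric-series (Dirichlet kernel) identity. A minimal check, writing $\psi = w \frac{d}{c}(\sin\theta - \sin\beta)$ as shorthand (our notation) and assuming $c = 343$ m/s, compares the direct sum $\frac{1}{N}\sum_{n=0}^{N-1} e^{jn\psi}$ with the closed form used in the output expression. Function names are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # assumed c in m/s

def phase_step(omega, d, theta, beta, c=SPEED_OF_SOUND):
    """psi = w*d/c*(sin(theta) - sin(beta)): residual inter-element phase of the interferer."""
    return omega * d / c * (math.sin(theta) - math.sin(beta))

def interferer_gain_direct(n_mics, psi):
    """(1/N) * sum_{n=0}^{N-1} exp(j*n*psi)."""
    return sum(cmath.exp(1j * n * psi) for n in range(n_mics)) / n_mics

def interferer_gain_closed(n_mics, psi):
    """(1/N) * sin(N*psi/2)/sin(psi/2) * exp(j*(N-1)*psi/2)."""
    den = math.sin(psi / 2)
    if abs(den) < 1e-12:
        # psi is a multiple of 2*pi: every term equals 1, so the average is 1.
        return complex(1.0)
    return math.sin(n_mics * psi / 2) / den / n_mics * cmath.exp(1j * (n_mics - 1) * psi / 2)

omega = 2 * math.pi * 4000.0
for beta_deg in (0, 30, 60, 90):
    psi = phase_step(omega, 0.05, math.radians(90), math.radians(beta_deg))
    g_direct = interferer_gain_direct(4, psi)
    g_closed = interferer_gain_closed(4, psi)
    print(beta_deg, abs(g_direct), abs(g_direct - g_closed))
```

At $\beta = \theta$ the interferer is passed with unit gain, just like the desired signal, which is why a look-direction interferer cannot be suppressed.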

The output SNIR per frequency bin $w$, denoted $oSNIR(w)$, follows from the output above. Since the noise is uncorrelated across microphones, the residual noise power is $\frac{1}{N^2} N \sigma_n^2 = \frac{\sigma_n^2}{N}$, so:

$oSNIR(w) = \frac{|s(w)|^2}{\frac{|v(w)|^2}{N^2} \left| \frac{\sin{\left(w N \frac{d}{2c}( \sin{\theta} - \sin{\beta})\right)}}{\sin{\left(w \frac{d}{2c}( \sin{\theta} - \sin{\beta})\right)}}\right|^2 + \frac{\sigma_n^2}{N}}$
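To sanity-check the interference-plus-noise power in the oSNIR denominator, a small Monte Carlo sketch can set $s(w) = 0$ (the look-direction signal passes with unit gain anyway), draw a random-phase interferer and circular complex Gaussian noise, apply the delay-and-sum weights, and average the output power. The speed of sound (343 m/s), the random-phase interferer, the Gaussian noise, the trial count, and all function names are assumptions for illustration.

```python
import cmath
import math
import random

SPEED_OF_SOUND = 343.0  # assumed c in m/s

def dsb_output(x, omega, d, theta, c=SPEED_OF_SOUND):
    """Delay-and-sum toward theta: (1/N) sum_i x_i exp(+j*w*(i-1)*d/c*sin(theta))."""
    return sum(xk * cmath.exp(1j * omega * k * d / c * math.sin(theta))
               for k, xk in enumerate(x)) / len(x)

def predicted_disturbance_power(n, omega, d, theta, beta, v_pow, sigma2, c=SPEED_OF_SOUND):
    """|v|^2/N^2 * |sin(N psi/2)/sin(psi/2)|^2 + sigma_n^2/N."""
    psi = omega * d / c * (math.sin(theta) - math.sin(beta))
    den = math.sin(psi / 2)
    ratio_sq = n ** 2 if abs(den) < 1e-12 else (math.sin(n * psi / 2) / den) ** 2
    return v_pow * ratio_sq / n ** 2 + sigma2 / n

def simulated_disturbance_power(n, omega, d, theta, beta, v_pow, sigma2,
                                trials=20000, seed=0, c=SPEED_OF_SOUND):
    """Average |y(w)|^2 with s(w) = 0, i.e. residual interference plus noise."""
    rng = random.Random(seed)
    std = math.sqrt(sigma2 / 2)  # per real/imaginary part, so E|n_i|^2 = sigma2
    total = 0.0
    for _ in range(trials):
        v = math.sqrt(v_pow) * cmath.exp(1j * rng.uniform(0.0, 2 * math.pi))
        x = [v * cmath.exp(-1j * omega * k * d / c * math.sin(beta))
             + complex(rng.gauss(0.0, std), rng.gauss(0.0, std))
             for k in range(n)]
        total += abs(dsb_output(x, omega, d, theta, c)) ** 2
    return total / trials

omega = 2 * math.pi * 4000.0
args = (4, omega, 0.05, math.radians(90), math.radians(0), 1.0, 1.0)
print(predicted_disturbance_power(*args), simulated_disturbance_power(*args))
```

The two printed values should agree to within Monte Carlo error (well under a few percent for 20000 trials at these settings).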

The SNIR improvement, $SNIRI(w)$, then becomes:

$SNIRI(w) = \frac{oSNIR(w)}{iSNIR(w)} = \frac{N^2 (\alpha(w) +1)}{\alpha(w) \left |\frac{\sin{\left(w N \frac{d}{2c}( \sin{\theta} - \sin{\beta})\right)}}{\sin{\left(w \frac{d}{2c}( \sin{\theta} - \sin{\beta})\right)}} \right|^2 + N}$

where $\alpha(w) = \frac{|v(w)|^2}{\sigma_n^2}$ is the interference-to-noise ratio.
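The closed-form SNIRI can be evaluated directly. The sketch below (hypothetical function name, speed of sound assumed to be 343 m/s) also checks two limiting cases that follow from the formula: $\alpha = 0$ gives $SNIRI = N$, and an interferer in the look direction ($\beta = \theta$, where the ratio of sines tends to $N$) gives $SNIRI = \frac{N(\alpha+1)}{\alpha N + 1}$, which approaches 1 (0 dB) as $\alpha$ grows.

```python
import math

SPEED_OF_SOUND = 343.0  # assumed c in m/s

def sniri(n, freq_hz, d, theta, beta, alpha, c=SPEED_OF_SOUND):
    """SNIRI(w) = N^2 (alpha + 1) / (alpha * |sin(N psi/2) / sin(psi/2)|^2 + N),
    with psi = w*d/c*(sin(theta) - sin(beta)) and w = 2*pi*freq_hz."""
    omega = 2 * math.pi * freq_hz
    psi = omega * d / c * (math.sin(theta) - math.sin(beta))
    den = math.sin(psi / 2)
    # At psi = 2*pi*k the ratio of sines tends to +/-N, so its square tends to N^2.
    ratio_sq = n ** 2 if abs(den) < 1e-12 else (math.sin(n * psi / 2) / den) ** 2
    return n ** 2 * (alpha + 1) / (alpha * ratio_sq + n)

theta = math.radians(90)
print(sniri(4, 4000.0, 0.05, theta, math.radians(0), alpha=0.0))  # no interferer: N = 4
print(sniri(4, 4000.0, 0.05, theta, theta, alpha=100.0))          # interferer in look direction
```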

Figure 2 below shows a sample of the expected SNIRI at $4kHz$ for $N = 4$ microphones with spacing $d = 50mm$, for $\alpha = 0$ and $\alpha = 100$ (20 dB). The desired direction is $90^{\circ}$. Note that $\alpha = 0$ reduces to the conventional N-fold improvement, $10\log_{10}4 \approx 6dB$, as shown on the plot.

Figure 2: SNIRI (dB) versus DOA (degrees) and $d$
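A rough reproduction of the Figure 2 setting ($N = 4$, $d = 50mm$, $4kHz$, desired direction $90^{\circ}$, $\alpha = 0$ and $\alpha = 100$) can be sketched by sweeping the interferer DOA $\beta$. Because $\theta$ is measured from the array normal, $90^{\circ}$ is the endfire direction. The speed of sound (343 m/s), the 15° sweep grid from -90° to 90°, and the function name are assumptions, so the numbers need not match the published plot exactly.

```python
import math

SPEED_OF_SOUND = 343.0  # assumed c in m/s

def sniri_db(n, freq_hz, d, theta_deg, beta_deg, alpha, c=SPEED_OF_SOUND):
    """SNIRI in dB from the closed-form expression, angles in degrees."""
    psi = 2 * math.pi * freq_hz * d / c * (math.sin(math.radians(theta_deg))
                                           - math.sin(math.radians(beta_deg)))
    den = math.sin(psi / 2)
    ratio_sq = n ** 2 if abs(den) < 1e-12 else (math.sin(n * psi / 2) / den) ** 2
    return 10 * math.log10(n ** 2 * (alpha + 1) / (alpha * ratio_sq + n))

N, FREQ, D, THETA = 4, 4000.0, 0.050, 90.0
print("beta(deg)  alpha=0 (dB)  alpha=100 (dB)")
for beta in range(-90, 91, 15):
    print(f"{beta:9d}  {sniri_db(N, FREQ, D, THETA, beta, 0.0):12.2f}  "
          f"{sniri_db(N, FREQ, D, THETA, beta, 100.0):14.2f}")
```

The $\alpha = 0$ column is flat at about 6.02 dB, while the $\alpha = 100$ column depends on how far the interferer sits from the beampattern's main lobe and nulls.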

VOCAL Technologies offers custom designed solutions for beamforming with a robust voice activity detector, acoustic echo cancellation and noise suppression. Our custom implementations of such systems are meant to deliver optimum performance for your specific beamforming task. Contact us today to discuss your solution!