Conventional derivations of the signal-to-noise ratio (SNR) improvement of a delay-and-sum beamformer show a gain of $3dB$ for every doubling of the number of microphones deployed. This holds provided the noise is uncorrelated between microphones, in other words not directional. We derive the expected SNR gain for uncorrelated noise on a uniform linear array (ULA) of microphones.
Consider a far-field source impinging on $N$ ULA microphones, as shown in Figure 1.

Figure 1: N ULA microphones

Suppose the signal at each microphone $i \in \{1, \cdots, N\}$ is given as $x_i(w) = s(w) e^{\left(-jw \frac{(i-1) d}{c} \sin{\theta} \right)} + v_i(w)$

where $s(w)$ is the desired speech signal at angular frequency $w$, $d$ is the spacing between adjacent microphones, $c$ is the speed of sound, $\theta$ is the direction of arrival (DOA) of the speech signal with respect to the normal to the axis joining all the microphones, and $v_i(w)$ is noise that is uncorrelated across microphones, such that $\mathbb{E}[v_i(w) v_j^*(w)] = 0, i \neq j, \{i,j\} \in \{1, \cdots, N\}$

and $\mathbb{E}[s(w) e^{\left(-jw \frac{(i-1) d}{c} \sin{\theta} \right)} v_j^*(w)] = 0, \forall {i,j} \in \{1, \cdots, N\}$.

The input SNR per frequency bin $w$, denoted $iSNR(w)$, is measured at the first microphone and is given as $iSNR = \frac{\mathbb{E}\left[|s(w)|^2 \right]}{\mathbb{E}\left[\left |v_1(w)\right|^2 \right]}$

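where $\mathbb{E}[.]$ is the expectation operator. The delay-and-sum beamformer multiplies each microphone signal $x_{n+1}(w)$ by $e^{\left(jw n \frac{d}{c} \sin{\theta} \right)}$ to align the speech components and then averages the $N$ results, so the output becomes $x(w) = s(w) + \frac{1}{N} \sum\limits_{n =0}^{N-1} v_{n+1}(w) e^{\left(jw n \frac{d}{c} \sin{\theta} \right)}$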
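The alignment-and-averaging step can be sketched for a single frequency bin as follows. This is a minimal illustration rather than a production beamformer: the function names, the $1\,kHz$ bin, the $5\,cm$ spacing, the $30°$ DOA and the speed of sound of $343\,m/s$ are all assumed values chosen for the example.

```python
import numpy as np

def steering_vector(n_mics, omega, d, theta, c=343.0):
    """Phase terms exp(-j*omega*(i-1)*d/c*sin(theta)) for microphones i = 1..n_mics."""
    n = np.arange(n_mics)  # n = i - 1
    return np.exp(-1j * omega * n * d / c * np.sin(theta))

def delay_and_sum(x, omega, d, theta, c=343.0):
    """Undo each microphone's phase delay, then average across microphones."""
    a = steering_vector(len(x), omega, d, theta, c)
    return np.mean(np.conj(a) * x)

# Assumed example values (not from the article): 1 kHz bin, 5 cm spacing, 30 degree DOA.
omega = 2 * np.pi * 1000.0
d, theta, n_mics = 0.05, np.deg2rad(30.0), 4

s = 0.8 + 0.3j                                          # desired signal in this bin
x_clean = s * steering_vector(n_mics, omega, d, theta)  # noise-free microphone signals
y = delay_and_sum(x_clean, omega, d, theta)             # equals s when there is no noise
```

With noise added to each microphone signal, the same function returns $s(w)$ plus the averaged, phase-rotated noise term that appears in the output expression above.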

The output SNR per frequency bin $w$, denoted $oSNR(w)$ is given as $oSNR = \frac{\mathbb{E}\left[|s(w)|^2 \right]}{\mathbb{E}\left[\left | \frac{1}{N} \sum\limits_{n =0}^{N-1} v_{n+1}(w) e^{\left(jw n \frac{d}{c} \sin{\theta} \right)} \right|^2 \right]}$

Expanding the squared magnitude into its diagonal and cross terms gives $\left| \frac{1}{N} \sum\limits_{n =0}^{N-1} v_{n+1}(w) e^{\left(jw n \frac{d}{c} \sin{\theta} \right)} \right|^2 =\frac{1}{N^2} \sum\limits_{n =0}^{N-1} |v_{n+1}(w)|^2 + \frac{1}{N^2} \sum\limits_{n =1}^{N} \sum\limits_{m = 1, m \neq n}^{N} v_{n}(w) v_{m}^*(w) e^{\left(jw (n-m) \frac{d}{c} \sin{\theta} \right)}$

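Since by assumption $\mathbb{E}[v_i(w) v_j^*(w)] = 0, i \neq j, \{i,j\} \in \{1, \cdots, N\}$, the cross terms vanish in expectation. Assuming in addition that the noise power is the same at every microphone, $\mathbb{E}[|v_i(w)|^2] = \mathbb{E}[|v_1(w)|^2]$, the remaining sum contributes $N$ equal terms, so that $\mathbb{E}\left[\left| \frac{1}{N} \sum\limits_{n =0}^{N-1} v_{n+1}(w) e^{\left(jw n \frac{d}{c} \sin{\theta} \right)} \right|^2\right] = \frac{\mathbb{E}[|v_1(w)|^2]}{N}$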

This leads to an oSNR of $oSNR = N \frac{\mathbb{E}\left[|s(w)|^2 \right]}{\mathbb{E}[|v_1(w)|^2]}$

The SNR improvement, SNRI then becomes $SNRI = \frac{oSNR}{iSNR} = N = 2^{\frac{\log{N}}{\log{2}}}$

Thus, in decibels, $10 \log_{10}{(SNRI)} = 10 \log_{10}{N}$, so each time $N$ is increased by a factor of $2$, the SNRI increases by $10 \log_{10}{2} \approx 3dB$.
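The result can be checked numerically with a small Monte Carlo sketch. The setup below is an assumption made for illustration: unit-variance complex Gaussian signal and noise, independent noise across microphones, and the same hypothetical array geometry ($1\,kHz$, $5\,cm$ spacing, $30°$ DOA, $c = 343\,m/s$) for every run. The function name `simulate_snri_db` is likewise not from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

def complex_gaussian(shape):
    """Unit-variance circular complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def simulate_snri_db(n_mics, n_trials=20000, omega=2 * np.pi * 1000.0,
                     d=0.05, theta=np.deg2rad(30.0), c=343.0):
    """Estimate the delay-and-sum SNR improvement in dB for one frequency bin."""
    n = np.arange(n_mics)
    a = np.exp(-1j * omega * n * d / c * np.sin(theta))  # per-microphone phase term

    s = complex_gaussian(n_trials)             # desired signal, one draw per trial
    v = complex_gaussian((n_trials, n_mics))   # noise, independent across microphones
    x = s[:, None] * a + v                     # microphone signals

    y = np.mean(x * np.conj(a), axis=1)        # delay-and-sum output
    noise_out = y - s                          # residual noise after beamforming

    i_snr = np.mean(np.abs(s) ** 2) / np.mean(np.abs(v[:, 0]) ** 2)
    o_snr = np.mean(np.abs(s) ** 2) / np.mean(np.abs(noise_out) ** 2)
    return 10 * np.log10(o_snr / i_snr)

gains_db = {n: simulate_snri_db(n) for n in (1, 2, 4, 8)}
for n, g in gains_db.items():
    print(f"N={n}: simulated {g:.2f} dB, predicted {10 * np.log10(n):.2f} dB")
```

Under these assumptions the simulated gain should track $10 \log_{10}{N}$ closely, with successive doublings of $N$ about $3dB$ apart. If the noise were correlated between microphones, the cross terms would no longer average out and the gain would differ.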
