Statistically optimal beamformers are used for super-directive beamforming, where the maximum energy of the steering vector is concentrated at fixed points. Because of noise and reverberation, the known or estimated steering vector may contain errors, which in most cases attenuate the desired signal. To deal with this problem, diagonal loading is employed to broaden the beamwidth of the principal beam.
Consider a far-field source impinging on a uniform linear array (ULA) of N microphones, as shown in Figure 1 below:

Figure 1: N ULA microphones

Suppose the signal at each microphone $i \in \{1, \cdots, N\}$ is given as:

$x_i(w) = s(w) W_{i}(w) +\sum\limits_{l=1}^L v_l(w) e^{\left(-jw (i-1)\frac{d}{c} \sin{\left(\psi_l \right)} \right)} +n_i(w)$

where $s(w)$ is the desired speech signal, $\theta$ is the direction of arrival (DOA) of the speech signal with respect to the broadside, $d$ is the microphone spacing, $c$ is the speed of sound, and $n_i(w)$ is the additive noise at microphone $i$. $v_l(w)$ is the $l^{th}$ correlated noise signal, with $\mathbb{E}[v_l(w) v_l^*(w)] = \sigma_{v_l}^2$, and it is uncorrelated with the desired signal at every microphone: $\mathbb{E}[ s(w) W_{i}(w) v_l^*(w)] = 0, \forall i \in \{1, \cdots, N\}, \forall l \in \{1,\cdots,L\}$. $\psi_l$ is the DOA of the $l^{th}$ correlated noise with respect to the broadside. Here,

$W_{i}(w) = e^{\left(-jw (i-1)\frac{d}{c} \sin{\theta} \right)}$

and

$W(w) = [W_1(w), \cdots, W_N(w)]^T$
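
As an illustration, the vector $W(w)$ above can be computed numerically. The following is a minimal sketch, not a reference implementation: the function name `steering_vector`, the default spacing `d = 0.05` m, and the speed of sound `c = 343` m/s are assumptions chosen for the example.

```python
import numpy as np

def steering_vector(theta, omega, n_mics, d=0.05, c=343.0):
    """Return W(w) = [W_1(w), ..., W_N(w)]^T for a uniform linear array.

    theta  : DOA in radians, measured from broadside
    omega  : angular frequency w in rad/s
    n_mics : number of microphones N
    d, c   : assumed microphone spacing (m) and speed of sound (m/s)
    """
    i_minus_1 = np.arange(n_mics)  # (i - 1) for i = 1..N
    return np.exp(-1j * omega * i_minus_1 * (d / c) * np.sin(theta))

# Example: 4 microphones, 1 kHz, source at 30 degrees from broadside
W = steering_vector(np.deg2rad(30.0), 2 * np.pi * 1000.0, n_mics=4)
```

Every entry has unit magnitude, and a broadside source ($\theta = 0$) gives a vector of ones, since all microphones then receive the wavefront at the same time.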

The statistically optimal robust beamformer is posed as:

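$\underset{W(w)}{\mathrm{argmin}} ~~ W^H(w) R_{x}(w) W(w) ~ ~ ~ \text{s.t.} ~ ~ |s(\theta,w)^H W(w)|^2 \ge 1 , ~ ~ \theta \in [\theta_1, \theta_2],$ where $R_{x}(w)$ is the signal-plus-noise covariance matrix. In this problem, $W(w)$ denotes the beamformer weight vector being optimized and $s(\theta,w)$ the steering vector toward direction $\theta$.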
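
In practice $R_{x}(w)$ is not known and is usually estimated from data. The sketch below simulates snapshots at a single frequency according to the signal model above and forms a sample covariance. All numeric values (array size, DOAs, powers, snapshot count) and the helper names `steer` and `cgauss` are assumptions made for the example only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 4, 5000                       # microphones, snapshots (assumed)
omega, d, c = 2 * np.pi * 1000.0, 0.05, 343.0
theta = np.deg2rad(10.0)             # desired-source DOA (assumed)
psi = np.deg2rad([-40.0, 60.0])      # correlated-noise DOAs, L = 2 (assumed)

def steer(angle):
    # Phase terms exp(-j w (i-1) d/c sin(angle)) for i = 1..N
    return np.exp(-1j * omega * np.arange(N) * (d / c) * np.sin(angle))

def cgauss(shape, power):
    # Zero-mean circular complex Gaussian samples with variance `power`
    re = rng.standard_normal(shape)
    im = rng.standard_normal(shape)
    return np.sqrt(power / 2.0) * (re + 1j * im)

s = cgauss(K, 1.0)                   # desired signal s(w)
v = cgauss((len(psi), K), 0.5)       # correlated noise sources v_l(w)
n = cgauss((N, K), 0.1)              # additive noise n_i(w)

A = np.stack([steer(p) for p in psi], axis=1)   # N x L matrix of noise steering vectors
x = np.outer(steer(theta), s) + A @ v + n       # N x K microphone snapshots

# Sample estimate of the signal-plus-noise covariance matrix
R_x = x @ x.conj().T / K
```

The estimate is Hermitian and positive semidefinite by construction; its accuracy improves as the number of snapshots `K` grows.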

To compensate for the steering-vector mismatch, a diagonal penalty is imposed, transforming the problem into:

$\underset{W(w)}{\mathrm{argmin}} ~~ W^H(w) R_{x}(w) W(w) + \gamma \|W(w)\|^2 ~ ~ ~ \text{s.t.} ~ ~ |s(\theta,w)^H W(w)|^2 \ge 1 , ~ ~ \theta \in [\theta_1, \theta_2].$

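Since $W^H(w) R_{x}(w) W(w) + \gamma \|W(w)\|^2 = W^H(w) \left(R_{x}(w) + \gamma I\right) W(w)$, the penalty amounts to adding $\gamma$ to the diagonal of the covariance matrix, which can be interpreted as the covariance matrix subsuming the mismatch error. If $\gamma$ is chosen too large, the algorithm puts all its effort into suppressing white noise, and performance becomes akin to that of a delay-and-sum beamformer.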
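
The effect of $\gamma$ can be seen in a simplified setting. The sketch below does not solve the range-constrained problem above. It swaps in the simpler diagonally loaded MVDR (Capon) beamformer, which replaces the inequality over $[\theta_1, \theta_2]$ with a single equality constraint $s^H W = 1$ and therefore has a closed-form solution. The function name `loaded_mvdr_weights`, the toy covariance, and all numeric values are assumptions for illustration.

```python
import numpy as np

def loaded_mvdr_weights(R_x, a, gamma):
    """Minimise w^H (R_x + gamma I) w subject to a^H w = 1 (simplified constraint)."""
    R_loaded = R_x + gamma * np.eye(R_x.shape[0])
    Ri_a = np.linalg.solve(R_loaded, a)      # (R_x + gamma I)^{-1} a
    return Ri_a / np.vdot(a, Ri_a)           # normalise so that a^H w = 1

N = 4
omega, d, c = 2 * np.pi * 1000.0, 0.05, 343.0   # assumed frequency, spacing, speed of sound

def steer(angle):
    return np.exp(-1j * omega * np.arange(N) * (d / c) * np.sin(angle))

a = steer(np.deg2rad(10.0))        # look direction (assumed)
a_int = steer(np.deg2rad(-40.0))   # one correlated-noise direction (assumed)

# Toy covariance: desired signal + one strong interferer + white noise
R_x = (np.outer(a, a.conj())
       + 5.0 * np.outer(a_int, a_int.conj())
       + 0.1 * np.eye(N))

w_das = a / N                      # delay-and-sum weights for the same look direction

# Distance between the loaded solution and delay-and-sum for several gamma values
dist = {}
for gamma in (0.0, 1.0, 1e6):
    w = loaded_mvdr_weights(R_x, a, gamma)
    dist[gamma] = np.linalg.norm(w - w_das)
    print(f"gamma={gamma:g}  ||w - w_das|| = {dist[gamma]:.3e}")
```

As $\gamma$ grows, $(R_{x} + \gamma I)^{-1}$ approaches $I/\gamma$, so the weights tend toward `a / N`, which is the delay-and-sum beamformer described above.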
