Beamformers fall into two general classes: data-independent and statistically optimal. Data-independent beamforming relies solely on spatial properties, such as forming a beam toward a specific direction of arrival while placing nulls in other directions. In statistically optimal beamforming, the filter weights depend on the statistics of the received data. The multiple sidelobe canceler (MSC) is a popular statistically optimal beamformer, used mainly for its simplicity.

Consider a far-field source impinging on a uniform linear array (ULA) of N microphones, as shown in Figure 1.

Figure 1: N ULA microphones

Suppose the signal at each microphone $i \in \{1, \cdots, N\}$ is given as: $x_i(w) = s(w) e^{\left(-jw (i-1)\frac{d}{c} \sin{\left(\theta \right)} \right)} +\sum\limits_{l=1}^L v_l(w) e^{\left(-jw (i-1)\frac{d}{c} \sin{\left(\psi_l \right)} \right)} +n_i(w)$

where $s(w)$ is the desired speech signal, $\theta$ is the direction of arrival (DOA) of the speech signal with respect to the broadside, $d$ is the spacing between adjacent microphones, $c$ is the speed of sound, and $v_l(w)$ is the $l^{th}$ correlated undesired noise, such that $\mathbb{E}[v_l(w) v_l^*(w)] = \sigma_{v_l}^2$

and $\mathbb{E}[ s(w) e^{\left(-jw (i-1)\frac{d}{c} \left(\sin{\left(\psi_l \right)} - \sin{\left(\theta \right)}\right) \right)} v_l^*(w)] = 0, \forall i \in \{1, \cdots, N\}, \forall l \in \{1,\cdots,L\}$. $\psi_l$ is the DOA of the $l^{th}$ correlated noise with respect to the broadside, and $n_i(w)$ is the white noise at microphone $i$.
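
The signal model above can be evaluated numerically for a single frequency bin. The following is a minimal sketch, not a reference implementation: the function names `steering_phase` and `mic_signals` and all numeric values (4 microphones, 5 cm spacing, a 1 kHz bin, the chosen angles and amplitudes) are assumptions made purely for illustration.

```python
import cmath
import math

def steering_phase(i, w, d, c, angle):
    """Phase term e^{-j w (i-1) (d/c) sin(angle)} for microphone i (1-based)."""
    return cmath.exp(-1j * w * (i - 1) * (d / c) * math.sin(angle))

def mic_signals(s, theta, interferers, noise, w, d, c):
    """Evaluate x_i(w) for i = 1..N at one frequency bin.

    interferers: list of (v_l, psi_l) pairs; noise: list of N values n_i(w).
    """
    n_mics = len(noise)
    x = []
    for i in range(1, n_mics + 1):
        xi = s * steering_phase(i, w, d, c, theta)           # desired speech
        xi += sum(v * steering_phase(i, w, d, c, psi)        # correlated noise
                  for v, psi in interferers)
        xi += noise[i - 1]                                   # white noise
        x.append(xi)
    return x

# Illustrative values (assumptions): 4 mics, 5 cm spacing, 1 kHz bin.
w = 2 * math.pi * 1000.0
x = mic_signals(s=1.0 + 0.5j, theta=math.radians(20),
                interferers=[(0.8 - 0.2j, math.radians(-40))],
                noise=[0.0] * 4, w=w, d=0.05, c=343.0)
```

At the first microphone every phase term equals one, so `x[0]` is simply the sum of the speech, interference, and noise values; the remaining microphones see the same components with direction-dependent phase shifts.
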
Define the beamforming-only (delay-and-sum) solution, obtained by phase-aligning each microphone signal toward $\theta$ and averaging over the $N$ microphones, as $y_b(w) = s(w) +\frac{1}{N}\sum\limits_{i=1}^N \sum\limits_{l=1}^L v_l(w) e^{\left(-jw (i-1)\frac{d}{c} \left(\sin{\left(\psi_l \right)} -\sin{\left(\theta \right)}\right)\right)} + \frac{1}{N}\sum\limits_{i=1}^N n_i(w) e^{\left(jw (i-1)\frac{d}{c} \sin{\left(\theta \right)} \right)}$
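
As a quick check of this expression, the sketch below applies the phase alignment and averaging to a noise-free example. The helper name `delay_and_sum` and the numeric values are assumptions for illustration only. It shows that the desired signal passes through unchanged, while an interferer from another direction is attenuated but not removed.

```python
import cmath
import math

def delay_and_sum(x, theta, w, d, c):
    """y_b(w) = (1/N) * sum_i x_i(w) e^{+j w (i-1) (d/c) sin(theta)}."""
    n_mics = len(x)
    return sum(
        xi * cmath.exp(1j * w * i * (d / c) * math.sin(theta))
        for i, xi in enumerate(x)  # i here plays the role of (i-1) in the text
    ) / n_mics

# Illustrative values (assumptions): 4 mics, 5 cm spacing, 1 kHz bin.
w, d, c = 2 * math.pi * 1000.0, 0.05, 343.0
theta, psi = math.radians(20), math.radians(-40)
s, v = 1.0 + 0.5j, 0.8 - 0.2j

# Desired signal only: the aligned average returns s(w) exactly.
x_s = [s * cmath.exp(-1j * w * i * (d / c) * math.sin(theta)) for i in range(4)]
y_s = delay_and_sum(x_s, theta, w, d, c)

# Interferer only: the phases do not align, so the average is reduced in magnitude.
x_v = [v * cmath.exp(-1j * w * i * (d / c) * math.sin(psi)) for i in range(4)]
y_v = delay_and_sum(x_v, theta, w, d, c)
```

The residual `y_v` is the interference leakage that the MSC weights are then asked to cancel.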

The statistical optimality criterion for MSC is to find weights $W_{msc}(w)$  such that: $\underset{W_{msc}(w)}{\mathrm{argmin}} ~~\mathbb{E} [|y_b(w) - W_{msc}^H(w) {\bf X}(w)|^2]$

where ${\bf X}(w) = [x_1(w),\cdots, x_N(w)]^T-s(w)[1,\cdots, e^{\left(-jw (N-1)\frac{d}{c} \sin{\left(\theta \right)} \right)}]^T$

and $^H$ denotes the complex conjugate transpose (Hermitian) operator.
It can be shown that the optimum weights are: $W_{msc}(w) = \mathbb{E} [{\bf X}(w) {\bf X}^H(w)]^{-1} \times \mathbb{E} [{\bf X}(w) y_b^*(w)]$
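
The weight computation can be sketched with sample averages standing in for the expectations. Everything below is an assumption made for illustration: the array geometry, the single interferer, the noise levels, the number of snapshots, and above all the use of the true $s(w)$ to form the signal-free data, which is not available in practice. The cross-correlation uses the conjugate of $y_b(w)$, which is what minimizing $|y_b(w) - W_{msc}^H(w) {\bf X}(w)|^2$ yields.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 4, 2000                      # microphones, snapshots (assumed values)
w, d, c = 2 * np.pi * 1000.0, 0.05, 343.0
theta, psi = np.deg2rad(20.0), np.deg2rad(-40.0)

def steer(angle):
    """Array response [1, ..., e^{-j w (N-1) (d/c) sin(angle)}]."""
    return np.exp(-1j * w * np.arange(N) * (d / c) * np.sin(angle))

def cgauss(*shape):
    """Unit-variance circular complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

s = cgauss(K)                       # desired signal
v = 2.0 * cgauss(K)                 # one correlated interferer (L = 1)
n = 0.1 * cgauss(N, K)              # white sensor noise

a = steer(theta)
x = np.outer(a, s) + np.outer(steer(psi), v) + n
X = x - np.outer(a, s)              # signal-free data (oracle s, illustration only)
y_b = a.conj() @ x / N              # beamforming-only output

R = X @ X.conj().T / K              # sample estimate of E[X X^H]
p = X @ y_b.conj() / K              # sample estimate of E[X y_b^*]
W = np.linalg.solve(R, p)           # W_msc = R^{-1} p, without forming the inverse

y_msc = y_b - W.conj() @ X          # residual of the minimized criterion
err_bf = np.mean(np.abs(y_b - s) ** 2)
err_msc = np.mean(np.abs(y_msc - s) ** 2)
```

With oracle signal-free data the residual `y_msc` tracks `s` much more closely than `y_b` does, which is why the quality of the estimate of ${\bf X}(w)$ dominates practical performance.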

The main challenge of this approach is estimating the signal-free data ${\bf X}(w)$. A common compromise is to estimate ${\bf X}(w)$ during periods when the desired speech is absent, together with a temporal averaging window to reduce the effect of the white noise $n_i(w)$. For desired signals arriving from the broadside of the array, the difference between adjacent sensors can be used instead, as in, for example, the Griffiths-Jim generalized sidelobe canceler.
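
The adjacent-sensor difference can be illustrated with a small sketch. It assumes a noise-free array, a single interferer, and illustrative numeric values; the helper name `adjacent_differences` is hypothetical, and this is only the differencing step, not a full Griffiths-Jim implementation. For a broadside source ($\theta = 0$) the desired signal has the same phase at every microphone, so it cancels in each difference and only interference remains.

```python
import cmath
import math

def adjacent_differences(x):
    """Return x_i - x_{i+1} for i = 1..N-1 (a simple blocking operation)."""
    return [a - b for a, b in zip(x, x[1:])]

# Illustrative values (assumptions): 4 mics, 5 cm spacing, 1 kHz bin.
w, d, c = 2 * math.pi * 1000.0, 0.05, 343.0
s, v, psi = 1.0 + 0.5j, 0.8 - 0.2j, math.radians(40)

# Interference component at each microphone.
v_only = [v * cmath.exp(-1j * w * i * (d / c) * math.sin(psi)) for i in range(4)]

# Broadside source (theta = 0): s(w) is added with identical phase everywhere.
x = [s + vi for vi in v_only]

refs = adjacent_differences(x)         # signal-free references
refs_v = adjacent_differences(v_only)  # same differences from interference alone
```

Here `refs` and `refs_v` agree up to rounding, confirming that the references contain no desired speech and can stand in for the signal-free data.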

VOCAL Technologies offers custom designed solutions for beamforming with a robust voice activity detector, acoustic echo cancellation and noise suppression. Our custom implementations of such systems are meant to deliver optimum performance for your specific beamforming task. Contact us today to discuss your solution!