Beamformers are grouped into two general classes: data-independent and statistically optimal beamformers. Data-independent beamforming relies solely on spatial properties, such as forming a beam in a specific direction of arrival while nulls are formed in all other directions. In statistically optimal beamforming, the filter weights depend on the statistics of the received data. The reference signal beamformer (RSB) is popular because the direction of arrival of the desired signal need not be known. This beamformer is formulated in a similar way to the conventional acoustic echo canceler.
Consider a far-field source impinging on a uniform linear array (ULA) of N microphones, as shown in Figure 1:

Figure 1: N ULA microphones

Suppose that the desired signal $y_d(w) = s(w)$ is known a priori. Also suppose the signal at each microphone $i \in \{1, \cdots, N\}$ is given as

$x_i(w) = s(w) W_{i}(w) +\sum\limits_{l=1}^L v_l(w) e^{\left(-jw (i-1)\frac{d}{c} \sin{\left(\psi_l \right)} \right)} +n_i(w)$

where $s(w)$ is the desired speech signal, $\theta$ is the direction of arrival (DOA) of the speech signal with respect to the broadside, $d$ is the spacing between adjacent microphones, $c$ is the speed of sound, and $n_i(w)$ is the additive noise at microphone $i$. $v_l(w)$ is the $l^{th}$ correlated (directional) noise signal, such that $\mathbb{E}[v_l(w) v_l^*(w)] = \sigma_{v_l}^2$ and $\mathbb{E}[ s(w) e^{\left(-jw (i-1)\frac{d}{c} \sin{\theta} \right)} v_l^*(w)] = 0, \forall i \in \{1, \cdots, N\}, \forall l \in \{1,\cdots,L\}$. $\psi_l$ is the DOA of the $l^{th}$ correlated noise with respect to the broadside. Here,

$W_{i}(w) = e^{\left(-jw (i-1)\frac{d}{c} \sin{\theta} \right)}$
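As a minimal sketch of the phase factors defined above, the following Python snippet evaluates $W_i(w)$ for every microphone of a ULA. The function name `steering_factors`, the default speed of sound of 343 m/s, and the example geometry (4 microphones, 5 cm spacing, 1 kHz, 30 degrees) are assumptions chosen for illustration only.

```python
import cmath
import math

def steering_factors(num_mics, omega, spacing, theta, speed_of_sound=343.0):
    """Return [W_1, ..., W_N] with W_i = exp(-j*omega*(i-1)*(d/c)*sin(theta)).

    omega is the angular frequency in rad/s, spacing is d in metres,
    theta is the DOA in radians measured from the broadside.
    """
    delay = spacing / speed_of_sound * math.sin(theta)  # delay between adjacent microphones
    # The zero-based index m plays the role of (i - 1) in the formula.
    return [cmath.exp(-1j * omega * m * delay) for m in range(num_mics)]

# Hypothetical example: 4 microphones, 5 cm apart, 1 kHz, source 30 degrees off broadside.
W = steering_factors(4, 2 * math.pi * 1000.0, 0.05, math.radians(30.0))
```

Each factor has unit magnitude, and for a broadside source ($\theta = 0$) every factor equals 1, since the wavefront reaches all microphones at the same time.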

The statistically optimal criterion for the RSB is to find weights $W_{i}(w)$ such that:

$\underset{W_{i}(w)}{\mathrm{argmin}} ~~\mathbb{E} [|y_d(w) - W_{i}^H(w) x_i(w)|^2]$

where

$x_i(w) = s(w) e^{\left(-jw (i-1)\frac{d}{c} \sin{\left(\theta \right)} \right)} + n_i(w)$
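To make the criterion concrete, the sketch below replaces the expectation with a sample average over simulated frequency-domain snapshots for a single microphone. The helper `mean_square_error`, the phase of 0.8 rad, the noise level and the snapshot count are all hypothetical values picked for illustration; they do not come from a specific implementation.

```python
import cmath
import random

def mean_square_error(weight, desired, observed):
    """Sample estimate of E[|y_d - W^H x|^2] for one microphone.

    For a scalar weight, W^H is simply the complex conjugate of W.
    """
    total = 0.0
    for y_d, x in zip(desired, observed):
        total += abs(y_d - weight.conjugate() * x) ** 2
    return total / len(desired)

rng = random.Random(0)
phase = cmath.exp(-1j * 0.8)  # assumed phase factor of the desired signal at this microphone
s = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]        # reference signal
n = [0.1 * complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]  # sensor noise
x = [s_k * phase + n_k for s_k, n_k in zip(s, n)]

err_matched = mean_square_error(phase, s, x)   # weight equal to the phase factor
err_unit = mean_square_error(1 + 0j, s, x)     # weight that ignores the phase
```

Under these assumptions, the weight that matches the phase factor leaves only the noise term in the error, so `err_matched` comes out well below `err_unit`.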

Notice that the directional signal $v_l$ is not used in the estimate of the filter weights.
It can be shown that the optimum weights correspond to:

$W_{i}(w) = \mathbb{E} [x_i(w) x_i^*(w)]^{-1} \times \mathbb{E} [x_i(w) y_d^*(w)]$

with the output signal being

$y_o(w) = \frac{1}{N} \sum\limits_{i=1}^N W_{i}^H(w) x_i(w) = s(w) + \frac{1}{N} \sum\limits_{i=1}^N W_{i}^H(w) n_i(w)$
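The weight estimation and the averaged output can be sketched as follows, assuming the reference $s(w)$ is available for a batch of snapshots and that expectations are approximated by sample averages. Per microphone, the mean-square criterion is minimised by the scalar Wiener ratio $\mathbb{E}[x_i y_d^*] / \mathbb{E}[|x_i|^2]$. The function names, the phase step of 0.8 rad per microphone, and the noise level are hypothetical choices for this illustration.

```python
import cmath
import random

def wiener_weight(desired, observed):
    """Sample estimate of E[x y_d^*] / E[|x|^2] for one microphone."""
    cross = sum(x * y_d.conjugate() for y_d, x in zip(desired, observed))
    power = sum(abs(x) ** 2 for x in observed)
    return cross / power

def beamformer_output(weights, snapshot):
    """y_o = (1/N) * sum_i conj(W_i) * x_i for one frequency-domain snapshot."""
    return sum(w.conjugate() * x for w, x in zip(weights, snapshot)) / len(weights)

rng = random.Random(1)
num_mics, num_snapshots = 4, 4000
# Assumed phase factors of the desired signal: 0.8 rad of phase lag per microphone.
steer = [cmath.exp(-1j * 0.8 * m) for m in range(num_mics)]

def cgauss(scale):
    """Draw one complex Gaussian sample with the given per-component scale."""
    return complex(rng.gauss(0.0, scale), rng.gauss(0.0, scale))

s = [cgauss(1.0) for _ in range(num_snapshots)]                              # reference signal
x = [[s_k * steer[m] + cgauss(0.1) for s_k in s] for m in range(num_mics)]   # x[m][k]

weights = [wiener_weight(s, x[m]) for m in range(num_mics)]
outputs = [beamformer_output(weights, [x[m][k] for m in range(num_mics)])
           for k in range(num_snapshots)]
# Mean squared deviation of the beamformer output from the reference signal.
residual = sum(abs(y - s_k) ** 2 for y, s_k in zip(outputs, s)) / num_snapshots
```

In this simulation the estimated weights land close to the assumed phase factors, scaled slightly below unit magnitude by the noise power in the denominator. Averaging over the $N$ microphones reduces the residual noise relative to a single microphone, because the noise terms are drawn independently per microphone here.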

In the presence of $v_l$, the signal $W_{i}^H(w) x_i(w)$ is subtracted from each signal $x_i(w)$ before any further processing is done. The main drawback of this approach is that the desired signal has to be known a priori, which makes it useful mainly in cases such as joint echo cancellation and beamforming.

VOCAL Technologies offers custom designed solutions for beamforming with a robust voice activity detector, acoustic echo cancellation and noise suppression. Our custom implementations of such systems are meant to deliver optimum performance for your specific beamforming task. Contact us today to discuss your solution!