Statistically optimal beamformers: reference signal beamformer

Beamformers are grouped into two general classes: data-independent and statistically optimal beamformers. Data-independent beamforming relies solely on spatial properties, such as forming a beam in a specific direction of arrival while nulls are formed in all other directions. In statistically optimal beamforming, the filter weights depend on the statistics of the received data. The reference signal beamformer (RSB) is popular because the direction of arrival of the desired signal need not be known. This beamformer is formulated in a similar way to the conventional acoustic echo canceler.
Consider a far-field source impinging on a uniform linear array (ULA) of N microphones, as shown in Figure 1:

Figure 1: N ULA microphones


Suppose that the desired signal y_d(w) = s(w) is known a priori. Also suppose the signal at each microphone i \in \{1, \cdots, N\} is given as

x_i(w) = s(w) W_{i}(w) +\sum\limits_{l=1}^L v_l(w) e^{\left(-jw (i-1)\frac{d}{c} \sin{\left(\psi_l \right)} \right)} +n_i(w)

where s(w) is the desired speech signal, \theta is the direction of arrival (DOA) of the speech signal with respect to the broadside, d is the microphone spacing, c is the speed of sound, n_i(w) is the noise at microphone i, and v_l(w) is the l^{th} correlated (directional) noise signal such that \mathbb{E}[v_l(w) v_l^*(w)] = \sigma_{v_l}^2 and \mathbb{E}[ s(w) e^{\left(-jw (i-1)\frac{d}{c} \sin{\theta} \right)} v_l^*(w)] = 0, \forall i \in \{1, \cdots, N\}, \forall l \in \{1,\cdots,L\}, that is, each correlated noise signal is uncorrelated with the desired signal at every microphone. \psi_l is the DOA of the l^{th} correlated noise with respect to the broadside. Here,

W_{i}(w) = e^{\left(-jw (i-1)\frac{d}{c} \sin{\theta} \right)}
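
As a minimal sketch of the signal model above, the following Python snippet computes the phase term W_i(w) for each microphone and builds one noise-free snapshot x_i(w) = s(w) W_i(w). The function name `steering_phase`, the reading of d as the microphone spacing and c as the speed of sound, and all numeric values (4 microphones, 5 cm spacing, 343 m/s, 1 kHz, 30 degrees) are illustrative assumptions rather than values from the text.

```python
import cmath
import math


def steering_phase(i, omega, d, c, theta):
    """Phase term W_i(w) = exp(-j*w*(i-1)*(d/c)*sin(theta)) for microphone i (1-based)."""
    return cmath.exp(-1j * omega * (i - 1) * (d / c) * math.sin(theta))


# Illustrative values (assumptions, not taken from the text).
N = 4                          # number of microphones
d = 0.05                       # microphone spacing in metres
c = 343.0                      # speed of sound in m/s
omega = 2 * math.pi * 1000.0   # angular frequency for 1 kHz
theta = math.radians(30.0)     # DOA of the desired source, measured from broadside

s = 1.0 + 0.5j                 # one frequency-domain sample of the desired signal
W = [steering_phase(i, omega, d, c, theta) for i in range(1, N + 1)]
x = [s * W_i for W_i in W]     # noise-free microphone snapshot

for i, W_i in enumerate(W, start=1):
    print(f"mic {i}: |W_i| = {abs(W_i):.3f}, phase = {cmath.phase(W_i):+.3f} rad")
```

Each W_i(w) has unit magnitude, so in this model the desired signal differs between microphones only by a phase that grows linearly with the microphone index.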

The statistically optimal criterion for the RSB is to find filter weights W_{i}(w) such that:

\underset{W_{i}(w)}{\mathrm{argmin}} ~~\mathbb{E} [|y_d(w) - W_{i}^H(w) x_i(w)|^2]


where the microphone signal used for the estimate is modeled as

x_i(w) = s(w) e^{\left(-jw (i-1)\frac{d}{c} \sin{\left(\theta \right)} \right)} + n_i(w)

Notice that the directional signal v_l(w) is not used in the estimate of the filter weights.
It can be shown that the optimum weights are:

W_{i}(w) = \mathbb{E} [x_i(w) x_i^*(w)]^{-1} \times \mathbb{E} [x_i(w) y_d^*(w)]

with the output signal being

y_o(w) = \frac{1}{N} \sum\limits_{i=1}^N W_{i}^H(w) x_i(w) = s(w) + \frac{1}{N} \sum\limits_{i=1}^N W_{i}^H(w) n_i(w)
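
The weight estimate and the output can be sketched numerically. The snippet below assumes the usual least-squares (Wiener) form of the per-channel solution, W_i = E[x_i x_i^*]^{-1} E[x_i y_d^*], replaces the expectations with sample averages over K snapshots at a single frequency, and leaves out the directional signals v_l(w). The helper names (`complex_gaussian`, `rsb_weights`, `rsb_output`, `mse`) and all parameter values are illustrative assumptions.

```python
import cmath
import math
import random


def complex_gaussian(rng, sigma):
    """Circular complex Gaussian sample with variance sigma**2."""
    scale = sigma / math.sqrt(2.0)
    return complex(rng.gauss(0.0, scale), rng.gauss(0.0, scale))


def rsb_weights(x, y_d):
    """Per-channel weights W_i = E[x_i x_i^*]^-1 E[x_i y_d^*] from sample averages.

    x[i][k] is snapshot k at microphone i; y_d[k] is the known reference signal.
    """
    weights = []
    for x_i in x:
        power = sum(abs(v) ** 2 for v in x_i) / len(x_i)
        cross = sum(v * y.conjugate() for v, y in zip(x_i, y_d)) / len(x_i)
        weights.append(cross / power)
    return weights


def rsb_output(weights, x):
    """y_o = (1/N) * sum_i W_i^H x_i, evaluated for every snapshot."""
    n_mics, n_snap = len(x), len(x[0])
    return [sum(w.conjugate() * x_i[k] for w, x_i in zip(weights, x)) / n_mics
            for k in range(n_snap)]


def mse(a, b):
    """Mean squared error between two equally long complex sequences."""
    return sum(abs(u - v) ** 2 for u, v in zip(a, b)) / len(a)


# Illustrative setup (assumptions, not taken from the text).
rng = random.Random(0)
N, K = 4, 2000                       # microphones, snapshots
d, c = 0.05, 343.0                   # spacing (m), speed of sound (m/s)
omega = 2 * math.pi * 1000.0         # 1 kHz
theta = math.radians(30.0)           # DOA, unknown to the beamformer
sigma_n = 0.3                        # noise standard deviation per microphone

y_d = [complex_gaussian(rng, 1.0) for _ in range(K)]
steer = [cmath.exp(-1j * omega * i * (d / c) * math.sin(theta)) for i in range(N)]
x = [[s * a + complex_gaussian(rng, sigma_n) for s in y_d] for a in steer]

W = rsb_weights(x, y_d)              # uses only x and the reference, not theta
y_o = rsb_output(W, x)

print("single-microphone error:", mse(x[0], y_d))
print("beamformer output error:", mse(y_o, y_d))
```

Under these assumptions the estimated weights recover approximately the phase of e^{-jw (i-1)(d/c) sin(theta)} without being given \theta, and their magnitude is slightly below one, so the output is a scaled copy of s(w) plus averaged noise. The unit-gain expression for y_o(w) corresponds to the case where the noise power is small compared with the signal power.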

In the presence of v_l(w), the signal W_{i}^H(w) x_i(w) is subtracted from each signal x_i(w) before any further processing is done. The main drawback of this approach is that the desired signal has to be known a priori, which makes it useful mainly in cases such as joint echo cancellation and beamforming.

VOCAL Technologies offers custom designed solutions for beamforming with a robust voice activity detector, acoustic echo cancellation and noise suppression. Our custom implementations of such systems are meant to deliver optimum performance for your specific beamforming task. Contact us today to discuss your solution!


VOCAL Technologies, Ltd.
520 Lee Entrance, Suite 202
Amherst New York 14228
Phone: +1-716-688-4675
Fax: +1-716-639-0713