Complete Communications Engineering

Resolving the angle of arrival of speech signals impinging on a microphone array is critical for beamforming and noise reduction in online applications. Because embedded DSPs have tight computation and memory constraints, a fast, lightweight algorithm is required for beamforming to achieve any meaningful gain. In speech processing, frame lengths on the order of 10 milliseconds are typical, which further underscores the need for an algorithm that resolves the angle quickly. The problem we wish to tackle is as follows: given M frame recordings of a single speech source, one frame per microphone and each frame having N samples, determine the angle of arrival of the signal assuming a far-field model. Figure 1 illustrates a typical microphone array.


Figure 1: Square microphone array.

We limit our analysis to the 2-D case for brevity of presentation. The general approach is to use cross-correlation to find the delay between paired microphones. GCC-PHAT, however, requires \mathcal{O}(\hat{N}\log_2{\hat{N}}) additions and multiplications because the DFTs of both signals need to be computed. Here, \hat{N} is the smallest power of two not less than N, that is, \hat{N} = 2^{m} with m = \underset{m \in \mathbb{Z}}{\mathrm{argmin}}~ \{2^{m} : 2^{m} \ge N\}. A faster approach is to compute the cross-correlation only over the lags allowed by the known maximum delay between a pair of microphones. This reduces the number of computations to \mathcal{O}(NL), where L is the maximum delay in samples. On a typical DSP platform with a microphone spacing of 40 mm and a sampling rate of 16 kHz, the maximum delay is as small as 2 samples, making our approach at least 4 times faster than GCC-PHAT and other DFT-based approaches.

For the microphone array illustrated in Figure 1, define the time delay between microphone i and microphone j as \tau_{i,j} = t_j - t_i, i,j \in \{1,\cdots,4\}, with t_i and t_j being the arrival times of a common sample at the two microphones. Let c be the speed of sound. Let d be the distance between consecutive microphones m_i and m_j, as labeled in Figure 1, such that |i-j| = 1. Also let the distance between m_i and m_j be \sqrt{2} d when \mathrm{mod}(|i-j|,3) = 2, that is, for the diagonal pairs. Then the time differences of arrival at the microphone array obey the following:

\begin{bmatrix}\tau_{1,2} & \tau_{1,3} & \tau_{1,4} & \tau_{2,3} & \tau_{2,4} & \tau_{3,4}\end{bmatrix}^T = \frac{d}{c} \begin{bmatrix}\sin{\theta} & \sin{\theta} + \cos{\theta} & \cos{\theta} & \cos{\theta} & \cos{\theta}-\sin{\theta} & -\sin{\theta}\end{bmatrix}^T
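As an illustration of how this model can be inverted, the sketch below recovers \theta from six measured delays by least squares on the two unknowns \sin\theta and \cos\theta. It is a minimal sketch rather than a definitive implementation: the function name estimate_angle, the default spacing d = 0.04 m, and the speed of sound c = 343 m/s are assumptions made for the example. The row for pair (3,4) uses -\sin\theta, which follows from the additivity \tau_{2,4} = \tau_{2,3} + \tau_{3,4}.

```python
import math

# Model rows (a, b) such that tau = (d / c) * (a * sin(theta) + b * cos(theta)).
# Pair order: (1,2), (1,3), (1,4), (2,3), (2,4), (3,4).
# The (3,4) row is -sin(theta), from tau_{2,4} = tau_{2,3} + tau_{3,4}.
MODEL_ROWS = [(1, 0), (1, 1), (0, 1), (0, 1), (-1, 1), (-1, 0)]


def estimate_angle(delays_s, d=0.04, c=343.0):
    """Least-squares angle of arrival in radians.

    delays_s: six pairwise delays in seconds, in the order of MODEL_ROWS.
    d: microphone spacing in metres (assumed 40 mm).
    c: speed of sound in m/s (assumed 343 m/s).
    """
    if len(delays_s) != len(MODEL_ROWS):
        raise ValueError("expected one delay per microphone pair")

    # Remove the d / c scale so the unknowns are sin(theta) and cos(theta).
    y = [t * c / d for t in delays_s]

    # Normal equations (A^T A) [s, q]^T = A^T y for the 6x2 model matrix A.
    a11 = sum(a * a for a, _ in MODEL_ROWS)
    a12 = sum(a * b for a, b in MODEL_ROWS)
    a22 = sum(b * b for _, b in MODEL_ROWS)
    b1 = sum(a * v for (a, _), v in zip(MODEL_ROWS, y))
    b2 = sum(b * v for (_, b), v in zip(MODEL_ROWS, y))

    det = a11 * a22 - a12 * a12
    s = (a22 * b1 - a12 * b2) / det  # estimate of sin(theta)
    q = (a11 * b2 - a12 * b1) / det  # estimate of cos(theta)

    # atan2 uses both estimates, so the full angular range is resolved.
    return math.atan2(s, q)


if __name__ == "__main__":
    theta_true = math.radians(30.0)
    scale = 0.04 / 343.0
    delays = [scale * (a * math.sin(theta_true) + b * math.cos(theta_true))
              for a, b in MODEL_ROWS]
    print(math.degrees(estimate_angle(delays)))  # close to 30 for noise-free delays
```

Using all six pairs averages out estimation noise on individual delays, and taking the arctangent of both fitted components avoids the front-back ambiguity that a single microphone pair would leave.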

Here, each \tau_{i,j} is estimated, in samples, as the lag that maximizes the pairwise cross-correlation:

\underset{k \in [-L,L]}{\mathrm{argmax}}~ crr[k], \quad crr[k] = \sum\limits_{n=1}^{N} x_i[n]x_j[k+n], \quad i \neq j, \; i,j \in \{1, \cdots, M\}

where x_i[n] is the n-th sample recorded at microphone i. Inter-sample (fractional) delays can be handled by also using the correlation values adjacent to the peak. For example, if the correlation peaks at the k^{th} lag, we use the interpolated value:

\hat{k} =\frac{2k crr[k]-(k-0.5)crr[k+1]-(k+0.5)crr[k-1]}{2crr[k]-crr[k-1]-crr[k+1]}.
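The limited-lag correlation and the peak interpolation above can be sketched together as follows. This is a minimal sketch under stated assumptions: the function names limited_xcorr and estimate_delay are hypothetical, samples outside the frame are treated as zero, and one extra lag is evaluated on each side of [-L, L] so that a peak at the edge of the search range still has two neighbours. The interpolation line is the \hat{k} expression above, which is equivalent to fitting a parabola through the three correlation values around the peak.

```python
import math


def limited_xcorr(x_i, x_j, max_lag):
    """Return {k: crr[k]} for k in [-max_lag, max_lag].

    crr[k] = sum over n of x_i[n] * x_j[n + k]; samples outside the
    frame are treated as zero (an assumption of this sketch).
    """
    crr = {}
    for k in range(-max_lag, max_lag + 1):
        lo = max(0, -k)                     # first n with n + k >= 0
        hi = min(len(x_i), len(x_j) - k)    # one past the last valid n
        crr[k] = sum(x_i[n] * x_j[n + k] for n in range(lo, hi))
    return crr


def estimate_delay(x_i, x_j, max_lag):
    """Delay of x_j relative to x_i, in fractional samples."""
    # One extra lag per side so the peak always has both neighbours.
    crr = limited_xcorr(x_i, x_j, max_lag + 1)

    # Integer peak, searched only over the physically possible lags.
    k = max(range(-max_lag, max_lag + 1), key=lambda lag: crr[lag])

    denom = 2 * crr[k] - crr[k - 1] - crr[k + 1]
    if denom <= 0:
        # Flat or non-concave neighbourhood: keep the integer lag.
        return float(k)

    # Inter-sample estimate, same expression as k-hat in the text.
    num = 2 * k * crr[k] - (k - 0.5) * crr[k + 1] - (k + 0.5) * crr[k - 1]
    return num / denom


if __name__ == "__main__":
    # Synthetic 160-sample frames (10 ms at 16 kHz): a smooth pulse and
    # a copy of it delayed by 1.5 samples.
    x_a = [math.exp(-((n - 80.0) / 6.0) ** 2) for n in range(160)]
    x_b = [math.exp(-((n - 81.5) / 6.0) ** 2) for n in range(160)]
    print(estimate_delay(x_a, x_b, max_lag=3))
```

Each call evaluates only 2L + 3 lags of N multiply-adds, so the cost stays linear in the frame length and no DFT is needed.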

VOCAL Technologies is a custom design house, and its angle of arrival algorithms are applicable to a wide range of microphone arrays operating in reverberant environments. The algorithm is selected based on the requirements of the application and the available hardware configuration.
