Precise location estimates of sound sources are becoming important in many applications such as gaming, beamforming and conferencing. In most of these applications, half-plane resolution suffices since, for example, in gaming applications the user stands only in front of the device. This has led to many gadgets employing a linear array of microphones. The signals at the sensors can be combined in many different ways to estimate the location of the source, which has given rise to many different algorithms. Many of the existing algorithms combine the signals at the sensors to resolve a single angle of arrival. These algorithms assume that the incident angle is identical at all the sensors, which is a far field assumption. We present an alternative which does not make this assumption, can be used for both far field and near field models, and produces a closed form solution that does not require searching over a grid.

Figure 1: Uniform linear array (ULA) of M microphones

The pair-wise time difference of arrival, $\tau_{i,j} = \tau_j - \tau_i$, can be obtained using GCC-PHAT, SRP-PHAT, coherence, or any algorithm suited to the particular application. Note that, even though approaches such as GCC-PHAT are used primarily for angle of arrival estimation, their primary intermediate output is the time difference of arrival at paired sensors, before data fusion determines the angle of arrival. With $d$ the microphone spacing, $r$ the range of the source from the first microphone and $\theta$ the angle of arrival measured from the array normal, the range of the source from each sensor, $1$ to $M$, from left to right on Figure 1 is: $r_i = \sqrt{((i-1)d + r\sin{\theta})^2 +(r\cos{\theta})^2}$
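As an illustration, the range expression can be used to synthesize noise-free time differences of arrival for testing an estimator. The sketch below is a minimal example, not part of the original method: the names `sensor_ranges` and `tdoa_from_first` and the nominal `SPEED_OF_SOUND` value are assumptions, and the source is assumed to lie in the front half-plane so that $\cos\theta \ge 0$.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed nominal value for air at room temperature


def sensor_ranges(r, sin_theta, d, num_mics):
    """Range r_i from the source to each microphone, i = 1..M."""
    # Assumes the source is in front of the array, so cos(theta) >= 0.
    cos_theta = math.sqrt(1.0 - sin_theta ** 2)
    return [
        math.hypot(i * d + r * sin_theta, r * cos_theta)  # i = 0 is microphone 1
        for i in range(num_mics)
    ]


def tdoa_from_first(r, sin_theta, d, num_mics, c=SPEED_OF_SOUND):
    """Noise-free tau_{1,j} = (r_j - r_1) / c for j = 2..M."""
    ranges = sensor_ranges(r, sin_theta, d, num_mics)
    return [(r_j - ranges[0]) / c for r_j in ranges[1:]]
```

In practice the values returned by `tdoa_from_first` would be replaced by measured estimates, for example from GCC-PHAT.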

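Since $c\tau_{i,j} = r_j - r_i$, where $c$ is the speed of sound, squaring twice to remove the square roots shows that the pairwise time differences of arrival obey: $\left((c\tau_{i,j})^2-2r^2 -((j-1)^2+(i-1)^2)d^2 -2(j+i-2)d r \sin{\theta}\right)^2 = 4\left(r^2+(j-1)^2 d^2 + 2(j-1)dr\sin{\theta}\right) \left(r^2+(i-1)^2 d^2 + 2(i-1)dr\sin{\theta}\right)$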

We can pivot around the first microphone, for which $r_1 = r$, to reduce the equation above to $r = \frac{(j-1)^2d^2-(c\tau_{1,j})^2}{2\left(c\tau_{1,j}-(j-1)d\sin{\theta}\right)}, \quad j \in \{2,\cdots,M\}$
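The pivot step can be sketched in a few lines. This hypothetical helper follows directly from $r_j^2 = r^2 + 2(j-1)dr\sin\theta + (j-1)^2d^2$ and $c\tau_{1,j} = r_j - r$, which give $r = \frac{(j-1)^2d^2-(c\tau_{1,j})^2}{2(c\tau_{1,j}-(j-1)d\sin\theta)}$. The function name, the default speed of sound and the tolerance used to detect the far field limit are assumptions made for illustration.

```python
import math


def range_from_pivot(tau_1j, j, d, sin_theta, c=343.0):
    """Source range from microphone 1, given the TDOA between microphones 1 and j."""
    a = c * tau_1j   # path-length difference r_j - r_1
    m = (j - 1) * d  # offset of microphone j from microphone 1
    denom = 2.0 * (a - m * sin_theta)
    if abs(denom) < 1e-12:  # arbitrary tolerance, chosen for illustration
        # Far field limit: c * tau_{1,j} = (j - 1) * d * sin(theta),
        # so the wavefront is planar and the range is unbounded.
        return math.inf
    return (m * m - a * a) / denom
```

The denominator vanishes exactly when the plane wave relation holds, which is why the same expression covers both the near field and far field cases.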

Eliminating $r$ between each pair of microphones $j < k$ leaves a system that is linear in $\sin{\theta}$, with one row per pair. This resolves the angle of arrival, and inherently the source location, using: $d \begin{bmatrix}c^2\left(2\tau_{1,2}^2-\tau_{1,3}^2\right)+2d^2 \\c^2\left(3\tau_{1,2}^2-\tau_{1,4}^2\right)+6d^2\\\vdots \\c^2\left((M-1)\tau_{1,2}^2-\tau_{1,M}^2\right)+(M-1)(M-2)d^2 \\c^2\left(3\tau_{1,3}^2-2\tau_{1,4}^2\right)+6d^2\\\vdots\\c^2\left((M-1)\tau_{1,3}^2-2\tau_{1,M}^2\right)+2(M-1)(M-3)d^2\\\vdots\\c^2\left((M-1)\tau_{1,M-1}^2-(M-2)\tau_{1,M}^2\right)+(M-2)(M-1)d^2 \end{bmatrix} \sin{\theta} = \begin{bmatrix}c^3 \tau_{1,2}\tau_{1,3} (\tau_{1,2}-\tau_{1,3})+c d^2\left(4\tau_{1,2}-\tau_{1,3}\right) \\c^3 \tau_{1,2}\tau_{1,4} (\tau_{1,2}-\tau_{1,4})+c d^2\left(9\tau_{1,2}-\tau_{1,4}\right)\\\vdots\\c^3 \tau_{1,2}\tau_{1,M} (\tau_{1,2}-\tau_{1,M})+c d^2\left((M-1)^2\tau_{1,2}-\tau_{1,M}\right)\\c^3 \tau_{1,3}\tau_{1,4} (\tau_{1,3}-\tau_{1,4})+c d^2\left(9\tau_{1,3}-4\tau_{1,4}\right)\\\vdots\\c^3 \tau_{1,3}\tau_{1,M} (\tau_{1,3}-\tau_{1,M})+c d^2\left((M-1)^2\tau_{1,3}-4\tau_{1,M}\right)\\\vdots \\c^3 \tau_{1,M-1}\tau_{1,M} (\tau_{1,M-1}-\tau_{1,M})+c d^2\left((M-1)^2\tau_{1,M-1}-(M-2)^2\tau_{1,M}\right) \end{bmatrix}$
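Because the only unknown is $\sin\theta$, the least squares solution of such a stacked system reduces to a ratio of inner products. The sketch below is one possible implementation under stated assumptions: for each microphone pair $j < k$ it builds the row obtained by eliminating $r$ from $r_j^2 = r^2 + 2(j-1)dr\sin\theta + (j-1)^2d^2$ with $c\tau_{1,j} = r_j - r$. The function name `estimate_sin_theta` and the default speed of sound are illustrative choices, not from the original.

```python
from itertools import combinations


def estimate_sin_theta(taus, d, c=343.0):
    """Least-squares sin(theta) from taus = [tau_{1,2}, ..., tau_{1,M}]."""
    if len(taus) < 2:
        raise ValueError("need at least three microphones (two TDOAs)")
    a = [c * t for t in taus]              # a[n] = c * tau_{1,n+2}
    m = [n + 1 for n in range(len(taus))]  # m[n] = j - 1 for j = n + 2
    lhs, rhs = [], []
    for p, q in combinations(range(len(taus)), 2):  # microphone pairs j < k
        # Coefficient of sin(theta) for this pair.
        lhs.append(d * (m[q] * a[p] ** 2 - m[p] * a[q] ** 2
                        + m[p] * m[q] * (m[q] - m[p]) * d ** 2))
        # Right-hand side for this pair.
        rhs.append(a[p] * a[q] * (a[p] - a[q])
                   + d ** 2 * (m[q] ** 2 * a[p] - m[p] ** 2 * a[q]))
    # Single unknown: argmin ||lhs * s - rhs||^2 is (lhs . rhs) / (lhs . lhs).
    num = sum(x * y for x, y in zip(lhs, rhs))
    den = sum(x * x for x in lhs)
    return num / den
```

No grid search is involved. Once $\sin\theta$ is known, the range $r$ follows by substituting it back into the expression for any single microphone $j$.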

If the noise in the time difference of arrival estimate at each microphone is characterized as a zero mean random process, the error in the least squares solution for the parameter $\theta$ approaches zero as the number of microphones becomes large.
