
The evolution of telepresence and hands-free communications has been made possible by advanced signal processing within the hardware devices. Hands-free devices face two main acoustic signal processing challenges. The first is that the loudspeaker signal is reflected through the room and fed back into the microphone, so an echo control system must be in place to ensure that the echoes do not disturb the far-end user. The second is the low signal-to-noise ratio of the near-end speaker, caused by the long distance between the speaker and the microphone. By tackling these challenges jointly, more efficient algorithms can be developed that yield performance equivalent to that of a system using separate algorithms for each problem.
Overall system performance is better when the echo canceller precedes the combined echo and noise reduction system. The main disadvantage of this ordering is that the adaptive filter of the echo canceller has to process noisy signals, which puts a theoretical bound on the achievable attenuation of the canceller. In most scenarios, this disadvantage is outweighed by the fact that placing a noise reduction filter before the echo canceller adds variability to the echo path, significantly limiting the ability of the adaptive filter to train to the echo path. An additional advantage of placing the echo canceller before the combined reduction system is that the level of the echo, which acts as a non-stationary noise source, is greatly reduced before it reaches the reduction stage.
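The ordering argued for above can be illustrated with a minimal sketch. It assumes a normalized LMS (NLMS) update for the adaptive filter, which the text does not specify, and the function name `nlms_echo_canceller` and its parameters (`taps`, `mu`, `eps`) are hypothetical. The point is that the canceller adapts on the raw microphone signal against the unmodified far-end reference, and only its output is handed on to the combined reduction stage.

```python
def nlms_echo_canceller(far_end, mic, taps=4, mu=0.5, eps=1e-8):
    """Hypothetical NLMS echo canceller (an assumed adaptive algorithm).

    far_end: loudspeaker (reference) samples
    mic:     microphone samples containing the echo
    Returns the canceller output and the final filter weights.
    """
    w = [0.0] * taps        # adaptive estimate of the echo path
    x_buf = [0.0] * taps    # most recent far-end samples, newest first
    out = []
    for x, m in zip(far_end, mic):
        x_buf = [x] + x_buf[:-1]
        echo_est = sum(wi * xi for wi, xi in zip(w, x_buf))
        e = m - echo_est    # canceller output: near-end + noise + residual echo
        norm = sum(xi * xi for xi in x_buf) + eps
        # Normalized LMS weight update driven by the output error.
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x_buf)]
        out.append(e)
    return out, w
```

In this arrangement the output list would be the input to the combined echo and noise reduction filter. Because no noise reduction filter sits in front of the canceller, the echo path seen by the adaptive filter is not altered by a time-varying gain.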
In the combined echo and noise reduction system, the signal model is y(n) = s(n) + b(n) + d(n), where y(n) is the microphone signal after the echo canceller, s(n) is the desired near-end speech, b(n) is the additive noise signal, and d(n) is the residual echo signal at time instant n. The goal is to design an adaptive filter Hc(ω,n) such that Ŝ(ω,n) = Hc(ω,n) ⋅ Y(ω,n), where Y(ω,n) is the spectrum of y(n) at frequency ω and time n. In Post Filtering for Residual Echo Control, it was shown that the attenuation factor for residual echo was
| Hres(ω,n) = … | (1) |
where Ŝd(ω,n) is the estimate of the spectral energy of the residual echo at frequency ω and time n. Similarly,
| Hb(ω,n) = … | (2) |
can be used as the attenuation factor for the ambient noise components, where Ŝb(ω,n), the estimate of the spectral energy of the noise, can be obtained using the techniques described in Noise Reduction of Non-stationary Noise Sources. Therefore, (1) and (2) can be combined to produce
| Hc(ω,n) = max{ Hmin, Hb(ω,n) ⋅ Hres(ω,n) } | (3) |
where Hmin is the lower limit on the filter gain, which sets the maximum allowable attenuation.
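Equation (3) can be illustrated per frequency bin with a short sketch. The explicit forms of (1) and (2) are not reproduced here, so the sketch assumes a Wiener-type rule, 1 − Ŝ/Ŝy, for both attenuation factors; that rule, the instantaneous estimate |Y|² for the input energy, the default `h_min` value, and all function names are assumptions made for illustration only.

```python
def wiener_gain(interference_psd, input_psd, eps=1e-12):
    """Assumed Wiener-type attenuation factor, clipped to [0, 1]."""
    gain = 1.0 - interference_psd / max(input_psd, eps)
    return min(1.0, max(0.0, gain))


def combined_gain(echo_psd, noise_psd, input_psd, h_min=0.1):
    """Equation (3) for one bin: max{Hmin, Hb * Hres}."""
    h_res = wiener_gain(echo_psd, input_psd)   # residual echo factor (assumed form)
    h_b = wiener_gain(noise_psd, input_psd)    # ambient noise factor (assumed form)
    return max(h_min, h_b * h_res)


def apply_combined_filter(spectrum, echo_psd, noise_psd, h_min=0.1):
    """Apply S_hat = Hc * Y to each bin of one frame.

    spectrum:  complex values Y(w, n) for a single frame
    echo_psd:  residual echo energy estimates, one per bin
    noise_psd: noise energy estimates, one per bin
    """
    out = []
    for y, s_d, s_b in zip(spectrum, echo_psd, noise_psd):
        s_y = abs(y) ** 2  # instantaneous input energy (an assumption)
        out.append(combined_gain(s_d, s_b, s_y, h_min) * y)
    return out
```

With these assumptions, a bin dominated by residual echo is attenuated only down to `h_min` rather than to zero, which is the role of the floor in (3).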
The method above describes a single-channel estimation approach to combined echo and noise control. Other approaches adapt the combined filter to psychoacoustic models of the human ear and/or use multiple channels to take advantage of the spatial coherence of the noise sources, further improving the perceptual quality of the system.