Echo Paths for Hands-Free Terminals

Echo cancellation, as part of Voice Enhancement, is expected to perform under a variety of echo path conditions. For Line Echo Cancellers (LEC) and Network Echo Cancellers (NEC), these requirements are set out in detail in the normative document ITU-T G.168-2012 (cf. [1]). For Acoustic Echo Cancellers (AEC) there are no requirements of comparable detail; ITU-T G.167 (on Acoustic Echo Controllers) came relatively close to the level of detail of earlier versions of [1], but that document is no longer in force. Note that, unlike G.168, which mandates testing with specific echo path models, G.167 (and any superseding documents) did not provide echo path models; instead it required, in general language, that AEC devices perform well for echo path delays of 400 milliseconds.

Figure 1: P.340 recommended models for acoustic impulse responses for teleconference systems, hands-free terminals and videophones and for mobile radio terminals

One of the standard documents, ITU-T P.340 ([2]), provides requirements regarding acoustic echo paths, yet it too does not include actual numerical models. A general characteristic of the P.340 recommendation is that, instead of giving echo path models (in the form of impulse response vectors or lookup tables), it refers to real rooms or enclosures with appropriate acoustic characteristics; thus, exposing the device under test (DUT) to real-life conditions is the preferred approach to verifying its functionality.

Nevertheless, echo paths simulated by electronic devices such as digital reverberators with time-invariant reflection patterns can be used as well, provided the terminal is equipped with an adequate interface.

Rather than supplying digital models of the echo paths (including the reverberation part), the recommendation indicates that a digital echo path simulator (which could be a vector representing the echo path, defined in the respective software module) should comply with the values recommended for real rooms or enclosures. The envelope of the simulated echo path should likewise comply with the envelopes of real-life echo path impulse responses.
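As an illustration of an echo path "vector" whose envelope follows a prescribed reverberation time, the sketch below shapes Gaussian noise with an exponential envelope that falls by 60 dB over RT₆₀. This noise-times-exponential model is a common simplification and an assumption made here; it is not taken from P.340, and the function names and parameter values are hypothetical.

```python
import math
import random

def synthetic_echo_path(rt60_s, fs_hz, length_s=None, seed=0):
    """Gaussian noise shaped by an exponential envelope that falls 60 dB over rt60_s."""
    if length_s is None:
        length_s = rt60_s  # assumption: simulate one RT60 worth of samples
    n = int(round(length_s * fs_hz))
    rng = random.Random(seed)
    # Amplitude envelope 10**(-3 t / RT60) corresponds to -60 dB at t = RT60.
    decay_per_sample = 3.0 * math.log(10.0) / (rt60_s * fs_hz)
    return [rng.gauss(0.0, 1.0) * math.exp(-decay_per_sample * k) for k in range(n)]

def envelope_db(rt60_s, fs_hz, k):
    """Envelope level in dB at sample k, relative to sample 0."""
    return -60.0 * k / (rt60_s * fs_hz)

# Hypothetical example: RT60 of 500 ms at a sampling rate of 8000 Hz.
h = synthetic_echo_path(rt60_s=0.5, fs_hz=8000)
print(len(h), envelope_db(0.5, 8000, len(h)))
```

A vector generated this way only approximates the envelope of a measured response; the sub-band (lowest and highest octave) behaviour would need separate shaping.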

Given these recommendations of ITU-T P.340, the approach to creating digital models includes the following practical steps:

(a) Collecting audio data in the real test rooms and in other rooms, and generating the actual impulse responses using pre-defined locations of the excitation source and the sensor

(b) Verifying the AEC functionality by evaluating its model (or prototype), provided the data representing the echo path impulse response can be “plugged into” the simulation model
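Step (b) can be pictured with a minimal simulation in which an impulse response vector is "plugged in" by convolving it with the far-end signal, and the resulting echo is fed to a canceller. The sketch below uses a textbook full-band NLMS filter as a stand-in; it is not VOCAL's sub-band design, and the 8-tap echo path, step size and signal lengths are hypothetical values chosen only to keep the example short.

```python
import math
import random

def convolve(x, h):
    """Direct-form FIR: echo[n] = sum_k h[k] * x[n - k]."""
    y = []
    for n in range(len(x)):
        y.append(sum(h[k] * x[n - k] for k in range(min(len(h), n + 1))))
    return y

def nlms(x, d, taps, mu=0.5, eps=1e-6):
    """Normalized LMS adaptive filter; returns the residual e[n] = d[n] - w . x."""
    w = [0.0] * taps
    buf = [0.0] * taps          # buf[k] holds x[n - k]
    residual = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        err = dn - sum(wi * bi for wi, bi in zip(w, buf))
        gain = mu * err / (eps + sum(b * b for b in buf))
        w = [wi + gain * bi for wi, bi in zip(w, buf)]
        residual.append(err)
    return residual

def erle_db(d, e):
    """Echo return loss enhancement: echo energy over residual energy, in dB."""
    num = sum(v * v for v in d)
    den = sum(v * v for v in e)
    return 10.0 * math.log10(num / max(den, 1e-300))

rng = random.Random(1)
far_end = [rng.gauss(0.0, 1.0) for _ in range(4000)]          # white-noise far-end signal
echo_path = [0.0, 0.6, -0.3, 0.15, 0.08, -0.04, 0.02, 0.01]   # hypothetical impulse response
mic = convolve(far_end, echo_path)                            # echo only, no near-end speech
residual = nlms(far_end, mic, taps=16)
print(round(erle_db(mic[-1000:], residual[-1000:]), 1))
```

In a real evaluation the hypothetical 8-tap vector would be replaced by a measured or simulated response of the required length, and near-end speech and noise would be added to the microphone signal.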

ITU-T P.340 does not elaborate on any specific method of collecting impulse responses. Thus, any of the methods described briefly in [3] should be adequate. Environmental background noise does, however, influence the collected audio data, so some methods are more susceptible to environmental factors than others. For example, if the impulse responses are generated with the Golay (complementary sequence) method, the potential errors are kept small.
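The principle behind the Golay method can be sketched as follows: two complementary ±1 sequences are played through the system, each recording is correlated with its own excitation, and the two correlations are summed, so that the sidelobes cancel and only the impulse response remains. The sketch assumes a noise-free, time-invariant system and periodic (circular) excitation; the function names and the 4-tap test response are hypothetical.

```python
def golay_pair(order):
    """Return complementary sequences (a, b) of length 2**order with entries +1/-1."""
    a, b = [1], [1]
    for _ in range(order):
        a, b = a + b, a + [-v for v in b]
    return a, b

def circular_conv(s, h):
    """System output for a periodically repeated excitation s and impulse response h."""
    n = len(s)
    return [sum(h[k] * s[(i - k) % n] for k in range(len(h))) for i in range(n)]

def circular_xcorr(y, s):
    """r[m] = sum_i y[i + m] * s[i], indices taken modulo len(s)."""
    n = len(s)
    return [sum(y[(i + m) % n] * s[i] for i in range(n)) for m in range(n)]

def estimate_ir(a, b, ya, yb):
    """Sum the two correlations; the autocorrelation sidelobes of a and b cancel."""
    n = len(a)
    ra = circular_xcorr(ya, a)
    rb = circular_xcorr(yb, b)
    return [(p + q) / (2.0 * n) for p, q in zip(ra, rb)]

a, b = golay_pair(4)                      # two sequences of length 16
h_true = [0.0, 0.5, -0.25, 0.125]         # hypothetical response, shorter than the sequences
h_est = estimate_ir(a, b, circular_conv(a, h_true), circular_conv(b, h_true))
print(h_est[:4])
```

The estimate is exact here because the simulated recordings contain no noise; with real recordings, longer sequences and averaging over repetitions are the usual ways to reduce the influence of background noise.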

More tangible requirements regarding echo path impulse responses, a.k.a. acoustic impulse responses (AIR) or room impulse responses (RIR), follow.

– The echo path delay for teleconference systems shall be 400 ms; the recommendation refers to “reverberation time averaged over the transmission bandwidth”, which is certainly open to different interpretations. Nevertheless, the typical interpretation is that this is the RT₆₀ for the whole band (i.e., for narrowband with Fs of 8000 Hz the whole band is [300 Hz, 3400 Hz], and for wideband with Fs of 16000 Hz the whole band is [150 Hz, 7000 Hz]). This then implies that the typical AIR length for simulation shall be 400 ms; moreover, the recommendation specifies characteristics of the RIR in two sub-bands:

For the lowest octave, the RT₆₀ shall be less than twice the average value (i.e., less than 800 ms).
For the highest octave, the RT₆₀ shall be less than half the average value (i.e., less than 200 ms).
The volume of the test room should be approximately 90 m³.

– For hands-free terminals and videophones, the average reverberation time shall be 500 ms; the RT₆₀ limits for the lowest and the highest octave are defined in the same way as for teleconference systems (i.e., 1000 ms and 250 ms, respectively); the volume of the test room should be approximately 50 m³.

– The recommendation also defines conditions for a typical test room for mobile radio terminals (RT₆₀ of 60 ms and a volume of approximately 2.5 m³).
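The figures in the list above reduce to simple arithmetic, which the following sketch tabulates. The octave limits are computed only for the two room types for which the text states them. Taking the simulated AIR length equal to the average RT₆₀ follows the interpretation given above for teleconference systems; applying the same rule to the other two room types is an assumption, and the helper names are hypothetical.

```python
# (room type, average RT60 in ms, approximate room volume in m^3), as listed in the text
ROOMS = [
    ("teleconference system", 400.0, 90.0),
    ("hands-free terminal / videophone", 500.0, 50.0),
    ("mobile radio terminal", 60.0, 2.5),
]

def octave_limits_ms(rt60_avg_ms):
    """Upper bounds on RT60 for the (lowest octave, highest octave)."""
    return 2.0 * rt60_avg_ms, 0.5 * rt60_avg_ms

def air_length_samples(rt60_avg_ms, fs_hz):
    """Number of samples needed to cover rt60_avg_ms at sampling rate fs_hz."""
    return round(rt60_avg_ms * fs_hz / 1000.0)

for name, rt60_ms, volume_m3 in ROOMS:
    taps_nb = air_length_samples(rt60_ms, 8000)     # narrowband, Fs = 8000 Hz
    taps_wb = air_length_samples(rt60_ms, 16000)    # wideband, Fs = 16000 Hz
    print(f"{name}: RT60 {rt60_ms:g} ms, volume {volume_m3:g} m^3, "
          f"{taps_nb} / {taps_wb} samples")

# Octave limits are stated in the text for the first two room types only.
for name, rt60_ms, _ in ROOMS[:2]:
    low_limit, high_limit = octave_limits_ms(rt60_ms)
    print(f"{name}: lowest octave < {low_limit:g} ms, highest octave < {high_limit:g} ms")
```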

Figure 1 illustrates the above recommended requirements for the RIR, both as a waveform and as a short-term energy profile. The recommended conditions for verifying real-life echo paths (or for creating them by collecting audio data and generating the RIR through off-line computation) are adequate for creating our own RIR models that comply with P.340.
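A short-term energy profile of the kind shown in Figure 1 can be computed from any impulse response vector, and its slope gives a quick check of the reverberation time. The sketch below is one simple way of doing so, assuming non-overlapping frames, a decay that starts in the first frame, and a straight-line fit between -5 dB and -25 dB extrapolated to 60 dB; the frame length, the fitting range and the idealized test response are all hypothetical choices.

```python
import math

def short_term_energy_db(h, frame_len):
    """Energy of consecutive non-overlapping frames, in dB relative to the strongest frame."""
    energies = [sum(v * v for v in h[i:i + frame_len])
                for i in range(0, len(h) - frame_len + 1, frame_len)]
    peak = max(energies)
    return [10.0 * math.log10(max(e, 1e-300) / peak) for e in energies]

def rt60_from_profile(profile_db, frame_len, fs_hz, hi_db=-5.0, lo_db=-25.0):
    """Least-squares slope of the profile between hi_db and lo_db, extrapolated to -60 dB."""
    pts = [(i * frame_len / fs_hz, p)
           for i, p in enumerate(profile_db) if lo_db <= p <= hi_db]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_p = sum(p for _, p in pts) / len(pts)
    slope = (sum((t - mean_t) * (p - mean_p) for t, p in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))   # dB per second, negative
    return -60.0 / slope

# Idealized test response: a pure exponential envelope with RT60 = 0.5 s at Fs = 8000 Hz.
fs_hz, rt60_s = 8000, 0.5
h = [10.0 ** (-3.0 * k / (rt60_s * fs_hz)) for k in range(4000)]
profile = short_term_energy_db(h, frame_len=80)           # 10 ms frames
print(round(rt60_from_profile(profile, 80, fs_hz), 3))
```

For measured responses the profile is noisier, and the estimate depends on the frame length and on where the noise floor begins.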

VOCAL’s Sub-band Acoustic Echo Canceller design has been successfully tested in typical acoustical environments and deployed widely. Its nominal echo path coverage is 128 ms and, if necessary, this can be increased or decreased to suit specific requirements. VOCAL’s Acoustic Echo Canceller is ported onto any of the typical DSP processors. Contact us to discuss your application with our engineering staff.

More Information

References

[1] ITU-T G.168-2012 (02/2012), Digital Network Echo Cancellers.
[2] ITU-T P.340 (05/2000), Transmission characteristics and speech quality parameters of hands-free terminals.
[3] On Impulse Response Estimation for Audio via Complementary Sequences.

