Echo Reduction Detection Algorithm

Pre- and post-echoes are disturbing artifacts in transform coding. They are most noticeable where the signal energy rises or falls suddenly. Because of their effect on communication quality, standards such as G.729.1 include a module that reduces the echoes present in the decoded speech. Building on The Cause of Echoes in Coding, one can now consider how to reduce the effect of echoes.

Pre-echoes can be handled in the encoder, in the decoder, or in both. In every case the system relies on a detector to locate the transition regions that cause echo. G.729.1, which performs echo reduction in the decoder, has the advantage of using the time-domain coded signal (which contains no echo) as a reference for reducing the echo in the Modified Discrete Cosine Transform (MDCT) enhancement layers of the coder.

Detection in this system happens at two levels. First, two frames of the synthesized MDCT signal are concatenated; since G.729.1 frames are 20 ms long, the resulting 40 ms segment is split into 8 subframes of 5 ms. If one subframe has significantly more energy than its neighboring subframes, the frame may have generated pre- or post-echo. The second level compares the energy of the decoded time-domain signal with that of the MDCT signal. If the MDCT signal energy is greater than the time-domain energy, the excess can safely be attributed to echo, and the MDCT signal is scaled by the gain:

$g(n) = \left( \frac{E_{TD}}{E_{MDCT}} \right)^{\frac{1}{2}}$

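where $E_{TD}$ is the energy of the time-domain signal and $E_{MDCT}$ is the energy of the MDCT (transform-domain) signal. Because the gain is applied only where $E_{MDCT} > E_{TD}$, it is always less than 1 and brings the energy of the MDCT output down to that of the echo-free reference. For example, if the MDCT signal carries four times the reference energy, $g(n) = 0.5$.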
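The two detection levels and the gain can be sketched in Python. This is a minimal illustration under stated assumptions, not the G.729.1 reference code: the 16 kHz sampling rate, the onset threshold `ONSET_RATIO`, and the function names `subframe_energies`, `has_transient` and `reduce_echo` are hypothetical, and the gain is held constant within each 5 ms subframe with no smoothing between subframes.

```python
import math

FS = 16000                 # assumed output sampling rate (wideband)
SUB_LEN = FS * 5 // 1000   # 5 ms subframe -> 80 samples at 16 kHz
NUM_SUB = 8                # two 20 ms frames -> eight 5 ms subframes
ONSET_RATIO = 8.0          # hypothetical "significantly more energy" threshold
EPS = 1e-12                # keeps the comparison meaningful next to silent subframes


def subframe_energies(x):
    """Sum of squared samples in each 5 ms subframe."""
    return [sum(s * s for s in x[k * SUB_LEN:(k + 1) * SUB_LEN])
            for k in range(len(x) // SUB_LEN)]


def has_transient(energies):
    """Level 1: is any subframe far more energetic than a neighbouring one?"""
    for k, e in enumerate(energies):
        neighbours = [energies[j] for j in (k - 1, k + 1) if 0 <= j < len(energies)]
        if any(e > ONSET_RATIO * (n + EPS) for n in neighbours):
            return True
    return False


def reduce_echo(x_td, x_mdct):
    """Level 2: scale each MDCT subframe by g = sqrt(E_TD / E_MDCT) where E_MDCT > E_TD.

    Returns the modified MDCT signal and the per-subframe gains.
    """
    if not (len(x_td) == len(x_mdct) == NUM_SUB * SUB_LEN):
        raise ValueError("expected two concatenated frames of equal length")
    e_mdct = subframe_energies(x_mdct)
    if not has_transient(e_mdct):
        return list(x_mdct), [1.0] * NUM_SUB
    e_td = subframe_energies(x_td)
    # e_m > e_t guarantees e_m > 0, so the division is safe.
    gains = [math.sqrt(e_t / e_m) if e_m > e_t else 1.0
             for e_t, e_m in zip(e_td, e_mdct)]
    out = [s * gains[i // SUB_LEN] for i, s in enumerate(x_mdct)]
    return out, gains


# Onset in subframe 4; the MDCT synthesis smears low-level noise into subframe 3.
x_td = [0.0] * (4 * SUB_LEN) + [1.0] * (4 * SUB_LEN)
x_mdct = [0.0] * (3 * SUB_LEN) + [0.1] * SUB_LEN + [1.0] * (4 * SUB_LEN)
y, g = reduce_echo(x_td, x_mdct)
print(g)  # [1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0]
```

Here the time-domain reference is silent in subframe 3, so the pre-echo there is removed entirely, while subframes whose MDCT energy does not exceed the reference keep unit gain.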

Methods for reducing echoes in the encoder rely on psychoacoustic properties of the human ear and on adaptive window lengths. The encoder's detector likewise observes the energy in short subframes, looking ahead for significant changes in energy. More advanced detectors apply some pre-filtering so that any significant transient likely to cause echo is detected. For example, if a high-frequency source is added to a steady low-frequency source, the onset of the high-frequency source can go undetected because the low-frequency energy dominates the total. High-pass filtering the signal before measuring its energy allows such high-frequency transients to be detected.
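The benefit of pre-filtering can be shown with a toy detector. This sketch assumes a first-difference high-pass filter (a deliberately crude stand-in for whatever filter a real encoder uses), 5 ms subframes at 16 kHz, and a hypothetical threshold `RATIO`; the buffer passed to the hypothetical `detect_onsets` function stands for the current frame plus its lookahead. A 6 kHz tone at one fifth the amplitude of a steady 200 Hz tone starts in subframe 5.

```python
import math

FS = 16000                 # assumed sampling rate
SUB_LEN = FS * 5 // 1000   # 5 ms subframes (80 samples)
RATIO = 4.0                # hypothetical energy-jump threshold


def high_pass(x):
    """First-difference filter y[n] = x[n] - x[n-1], a crude high-pass stand-in."""
    prev = 0.0
    out = []
    for s in x:
        out.append(s - prev)
        prev = s
    return out


def subframe_energies(x):
    """Sum of squared samples in each 5 ms subframe."""
    return [sum(s * s for s in x[k * SUB_LEN:(k + 1) * SUB_LEN])
            for k in range(len(x) // SUB_LEN)]


def detect_onsets(x, prefilter=True):
    """Indices of subframes whose energy exceeds RATIO times that of the previous subframe."""
    e = subframe_energies(high_pass(x) if prefilter else x)
    return [k for k in range(1, len(e)) if e[k] > RATIO * e[k - 1]]


# Steady 200 Hz tone plus a weaker 6 kHz tone that starts at subframe 5.
n_total = 8 * SUB_LEN
low = [math.sin(2 * math.pi * 200 * n / FS) for n in range(n_total)]
high = [0.2 * math.sin(2 * math.pi * 6000 * n / FS) if n >= 5 * SUB_LEN else 0.0
        for n in range(n_total)]
x = [a + b for a, b in zip(low, high)]

print(detect_onsets(x, prefilter=False))  # []
print(detect_onsets(x, prefilter=True))   # [5]
```

Without the high-pass filter the 6 kHz onset raises the subframe energy by only about 4%, well under the threshold. After filtering, the 200 Hz tone is strongly attenuated and the onset stands out by more than an order of magnitude.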

Once a potentially troublesome region of the signal has been detected, the encoding can be modified to reduce the effect of the echo. This is where psychoacoustic masking and adaptive window lengths come into play. Pre-masking occurs when a target signal just before the onset of a masker is inaudible; it lasts only about 5 ms. Pre-echo spread across a long transform window generally extends beyond this interval and is therefore not masked. Post-masking occurs when a target signal is inaudible for about 50 to 300 ms after the offset of a masker, which is why post-echo artifacts are of less concern than pre-echoes. When a transient has been detected, the window length can be shortened adaptively so that the pre-echo is confined to less than about 5 ms and becomes imperceptible.
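The window-switching decision can be reduced to a small rule. This sketch assumes, as a simplification, that quantization noise can spread over the full length of the transform window, so a window no longer than the roughly 5 ms pre-masking interval keeps the pre-echo masked. The window lengths `LONG_WINDOW` and `SHORT_CANDIDATES` and the function `choose_window` are hypothetical, not taken from any particular codec.

```python
FS = 16000                             # assumed sampling rate
PRE_MASK_MS = 5.0                      # approximate pre-masking duration
LONG_WINDOW = 640                      # hypothetical long window (40 ms at 16 kHz)
SHORT_CANDIDATES = (320, 160, 80, 40)  # hypothetical short windows, longest first


def window_ms(length, fs=FS):
    """Duration of a window of `length` samples in milliseconds."""
    return 1000.0 * length / fs


def choose_window(transient, fs=FS):
    """Return the window length (in samples) for the next block.

    Stationary segments keep the long window for frequency resolution. When a
    transient is flagged, pick the longest candidate whose duration fits inside
    the pre-masking interval, on the simplifying assumption that pre-echo can
    spread over the whole window.
    """
    if not transient:
        return LONG_WINDOW
    for length in SHORT_CANDIDATES:
        if window_ms(length, fs) <= PRE_MASK_MS:
            return length
    return SHORT_CANDIDATES[-1]  # nothing fits: fall back to the shortest window


print(choose_window(False))             # 640
print(choose_window(True))              # 80  (5 ms at 16 kHz)
print(choose_window(True, fs=48000))    # 160 (about 3.3 ms at 48 kHz)
```

Preferring the longest window that still fits keeps as much frequency resolution as possible; shorter windows resolve the transient in time but spread each coefficient over a wider band, which is why the long window is kept for stationary segments.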

More Information

Speech Enhancement Design

Complete Communications Engineering

Echo Reduction in Coding