
The concept of de-reverberation is similar to that of acoustic echo cancellation. De-reverberation attempts to model the impulse response of the acoustic environment and remove its effect from the received microphone signal. The problem is harder than acoustic echo cancellation because the original sound source is not available as a reference. When a sound source is some distance from a microphone, the direct-path signal arrives first, followed by reflections of the original signal. If the source is close to the microphone, the signal-to-reverberation ratio is high; as the distance between source and microphone increases, the direct-path signal weakens and the effect of reverberation grows.
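The dependence on distance can be illustrated with a toy room impulse response. The sketch below is not a room simulator. It assumes a direct-path amplitude proportional to 1/distance and a reverberant tail whose energy does not depend on distance, and the names synthetic_rir and direct_to_reverberant_db, the tail gain of 0.05 and the reverberation time of 0.5 s are illustrative choices rather than anything taken from the text.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed


def synthetic_rir(distance_m, fs=16000, rt60=0.5, seed=0):
    """Toy impulse response: one direct-path tap plus a decaying noise tail."""
    rng = np.random.default_rng(seed)
    delay = int(round(distance_m / SPEED_OF_SOUND * fs))  # direct-path delay in samples
    n_tail = int(rt60 * fs)
    t = np.arange(n_tail) / fs
    # Tail amplitude falls by 60 dB over rt60 seconds (assumed decay model).
    tail = 0.05 * rng.standard_normal(n_tail) * 10.0 ** (-3.0 * t / rt60)
    h = np.zeros(delay + 1 + n_tail)
    h[delay] = 1.0 / distance_m  # assumed 1/distance spreading of the direct path
    h[delay + 1:] = tail         # tail level is kept independent of distance
    return h, delay


def direct_to_reverberant_db(h, delay):
    """Energy of the direct-path tap relative to everything that follows it."""
    direct = h[delay] ** 2
    reverberant = np.sum(h[delay + 1:] ** 2)
    return 10.0 * np.log10(direct / reverberant)


for distance in (0.5, 1.0, 2.0, 4.0):
    h, delay = synthetic_rir(distance)
    print(f"{distance:4.1f} m: {direct_to_reverberant_db(h, delay):6.1f} dB")
```

Under these assumptions the ratio falls by about 6 dB for every doubling of distance, because only the direct-path tap changes.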
The response of the acoustic path between the sound source and the microphone is called the room impulse response; its frequency-domain counterpart is the room transfer function. To recover the original sound source, the received microphone signal can be convolved with the inverse of the room impulse response. In general the inverse can only be approximated, because the room impulse response is rarely minimum-phase, i.e. it rarely has an inverse that is both causal and stable.
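A two-tap example makes the minimum-phase condition concrete. The sketch below checks whether all zeros of a short FIR response lie inside the unit circle, then fits a least-squares FIR inverse with an optional modelling delay. The function names, the 32-tap inverse length and the two-tap responses are assumptions chosen for illustration; real room impulse responses have thousands of taps and are usually not known in advance.

```python
import numpy as np


def is_minimum_phase(h):
    """True if every zero of the FIR response h lies strictly inside the unit circle."""
    return bool(np.all(np.abs(np.roots(h)) < 1.0))


def convolution_matrix(h, n_taps):
    """Matrix C such that C @ g equals np.convolve(h, g) for g of length n_taps."""
    h = np.asarray(h, dtype=float)
    C = np.zeros((len(h) + n_taps - 1, n_taps))
    for k in range(n_taps):
        C[k:k + len(h), k] = h
    return C


def ls_inverse(h, n_taps, delay):
    """Least-squares FIR inverse g so that h * g approximates a delayed unit impulse."""
    C = convolution_matrix(h, n_taps)
    target = np.zeros(C.shape[0])
    target[delay] = 1.0
    g, *_ = np.linalg.lstsq(C, target, rcond=None)
    error = np.linalg.norm(C @ g - target)
    return g, error


h_min = [1.0, 0.5]   # zero at z = -0.5, minimum-phase
h_non = [0.5, 1.0]   # zero at z = -2.0, not minimum-phase

print(is_minimum_phase(h_min), is_minimum_phase(h_non))
print("no delay:  ", ls_inverse(h_non, n_taps=32, delay=0)[1])
print("with delay:", ls_inverse(h_non, n_taps=32, delay=32)[1])
```

For the non-minimum-phase response, a causal inverse without delay leaves a large residual, while allowing a delay lets the least-squares filter approximate the inverse closely. This is one reason practical inverse filters are approximate and delayed rather than exact.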
There are several approaches to estimating the transfer function. One approach is cepstrum analysis. The cepstrum is the inverse Fourier transform of the log magnitude spectrum, IDFT(log|X(ω)|). It measures the rate of variation in the log spectrum. Because convolution in the time domain becomes addition in the log spectrum, the contributions of the speech and of the room add in the cepstrum. Speech is considered slowly varying relative to the reverberant components in the log spectrum. Therefore, the speech and the transfer function components can be separated.
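The separation can be seen in a small numerical experiment. The sketch below computes the real cepstrum as the inverse DFT of the log magnitude spectrum and applies it to a noise signal with a single delayed copy added. The single echo is a simplified stand-in for reverberation and the white noise is a stand-in for speech; both are assumptions, as are the function names and the 100-sample delay.

```python
import numpy as np


def real_cepstrum(x):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(x)
    log_magnitude = np.log(np.abs(spectrum) + 1e-12)  # small offset avoids log(0)
    return np.fft.ifft(log_magnitude).real


def make_echo_signal(n=4096, delay=100, gain=0.5, seed=0):
    """White noise plus one attenuated copy delayed by `delay` samples."""
    rng = np.random.default_rng(seed)
    s = rng.standard_normal(n - delay)
    x = np.zeros(n)
    x[:n - delay] += s          # direct path
    x[delay:] += gain * s       # single reflection
    return x


def strongest_quefrency(cepstrum, min_quefrency=20):
    """Index of the largest cepstral value above a minimum quefrency."""
    half = len(cepstrum) // 2
    return min_quefrency + int(np.argmax(cepstrum[min_quefrency:half]))


x = make_echo_signal(delay=100)
c = real_cepstrum(x)
print("strongest quefrency (samples):", strongest_quefrency(c))
```

The echo multiplies the spectrum by a rapidly oscillating factor, which appears in the cepstrum as a peak at the echo delay, away from the low-quefrency region where slowly varying spectral structure is concentrated.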
Another approach to estimating the transfer function uses the linear prediction (LP) residual. For clean speech the LP residual stays close to zero apart from strong peaks at the excitation instants, whereas reverberation smears these peaks over time. As a result, reverberation lowers the kurtosis of the probability distribution of the LP residual relative to clean speech. The adaptive filter used to remove the reverberation components is therefore driven by an objective function that maximizes the kurtosis of the LP residual.
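The effect of reverberation on the kurtosis can be checked on synthetic data. The sketch below only evaluates the quantity that such an adaptive filter would maximize; it does not implement the adaptive filter. The speech-like signal (a sparse random excitation through a two-pole filter), the toy impulse response, the prediction order of 10 and all function names are assumptions made for illustration.

```python
import numpy as np


def lp_residual(x, order=10):
    """Residual of linear prediction fitted with the autocorrelation method."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[lags], r[1:])            # normal equations R a = r
    inverse_filter = np.concatenate(([1.0], -a))   # e[n] = x[n] - sum_k a[k] x[n-k]
    return np.convolve(x, inverse_filter)[:n]


def excess_kurtosis(e):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    e = e - np.mean(e)
    m2 = np.mean(e ** 2)
    return np.mean(e ** 4) / m2 ** 2 - 3.0


def kurtosis_demo(fs=8000, seconds=2.0, rt60=0.3, seed=0):
    rng = np.random.default_rng(seed)
    n = int(fs * seconds)
    # Sparse excitation: mostly zero, with occasional random pulses (assumed model).
    excitation = (rng.random(n) < 1.0 / 80.0) * rng.standard_normal(n)
    # Stable two-pole resonator as a crude stand-in for a vocal tract.
    clean = np.zeros(n)
    for i in range(n):
        clean[i] = excitation[i]
        if i >= 1:
            clean[i] += 1.3 * clean[i - 1]
        if i >= 2:
            clean[i] -= 0.8 * clean[i - 2]
    # Toy impulse response: direct tap plus a noise tail decaying 60 dB over rt60.
    n_tail = int(rt60 * fs)
    t = np.arange(n_tail) / fs
    tail = 0.3 * rng.standard_normal(n_tail) * 10.0 ** (-3.0 * t / rt60)
    rir = np.concatenate(([1.0], tail))
    reverberant = np.convolve(clean, rir)
    return (excess_kurtosis(lp_residual(clean)),
            excess_kurtosis(lp_residual(reverberant)))


k_clean, k_reverberant = kurtosis_demo()
print("clean residual kurtosis:      ", k_clean)
print("reverberant residual kurtosis:", k_reverberant)
```

In this setup the clean residual is dominated by a few isolated pulses and has a heavy-tailed distribution, while the reverberant residual is a sum of many overlapping decaying copies and is much closer to Gaussian, so its kurtosis is lower.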