Complete Communications Engineering

The human ear cannot distinguish signals that are too close together in either frequency or time. This phenomenon is generally called noise masking, and it has been used in speech enhancement.

A noise masking threshold can be used to adjust the parameters of various modified spectrum subtraction-based enhancement algorithms. The threshold may be obtained by modeling the frequency selectivity of the human ear and its masking property.

The human hearing system has a minimum threshold of audibility. This threshold rises, however, when other signals are close in either frequency or time. Signals below the threshold are not detected and are therefore inaudible. From a perceptual perspective, inaudible noise or interference does not have to be removed. Noise reduction algorithms built on this principle are referred to as perceptual-based noise reduction algorithms.
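The audibility test described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the system described here: it assumes Terhardt's widely used closed-form approximation for the threshold in quiet, and the function names `threshold_in_quiet_db` and `is_audible`, along with the `masking_db` parameter, are hypothetical.

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate threshold of hearing in quiet, in dB SPL.

    Assumption: Terhardt's closed-form approximation. Requires freq_hz > 0.
    """
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def is_audible(level_db: float, freq_hz: float,
               masking_db: float = float("-inf")) -> bool:
    """Return True if a component exceeds the effective threshold.

    masking_db is a hypothetical input: the threshold raised by nearby
    signals. The effective threshold is the larger of the two.
    """
    threshold = max(threshold_in_quiet_db(freq_hz), masking_db)
    return level_db > threshold
```

With these assumptions, a 10 dB SPL component at 1 kHz is audible in quiet but becomes inaudible once a nearby signal raises the local threshold to 30 dB, which is the case where it no longer needs to be removed.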

Figure 1: Perceptual-Based Spectrum Subtraction

Figure 1 shows the perceptual-based spectrum subtraction noise reduction process. The new additions are the noise mask calculation unit and the new spectrum subtraction block with adjusted parameters in response to the noise mask T(f).

For example, consider the following spectrum subtraction algorithm:

\left|\hat{S}(f)\right|^\gamma = \begin{cases} \left|X(f)\right|^\gamma - \alpha\left|N(f)\right|^\gamma, & \left|X(f)\right|^\gamma / \left|N(f)\right|^\gamma > \alpha + \beta \\ \beta\left|N(f)\right|^\gamma, & \text{otherwise} \end{cases}

Instead of sending \left|\hat{S}(f)\right|^\gamma out directly as the noise reduction output, a perceptual mask T(f) is calculated and applied. Using T(f) makes enhancement during speech bursts practical.
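The subtraction rule and one possible way of letting T(f) drive its parameters can be sketched as follows. This is an illustration under stated assumptions rather than the algorithm of Figure 1: the linear mapping in `adapt_parameter` is hypothetical (a higher mask value yields a smaller parameter, on the reasoning that noise under the mask is inaudible and needs less subtraction), and the function names and the default `gamma=2.0` are also assumptions.

```python
def adapt_parameter(mask, p_min, p_max):
    """Map each mask value T(f) linearly onto [p_min, p_max].

    Hypothetical mapping: the lowest mask value gets p_max (aggressive),
    the highest mask value gets p_min (gentle).
    """
    t_min, t_max = min(mask), max(mask)
    if t_max == t_min:
        # Flat mask: no basis for adaptation, fall back to p_max.
        return [p_max] * len(mask)
    span = t_max - t_min
    return [p_max - (p_max - p_min) * (t - t_min) / span for t in mask]


def spectral_subtract(x_mag, n_mag, alpha, beta, gamma=2.0):
    """Apply the subtraction rule per frequency bin and return |S_hat(f)|.

    x_mag, n_mag: magnitudes |X(f)| and |N(f)|; alpha, beta: per-bin values.
    """
    out = []
    for x, n, a, b in zip(x_mag, n_mag, alpha, beta):
        xg, ng = x ** gamma, n ** gamma
        # Same test as |X|^g / |N|^g > alpha + beta, written without a
        # division so that a zero noise estimate is handled safely.
        if xg > (a + b) * ng:
            sg = xg - a * ng
        else:
            sg = b * ng
        out.append(sg ** (1.0 / gamma))
    return out


# Usage with made-up numbers: three bins, the last one strongly masked.
mask = [0.0, 0.0, 1.0]
alpha = adapt_parameter(mask, 1.0, 4.0)    # [4.0, 4.0, 1.0]
beta = adapt_parameter(mask, 0.0, 0.01)    # [0.01, 0.01, 0.0]
s_hat = spectral_subtract([10.0, 1.0, 2.0], [1.0, 1.0, 1.0], alpha, beta)
```

In this example the second bin fails the ratio test and is set to the floor \beta|N(f)|^\gamma, while the third bin, where the mask is highest, is subtracted with the smallest \alpha.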

The details of this algorithm are beyond the scope of this short article. The main advantages are summarized below: 1) the noise mask can be calculated more accurately than the SNR, 2) the noise mask is smoother than the SNR, and 3) parameters derived from the noise mask achieve better perceptual results during speech bursts.