Using Noise Modeling for Speech Enhancement

When noise modeling is used in speech enhancement to determine parameters such as fundamental frequency, it is important that the spectrum of the noise be taken into account. When the noise is white, all bands are equally corrupted by noise, and hence the bands with the highest clean speech energy will be the most reliable estimators.

In contrast, when the noise is pink or brown and the SNR is low enough, the low band will be the least reliable, because the noise power is concentrated at low frequencies. In this scenario, a more accurate estimate can be obtained by considering only the mid-to-high frequency bands, assuming the noise is uncorrelated with the clean speech. When these bands are extracted with a high-pass filter, the majority of the noise power is filtered out because of the roll-off of the noise spectrum.
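The high-pass extraction described above can be sketched as follows. This is a minimal illustration, not a production filter design: the function names `highpass_fir` and `highpass`, the 101-tap length, and the Hamming window are all assumptions chosen for the example, and the cutoff frequency is taken as given.

```python
import numpy as np

def highpass_fir(cutoff_hz, fs, num_taps=101):
    """Windowed-sinc high-pass FIR taps (hypothetical helper).

    Built by designing a low-pass filter and applying spectral inversion.
    num_taps must be odd so the filter has a well-defined centre tap.
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    if not 0.0 < cutoff_hz < fs / 2.0:
        raise ValueError("cutoff must lie between 0 and fs/2")
    n = np.arange(num_taps) - (num_taps - 1) / 2
    fc = cutoff_hz / fs  # cutoff in cycles per sample
    lowpass = 2.0 * fc * np.sinc(2.0 * fc * n) * np.hamming(num_taps)
    lowpass /= lowpass.sum()  # unity gain at DC
    taps = -lowpass
    taps[(num_taps - 1) // 2] += 1.0  # delta minus low-pass = high-pass
    return taps

def highpass(x, cutoff_hz, fs, num_taps=101):
    """Filter x, keeping only content above cutoff_hz (same length out)."""
    taps = highpass_fir(cutoff_hz, fs, num_taps)
    return np.convolve(np.asarray(x, dtype=float), taps, mode="same")
```

For example, `highpass(noisy, 1000.0, 8000)` would keep the band above roughly 1 kHz of a signal sampled at 8 kHz. A longer filter gives a sharper transition at the cost of more delay and computation.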

Estimating the color of the noise can be done using a variety of methods. The simplest way is to assume the first N frames of the incoming signal are noise only, and use the Welch or Bartlett method to estimate the noise power spectral density. This estimate can then be recursively smoothed through time.
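As a rough sketch of this first approach, the code below averages windowed periodograms (Welch's method) over the opening frames and then applies first-order recursive smoothing. The function names, the frame and segment lengths, and the smoothing constant `alpha` are assumptions made for illustration. Bartlett's method corresponds to the same averaging with non-overlapping segments and a rectangular window.

```python
import numpy as np

def welch_psd(x, fs, nperseg=256, overlap=0.5):
    """One-sided PSD by averaging Hann-windowed periodograms."""
    x = np.asarray(x, dtype=float)
    if len(x) < nperseg:
        raise ValueError("signal is shorter than one segment")
    step = max(1, int(nperseg * (1.0 - overlap)))
    window = np.hanning(nperseg)
    scale = fs * np.sum(window ** 2)
    starts = range(0, len(x) - nperseg + 1, step)
    segments = np.array([x[s:s + nperseg] for s in starts])
    periodograms = np.abs(np.fft.rfft(segments * window, axis=1)) ** 2
    psd = periodograms.mean(axis=0) / scale
    # Fold negative frequencies into the one-sided estimate
    # (DC is never doubled; Nyquist is skipped when nperseg is even).
    last = -1 if nperseg % 2 == 0 else None
    psd[1:last] *= 2.0
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, psd

def initial_noise_psd(signal, fs, frame_len, n_noise_frames, nperseg=256):
    """Estimate the noise PSD assuming the first N frames hold noise only."""
    lead_in = np.asarray(signal, dtype=float)[: frame_len * n_noise_frames]
    return welch_psd(lead_in, fs, nperseg=nperseg)

def smooth_psd(previous, current, alpha=0.9):
    """Recursive smoothing: alpha near 1 favours the running estimate."""
    return alpha * np.asarray(previous) + (1.0 - alpha) * np.asarray(current)
```

A caller might compute `freqs, noise_psd = initial_noise_psd(x, fs, 160, 20)` once at start-up, then call `smooth_psd(noise_psd, new_estimate)` whenever a fresh noise-only estimate becomes available.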

Another approach is to use Minimum Statistics. In this method, the frames with the least power are assumed to contain noise only, and the estimate is computed from the data in those frames alone. Alternatively, a Voice Activity Detector (VAD) can be used so that the noise power spectrum is estimated only during periods of non-speech activity.
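The frame-selection idea can be sketched as below. This is a deliberately simplified reading of the description above: it ranks whole frames by power and averages the periodograms of the quietest ones. Full minimum statistics algorithms instead track the minimum of a smoothed periodogram in each frequency bin and compensate for the resulting bias, which this sketch does not attempt. The function names and the `fraction` parameter are assumptions for illustration.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames, one frame per row."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    if n_frames < 1:
        raise ValueError("signal is shorter than one frame")
    index = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[index]

def lowest_power_noise_psd(x, fs, frame_len=256, hop=128, fraction=0.1):
    """Noise PSD from the quietest `fraction` of frames (simplified sketch)."""
    frames = frame_signal(x, frame_len, hop)
    power = np.mean(frames ** 2, axis=1)
    n_keep = max(1, int(round(fraction * len(frames))))
    quietest = np.argsort(power)[:n_keep]  # indices of assumed noise-only frames
    window = np.hanning(frame_len)
    spectra = np.abs(np.fft.rfft(frames[quietest] * window, axis=1)) ** 2
    psd = spectra.mean(axis=0) / (fs * np.sum(window ** 2))
    last = -1 if frame_len % 2 == 0 else None
    psd[1:last] *= 2.0  # one-sided estimate
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    return freqs, psd, quietest
```

Because the quietest frames are selected on purpose, the resulting estimate tends to sit somewhat below the true noise level. A practical system would usually correct for that bias or smooth the estimate over time.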

Once a reliable estimate of the noise power spectrum has been obtained, regression analysis can be performed to estimate the slope of the spectral envelope. The slope of the regression line, fitted to log power against log frequency, determines the color of the noise: a slope near 0 indicates white noise, near -1 pink noise, and near -2 brown noise. Often this coefficient will not be an integer, and hence the noise will have a mixed color. Nevertheless, from this information you can determine the frequency below which x% of the noise power lies, and high-pass filter at that point to extract the portion of the clean speech that is recoverable from the observed signal. When this portion spans a large part of the original bandwidth, a reliable parameterization of the full signal can be obtained.
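A minimal sketch of the regression and cutoff steps follows, assuming the noise power spectral density is available as arrays `freqs` and `psd` on a uniform frequency grid. The function names and the default 90% fraction are hypothetical choices for illustration. The fit is done on log power against log frequency, so noise whose power falls as 1/f gives a slope near -1.

```python
import numpy as np

def spectral_slope(freqs, psd):
    """Least-squares slope of log10(PSD) against log10(frequency).

    DC and any non-positive PSD values are skipped, since their
    logarithm is undefined.
    """
    freqs = np.asarray(freqs, dtype=float)
    psd = np.asarray(psd, dtype=float)
    keep = (freqs > 0) & (psd > 0)
    slope, intercept = np.polyfit(np.log10(freqs[keep]), np.log10(psd[keep]), 1)
    return slope, intercept

def noise_cutoff_frequency(freqs, psd, fraction=0.9):
    """Lowest grid frequency at or below which `fraction` of the noise power lies."""
    freqs = np.asarray(freqs, dtype=float)
    cumulative = np.cumsum(np.asarray(psd, dtype=float))
    cumulative = cumulative / cumulative[-1]
    index = np.searchsorted(cumulative, fraction)
    return freqs[min(index, len(freqs) - 1)]
```

The value returned by `noise_cutoff_frequency` could then serve as the cutoff of the high-pass filter. The steeper the slope, the lower this cutoff will be, and the more of the speech bandwidth survives the filtering.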

VOCAL Technologies, Ltd.
520 Lee Entrance, Suite 202
Amherst New York 14228
Phone: +1-716-688-4675
Fax: +1-716-639-0713