
Speech Enhancement and Speech Intelligibility

In noisy environments, speech enhancement (or noise reduction) algorithms are typically employed to improve the quality of communications. While speech enhancement may improve the perceptual quality of communications, it does not guarantee an improvement in speech intelligibility.

The general goal of a speech enhancement algorithm is to estimate either the noise spectrum or the clean voice signal in order to improve the overall signal-to-noise ratio (SNR). Unfortunately, the overall (time-domain) SNR is not highly correlated with intelligibility; in other words, an improvement in SNR does not necessarily increase comprehension. However, frequency-domain SNRs computed over bands segmented according to the perception of the human auditory system correlate much more strongly with both overall voice quality and speech intelligibility.
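The contrast between the overall SNR and a band-segmented SNR can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the two band edges, the frame length, and the synthetic test tones are hypothetical, and a real perceptual measure would use auditory-scale bands and weights rather than the crude split shown.

```python
import numpy as np

def overall_snr_db(clean, estimate):
    """Time-domain SNR over the whole signal, in dB."""
    error = estimate - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(error ** 2))

def band_snr_db(clean, estimate, sample_rate, band_edges_hz, frame_len=256):
    """Per-band SNR in dB, accumulated from frame-wise FFT power."""
    n_bands = len(band_edges_hz) - 1
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    sig_pow = np.zeros(n_bands)
    err_pow = np.zeros(n_bands)
    for i in range(len(clean) // frame_len):
        seg = slice(i * frame_len, (i + 1) * frame_len)
        c = np.fft.rfft(window * clean[seg])
        e = np.fft.rfft(window * (estimate[seg] - clean[seg]))
        for b in range(n_bands):
            in_band = (freqs >= band_edges_hz[b]) & (freqs < band_edges_hz[b + 1])
            sig_pow[b] += np.sum(np.abs(c[in_band]) ** 2)
            err_pow[b] += np.sum(np.abs(e[in_band]) ** 2)
    eps = 1e-12  # guards against log of zero in empty or silent bands
    return 10.0 * np.log10((sig_pow + eps) / (err_pow + eps))

# Synthetic example (assumed values): a strong low tone, a weak high tone,
# and residual noise concentrated near the weak high tone.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 200 * t) + 0.05 * np.sin(2 * np.pi * 3000 * t)
estimate = clean + 0.05 * np.sin(2 * np.pi * 3100 * t)

overall = overall_snr_db(clean, estimate)
per_band = band_snr_db(clean, estimate, sample_rate, [0, 1000, 4000])
print("overall SNR (dB):", overall)
print("band SNRs (dB):", per_band)
```

In this toy case the overall SNR is dominated by the strong low-frequency tone, while the upper band, where the residual noise sits, has a far lower SNR. A single time-domain figure hides that difference.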

In addition, an inherent flaw in most noise reduction techniques for speech enhancement is the distortion introduced by uncertainty in the noise estimate. While this issue is unavoidable in non-stationary environments, the effect of the distortion on speech intelligibility can be managed through the design criteria. There are two types of distortion: attenuation distortion and amplification distortion.

Attenuation distortion occurs when the estimated spectrum is less than the actual voice spectrum, which generally results from over-estimating the noise spectrum. Amplification distortion occurs when the estimated spectrum is greater than the actual spectrum, which results from under-estimating the noise spectrum or from the presence of a masker signal. It has been shown that amplification distortion has a more adverse effect on speech recognition rates and intelligibility than attenuation distortion.
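The two distortion types can be made concrete with a small sketch that labels each spectral bin by comparing the estimated magnitude with the clean magnitude. This is an evaluation-style illustration only, since it assumes the clean spectrum is known. The function name, the tolerance parameter, the example power values, and the use of simple power spectral subtraction (with cross-terms ignored) are all assumptions, not a description of any particular algorithm.

```python
import numpy as np

def classify_distortion(clean_mag, est_mag, tol_db=0.0):
    """Label each bin: -1 attenuation, +1 amplification, 0 within tolerance."""
    eps = 1e-12  # avoids division by zero and log of zero
    diff_db = 20.0 * np.log10((est_mag + eps) / (clean_mag + eps))
    labels = np.zeros(clean_mag.shape, dtype=int)
    labels[diff_db < -tol_db] = -1   # estimate below the clean spectrum
    labels[diff_db > tol_db] = 1     # estimate above the clean spectrum
    return labels, diff_db

def spectral_subtract(noisy_pow, noise_est_pow):
    """Basic power spectral subtraction, floored at zero (assumed estimator)."""
    return np.maximum(noisy_pow - noise_est_pow, 0.0)

# Hypothetical per-bin powers; noisy power is modelled as clean plus noise.
clean_pow = np.array([4.0, 1.0, 0.25])
noise_pow = np.array([1.0, 1.0, 1.0])
noisy_pow = clean_pow + noise_pow

# Over-estimating the noise removes too much: attenuation distortion.
est_over = spectral_subtract(noisy_pow, 1.5 * noise_pow)
labels_over, _ = classify_distortion(np.sqrt(clean_pow), np.sqrt(est_over))

# Under-estimating the noise leaves residual noise: amplification distortion.
est_under = spectral_subtract(noisy_pow, 0.5 * noise_pow)
labels_under, _ = classify_distortion(np.sqrt(clean_pow), np.sqrt(est_under))

print("over-estimated noise:", labels_over)
print("under-estimated noise:", labels_under)
```

A design that constrains distortion could use labels like these during evaluation or tuning, for example to penalize bins marked as amplification more heavily than bins marked as attenuation.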

Therefore, to improve both the quality and intelligibility of speech, enhancement algorithm design goals should incorporate perceptually motivated SNR criteria along with a constraint on the distortions. Rather than attempting to maximize the overall SNR, these modifications to the algorithm help return an enhanced signal that is closer to the desired signal.

For more information

VOCAL Technologies, Ltd.
520 Lee Entrance, Suite 202
Amherst New York 14228
Phone: +1-716-688-4675
Fax: +1-716-639-0713