Speech Enhancement and Speech Intelligibility

In noisy environments, speech enhancement (or noise reduction) algorithms are typically employed to improve the speech quality of the communication. The general goal of these algorithms is to estimate the spectrum of the noise signal, or to estimate the clean speech signal directly, in order to improve the overall signal-to-noise ratio (SNR). While this may improve the perceptual speech quality, it does not guarantee an improvement in speech intelligibility, because the overall (time-domain) SNR is not highly correlated with speech intelligibility. In other words, an improvement in overall SNR does not necessarily increase speech intelligibility. Frequency-domain SNRs, computed in bands segmented according to the perception of the human auditory system, have a much higher correlation with both speech quality and speech intelligibility.
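To make the difference between the two measures concrete, the following is a minimal sketch in Python. It assumes the per-bin power spectra of the clean speech and of the noise are already available as plain lists; the function names, the band edges, and the [-10, 35] dB clamp are illustrative assumptions, not part of any particular product or standard. A real system would choose band edges that follow a perceptual scale such as critical bands.

```python
import math

def overall_snr_db(clean_power, noise_power):
    """Overall SNR in dB from per-bin power spectra.

    By Parseval's relation the summed power spectrum is proportional to the
    time-domain energy, so this matches the overall time-domain SNR.
    """
    return 10.0 * math.log10(sum(clean_power) / sum(noise_power))

def band_snr_db(clean_power, noise_power, band_edges,
                floor_db=-10.0, ceil_db=35.0, eps=1e-12):
    """SNR in dB for each band [band_edges[i], band_edges[i+1]).

    band_edges is a hypothetical list of bin indices. The clamp range is an
    assumed convention that keeps silent or noise-free bands from dominating.
    """
    snrs = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        speech = sum(clean_power[lo:hi])
        noise = sum(noise_power[lo:hi])
        snr = 10.0 * math.log10((speech + eps) / (noise + eps))
        snrs.append(min(max(snr, floor_db), ceil_db))
    return snrs

# Toy spectra: strong speech in the low band, speech buried in the high band.
clean = [100.0, 100.0, 1.0, 1.0]
noise = [1.0, 1.0, 1.0, 1.0]
edges = [0, 2, 4]  # two bands of two bins each (illustrative only)

overall = overall_snr_db(clean, noise)
per_band = band_snr_db(clean, noise, edges)
print(round(overall, 2), [round(s, 2) for s in per_band])
```

In this toy case the overall SNR is roughly 17 dB, yet the upper band sits at about 0 dB. The single overall figure hides the fact that the speech cues in that band are fully masked, which is the kind of detail a band-wise measure exposes.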

Another inherent flaw of most noise reduction techniques is the distortion introduced by uncertainty in the noise estimate. While this flaw is unavoidable in non-stationary environments, the effect of the distortion on intelligibility can be managed through the design criteria. There are two types of distortion: attenuation distortion and amplification distortion. Attenuation distortion occurs when the estimated spectrum is less than the actual speech spectrum (generally resulting from over-estimation of the noise spectrum). Amplification distortion occurs when the estimated spectrum is greater than the actual speech spectrum (due to under-estimation of the noise spectrum or the presence of a masker signal). It has been shown that amplification distortion has a more adverse effect on recognition rates than attenuation distortion.
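The two distortion types can be illustrated with a short sketch that labels each frequency bin by comparing the estimated magnitude with the true clean magnitude. This assumes the clean spectrum is known, which is only the case in offline evaluation; the function name and the tolerance parameter are hypothetical.

```python
def classify_distortion(clean_mag, est_mag, tol=0.0):
    """Label each bin as 'attenuation', 'amplification', or 'none'.

    clean_mag: true clean-speech magnitudes (known only in evaluation).
    est_mag:   magnitudes produced by the enhancement algorithm.
    tol:       assumed dead zone; differences within tol count as 'none'.
    """
    labels = []
    for clean, est in zip(clean_mag, est_mag):
        if est > clean + tol:
            labels.append("amplification")  # estimate exceeds true speech
        elif est < clean - tol:
            labels.append("attenuation")    # estimate falls short of true speech
        else:
            labels.append("none")
    return labels

labels = classify_distortion([1.0, 1.0, 1.0], [0.5, 2.0, 1.0])
print(labels)  # ['attenuation', 'amplification', 'none']
```

Counting the bins in each class over a test set gives a rough picture of which kind of error an algorithm tends to make, and therefore which one a design constraint should target first.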

Therefore, for a speech enhancement algorithm to improve both speech quality and speech intelligibility, the design goal should incorporate perceptually motivated SNR criteria and a constraint on the distortions. These modifications help the algorithm return an enhanced signal that is closer to the desired signal, rather than simply attempting to maximize the overall SNR.
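One hypothetical way to combine both ideas is to derive a suppression gain per perceptual band from that band's estimated SNR and then clamp the gain. The sketch below uses the standard Wiener gain rule, SNR / (1 + SNR), purely as a stand-in; the floor and ceiling values are assumptions chosen for illustration and are not the method described in this article.

```python
def constrained_band_gains(band_snr_db, gain_floor=0.1, gain_ceiling=1.0):
    """Per-band Wiener-style gains clamped to [gain_floor, gain_ceiling].

    gain_floor limits how far a band can be attenuated (bounding attenuation
    distortion); gain_ceiling prevents boosting a band above the noisy input.
    The Wiener rule itself never exceeds 1, so the ceiling only matters if a
    different gain rule is substituted. Both limits are assumed values.
    """
    gains = []
    for snr_db in band_snr_db:
        snr = 10.0 ** (snr_db / 10.0)   # dB to linear power ratio
        gain = snr / (1.0 + snr)        # Wiener gain for this band
        gains.append(min(max(gain, gain_floor), gain_ceiling))
    return gains

gains = constrained_band_gains([20.0, 0.0, -20.0])
print([round(g, 3) for g in gains])
```

Here the band at 20 dB is passed almost unchanged, the band at 0 dB is halved, and the band at -20 dB is held at the floor instead of being suppressed to nearly zero.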