The Artifacts of Spectral Subtraction

In hands-free communications, the desired speaker and the device microphone are located some distance apart, resulting in a signal with a low signal-to-noise ratio (SNR). As a result, speech enhancement is required to improve the perceptual quality of the communication. The classical approach to speech enhancement is single-channel noise reduction using spectral subtraction. Spectral subtraction assumes that the noise is additive, Y_ƒ[n] = S_ƒ[n] + B_ƒ[n], where Y_ƒ[n] is the Short-Time Fourier Transform (STFT) of the received microphone signal at frame n for frequency bucket ƒ, and S_ƒ[n] and B_ƒ[n] are the STFTs of the desired speech and noise signals, respectively. Once an estimate of the noise spectrum is obtained, its magnitude can be subtracted from the magnitude of the noisy speech spectrum:

\[
|\hat{S}_f[n]| =
\begin{cases}
|Y_f[n]| - \beta\,|\hat{B}_f[n]| & \text{if } |Y_f[n]| > \beta\,|\hat{B}_f[n]| \\
0 & \text{otherwise}
\end{cases}
\tag{1}
\]

where β is a scaling factor. This subtraction of spectral magnitudes can sometimes result in distortions and artifacts.
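The subtraction rule in (1) can be sketched for a single frame as follows. This is a minimal illustration, not a complete enhancer: the function name `spectral_subtract` and the use of plain Python lists of per-bucket magnitudes are assumptions made for the example, and the noise estimate is simply taken as given.

```python
def spectral_subtract(noisy_mag, noise_mag, beta=1.0):
    """Apply equation (1) to one frame of per-bucket magnitudes.

    noisy_mag: magnitudes |Y_f[n]| of the noisy speech, one per bucket
    noise_mag: estimated noise magnitudes |B_f[n]|, one per bucket
    beta:      scaling factor applied to the noise estimate
    """
    if len(noisy_mag) != len(noise_mag):
        raise ValueError("both spectra must have the same number of buckets")

    enhanced = []
    for y, b in zip(noisy_mag, noise_mag):
        diff = y - beta * b
        # Half-wave rectification: a magnitude cannot be negative,
        # so any bucket where the scaled noise dominates is set to zero.
        enhanced.append(diff if diff > 0.0 else 0.0)
    return enhanced


# Three buckets; the middle one is dominated by the scaled noise estimate.
frame = spectral_subtract([1.0, 0.5, 2.0], [0.25, 0.5, 0.5], beta=2.0)
print(frame)  # [0.5, 0.0, 1.0]
```

The middle bucket is clipped to zero because 0.5 is smaller than 2.0 × 0.5, which is the rectification step that the rest of the article is concerned with.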

An example of the artifacts of spectral subtraction is musical noise. Musical noise consists of small islands of spectral power that appear randomly in different frequency buckets from frame to frame. This results in a twinkling-sounding noise that can be quite annoying to the listener. Why does it occur? It is a result of the half-wave rectification in (1). Because a negative spectral magnitude does not exist, when the scaled noise estimate is greater than the noisy speech magnitude, the half-wave rectification forces the estimated speech magnitude for that frequency bucket to zero. The buckets where the noisy magnitude happens to exceed the noise estimate keep a positive residual, and these isolated, randomly placed peaks are heard as musical noise.
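The mechanism can be seen in a small simulation of noise-only frames. The sketch below makes several assumptions that are not part of the article: the noise is modelled as complex Gaussian in every bucket, the noise estimate is the long-term average magnitude per bucket, and all names (`noise_magnitudes`, `surviving_buckets`) are hypothetical.

```python
import math
import random


def noise_magnitudes(num_buckets, rng, sigma=1.0):
    """Magnitudes of complex Gaussian noise, one value per frequency bucket."""
    return [math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for _ in range(num_buckets)]


def surviving_buckets(noisy_mag, noise_est, beta=1.0):
    """Indices of the buckets that equation (1) leaves non-zero."""
    return [k for k, (y, b) in enumerate(zip(noisy_mag, noise_est))
            if y > beta * b]


rng = random.Random(0)  # fixed seed so the run is repeatable
num_buckets, num_frames = 64, 200
frames = [noise_magnitudes(num_buckets, rng) for _ in range(num_frames)]

# Assumed noise estimate: average magnitude of each bucket over all frames.
noise_est = [sum(f[k] for f in frames) / num_frames
             for k in range(num_buckets)]

islands = [surviving_buckets(f, noise_est) for f in frames]

# Fraction of time-frequency buckets that keep a positive residual.
survival_rate = sum(len(s) for s in islands) / (num_buckets * num_frames)

# Number of consecutive frame pairs whose surviving buckets differ.
changed = sum(1 for a, b in zip(islands, islands[1:]) if a != b)

print(f"surviving fraction: {survival_rate:.2f}")
print(f"frame pairs with a different island pattern: {changed} of {num_frames - 1}")
```

Even though no speech is present, a sizeable share of buckets survives the subtraction, and the set of surviving buckets changes from one frame to the next. That shifting pattern of residual peaks is what the listener perceives as musical noise.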

The value of β can be adjusted to reduce the effect of this artifact. Lowering the value of β effectively raises the noise floor, raising the valleys of spectral energy. Raising the value of β, often referred to as oversubtraction, can eliminate the islands of spectral energy, but it will likely result in some distortion of the desired speech. Besides adjusting β, there are several other methods used to mitigate the effects of musical noise. For example, the value of ∣Ŝ_ƒ[n]∣ can be smoothed temporally and/or over frequency buckets to reduce the peaks of energy. In addition, the spectral energy in a time-frequency bucket can be classified by studying the time-frequency buckets surrounding it. The classification can be used to decide whether the energy in the bucket is musical noise and can be removed.
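Two of these mitigations, oversubtraction and temporal smoothing, can be compared on a toy example. The sketch below is illustrative only: the first-order recursive smoother, its coefficient `alpha`, the function names, and the hand-picked magnitudes are all assumptions, and smoothing over frequency buckets would follow the same pattern along the other axis.

```python
def subtract(noisy_mag, noise_est, beta):
    """Equation (1) for one frame of per-bucket magnitudes."""
    return [max(y - beta * b, 0.0) for y, b in zip(noisy_mag, noise_est)]


def smooth_over_time(frames, alpha=0.5):
    """First-order recursive smoothing of each bucket across frames.

    alpha weights the previous smoothed frame; the first frame is
    passed through unchanged to initialise the recursion.
    """
    smoothed = []
    prev = None
    for frame in frames:
        if prev is None:
            cur = list(frame)
        else:
            cur = [alpha * p + (1.0 - alpha) * x for p, x in zip(prev, frame)]
        smoothed.append(cur)
        prev = cur
    return smoothed


# Buckets 0-2 contain only noise; bucket 3 contains steady speech energy.
noise_est = [1.0, 1.0, 1.0, 1.0]
noisy = [
    [0.5, 0.5, 0.5, 4.0],
    [0.5, 1.5, 0.5, 4.0],  # noise fluctuation in bucket 1
    [1.5, 0.5, 0.5, 4.0],  # noise fluctuation in bucket 0
]

plain = [subtract(f, noise_est, beta=1.0) for f in noisy]
over = [subtract(f, noise_est, beta=2.0) for f in noisy]
smoothed = smooth_over_time(plain, alpha=0.5)

print("beta = 1:         ", plain)
print("beta = 2:         ", over)
print("beta = 1, smoothed:", smoothed)
```

With β = 1 the noise fluctuations leave isolated residuals of 0.5 in buckets 1 and 0. Oversubtraction with β = 2 removes them but also lowers the speech bucket from 3.0 to 2.0, which is the distortion trade-off. Temporal smoothing halves the residual peaks while leaving the steady speech bucket at 3.0, at the cost of spreading each residual into later frames.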
