The Artifacts of Spectral Subtraction

In hands-free communication devices, the desired speaker and the microphone are located some distance apart, which results in a microphone signal with a low signal-to-noise ratio (SNR). Speech enhancement is therefore required to improve the perceptual quality of the communication. The classical approach to single-channel speech enhancement is spectral subtraction. Spectral subtraction assumes that the noise is additive, Yƒ[n] = Sƒ[n] + Bƒ[n], where Yƒ[n] is the Short-Time Fourier Transform (STFT) of the received microphone signal at frame n for frequency bucket ƒ, and Sƒ[n] and Bƒ[n] are the STFTs of the desired speech and noise signals, respectively. Once an estimate of the noise spectrum has been obtained, it can be subtracted from the noisy-speech spectrum.
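How the noise estimate is obtained is not specified here. As a minimal sketch, assuming the magnitudes |Yƒ[n]| of frames in which the desired speaker is silent are already available (selecting those frames, for example with a voice activity detector, is an assumption outside this section), the noise magnitude can be estimated by averaging those frames bucket by bucket. The function name `estimate_noise_magnitude` is hypothetical.

```python
def estimate_noise_magnitude(noise_frames):
    """Average the magnitudes of noise-only frames, bucket by bucket.

    noise_frames: list of frames; each frame is a list of magnitudes |Yf[n]|,
    one per frequency bucket f, taken while the desired speaker is silent.
    Returns one estimated noise magnitude per frequency bucket.
    """
    if not noise_frames:
        raise ValueError("at least one noise-only frame is required")
    count = len(noise_frames)
    # zip(*noise_frames) groups the values of the same bucket across all frames.
    return [sum(bucket) / count for bucket in zip(*noise_frames)]


print(estimate_noise_magnitude([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

A plain average is the simplest choice; it treats the noise as stationary over the frames being averaged.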

  
ƒ(n)∣ = {  ∣ƒ(n)∣ - βƒ(n)∣ if ƒ > βƒ
 0 else  
(1)

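where |B̂ƒ[n]| is the estimate of the noise magnitude, |Ŝƒ[n]| is the resulting estimate of the speech magnitude, and β is a scaling factor. This subtraction of spectral magnitudes can sometimes result in distortions and artifacts.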
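Equation (1) can be sketched in Python for a single frame as follows. The sketch assumes the magnitudes are plain lists with one value per frequency bucket; the function name `spectral_subtract` and the default β = 1 are illustrative choices, not part of the original text.

```python
def spectral_subtract(noisy_mag, noise_mag, beta=1.0):
    """Equation (1) for one frame n.

    noisy_mag: |Yf[n]| for every frequency bucket f
    noise_mag: estimated noise magnitude |B^f[n]| for every bucket
    beta:      scaling factor
    Returns the estimated speech magnitude |S^f[n]| for every bucket.
    """
    if len(noisy_mag) != len(noise_mag):
        raise ValueError("both frames must have the same number of buckets")
    # Subtract the scaled noise estimate; where the result would be negative,
    # half-wave rectification sets the bucket to zero instead.
    return [y - beta * b if y > beta * b else 0.0 for y, b in zip(noisy_mag, noise_mag)]


print(spectral_subtract([5.0, 1.0, 3.0], [2.0, 2.0, 2.0]))  # [3.0, 0.0, 1.0]
```

In this example the middle bucket, whose noisy magnitude is below the noise estimate, is rectified to zero. In a complete system the enhanced magnitude is typically recombined with the phase of the noisy signal before the inverse STFT; that step is outside this sketch.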

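One of the most noticeable artifacts of spectral subtraction is musical noise. Musical noise consists of small, isolated islands of spectral power that appear randomly in different frequency buckets from frame to frame. The result is a twinkling, tonal noise that can be quite annoying to the listener. Why does it occur? It is a consequence of the half-wave rectification in (1). Because a negative spectral magnitude does not exist, whenever the scaled noise estimate exceeds the magnitude of the noisy-speech signal, the half-wave rectification forces the estimated speech magnitude in that frequency bucket to zero. The noise estimate, however, is only an average, while the actual noise magnitude fluctuates around it from frame to frame. In the buckets where the noise happens to exceed the scaled estimate, a residual peak survives next to buckets that have been set to zero, and these randomly placed peaks are the islands heard as musical noise.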
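The mechanism can be illustrated with a small simulation. The sketch below assumes a noise-only input whose STFT buckets are modelled as complex Gaussian values, so each magnitude fluctuates randomly, and it uses the true mean noise magnitude as an ideal noise estimate. The simulated signal, the seed, and the helper names are all hypothetical. Even with this perfect average estimate, a large fraction of the buckets (a little under half) survive (1), and the surviving buckets fall in different places in each frame.

```python
import math
import random


def spectral_subtract(noisy_mag, noise_mag, beta=1.0):
    # Equation (1): subtract the scaled noise estimate and half-wave rectify at zero.
    return [y - beta * b if y > beta * b else 0.0 for y, b in zip(noisy_mag, noise_mag)]


def noise_only_frame(num_bins, rng):
    # Hypothetical input: each STFT bucket of the noise is modelled as a complex
    # Gaussian value, so its magnitude fluctuates randomly around its mean.
    return [abs(complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))) for _ in range(num_bins)]


def surviving_buckets(frame):
    # Frequency buckets that were NOT forced to zero by the rectification.
    return {f for f, magnitude in enumerate(frame) if magnitude > 0.0}


rng = random.Random(1)  # fixed seed so the run is repeatable
num_bins = 256
# Ideal noise estimate: the true mean magnitude of the simulated noise, sqrt(pi / 2).
noise_estimate = [math.sqrt(math.pi / 2)] * num_bins

frames = [spectral_subtract(noise_only_frame(num_bins, rng), noise_estimate) for _ in range(3)]
for n, frame in enumerate(frames):
    islands = sorted(surviving_buckets(frame))
    print(f"frame {n}: {len(islands)} of {num_bins} buckets survive, first few: {islands[:6]}")
```

Since the input contains no speech at all, every surviving bucket is residual noise, and its position changes randomly from one frame to the next.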

The value of β can be adjusted to reduce the effect of this artifact. Lowering β subtracts less of the noise estimate, which effectively raises the noise floor and fills in the valleys of spectral energy between the islands, making the islands less prominent. Raising β above 1, often referred to as oversubtraction, can eliminate the islands of spectral energy, but it will likely introduce some distortion of the desired speech.

Besides adjusting β, several other methods are used to mitigate the effects of musical noise. For example, the estimate of Sƒ[n] can be smoothed over time and/or over frequency buckets to reduce the peaks of energy. In addition, the spectral energy in each time-frequency bucket can be classified by examining the time-frequency buckets surrounding it. This classification can then be used to decide whether the energy in that bucket is related to musical noise and can therefore be removed.
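Two of the mitigations described above can be sketched with a small simulation. Assuming a hypothetical noise-only input (complex Gaussian STFT buckets) and an ideal mean noise estimate, the code counts how many time-frequency buckets survive (1) as β increases, and then applies a first-order recursive smoother over time to the enhanced magnitudes. The recursive smoother, its factor α = 0.7, and all names are assumptions chosen for illustration; the section does not prescribe a particular smoothing method. Because the input contains no speech, the sketch cannot show the speech distortion that oversubtraction introduces.

```python
import math
import random


def spectral_subtract(noisy_mag, noise_mag, beta):
    # Equation (1) with half-wave rectification at zero.
    return [y - beta * b if y > beta * b else 0.0 for y, b in zip(noisy_mag, noise_mag)]


def smooth_over_time(frames, alpha=0.7):
    # Hypothetical first-order recursive smoother, applied to each bucket:
    # smoothed[n] = alpha * smoothed[n - 1] + (1 - alpha) * frames[n]
    smoothed, previous = [], [0.0] * len(frames[0])
    for frame in frames:
        previous = [alpha * p + (1 - alpha) * s for p, s in zip(previous, frame)]
        smoothed.append(previous)
    return smoothed


rng = random.Random(2)  # fixed seed so the run is repeatable
num_bins, num_frames = 256, 50
noise_estimate = [math.sqrt(math.pi / 2)] * num_bins  # mean of the simulated noise magnitude
noisy = [[abs(complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))) for _ in range(num_bins)]
         for _ in range(num_frames)]

# Oversubtraction: a larger beta leaves fewer surviving islands in noise-only frames.
survivor_counts = {}
for beta in (1.0, 1.5, 2.0):
    enhanced = [spectral_subtract(frame, noise_estimate, beta) for frame in noisy]
    survivor_counts[beta] = sum(v > 0.0 for frame in enhanced for v in frame)
    print(f"beta={beta}: {survivor_counts[beta]} of {num_bins * num_frames} buckets survive")

# Temporal smoothing with beta = 1 flattens the tallest islands.
enhanced = [spectral_subtract(frame, noise_estimate, 1.0) for frame in noisy]
smoothed = smooth_over_time(enhanced)
print(f"peak before smoothing: {max(map(max, enhanced)):.2f}, "
      f"after: {max(map(max, smoothed)):.2f}")
```

The surviving count can only fall as β grows, since the threshold β|B̂ƒ[n]| rises. The smoothed peak is always lower than the unsmoothed one, because each smoothed value is a weighted average of the previous smoothed value (starting from zero) and the current frame.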