Wideband Energy Detection in Scalable Speech Coding

In scalable speech coding that supports wideband (16 kHz sampling rate) and superwideband (24 kHz sampling rate) sound as well as standard narrowband (8 kHz sampling rate) sound, it is important to decide whether wideband (WB) or superwideband (SWB) is necessary. These higher-frequency modes can provide better sound quality if there is sound in the higher frequency ranges that they support, but there is a trade-off: the higher sampling rates require more processing, on the order of double or triple the number of computations. Thus, it is important to determine whether there is enough WB energy to make the added complexity an acceptable trade-off.

This is accomplished by first applying a highpass filter with a cutoff at 4.5 kHz. This cutoff was chosen so that, after allowing for some transition between the stopband and the passband, there is almost complete attenuation of the energy below 4 kHz, which is the energy that would still be present if the speech were sampled at the narrowband (NB) rate of 8 kHz. We can then check how much residual energy the filter leaves. If that energy stays above a certain threshold for a long enough period of time, we determine that WB is necessary. The threshold is set high enough to tolerate some WB noise, but low enough that any significant WB sound will not be ignored. If we have not detected WB energy for a long enough period of time, we can downsample the speech and treat it as NB. Because this fallback is available, we require only about 300 ms of WB energy before we classify the speech as WB.
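
The detection steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the coder's actual implementation. Only the 16 kHz sampling rate, the 4.5 kHz cutoff, and the roughly 300 ms requirement come from the text. The 20 ms frame length, the 101-tap Hamming-windowed FIR design, the threshold value, and the reading of "long enough" as consecutive frames are all assumptions made for the example, as are the function names.

```python
import math

FS = 16000          # WB sampling rate in Hz (from the text)
CUTOFF_HZ = 4500    # highpass cutoff in Hz (from the text)
REQUIRED_MS = 300   # WB energy duration needed (from the text)
FRAME_MS = 20       # assumption: frame length in ms
NUM_TAPS = 101      # assumption: odd FIR length, Hamming window
THRESHOLD = 1e-4    # assumption: mean-square energy threshold per frame


def design_highpass(num_taps=NUM_TAPS, cutoff_hz=CUTOFF_HZ, fs=FS):
    """Windowed-sinc lowpass, turned into a highpass by spectral inversion."""
    mid = (num_taps - 1) // 2
    fc = cutoff_hz / fs  # cutoff in cycles per sample
    lowpass = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        lowpass.append(ideal * window)
    gain = sum(lowpass)                      # normalize lowpass to unity DC gain
    highpass = [-c / gain for c in lowpass]  # delta minus lowpass
    highpass[mid] += 1.0
    return highpass


def fir_filter(samples, taps):
    """Direct-form FIR convolution with zero initial state."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for j, c in enumerate(taps):
            if i - j < 0:
                break
            acc += c * samples[i - j]
        out.append(acc)
    return out


def frame_energies(samples, frame_len):
    """Mean-square energy of each complete, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def needs_wideband(samples, threshold=THRESHOLD, required_ms=REQUIRED_MS):
    """True once the highpassed energy exceeds the threshold for
    required_ms of consecutive frames (assumed interpretation)."""
    frame_len = FS * FRAME_MS // 1000
    residual = fir_filter(samples, design_highpass())
    frames_needed = math.ceil(required_ms / FRAME_MS)
    run = 0
    for energy in frame_energies(residual, frame_len):
        run = run + 1 if energy > threshold else 0
        if run >= frames_needed:
            return True
    return False
```

With these assumed settings, the Hamming-windowed filter's transition band sits between roughly 4.1 kHz and 4.9 kHz, so content below 4 kHz falls in the stopband. A sustained 6 kHz tone passed to `needs_wideband` would be classified as WB, while a 2 kHz tone or a 100 ms burst of 6 kHz would not. The reverse decision, falling back to NB after a long stretch without WB energy, is left out because the text does not state how long that stretch should be.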

Complete Communications Engineering