Denoising speech signals to enhance the quality of communication channels is a non-trivial problem. Most noise reduction approaches introduce artifacts of their own. One common artifact is musical noise, which is irritating to listeners and can therefore appear to degrade the speech quality further. A basic speech enhancement system proceeds as follows:

Consider a speech signal $s[n]$ corrupted by an additive noise signal $\nu[n]$, such that the received signal $y[n]$ is given by

$y[n] = s[n] + \nu[n]$

A short-time Fourier transform (STFT) is used to process the signals frame by frame with some overlapping scheme. Because the transform is linear, the additive model carries over to each frame $t$ and frequency $\omega$:

$y(t,\omega) = s(t,\omega) + \nu(t,\omega)$
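The frame-by-frame transform can be sketched in a few lines of plain Python. This is only an illustration under assumed settings: the frame length, hop size, Hann window, and toy signals are not taken from the text, and the naive DFT stands in for the FFT a real system would use. It checks numerically that the additive model holds in every time-frequency bin.

```python
import cmath
import math

def frames(x, frame_len, hop):
    """Split x into overlapping frames (hop < frame_len gives overlap)."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def dft(frame):
    """Naive O(N^2) DFT of one frame; a real system would use an FFT."""
    n = len(frame)
    return [sum(frame[k] * cmath.exp(-2j * math.pi * w * k / n) for k in range(n))
            for w in range(n)]

def stft(x, frame_len=8, hop=4):
    """Hann-windowed STFT: one list of frequency bins per frame t."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * k / frame_len)
              for k in range(frame_len)]
    return [dft([v * w for v, w in zip(f, window)])
            for f in frames(x, frame_len, hop)]

# Toy signals (assumed): a sinusoid standing in for speech, plus a small
# deterministic disturbance standing in for the noise.
s = [math.sin(2 * math.pi * 0.125 * n) for n in range(32)]
nu = [0.1 * math.cos(2 * math.pi * 0.4 * n + 0.3) for n in range(32)]
y = [a + b for a, b in zip(s, nu)]

Y, S, V = stft(y), stft(s), stft(nu)

# Largest deviation from y(t, w) = s(t, w) + nu(t, w) over all bins.
max_err = max(abs(Y[t][w] - (S[t][w] + V[t][w]))
              for t in range(len(Y)) for w in range(len(Y[t])))
```

With a frame length of 8 and a hop of 4, consecutive frames overlap by 50%, and `max_err` stays at the level of floating-point rounding.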

The magnitude of the received temporal spectrum is used for denoising by estimating a denoising filter $H(t,\omega)$ such that

$|\hat{s}(t,\omega)| = H(t,\omega)|y(t,\omega)|$
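How $H(t,\omega)$ is estimated is not specified here. As one possible illustration, the sketch below uses a magnitude spectral-subtraction gain with a spectral floor. The rule itself, the function name `spectral_subtraction_gain`, the `floor` parameter, and the example magnitudes are all assumptions made for the example, not the method described in this article.

```python
def spectral_subtraction_gain(noisy_mag, noise_mag, floor=0.05):
    """Per-bin gain H = max(1 - |nu| / |y|, floor) for one frame.

    noisy_mag: magnitudes |y(t, w)| of the noisy frame
    noise_mag: an estimate of the noise magnitudes |nu(t, w)|
    floor:     lower limit on the gain (assumed value)
    """
    gains = []
    for y_mag, n_mag in zip(noisy_mag, noise_mag):
        if y_mag <= 0.0:
            gains.append(floor)  # avoid dividing by zero in empty bins
        else:
            gains.append(max(1.0 - n_mag / y_mag, floor))
    return gains

# Hypothetical magnitudes for four frequency bins of a single frame.
noisy_mag = [4.0, 1.0, 0.5, 2.0]
noise_mag = [1.0, 1.0, 1.0, 0.5]

H = spectral_subtraction_gain(noisy_mag, noise_mag)

# |s_hat(t, w)| = H(t, w) * |y(t, w)|
s_hat_mag = [h * m for h, m in zip(H, noisy_mag)]
```

Bins where the noise estimate is close to or above the noisy magnitude are clamped to the floor. Because the estimate fluctuates from bin to bin, a gain computed this way tends to contain isolated outliers.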

The phase of the speech signal is assumed to be that of the received noisy speech frame. Thus,

$\hat{s}(t,\omega) = H(t,\omega)y(t,\omega)$
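The step from the magnitude equation to the complex one can be checked numerically. The sketch below, using made-up bin values and gains, scales the noisy magnitude by a real, non-negative gain, reattaches the noisy phase, and compares the result with multiplying the complex bin by the gain directly.

```python
import cmath

def enhance_bin(y_bin, gain):
    """Scale the magnitude of one complex bin and keep its (noisy) phase."""
    mag, phase = cmath.polar(y_bin)
    return cmath.rect(gain * mag, phase)

# Hypothetical noisy bins y(t, w) and gains H(t, w) for one frame.
y_bins = [3 + 4j, -1 + 2j, 0.5 - 0.5j]
gains = [0.8, 0.1, 0.5]

via_polar = [enhance_bin(y, h) for y, h in zip(y_bins, gains)]
direct = [h * y for y, h in zip(y_bins, gains)]

# Both routes give the same complex estimate s_hat(t, w).
max_diff = max(abs(a - b) for a, b in zip(via_polar, direct))
```

The two routes agree up to rounding error because the gain is real and non-negative, so multiplying by it changes the magnitude of each bin without changing its phase.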

Musical noise arises from salt-and-pepper noise in the estimated magnitude response of the denoising filter. This unwanted noise can be removed by applying additional filtering in the frequency domain to the denoising filter only. Figure 1 below compares the resulting denoising filter with a filtered version.
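The type of additional filtering is not named here. A median filter is a common choice for salt-and-pepper outliers, so the sketch below assumes one, applied along the frequency axis of a single frame of the gain. The window width and the example gain values are hypothetical.

```python
from statistics import median

def median_filter(values, width=3):
    """Median-filter a sequence; the window shrinks at the two edges."""
    half = width // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(median(values[lo:hi]))
    return out

# Hypothetical gain H(t, w) across frequency bins for one frame:
# a passband (about 0.9) and a stopband (about 0.1) with isolated outliers.
gain = [0.9, 0.9, 0.05, 0.9, 0.85, 0.1, 0.1, 0.95, 0.1, 0.1]

# Filter only the gain; the noisy spectrum y(t, w) itself is left untouched.
smoothed = median_filter(gain, width=3)
```

In this example the isolated dip at bin 2 and the isolated spike at bin 7 are removed, while the transition between the two regions is preserved. A wider window would remove longer runs of outliers at the cost of blurring narrow spectral features.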

Figure 1: Filtered denoising filter removes the presence of salt and pepper noise

The filtered version of the denoising filter removes the musical noise entirely.

VOCAL Technologies offers custom designed solutions for beamforming with a robust voice activity detector, acoustic echo cancellation and noise suppression. Our custom implementations of such systems are meant to deliver optimum performance for your specific beamforming task. Contact us today to discuss your solution!