Frequency modulation detection for speech

Frequency modulation (FM), as a computational auditory scene analysis feature, refers to changes in frequency over time in an audio stream, such as those that accompany phoneme changes in speech. FM is detected on a two-dimensional cochleagram, a time-frequency representation of the signal, by convolving it with a time-frequency kernel that corresponds to the FM pattern we want to pick up.
Consider a two-dimensional, zero-mean Gaussian kernel in time offset $\tau$ and frequency offset $\omega$, with standard deviations $\sigma_\tau$ and $\sigma_\omega$, defined as:

$G(\tau, \omega, \sigma_\tau , \sigma_\omega) = \frac{1}{2 \pi \sigma_\tau \sigma_\omega} \exp{\left(-\frac{\tau^2}{2\sigma_{\tau}^2} -\frac{\omega^2}{2\sigma_{\omega}^2}\right)}$
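
As a minimal sketch, the Gaussian kernel above can be sampled on a discrete grid. The function name `gaussian_kernel`, the unit grid spacing (one time frame, one frequency channel), the grid half-widths, and the sigma values are assumptions made for illustration; none of them come from a particular implementation.

```python
import numpy as np

def gaussian_kernel(sigma_t, sigma_w, half_t, half_w):
    """Sample G(tau, omega) on an integer grid (hypothetical helper)."""
    tau = np.arange(-half_t, half_t + 1, dtype=float)    # time offsets, in frames
    omega = np.arange(-half_w, half_w + 1, dtype=float)  # frequency offsets, in channels
    # Rows index frequency and columns index time (assumed cochleagram layout).
    W, T = np.meshgrid(omega, tau, indexing="ij")
    norm = 1.0 / (2.0 * np.pi * sigma_t * sigma_w)
    return norm * np.exp(-T**2 / (2.0 * sigma_t**2) - W**2 / (2.0 * sigma_w**2))

# Illustrative values only: the grid covers about 4 sigma on each axis.
G = gaussian_kernel(sigma_t=4.0, sigma_w=2.0, half_t=16, half_w=8)
print(G.shape)  # (17, 33)
print(G.sum())  # close to 1 because the grid covers most of the Gaussian's mass
```

The kernel peaks at the center of the grid and is wider along time than along frequency for these values.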

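To observe a frequency change, the Laplacian of the Gaussian filter is used. The form used here keeps only the frequency term, that is, the negative of the second partial derivative of $G$ with respect to $\omega$: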

$L(\tau, \omega, \sigma_\tau , \sigma_\omega) = \left(\frac{1}{\sigma_\omega^2} - \frac{\omega^2}{\sigma_\omega^4}\right) G(\tau, \omega, \sigma_\tau , \sigma_\omega)$
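
The filter $L$ can be sampled the same way as the Gaussian. The sketch below is one possible discretization, not a reference implementation; the name `fm_filter`, the unit grid spacing, and the parameter values are assumptions chosen for illustration.

```python
import numpy as np

def fm_filter(sigma_t, sigma_w, half_t, half_w):
    """Sample L(tau, omega) = (1/sigma_w^2 - omega^2/sigma_w^4) * G (hypothetical helper)."""
    tau = np.arange(-half_t, half_t + 1, dtype=float)    # time offsets, in frames
    omega = np.arange(-half_w, half_w + 1, dtype=float)  # frequency offsets, in channels
    # Rows index frequency and columns index time (assumed layout).
    W, T = np.meshgrid(omega, tau, indexing="ij")
    G = np.exp(-T**2 / (2.0 * sigma_t**2) - W**2 / (2.0 * sigma_w**2))
    G /= 2.0 * np.pi * sigma_t * sigma_w
    return (1.0 / sigma_w**2 - W**2 / sigma_w**4) * G

# Illustrative values only.
L = fm_filter(sigma_t=4.0, sigma_w=2.0, half_t=16, half_w=8)
center_row, center_col = 8, 16
print(L[center_row, center_col] > 0)      # positive lobe at the center
print(L[center_row + 3, center_col] < 0)  # negative lobes beyond omega = sigma_w
```

Along the frequency axis the filter is positive for $|\omega| < \sigma_\omega$, crosses zero at $\omega = \pm\sigma_\omega$, and is negative beyond that, so its samples sum to approximately zero when the grid is wide enough.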

The time and frequency variances are chosen such that the Laplacian filter corresponds to a receptive field in the human auditory system. Typically, the time variance is chosen to be larger than the frequency variance. A sample Laplacian filter for FM is shown in Figure 1 below:

Figure 1: Laplacian filter for frequency change detection
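
To illustrate the convolution step, the sketch below applies a sampled filter to a synthetic array that stands in for a cochleagram. Everything in it is an assumption made for illustration: the helper names `fm_filter` and `convolve2d_same`, the zero-padded boundary handling, the parameter values, and the synthetic input, which is a single ridge of energy that rises slowly in frequency. A real system would start from an actual cochleagram.

```python
import numpy as np

def fm_filter(sigma_t, sigma_w, half_t, half_w):
    """Sample L(tau, omega); rows are frequency channels, columns are time frames."""
    omega = np.arange(-half_w, half_w + 1, dtype=float)
    tau = np.arange(-half_t, half_t + 1, dtype=float)
    W, T = np.meshgrid(omega, tau, indexing="ij")
    G = np.exp(-T**2 / (2.0 * sigma_t**2) - W**2 / (2.0 * sigma_w**2))
    G /= 2.0 * np.pi * sigma_t * sigma_w
    return (1.0 / sigma_w**2 - W**2 / sigma_w**4) * G

def convolve2d_same(image, kernel):
    """Zero-padded 2-D convolution with output the size of `image`.

    Assumes both kernel dimensions are odd.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))  # zeros outside the image
    flipped = kernel[::-1, ::-1]                  # convolution flips the kernel
    rows, cols = image.shape
    out = np.zeros((rows, cols), dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * padded[i:i + rows, j:j + cols]
    return out

# Synthetic stand-in for a cochleagram: one ridge rising by one channel
# every five frames (illustrative data, not a real recording).
n_channels, n_frames = 64, 100
frames = np.arange(n_frames)
true_channel = 20 + frames // 5
cochleagram = np.zeros((n_channels, n_frames))
cochleagram[true_channel, frames] = 1.0

response = convolve2d_same(cochleagram, fm_filter(4.0, 2.0, 16, 8))

# The channel with the strongest response in each frame gives a frequency track;
# its change over time is a simple indicator of frequency modulation.
track = response.argmax(axis=0)
print(track[20], track[80])
```

In this toy setup the track follows the ridge closely away from the edges of the array, where zero padding distorts the response. The nested loop is written for clarity; a library routine for 2-D convolution would be the usual choice for larger inputs.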

FM can be used as a feature for speech recognition systems by finding the pitch corresponding to frequency changes.

VOCAL Technologies offers custom designed solutions for beamforming with a robust voice activity detector, acoustic echo cancellation and noise suppression. Our custom implementations of such systems are meant to deliver optimum performance for your specific beamforming task. Contact us today to discuss your solution!
