The Amazon Echo brought smart speakers to the consumer market in 2014 and extended the boundaries of what a Bluetooth loudspeaker can do.  Smart speakers can be used in essentially any acoustic environment.  The user may be anywhere from less than a meter to several meters from the device, or in a different room altogether.  The smart speaker may be playing music, yet the user expects to be able to barge in and give commands that the device will understand.

To overcome these acoustic challenges, Voice Quality Enhancement (VQE) software is a core component of every smart speaker.  This software goes by many names, such as speech preprocessing or far-field speech enhancement, but the goal is the same: to provide the speech recognition engine with a signal that minimizes the word error rate.  In other words, success is determined not by what another human can understand, but by what a machine can.  Compared to a hands-free, full-duplex voice communication system, the VQE solution has some key design differences.  For example, speech recognition software is not as tolerant of non-linear processing as human hearing is, so the VQE software is constrained to linear processing modules.
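
Since word error rate is the figure of merit here, it helps to see how it is usually computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a minimal illustration under that common definition; the function name `word_error_rate` and the example phrases are hypothetical and not taken from any particular recognition engine.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical recognizer output with one dropped and one altered word.
    print(word_error_rate("turn on the kitchen lights", "turn on kitchen light"))
```

A VQE tuning change would be judged by whether it lowers this number on recordings passed through the recognizer, not by how the processed audio sounds to a listener.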

For smart speakers, acoustic beamforming is a key VQE module.  Beamforming linearly combines the microphone signals to favor sound arriving from a specified direction, improving the signal-to-noise ratio.  If the direction of an interfering noise source can be determined, noise cancellation can be performed on the desired beam.  These linear algorithms are key to reducing the background noise.
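
As one illustration of a linear beamformer, the sketch below implements delay-and-sum, the simplest member of the family; the article does not say which beamforming algorithm a given product uses. It assumes a linear microphone array, a far-field source, and whole-sample delays. The names `steering_delays` and `delay_and_sum`, the 343 m/s speed of sound, and the geometry convention (0 degrees is broadside to the array) are assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature (assumed)


def steering_delays(mic_positions_m, angle_deg, sample_rate):
    """Whole-sample delays that time-align a far-field plane wave.

    mic_positions_m: microphone positions along the array axis, in meters.
    angle_deg: arrival angle, 0 = broadside, +90 = along the positive axis.
    """
    sin_a = math.sin(math.radians(angle_deg))
    # Relative arrival time at each microphone; mics nearer the source hear it first.
    arrival = [-(x * sin_a) / SPEED_OF_SOUND for x in mic_positions_m]
    latest = max(arrival)
    # Delay the early channels so every channel lines up with the latest one.
    return [round((latest - t) * sample_rate) for t in arrival]


def delay_and_sum(channels, delays):
    """Delay each channel by its steering delay, then average the channels."""
    n = len(channels[0])
    out = [0.0] * n
    for channel, delay in zip(channels, delays):
        for i in range(delay, n):
            out[i] += channel[i - delay]
    return [value / len(channels) for value in out]


if __name__ == "__main__":
    fs = 16000
    spacing = SPEED_OF_SOUND / fs  # spacing that equals exactly one sample of travel
    mics = [0.0, spacing, 2 * spacing]
    delays = steering_delays(mics, 90.0, fs)
    print(delays)
```

Sound from the steered direction adds coherently after alignment, while sound from other directions is misaligned and partially averages out. Because the operation is only delays and sums, it stays within the linear-processing constraint described above.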

To address the acoustic coupling between the loudspeaker and microphone, Acoustic Echo Cancellation (AEC) software is still applied, but residual echo suppression and non-linear echo suppression features should be disabled.  Since these features cannot be used, it is common practice to add an audio ducking module to the system.  Once it has been determined that a user is trying to speak, the audio being played out of the loudspeaker is temporarily attenuated to reduce the level of acoustic coupling.
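
The ducking behavior can be sketched as a gain that ramps the playback signal down while the user is speaking and back up afterwards. This is a minimal illustration, not a description of any shipping implementation: the function name `duck_playback`, the -12 dB attenuation, the linear ramp, and the per-sample `user_speaking` flag (which in practice might come from a wake-word or voice-activity detector) are all assumptions.

```python
def duck_playback(playback, user_speaking, duck_gain_db=-12.0, ramp_samples=4):
    """Attenuate loudspeaker samples while the user is speaking.

    playback: loudspeaker samples (floats).
    user_speaking: one boolean per sample, True while speech is detected.
    duck_gain_db: attenuation applied while ducked (assumed value).
    ramp_samples: samples taken to move between full and ducked gain,
                  so the level change does not produce an audible click.
    """
    duck_gain = 10 ** (duck_gain_db / 20)  # convert dB to a linear gain
    step = (1.0 - duck_gain) / ramp_samples
    gain = 1.0
    out = []
    for sample, speaking in zip(playback, user_speaking):
        target = duck_gain if speaking else 1.0
        if gain > target:
            gain = max(target, gain - step)   # ramp down toward the ducked level
        elif gain < target:
            gain = min(target, gain + step)   # ramp back up to full level
        out.append(sample * gain)
    return out


if __name__ == "__main__":
    music = [1.0] * 12
    flags = [False] * 2 + [True] * 6 + [False] * 4
    print([round(v, 3) for v in duck_playback(music, flags)])
```

A real system would use a ramp lasting tens of milliseconds rather than a few samples; the short ramp here only keeps the example easy to follow. Lowering the playback level reduces the echo the linear AEC has to remove, which partly compensates for the disabled suppression stages.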

[Figure: Voice Quality Enhancement block diagram for a smart speaker]