VOCAL's Time Scale Modification

Time scale modification is used as part of a resampling algorithm or for variable speed playback in general. Variable speed playback can slow down messages in voicemail and recordings of court proceedings for stenographers, or speed things up (compress information) for bursty transmissions through unreliable channels.

VOCAL offers Variable Speed Playback (also called a Pitch Corrected Rate Converter) as part of its voice solutions. Variable Speed Playback is a form of time scale modification that slows down or speeds up a sample stream while keeping the sampling rate and pitch intact.

Time scaling is not simply a matter of decimating or interpolating in the time domain. When the result is played back at the original sampling rate, decimation shortens the signal but raises its pitch by the decimation factor, and interpolation lengthens the signal but lowers its pitch by the interpolation factor. To scale time effectively, the algorithm must account for this pitch change and correct for it.
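The pitch shift caused by naive decimation can be checked with a short experiment. The following is a minimal sketch, not part of VOCAL's product: the helper names (`sine`, `estimate_freq`), the 8 kHz rate, and the 200 Hz test tone are all assumptions chosen for illustration. It keeps every other sample of a tone and estimates the pitch before and after by counting rising zero crossings.

```python
import math

RATE = 8000  # assumed sampling rate in Hz (illustrative)

def sine(freq_hz, duration_s, rate_hz):
    """Generate a unit-amplitude sine tone as a list of floats."""
    n = int(duration_s * rate_hz)
    return [math.sin(2.0 * math.pi * freq_hz * i / rate_hz) for i in range(n)]

def estimate_freq(samples, rate_hz):
    """Rough pitch estimate: rising zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * rate_hz / len(samples)

tone = sine(200.0, 1.0, RATE)

# Naive 2x speed-up: keep every other sample and play back at the same rate.
# (No anti-aliasing filter is applied; the tone is far below the new Nyquist limit.)
decimated = tone[::2]

print("original :", len(tone) / RATE, "s,", estimate_freq(tone, RATE), "Hz")
print("decimated:", len(decimated) / RATE, "s,", estimate_freq(decimated, RATE), "Hz")
```

The decimated tone lasts half as long, and its estimated pitch is about twice that of the original, which is the side effect a time scale modification algorithm has to avoid.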

Resampling Algorithms

Several standard techniques for this problem are available in the literature, including Synchronized Overlap and Add (SOLA), Pitch Synchronous Overlap and Add (PSOLA), Waveform Similarity Overlap and Add (WSOLA), and the phase vocoder. Of these, WSOLA is the most effective, combining high-quality results with computational simplicity.

WSOLA works by computing a similarity measure between two incoming speech frames. By selecting the index at which the two frames are most similar, WSOLA can add or delete sections of the speech accordingly. The method runs into trouble in high-noise conditions. There, the maximum similarity drops, and copying many noise frames introduces long-term periodicity in the autocorrelation across the copied frames, which produces unnatural and unpleasant tones. To work around this problem, a voice activity detector (VAD) can be used to randomize the phase during periods of noise.
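The frame-matching idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not VOCAL's implementation: the function name `wsola`, the parameters `speed`, `frame`, and `tolerance`, the Hann window, and the use of plain cross-correlation as the similarity measure are all choices made for the example. It also omits the VAD-based noise handling described above.

```python
import math

def wsola(x, speed, frame=160, tolerance=40):
    """Time-scale x by `speed` (>1 is faster, <1 is slower), keeping the pitch.

    frame:     analysis/synthesis window length in samples (assumed even)
    tolerance: search range, in samples, around each nominal input position
    """
    hop = frame // 2
    # Periodic Hann window: copies overlapped by 50% sum to exactly 1.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame) for n in range(frame)]
    # Zero-pad so that every candidate segment stays inside the buffer.
    pad = frame + tolerance
    xp = [0.0] * pad + list(x) + [0.0] * (pad + frame)
    n_frames = int(len(x) / speed) // hop
    y = [0.0] * (n_frames * hop + frame)
    prev = pad  # start (in xp) of the segment copied for the previous frame
    for k in range(n_frames):
        if k == 0:
            best = pad
        else:
            # The natural continuation of the last copied segment is the template.
            target = prev + hop
            nominal = pad + int(round(k * hop * speed))
            best, best_score = nominal, -math.inf
            for cand in range(nominal - tolerance, nominal + tolerance + 1):
                # Similarity measure: unnormalized cross-correlation.
                score = sum(xp[cand + n] * xp[target + n] for n in range(frame))
                if score > best_score:
                    best, best_score = cand, score
        # Overlap-add the windowed best-matching segment at a fixed output hop.
        for n in range(frame):
            y[k * hop + n] += window[n] * xp[best + n]
        prev = best
    return y[:n_frames * hop]

# Usage: a 200 Hz tone sampled at 8 kHz, played twice as fast and half as fast.
rate = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * i / rate) for i in range(4000)]
fast = wsola(tone, speed=2.0)
slow = wsola(tone, speed=0.5)
print(len(tone), len(fast), len(slow))
```

Because the output is always assembled at a fixed hop while the input read position advances by `hop * speed`, the duration changes but each copied segment keeps its original waveform, so the pitch is preserved. The exhaustive search is quadratic in the window parameters and is written for clarity rather than speed.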
