Vocal's Packet Loss Concealment

Voice over IP (VoIP) technology provides users with speech transmission quality similar to circuit-switched telephony (based on TDM, as in the PSTN), provided the packet network offers guaranteed Quality of Service (QoS). In general, however, packets are transmitted through the IP network on a best-effort basis, so packet networks can be quite unreliable in terms of packet loss and packet delay. For example, when the network is congested, packet delivery is delayed beyond predefined thresholds, so packets arrive late or out of order, or are lost altogether, which severely impairs voice quality.

Figure 1: Frame erasure concealment algorithm for G.711; a two-frame segment is missing from the input

Several strategies help to mitigate the audible effects of packet loss in VoIP transmission. These mitigation measures are generally called packet loss concealment (PLC), and several specific algorithms are available for the purpose. Some of these algorithms are part of standards documents and others are proprietary solutions. Unlike voice codecs, PLC measures do not have to be based on standardized algorithms, because PLC runs only at the receiving end of a voice connection and therefore does not affect interoperability with the sender.

PLC algorithms and techniques fall into several groups, of which two are the most common: (I) waveform substitution and (II) model-based approaches, which rely on a speech model such as linear prediction (LP).

One example in Group I is the G.711 Appendix I-based PLC (cf. Refs. [1,2]). Another, more computationally complex, is based on autoregressive modeling using a rigorous minimum mean square error (MMSE) approach. It uses a basic model that captures the short-term correlation and a more sophisticated model that also captures the long-term correlation (cf. Ref. [3]). An example of a PLC-reconstructed signal is given in Figure 1.

One example in Group II is an approach based on a hidden Markov model (HMM) that tracks the evolution of speech signal parameters (cf. Refs. [4,5]). Other examples include the standardized PLC solutions that are part of the Series G.72x CELP voice codecs (for example, G.723.1 and G.729-A).

A brief description of the G.711 Appendix I-based PLC algorithm is given below. The algorithm is assumed to operate on 10 ms frames (80 samples at a sampling rate Fs of 8 kHz). It is implemented at the receiving side, after G.711 decoding is performed.

Stage I: Good Frames and Preparation for the First Bad Frame

1) A copy of the decoded output is saved in a circular buffer 48.75 ms (390 samples) long. The buffer data is used for estimating the current pitch period and for extracting waveforms during an erasure. No delay is introduced to the output signal by the buffering.

2) The output is delayed by 3.75 ms (30 samples) before it is sent to the audio port. This algorithmic delay allows for a smooth transition between the real and the synthesized signal: the algorithm applies Overlap-Add (OLA) to the delayed samples at the start of an erasure.
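The two Stage I steps can be sketched as follows. This is only a minimal illustration under stated assumptions, not the reference implementation from G.711 Appendix I: the class name GoodFrameHistory, the function overlap_add, and the linear (triangular) cross-fade weights are hypothetical choices made for this example. The buffer sizes are the ones given in the text.

```python
FRAME_LEN = 80       # 10 ms at 8 kHz
HISTORY_LEN = 390    # 48.75 ms at 8 kHz
OUTPUT_DELAY = 30    # 3.75 ms at 8 kHz


class GoodFrameHistory:
    """Hypothetical helper: circular history buffer plus delayed output."""

    def __init__(self):
        self.buf = [0] * HISTORY_LEN
        self.write_pos = 0  # next slot to overwrite (holds the oldest sample)

    def push_frame(self, frame):
        """Store one decoded frame and return the stream delayed by 30 samples."""
        if len(frame) != FRAME_LEN:
            raise ValueError("expected one 10 ms frame of 80 samples")
        for sample in frame:
            self.buf[self.write_pos] = sample
            self.write_pos = (self.write_pos + 1) % HISTORY_LEN
        # The returned frame ends OUTPUT_DELAY samples before the newest sample.
        start = self.write_pos - OUTPUT_DELAY - FRAME_LEN
        return [self.buf[(start + i) % HISTORY_LEN] for i in range(FRAME_LEN)]


def overlap_add(fade_out, fade_in):
    """Cross-fade two equal-length segments with a linear ramp (assumed window)."""
    n = len(fade_out)
    if len(fade_in) != n:
        raise ValueError("segments must have the same length")
    out = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of the incoming segment, rising toward 1
        out.append(round((1 - w) * fade_out[i] + w * fade_in[i]))
    return out
```

Because the newest 30 samples have not yet been played out when an erasure begins, they are still available to be cross-faded with the synthesized signal, which is what the 3.75 ms delay buys.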

Stage II: First Bad Frame

At the start of the erasure, the circular history buffer is copied to a non-circular buffer, called the pitch buffer, that is easier to work with. The contents of the pitch buffer are used for the duration of the erasure. An additional copy of the most recent 1/4 pitch period, called the lastq buffer, is made in case the erasure lasts longer than 10 ms.
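A minimal sketch of this bookkeeping is shown below. The function names linearize and save_last_quarter are hypothetical, and the pitch period is passed in as a parameter on the assumption that it has already been estimated from the pitch buffer.

```python
def linearize(circular, write_pos):
    """Copy a circular history buffer into a linear pitch buffer, oldest first.

    write_pos is assumed to be the index of the oldest sample, i.e. the slot
    that the next incoming sample would overwrite.
    """
    return circular[write_pos:] + circular[:write_pos]


def save_last_quarter(pitch_buf, pitch_period):
    """Return a copy of the most recent quarter pitch period (the lastq buffer)."""
    quarter = pitch_period // 4
    return pitch_buf[len(pitch_buf) - quarter:]


# Usage: 400 samples written into a 390-sample circular buffer.
circular = [0] * 390
for n in range(400):
    circular[n % 390] = n
pitch_buf = linearize(circular, write_pos=400 % 390)
lastq = save_last_quarter(pitch_buf, pitch_period=80)
```

Working on a linear copy means the later stages can index backwards from the end of the buffer without wrap-around arithmetic.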

Other stages include: Pitch Detection, Synthetic Signal Generation for First 10 ms, Synthetic Signal Generation After 10 ms, Attenuation (applicable to long erasures), First Good Frame After an Erasure.
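To give a feel for the Pitch Detection and Synthetic Signal Generation stages, the sketch below estimates the pitch period from the pitch buffer and fills a lost frame by repeating the last period. It is a simplified illustration, not the procedure defined in the standard: the function names are hypothetical, the search range (40 to 120 samples) and the 20 ms correlation window are assumed values, and the OLA smoothing and attenuation stages are left out.

```python
import math

MIN_PITCH = 40    # assumed shortest lag searched: 5 ms at 8 kHz
MAX_PITCH = 120   # assumed longest lag searched: 15 ms at 8 kHz
CORR_LEN = 160    # assumed correlation window: 20 ms at 8 kHz


def estimate_pitch(pitch_buf):
    """Return the lag that maximizes a normalized cross-correlation."""
    n = len(pitch_buf)
    if n < CORR_LEN + MAX_PITCH:
        raise ValueError("pitch buffer too short for the search range")
    ref = pitch_buf[n - CORR_LEN:]          # most recent 20 ms
    best_lag, best_score = MIN_PITCH, -math.inf
    for lag in range(MIN_PITCH, MAX_PITCH + 1):
        cand = pitch_buf[n - CORR_LEN - lag:n - lag]
        energy = sum(c * c for c in cand)
        if energy == 0:
            continue                         # silent candidate, no information
        corr = sum(r * c for r, c in zip(ref, cand))
        score = corr / math.sqrt(energy)     # normalize by candidate energy
        if score > best_score:               # strict '>' keeps the shortest lag on ties
            best_lag, best_score = lag, score
    return best_lag


def repeat_last_period(pitch_buf, pitch, n_out):
    """Synthesize n_out samples by cycling through the last pitch period."""
    period = pitch_buf[len(pitch_buf) - pitch:]
    return [period[i % pitch] for i in range(n_out)]


# Usage: a periodic test signal with a 50-sample period.
table = [round(10000 * math.sin(2 * math.pi * k / 50)) for k in range(50)]
history = [table[i % 50] for i in range(390)]
pitch = estimate_pitch(history)
concealed = repeat_last_period(history, pitch, 80)
```

For a perfectly periodic input, repeating the last period continues the waveform without a discontinuity. On real speech the period boundaries do not line up exactly, which is why the full algorithm cross-fades at the boundaries and attenuates the output during long erasures.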

A brief outline of the model-based PLC algorithm is illustrated in Figure 2. The algorithm relies on a hidden Markov model and on the introduction of a continuous observation vector well-suited for silence, voiced and unvoiced sounds (cf. Ref. [5]).

Figure 2: Hidden Markov model-based algorithm architecture for Packet Loss Concealment

VOCAL’s speech coder software includes a complete range of speech compression algorithms, written in ANSI C and optimized for leading DSP architectures. For example, Ref. [6] describes an implementation of the G.729-A speech codec; its standard feature set includes model-based PLC functionality.

More Information

References

[1] ITU-T G.711 Appendix I: A High Quality Low-Complexity Algorithm for Packet Loss Concealment with G.711 (09/99)
[2] Receiver-based packet loss concealment for pulse code modulation (PCM G.711) coder, Elsabrouty, M., et al., Signal Processing 84 (2004), pp. 663–667
[3] Autoregressive model-based speech packet-loss concealment, Zhang, G., and Kleijn, W.B., IEEE ICASSP 2008, pp. 4797–4800
[4] Hidden Markov Model-Based Packet Loss Concealment for Voice Over IP, Rodbro, C.A., et al., IEEE Trans. on Audio, Speech, and Language Proc., Vol. 14, Issue 5, Sept. 2006
[5] A New Feature Vector for HMM-based Packet Loss Concealment, Koenig, L., et al., EUSIPCO 2009
[6] G.729 Speech Compression
