GSM 06.60 Enhanced Full Rate (EFR) Vocoder
GSM 06.60 describes the detailed mapping from input blocks of 160 speech samples in 13-bit uniform
PCM form to encoded blocks of 244 bits, and from encoded blocks of 244 bits to output blocks of 160
reconstructed speech samples. The sampling rate is 8,000 samples/s, leading to a bit rate for the encoded
bit stream of 12.2 kbit/s. The coding scheme is the so-called Algebraic Code Excited Linear Prediction
(ACELP) coder.
GSM 06.60 also specifies the conversion between A-law PCM and 13-bit uniform PCM. Performance requirements
for the audio input and output parts are included only to the extent that they affect the transcoder
performance. The recommendation also describes the codec down to the bit level, so that compliance can be
verified with a high degree of confidence using a set of digital test sequences.
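To illustrate the A-law to 13-bit uniform PCM conversion mentioned above, here is a minimal Python sketch of standard ITU-T G.711 A-law expansion. The function name `alaw_to_linear13` is hypothetical, and the sketch is not the bit-exact reference procedure from the recommendation. It assumes each input is one 8-bit A-law code word as transmitted, that is, with the even-bit inversion (XOR with 0x55) already applied by the sender.

```python
def alaw_to_linear13(code: int) -> int:
    """Expand one 8-bit A-law code word to a signed 13-bit uniform PCM value.

    Illustrative sketch of G.711 A-law expansion; the name and the
    integer return convention are assumptions made for this example.
    """
    code ^= 0x55                     # undo the even-bit inversion
    quant = code & 0x0F              # 4-bit step within the segment
    segment = (code & 0x70) >> 4     # 3-bit segment (chord) number
    if segment == 0:
        magnitude = 2 * quant + 1    # linear segment near zero
    else:
        magnitude = (2 * quant + 33) << (segment - 1)
    # In A-law, a set sign bit (after inversion) marks a positive sample.
    return magnitude if code & 0x80 else -magnitude


# Expand a short run of A-law code words into 13-bit samples.
samples = [alaw_to_linear13(c) for c in (0xD5, 0x55, 0xAA, 0x2A)]
```

The largest magnitude produced is 4032, so every result fits in a signed 13-bit word.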
GSM 06.60 Enhanced Full Rate Encoder
- The codec is based on the code-excited linear predictive (CELP) coding model. A 10th-order linear prediction (LP), or short-term, synthesis filter is used. The pitch synthesis filter is implemented using the so-called adaptive codebook approach.
- In the CELP speech synthesis model, the excitation signal at the input of the short-term LP synthesis filter is constructed by adding two excitation vectors, one from the adaptive codebook and one from the fixed (innovative) codebook. The speech is synthesized by feeding the two properly chosen vectors from these codebooks through the short-term synthesis filter. The optimum excitation sequence in a codebook is chosen using an analysis-by-synthesis search procedure in which the error between the original and synthesized speech is minimized according to a perceptually weighted distortion measure. The weighting filter uses the unquantized LP parameters, while the formant synthesis filter uses the quantized ones.
- The coder operates on speech frames of 20 ms, corresponding to 160 samples at the sampling frequency of 8,000 samples/s. For every frame of 160 speech samples, the speech signal is analysed to extract the parameters of the CELP model (LP filter coefficients, adaptive and fixed codebook indices and gains). These parameters are encoded and transmitted. At the decoder, these parameters are decoded and speech is synthesized by filtering the reconstructed excitation signal through the LP synthesis filter.
- LP analysis is performed twice per frame. The two sets of LP parameters are converted to line spectrum pairs (LSP) and jointly quantized using split matrix quantization (SMQ) with 38 bits. The speech frame is divided into 4 subframes of 5 ms each (40 samples). The adaptive and fixed codebook parameters are transmitted every subframe. The two sets of quantized and unquantized LP filters are used for the second and fourth subframes, while interpolated LP filters (both quantized and unquantized) are used in the first and third subframes. An open-loop pitch lag is estimated twice per frame (every 10 ms) based on the perceptually weighted speech signal.
- Then the following operations are repeated for each subframe:
- The target signal is computed by filtering the LP residual through the weighted synthesis filter with the initial states of the filters having been updated by filtering the error between LP residual and excitation.
- The impulse response of the weighted synthesis filter is computed.
- Closed-loop pitch analysis is then performed (to find the pitch lag and gain), using the target and impulse response, by searching around the open-loop pitch lag. Fractional pitch is used. The pitch lag is encoded with 9 bits in the first and third subframes and relatively encoded with 6 bits in the second and fourth subframes.
- The target signal is updated by removing the adaptive codebook contribution (filtered adaptive codevector), and this new target is used in the fixed algebraic codebook search (to find the optimum innovation). An algebraic codebook with 35 bits is used for the innovative excitation.
- The gains of the adaptive and fixed codebooks are scalar quantized with 4 and 5 bits respectively (with moving average (MA) prediction applied to the fixed codebook gain). Finally, the filter memories are updated (using the determined excitation signal) for finding the target signal in the next subframe.
- In each 20 ms speech frame, 244 bits are produced, corresponding to a bit rate of 12.2 kbit/s.
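The per-frame bit budget follows directly from the figures given in the list above. The short Python sketch below adds them up. The dictionary keys and variable names are labels chosen for this example, not identifiers from the recommendation.

```python
SAMPLE_RATE_HZ = 8000
FRAME_DURATION_MS = 20
SUBFRAMES_PER_FRAME = 4

# Bits per 20 ms frame, using the per-parameter figures quoted above.
frame_bits = {
    "lsp_split_matrix_quantization": 38,          # two LSP sets, jointly quantized
    "pitch_lag": 9 + 6 + 9 + 6,                   # subframes 1 and 3 absolute, 2 and 4 relative
    "adaptive_codebook_gain": 4 * SUBFRAMES_PER_FRAME,
    "algebraic_codebook": 35 * SUBFRAMES_PER_FRAME,
    "fixed_codebook_gain": 5 * SUBFRAMES_PER_FRAME,
}

total_bits = sum(frame_bits.values())
samples_per_frame = SAMPLE_RATE_HZ * FRAME_DURATION_MS // 1000
samples_per_subframe = samples_per_frame // SUBFRAMES_PER_FRAME
bit_rate_bps = total_bits * 1000 // FRAME_DURATION_MS

print(total_bits, samples_per_frame, samples_per_subframe, bit_rate_bps)
# prints: 244 160 40 12200
```

Of the 244 bits, 140 are spent on the algebraic codebook, which is why the innovative excitation dominates the bit budget.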
GSM 06.60 Enhanced Full Rate Decoder
- At the decoder, the transmitted indices are extracted from the received bitstream. The indices are decoded to obtain the coder parameters at each transmission frame. These parameters are the two LSP vectors, the 4 fractional pitch lags, the 4 innovative codevectors, and the 4 sets of pitch and innovative gains. The LSP vectors are converted to the LP filter coefficients and interpolated to obtain LP filters at each subframe. Then, at each 40-sample subframe:
- The excitation is constructed by adding the adaptive and innovative codevectors scaled by their respective gains.
- The speech is reconstructed by filtering the excitation through the LP synthesis filter.
- Finally, the reconstructed speech signal is passed through an adaptive postfilter.
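The per-subframe decoder steps above can be sketched in a few lines of Python. This is a floating-point illustration only: the function name `synthesize_subframe`, its argument layout, and the convention A(z) = 1 + a1 z^-1 + ... + a10 z^-10 are assumptions made for the example. The actual codec is defined in bit-exact fixed-point arithmetic, and the adaptive postfilter is omitted here.

```python
def synthesize_subframe(adaptive_vec, fixed_vec, pitch_gain, code_gain,
                        lp_coeffs, memory):
    """Reconstruct one subframe of speech (illustrative sketch).

    adaptive_vec, fixed_vec: decoded codevectors of equal length (40 samples)
    pitch_gain, code_gain:   decoded adaptive and fixed codebook gains
    lp_coeffs:               a1..a10 of A(z) = 1 + sum(a_k * z^-k)
    memory:                  previous outputs, most recent first (10 values)
    """
    # Excitation: gain-scaled sum of the adaptive and innovative codevectors.
    excitation = [pitch_gain * v + code_gain * c
                  for v, c in zip(adaptive_vec, fixed_vec)]

    # All-pole LP synthesis filter 1/A(z): s[n] = u[n] - sum(a_k * s[n-k]).
    state = list(memory)
    speech = []
    for u in excitation:
        s = u - sum(a * past for a, past in zip(lp_coeffs, state))
        speech.append(s)
        state = [s] + state[:-1]      # shift the filter memory by one sample

    return excitation, speech, state


# Toy input: a single pulse through a first-order filter with a1 = -0.5.
exc, out, mem = synthesize_subframe(
    adaptive_vec=[0.0] * 40,
    fixed_vec=[1.0] + [0.0] * 39,
    pitch_gain=0.8,
    code_gain=2.0,
    lp_coeffs=[-0.5] + [0.0] * 9,
    memory=[0.0] * 10,
)
```

The returned `state` would be passed as `memory` to the next subframe so that the filter runs continuously across subframe boundaries.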
Features
- Full and half duplex modes of operation
- Passes ETSI test vectors
- Compliant with GSM 06.60 Recommendation
- MIPS/memory requirements for various platforms are available
- PSQM/PSQM+ values under different network conditions are also available
- Optimized for high performance on leading edge DSP architectures
- Multichannel implementation
- Multi-tasking environment compatible
Configurations
- DAA interface using linear codec at 8.0 kHz sample rate
- Direct interface to 8.0 kHz PCM data stream (A-law or μ-law)
- North American/International Telephony (including caller ID) support available
- Simultaneous DTMF detector operation available (fewer than 10 talk-off hits on the Bellcore test tape set)
- MF tone detectors, general purpose programmable tone detectors/generators available
- Data/Facsimile/Voice Distinction available
- Common compressed speech frame stream interface to support systems with multiple speech coders
- Dynamic speech coder selection when multiple speech codecs are available
- Can be integrated with G.168 Echo Canceller and Tone Detection/Regeneration modules
Links
GSM-EFR Datasheet
Audio Examples
PSQM/PSQM+ values
ETSI Recommendation GSM 06.60
RFC 3267 - RTP Packetization
RTP Parameters