- Low bit rate multi-media services
- Dual rate 5.3 kbps and 6.3 kbps
- Real-time multi-channel implementation
- Optimized for DSPs and RISC/CISC processors
- ITU G.723.1 compliant
VOCAL’s G.723.1 codec is a dual rate vocoder used for compressing speech or other audio signal components of multimedia services at a very low bit rate. Contact us to discuss your voice codec application requirements.
VOCAL’s G.723.1 software is optimized for leading DSPs and RISC/CISC processors from TI, ADI, AMD, ARM, Intel and other vendors. G.723.1 voice compression software may be licensed as a standalone algorithm, as a library, and with a VoIP stack. Custom designs are also available to meet unique G.723.1 application requirements.
The G.723.1 algorithm specifies a coded representation to compress speech or audio for multimedia services, primarily very low bit rate visual telephony as part of the overall H.324 family of standards. G.723.1 has two bit rates associated with it, 5.3 kbit/s and 6.3 kbit/s. The higher bit rate has better voice quality. The lower bit rate still gives good quality and provides system designers with additional flexibility. Both rates are a mandatory part of the G.723.1 encoder and decoder, and it is possible to switch between the two rates at any 30 ms frame boundary. An option for variable rate operation using discontinuous transmission and noise fill during non-speech intervals is also available. G.723.1 Annex A defines 4-byte SID (Silence Insertion Descriptor) frames.
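The relationship between bit rate, the 30 ms frame, and the size of a compressed frame can be sketched as below. The rounding to whole bytes follows from the arithmetic; the frame-type table is an assumption that is not stated in the text above (it follows the commonly used RTP packing, in which the two least significant bits of the first byte identify the frame type), and the function names are hypothetical.

```python
FRAME_MS = 30  # frame duration given in the text


def nominal_frame_bytes(bit_rate_bps, frame_ms=FRAME_MS):
    """Bytes needed to carry one frame at the nominal bit rate, rounded up."""
    bits = bit_rate_bps * frame_ms // 1000
    return (bits + 7) // 8


# Assumption (not from the text above): frame type is signalled in the two
# least significant bits of the first byte, as in the usual RTP packing.
FRAME_TYPES = {
    0b00: ("6.3 kbit/s speech", 24),
    0b01: ("5.3 kbit/s speech", 20),
    0b10: ("SID (Annex A)", 4),
}


def classify_frame(first_byte):
    """Return (description, frame size in bytes) for a compressed frame."""
    return FRAME_TYPES.get(first_byte & 0b11, ("untransmitted or reserved", None))
```

Because the type travels with every frame, a receiver built along these lines can follow a rate switch at any frame boundary without out-of-band signalling.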
A G.723.1 speech coder is designed to operate with a digital signal obtained by first performing telephone bandwidth filtering (Recommendation G.712) of the analogue input, then sampling at 8,000 Hz and then converting to 16-bit linear PCM for the input to the encoder. The output of the G.723.1 decoder should be converted back to analogue by similar means. Other input/output characteristics, such as those specified by Recommendation G.711 for 64 kbit/s PCM data, should be converted to 16-bit linear PCM before encoding or from 16-bit linear PCM to the appropriate format after decoding.
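As an illustration of the format conversion described above, the following minimal sketch expands G.711 mu-law bytes to linear PCM before they are handed to an encoder. It covers only the mu-law variant of G.711 (A-law needs a different expansion), and the function names are hypothetical rather than part of any VOCAL API.

```python
ULAW_BIAS = 0x84  # bias added during mu-law compression (132)


def ulaw_to_linear(code):
    """Expand one 8-bit G.711 mu-law code to a signed linear PCM sample."""
    code = ~code & 0xFF                # mu-law bytes are stored inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07      # segment number
    mantissa = code & 0x0F             # position within the segment
    magnitude = (((mantissa << 3) + ULAW_BIAS) << exponent) - ULAW_BIAS
    return -magnitude if sign else magnitude


def ulaw_frame_to_pcm(ulaw_bytes):
    """Convert a buffer of mu-law bytes to a list of linear PCM samples."""
    return [ulaw_to_linear(b) for b in ulaw_bytes]
```

The expanded values stay within -32124 to +32124, so they fit the 16-bit linear PCM input that the encoder expects.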
The coder is based on the principles of linear prediction analysis-by-synthesis coding and attempts to minimize a perceptually weighted error signal. The encoder operates on blocks (frames) of 240 samples each, which is equal to 30 msec at an 8 kHz sampling rate. Each block is first high pass filtered to remove the DC component and then divided into four subframes of 60 samples each. For every subframe, a 10th order Linear Prediction Coder (LPC) filter is computed using the unprocessed input signal. The LPC filter for the last subframe is quantized using a Predictive Split Vector Quantizer (PSVQ). The unquantized LPC coefficients are used to construct the short-term perceptual weighting filter, which is used to filter the entire frame and to obtain the perceptually weighted speech signal.
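The framing and LPC analysis steps can be sketched as follows. This is a floating-point illustration only: it splits a frame into four subframes and derives predictor coefficients with the textbook autocorrelation method and Levinson-Durbin recursion. The Recommendation specifies its own analysis window, conditioning, and fixed-point arithmetic, none of which are reproduced here, and all names are hypothetical.

```python
FRAME_LEN = 240      # samples per 30 msec frame at 8 kHz
SUBFRAME_LEN = 60    # four subframes per frame
LPC_ORDER = 10


def split_subframes(frame):
    """Divide one 240-sample frame into four 60-sample subframes."""
    if len(frame) != FRAME_LEN:
        raise ValueError("expected a frame of %d samples" % FRAME_LEN)
    return [frame[i:i + SUBFRAME_LEN] for i in range(0, FRAME_LEN, SUBFRAME_LEN)]


def autocorrelation(x, max_lag=LPC_ORDER):
    """Autocorrelation r[0..max_lag] of the sample block x."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order=LPC_ORDER):
    """Solve for A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.

    Returns (a, residual_energy). Stops early if the residual energy
    reaches zero, leaving the remaining coefficients at zero.
    """
    a = [1.0] + [0.0] * order
    err = float(r[0])
    for i in range(1, order + 1):
        if err <= 0.0:
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

For example, `levinson_durbin(autocorrelation(subframe))` would give one coefficient set per subframe under these simplifying assumptions.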
For every two subframes (120 samples), the open loop pitch period, L_OL, is computed using the weighted speech signal. The pitch period is searched in the range from 18 to 142 samples. From this point on, the speech is processed one 60-sample subframe at a time.
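A minimal sketch of an open loop pitch search over the stated lag range is shown below. It maximizes a normalized cross-correlation criterion over a 120-sample block, which is a common way to estimate pitch. The Recommendation's exact selection rule (for example, any preference it gives to shorter lags) is not reproduced, and the function name and buffer layout are assumptions.

```python
MIN_LAG = 18
MAX_LAG = 142
BLOCK_LEN = 120   # two subframes


def open_loop_pitch(signal, start):
    """Estimate the pitch lag for signal[start:start + BLOCK_LEN].

    The buffer must hold at least MAX_LAG samples of history before
    `start`. Returns the lag in [MIN_LAG, MAX_LAG] that maximizes
    cross**2 / energy; on ties the smallest lag wins.
    """
    if start < MAX_LAG or start + BLOCK_LEN > len(signal):
        raise ValueError("buffer too short for the requested block")
    block = range(start, start + BLOCK_LEN)
    best_lag, best_score = MIN_LAG, 0.0
    for lag in range(MIN_LAG, MAX_LAG + 1):
        cross = sum(signal[n] * signal[n - lag] for n in block)
        energy = sum(signal[n - lag] ** 2 for n in block)
        if cross <= 0 or energy <= 0:
            continue                     # only positively correlated lags
        score = cross * cross / energy
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Calling `open_loop_pitch(weighted, MAX_LAG)` on a buffer of weighted speech would return one estimate for the first 120-sample block after the history region.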
Using the estimated pitch period computed previously, a harmonic noise shaping filter is constructed. The combination of the LPC synthesis filter, the formant perceptual weighting filter, and the harmonic noise shaping filter is used to create an impulse response. The impulse response is then used for further computations.
Using the open loop pitch period estimate, L_OL, and the impulse response, a closed loop pitch predictor is computed. A fifth order pitch predictor is used. The pitch period is computed as a small differential value around the open loop pitch estimate. The contribution of the pitch predictor is then subtracted from the initial target vector. Both the pitch period and the differential value are transmitted to the decoder.
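To make the idea of a fifth order pitch predictor concrete, the sketch below computes the predictor's contribution for one 60-sample subframe from the past excitation, a candidate lag, and five tap gains. The way the history is extended periodically when the lag is shorter than the subframe is an assumption modelled on typical adaptive codebook practice, and the search over candidate lags and gain vectors is omitted. All names are hypothetical.

```python
SUBFRAME_LEN = 60
PRED_ORDER = 5     # five taps centred on the candidate lag


def pitch_contribution(past_excitation, lag, taps):
    """Five-tap pitch predictor output for one subframe.

    past_excitation: previous excitation samples, most recent last.
    lag: candidate pitch lag in samples.
    taps: five predictor gains.
    """
    half = PRED_ORDER // 2
    if len(taps) != PRED_ORDER:
        raise ValueError("expected %d taps" % PRED_ORDER)
    if lag < 1 or len(past_excitation) < lag + half:
        raise ValueError("not enough excitation history for this lag")
    end = len(past_excitation)
    # Two samples just before the lagged segment, then the last `lag`
    # samples repeated periodically (assumed extension rule).
    delayed = [past_excitation[end - lag - half + k] for k in range(half)]
    delayed += [past_excitation[end - lag + (i % lag)]
                for i in range(SUBFRAME_LEN + half)]
    return [sum(taps[j] * delayed[n + j] for j in range(PRED_ORDER))
            for n in range(SUBFRAME_LEN)]
```

With taps of `[0, 0, 1, 0, 0]` the predictor reduces to a plain one-tap delay by `lag` samples, which is a convenient sanity check.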
Finally, the non-periodic component of the excitation is approximated. For the high bit rate, Multi-Pulse Maximum Likelihood Quantization (MP-MLQ) excitation is used, and for the low bit rate, Algebraic Code Excited Linear Prediction (ACELP) excitation is used.
The G.723.1 decoder operation is also performed on a frame-by-frame basis. First the quantized LPC indices are decoded, then the speech decoder constructs the LPC synthesis filter. For every subframe, both the adaptive codebook excitation and fixed codebook excitation are decoded and input to the synthesis filter. The adaptive postfilter consists of a formant and a forward-backward pitch postfilter. The excitation signal is input to the pitch postfilter, which in turn is input to the synthesis filter whose output is input to the formant postfilter. A gain scaling unit maintains the energy at the input level of the formant postfilter.
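The role of the gain scaling unit can be illustrated with a simple energy-matching sketch: the postfiltered signal is rescaled so that its energy equals that of a reference signal. This is a block-based simplification under stated assumptions. The Recommendation's actual gain computation and any smoothing it applies are not reproduced, and the function name is hypothetical.

```python
import math


def gain_scale(reference, filtered):
    """Scale `filtered` so its energy matches the energy of `reference`."""
    e_ref = sum(x * x for x in reference)
    e_out = sum(y * y for y in filtered)
    if e_out == 0.0:
        return list(filtered)          # nothing to scale
    gain = math.sqrt(e_ref / e_out)
    return [gain * y for y in filtered]
```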
The G.723.1 coder encodes speech or other audio signals in 30 msec frames. In addition, there is a look-ahead of 7.5 msec, resulting in a total algorithmic delay of 37.5 msec. All additional delays in the implementation and operation of this coder are due to:
- actual time spent processing the data in the encoder and decoder
- transmission time on the communication link
- additional buffering delay for the multiplexing protocol
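The delay figures above can be checked with a short calculation. The frame and look-ahead lengths follow from the text (30 msec and 7.5 msec at 8 kHz). The processing, transmission, and buffering values passed to `one_way_delay_ms` are purely illustrative inputs that a system designer would measure or budget, not figures from VOCAL or the Recommendation.

```python
SAMPLE_RATE_HZ = 8000
FRAME_SAMPLES = 240       # 30 msec frame
LOOKAHEAD_SAMPLES = 60    # 7.5 msec look-ahead


def samples_to_ms(n_samples):
    """Duration of n_samples at the 8 kHz sampling rate, in milliseconds."""
    return 1000.0 * n_samples / SAMPLE_RATE_HZ


def algorithmic_delay_ms():
    """Frame length plus look-ahead."""
    return samples_to_ms(FRAME_SAMPLES + LOOKAHEAD_SAMPLES)


def one_way_delay_ms(processing_ms, transmission_ms, buffering_ms):
    """Algorithmic delay plus the three additional contributions listed above."""
    return algorithmic_delay_ms() + processing_ms + transmission_ms + buffering_ms
```

For instance, `one_way_delay_ms(5, 20, 30)` adds hypothetical processing, link, and multiplexing delays to the fixed 37.5 msec algorithmic delay.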
- Compliant with G.723.1 specification
- 5.3 kbps and 6.3 kbps compressed data rates
- 30 ms frame size
- Full and half duplex modes of operation
- Passes ITU test vectors
- Optimized for high performance on leading edge DSP architectures
- Multichannel implementation
- Multi-tasking environment compatible
- Common compressed speech frame stream interface to support systems with multiple speech coders
- Dynamic speech coder selection when multiple speech codecs are available
- Integrates with Acoustic Echo Canceller, G.168 Line Echo Canceller and Tone Detection/Regeneration modules
- Available with VoIP stack
VOCAL’s speech coders are available for the following platforms. Please contact us for specific G.723.1 supported platforms.