The MELPe (Mixed-Excitation Linear Predictive enhanced) algorithm was derived using several enhancements to the original MELP standard. It is also known as military standard MIL-STD-3005 and NATO STANAG 4591. MELP was itself derived from another military coder, LPC-10. However, MELP originally specified only a 2400 bps mode of operation. MELPe is a triple-rate, low bit-rate codec that supports rates of 600 bps, 1200 bps, and 2400 bps. A MELPe frame interval is 22.5 ms in duration and contains 180 voice samples at a sampling rate of 8,000 Hz. The recommended analog input has a nominal bandwidth of 100 Hz to 3800 Hz. MELPe can operate with a more band-limited signal, but with a degradation in performance.
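The frame arithmetic above can be checked directly. The following is a minimal sketch; the function names are illustrative and are not taken from any standard. It also shows that 600 bps gives a non-integer average number of bits per 22.5 ms interval, which suggests that the lowest rate cannot code each frame on its own; how frames are grouped is not described here.

```python
from fractions import Fraction

SAMPLE_RATE_HZ = 8000                 # 8 kHz sampling rate
FRAME_SECONDS = Fraction(225, 10000)  # 22.5 ms frame interval, kept exact


def samples_per_frame(sample_rate_hz=SAMPLE_RATE_HZ, frame_seconds=FRAME_SECONDS):
    """Number of voice samples in one frame interval."""
    return sample_rate_hz * frame_seconds


def average_bits_per_frame(bit_rate_bps, frame_seconds=FRAME_SECONDS):
    """Average bit budget available per frame interval at a given rate."""
    return bit_rate_bps * frame_seconds


if __name__ == "__main__":
    print("samples per frame:", samples_per_frame())
    for rate in (2400, 1200, 600):
        print(rate, "bps ->", average_bits_per_frame(rate), "bits per 22.5 ms")
```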
Beyond adding the 600 bps and 1200 bps rates to the original 2400 bps rate, MELPe includes several other enhancements. It supports compressed bit-stream transcoding between the different rates. A noise pre-processor helps to reduce background noise, and the postfilter has been improved to further increase the quality of speech reproduction.
MELP was selected as the new 2400 bps Federal Standard speech vocoder by the United States Department of Defense (DoD) Digital Voice Processing Consortium (DDVPC) after an extensive multi-year testing program. The selection test concentrated on four areas: intelligibility, voice quality, talker recognizability, and communicability. The selection criteria also included hardware parameters such as processing power, memory usage, and delay. MELP was selected as the best of the seven candidates and outperformed even the FS1016 4800 bps vocoder, which operates at twice the bit rate.
MELPe is robust in difficult background noise environments such as those frequently encountered in commercial and military communication systems. It is very efficient in its computational requirements, which translates into relatively low power consumption, an important consideration for portable systems. MELPe uses extensive lookup tables and models of the human voice to extract and regenerate speech. The codec is also tuned to regenerate the English language, and speakers of non-Germanic languages generally rate the coder more poorly than English speakers do.
Traditional pitch-excited LPC vocoders use either a periodic pulse train or white noise as the excitation for an all-pole synthesis filter. These vocoders produce intelligible speech at very low bit rates, but they sometimes sound mechanical or buzzy and are prone to annoying thumps and tonal noises. These problems arise from the inability of a simple pulse train to reproduce all kinds of voiced speech. The MELPe vocoder uses a mixed-excitation model that can produce more natural-sounding speech because it can represent a richer ensemble of possible speech characteristics.
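The traditional two-state model described above can be illustrated with a short sketch. This is not code from any LPC-10 or MELPe implementation: the function names, the unit-amplitude pulses, and the default pitch period are assumptions made for illustration. The excitation is either a pulse train (voiced) or white noise (unvoiced), and it drives an all-pole filter 1/A(z), where A(z) = 1 + a1 z^-1 + ... + ap z^-p.

```python
import random


def pulse_train(n, pitch_period):
    """Periodic unit pulses, one every `pitch_period` samples."""
    return [1.0 if i % pitch_period == 0 else 0.0 for i in range(n)]


def white_noise(n, seed=0):
    """Gaussian white noise with a fixed seed for repeatability."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def all_pole_filter(excitation, a):
    """Apply 1/A(z): y[n] = x[n] - sum_k a[k-1] * y[n-k], zero initial state."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y


def synthesize_frame(voiced, a, n=180, pitch_period=60, seed=0):
    """Binary voicing decision: pulse train if voiced, noise otherwise."""
    if voiced:
        excitation = pulse_train(n, pitch_period)
    else:
        excitation = white_noise(n, seed)
    return all_pole_filter(excitation, a)
```

Because the whole frame is either fully periodic or fully noisy, this model has no way to represent speech that is partly voiced, which is the limitation the mixed-excitation model addresses.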
Many modifications were made to LPC-10 in order to improve speech quality. The main ones are described in the following paragraphs.
The mixed excitation is implemented using a multi-band mixing model. This model can simulate frequency-dependent voicing strength using a novel adaptive filtering structure based on a fixed filterbank. The primary effect of this multi-band mixed excitation is to reduce the buzz usually associated with LPC vocoders, especially in broadband acoustic noise.
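The idea of frequency-dependent voicing strength can be sketched as follows. Note that this sketch swaps in a simpler technique: it weights the pulse and noise spectra per band in the DFT domain rather than using the time-domain filterbank structure described above. The five band edges, the linear weights v and 1 - v, and all function names are assumptions made for illustration.

```python
import cmath
import random

# Assumed band edges in Hz (five bands up to the 4 kHz Nyquist frequency).
BAND_EDGES_HZ = [0, 500, 1000, 2000, 3000, 4000]


def dft(x):
    """Direct O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spec):
    """Inverse DFT, returning only the real part of each sample."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]


def band_index(freq_hz):
    """Index of the band containing freq_hz (top edge falls in the last band)."""
    for i in range(len(BAND_EDGES_HZ) - 1):
        if freq_hz < BAND_EDGES_HZ[i + 1]:
            return i
    return len(BAND_EDGES_HZ) - 2


def mix_excitation(pulse, noise, voicing, sample_rate_hz=8000):
    """Per band, blend pulse and noise using voicing strength v in [0, 1]."""
    n = len(pulse)
    pulse_spec, noise_spec = dft(pulse), dft(noise)
    mixed = []
    for k in range(n):
        # Bins k and n - k share one frequency, which keeps the output real.
        freq = min(k, n - k) * sample_rate_hz / n
        v = voicing[band_index(freq)]
        mixed.append(v * pulse_spec[k] + (1.0 - v) * noise_spec[k])
    return idft_real(mixed)


if __name__ == "__main__":
    rng = random.Random(0)
    pulse = [1.0 if i % 40 == 0 else 0.0 for i in range(80)]
    noise = [rng.gauss(0.0, 0.1) for _ in range(80)]
    # Strongly voiced at low frequencies, noisy at high frequencies.
    frame = mix_excitation(pulse, noise, [1.0, 1.0, 0.6, 0.2, 0.0])
    print(len(frame))
```

Setting every band to 1 reproduces the pure pulse train and setting every band to 0 reproduces pure noise, so the two-state excitation of a traditional LPC vocoder is a special case of this scheme.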
When the input speech is voiced, the MELPe vocoder can synthesize speech using either periodic or aperiodic pulses. Aperiodic pulses are most often used during transition regions between voiced and unvoiced segments of the speech signal. This feature allows the synthesizer to reproduce erratic glottal pulses without introducing tonal noises.
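A simple way to picture aperiodic pulses is to perturb each pitch period by a random amount. The sketch below does this; the uniform jitter distribution, the 25% jitter value used in the example, and the function name are assumptions for illustration rather than details taken from the text above.

```python
import random


def pulse_positions(n_samples, pitch_period, jitter=0.0, seed=0):
    """Sample indices of excitation pulses within a block of n_samples.

    jitter = 0.0 gives strictly periodic pulses. jitter = 0.25 lets each
    period vary by up to +/-25% of the nominal pitch period (assumed value).
    """
    rng = random.Random(seed)
    positions = []
    pos = 0
    while pos < n_samples:
        positions.append(pos)
        scale = 1.0 + jitter * rng.uniform(-1.0, 1.0)
        pos += max(1, round(pitch_period * scale))
    return positions


if __name__ == "__main__":
    print("periodic: ", pulse_positions(180, 40))
    print("aperiodic:", pulse_positions(180, 40, jitter=0.25, seed=1))
```

Because the jittered pulses no longer fall on an exact period, their spectrum loses the sharp harmonic lines of a periodic train, which is consistent with the reduction in tonal artifacts described above.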
The adaptive spectral enhancement filter is based on the poles of the LPC vocal tract filter and is used to enhance the formant structure in the synthetic speech. This filter improves the match between synthetic and natural bandpass waveforms, and introduces a more natural quality to the speech output.
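One common way to build a filter from the poles of an LPC filter is bandwidth expansion: scaling coefficient a_k by gamma^k moves every root of A(z) toward the origin by the factor gamma. The sketch below forms a pole-zero filter A(z/alpha) / A(z/beta) on that basis. It is an assumption that the enhancement filter takes this form; the values alpha = 0.5 and beta = 0.8 are illustrative, and any tilt compensation or signal-dependent adaptation of the parameters is omitted.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma), given a = [a1, ..., ap] of
    A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    return [ak * gamma ** k for k, ak in enumerate(a, start=1)]


def pole_zero_filter(x, num, den):
    """y[n] = x[n] + sum_k num[k-1] x[n-k] - sum_k den[k-1] y[n-k]."""
    y = []
    for n in range(len(x)):
        acc = x[n]
        for k, bk in enumerate(num, start=1):
            if n - k >= 0:
                acc += bk * x[n - k]
        for k, ak in enumerate(den, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y


def spectral_enhance(x, a, alpha=0.5, beta=0.8):
    """Apply A(z/alpha) / A(z/beta); alpha and beta are assumed values."""
    return pole_zero_filter(x, bandwidth_expand(a, alpha), bandwidth_expand(a, beta))
```

With alpha smaller than beta, the poles of the enhancement filter lie closer to the unit circle than its zeros, so the formant peaks of the LPC spectrum are emphasized relative to the valleys. Setting alpha equal to beta makes the filter an identity.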
The pulse dispersion is implemented using a fixed pulse dispersion filter based on a spectrally flattened triangle pulse. This filter has the effect of spreading the excitation energy within a pitch period. This, in turn, reduces the harsh quality of the synthetic speech.
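One reading of "spectrally flattened" is that the magnitude of the pulse's spectrum is set to one at every frequency while its phase is kept. The sketch below follows that reading. The asymmetric triangle shape, its length of 65 samples, its rise time of 4 samples, and the function names are all assumptions for illustration, not the coefficients of the actual filter.

```python
import cmath


def triangle_pulse(length, rise):
    """Asymmetric triangle: linear rise over `rise` samples, then linear fall."""
    fall = length - rise
    return [i / rise if i < rise else (length - i) / fall for i in range(length)]


def spectrally_flatten(pulse):
    """Keep the DFT phase of `pulse` but force every bin to unit magnitude."""
    n = len(pulse)
    spec = [sum(pulse[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
    # Guard against empty bins, whose phase would be undefined.
    flat = [s / abs(s) if abs(s) > 1e-12 else 1.0 for s in spec]
    return [(sum(flat[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]


def disperse(excitation, fir):
    """Convolve the excitation with the FIR filter, truncated to input length."""
    out = [0.0] * len(excitation)
    for i, x in enumerate(excitation):
        if x == 0.0:
            continue
        for j, h in enumerate(fir):
            if i + j < len(out):
                out[i + j] += x * h
    return out


if __name__ == "__main__":
    fir = spectrally_flatten(triangle_pulse(65, 4))
    print("filter energy:", sum(h * h for h in fir))
    print("peak magnitude:", max(abs(h) for h in fir))
```

A filter built this way leaves the magnitude spectrum of the excitation unchanged at the DFT frequencies but spreads each pulse over many samples, so the energy is the same while the peak amplitude within the pitch period is lower.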
Ten Fourier magnitudes are coded with an 8-bit vector quantizer. The index of the code vector that minimizes the weighted Euclidean distance between the input and code vectors is transmitted.
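The codebook search described above can be sketched as an exhaustive nearest-neighbour search. An 8-bit index addresses 2^8 = 256 code vectors, each with ten components. The codebook contents and weights in the example are random placeholders, not the values used by the codec, and the function names are illustrative.

```python
import random


def weighted_sq_distance(x, code_vector, weights):
    """Weighted squared Euclidean distance between two vectors."""
    return sum(w * (xi - ci) ** 2 for xi, ci, w in zip(x, code_vector, weights))


def quantize(x, codebook, weights):
    """Index of the code vector closest to x.

    Minimizing the squared distance selects the same vector as minimizing
    the distance itself, so the square root is skipped.
    """
    return min(range(len(codebook)),
               key=lambda i: weighted_sq_distance(x, codebook[i], weights))


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder codebook: 256 entries (8-bit index) of 10 magnitudes each.
    codebook = [[rng.uniform(0.5, 1.5) for _ in range(10)] for _ in range(256)]
    weights = [1.0] * 10  # placeholder weights
    magnitudes = [rng.uniform(0.5, 1.5) for _ in range(10)]
    index = quantize(magnitudes, codebook, weights)
    print("transmitted index:", index)
```

Only the index is sent; the decoder holds the same codebook and looks up the corresponding code vector.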