VOCAL’s Battlefield Voice Activated Transmission (VOX) software provides excellent voice detection with minimal delay while rejecting spurious signals that falsely trigger other VOX modules. VOX is commonly used in radio communications where push-to-talk (PTT) is inconvenient or impractical and hands-free communication is required. VOX transmits the microphone signal only when it has determined that the user is talking. Voice detection can become unreliable in noisy or battlefield conditions.
Primitive first-generation VOX technologies, which were generally based on signal energy level, proved ineffective. Tuning the detection thresholds resulted either in excessive false detections or in excessive clipping of the talker’s voice. These systems could not discriminate voice from other high-level noise sources.
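The energy-level approach described above can be illustrated with a minimal C sketch. It is not VOCAL's implementation: the function names, the frame length and the threshold are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Mean-square energy of one frame of samples scaled to [-1, 1]. */
static float frame_energy(const float *x, size_t n)
{
    assert(x != NULL && n > 0);
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * x[i];
    return sum / (float)n;
}

/* First-generation style decision (hypothetical): open the transmit
 * gate whenever the frame energy exceeds a fixed, hand-tuned threshold.
 * Nothing here looks at what kind of signal produced the energy. */
static int energy_vox(const float *x, size_t n, float threshold)
{
    return frame_energy(x, n) > threshold;
}
```

Any sufficiently loud frame opens the gate, whether it contains speech, engine noise or a gunshot. Raising the threshold to suppress those false triggers also cuts off quiet speech onsets, which is the clipping problem noted above.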
Second-generation VOX technologies use advanced signal processing techniques to determine whether the signal energy actually contains voice information or comes from a noise source. The VOX analyzes the signal in the frequency domain to decide whether the user is talking. Because this analysis needs a buffered span of audio before it can reach a decision, there is a tradeoff between the accuracy of the detection and the latency in the signal.
Voiced speech has a well-defined structure with a strong fundamental frequency, often referred to as pitch, along with peaks at the harmonics of the pitch frequency. In addition, speech is quasi-stationary, so this spectral shape is maintained for about 20 ms. Pitch is also continuous in voiced sound. In other words, a speaker's pitch does not change very quickly, and its continuity often holds for about 200 ms. This provides sufficient time to determine whether the audio source is the user of the VOX system. This ability to distinguish voice from other high-energy signal sources is especially useful in restaurant and drive-thru environments and in warfare situations.
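As a rough illustration of using pitch and its continuity as a voice cue, the sketch below estimates a pitch lag per 20 ms frame and opens the gate only after the lag has stayed stable for several frames. It uses time-domain normalized autocorrelation as a stand-in for the frequency-domain analysis described above, and every name and constant in it (8 kHz sample rate, lag range, jump tolerance, frame count) is an assumption, not a VOCAL setting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All constants are illustrative assumptions. */
#define FRAME_LEN         160  /* 20 ms frame at an assumed 8 kHz rate */
#define MIN_LAG            20  /* 400 Hz upper pitch bound at 8 kHz    */
#define MAX_LAG           100  /* 80 Hz lower pitch bound at 8 kHz     */
#define MAX_LAG_JUMP        4  /* allowed frame-to-frame lag change    */
#define FRAMES_TO_TRIGGER   5  /* 5 x 20 ms = 100 ms of steady pitch   */

/* Return the lag (in samples) with the strongest normalized
 * autocorrelation, or 0 if no lag correlates above min_corr. */
static int estimate_pitch_lag(const float *x, size_t n, float min_corr)
{
    assert(x != NULL && n > MAX_LAG);
    float best = min_corr * min_corr;  /* compare squared values, no sqrt */
    int best_lag = 0;
    for (int lag = MIN_LAG; lag <= MAX_LAG; lag++) {
        float num = 0.0f, e0 = 0.0f, e1 = 0.0f;
        for (size_t i = 0; i + (size_t)lag < n; i++) {
            num += x[i] * x[i + lag];
            e0  += x[i] * x[i];
            e1  += x[i + lag] * x[i + lag];
        }
        float den = e0 * e1;
        if (num <= 0.0f || den <= 0.0f)
            continue;                  /* anti-correlated or silent span */
        float score = (num * num) / den;
        if (score > best) {
            best = score;
            best_lag = lag;
        }
    }
    return best_lag;
}

/* Tracks how long the pitch lag has stayed continuous. */
typedef struct {
    int last_lag;    /* lag of the previous frame, 0 if it had no pitch */
    int run_frames;  /* consecutive frames with a continuous pitch lag  */
} pitch_tracker;

/* Feed one frame's lag; returns 1 once the pitch has been stable for
 * FRAMES_TO_TRIGGER frames, 0 otherwise. */
static int tracker_update(pitch_tracker *t, int lag)
{
    if (lag > 0 && t->last_lag > 0 &&
        abs(lag - t->last_lag) <= MAX_LAG_JUMP)
        t->run_frames++;
    else
        t->run_frames = (lag > 0) ? 1 : 0;
    t->last_lag = lag;
    return t->run_frames >= FRAMES_TO_TRIGGER;
}
```

A single click or impulse produces no repeating structure, so `estimate_pitch_lag` reports no pitch and the tracker resets. The number of frames required before triggering is the knob that trades detection accuracy against latency.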
Further advancements in speaker recognition allow for more selective and intelligent VOX decisions. As mentioned previously, voiced speech has a well-defined structure, and that structure contains characteristics unique to a particular user and language. Distinct models can therefore be generated from these speech characteristics. A VOX system that makes transmission decisions based on the similarity of the incoming voice signal to the desired voice must allow for training on the user’s voice.
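The idea of training a model on a user's voice and then gating on similarity can be sketched very simply. The example below is a deliberately naive stand-in, not a real speaker-recognition method and not VOCAL's: it assumes each frame has already been reduced to a small feature vector (the feature count, the names and the distance threshold are all hypothetical), keeps a running mean as the "model", and accepts frames that fall within a squared Euclidean distance of it.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_FEATURES 4  /* hypothetical feature vector size */

/* Naive speaker model: the running mean of the enrollment features. */
typedef struct {
    float mean[NUM_FEATURES];
    size_t count;  /* number of enrollment vectors seen so far */
} speaker_model;

static void model_init(speaker_model *m)
{
    assert(m != NULL);
    for (size_t k = 0; k < NUM_FEATURES; k++)
        m->mean[k] = 0.0f;
    m->count = 0;
}

/* Training step: fold one enrollment vector into the running mean. */
static void model_train(speaker_model *m, const float *f)
{
    assert(m != NULL && f != NULL);
    m->count++;
    for (size_t k = 0; k < NUM_FEATURES; k++)
        m->mean[k] += (f[k] - m->mean[k]) / (float)m->count;
}

/* Squared Euclidean distance between a feature vector and the model. */
static float model_distance(const speaker_model *m, const float *f)
{
    assert(m != NULL && f != NULL);
    float d = 0.0f;
    for (size_t k = 0; k < NUM_FEATURES; k++) {
        float diff = f[k] - m->mean[k];
        d += diff * diff;
    }
    return d;
}

/* Transmit decision: accept only if the model has been trained and the
 * incoming vector is close enough to it. */
static int model_accepts(const speaker_model *m, const float *f,
                         float max_dist)
{
    return m->count > 0 && model_distance(m, f) <= max_dist;
}
```

An untrained model rejects everything, which mirrors the point above that such a system cannot make similarity-based decisions until it has been trained on the desired voice.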
Like all of VOCAL’s software libraries, the Battlefield VOX is available in a variety of forms, including optimized ANSI C and assembly-language implementations for leading DSP architectures (including but not limited to processors from TI, ADI, AMD, ARM, MIPS, CEVA, and LSI Logic ZSP). These libraries are modular and can be executed as a single task under a variety of operating systems or standalone with their own microkernel. To find out whether your desired platform and processor are supported, please contact us.
Features
- Actively rejects:
  - Impulse noises such as gunshots or explosions
  - Aircraft engine noise
  - Land vehicle engine and road (traffic) noise
  - Factory machine noise
  - Restaurant kitchen noise
- Integrated with speech coders (e.g. STANAG 4591 MELPe)
- Low signal latency (less than 200 ms)
- Adjustable sensitivity and thresholding
- C-callable functions