Battlefield VOX software provides excellent voice activity detection in noisy environments with minimal delay, while rejecting spurious signals that falsely trigger other VOX algorithms. VOX, or voice-activated transmission, is commonly used in radio communications where push-to-talk (PTT) operation is inconvenient or impractical and hands-free communication is required. A VOX system transmits the microphone signal only when it determines that the user is talking. In noisy or battlefield conditions, however, that determination is easily skewed.
VOCAL’s Battlefield VOX software is available for license in a variety of forms, including ANSI C and assembly language implementations. The source code is optimized for leading DSP and conventional processors from TI, ADI, AMD, ARM, MIPS, CEVA, LSI Logic ZSP and other vendors. The libraries are modular and can run as a single task under a variety of operating systems, or standalone with their own microkernel. Please contact us to discuss your voice application requirements.
Primitive first-generation VOX technologies were generally based on signal energy level and proved of little practical use. Tuning the detection thresholds resulted either in excessive false detection or in excessive clipping of the talker’s voice, and these systems could not discriminate voice from other high-level noise sources.
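To make the energy-threshold approach and its weakness concrete, here is a minimal C sketch of a generic first-generation detector; it is not VOCAL’s code. The names `EnergyVox` and `energy_vox_decide`, the mean-square energy measure, and the hangover counter are assumptions chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical first-generation VOX: frame energy is compared against a
 * fixed threshold. A hangover counter keeps the transmitter keyed for a
 * few frames after the energy drops, so short pauses do not clip speech. */
typedef struct {
    double threshold;    /* mean-square energy threshold (assumed units) */
    int hangover_frames; /* frames to stay keyed after energy drops */
    int hang_count;      /* hangover frames remaining */
} EnergyVox;

/* Mean-square energy of one frame of samples scaled to [-1, 1]. */
static double frame_energy(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * x[i];
    return n ? sum / (double)n : 0.0;
}

/* Returns 1 if the transmitter should be keyed for this frame, else 0.
 * Note that nothing here asks whether the energy is speech. */
int energy_vox_decide(EnergyVox *v, const double *x, size_t n)
{
    if (frame_energy(x, n) > v->threshold) {
        v->hang_count = v->hangover_frames; /* restart hangover */
        return 1;
    }
    if (v->hang_count > 0) {
        v->hang_count--;                    /* still in hangover */
        return 1;
    }
    return 0;
}
```

Any frame loud enough, such as a burst of engine noise or an explosion, keys the transmitter exactly as speech would. Raising `threshold` to prevent that starts clipping quiet speech, which is the tuning dilemma described above.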
Second-generation VOX technologies use advanced signal processing techniques to determine whether the signal energy actually carries voice information or comes from a noise source. The VOX software analyzes the signal in the frequency domain to decide whether the user is talking. Because a frequency-domain decision needs a block of buffered samples, and longer blocks give finer spectral detail, there is a tradeoff between detection accuracy and the latency added to the signal.
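The accuracy-versus-latency tradeoff follows from basic DFT arithmetic. The hypothetical helper below assumes a single analysis window of `n` samples at sample rate `fs_hz`; a real detector may overlap windows or combine several frames, which changes the numbers.

```c
#include <stddef.h>

/* Illustrative window tradeoff for a frequency-domain detector.
 * A window of n samples at fs_hz gives DFT bins spaced fs_hz / n apart,
 * but no decision can be made until all n samples have arrived, so the
 * window alone contributes n / fs_hz seconds of delay. */
typedef struct {
    double bin_spacing_hz; /* frequency resolution */
    double window_ms;      /* buffering delay contributed by the window */
} WindowTradeoff;

WindowTradeoff window_tradeoff(double fs_hz, size_t n)
{
    WindowTradeoff t;
    t.bin_spacing_hz = fs_hz / (double)n;
    t.window_ms = 1000.0 * (double)n / fs_hz;
    return t;
}
```

At an assumed 8 kHz narrowband sample rate, a 160-sample (20 ms) window gives 50 Hz bin spacing, while a 512-sample window refines that to 15.625 Hz at the cost of 64 ms of buffering before any decision can be made.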
Voiced speech has a well-defined structure: a strong fundamental frequency, often referred to as pitch, along with peaks at the harmonics of that frequency. In addition, speech is quasi-stationary, so this spectral shape persists for about 20 ms. Pitch is also continuous during voiced sounds; in other words, a speaker’s pitch does not change very quickly, and this continuity often holds for about 200 ms. That is sufficient time to determine whether the audio originates from the user of the VOX system. This ability to distinguish voice from other high-energy signal sources is especially useful in restaurant and drive-thru environments and in warfare situations.
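The pitch cues described above can be sketched with a textbook method, normalized autocorrelation, which is not necessarily what Battlefield VOX uses. `estimate_pitch_hz` searches one quasi-stationary frame for its strongest periodicity, and `pitch_is_continuous` checks that successive estimates drift only slowly. Both names, the 60 to 400 Hz search range, the 0.3 voicing threshold, and the relative-change tolerance are illustrative assumptions.

```c
#include <stddef.h>

/* Hypothetical normalized-autocorrelation pitch estimate for one frame.
 * Returns the pitch in Hz, or 0.0 if the frame is treated as unvoiced.
 * Assumed search range: 60-400 Hz; assumed voicing threshold: 0.3. */
double estimate_pitch_hz(const double *x, size_t n, double fs_hz)
{
    size_t min_lag = (size_t)(fs_hz / 400.0); /* shortest period searched */
    size_t max_lag = (size_t)(fs_hz / 60.0);  /* longest period searched */
    double energy = 0.0, best_r = 0.0;
    size_t best_lag = 0;

    for (size_t i = 0; i < n; i++)
        energy += x[i] * x[i];
    if (energy <= 0.0 || min_lag == 0 || max_lag >= n)
        return 0.0; /* silent frame, or frame too short for the lag range */

    for (size_t lag = min_lag; lag <= max_lag; lag++) {
        double r = 0.0;
        for (size_t i = 0; i + lag < n; i++)
            r += x[i] * x[i + lag];
        r /= energy; /* larger value = stronger periodicity at this lag */
        if (r > best_r) {
            best_r = r;
            best_lag = lag;
        }
    }
    return (best_r > 0.3) ? fs_hz / (double)best_lag : 0.0;
}

/* Pitch continuity: returns 1 only if every frame is voiced and each
 * estimate differs from the previous one by at most max_rel_change
 * (e.g. 0.1 for 10%). */
int pitch_is_continuous(const double *pitch_hz, size_t frames,
                        double max_rel_change)
{
    if (frames < 2)
        return 0;
    for (size_t k = 1; k < frames; k++) {
        double prev = pitch_hz[k - 1], cur = pitch_hz[k];
        if (prev <= 0.0 || cur <= 0.0)
            return 0; /* an unvoiced frame breaks the pitch track */
        double change = (cur > prev ? cur - prev : prev - cur) / prev;
        if (change > max_rel_change)
            return 0;
    }
    return 1;
}
```

With 20 ms frames, requiring continuity over a history of N frames adds roughly N × 20 ms of decision delay, which is one concrete form of the accuracy/latency tradeoff. A loud impulse such as a gunshot rarely yields a stable pitch track across several frames, so it fails the continuity test even when it passes an energy threshold.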
Further advances in speaker recognition allow more selective and intelligent VOX decisions. As noted above, voiced speech has a well-defined structure, and that structure contains characteristics specific to a particular speaker and language. Distinct models can therefore be generated from these speech characteristics. A VOX system that bases its transmission decisions on how closely the incoming voice matches the desired user’s voice must therefore support training on that user’s voice.
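Speaker-dependent VOX needs an enrollment (training) step and a similarity test. The C sketch below is a deliberately simplified stand-in for a real speaker model, which would often be a statistical model over spectral features: it averages feature vectors collected during training into a template and keys only when an incoming vector’s cosine similarity to that template reaches a threshold. `SpeakerTemplate`, `FEAT_DIM`, the feature extraction that would produce the vectors, and the threshold value are all hypothetical.

```c
#include <stddef.h>

#define FEAT_DIM 4 /* hypothetical feature-vector length */

/* Hypothetical speaker template: running mean of enrollment features. */
typedef struct {
    double mean[FEAT_DIM];
    size_t count;
} SpeakerTemplate;

/* Training step: fold one feature vector from the user's speech into the
 * running mean. */
void enroll_frame(SpeakerTemplate *t, const double f[FEAT_DIM])
{
    t->count++;
    for (size_t d = 0; d < FEAT_DIM; d++)
        t->mean[d] += (f[d] - t->mean[d]) / (double)t->count;
}

/* Returns 1 if the cosine similarity between f and the template is at
 * least min_similarity (assumed to lie in [0, 1]). Squared quantities are
 * compared (dot^2 >= s^2 * |a|^2 * |b|^2, with dot > 0) to avoid sqrt. */
int speaker_vox_decide(const SpeakerTemplate *t, const double f[FEAT_DIM],
                       double min_similarity)
{
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (size_t d = 0; d < FEAT_DIM; d++) {
        dot += t->mean[d] * f[d];
        na += t->mean[d] * t->mean[d];
        nb += f[d] * f[d];
    }
    if (t->count == 0 || dot <= 0.0)
        return 0; /* untrained template, or vectors point apart */
    return dot * dot >= min_similarity * min_similarity * na * nb;
}
```

In use, `enroll_frame` would be called on voiced frames during a training session, and `speaker_vox_decide` would then gate transmission alongside a voice detector such as the pitch-based one above.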
- Actively rejects:
  - Impulse noise such as gunshots or explosions
  - Aircraft engine noise
  - Land vehicle engine and road (traffic) noise
  - Factory machine noise
  - Restaurant kitchen noise
- Integrated with speech coders (e.g., STANAG 4591 MELPe)
- Low signal latency (less than 200 ms)
- Adjustable sensitivity and thresholding
- C-callable functions