Arithmetic Voice Coding in Vocoders

Data compression aims to represent information in as few bits as possible. Most methods that achieve a high level of compression rely on probability estimates. The idea is that a value that occurs more frequently should take fewer bits to encode, while a value that occurs less frequently can take more. Averaged over a long run of data, this reduces the number of bits needed per value.
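As a rough illustration of this trade-off, an ideal coder spends about -log2(p) bits on a value that occurs with probability p. The sketch below only evaluates that formula; the function name `ideal_bits` is hypothetical and is not part of any vocoder.

```python
import math

def ideal_bits(probability):
    """Ideal code length, in bits, for a value with the given probability."""
    return -math.log2(probability)

# A value chosen from four equally likely options (p = 0.25) costs 2 bits.
print(ideal_bits(0.25))  # prints 2.0

# A more likely value costs less than a less likely one.
print(ideal_bits(0.75) < ideal_bits(0.25))  # prints True
```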

One compression method used in variable rate vocoders is arithmetic coding. The coder uses the probability distribution of each parameter to decide where, within the range of values the current byte can still take, the byte must lie. This produces a narrower range, which is then used to code the next parameter. Once the range contains only a single value, that byte is written and the process continues with the next byte.

As an example, consider SILK, the variable rate vocoder created by Skype, which uses arithmetic coding for its compression. Initially the byte can take any value from 0 to 255, so it has a base of 0 and a range of 255. The first parameter that SILK codes is the sampling rate. Each of the four rates (8 kHz, 12 kHz, 16 kHz, or 24 kHz) is equally probable, and they are considered in that order, so coding 12 kHz restricts the byte to the second quarter of the values, 64 to 127. This gives a base of 64 and a range of 63. Next, the coder codes whether the frame contains voiced or unvoiced speech. The probability distribution is 62.5% voiced and 37.5% unvoiced. For a voiced frame, the byte must lie in the first 62.5% of the remaining values, i.e. between 64 and 103, which gives a base of 64 and a range of 39. This continues until only one possible value for the byte remains.
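The two narrowing steps above can be sketched in a few lines of Python. This follows the simplified single-byte model used in the text, not the actual SILK bitstream. The function name `narrow` and the frequency tables are hypothetical, and the 5:3 voicing weights (62.5% and 37.5%) are an assumption chosen to reproduce the 64 to 103 split.

```python
def narrow(low, high, freqs, symbol):
    """Shrink the inclusive interval [low, high] to the slice owned by `symbol`.

    `freqs` holds one integer weight per symbol, in coding order.
    """
    size = high - low + 1
    total = sum(freqs)
    cum_lo = sum(freqs[:symbol])       # weight of all symbols before this one
    cum_hi = cum_lo + freqs[symbol]    # ... plus this symbol's own weight
    new_low = low + size * cum_lo // total
    new_high = low + size * cum_hi // total - 1
    return new_low, new_high

# Assumed tables for illustration only.
SAMPLING_RATE_FREQS = [1, 1, 1, 1]  # 8, 12, 16, 24 kHz, equally likely
VOICING_FREQS = [5, 3]              # voiced, unvoiced (62.5% / 37.5%)

low, high = 0, 255                  # the byte can start as anything
low, high = narrow(low, high, SAMPLING_RATE_FREQS, 1)  # code 12 kHz
after_rate = (low, high)
low, high = narrow(low, high, VOICING_FREQS, 0)        # code "voiced"
after_voicing = (low, high)

print(after_rate)     # prints (64, 127)
print(after_voicing)  # prints (64, 103)
```

Here the interval is tracked as an inclusive low and high value; the base and range in the text correspond to `low` and `high - low`.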

A similar process is used to decode the byte. Say that the byte from the previous paragraph ended up being coded as 77. To begin decoding, the vocoder determines which of the four sampling rate ranges (0 to 63, 64 to 127, 128 to 191, or 192 to 255) 77 falls in. Since it falls in the second, the sampling rate is decoded as 12 kHz. Then, within the range 64 to 127, the vocoder determines which of the voiced or unvoiced ranges (64 to 103 or 104 to 127) 77 falls in, and so decides that the frame is voiced. Decoding continues in this way until the byte's value no longer determines the value of the next parameter. The decoder then moves on to the next byte and begins again, and this repeats until all of the parameters are decoded.
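The decoding walk-through can be sketched the same way. Again this is a minimal sketch of the simplified single-byte model in the text, not SILK's real decoder. The function name `decode_symbol`, the frequency tables, and the 5:3 voicing weights are assumptions made for illustration.

```python
def decode_symbol(value, low, high, freqs):
    """Find which symbol's slice of [low, high] contains `value`.

    Returns the symbol index and the narrowed inclusive interval.
    """
    size = high - low + 1
    total = sum(freqs)
    cum = 0
    for symbol, freq in enumerate(freqs):
        sub_low = low + size * cum // total
        sub_high = low + size * (cum + freq) // total - 1
        if sub_low <= value <= sub_high:
            return symbol, sub_low, sub_high
        cum += freq
    raise ValueError("value lies outside the current interval")

# Assumed tables for illustration only.
SAMPLING_RATES_KHZ = [8, 12, 16, 24]
SAMPLING_RATE_FREQS = [1, 1, 1, 1]  # equally likely
VOICING_LABELS = ["voiced", "unvoiced"]
VOICING_FREQS = [5, 3]              # 62.5% / 37.5%

byte = 77
rate_idx, low, high = decode_symbol(byte, 0, 255, SAMPLING_RATE_FREQS)
voicing_idx, low, high = decode_symbol(byte, low, high, VOICING_FREQS)

print(SAMPLING_RATES_KHZ[rate_idx])  # prints 12
print(VOICING_LABELS[voicing_idx])   # prints voiced
```

The decoder repeats the encoder's interval arithmetic and simply checks which slice holds the received byte, which is why both sides must use identical probability tables.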

Complete Communications Engineering
