Range coding

From Hydrogenaudio Knowledgebase

A range coder is an entropy coder, which means it tries to save values in an optimal way. Range coding is similar to Arithmetic Coding and to a lesser extent Huffman coding and Fano coding. Rice coding and Start Step Stop Coding also serve a similar purpose, but in different situations.

To illustrate what an entropy coder does, suppose you have a file with just four possible values 0, 1, 2 and 3. You could just store them in two bits each like this:

0 = 00
1 = 01
2 = 10
3 = 11

Which wouldn't be too bad if all values occur more or less equally often. But suppose that 0 occurs much more frequently than the others (say 50% of the time). In that case it might be better to use the following bitpatterns for the values 0-3:

0 = 0
1 = 10
2 = 110
3 = 111

Assuming 1-3 occur equally often the average number of bits used would be 0.50*1 + 0.17*2 + 0.17*3 + 0.17*3 = 1.83 instead of the 2 we had before. With larger numbers of bits this becomes increasingly interesting.

There are several methods for determining the above bitpatterns, Huffman coding and Fano coding are the easiest to understand, they produce patterns very similar to the above, where the prefix is used to determine how much has to be read. Range coding and arithmetic coding are a bit more complicated, they build a table of frequencies and then allocate certain ranges of numbers to a certain value. The above example would look something like this:

Value Chance Number range
0 0.5 [0, 0.5)
1 0.17 [0.5, 0.67)
2 0.17 [0.66, 0.83)
3 0.17 [0.83, 1)

With range coding integers are used instead of floating-point values.


References