This article describes an alternative approach to minimum redundancy coding (fixed-length) without using tree data structures. The main advantages are simplicity and flexibility, allowing for continuous improvement by integrating new techniques from selection/sorting algorithms. Many different implementations are possible, with time complexity ranging from O(n2) to O(n).
Example source code is included, so you can try it out and see if it works for your purposes.
It is assumed the reader is familiar with the traditional algorithms for generating minimum redundancy codes, Huffman variants in particular. Experience with general computer programming is also necessary, to properly understand some of the concepts and to follow the example source code, mostly written in C# but easily adaptable to any C based language.
The common feature and primary innovation of the algorithms is the use of a data structure named circular chain. In this context, it's defined as a collection of elements where each element points to the next, with the last element pointing to the initial element and therefore closing the chain:
Implementation is done by maintaining an array of ID/index values for each element:
There are some similarities between this data structure and the well-known linked list, but the latter is usually implemented by allocating multiple blocks of memory and assigning the address of each block as the Next pointer. Considering this, a circular chain can be viewed as an optimized and specialized circularly-linked list.
Iterating through the elements of the circular chain consists of starting with an arbitrary element and following the Next pointer until the initial element is found again (the chain closed).
Removing an element D from the circular chain consists of finding the element pointing to D and setting the corresponding Next pointer to the value D is pointing to:
Combining 2 (or more) circular chains consists of exchanging the Next pointer of an arbitrary element from one circular chain with the Next pointer of an arbitrary element of another circular chain.
Before:
After:
An interesting property of circular chains is that a single arbitrary element can identify and reference the entire collection of associated elements, thus becoming its representative.
The base algorithm is an adaptation of standard Huffman coding and consists of the following steps:
Evolution of the data structures during the coding process for the stream of symbols "ADAACBBAA" is demonstrated in the following tables.
Iteration 0 (after initialization):
Iteration 1:
Iteration 2:
Iteration 3:
If only the code lengths are necessary, it can be simplified even further:
Specific versions of the base algorithm will vary depending on the implementation of step 2, which also determines the overall processing time performance.
If we consider the most obvious approach to finding the 2 elements with the lowest values in a list by performing a linear scan, the corresponding C# implementation could be something like this:
internal static FixedLengthCode[] RC1CalculateCodes(uint[] frequencies) { // Initialize circular chains data int[] circular_chains = new int[frequencies.Length]; for (int i = 0; i < circular_chains.Length; i++) circular_chains[i] = i; FixedLengthCode[] codes = new FixedLengthCode[frequencies.Length]; for (int i = 0; i < codes.Length; i++) { codes[i].Length = 0; codes[i].Data = new byte[FixedLengthCode.MaxCodeSize]; // Filled with 0 values by default } while (true) { // Find the 2 elements with the lowest frequency count int x_pos = 0, y_pos = 0; // Initialization not necessary but forced by C# compiler uint x_freq = uint.MaxValue, y_freq = uint.MaxValue; for (int i = 0; i < frequencies.Length; i++) { uint f = frequencies[i]; if (f < x_freq) { x_pos = i; x_freq = f; if (x_freq < y_freq) { // Swap values uint t1 = x_freq; x_freq = y_freq; y_freq = t1; int t2 = x_pos; x_pos = y_pos; y_pos = t2; } } } if (x_freq == uint.MaxValue) break; // Exit condition // Increment code lengths for elements in chain X int curpos = x_pos; do { codes[curpos].Length++; curpos = circular_chains[curpos]; // Next element } while (curpos != x_pos); // Exit when chain closed // Increment code length and add 1 prefix to elements in chain Y curpos = y_pos; do { byte[] ba = codes[curpos].Data; int l = codes[curpos].Length++; ba[ba.Length - 1 - (l >> 3)] |= unchecked((byte) (1 << (l & 7))); curpos = circular_chains[curpos]; // Next element } while (curpos != y_pos); // Exit when chain closed // Merge the 2 circular chains int t3 = circular_chains[x_pos]; circular_chains[x_pos] = circular_chains[y_pos]; circular_chains[y_pos] = t3; // Assign the new representative the combined frequency count frequencies[x_pos] += y_freq; // Mark the other representative so it's not taken into account in further iterations frequencies[y_pos] = uint.MaxValue; } return codes; } struct FixedLengthCode { internal const int MaxCodeSize = 7; // Maximum code size in bytes internal byte Length; // Number of bits internal byte[] Data; // A more efficient approach (but requires unsafe code) would be: // internal fixed byte Data[MaxCodeSize]; // This implementation of ToString() is concise but very inefficient public override string ToString() { StringBuilder sb = new StringBuilder(Length); for (int i = Length - 1; i >= 0; i--) { sb.Append((Data[Data.Length - 1 - (i >> 3)] & (1 << (i & 7))) != 0 ? "1" : "0"); } return sb.ToString(); } }
The frequencies array is modified during the code generation process. Apart from the input and output arrays, there's a single additional circular_chains array involved that stores data temporarily. This array can usually be stored in the same memory block that will receive the final compressed data, so the algorithm can have virtually null memory footprint if applied properly.
Note the source code still considers symbols with a 0 frequency count. It's generally preferable to exclude these from the code generation process and some possible options to achieve this are:
Maximum code size is defined as 7 bytes in this particular case, which is unnecessarily large for practical applications but shows how easily it can scale. To minimize the space needed to store the codes, we can assign the prefix 1 to the circular chain with the lowest maximum code length. Codes generated this way and stored as a 32-bit unsigned integer are defined here as "compact":
internal static uint[] RC1CalculateCompactCodes(uint[] frequencies, out byte[] codeLengths) { // Initialize circular chains data int[] circular_chains = new int[frequencies.Length]; for (int i = 0; i < circular_chains.Length; i++) circular_chains[i] = i; uint[] codes = new uint[frequencies.Length]; // Filled with 0 values by default codeLengths = new byte[frequencies.Length]; // Filled with 0 values by default while (true) { // Find the 2 elements with the lowest frequency count int x_pos = 0, y_pos = 0; // Initialization not necessary but forced by C# compiler uint x_freq = uint.MaxValue, y_freq = uint.MaxValue; for (int i = 0; i < frequencies.Length; i++) { uint f = frequencies[i]; if (f < x_freq) { x_pos = i; x_freq = f; if (x_freq < y_freq) { // Swap values uint t1 = x_freq; x_freq = y_freq; y_freq = t1; int t2 = x_pos; x_pos = y_pos; y_pos = t2; } } } if (x_freq == uint.MaxValue) break; // Exit condition // This check reduces the space needed to store the codes if (codeLengths[x_pos] < codeLengths[y_pos]) { int t3 = x_pos; x_pos = y_pos; y_pos = t3; } // Increment code lengths for elements in chain X int curpos = x_pos; do { codeLengths[curpos]++; curpos = circular_chains[curpos]; // Next element } while (curpos != x_pos); // Exit when chain closed // Increment code length and add 1 prefix to elements in chain Y curpos = y_pos; do { codes[curpos] |= unchecked((uint) (1 << codeLengths[curpos]++)); curpos = circular_chains[curpos]; // Next element } while (curpos != y_pos); // Exit when chain closed // Merge the 2 circular chains int t4 = circular_chains[x_pos]; circular_chains[x_pos] = circular_chains[y_pos]; circular_chains[y_pos] = t4; // Assign the new representative the combined frequency count frequencies[x_pos] += y_freq; // Mark the other representative so it's not taken into account in further iterations frequencies[y_pos] = uint.MaxValue; } return codes; }
Performing a linear scan to find the lowest values can significantly increase the processing time as the number of symbols grows, making RC1 unfeasible for large numbers of symbols. To improve speed, the same techniques used by good sorting algorithms can be applied.
We can use a binary heap to allow efficient retrieval of the minimum values, as the next implementation demonstrates. The customized min heap compares the frequencies associated with each symbol (the representative of a circular chain) to determine where to store its reference in the internal nodes list. A good introduction to heaps can be found in [1].
internal static uint[] RC2CalculateCompactCodes(uint[] frequencies, out byte[] codeLengths) { // Initialize circular chains data int[] circular_chains = new int[frequencies.Length]; for (int i = 0; i < circular_chains.Length; i++) circular_chains[i] = i; uint[] codes = new uint[frequencies.Length]; // Filled with 0 values by default codeLengths = new byte[frequencies.Length]; // Filled with 0 values by default MinHeap min_heap = new MinHeap(frequencies); while (min_heap.Count >= 2) { // Find the 2 elements with the lowest frequency count int x_pos = min_heap.Remove(), y_pos = min_heap.Remove(); // This check reduces the space needed to store the codes if (codeLengths[x_pos] < codeLengths[y_pos]) { int t1 = x_pos; x_pos = y_pos; y_pos = t1; } // Increment code lengths for elements in chain X int curpos = x_pos; do { codeLengths[curpos]++; curpos = circular_chains[curpos]; // Next element } while (curpos != x_pos); // Exit when chain closed // Increment code length and add 1 prefix to elements in chain Y curpos = y_pos; do { codes[curpos] |= unchecked((uint) (1 << codeLengths[curpos]++)); curpos = circular_chains[curpos]; // Next element } while (curpos != y_pos); // Exit when chain closed // Merge the 2 circular chains int t2 = circular_chains[x_pos]; circular_chains[x_pos] = circular_chains[y_pos]; circular_chains[y_pos] = t2; // Assign the new representative the combined frequency count frequencies[x_pos] += frequencies[y_pos]; // Add new representative to the heap min_heap.Add(x_pos); } return codes; } struct MinHeap { internal MinHeap(uint[] frequencies) { count = 0; nodes = new int[frequencies.Length]; this.frequencies = frequencies; for (int i = 0; i < frequencies.Length; i++) Add(i); } internal int Count { get { return count; } } internal void Add(int key) { int i; for (i = count++; i > 0; ) { int j = (i - 1) >> 1; if (frequencies[nodes[j]] <= frequencies[key]) break; nodes[i] = nodes[j]; i = j; } nodes[i] = key; } internal int Remove() { int res = nodes[0]; int ck = nodes[--count]; if (count > 0) { int i; for (i = 0; i < (count >> 1); ) { int j = (i << 1) + 1; if (j < (count - 1) && frequencies[nodes[j]] > frequencies[nodes[j + 1]]) j++; if (frequencies[ck] < frequencies[nodes[j]]) break; nodes[i] = nodes[j]; i = j; } nodes[i] = ck; } return res; } readonly int[] nodes; readonly uint[] frequencies; int count; }
Some possible improvements could be speeding up the initial heap creation and combining the last Remove() and Add(x_pos) function calls into a single operation, but tests show these don't have a significant impact on performance.
If the frequencies are pre-sorted, we can generate the codes with a single pass. This requires maintaining a first-in-first-out list (FIFO queue - [2]) with references to representatives of the processed circular chains:
internal static FixedLengthCode[] RCDemo1CalculateCodes(uint[] frequencies) { // Initialize circular chains data int[] circular_chains = new int[frequencies.Length]; for (int i = 0; i < circular_chains.Length; i++) circular_chains[i] = i; FixedLengthCode[] codes = new FixedLengthCode[frequencies.Length]; for (int i = 0; i < codes.Length; i++) { codes[i].Length = 0; codes[i].Data = new byte[FixedLengthCode.MaxCodeSize]; // Filled with 0 values by default } Queue<int> processing_queue = new Queue<int>(); for (int i = 0; (frequencies.Length - i + processing_queue.Count) >= 2; ) { // Find the 2 elements with the lowest frequency count int x_pos, y_pos; if (i >= frequencies.Length) { x_pos = processing_queue.Dequeue(); y_pos = processing_queue.Dequeue(); } else if (processing_queue.Count == 0) { x_pos = i++; y_pos = i++; } else { if (frequencies[i] < frequencies[processing_queue.Peek()]) { x_pos = i++; if (i >= frequencies.Length || frequencies[i] >= frequencies[processing_queue.Peek()]) { y_pos = processing_queue.Dequeue(); } else { y_pos = i++; } } else { x_pos = processing_queue.Dequeue(); if (processing_queue.Count == 0 || frequencies[i] < frequencies[processing_queue.Peek()]) { y_pos = i++; } else { y_pos = processing_queue.Dequeue(); } } } // Increment code lengths for elements in chain X int curpos = x_pos; do { codes[curpos].Length++; curpos = circular_chains[curpos]; // Next element } while (curpos != x_pos); // Exit when chain closed // Increment code length and add 1 prefix to elements in chain Y curpos = y_pos; do { byte[] ba = codes[curpos].Data; int l = codes[curpos].Length++; ba[ba.Length - 1 - (l >> 3)] |= unchecked((byte) (1 << (l & 7))); curpos = circular_chains[curpos]; // Next element } while (curpos != y_pos); // Exit when chain closed // Merge the 2 circular chains int t1 = circular_chains[x_pos]; circular_chains[x_pos] = circular_chains[y_pos]; circular_chains[y_pos] = t1; // Assign the new representative the combined frequency count frequencies[x_pos] += frequencies[y_pos]; // Add new representative to processing queue processing_queue.Enqueue(x_pos); } return codes; }
The implementation shown works but relies on a Queue class, which introduces the following problems:
To solve these problems we have to get rid of the Queue and customize the alternative structure to match our specific needs. In order to achieve this we have to consider what data is necessary to implement a FIFO:
If we notice the changes in the frequencies array during processing, we can see that as circular chains are merged the new representative gets assigned the combined frequency count, but the frequencies of the other associated elements are no longer relevant. We can take advantage of this unused space to store the Next reference of each element in the new FIFO structure.
Iteration 0 (position = A, FIFO count = 0):
Iteration 1 (position = C, FIFO count = 1, FIFO first = A, FIFO last = A):
Iteration 2 (position = E, FIFO count = 2, FIFO first = A, FIFO last = C):
Iteration 3 (position = invalid, FIFO count = 3, FIFO first = A, FIFO last = E):
Iteration 4 (position = invalid, FIFO count = 2, FIFO first = E, FIFO last = A):
Iteration 5 (position = invalid, FIFO count = 1, FIFO first = E, FIFO last = E):
The corresponding implementation then becomes:
internal static uint[] RCDemo2CalculateCompactCodes(uint[] frequencies, out byte[] codeLengths) { // Initialize circular chains data int[] circular_chains = new int[frequencies.Length]; for (int i = 0; i < circular_chains.Length; i++) circular_chains[i] = i; uint[] codes = new uint[frequencies.Length]; // Filled with 0 values by default codeLengths = new byte[frequencies.Length]; // Filled with 0 values by default OptimizedQueue processing_queue = new OptimizedQueue(frequencies, circular_chains); for (int i = 0; (frequencies.Length - i + processing_queue.Count) >= 2; ) { // Find the 2 elements with the lowest frequency count int x_pos = int.MaxValue; int y_pos = x_pos; do { int t1 = i < frequencies.Length && (processing_queue.Count == 0 || frequencies[i] < frequencies[processing_queue.Peek()]) ? i++ : processing_queue.Dequeue(); if (x_pos == int.MaxValue) x_pos = t1; else y_pos = t1; } while (y_pos == int.MaxValue); // This check reduces the space needed to store the codes if (codeLengths[x_pos] < codeLengths[y_pos]) { int t2 = x_pos; x_pos = y_pos; y_pos = t2; } // Increment code lengths for elements in chain X int curpos = x_pos; do { codeLengths[curpos]++; curpos = circular_chains[curpos]; // Next element } while (curpos != x_pos); // Exit when chain closed // Increment code length and add 1 prefix to elements in chain Y curpos = y_pos; do { codes[curpos] |= unchecked((uint) (1 << codeLengths[curpos]++)); curpos = circular_chains[curpos]; // Next element } while (curpos != y_pos); // Exit when chain closed // Merge the 2 circular chains int t3 = circular_chains[x_pos]; circular_chains[x_pos] = circular_chains[y_pos]; circular_chains[y_pos] = t3; // Assign the new representative the combined frequency count frequencies[x_pos] += frequencies[y_pos]; // Add new representative to processing queue processing_queue.Enqueue(x_pos); } return codes; } struct OptimizedQueue { internal OptimizedQueue(uint[] frequencies, int[] circularChains) { this.frequencies = frequencies; this.circularChains = circularChains; count = first = last = 0; } internal int Count { get { return count; } } internal int Peek() { return first; } internal void Enqueue(int newValue) { if (count++ == 0) { first = newValue; } else { frequencies[circularChains[last]] = unchecked((uint) newValue); // Link new element } last = newValue; } internal int Dequeue() { int res = first; if (--count > 0) first = unchecked((int) frequencies[circularChains[first]]); // Set to next element return res; } int count, first, last; uint[] frequencies; int[] circularChains; }
Additional optimizations can be done by removing the function calls inside the loops, basically converting them to C++ style inline functions:
internal static uint[] RC3CalculateCompactCodes(uint[] frequencies, out byte[] codeLengths) { // Initialize circular chains data int[] circular_chains = new int[frequencies.Length]; for (int i = 0; i < circular_chains.Length; i++) circular_chains[i] = i; uint[] codes = new uint[frequencies.Length]; // Filled with 0 values by default codeLengths = new byte[frequencies.Length]; // Filled with 0 values by default int temp; int fifo_count = 0; int fifo_first = 0, fifo_last = 0; // Initialization not necessary but forced by C# compiler for (int i = 0; (frequencies.Length - i + fifo_count) >= 2; ) { // Find the 2 elements with the lowest frequency count int x_pos = int.MaxValue; int y_pos = x_pos; do { if (i < frequencies.Length && (fifo_count == 0 || frequencies[i] < frequencies[fifo_first])) { temp = i++; } else { temp = fifo_first; if (--fifo_count > 0) fifo_first = unchecked((int) frequencies[circular_chains[fifo_first]]); // Set to next element } if (x_pos == int.MaxValue) x_pos = temp; else y_pos = temp; } while (y_pos == int.MaxValue); // This check reduces the space needed to store the codes if (codeLengths[x_pos] < codeLengths[y_pos]) { temp = x_pos; x_pos = y_pos; y_pos = temp; } // Increment code lengths for elements in chain X temp = x_pos; do { codeLengths[temp]++; temp = circular_chains[temp]; // Next element } while (temp != x_pos); // Exit when chain closed // Increment code length and add 1 prefix to elements in chain Y temp = y_pos; do { codes[temp] |= unchecked((uint) (1 << codeLengths[temp]++)); temp = circular_chains[temp]; // Next element } while (temp != y_pos); // Exit when chain closed // Merge the 2 circular chains temp = circular_chains[x_pos]; circular_chains[x_pos] = circular_chains[y_pos]; circular_chains[y_pos] = temp; // Assign the new representative the combined frequency count frequencies[x_pos] += frequencies[y_pos]; // Add new representative to processing queue if (fifo_count++ == 0) { fifo_first = x_pos; } else { frequencies[circular_chains[fifo_last]] = unchecked((uint) x_pos); // Link new element } fifo_last = x_pos; } return codes; }
Here are some guidelines that should be followed when integrating RC coding in a real application:
Some interesting observations can be made from the analysis of the RC1 x86 assembly language implementation that generates the code lengths:
; EBX = Pointer to frequencies array (input data, modified during processing) ; ECX = Number of elements in frequencies array ; ESI = Pointer to circular_chains array (temporary storage) ; EDI = Pointer to code_lengths array (receives output data) ; Initialize circular chains data (may be removed if circular_chains array already contains values in ascending order) mov eax, ecx init_cc: dec eax mov dword ptr [esi + eax * 4], eax jnz init_cc mov ebp, ecx push edi ; Initialize code lengths (may be removed if code_lengths array already contains null values) ; xor eax, eax - not necessary since EAX already contains 0 rep stosb next_iteration: push esi ; Find the 2 elements with the lowest frequency count xor esi, esi dec esi mov edi, esi mov ecx, ebp find_l2: dec ecx cmp esi, dword ptr [ebx + ecx * 4] jbe skip_freq mov esi, dword ptr [ebx + ecx * 4] mov eax, ecx cmp esi, edi jae skip_freq xchg esi, edi xchg eax, edx skip_freq: jecxz found_l2 jmp find_l2 found_l2: inc esi pop esi jz exit ; Exit condition ; Assign the new representative the combined frequency count add dword ptr [ebx + eax * 4], edi ; Mark the other representative so it's not taken into account in further iterations dec ecx mov dword ptr [ebx + edx * 4], ecx ; Merge the 2 circular chains mov ecx, dword ptr [esi + eax * 4] xchg ecx, dword ptr [esi + edx * 4] mov dword ptr [esi + eax * 4], ecx pop edi push edi ; Increment code lengths for elements in new chain mov ecx, eax inc_length: inc byte ptr [edi + ecx] mov ecx, dword ptr [esi + ecx * 4] ; Next element cmp ecx, eax jne inc_length ; Exit when chain closed jmp next_iteration exit: pop edi
The XCHG instruction is used to exchange 2 values atomically, something not possible to specify with common high-level languages. Machine code size in this case is 80 bytes, which should be very close to the minimum possible to achieve an equivalent result.