This article describes an alternative approach to minimum redundancy coding (fixed-length)
without using tree data structures. The main advantages are simplicity and flexibility,
allowing for continuous improvement by integrating new techniques from selection/sorting
algorithms. Many different implementations are possible, with time complexity ranging
from O(n^{2}) to O(n).

Example source code is included, so you can try it out and see if it works for your purposes.

It is assumed the reader is familiar with the traditional algorithms for generating minimum redundancy codes, Huffman variants in particular. Experience with general computer programming is also necessary, to properly understand some of the concepts and to follow the example source code, mostly written in C# but easily adaptable to any C based language.

The common feature and primary innovation of the algorithms is the use of a data structure named circular chain. In this context, it's defined as a collection of elements where each element points to the next, with the last element pointing to the initial element and therefore closing the chain:

Implementation is done by maintaining an array of ID/index values for each element:

Symbol | Next |
---|---|

A | B |

B | C |

C | D |

D | A |

There are some similarities between this data structure and the well-known linked
list, but the latter is usually implemented by allocating multiple blocks of memory
and assigning the address of each block as the *Next* pointer. Considering
this, a circular chain can be viewed as an optimized and specialized circularly-linked
list.

Iterating through the elements of the circular chain consists of starting with an
arbitrary element and following the *Next* pointer until the initial element
is found again (the chain closed).

Removing an element D from the circular chain consists of finding the element pointing
to D and setting the corresponding *Next* pointer to the value D is pointing
to:

Symbol | Next |
---|---|

A | B |

B | C |

C | A |

D | A |

Combining 2 (or more) circular chains consists of exchanging the *Next* pointer
of an arbitrary element from one circular chain with the *Next* pointer of
an arbitrary element of another circular chain.

Before:

Symbol | Next |
---|---|

A | B |

B | C |

C | D |

D | A |

E | F |

F | E |

After:

Symbol | Next |
---|---|

A | B |

B | C |

C | E |

D | A |

E | F |

F | D |

An interesting property of circular chains is that a single arbitrary element can
identify and reference the entire collection of associated elements, thus becoming
its *representative*.

The base algorithm is an adaptation of standard Huffman coding and consists of the following steps:

- Initialize data structures by assigning a circular chain to each symbol: Each chain has a single element that points to itself and gets an associated frequency count corresponding to the respective symbol.
- Find the 2 circular chains X and Y with the lowest frequency counts.
- Traverse chain X and add a bit prefix with value 0 to the code of each symbol, incrementing the respective code length.
- Traverse chain Y and add a bit prefix with value 1 to the code of each symbol, incrementing the respective code length.
- Merge the two circular chains into a new one (swap two
*Next*pointers between them). Assign it the combined frequency count. - Repeat steps 2-5 until there is only one circular chain left, which includes all symbols.

Evolution of the data structures during the coding process for the stream of symbols "ADAACBBAA" is demonstrated in the following tables.

Iteration 0 (after initialization):

Symbol | Next | Frequency | Code |
---|---|---|---|

A | A | 5 | |

B | B | 2 | |

C | C | 1 | |

D | D | 1 |

Iteration 1:

Symbol | Next | Frequency | Code |
---|---|---|---|

A | A | 5 | |

B | B | 2 | |

C | D | 2 | 0 |

D | C | 1 |

Iteration 2:

Symbol | Next | Frequency | Code |
---|---|---|---|

A | A | 5 | |

B | D | 4 | 0 |

C | B | 10 | |

D | C | 11 |

Iteration 3:

Symbol | Next | Frequency | Code |
---|---|---|---|

A | D | 9 | 1 |

B | A | 00 | |

C | B | 010 | |

D | C | 011 |

If only the code lengths are necessary, it can be simplified even further:

- Initialize data structures by assigning a circular chain to each symbol: Each chain has a single element that points to itself and gets an associated frequency count corresponding to the respective symbol.
- Find the 2 circular chains X and Y with the lowest frequency counts.
- Merge the two circular chains into a new one (swap two
*Next*pointers between them). Assign it the combined frequency count. - Traverse the new chain and increment the code length of each symbol.
- Repeat steps 2-4 until there is only one circular chain left, which includes all symbols.

Specific versions of the base algorithm will vary depending on the implementation of step 2, which also determines the overall processing time performance.

If we consider the most obvious approach to finding the 2 elements with the lowest values in a list by performing a linear scan, the corresponding C# implementation could be something like this:

Show source code

internal static FixedLengthCode[] RC1CalculateCodes(uint[] frequencies) { // Initialize circular chains data int[] circular_chains = new int[frequencies.Length]; for (int i = 0; i < circular_chains.Length; i++) circular_chains[i] = i; FixedLengthCode[] codes = new FixedLengthCode[frequencies.Length]; for (int i = 0; i < codes.Length; i++) { codes[i].Length = 0; codes[i].Data = new byte[FixedLengthCode.MaxCodeSize]; // Filled with 0 values by default } while (true) { // Find the 2 elements with the lowest frequency count int x_pos = 0, y_pos = 0; // Initialization not necessary but forced by C# compiler uint x_freq = uint.MaxValue, y_freq = uint.MaxValue; for (int i = 0; i < frequencies.Length; i++) { uint f = frequencies[i]; if (f < x_freq) { x_pos = i; x_freq = f; if (x_freq < y_freq) { // Swap values uint t1 = x_freq; x_freq = y_freq; y_freq = t1; int t2 = x_pos; x_pos = y_pos; y_pos = t2; } } } if (x_freq == uint.MaxValue) break; // Exit condition // Increment code lengths for elements in chain X int curpos = x_pos; do { codes[curpos].Length++; curpos = circular_chains[curpos]; // Next element } while (curpos != x_pos); // Exit when chain closed // Increment code length and add 1 prefix to elements in chain Y curpos = y_pos; do { byte[] ba = codes[curpos].Data; int l = codes[curpos].Length++; ba[ba.Length - 1 - (l >> 3)] |= unchecked((byte) (1 << (l & 7))); curpos = circular_chains[curpos]; // Next element } while (curpos != y_pos); // Exit when chain closed // Merge the 2 circular chains int t3 = circular_chains[x_pos]; circular_chains[x_pos] = circular_chains[y_pos]; circular_chains[y_pos] = t3; // Assign the new representative the combined frequency count frequencies[x_pos] += y_freq; // Mark the other representative so it's not taken into account in further iterations frequencies[y_pos] = uint.MaxValue; } return codes; } struct FixedLengthCode { internal const int MaxCodeSize = 7; // Maximum code size in bytes internal byte Length; // Number of bits internal byte[] Data; // A more efficient approach (but requires unsafe code) would be: // internal fixed byte Data[MaxCodeSize]; // This implementation of ToString() is concise but very inefficient public override string ToString() { StringBuilder sb = new StringBuilder(Length); for (int i = Length - 1; i >= 0; i--) { sb.Append((Data[Data.Length - 1 - (i >> 3)] & (1 << (i & 7))) != 0 ? "1" : "0"); } return sb.ToString(); } }

The *frequencies* array is modified during the code generation process. Apart
from the input and output arrays, there's a single additional *circular_chains*
array involved that stores data temporarily. This array can usually be stored in
the same memory block that will receive the final compressed data, so the algorithm
can have virtually null memory footprint if applied properly.

Note the source code still considers symbols with a 0 frequency count. It's generally preferable to exclude these from the code generation process and some possible options to achieve this are:

- Remove them before generating the codes.
- Assign them a frequency count of
*uint.MaxValue*so they're automatically ignored. - Modify the source code to ignore frequency counts of 0 (not just
*uint.MaxValue*).

Maximum code size is defined as 7 bytes in this particular case, which is unnecessarily large for practical applications but shows how easily it can scale. To minimize the space needed to store the codes, we can assign the prefix 1 to the circular chain with the lowest maximum code length. Codes generated this way and stored as a 32-bit unsigned integer are defined here as "compact":

Show source code

internal static uint[] RC1CalculateCompactCodes(uint[] frequencies, out byte[] codeLengths) { // Initialize circular chains data int[] circular_chains = new int[frequencies.Length]; for (int i = 0; i < circular_chains.Length; i++) circular_chains[i] = i; uint[] codes = new uint[frequencies.Length]; // Filled with 0 values by default codeLengths = new byte[frequencies.Length]; // Filled with 0 values by default while (true) { // Find the 2 elements with the lowest frequency count int x_pos = 0, y_pos = 0; // Initialization not necessary but forced by C# compiler uint x_freq = uint.MaxValue, y_freq = uint.MaxValue; for (int i = 0; i < frequencies.Length; i++) { uint f = frequencies[i]; if (f < x_freq) { x_pos = i; x_freq = f; if (x_freq < y_freq) { // Swap values uint t1 = x_freq; x_freq = y_freq; y_freq = t1; int t2 = x_pos; x_pos = y_pos; y_pos = t2; } } } if (x_freq == uint.MaxValue) break; // Exit condition // This check reduces the space needed to store the codes if (codeLengths[x_pos] < codeLengths[y_pos]) { int t3 = x_pos; x_pos = y_pos; y_pos = t3; } // Increment code lengths for elements in chain X int curpos = x_pos; do { codeLengths[curpos]++; curpos = circular_chains[curpos]; // Next element } while (curpos != x_pos); // Exit when chain closed // Increment code length and add 1 prefix to elements in chain Y curpos = y_pos; do { codes[curpos] |= unchecked((uint) (1 << codeLengths[curpos]++)); curpos = circular_chains[curpos]; // Next element } while (curpos != y_pos); // Exit when chain closed // Merge the 2 circular chains int t4 = circular_chains[x_pos]; circular_chains[x_pos] = circular_chains[y_pos]; circular_chains[y_pos] = t4; // Assign the new representative the combined frequency count frequencies[x_pos] += y_freq; // Mark the other representative so it's not taken into account in further iterations frequencies[y_pos] = uint.MaxValue; } return codes; }

Performing a linear scan to find the lowest values can significantly increase the processing time as the number of symbols grows, making RC1 unfeasible for large numbers of symbols. To improve speed, the same techniques used by good sorting algorithms can be applied.

We can use a binary heap to allow efficient retrieval of the minimum values, as the next implementation demonstrates. The customized min heap compares the frequencies associated with each symbol (the representative of a circular chain) to determine where to store its reference in the internal nodes list. A good introduction to heaps can be found in [1].

Show source code

internal static uint[] RC2CalculateCompactCodes(uint[] frequencies, out byte[] codeLengths) { // Initialize circular chains data int[] circular_chains = new int[frequencies.Length]; for (int i = 0; i < circular_chains.Length; i++) circular_chains[i] = i; uint[] codes = new uint[frequencies.Length]; // Filled with 0 values by default codeLengths = new byte[frequencies.Length]; // Filled with 0 values by default MinHeap min_heap = new MinHeap(frequencies); while (min_heap.Count >= 2) { // Find the 2 elements with the lowest frequency count int x_pos = min_heap.Remove(), y_pos = min_heap.Remove(); // This check reduces the space needed to store the codes if (codeLengths[x_pos] < codeLengths[y_pos]) { int t1 = x_pos; x_pos = y_pos; y_pos = t1; } // Increment code lengths for elements in chain X int curpos = x_pos; do { codeLengths[curpos]++; curpos = circular_chains[curpos]; // Next element } while (curpos != x_pos); // Exit when chain closed // Increment code length and add 1 prefix to elements in chain Y curpos = y_pos; do { codes[curpos] |= unchecked((uint) (1 << codeLengths[curpos]++)); curpos = circular_chains[curpos]; // Next element } while (curpos != y_pos); // Exit when chain closed // Merge the 2 circular chains int t2 = circular_chains[x_pos]; circular_chains[x_pos] = circular_chains[y_pos]; circular_chains[y_pos] = t2; // Assign the new representative the combined frequency count frequencies[x_pos] += frequencies[y_pos]; // Add new representative to the heap min_heap.Add(x_pos); } return codes; } struct MinHeap { internal MinHeap(uint[] frequencies) { count = 0; nodes = new int[frequencies.Length]; this.frequencies = frequencies; for (int i = 0; i < frequencies.Length; i++) Add(i); } internal int Count { get { return count; } } internal void Add(int key) { int i; for (i = count++; i > 0; ) { int j = (i - 1) >> 1; if (frequencies[nodes[j]] <= frequencies[key]) break; nodes[i] = nodes[j]; i = j; } nodes[i] = key; } internal int Remove() { int res = nodes[0]; int ck = nodes[--count]; if (count > 0) { int i; for (i = 0; i < (count >> 1); ) { int j = (i << 1) + 1; if (j < (count - 1) && frequencies[nodes[j]] > frequencies[nodes[j + 1]]) j++; if (frequencies[ck] < frequencies[nodes[j]]) break; nodes[i] = nodes[j]; i = j; } nodes[i] = ck; } return res; } readonly int[] nodes; readonly uint[] frequencies; int count; }

Some possible improvements could be speeding up the initial heap creation and combining
the last *Remove()* and *Add(x_pos)* function calls into a single operation,
but tests show these don't have a significant impact on performance.

If the frequencies are pre-sorted, we can generate the codes with a single pass. This requires maintaining a first-in-first-out list (FIFO queue - [2]) with references to representatives of the processed circular chains:

Show source code

internal static FixedLengthCode[] RCDemo1CalculateCodes(uint[] frequencies) { // Initialize circular chains data int[] circular_chains = new int[frequencies.Length]; for (int i = 0; i < circular_chains.Length; i++) circular_chains[i] = i; FixedLengthCode[] codes = new FixedLengthCode[frequencies.Length]; for (int i = 0; i < codes.Length; i++) { codes[i].Length = 0; codes[i].Data = new byte[FixedLengthCode.MaxCodeSize]; // Filled with 0 values by default } Queue<int> processing_queue = new Queue<int>(); for (int i = 0; (frequencies.Length - i + processing_queue.Count) >= 2; ) { // Find the 2 elements with the lowest frequency count int x_pos, y_pos; if (i >= frequencies.Length) { x_pos = processing_queue.Dequeue(); y_pos = processing_queue.Dequeue(); } else if (processing_queue.Count == 0) { x_pos = i++; y_pos = i++; } else { if (frequencies[i] < frequencies[processing_queue.Peek()]) { x_pos = i++; if (i >= frequencies.Length || frequencies[i] >= frequencies[processing_queue.Peek()]) { y_pos = processing_queue.Dequeue(); } else { y_pos = i++; } } else { x_pos = processing_queue.Dequeue(); if (processing_queue.Count == 0 || frequencies[i] < frequencies[processing_queue.Peek()]) { y_pos = i++; } else { y_pos = processing_queue.Dequeue(); } } } // Increment code lengths for elements in chain X int curpos = x_pos; do { codes[curpos].Length++; curpos = circular_chains[curpos]; // Next element } while (curpos != x_pos); // Exit when chain closed // Increment code length and add 1 prefix to elements in chain Y curpos = y_pos; do { byte[] ba = codes[curpos].Data; int l = codes[curpos].Length++; ba[ba.Length - 1 - (l >> 3)] |= unchecked((byte) (1 << (l & 7))); curpos = circular_chains[curpos]; // Next element } while (curpos != y_pos); // Exit when chain closed // Merge the 2 circular chains int t1 = circular_chains[x_pos]; circular_chains[x_pos] = circular_chains[y_pos]; circular_chains[y_pos] = t1; // Assign the new representative the combined frequency count frequencies[x_pos] += frequencies[y_pos]; // Add new representative to processing queue processing_queue.Enqueue(x_pos); } return codes; }

The implementation shown works but relies on a *Queue* class, which introduces
the following problems:

- Requires memory allocation/deallocation.
- Requires additional memory.
- Depends on the
*Queue*class, which may not be implemented in the standard library of other languages. - The default
*Queue*processing isn't optimized to our particular scenario.

To solve these problems we have to get rid of the *Queue* and customize the
alternative structure to match our specific needs. In order to achieve this we have
to consider what data is necessary to implement a FIFO:

- A counter with the number of elements in the list.
- A reference to the initial element.
- A reference to the last element, to allow efficient addition of a new element at the end.
- For each element there must be a reference to the next element.

If we notice the changes in the frequencies array during processing, we can see
that as circular chains are merged the new representative gets assigned the combined
frequency count, but the frequencies of the other associated elements are no longer
relevant. We can take advantage of this unused space to store the *Next* reference
of each element in the new FIFO structure.

Iteration 0 (position = A, FIFO count = 0):

Symbol | Next | Frequency | Code |
---|---|---|---|

A | A | 3 | |

B | B | 5 | |

C | C | 5 | |

D | D | 5 | |

E | E | 6 | |

F | F | 7 |

Iteration 1 (position = C, FIFO count = 1, FIFO first = A, FIFO last = A):

Symbol | Next | Frequency | Code |
---|---|---|---|

A | B | 8 | 0 |

B | A | 1 | |

C | C | 5 | |

D | D | 5 | |

E | E | 6 | |

F | F | 7 |

Iteration 2 (position = E, FIFO count = 2, FIFO first = A, FIFO last = C):

Symbol | Next | Frequency | Code |
---|---|---|---|

A | B | 8 | 0 |

B | A | C | 1 |

C | D | 10 | 0 |

D | C | 1 | |

E | E | 6 | |

F | F | 7 |

Iteration 3 (position = invalid, FIFO count = 3, FIFO first = A, FIFO last = E):

Symbol | Next | Frequency | Code |
---|---|---|---|

A | B | 8 | 0 |

B | A | C | 1 |

C | D | 10 | 0 |

D | C | E | 1 |

E | F | 13 | 0 |

F | E | 1 |

Iteration 4 (position = invalid, FIFO count = 2, FIFO first = E, FIFO last = A):

Symbol | Next | Frequency | Code |
---|---|---|---|

A | D | 18 | 00 |

B | A | 01 | |

C | B | 10 | |

D | C | 11 | |

E | F | 13 | 0 |

F | E | A | 1 |

Iteration 5 (position = invalid, FIFO count = 1, FIFO first = E, FIFO last = E):

Symbol | Next | Frequency | Code |
---|---|---|---|

A | F | 100 | |

B | A | 101 | |

C | B | 110 | |

D | C | 111 | |

E | D | 31 | 00 |

F | E | 01 |

The corresponding implementation then becomes:

Show source code

internal static uint[] RCDemo2CalculateCompactCodes(uint[] frequencies, out byte[] codeLengths) { // Initialize circular chains data int[] circular_chains = new int[frequencies.Length]; for (int i = 0; i < circular_chains.Length; i++) circular_chains[i] = i; uint[] codes = new uint[frequencies.Length]; // Filled with 0 values by default codeLengths = new byte[frequencies.Length]; // Filled with 0 values by default OptimizedQueue processing_queue = new OptimizedQueue(frequencies, circular_chains); for (int i = 0; (frequencies.Length - i + processing_queue.Count) >= 2; ) { // Find the 2 elements with the lowest frequency count int x_pos = int.MaxValue; int y_pos = x_pos; do { int t1 = i < frequencies.Length && (processing_queue.Count == 0 || frequencies[i] < frequencies[processing_queue.Peek()]) ? i++ : processing_queue.Dequeue(); if (x_pos == int.MaxValue) x_pos = t1; else y_pos = t1; } while (y_pos == int.MaxValue); // This check reduces the space needed to store the codes if (codeLengths[x_pos] < codeLengths[y_pos]) { int t2 = x_pos; x_pos = y_pos; y_pos = t2; } // Increment code lengths for elements in chain X int curpos = x_pos; do { codeLengths[curpos]++; curpos = circular_chains[curpos]; // Next element } while (curpos != x_pos); // Exit when chain closed // Increment code length and add 1 prefix to elements in chain Y curpos = y_pos; do { codes[curpos] |= unchecked((uint) (1 << codeLengths[curpos]++)); curpos = circular_chains[curpos]; // Next element } while (curpos != y_pos); // Exit when chain closed // Merge the 2 circular chains int t3 = circular_chains[x_pos]; circular_chains[x_pos] = circular_chains[y_pos]; circular_chains[y_pos] = t3; // Assign the new representative the combined frequency count frequencies[x_pos] += frequencies[y_pos]; // Add new representative to processing queue processing_queue.Enqueue(x_pos); } return codes; } struct OptimizedQueue { internal OptimizedQueue(uint[] frequencies, int[] circularChains) { this.frequencies = frequencies; this.circularChains = circularChains; count = first = last = 0; } internal int Count { get { return count; } } internal int Peek() { return first; } internal void Enqueue(int newValue) { if (count++ == 0) { first = newValue; } else { frequencies[circularChains[last]] = unchecked((uint) newValue); // Link new element } last = newValue; } internal int Dequeue() { int res = first; if (--count > 0) first = unchecked((int) frequencies[circularChains[first]]); // Set to next element return res; } int count, first, last; uint[] frequencies; int[] circularChains; }

Additional optimizations can be done by removing the function calls inside the loops, basically converting them to C++ style inline functions:

Show source code

internal static uint[] RC3CalculateCompactCodes(uint[] frequencies, out byte[] codeLengths) { // Initialize circular chains data int[] circular_chains = new int[frequencies.Length]; for (int i = 0; i < circular_chains.Length; i++) circular_chains[i] = i; uint[] codes = new uint[frequencies.Length]; // Filled with 0 values by default codeLengths = new byte[frequencies.Length]; // Filled with 0 values by default int temp; int fifo_count = 0; int fifo_first = 0, fifo_last = 0; // Initialization not necessary but forced by C# compiler for (int i = 0; (frequencies.Length - i + fifo_count) >= 2; ) { // Find the 2 elements with the lowest frequency count int x_pos = int.MaxValue; int y_pos = x_pos; do { if (i < frequencies.Length && (fifo_count == 0 || frequencies[i] < frequencies[fifo_first])) { temp = i++; } else { temp = fifo_first; if (--fifo_count > 0) fifo_first = unchecked((int) frequencies[circular_chains[fifo_first]]); // Set to next element } if (x_pos == int.MaxValue) x_pos = temp; else y_pos = temp; } while (y_pos == int.MaxValue); // This check reduces the space needed to store the codes if (codeLengths[x_pos] < codeLengths[y_pos]) { temp = x_pos; x_pos = y_pos; y_pos = temp; } // Increment code lengths for elements in chain X temp = x_pos; do { codeLengths[temp]++; temp = circular_chains[temp]; // Next element } while (temp != x_pos); // Exit when chain closed // Increment code length and add 1 prefix to elements in chain Y temp = y_pos; do { codes[temp] |= unchecked((uint) (1 << codeLengths[temp]++)); temp = circular_chains[temp]; // Next element } while (temp != y_pos); // Exit when chain closed // Merge the 2 circular chains temp = circular_chains[x_pos]; circular_chains[x_pos] = circular_chains[y_pos]; circular_chains[y_pos] = temp; // Assign the new representative the combined frequency count frequencies[x_pos] += frequencies[y_pos]; // Add new representative to processing queue if (fifo_count++ == 0) { fifo_first = x_pos; } else { frequencies[circular_chains[fifo_last]] = unchecked((uint) x_pos); // Link new element } fifo_last = x_pos; } return codes; }

Here are some guidelines that should be followed when integrating RC coding in a real application:

- Select the most appropriate version. RC1 is recommended only when the number of symbols is small or processing time isn't important, for example because it runs on dedicated hardware. RC2 or RC3 should be used when the number of symbols is large. RC2 has the limitations inherent to the implicit heapsort, which may not be as fast as other sorting algorithms (for example quicksort scales better with multi-core processors). With RC3 the frequencies must be pre-sorted.
- If only the code lengths are necessary, for example to generate special canonical codes, simplify the selected version accordingly by removing unnecessary source code.
- Define the maximum limits for the different items (frequency count, number of
symbols and code size) and adapt the source code accordingly. For example if the
maximum number of symbols is less than 65536 the circular chains array item type
should be changed to
*ushort*. Based on these limits, add proper error checking such as overflow detection and correction. - It's usually best to store the codes in the compact format. A possible improvement is saving both the length and code in the same unsigned integer, reserving the highest bits to store the length and the rest for the actual code.
- Handle symbols with a 0 frequency count appropriately, which generally means no codes should be assigned to them.
- Re-use the memory buffers used to store temporary data. For example a buffer used by a sorting algorithm can be used to store the circular chains array.

Some interesting observations can be made from the analysis of the RC1 x86 assembly language implementation that generates the code lengths:

Show source code

; EBX = Pointer to frequencies array (input data, modified during processing) ; ECX = Number of elements in frequencies array ; ESI = Pointer to circular_chains array (temporary storage) ; EDI = Pointer to code_lengths array (receives output data) ; Initialize circular chains data (may be removed if circular_chains array already contains values in ascending order) mov eax, ecx init_cc: dec eax mov dword ptr [esi + eax * 4], eax jnz init_cc mov ebp, ecx push edi ; Initialize code lengths (may be removed if code_lengths array already contains null values) ; xor eax, eax - not necessary since EAX already contains 0 rep stosb next_iteration: push esi ; Find the 2 elements with the lowest frequency count xor esi, esi dec esi mov edi, esi mov ecx, ebp find_l2: dec ecx cmp esi, dword ptr [ebx + ecx * 4] jbe skip_freq mov esi, dword ptr [ebx + ecx * 4] mov eax, ecx cmp esi, edi jae skip_freq xchg esi, edi xchg eax, edx skip_freq: jecxz found_l2 jmp find_l2 found_l2: inc esi pop esi jz exit ; Exit condition ; Assign the new representative the combined frequency count add dword ptr [ebx + eax * 4], edi ; Mark the other representative so it's not taken into account in further iterations dec ecx mov dword ptr [ebx + edx * 4], ecx ; Merge the 2 circular chains mov ecx, dword ptr [esi + eax * 4] xchg ecx, dword ptr [esi + edx * 4] mov dword ptr [esi + eax * 4], ecx pop edi push edi ; Increment code lengths for elements in new chain mov ecx, eax inc_length: inc byte ptr [edi + ecx] mov ecx, dword ptr [esi + ecx * 4] ; Next element cmp ecx, eax jne inc_length ; Exit when chain closed jmp next_iteration exit: pop edi

The XCHG instruction is used to exchange 2 values atomically, something not possible to specify with common high-level languages. Machine code size in this case is 80 bytes, which should be very close to the minimum possible to achieve an equivalent result.