2.2.4 Processing

Article
01/09/2024

The decompression processes a series of blocks to form the decompressed output. Each block is processed in-order, and its decoded content written to the output stream is in-order. When processing a block, we check for terminating conditions for both block and overall decoding.

The decompression algorithm uses the 256-byte Huffman table to reconstruct the canonical Huffman [IEEE-MRC] representations of each symbol. Next, the Huffman stream of LZ77 ([UASDC]) literals and matches is decoded to reproduce the original data.

The following method can be used to construct a decoding table. The decoding table will have 2^15 entries because 15 is the maximum bit length permitted by the Xpress Compression Algorithm for a Huffman code. If a symbol has a bit length of X, it has 2^(15 – X) entries in the table that point to its value. The order of symbols in the table is sorted by bit length (from low to high), and then by symbol value (from low to high). These requirements represent the agreement of canonicalness with the compression end of the algorithm. The following pseudocode shows the table construction method:

 CurrentTableEntry = 0
 For BitLength = 1 to 15
     For Symbol = 0 to 511
         If the encoded bit length of Symbol equals BitLength
             EntryCount = (1 << (15 – BitLength))
             Repeat EntryCount times
                 If CurrentTableEntry >= 2^15
                     The compressed data is not valid. Return with error.
                 DecodingTable[CurrentTableEntry] = Symbol
                 CurrentTableEntry = CurrentTableEntry + 1
 If CurrentTableEntry does not equal 2^15
     The compressed data is not valid. Return with error.

A valid implementation MUST use a method that provides results equivalent to those of the preceding table-based method to construct a data structure for decoding canonical Huffman codes. An implementation MAY use this simple table-based method, but SHOULD use a faster method.

The compression stream is designed to be read in (mostly) 16-bit chunks, with a 32-bit register maintaining at least the next 16 bits of input. This strategy allows the code to seamlessly handle the bytes for long match lengths, which would otherwise be awkward. The following pseudocode demonstrates this method.

During the beginning of processing each block for decompression, an implementation MUST check that the block has sufficient space for a Huffman table — if the block has enough space, then processing continues. If there is not enough space for a Huffman table and all output has been written, then processing stops and success is returned, otherwise an error indicating invalid data is returned.

 Loop until a decompression terminating condition
     If remaining input buffer does not have enough space for a Huffman table
         If we're at the end of the output buffer
             Decompression is complete, return success
         The compressed data is not valid, return error    
     Build the decoding table
     CurrentPosition = 256              // start at the end of the Huffman table
     NextBits = Read16Bits(InputBuffer + CurrentPosition)
     CurrentPosition += 2
     NextBits <<= 16
     NextBits |= Read16Bits(InputBuffer + CurrentPosition)
     CurrentPosition += 2
     ExtraBitCount = 16
     BlockEnd = OutputPosition + 65536
     Loop until a block terminating condition
         If the OutputPosition >= BlockEnd then terminate block processing
         Next15Bits = NextBits >> (32 – 15)
         HuffmanSymbol = DecodingTable[Next15Bits]
         HuffmanSymbolBitLength = the bit length of HuffmanSymbol, from the table in
         the input buffer
         NextBits <<= HuffmanSymbolBitLength
         ExtraBitCount -= HuffmanSymbolBitLength
         If ExtraBitCount < 0
             NextBits |= Read16Bits(InputBuffer + CurrentPosition) << (-ExtraBitCount)
             ExtraBitCount += 16
             CurrentPosition += 2
         If HuffmanSymbol < 256
             Output the byte value HuffmanSymbol to the output stream.
         Else If HuffmanSymbol == 256 and
                 the entire input buffer has been read and
                 the expected decompressed size has been written to the output buffer
             Decompression is complete.  Return with success.
         Else
             HuffmanSymbol = HuffmanSymbol - 256
             MatchLength = HuffmanSymbol mod 16
             MatchOffsetBitLength = HuffmanSymbol / 16
             If MatchLength == 15
                 MatchLength = ReadByte(InputBuffer + CurrentPosition)
                 CurrentPosition += 1
                 If MatchLength == 255
                     MatchLength = Read16Bits(InputBuffer + CurrentPosition)
                     CurrentPosition += 2
                     If MatchLength < 15
                         The compressed data is invalid. Return error.
                     MatchLength = MatchLength - 15
                 MatchLength = MatchLength + 15
             MatchLength = MatchLength + 3
             MatchOffset = NextBits >> (32 – MatchOffsetBitLength)
             MatchOffset += (1 << MatchOffsetBitLength)
             NextBits <<= MatchOffsetBitLength
             ExtraBitCount -= MatchOffsetBitLength
             If ExtraBitCount < 0
                 Read the next 2 bytes the same as the preceding (ExtraBitCount < 0) case
             For i = 0 to MatchLength - 1
                 Output OutputBuffer[CurrentOutputPosition – MatchOffset + i]
     End of block loop
 End of decoding loop

An implementation MUST also generate an error indicating that the compressed data is not valid in the event of an improper memory access outside the buffer.

Note that the match-copying loop copies 1 byte at a time and cannot use the standard library functions memcpy or memove. A sequence of bytes such as aaaaaa can be encoded like this: [literal: "a"][match: offset=1, length=5]. In other words, the match length can be greater than the match offset, and this necessitates the 1-byte-at-a-time copying strategy.

Share via

2.2.4 Processing

Additional resources