2.2.4 Processing
The decompression processes a series of blocks to form the decompressed output. Each block is processed in-order, and its decoded content written to the output stream is in-order. When processing a block, we check for terminating conditions for both block and overall decoding.
The decompression algorithm uses the 256-byte Huffman table to reconstruct the canonical Huffman [IEEE-MRC] representations of each symbol. Next, the Huffman stream of LZ77 ([UASDC]) literals and matches is decoded to reproduce the original data.
The following method can be used to construct a decoding table. The decoding table will have 2^15 entries because 15 is the maximum bit length permitted by the Xpress Compression Algorithm for a Huffman code. If a symbol has a bit length of X, it has 2^(15 – X) entries in the table that point to its value. The order of symbols in the table is sorted by bit length (from low to high), and then by symbol value (from low to high). These requirements represent the agreement of canonicalness with the compression end of the algorithm. The following pseudocode shows the table construction method:
-
CurrentTableEntry = 0 For BitLength = 1 to 15 For Symbol = 0 to 511 If the encoded bit length of Symbol equals BitLength EntryCount = (1 << (15 – BitLength)) Repeat EntryCount times If CurrentTableEntry >= 2^15 The compressed data is not valid. Return with error. DecodingTable[CurrentTableEntry] = Symbol CurrentTableEntry = CurrentTableEntry + 1 If CurrentTableEntry does not equal 2^15 The compressed data is not valid. Return with error.
A valid implementation MUST use a method that provides results equivalent to those of the preceding table-based method to construct a data structure for decoding canonical Huffman codes. An implementation MAY use this simple table-based method, but SHOULD use a faster method.
The compression stream is designed to be read in (mostly) 16-bit chunks, with a 32-bit register maintaining at least the next 16 bits of input. This strategy allows the code to seamlessly handle the bytes for long match lengths, which would otherwise be awkward. The following pseudocode demonstrates this method.
During the beginning of processing each block for decompression, an implementation MUST check that the block has sufficient space for a Huffman table — if the block has enough space, then processing continues. If there is not enough space for a Huffman table and all output has been written, then processing stops and success is returned, otherwise an error indicating invalid data is returned.
-
Loop until a decompression terminating condition If remaining input buffer does not have enough space for a Huffman table If we're at the end of the output buffer Decompression is complete, return success The compressed data is not valid, return error Build the decoding table CurrentPosition = 256 // start at the end of the Huffman table NextBits = Read16Bits(InputBuffer + CurrentPosition) CurrentPosition += 2 NextBits <<= 16 NextBits |= Read16Bits(InputBuffer + CurrentPosition) CurrentPosition += 2 ExtraBitCount = 16 BlockEnd = OutputPosition + 65536 Loop until a block terminating condition If the OutputPosition >= BlockEnd then terminate block processing Next15Bits = NextBits >> (32 – 15) HuffmanSymbol = DecodingTable[Next15Bits] HuffmanSymbolBitLength = the bit length of HuffmanSymbol, from the table in the input buffer NextBits <<= HuffmanSymbolBitLength ExtraBitCount -= HuffmanSymbolBitLength If ExtraBitCount < 0 NextBits |= Read16Bits(InputBuffer + CurrentPosition) << (-ExtraBitCount) ExtraBitCount += 16 CurrentPosition += 2 If HuffmanSymbol < 256 Output the byte value HuffmanSymbol to the output stream. Else If HuffmanSymbol == 256 and the entire input buffer has been read and the expected decompressed size has been written to the output buffer Decompression is complete. Return with success. Else HuffmanSymbol = HuffmanSymbol - 256 MatchLength = HuffmanSymbol mod 16 MatchOffsetBitLength = HuffmanSymbol / 16 If MatchLength == 15 MatchLength = ReadByte(InputBuffer + CurrentPosition) CurrentPosition += 1 If MatchLength == 255 MatchLength = Read16Bits(InputBuffer + CurrentPosition) CurrentPosition += 2 If MatchLength < 15 The compressed data is invalid. Return error. MatchLength = MatchLength - 15 MatchLength = MatchLength + 15 MatchLength = MatchLength + 3 MatchOffset = NextBits >> (32 – MatchOffsetBitLength) MatchOffset += (1 << MatchOffsetBitLength) NextBits <<= MatchOffsetBitLength ExtraBitCount -= MatchOffsetBitLength If ExtraBitCount < 0 Read the next 2 bytes the same as the preceding (ExtraBitCount < 0) case For i = 0 to MatchLength - 1 Output OutputBuffer[CurrentOutputPosition – MatchOffset + i] End of block loop End of decoding loop
An implementation MUST also generate an error indicating that the compressed data is not valid in the event of an improper memory access outside the buffer.
Note that the match-copying loop copies 1 byte at a time and
cannot use the standard library functions memcpy or memove. A
sequence of bytes such as aaaaaa
can be encoded
like this: [literal: "a"][match: offset=1, length=5]
. In other words, the match length can be greater than the
match offset, and this necessitates the 1-byte-at-a-time copying strategy.