3.1.9.1.2.4 Compressed Segment Trailer
The bit stream can end with some number of unused bits (0-7) in the last byte, which MUST NOT be decoded. (Attempting to decode can overrun input and produce too many bytes of output). The value of the last byte in the compressed segment indicates the number of unused bits in the final byte (some value between 0 and 7, inclusive). The five high-order bits in the last byte of the compressed segment are reserved.
For example, if the encoding of a stream produces 217 bits, the stream is 29 bytes in length. The first 27 bytes plus the most-significant bit of the 28th byte comprise the bit stream. The 29th byte has the value 7, indicating that 7 bits (of the 28th byte) are to be ignored. The total length of a segment's bit stream is:
NumberOfBitsToDecode = ((NumberOfBytesToDecode - 1) * 8) - ValueOfLastByte.
There is no "end of block" token or other marker. The decoder MUST stop when this number of bits has been decoded.
Huffman symbols or "tokens" are defined for:
A literal (a single byte to be output).
A match, including the distance back into the history from which to copy.
An unencoded sequence, introducing some number of bytes to copy directly from input.
Most literals are encoded with a "0" prefix, followed by 8 bits containing the byte to output, most-significant bit first. Some selected literals MUST be represented using a shorter token varying between 5 and 8 bits, all beginning with a "11" prefix. The nine-bit encodings that would otherwise represent these literals are reserved and MUST NOT be used to encode these literal values.
A match token is followed by some number of bits indicating the number of bytes output since the needed bytes or the "distance" backward. Each token has been assigned a different base distance and number of additional value bits to be added to compute the full distance. Additional value bits are presented most-significant bit first. A match length prefix follows the token and indicates how many additional bits will be needed to get the full length (the number of bytes to be copied). Most of the match length prefixes have been defined so that a decoder can simply count the number of "1" bits until a "0" bit appears to determine how many value bits follow.
The distance is not a buffer offset, but instead indicates the number of bytes that have been output since the first of the bytes to be copied. A linear buffer is often used to record recent history, with a "cursor" indicating the buffer offset where the next byte will be placed, wrapped around to the beginning of the buffer when the end is reached (also known as a "ring buffer"). With this approach, the distance can be subtracted from the cursor offset, while compensating for any buffer wrap-around, if applicable, which might have occurred since the needed bytes were decoded.
A match distance of zero is a special case, which indicates that an unencoded run of bytes follows. The count of bytes is encoded as a 15-bit value, most significant bit first. After decoding this count, any bits remaining in the current input byte are ignored, and the unencoded run will begin on a whole-byte boundary. The ignored bits, plus 8 bits for each unencoded byte, are also considered part of the total number of bits in the input stream. If any bits remain after an unencoded run of bytes, decoding continues with the most-significant bit of the first byte following the run.
The following table contains all the defined tokens. Any token or bit sequence that is not defined in this table is reserved.
|
Bit Prefix |
Decimal |
Value Bits |
Purpose |
|---|---|---|---|
|
0 |
0 |
8 |
Literal xxxxxxxx (excluding those literals with shorter codes described in this table for which their 9-bit representation is reserved) |
|
10001 |
17 |
5 |
Unencoded literal sequence (10001 00000), or Match distance 1...31 (10001 xxxxx) |
|
10010 |
18 |
7 |
Match distance 32...159 |
|
10011 |
19 |
9 |
Match distance 160...671 |
|
10100 |
20 |
10 |
Match distance 672...1695 |
|
10101 |
21 |
12 |
Match distance 1696...5791 |
|
101100 |
44 |
14 |
Match distance 5792...22175 |
|
101101 |
45 |
15 |
Match distance 22176...54943 |
|
1011100 |
92 |
18 |
Match distance 54944...317087 |
|
1011101 |
93 |
20 |
Match distance 317088...1365663 |
|
10111100 |
188 |
20 |
Match distance 1365664...2414239 |
|
10111101 |
189 |
21 |
Match distance 2414240...2500000 |
|
11000 |
24 |
|
Literal 0x00 (000000000 is reserved) |
|
11001 |
25 |
|
Literal 0x01 (000000001 is reserved) |
|
110100 |
52 |
|
Literal 0x02 (000000010 is reserved) |
|
110101 |
53 |
|
Literal 0x03 (000000011 is reserved) |
|
110110 |
54 |
|
Literal 0xFF (011111111 is reserved) |
|
1101110 |
110 |
|
Literal 0x04 (000000100 is reserved) |
|
1101111 |
111 |
|
Literal 0x05 (000000101 is reserved) |
|
1110000 |
112 |
|
Literal 0x06 (000000110 is reserved) |
|
1110001 |
113 |
|
Literal 0x07 (000000111 is reserved) |
|
1110010 |
114 |
|
Literal 0x08 (000001000 is reserved) |
|
1110011 |
115 |
|
Literal 0x09 (000001001 is reserved) |
|
1110100 |
116 |
|
Literal 0x0A (000001010 is reserved) |
|
1110101 |
117 |
|
Literal 0x0B (000001011 is reserved) |
|
1110110 |
118 |
|
Literal 0x3A (000111010 is reserved) |
|
1110111 |
119 |
|
Literal 0x3B (000111011 is reserved) |
|
1111000 |
120 |
|
Literal 0x3C (000111100 is reserved) |
|
1111001 |
121 |
|
Literal 0x3D (000111101 is reserved) |
|
1111010 |
122 |
|
Literal 0x3E (000111110 is reserved) |
|
1111011 |
123 |
|
Literal 0x3F (000111111 is reserved) |
|
1111100 |
124 |
|
Literal 0x40 (001000000 is reserved) |
|
1111101 |
125 |
|
Literal 0x80 (010000000 is reserved) |
|
11111100 |
252 |
|
Literal 0x0C (000001100 is reserved) |
|
11111101 |
253 |
|
Literal 0x38 (000111000 is reserved) |
|
11111110 |
254 |
|
Literal 0x39 (000111001 is reserved) |
|
11111111 |
255 |
|
Literal 0x66 (001100110 is reserved) |
Match tokens are followed by a length token:
|
Bit Prefix |
Value Bits |
Definition |
|---|---|---|
|
0 |
|
Length 3 |
|
10 |
2 |
Length 4...7 |
|
110 |
3 |
Length 8...15 |
|
1110 |
4 |
Length 16...31 |
|
11110 |
5 |
Length 32...63 |
|
111110 |
6 |
Length 64...127 |
|
1111110 |
7 |
Length 128...255 |
|
11111110 |
8 |
Length 256...511 |
|
111111110 |
9 |
Length 512...1023 |
|
1111111110 |
10 |
Length 1024...2047 |
|
11111111110 |
11 |
Length 2048...4095 |
|
111111111110 |
12 |
Length 4096...8191 |
|
1111111111110 |
13 |
Length 8192...16383 |
|
11111111111110 |
14 |
Length 16384...32767 |
|
111111111111110 |
15 |
Length 32768...65535 |
A single compressed segment MUST NOT translate to more than 65,535 uncompressed bytes.