2.3.3.2 Compressing a Buffer of Uncompressed Contents with COMPTYPE COMPRESSED

When the COMPTYPE field is set to COMPRESSED, compression proceeds as a loop, as follows:

  1. The writer MUST (re)initialize the run by setting its control byte to zero, its control bit to 0x01, and its token offset to zero.

  2. If there is no more input, then the writer MUST exit the compression loop (by advancing to step 8).

  3. The writer MUST locate the longest match in the dictionary for the data at the current input cursor, as specified in section 2.3.3.2.1.

  4. If the match is 0 or 1 byte in length, then the writer MUST copy the literal byte at the input cursor to the run's token buffer at the current token offset. The writer MUST then advance both the token offset and the input cursor by 1 byte.

  5. If the match is 2 bytes or longer, then the writer MUST create a dictionary reference, as specified in section 2.1.3.1.5, from the offset of the match and its length. (Note: The value stored in the Length field, as specified in section 2.1.3.1.5, is the match length minus 2.) The writer MUST insert this dictionary reference in the token buffer as a big-endian word at the current token offset. The control bit MUST be bitwise ORed into the control byte, thus setting the bit that corresponds to the current token to 1. The writer MUST advance the token offset by 2 bytes and MUST advance the input cursor by the length of the match.

  6. If the control bit is not 0x80, then the control bit MUST be left-shifted by one bit and compression MUST continue building the run by returning to step 2.

  7. If the control bit is equal to 0x80, then the writer MUST write the run to the output by writing the 1-byte control byte, and then copying the token offset number of bytes from the token buffer to the output. The writer MUST advance the output cursor by the token offset plus 1 byte. Compression then continues by returning to step 1.

  8. A dictionary reference MUST be created from an offset equal to the current write offset of the dictionary and a length of zero, and inserted in the token buffer as a big-endian word at the current token offset. The writer MUST then advance the token offset by 2 bytes. The control bit MUST be bitwise ORed into the control byte, thus setting the bit that corresponds to the current token to 1. When the input is zero bytes long, the writer adds a null value during compression, so the compressed run is "02 00 0D 00" instead of "01 0C F0".

  9. The writer MUST write the current run to the output by writing the value of the CONTROL field, as specified in section 2.1.3.1.1, and then copying the token offset number of bytes from the token buffer to the output. The output cursor is advanced by the token offset plus 1 byte.
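
Steps 1 through 9 can be sketched as follows. This is an illustrative sketch, not a reference implementation. It assumes a 4096-byte circular dictionary and a dictionary reference laid out as a 12-bit offset followed by a 4-bit Length field (section 2.1.3.1.5 is authoritative). The names pack_reference, find_longest_match, and compress_runs are hypothetical, and find_longest_match is a brute-force stand-in for the procedure in section 2.3.3.2.1. The initial dictionary is passed in by the caller rather than reproduced here.

```python
DICT_SIZE = 4096  # assumption: size of the circular dictionary
MAX_MATCH = 17    # assumption: a 4-bit Length field stores (length - 2)

def pack_reference(offset: int, length_field: int) -> bytes:
    """Pack an assumed 12-bit offset and 4-bit Length field as a big-endian word."""
    return (((offset & 0xFFF) << 4) | (length_field & 0xF)).to_bytes(2, "big")

def find_longest_match(dictionary, filled, write, data, cursor):
    """Brute-force stand-in for section 2.3.3.2.1; returns (offset, length)."""
    best_offset, best_length = 0, 0
    limit = min(MAX_MATCH, len(data) - cursor)
    for start in range(filled):
        length = 0
        # Stop at the write offset so a match never reads unwritten bytes.
        while (length < limit
               and (start + length) % DICT_SIZE != write
               and dictionary[(start + length) % DICT_SIZE] == data[cursor + length]):
            length += 1
        if length > best_length:
            best_offset, best_length = start, length
    return best_offset, best_length

def compress_runs(data: bytes, initial_dict: bytes) -> bytes:
    """Build the runs for the CONTENTS field (header not included)."""
    dictionary = bytearray(DICT_SIZE)
    dictionary[:len(initial_dict)] = initial_dict
    write = filled = len(initial_dict)  # dictionary write offset, valid byte count
    out = bytearray()
    cursor = 0
    done = False
    while not done:
        # Step 1: (re)initialize the run.
        control_byte, control_bit = 0, 0x01
        tokens = bytearray()  # len(tokens) plays the role of the token offset
        while True:
            # Step 2: no more input, so emit the end marker (step 8).
            if cursor >= len(data):
                tokens += pack_reference(write, 0)
                control_byte |= control_bit
                done = True
                break
            # Step 3: longest match at the input cursor.
            offset, length = find_longest_match(dictionary, filled, write, data, cursor)
            if length < 2:
                # Step 4: literal token.
                tokens.append(data[cursor])
                consumed = 1
            else:
                # Step 5: dictionary reference; Length field is length - 2.
                tokens += pack_reference(offset, length - 2)
                control_byte |= control_bit
                consumed = length
            # Assumption: consumed input is appended to the dictionary here.
            for byte in data[cursor:cursor + consumed]:
                dictionary[write] = byte
                write = (write + 1) % DICT_SIZE
            filled = min(filled + consumed, DICT_SIZE)
            cursor += consumed
            # Steps 6 and 7: shift the control bit, or flush a full run.
            if control_bit == 0x80:
                break
            control_bit <<= 1
        # Steps 7 and 9: write the control byte, then the token buffer.
        out.append(control_byte)
        out += tokens
    return bytes(out)
```

With a placeholder dictionary of 207 bytes (the length implied by the offset 0x0CF in the step 8 example), compress_runs(b"", bytes(207)) yields "01 0C F0" and compress_runs(b"\x00", bytes(207)) yields "02 00 0D 00", matching the two runs quoted in step 8.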

After step 9 has written the final run, the writer MUST complete the output by filling in the header, as specified in section 2.3.3.2.2.

The writer MUST calculate the CRC over every byte written to the CONTENTS field, as specified in section 2.1.3.1.1, and MUST store the result in the CRC field of the header.
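
The CRC calculation over the CONTENTS bytes might look like the sketch below. It assumes that the referenced CRC is a table-driven CRC-32 using the reflected polynomial 0xEDB88320, with an initial value of zero and no final inversion. That assumption is not stated in this section and needs to be checked against the section that defines the CRC field. The names build_crc_table and contents_crc are hypothetical.

```python
def build_crc_table():
    """Build the 256-entry table for the reflected polynomial 0xEDB88320."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table

CRC_TABLE = build_crc_table()

def contents_crc(contents: bytes) -> int:
    """CRC over every CONTENTS byte; assumed initial value 0, no final XOR."""
    crc = 0
    for byte in contents:
        crc = CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc
```

Because the assumed variant starts at zero and skips the final inversion, its result differs from the value returned by common library CRC-32 routines for the same bytes, so a writer cannot use such a routine unmodified.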