2.1.3 Reference Data

For delta compression, the reference data is a sequence of bytes given to the compressor before it compresses the subject data. The exact same reference data sequence MUST be given to the decompressor before decompression. The reference data sequence is treated as logically prepended to the subject data sequence being compressed or decompressed. During decompression, match offsets are negative displacements from the "current position" in the output stream, up to the specified window size. When the magnitude of a match offset exceeds the number of bytes already emitted in the uncompressed output stream, the match points into the reference data that is logically prepended to the subject data.

Figure 3: Example reference data and subject data

In this example, the reference data is 10 bytes long and consists of the sequence "ABCDEFGHIJ". The data to be compressed, or the subject data, is also 10 bytes long (although the data does not have to be the same length as the reference data) and consists of "abcDEFabce". A valid encoded sequence would consist of the following tokens:

'a', 'b', 'c', (match offset -10, length 3), (match offset -6, length 3), 'e'

The first match offset (-10) reaches back farther than the 3 bytes of subject data already in the window, so it points into the reference data portion and copies "DEF". The second match offset (-6) does not exceed the 6 bytes of subject data in the window, so it refers to a portion of the subject data previously compressed or decompressed and copies "abc".
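
The example above can be traced with a short sketch. It is illustrative only and does not model the actual bitstream encoding: the function name decode, the window_size parameter, and the representation of tokens as literal byte strings or (offset, length) tuples are assumptions made for this sketch. It shows one way a decompressor might resolve match offsets once the reference data is treated as prepended to the output.

```python
def decode(reference: bytes, tokens, window_size: int = 1 << 16) -> bytes:
    """Rebuild subject data from literals and (offset, length) matches.

    Hypothetical token model: a bytes object is a literal, a tuple is a
    match whose offset is a negative displacement from the current position.
    """
    # The reference data is logically prepended to the output stream.
    window = bytearray(reference)
    for token in tokens:
        if isinstance(token, tuple):
            offset, length = token
            distance = -offset
            # A match may reach into the reference data, but never before
            # its start and never farther back than the window size.
            if not 0 < distance <= min(len(window), window_size):
                raise ValueError(f"match offset {offset} out of range")
            start = len(window) - distance
            # Copy byte by byte so a match may overlap the bytes it produces.
            for i in range(length):
                window.append(window[start + i])
        else:
            window.extend(token)  # literal bytes
    # Only the bytes after the reference data are subject data.
    return bytes(window[len(reference):])


tokens = [b"a", b"b", b"c", (-10, 3), (-6, 3), b"e"]
print(decode(b"ABCDEFGHIJ", tokens))  # b'abcDEFabce'
```

In this sketch, a match whose distance is greater than the number of subject bytes emitted so far lands at an index below len(reference), which is how the first match reaches "DEF" in the reference data.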