Unbounded length contexts for PPM
John G. Cleary, W. J. Teahan, Ian H. Witten?
Department of Computer Science, University of Waikato, New Zealand
The PPM data compression scheme has set the performance standard in lossless compression of text throughout the past decade. The original algorithm was first published in 1984 by Cleary and Witten , and a series of improvements was described by Moffat , culminating in a careful implementation, called PPMC, which has become the benchmark version. This still achieves results superior to virtually all other compression methods, despite many attempts to better it. Other methods such as those based on Ziv-Lempel coding  are more commonly used in practice, but their attractiveness lies in their relative speed rather than any superiority in compression|indeed, their compression performance generally falls distinctly below that of PPM in practical benchmark tests .
Prediction by partial matching, or PPM, is a finite-context statistical modeling technique that can be viewed as blending together several fixed-order context models to predict the next character in the input sequence. Prediction probabilities for each context in the model are calculated from frequency counts which are updated adaptively; and the symbol that actually occurs is encoded relative to its predicted distribution using arithmetic coding. The maximum context length is a fixed constant, and it has been found that increasing it beyond about six or so does not generally improve compression [3, 6].
The present paper describes a new algorithm, PPM*, which exploits contexts of unbounded length. It reliably achieves compression superior to PPMC, although our current implementation|which we have not yet attempted to optimize|uses considerably greater computational resources (both time and space). The next section describes the basic PPM compression scheme. Following that we motivate the use of contexts of unbounded length, introduce the new method, and show how it can be implemented using a trie data structure. Then we give some results that demonstrate an improvement of about 6% over the old method. Finally, a recently-published and seemingly unrelated compression scheme  is related to the unbounded-context idea that forms the essential innovation of PPM*.
1 PPM: Prediction by partial match
The basic idea of PPM is to use the last few characters in the input stream to predict
the upcoming one. Models that condition their predictions on a few immediately
preceding symbols are called finite-context" models of order k, where k is the number
of preceding symbols used. PPM employs a suite of fixed-order context models
with different values of k, from up to some pre-determined maximum, to predict
For each model, a note is kept of all characters that have followed every length-k subsequence observed so far in the input, and the number of times that each has occurred. Prediction probabilities are calculated from these counts. The probabilities associated with each character that has followed the last k characters in the past are
?email fjcleary, wjt, email@example.com