Method for lossless data compression using greedy sequential...

Coded data generation or conversion – Digital code to digital code converters – Adaptive coding

Reexamination Certificate

Rate now

[ 0.00 ] – not rated yet Voters 0 Comments 0

Details Method for lossless data compression using greedy sequential... Method for lossless data compression using greedy sequential...

: 2003-05-14
: 2004-10-05
: Tokar, Michael (Department: 2819)
: Coded data generation or conversion
: Digital code to digital code converters
: Adaptive coding

: C341S050000, C341S106000
: Reexamination Certificate
: active
: 06801141
: ABSTRACT:

FIELD OF THE INVENTION
The invention relates generally to the field of data compression and more particularly to the use of a sequential grammar transform and encoding methods to compress data with a known context.
BACKGROUND OF THE INVENTION
Universal data compression methods can be divided into two subdivisions: universal lossless data compression and universal lossy data compression. Conventional universal lossless data compression methods typically employ arithmetic coding algorithms, Lempel-Ziv algorithms, and their variants. Arithmetic coding algorithms are based on statistical models of the data to be compressed. To encode a data sequence using an arithmetic coding algorithm, a statistical model is either built dynamically during the encoding process, or assumed to exist in advance. Several approaches exist to build a statistical model dynamically, including prediction by partial match algorithm; dynamic Markov modeling; context gathering algorithm; and context-tree weighting. Each of these methods predicts the next symbol in the data sequence using the proper context and encodes the symbols using their corresponding, estimated conditional probabilities.
Most arithmetic coding algorithms and their variants are universal only with respect to the class of Markov sources with a Markov order less than some designed parameter value. Furthermore, arithmetic coding algorithms encode the data sequence letter by letter.
In contrast, Lempel-Ziv algorithms and their variants do not use statistical models. Lempel-Ziv algorithms parse the original data sequence into non-overlapping, variable length phrases according to a string matching mechanism, and then encode them phrase by phrase. In addition, Lempel-Ziv algorithms are universal with respect to a broader class of sources than the class of Markov sources of bounded order, that being the class of stationary, ergodic sources.
Other conventional universal compression methods include the dynamic Huffman algorithm, the move-to-front coding scheme, and some two-stage compression algorithms with codebook transmission. These conventional methods are either inferior to arithmetic coding and Lempel-Ziv algorithms or too complicated to implement. More recently, a new class of lossless data compression algorithms based on substitution tables was proposed that includes a new coding framework, but no explicit data compression algorithms were introduced. However, this method has a disadvantage in that the greedy sequential transformation used in the encoding process is difficult to implement and does not facilitate efficient coding because the initial symbol s
0
is involved in the parsing process.
Furthermore, these algorithms do not assume any prior knowledge about the data sequences being compressed. While making them suitable for general purpose data compression needs, they are not particularly efficient for specific applications of data compression. In many instances, such as compression of web pages, java applets, or text files, there is often some a priori knowledge about the data sequences being compressed. This knowledge can often take the form of so-called “context models.”
What is needed is a method of universal lossless data compression that overcomes the above-described disadvantages of existing compression methods while taking advantage of the a priori knowledge of the context of the data sequence being compressed.
SUMMARY OF THE INVENTION
In accordance with the present invention, a universal lossless data compression method is provided. This method employs source coding to construct on-line, tuneable, context-dependent compression rules. Unlike alternative methods previously used that compress the individual objects, this method compresses the numerical set of rules. To achieve on-line compression of web-based data for example, the method receives the data, constructs the rules dynamically, and then encodes the rules. Because the set of rules is dynamic, when the structure of web text data changes, the corresponding rules are updated; similarly, when the content of web objects is updated, the corresponding rules are updated accordingly. This approach is particularly efficient for encoding web pages, because the content of a web page can change often, while the underlying structure of a web page remains approximately constant. The relative consistency of the underlying structure provides the predictable context for the data as it is compressed.
One aspect of the invention relates to a method of sequentially transforming an original data sequence associated with a known context into an irreducible context-dependent grammar, and recovering the original data sequence from the grammar. The method includes the steps of parsing a substring from the sequence, generating an admissible context-dependent grammar based on the parsed substring, applying a set of reduction rules to the admissible context dependent grammar to generate a new irreducible context-dependent grammar, and repeating these steps until the entire sequence is encoded. In addition, a set of reduction rules based on pairs of variables and contexts represents the irreducible context-dependent grammar such that the pairs represent non-overlapping repeated patterns and contexts of the data sequence.
In another aspect of the invention, the method relates the use of adaptive context-dependent arithmetic coding to encode an irreducible context-dependent grammar associated with a known context model from a countable context model set. Furthermore, a set of reduction rules are applied to represent the irreducible context-dependent grammar based on pairs of variables and contexts such that the pairs represent non-overlapping repeated patterns and contexts of the data sequence.
In yet another aspect of the invention, a method is provided to encode an data sequence with a known context model by transforming the data sequence into a irreducible context-dependent grammar; converting the irreducible context-dependent grammar into its sorted form; constructing a generated sequence from the sorted irreducible context-dependent grammar; and encoding the generated sequence using an adaptive context-dependent arithmetic code.
The invention also relates to a method of sequentially transforming an original data sequence associated with a known context model into a sequence of irreducible context-dependent grammars; and further encoding the data sequence based on each of the irreducible context-dependent grammars by using adaptive context-dependent arithmetic coding. The method comprises the steps of parsing a substring from the sequence, encoding the substring by utilizing the structure of the previous irreducible context-dependent grammar and by using adaptive context-dependent arithmetic coding, generating an admissible context-dependent grammar based on the substring, the current context, and the previous irreducible context-dependent grammar, applying a set of reduction rules to the admissible context-dependent grammar to generate a new irreducible context-dependent grammar, and repeating these steps until all of the symbols of the sequence are parsed and coded.

REFERENCES:
patent: 5678052 (1997-10-01), Brisson
patent: 6492917 (2002-12-01), Goel et al.
Kieffer et al., “Grammar-Based Codes: A New Class of Universal Lossless Source Codes” May 2000, IEEE Transactions On Information Theory, vol. 46, No. 3, pp. 737-754.*
Yang et al., “Efficient Universal Lossless Data Compression Algorithms Based on a Greedy Sequential Grammar Transform—Part one: Without Context Models” May 2000, IEEE Transactions On Information Theory, vol. 46, No. 3, pp. 755-777.*
Yang, En-Hui, “Method for lossless data compression using greedy sequential grammar transform and sequential encoding”, Nov. 14, 2000; U.S. patent application No. 09/711,703.
Rissanen, “Fast Universal Coding with Context Models” May 1999, IEE Transactions on Information Theory, vol. 43, No. 4.
Yang et al., “Efficient Universal Lossless Data Compression Algorighms Based on a Greedy Contect-Dependent Sequential Grammar Transform” Jun. 2001, p. 78,

Affiliated with

He Da-ke

Inventor

[ 0.00 ] – not rated yet Voters 0 Comments 0

Yang En-Hui

Inventor

[ 0.00 ] – not rated yet Voters 0 Comments 0

Also associated with

Nguyen Khai

Examiner

[ 0.00 ] – not rated yet Voters 0 Comments 0

Slipstream Data Inc.

Corporate Assignee

[ 0.00 ] – not rated yet Voters 0 Comments 0

Tokar Michael

Examiner

[ 0.00 ] – not rated yet Voters 0 Comments 0

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method for lossless data compression using greedy sequential... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method for lossless data compression using greedy sequential..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method for lossless data compression using greedy sequential... will most certainly appreciate the feedback.

Rate now

Comments { 0 }

Profile ID: LFUS-PAI-O-3295210

All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.

Canada

Charities
Companies
MP Candidates
Patents
Employee Salary Disclosure

World

Places of the World
Scientific Papers

United States

Banks
Companies
Counties
Patents
Employee Salary Disclosure